Structural Biology and Crystallization Communications
Expression, purification, crystallization and preliminary crystallographic analysis of a putative Clostridium difficile surface protein Cwp19

Cwp19 is a putatively surface-located protein from Clostridium difficile. A recombinant N-terminal protein (residues 27–401) lacking the signal peptide and the C-terminal cell-wall-binding repeats (PFam04122) was crystallized using the sitting-drop vapour-diffusion method and the crystals diffracted to 2 Å resolution. The crystal appeared to belong to the primitive monoclinic space group P2₁, with unit-cell parameters a = 109.1, b = 61.2, c = 109.2 Å, β = 111.85°, and is estimated to contain two molecules of Cwp19 per asymmetric unit.
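The estimate of the number of molecules per asymmetric unit can be cross-checked with a Matthews-coefficient calculation. The sketch below is illustrative only and is not the authors' calculation. It assumes that the quoted angle is the monoclinic β angle, that P2₁ has two asymmetric units per unit cell and that each chain has a mass of about 47 kDa (the apparent mass of the purified species reported in this paper). The function names are hypothetical.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit-cell volume in cubic angstroms for a monoclinic cell: V = a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews_coefficient(volume, asu_per_cell, n_mol, mass_da):
    """Matthews coefficient V_M in cubic angstroms per dalton."""
    return volume / (asu_per_cell * n_mol * mass_da)

def solvent_fraction(vm):
    """Approximate solvent fraction, 1 - 1.23 / V_M (assumes a typical protein density)."""
    return 1.0 - 1.23 / vm

# Cell parameters from the abstract; 47 kDa is an assumed chain mass.
volume = monoclinic_cell_volume(109.1, 61.2, 109.2, 111.85)

for n_mol in (1, 2, 3, 4):
    vm = matthews_coefficient(volume, asu_per_cell=2, n_mol=n_mol, mass_da=47_000)
    print(f"{n_mol} molecule(s) per ASU: V_M = {vm:.2f}, solvent = {solvent_fraction(vm):.0%}")
```

The loop only tabulates the candidate copy numbers. Choosing among them also relies on the pseudotranslation and self-rotation evidence discussed in the paper.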


Introduction
Clostridium difficile is a Gram-positive, spore-forming, primarily nosocomial pathogen that is the aetiological agent in antibiotic-associated diarrhoea and pseudomembranous colitis (Bartlett, 2010). Changes in epidemiology and disease severity, particularly in strains that have emerged over the last ten years, e.g. the 027 ribotype, highlight the need to understand more about this worldwide pathogen (Freeman et al., 2010).
The elucidation of structural information for C. difficile proteins has understandably been directed towards the main virulence factors, the toxins (Albesa-Jové et al., 2010; Ho et al., 2005; Pruitt et al., 2009, 2010; Sundriyal et al., 2009). Despite adherence and subsequent colonization by C. difficile representing key milestones in infection, there are considerable gaps in the understanding of how the surface proteins of C. difficile interact with both themselves and the environment to mediate these key steps. To date, there is only one report of high-resolution structural information for a C. difficile surface protein: the low-molecular-weight subunit of the S-layer (PDB entry 3cvz; Fagan et al., 2009).
The C. difficile S-layer is derived from post-translational cleavage of SlpA into low-molecular-weight and high-molecular-weight subunits (LMW SLP and HMW SLP, respectively). HMW SLP contains three PFam04122 repeats which putatively mediate attachment to the bacterial cell surface (cell-wall-binding domains; CWBDs). A total of 28 other proteins in the C. difficile 630 genome have been found to contain these CWBDs at the N-terminus or the C-terminus, with a 'functional domain' at the other terminus (Sebaihia et al., 2006). Recently, Dang et al. (2010) identified one such CWBD-containing protein, Cwp19 (CD2767; C. difficile 630 genome numbering; Fagan et al., 2011; Sebaihia et al., 2006), during a pull-down assay of ABP-labelled Cwp84. Cwp19 has an N-terminal DUF187 domain (together with three C-terminal CWBDs) which belongs to a glycosyl hydrolase clan of enzymes that possess a TIM barrel (a conserved protein fold consisting of eight α-helices and eight parallel β-strands that alternate along the peptide backbone, as originally identified in the conserved glycolytic enzyme triosephosphate isomerase). Other members include α-amylases and cellulases.
To understand the molecular structure of this protein, the N-terminal domain of Cwp19, lacking the CWBDs, has been expressed, purified and crystallized for structural studies.

Expression and purification
The cwp19 construct was transformed into Escherichia coli BL21 (DE3) Star (Invitrogen). A single colony was used to inoculate 50 ml Terrific Broth (TB) medium (Sigma) with 50 µg ml⁻¹ kanamycin supplemented with 0.5% glucose and grown overnight at 303 K. The starter culture was then inoculated into 950 ml of the aforementioned supplemented TB medium and grown until the OD reached ~0.6. Cultures were then cooled to 289 K, induced with 1 mM IPTG and grown for a further 16 h before harvesting by centrifugation. Cell pellets were either used directly or frozen at 253 K.
The cell pellet was thawed on ice, resuspended in immobilized metal-affinity chromatography (IMAC) binding/wash buffer (50 mM Tris, 0.5 M NaCl, 20 mM imidazole pH 8.0), sonicated and centrifuged to remove cell debris. IMAC was performed on an ÄKTA design FPLC (GE Healthcare) using a HisTrap HP (GE Healthcare) column equilibrated with binding/wash buffer. Elution was performed using an imidazole gradient (elution buffer: 50 mM Tris, 0.5 M NaCl, 0.5 M imidazole pH 8.0). Early elution peak fractions were dialysed into 50 mM Tris, 150 mM NaCl pH 8.0, 0.2 µm filtered and then concentrated in a Vivaspin-20 10k MWCO spin concentrator to approximately 167 mg ml⁻¹ (as measured by the Bradford assay using 1 mg ml⁻¹ BSA as the standard). Purity was assessed by SDS-PAGE and anti-His₆ Western blot.

X-ray data collection and processing
A total of 250 images were recorded from a single crystal of rCwp19 27-401 using a Quantum-4 CCD detector (ADSC Systems, California, USA) with an oscillation angle of 1.0° per image, a crystal-to-detector distance of 300 mm and an exposure time of 3 s per image at 100 K (no cryoprotectant was used) on the PX beamline I04 at the Diamond Light Source (Didcot, Oxon, England). The diffraction data were processed using the iMOSFLM X-ray data-processing package (Battye et al., 2011) and were scaled using SCALA (part of the CCP4 program suite; Winn et al., 2011). Data-collection and processing statistics are listed in Table 1. Molecular-replacement trials were attempted using the PHENIX suite of crystallography programs (Adams et al., 2010).

Table 1
Statistics for the processing of X-ray data from the rCwp19 27-401 crystal in various possible space groups using iMOSFLM.
physiology and pathogenesis of C. difficile has therefore only started to be understood and requires further work. To obtain pure rCwp19 it was necessary to express only the N-terminal functional domain, residues 27-401 (minus the predicted signal peptide, residues 1-26), containing the predicted glycosidase catalytic core. The full-length protein (including the CWBDs but also lacking the signal peptide) exhibited extensive truncation/degradation and purification issues. IMAC purification yielded a pure (>90%) 47 kDa species in one step, particularly early in the elution peak (Fig. 1). rCwp19 27-401 had a tendency to dimerize when purified or dialysed in phosphate buffers. However, we could concentrate the protein to a final concentration of 167 mg ml⁻¹.

Space-group ambiguity
The X-ray diffraction data for the crystal of rCwp19 27-401 were analysed by processing the data in all suggested space groups using the iMOSFLM software suite (Battye et al., 2011). The data were processed in centred orthorhombic, centred and primitive monoclinic and primitive triclinic space groups. The final data-processing statistics for all of these possible space groups are given in Table 1. POINTLESS (Winn et al., 2011) suggested the primitive monoclinic system as a possible space group for the rCwp19 27-401 crystal; however, we also analysed the data for the presence of pseudotranslational symmetry (Adams et al., 2010; Winn et al., 2011; Vagin & Teplyakov, 1997; Vaguine et al., 1999) and complete/partial merohedral twinning (Padilla & Yeates, 2003; French & Wilson, 1978; Adams et al., 2010; Winn et al., 2011). These analyses were performed for data processed in centred orthorhombic, primitive monoclinic and primitive triclinic space groups using TRUNCATE (Winn et al., 2011; French & Wilson, 1978), phenix.xtriage (Adams et al., 2010), the L-test (Adams et al., 2010; Padilla & Yeates, 2003) and the H-test (Lebedev et al., 2006). Patterson maps were calculated using MOLREP (Vagin & Teplyakov, 2010) and POLARRFN from the CCP4 package (Winn et al., 2011).
3.2.1. Twinning analysis. TRUNCATE analysis showed normalized structure amplitudes ⟨E⟩ of 0.928 and 0.889 for the centred orthorhombic and primitive monoclinic space groups, respectively. The expected value for an untwinned data set is 0.886 and that for a perfectly twinned data set is 0.94. Thus, TRUNCATE indicated the presence of partial twinning in the centred orthorhombic space group with a twin fraction of 0.218. Twinning was not detected by TRUNCATE in the primitive monoclinic space group.
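The two reference values quoted above follow from standard intensity statistics for acentric reflections. As a minimal sketch (an illustration, not a description of what TRUNCATE does internally), assume that untwinned normalized intensities E² follow an exponential distribution with unit mean and that a perfect twin averages two independent intensities of this kind:

```python
import math

# Untwinned acentric data: E^2 ~ Exponential(mean 1),
# so <|E|> = Gamma(3/2) = sqrt(pi) / 2.
mean_e_untwinned = math.gamma(1.5)

# Perfect twin: E^2 is the average of two independent exponentials,
# i.e. Gamma(shape=2, scale=1/2), so <|E|> = sqrt(1/2) * Gamma(5/2) / Gamma(2).
mean_e_perfect_twin = math.sqrt(0.5) * math.gamma(2.5) / math.gamma(2.0)

print(round(mean_e_untwinned, 3), round(mean_e_perfect_twin, 3))  # prints 0.886 0.94
```

An observed value between the two limits, such as 0.928, is consistent with partial twinning, although pseudosymmetry and poor scaling can distort the same statistic.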
The L-test analysis (    orthorhombic system. For untwinned data and where pseudosymmetry may be absent, the Z score is expected to be <3.5; this is not the case for the primitive monoclinic space group. The mean |L| values were 0.334 and 0.432 for the centred orthorhombic and primitive monoclinic systems, respectively. For a perfectly twinned case this value should be 0.375 and for an untwinned data set the value should be 0.500. In the present case, the value for the primitive monoclinic space group is closer to that for untwinned data. A similar L-test analysis for the primitive triclinic system resulted in a mean |L| value of 0.442 and a multivariate Z score of 3.593. The H-test (Lebedev et al., 2006) analysis gave a twin fraction of 0.022 for both the primitive monoclinic and primitive triclinic space groups. In the case of untwinned data the expected mean |H| value should be 0.50; values of 0.482 and 0.499 were found for the primitive monoclinic and primitive triclinic space groups, respectively. The H-test was not performed for the centred orthorhombic system as there are no twin laws available for this space group.
The various twinning tests may appear to have erratic or high twin-fraction results because the data do not scale well in centred space groups (C2 or C222; Table 1). However, twinning may be absent in the primitive monoclinic space group.
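The reference values quoted for the L-test (a mean |L| of 0.500 for untwinned data and 0.375 for a perfect twin) can be reproduced with a small simulation. This is a hedged sketch rather than the phenix.xtriage implementation: it assumes ideal acentric intensities drawn from an exponential distribution, models twinning as a weighted sum of two independent intensities and pairs unrelated reflections, with L = (I1 - I2)/(I1 + I2). The function name and parameters are hypothetical.

```python
import random

def simulated_mean_abs_l(twin_fraction, n_pairs=100_000, seed=0):
    """Monte Carlo estimate of <|L|> for a given twin fraction (0.0 to 0.5)."""
    rng = random.Random(seed)

    def observed_intensity():
        # Each observed intensity mixes two independent untwinned intensities.
        return ((1.0 - twin_fraction) * rng.expovariate(1.0)
                + twin_fraction * rng.expovariate(1.0))

    total = 0.0
    for _ in range(n_pairs):
        i1, i2 = observed_intensity(), observed_intensity()
        total += abs((i1 - i2) / (i1 + i2))
    return total / n_pairs

print(simulated_mean_abs_l(0.0))  # close to 0.500 (untwinned)
print(simulated_mean_abs_l(0.5))  # close to 0.375 (perfect twin)
```

Intermediate twin fractions give intermediate values, which is why an observed mean |L| of 0.432 sits nearer the untwinned limit than a value of 0.334 does.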
3.2.2. Pseudotranslational symmetry analysis. The presence of noncrystallographic symmetry (NCS) was tested for using MOLREP (Vagin & Teplyakov, 2010) and phenix.xtriage (Adams et al., 2010). Both indicated the presence of pseudotranslational NCS in the centred orthorhombic and primitive monoclinic space groups. A strong off-origin peak was found in all these space groups. In the primitive monoclinic and primitive triclinic systems the strength of the off-origin peak was 50% of the origin peak, whereas in the centred orthorhombic space group it was only 23%. The corresponding p-values (calculated using phenix.xtriage) are 0.00520, 6.8 × 10⁻⁵ and 7.2 × 10⁻⁵ for the centred orthorhombic, primitive monoclinic and primitive triclinic systems, respectively (a p-value of <0.05 indicates the presence of pseudotranslational NCS). A self-rotation function was also calculated in the centred orthorhombic (Figs. 5a and 5b), primitive monoclinic (Figs. 6a and 6b) and primitive triclinic (Fig. 7) space groups using MOLREP and POLARRFN (Winn et al., 2011).

L-test analysis for space groups C222/C222₁ (a) and P2/P2₁ (b). Curved line, perfect twin; straight line, untwinned; blue line with marks, observed data.

Figure 5
Self-rotation Patterson maps for space group C222 as calculated by (a) MOLREP and (b) POLARRFN ( = 90 ).
3.2.3. Data-processing statistics and point-group analysis. The X-ray data-processing statistics indicated that the centred orthorhombic space group had an overall ⟨I/σ(I)⟩ of 3.6 and an overall merging R of 0.50, compared with the primitive monoclinic space group which had an overall ⟨I/σ(I)⟩ of 6.3 and an overall merging R of 0.135. The corresponding values for the centred monoclinic space group were 2.8 and 0.489 for the overall ⟨I/σ(I)⟩ and overall merging R, respectively. For the primitive triclinic system these values were 5.4 and 0.100 for the overall ⟨I/σ(I)⟩ and overall merging R, respectively. Similarly, the overall Rp.i.m. (Evans, 2006; Leslie, 1992) values were also high for the centred orthorhombic and centred monoclinic space groups compared with the primitive monoclinic and primitive triclinic systems (Table 1).
Analysis of systematic absences (Adams et al., 2010) confirmed the presence of a twofold 2₁ screw axis in both the centred orthorhombic and primitive monoclinic space groups. There were three and two violations with ⟨I/σ(I)⟩ > 3.0 for the centred orthorhombic space groups C222 and C222₁, respectively, whereas for the primitive monoclinic space groups P2 and P2₁ there were zero and four violations with ⟨I/σ(I)⟩ > 3.0, respectively. However, the likelihoods for the centred orthorhombic and primitive monoclinic space groups are 7 and 1.7, respectively (as calculated using phenix.xtriage; Adams et al., 2010).
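A count of absence violations of the kind reported above can be sketched as follows. This is a simplified illustration and not the likelihood-based procedure used by phenix.xtriage. It assumes a monoclinic setting with unique axis b, so that a 2₁ screw axis requires 0k0 reflections with odd k to be absent. The function name and the reflection list are hypothetical.

```python
def screw_axis_violations(reflections, threshold=3.0):
    """Return the 0k0 reflections with odd k whose I/sigma(I) exceeds the threshold.

    reflections: iterable of (h, k, l, intensity, sigma) tuples.
    """
    violations = []
    for h, k, l, intensity, sigma in reflections:
        on_axis = (h == 0 and l == 0)
        if on_axis and k % 2 != 0 and sigma > 0 and intensity / sigma > threshold:
            violations.append((h, k, l))
    return violations

# Made-up reflections, for illustration only.
toy = [
    (0, 1, 0, 2.0, 1.5),     # odd k but weak: consistent with a 2_1 axis
    (0, 2, 0, 850.0, 20.0),  # even k: allowed
    (0, 3, 0, 40.0, 5.0),    # odd k with I/sigma = 8: violation
    (1, 3, 0, 300.0, 10.0),  # not on the 0k0 axis: ignored
]
print(screw_axis_violations(toy))  # prints [(0, 3, 0)]
```

Because only a handful of axial reflections are usually measured, a few violations rarely settle the question alone, which is consistent with the paper combining this count with the likelihood scores.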
A point-group test performed by phenix.xtriage (Adams et al., 2010) suggested the reprocessing of data that were processed previously in the centred orthorhombic space group, which could have resulted as a consequence of over-merging of pseudo-symmetry and/or twinned data, i.e. this is possibly not the correct space group. A similar point-group test was carried out for data processed in the primitive monoclinic space group, which suggested this could be the correct space group, with unit-cell parameters a

Self-rotation Patterson maps for space group P1 as calculated by MOLREP.

system also suggested a primitive monoclinic space group with identical unit-cell parameters and a likelihood score of 3.0.
Based on the various analyses performed, the data-processing statistics and suggestions from POINTLESS (Winn et al., 2011) and phenix.xtriage (Adams et al., 2010), we conclude that the crystal of rCwp19 27-401 could belong to a primitive monoclinic space group. In addition, phenix.xtriage analysis of data processed in the primitive monoclinic space group detected the presence of pseudo-translational noncrystallographic symmetry (which could be the reason for the elevated intensity ratios observed) and twinning could be present. Hence, twin laws are applicable to this crystal symmetry and this could be the reason for the departure of the intensity statistics from normality.