Crystal structure of the type IV secretion system component CagX from Helicobacter pylori

The structure of the C-terminal domain of CagX is presented and structural comparisons with TraO, its homologue from another bacterial T4S system, reveal distinct and conserved features.


Introduction
Helicobacter pylori is a Gram-negative bacterial pathogen which colonizes the stomachs of more than half of the world's population and causes chronic gastritis, peptic ulcers, gastric adenocarcinoma and gastric lymphoma (Blaser, 1997; Blaser & Atherton, 2004; Covacci et al., 1999; Parsonnet et al., 1994). The cytotoxin-associated gene pathogenicity island (cagPAI) contains a 40 kb foreign DNA region which is considered to be the main genetic determinant of more virulent H. pylori strains (Backert et al., 2002; Blaser & Atherton, 2004). The cagPAI carries 27 genes which encode several effector proteins and one type IV secretion (T4S) system. T4S systems are nanomachines in Gram-negative bacteria which play important roles in various biological processes, from the transfer of virulence factors into eukaryotic cells to the conjugative delivery of genetic material and the uptake or release of DNA (Cascales & Christie, 2003). The T4S system of H. pylori translocates the major effector protein CagA into gastric epithelial cells. Subsequent phosphorylation of intracellular CagA leads to epithelial cell elongation and disruption of tight junctions, which results in the pathogenicity of H. pylori (Backert et al., 2002; Blaser & Atherton, 2004; Odenbreit et al., 2000; Segal et al., 1997; Backert & Selbach, 2008). The T4S system has been visualized at the interface between H. pylori and gastric epithelial cells, forming a needle-like structure crossing the inner and outer membranes (Zanotti & Cendron, 2014). It has been proposed that the transmembrane core complex of the H. pylori cagPAI T4S system is formed by CagY, CagT and CagX, while the external pilus is thought to be composed of a large number of copies of CagC and CagL (Fischer, 2011; Terradot & Waksman, 2011).
As an essential component of the cagPAI T4S system core complex, CagX plays a critical role in CagA translocation into the host cell. Genetic and functional studies have indicated that the ability of H. pylori to translocate CagA into gastric cells is abrogated by inactivation of CagX (hp0528; Akopyants et al., 1998; Censini et al., 1996; Li et al., 1999; Fischer et al., 2001). The C-terminal domain of CagX has been shown to be responsible for its interaction with CagT by co-immunoprecipitation, MBP pull-down and yeast two-hybrid assays (Gopal et al., 2015).
CagX is presumed to be structurally and functionally homologous to VirB9 from the well studied VirB/D4 T4S system of the plant pathogen Agrobacterium tumefaciens, although the two proteins have low sequence identity (Bayliss et al., 2007; Christie et al., 2005; Censini et al., 1996). While VirB9 forms a heterotrimer with VirB7 and VirB10 in the VirB/D4 T4S system, it is proposed that CagX associates not only with CagT and CagY, but also with CagM and Cagδ, to form a transmembrane core complex in the cagPAI T4S system (Kutter et al., 2008; Pinto-Santini & Salama, 2009).
Here, we report the expression, purification, crystallization, structure determination and analysis of the C-terminal domain of the CagX protein (CagXct). This important component of the cagPAI T4S system folds into a β-sandwich domain containing nine β-strands. It is the first structure of a component of the transmembrane core complex of the cagPAI T4S system to be determined, and is only the second three-dimensional structure of a VirB9 homologue. The crystal structure was determined by the molecular-replacement method and refined at a resolution of 1.4 Å. Comparison of the CagXct structure with that of the C-terminal domain of another VirB9 homologue, TraO, which is a part of the outer membrane complex encoded by the Escherichia coli conjugative plasmid pKM101, suggests the possible conservation of some protein-protein interactions between the pKM101 T4S system and the cagPAI T4S system.

Cloning of expression constructs and protein expression
The gene encoding the soluble fragment of Hp0528 (amino acids 396-498; CagXct) was amplified from Hp26695 genomic DNA using the primer pair FCagXct/NCagXct and subcloned into the pET-32a vector via the EcoRI and XhoI sites (Table 1). The resulting pET-32a-CagXct construct was transformed into E. coli BL21 (DE3) cells. Expression of CagXct was performed in LB medium containing 100 µg ml⁻¹ ampicillin and the culture was incubated at 37 °C and 220 rev min⁻¹ until an OD600 of 0.6-0.8 was reached. Expression of recombinant CagXct was induced by adding 0.3 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) and the culture was left to shake for 12 h at 16 °C and 180 rev min⁻¹. The expression levels of CagXct were monitored by SDS-PAGE (Fig. 1).
The purified CagXct was concentrated and crystallized using the sitting-drop vapour-diffusion method in 96-well Intelli-Plates (Art Robbins Instruments). The crystallization trials were set up with an ARI robot (Art Robbins Instruments) using the following screens: Crystal Screen and Crystal Screen 2 (Hampton Research) and The PEGs, JCSG and Classics Suites (Qiagen). The drops had a total volume of 1 µl and consisted of a 1:1 ratio of protein solution to precipitant. The best crystals of CagXct grew from protein solution concentrated to 70 mg ml⁻¹ with a precipitant consisting of 27% polyethylene glycol 4000, 100 mM HEPES pH 7.5, 10% 2-propanol.

Data collection and processing
Data were collected at 100 K on the BL17U1 beamline of the Shanghai Synchrotron Radiation Facility (SSRF) at a wavelength of 0.9792 Å. The data were initially processed with HKL-2000 (Otwinowski & Minor, 1997) and the CCP4 suite in the orthorhombic space group P2₁2₁2₁, with unit-cell parameters a = 33.1, b = 61.6, c = 48.4 Å, α = β = γ = 90°. Owing to problems in molecular replacement (MR) and refinement in this space group, the data were later reprocessed as triclinic using DIALS (Waterman et al., 2016) in the xia2 pipeline (Winter et al., 2013). The overall DIALS Rmerge value decreased from 0.106 in P2₁2₁2₁ to 0.050 in P1.
After the correct space group had been established by the downstream refinement as monoclinic P2₁ with unique axis b = 61.6 Å and β = 90.23°, the data were reprocessed to 1.4 Å resolution in this space group with XDS (Kabsch, 2010) in the xia2 pipeline (Table 2).

Structure solution and refinement
A promising MR solution was originally found in P2₁2₁2₁ with the MrBUMP/Phaser pipeline (Keegan & Winn, 2008; McCoy et al., 2007) using the structure of chain B (TraO) of the E. coli pKM101 plasmid-encoded outer membrane complex as a search model (Chandran et al., 2009; PDB entry 3jqo; 19% sequence identity). The model, which contained one CagXct molecule per asymmetric unit (41% solvent), was refined using REFMAC5 (Murshudov et al., 2011; Winn et al., 2001) and rebuilt using Coot (Emsley et al., 2010). Missing protein side chains were clearly visible in the electron density; however, the rebuilt structure could not be refined to an Rfree value of below 0.38. An inspection of systematic absences proved to be inconclusive regarding the screw/rotation character of the crystal axes; therefore, MoRDa (Vagin & Lebedev, 2015) was used to conduct MR and refinement in every possible orthorhombic space group. Surprisingly, high-scoring solutions were found in several space groups; however, the P2₁2₁2₁ solution appeared to be the best since the MoRDa solutions in other space groups had higher R factors.
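The solvent content quoted above for one molecule per asymmetric unit can be cross-checked with a Matthews-coefficient estimate. The following is a minimal sketch rather than the procedure used by the authors: the molecular mass is an assumption (103 residues at an average of about 110 Da each), and the factor 1.23 is the usual approximation for a typical protein density.

```python
import math

def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume (cubic angstroms) from cell lengths (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)

def matthews(volume, n_sym_ops, n_mol_asu, mol_weight_da):
    """Return (Matthews coefficient, estimated solvent fraction)."""
    vm = volume / (n_sym_ops * n_mol_asu * mol_weight_da)
    solvent = 1.0 - 1.23 / vm  # 1.23 assumes a typical protein partial specific volume
    return vm, solvent

# ASSUMPTION: the true molecular mass of the construct is not given in the text;
# 110 Da per residue is only a generic average.
ASSUMED_MW_DA = 103 * 110.0

# Cell parameters as reported for the initial orthorhombic processing.
volume = cell_volume(33.1, 61.6, 48.4)

# P2(1)2(1)2(1) has four symmetry operators; one molecule per asymmetric unit.
vm, solvent = matthews(volume, n_sym_ops=4, n_mol_asu=1, mol_weight_da=ASSUMED_MW_DA)
print(f"V = {volume:.0f} A^3, Vm = {vm:.2f} A^3/Da, solvent = {100 * solvent:.0f}%")
```

Under this mass assumption the estimate falls in the low 40% range, close to the reported value. The same function gives an identical result for P2₁ with two symmetry operators and two molecules per asymmetric unit, since the number of molecules per cell is unchanged.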
Intensity statistics suggested that the CagXct crystal belonged to a lower symmetry space group and was twinned. A distinct crystallographic dyad could not be identified from the processing statistics, since the merging R factors were close to 10% for each crystal axis.
To establish the true space group, the data were reprocessed in P1. The MR solution (four protein monomers) in this space group was found using MoRDa with PDB entry 3jqo chain B as the model. The solution had a MoRDa Q-factor of 0.71, a probability of correct solution of 99% and refined to an Rfree of 0.40 without any model rebuilding, with most of the amino-acid side chains absent. The CCP4 program Zanuda (Lebedev & Isupov, 2014) was applied to the partially refined P1 model, which had an Rfree of 0.32. The true space group, in which the model refined to the same R values, was monoclinic P2₁, with unique axis b = 61.6 Å and β = 90.23°. The Rfree values were 0.38 or higher in all other monoclinic and orthorhombic space groups. SFCHECK (Vaguine et al., 1999) analysis of data reprocessed in P2₁ clearly identified a nonmerohedral twinning operation (−h, −k, l) with an obliquity of 0.23° and a twinning fraction of 0.46. Twin isotropic B-factor refinement of the CagXct structure was performed with REFMAC. The atomic coordinates and structure factors for the CagXct structure have been deposited in the Protein Data Bank with accession code 5h3v.
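To illustrate what the twinning operation and twin fraction reported above mean for the measured intensities, here is a minimal sketch of the simple two-domain twinning model. It is not the algorithm implemented in SFCHECK or REFMAC; the function names are hypothetical and the intensities are made-up numbers. The obliquity line relies on the geometry of a b-unique monoclinic cell, in which the angle between c and c* is |β − 90°|.

```python
import numpy as np

# Twin operator (-h, -k, l): a twofold rotation about the l direction.
TWIN_OP = np.array([[-1, 0, 0],
                    [0, -1, 0],
                    [0, 0, 1]])

def twin_mate(hkl):
    """Miller index that overlaps with hkl under the twin operation."""
    return tuple(int(x) for x in TWIN_OP @ np.asarray(hkl))

def twin_intensities(i1, i2, alpha):
    """Observed intensities for a twin-related pair with twin fraction alpha."""
    j1 = (1 - alpha) * i1 + alpha * i2
    j2 = alpha * i1 + (1 - alpha) * i2
    return j1, j2

def detwin(j1, j2, alpha):
    """Invert the two-domain model; ill-conditioned as alpha approaches 0.5."""
    d = 1 - 2 * alpha
    if abs(d) < 1e-6:
        raise ValueError("perfect twin: intensities cannot be detwinned")
    return ((1 - alpha) * j1 - alpha * j2) / d, ((1 - alpha) * j2 - alpha * j1) / d

beta_deg = 90.23                   # monoclinic angle from the text
obliquity = abs(beta_deg - 90.0)   # angle between c and c* for this cell geometry
alpha = 0.46                       # twin fraction from the text

print("twin mate of (1, 2, 3):", twin_mate((1, 2, 3)))
print(f"obliquity = {obliquity:.2f} degrees")

# Hypothetical true intensities for one twin-related pair.
j1, j2 = twin_intensities(100.0, 20.0, alpha)
print("observed pair:", j1, j2)
print("detwinned pair:", detwin(j1, j2, alpha))
```

With a twin fraction as high as 0.46 the denominator 1 − 2α is only 0.08, so measurement errors are amplified roughly twelvefold on detwinning. This is one reason why refinement against the twinned data with a twin model is generally preferred to explicit detwinning.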

Cloning, overexpression, purification and crystallization
The soluble fragment (residues 396-498) of CagX was successfully cloned and expressed in E. coli strain BL21 (DE3) with a His tag and a Trx tag. The tags were removed by limited proteolysis of CagXct bound to a nickel immobilized metal ion-affinity column using TEV protease. The protein was further purified by a second run of nickel immobilized metal ion-affinity chromatography and size-exclusion chromatography. CagXct was concentrated to 70 mg ml⁻¹ and crystallized by the vapour-diffusion method from PEG and 2-propanol.

Quality of the model
The CagXct crystal structure was solved by MR and subjected to isotropic B-factor twin refinement in REFMAC5 at 1.4 Å resolution after the true space group had been established as monoclinic P2₁ with a β angle close to 90°, with the crystal forming a nonmerohedral twin. The quality of the electron-density maps was mostly acceptable, although some ripples were observed in the Fo − Fc map calculated using the detwinned data (Fig. 3). These ripples are likely to be caused by crystal twinning since similar electron-density features have been reported for other twinned and order-disorder crystal structures (Lebedev et al., 2006; Rye et al., 2007). The CagXct model was refined to an R factor of 0.200 and an Rfree of 0.249 in space group P2₁ (Table 3). It contains two protein monomers with all residues built into the electron density, one PEG and two 2-propanol molecules and 129 waters. Many residue side chains were modelled with alternative conformations. The CagXct model contains no residues in the cis-conformation. Asp481 in both monomers of CagXct is a Ramachandran plot outlier, while Gly258 in the equivalent position in the TraO structure has similar main-chain torsion angles. The two monomers of CagXct comprising the asymmetric unit can be superimposed with an r.m.s.d. of 0.38 Å over all 103 Cα atoms.
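For readers less familiar with the R factor and Rfree quoted above, the sketch below shows the conventional definition, R = Σ||Fo| − k|Fc|| / Σ|Fo|, evaluated separately on a working set and a held-out free set. It is only an illustration on synthetic amplitudes: the single overall scale factor, the 5% free fraction and the noise level are assumptions, and refinement programs such as REFMAC5 use more elaborate scaling (and, here, a twin model).

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """Conventional crystallographic R factor with one overall scale factor."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    scale = f_obs.sum() / f_calc.sum()  # simplistic scaling, an assumption
    return float(np.abs(f_obs - scale * f_calc).sum() / f_obs.sum())

def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Boolean masks for the working set and the free (cross-validation) set."""
    rng = np.random.default_rng(seed)
    free = rng.random(n_reflections) < free_fraction
    return ~free, free

# Synthetic amplitudes standing in for real data (not taken from the paper).
rng = np.random.default_rng(1)
f_calc = rng.uniform(10.0, 100.0, size=2000)
f_obs = f_calc * (1.0 + rng.normal(0.0, 0.2, size=2000))

work, free = split_work_free(f_obs.size)
print(f"R(work) = {r_factor(f_obs[work], f_calc[work]):.3f}")
print(f"R(free) = {r_factor(f_obs[free], f_calc[free]):.3f}")
```

In a real refinement the free reflections are excluded from the target function, so Rfree is usually somewhat higher than the working R factor, as is the case for the values reported in the text.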

Overall structure
CagXct folds into a β-sandwich domain formed by two antiparallel β-sheets containing nine β-strands (Fig. 4).
β-Sheet 2 contains strands β2, β3, β6 and β7 and has topology 1, 2x, −1. The two independent CagXct monomers do not form any oligomer between themselves or with their crystal symmetry mates, which is consistent with a monomer being the main species in solution, as can be seen from the size-exclusion chromatography trace (Fig. 2a).

Figure 3
Electron-density maps around β-sheet 2 of CagXct monomer A. The 2Fo − Fc map (blue) is contoured at 1.4σ and the Fo − Fc map is contoured at 3.0σ (green) and −3.0σ (red). The difference density ripples (red and green) are likely to be owing to crystal twinning. This figure was prepared using PyMOL (Schrödinger).

Comparison of CagXct with TraO
A DALI search (Holm & Rosenström, 2010) reveals that the most similar structure to CagXct is the C-terminal domain of the VirB9 homologue TraO (TraOct) from the crystal structure of the pKM101 plasmid-encoded T4S outer membrane complex (Chandran et al., 2009; PDB entry 3jqo; chain B was used as an MR model with a sequence identity of 19%). This ~0.6 MDa complex is a ring with 14-fold rotational symmetry which spans the outer membrane and is formed by the pKM101 T4S system proteins TraOct, TraN and TraFct. An NMR structure is also available for TraO in complex with another component of the pKM101 T4S system, the VirB7-like TraN (PDB entry 3jqo; Chandran et al., 2009).
CagXct and TraOct are remarkably similar, despite their low sequence identity (Figs. 5 and 6). The region 177-270 of the TraO monomer (PDB entry 3jqo; chain B) aligns with the CagXct monomer with an r.m.s.d. of 1.4 Å over 95% of the Cα atoms. Such a high level of structural similarity may explain the success of MR structure solution using the TraO model.
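The r.m.s.d. values quoted for superposed Cα atoms can be illustrated with the Kabsch least-squares superposition. This is a generic sketch, not the procedure used by DALI or by the authors: it assumes that the two coordinate sets are already paired residue by residue, and the coordinates below are randomly generated stand-ins rather than atoms from either structure.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """R.m.s.d. between paired coordinate sets after optimal superposition of p onto q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of paired coordinates")
    # Remove translation by centring both sets on their centroids.
    p0 = p - p.mean(axis=0)
    q0 = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix.
    u, _, vt = np.linalg.svd(p0.T @ q0)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_fit = p0 @ rot.T
    return float(np.sqrt(((p_fit - q0) ** 2).sum(axis=1).mean()))

# Hypothetical demonstration: 103 pseudo-atoms, rotated, shifted and perturbed.
rng = np.random.default_rng(0)
coords_a = rng.normal(scale=10.0, size=(103, 3))
angle = np.radians(30.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
coords_b = coords_a @ rot_z.T + np.array([5.0, -3.0, 2.0])
coords_b_noisy = coords_b + rng.normal(scale=0.2, size=coords_b.shape)

print(f"exact copy: {kabsch_rmsd(coords_a, coords_b):.3f} A")
print(f"perturbed copy: {kabsch_rmsd(coords_a, coords_b_noisy):.3f} A")
```

When only a subset of atoms is aligned, as in the comparison with TraO, the pairing of equivalent residues has to come from a structural alignment first; the function above covers only the final superposition step.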
To maintain structural similarity, many buried protein residues which form the protein core of TraO are conserved or substituted by residues with similar properties in CagX. Remarkably, most of these conserved residues retain the same side-chain conformation (Fig. 6). Generally, it would appear that the conservation of the overall fold/shape of the VirB9-like domain is more important for the function of this member of the T4S system than the preservation of specific individual residues that may be involved in interactions with other proteins forming the outer membrane complex of the T4S system.

Conservation of protein-protein interactions in T4S
The structure of the outer membrane complex of the T4S plasmid pKM101 (Chandran et al., 2009) provides insight into the general architecture of bacterial T4S systems. A TraOct domain in this ring structure interacts with two neighbouring TraO monomers, two TraF monomers and a single TraN monomer. Since relatively little is known about the interactions of CagX with its partners in the H. pylori T4S system, it is interesting to map the sequence/structure conservation features between TraO and CagXct onto these monomer-monomer interactions.
TraN is a long peptide which winds around TraO. It adds an additional β-strand to β-sheet 1 of TraO, which is observed both in the binary TraO-TraN complex and in the heterotrimeric outer membrane complex. Most amino-acid residues of strand β1, which runs antiparallel to TraN in TraO, are not conserved in CagX; however, the main chains of the matching residues of both these T4S proteins have the same conformation and solvent accessibility. Thus, it appears likely that the β-sheet interaction between TraO and TraN is reproduced in the CagX-CagT interface.

Figure 5
Amino-acid sequence alignment of CagXct and TraOct. The secondary-structure elements are indicated above and below the alignment, respectively, as β-strands and η-helices (3₁₀-helices). Conserved residues are shown in red boxes; matching amino acids with similar properties are shown in blue boxes. The secondary-structure assignments were carried out and the figure was produced using ESPript3 (Robert & Gouet, 2014).

Interactions between different TraO monomers in the 14-fold ring of the outer membrane complex of T4S plasmid pKM101 are not extensive and the residues involved in these interactions do not appear to be conserved in CagX.
Interactions between TraO and the two monomers of TraF in this complex are more extensive; however, there is little conservation of amino-acid residues in this interface. Interestingly, one of the 2-propanol molecules binds to the main-chain N atom of the Ramachandran plot outlier Asp481 in monomer A in the CagXct structure. The main-chain N atom of the equivalent Gly258 in TraO forms a hydrogen bond to the main-chain O atom of Gly364 in TraF, with the positions of Gly364 and the 2-propanol ligand O atoms overlapping. This may suggest the preservation of another main-chain interaction in the CagX-CagY interface.
The sequence pattern 'xVxVxN' (where x represents a binding residue) is shared by the β9 strands of TraO and CagXct, suggesting that the conserved nonbinding residues in this pattern may help to maintain the proper distance from the β1 strand, in a spatial configuration that ensures binding to a protein partner (Fig. 7). However, the binding residues themselves are not conserved in β9; Leu484, Thr486 and Ile488 in CagXct correspond to Val261, Gly263 and Arg264 in TraO (Figs. 5 and 7). The diversity in binding residues is presumably determined by the different proteins with which they interact.
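A pattern such as 'xVxVxN' is straightforward to search for in other VirB9 homologues. The sketch below uses a regular expression in which each x is treated as any residue; the function name is hypothetical and the test sequence is an invented toy string, not part of the CagX or TraO sequence.

```python
import re

# 'xVxVxN': positions 1, 3 and 5 are variable, V/V/N are fixed.
# The lookahead allows overlapping occurrences to be reported.
MOTIF = re.compile(r"(?=([A-Z]V[A-Z]V[A-Z]N))")

def find_motif(sequence):
    """Return (0-based start, matched window, variable residues) for each hit."""
    hits = []
    for match in MOTIF.finditer(sequence.upper()):
        window = match.group(1)
        hits.append((match.start(), window, window[0::2]))
    return hits

# Invented toy sequence used purely for illustration.
toy_sequence = "MKAVTVSNGG"
for start, window, variable in find_motif(toy_sequence):
    print(f"motif {window} at position {start + 1}; variable residues: {variable}")
```

Reporting the variable positions separately mirrors the argument in the text: the fixed residues define the scaffold of the strand, while the x positions are the ones expected to differ between homologues with different binding partners.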
CagXct is only the second VirB9 homologue for which a three-dimensional structure has been elucidated. Its structure will provide additional insight into binding and translocation mechanisms of the transmembrane core complex in H. pylori and other T4S systems.

Figure 7
Comparison of the protein-binding region of the β9 strand (shown as sticks; β1 strands are shown as cartoons) in CagXct (cyan) and in TraO (magenta; only the residues of TraO are labelled). Apart from the binding residues, there is a high sequence-similarity pattern: 'xVxVxN' (these residues are shown in red; x represents the binding amino-acid residues).

Figure 6
A stereo diagram showing the structural superposition of CagX (ice-blue worm model) and TraO (yellow; PDB entry 3jqo). Side chains of residues forming the conserved protein core are shown with C atoms in coral (CagX) or green (TraO). Residue numbers of TraO are given in parentheses after those of the CagX residues.