Structure of the virulence-associated protein VapD from the intracellular pathogen Rhodococcus equi

VapD is one of a set of highly homologous virulence-associated proteins from the multi-host pathogen Rhodococcus equi. The crystal structure reveals an eight-stranded β-barrel with a novel fold and a glycine-rich ‘bald’ surface.

Rhodococcus equi is a multi-host pathogen that infects a range of animals as well as immune-compromised humans. Equine and porcine isolates harbour a virulence plasmid encoding a homologous family of virulence-associated proteins associated with the capacity of R. equi to divert the normal processes of endosomal maturation, enabling bacterial survival and proliferation in alveolar macrophages. To provide a basis for probing the function of the Vap proteins in virulence, the crystal structure of VapD was determined. VapD is a monomer as determined by multi-angle laser light scattering. The structure reveals an elliptical, compact eight-stranded β-barrel with a novel strand topology and pseudo-twofold symmetry, suggesting evolution from an ancestral dimer. Surface-associated octyl-β-D-glucoside molecules may provide clues to function. Circular-dichroism spectroscopic analysis suggests that the β-barrel structure is preceded by a natively disordered region at the N-terminus. Sequence comparisons indicate that the core folds of the other plasmid-encoded virulence-associated proteins from R. equi strains are similar to that of VapD. It is further shown that sequences encoding putative R. equi Vap-like proteins occur in diverse bacterial species. Finally, the functional implications of the structure are discussed in the light of the unique structural features of VapD and its partial structural similarity to other β-barrel proteins.

Introduction
Rhodococcus equi is a soil-borne bacterial pathogen that causes disease in a range of animals, including pigs, sheep and cattle. It is also an opportunistic pathogen among immune-compromised humans such as AIDS patients, in which it causes tuberculosis-like symptoms. However, it is most closely associated with disease in young horses, in which it causes a severe pyogranulomatous pneumonia (Muscatello et al., 2007). Mortality in infected foals of up to six months of age can approach 80% if the disease is not treated, and even with antibiotic treatment mortality approaches 30%. There is currently no commercially available vaccine licensed to prevent disease caused by R. equi (Muscatello et al., 2007).
It is believed that the major route of R. equi infection and disease transmission is through inhalation into the lungs of aerosolized dust formed from contaminated faeces as well as by aerosol transmission between foals (Muscatello et al., 2009). In the lungs, the bacteria enter alveolar macrophages by complement-mediated phagocytosis (Hondalus & Mosser, 1994). The phagosome in which R. equi resides would normally undergo a process of maturation, acquiring degradative and microbicidal properties through sequential fusion with a series of endomembrane compartments (early endosomes, late endosomes and lysosomes; Scott et al., 2003). However, R. equi is able to escape cell killing in the macrophage by preventing phagosome maturation to the phagolysosome stage (Meijer & Prescott, 2004). Analysis of R. equi-containing phagosomes for the presence of protein markers associated with the different stages of the endocytic pathway showed that early endosome markers are acquired and lost normally, whereas the acquisition of some late endosome markers is delayed or abolished (von ). The vATPase proton pump (required for phagosome acidification) is not acquired and there are alterations in the physical appearance of the phagosome (Fernandez-Mora et al., 2005). Collectively, this points to a block between the early and late endocytic stages. Having diverted the cell's destruction pathways, the R. equi cells begin to multiply within the membrane-enclosed vesicle and to exert cytotoxic effects. This leads to killing of the macrophage and dissemination of the pathogen through the body, notably to the gut.
The ability of R. equi to survive and replicate inside macrophages is linked to its possession of a large virulence plasmid (80 kbp). Strains cured of this plasmid are no longer virulent in foals or in mouse models of infection and cannot survive in macrophages cultured in vitro (Hondalus & Mosser, 1994; Giguère et al., 1999). The plasmid contains a pathogenicity island with 26 coding sequences, including that for VapA (virulence-associated protein A). VapA was identified in early studies as a 15-17 kDa factor associated with virulence, against which antibodies in the serum of foals infected with R. equi invariably react (Takai, Sekizaki et al., 1991; Takai, Koike et al., 1991). It is the defining member of a family of proteins unique to the R. equi virulence-plasmid pathogenicity island, with the others being VapC, VapD, VapE, VapG and VapH (Takai et al., 2000). All of these proteins have secretion signals and a number have been observed to be exported from the cell (Byrne et al., 2001). VapA appears to be unique among the virulence-associated proteins of R. equi in being retained on the cell surface (Takai et al., 1992; Tan et al., 1995; Byrne et al., 2001).
VapA is associated with all R. equi strains isolated from infected foals; moreover, deletion mutagenesis experiments have shown that the presence of the vapA gene is essential for intracellular growth of the bacterium in macrophages (Jain et al., 2003). While this shows that VapA is required for virulence, this factor is not sufficient since VapA expression in the absence of the other virulence plasmid-encoded proteins does not confer virulence (Giguère et al., 1999). A recent study found that VapA is required for diversion of the phagosome-maturation pathway and to prevent acidification of phagosomes (von ). vapA, as part of the vapAICD operon (Byrne et al., 2008), is coordinately regulated with other genes of the pathogenicity island (Miranda-Caso-Luengo et al., 2011). Elevated temperature, low pH, oxidative stress and low levels of iron stimulate its transcription (Takai et al., 1992, 1996; Benoit et al., 2001, 2002; Jordan et al., 2003).
Extensive studies over two decades have illuminated the genetics and microbiology of R. equi and defined the roles of the major virulence factor VapA and its homologues in disease. However, the mechanisms by which these proteins promote cell survival in the macrophage are not known. To provide further insight into the role of the Vap proteins in the virulence of R. equi, we sought to determine the three-dimensional structure of VapA by X-ray crystallography. Following unsuccessful attempts to crystallize VapA, we turned to other members of the Vap protein family. This led to crystals of VapD suitable for structure determination; here, we present the structure of a core fragment of VapD solved to 1.9 Å resolution. Sequence comparisons together with circular-dichroism data for VapA suggest that the other virulence proteins of R. equi have closely similar structures.

Cloning, expression and purification of VapA and VapD
Both VapA and VapD contain signal sequences. The N-terminal amino acid of mature VapA was experimentally determined as Thr32 (Tan et al., 1995). Using the SignalP server (Petersen et al., 2011), the N-terminal amino-acid residue of the mature VapD protein is predicted to be Gln31. For the sake of clarity, numbering of amino-acid residues throughout is based on mature proteins, not taking the signal sequences into account. Fragments of the vapA gene encoding residues 1-158 (VapA-full) and the vapD gene encoding residues 1-134 (VapD-full) were amplified by polymerase chain reaction (PCR) from genomic DNA of R. equi virulent strain 103S (de la Peña-Moctezuma & Prescott, 1995) using the primers listed in Table 1. The amplified fragments were purified, digested with restriction endonucleases NdeI and XhoI (New England Biolabs Inc.) and then inserted into the Escherichia coli expression vector pET-30a(+) digested with the same enzymes (EMD Millipore Chemicals; Table 1). For the production of the N-terminally truncated form of VapD (residues 20-134; VapD-core), PCR was performed using the primers listed in Table 1 and the product was inserted into the expression vector pET-22b by the In-Fusion method (Clontech Laboratories). These constructs directed the expression of full-length VapA (VapA-full), full-length VapD (VapD-full) and the N-terminally truncated form of VapD (VapD-core), each with the amino-acid residues LEHHHHHH attached at the C-terminus. Site-directed mutagenesis was carried out using the primers listed in Table 1 to substitute methionine residues at positions Leu100 and Val123 in VapD-core. The sequences of the constructs were verified by sequencing the plasmids using the T7 promoter primer. The expression strain E. coli BL21 (DE3) (EMD Millipore Chemicals) was used for the production of VapA-full, VapD-full and VapD-core. In each case, transformed cells were grown at 37 °C in lysogeny broth supplemented with antibiotic to an optical density of 0.7-1.0, induced with 1 mM isopropyl β-D-1-thiogalactopyranoside and then cultured for a further 3 h at 37 °C. Selenomethionine-substituted truncated protein (VapD-core-SeMet) was produced by the same method using expression strain E. coli B834 (DE3) with the cells cultured in minimal medium containing selenomethionine with 50 mg ml⁻¹ ampicillin (Studier et al., 1990). To purify VapD-full, VapD-core or VapD-core-SeMet, cells were harvested by centrifugation, resuspended in extraction buffer consisting of 20 mM HEPES pH 7.5, 500 mM NaCl, 20 mM imidazole, to which an EDTA-free protease-inhibitor cocktail tablet (Roche Diagnostics, USA) had been added, and lysed by sonication. Cleared lysate was applied onto a nickel-affinity chromatography column (GE Healthcare) which had been pre-equilibrated with binding buffer (20 mM HEPES pH 7.5, 500 mM NaCl, 20 mM imidazole), and protein was eluted with a linear imidazole gradient (20-500 mM). Eluted fractions were analysed by SDS-PAGE and those containing overexpressed protein (approximately 90% pure) were pooled and concentrated and then applied onto a 16/60 S75 Superdex gel-filtration column (GE Healthcare) which had been pre-equilibrated with 20 mM HEPES pH 7.5, 150 mM NaCl. Eluted fractions containing pure Vap protein (appearing as a single band on SDS-PAGE) were concentrated to approximately 20-40 mg ml⁻¹ protein and stored at −80 °C. VapA-full was purified by the same method using a buffer containing 50 mM Tris-HCl pH 8.5 instead of 20 mM HEPES pH 7.5.

Table 1. Primers used in gene-cloning experiments. Columns: protein construct/vector; primers.

Figure 1. Comparison of Vap sequences. (a) Amino-acid residue sequence alignment of the mature Vap proteins encoded by the plasmids pVAPA1037 (Takai et al., 2000) and pVAPB1593 (Letek et al., 2008). Invariant residues in the alignment are shown in white on a red background; conserved residues are in blue boxes. The secondary structure for VapD is indicated above the alignment. Black triangles highlight four conserved amino-acid residues involved in the amide cluster referred to in the text. (b) Alignment of the sequences of R. equi Vap homologues from diverse species. The aligned sequences with their UniProt accession codes are VapD (ReVapD; B4F3C5) and VapA (ReVapA; B4F3C2) from R. equi and homologues from Xenorhabdus bovienii (Xbovi; D3V6D4), Halomonas titanicae BH1 (Htita; L9UBC3), Lacinutrix sp. strain 5H374 (Lacin; F6GHM2), Escherichia coli H263 (Ecoli; 9VNI1) and Clostridium sp. DL-VIII (Clost; G7LYQ8). For this comparison the 30-residue signal peptide of VapD is retained, so the numbering is different to that used in (a).

Size-exclusion chromatography with multi-angle laser light scattering (SEC-MALLS)
For determination of the oligomeric state of VapD-core and VapD-full, the proteins were analysed by SEC-MALLS. Samples of protein at concentrations of 2.5 and 4.0 mg ml⁻¹ for VapD-core and VapD-full, respectively, were loaded onto a Superdex S75 10/300 gel-filtration column equilibrated at 0.5 ml min⁻¹ with a mobile phase consisting of 50 mM Tris-HCl pH 8.0, 150 mM NaCl. The eluate was passed through an SPD20A UV-Vis detector, a Wyatt DAWN HELEOS II 18-angle light-scattering detector and a Wyatt Optilab rEX refractive-index monitor with the system driven by a Shimadzu HPLC system comprising an LC-20AD pump. The data were processed and molecular masses were calculated using the Astra V software (Wyatt) as described previously (Colledge et al., 2011).
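As a rough illustration of the quantity reported by a SEC-MALLS analysis, the weight-average molar mass over an elution peak can be written as the concentration-weighted mean of the per-slice molar masses. The sketch below is not the Astra algorithm; the function name and the slice values are hypothetical and serve only to show the averaging.

```python
def weight_average_mass(concentrations, molar_masses):
    """Return M_w = sum(c_i * M_i) / sum(c_i) over the slices of an elution peak."""
    if len(concentrations) != len(molar_masses):
        raise ValueError("need one molar mass per concentration slice")
    total = sum(concentrations)
    if total <= 0:
        raise ValueError("total concentration must be positive")
    weighted = sum(c * m for c, m in zip(concentrations, molar_masses))
    return weighted / total


# Hypothetical slices across a single peak (mg/ml and Da); not measured data.
slice_conc = [0.1, 0.4, 0.8, 0.4, 0.1]
slice_mass = [13000.0, 13300.0, 13400.0, 13500.0, 13800.0]
peak_mw = weight_average_mass(slice_conc, slice_mass)
```

A narrow spread of per-slice masses around one value, as in this made-up peak, is the pattern expected for a single monodisperse species.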

Circular-dichroism spectroscopy
Circular-dichroism (CD) spectra were recorded at 20 °C with a Jasco J-180 CD spectrophotometer using a 0.1 cm pathlength quartz cell as described previously (Levdikov et al., 2012). Experiments were carried out in 20 mM Tris-HCl buffer pH 7.5. The protein concentrations in the samples were 0.2 mg ml⁻¹. Random error and noise were reduced for each spectrum by averaging three scans in the wavelength range 260-195 nm. The signal acquired for the buffer used for dilution of the proteins was subtracted from the spectra acquired for the proteins.
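The scan averaging and buffer subtraction described above can be sketched as follows. This is a minimal illustration, assuming that all scans share one wavelength grid and that the buffer baseline is averaged in the same way as the protein scans; the function names are hypothetical and do not correspond to any instrument software.

```python
def average_scans(scans):
    """Average repeated scans point by point; all scans must share one wavelength grid."""
    if not scans:
        raise ValueError("at least one scan is required")
    n_points = len(scans[0])
    if any(len(scan) != n_points for scan in scans):
        raise ValueError("all scans must have the same number of points")
    return [sum(values) / len(scans) for values in zip(*scans)]


def buffer_corrected(protein_scans, buffer_scans):
    """Subtract the averaged buffer baseline from the averaged protein spectrum."""
    protein = average_scans(protein_scans)
    baseline = average_scans(buffer_scans)
    if len(protein) != len(baseline):
        raise ValueError("protein and buffer spectra must cover the same grid")
    return [p - b for p, b in zip(protein, baseline)]
```

Averaging before subtraction keeps the two steps independent, so the same baseline can be reused for several protein samples recorded in the same buffer.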

Limited proteolysis
Limited proteolysis was carried out on VapD-full in 10 ml reaction mixtures with a final protein concentration of 1 mg ml⁻¹ in 50 mM Tris-HCl pH 7.5-8.0, 50-150 mM NaCl. Chymotrypsin was added to give chymotrypsin:protein weight ratios of 1:50, 1:100, 1:200 and 1:400. The time of incubation was varied from 15 min to 2 h. Each reaction was quenched by adding 1-10 mM PMSF. The reaction products were analysed by SDS-PAGE.
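The protease amounts implied by the weight ratios above follow from simple arithmetic. The sketch below is only an illustration; the function name is hypothetical and the reaction volume is taken as printed in the text.

```python
def chymotrypsin_mass_ug(protein_mg_per_ml, volume_ml, ratio_denominator):
    """Mass of protease (micrograms) for a 1:ratio_denominator protease:protein weight ratio."""
    if ratio_denominator <= 0:
        raise ValueError("ratio denominator must be positive")
    protein_ug = protein_mg_per_ml * volume_ml * 1000.0  # mg -> micrograms
    return protein_ug / ratio_denominator


# Ratios listed in the text, for 1 mg/ml protein in the stated reaction volume.
amounts = {ratio: chymotrypsin_mass_ug(1.0, 10.0, ratio) for ratio in (50, 100, 200, 400)}
```

Each halving of the ratio halves the protease load, which together with the incubation time sets how far digestion proceeds.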

Protein crystallization, data collection, structure solution and refinement
Screening for crystallization conditions for VapA-full, VapD-full, VapD-core and VapD-core-SeMet was carried out using a robotic (Mosquito) nanolitre sitting-drop format with commercially available 96-well screens. Promising crystallization conditions, which were only obtained with the VapD protein constructs, were then optimized by hand using the hanging-drop vapour-diffusion method in a 24-well format. X-ray diffraction data were collected at 100 K at the Diamond Light Source experimental stations I02 and I04. Data sets were integrated using XDS (Kabsch, 2010) and were scaled/merged with AIMLESS (Evans & Murshudov, 2013). The structure of VapD-core was solved by the single-wavelength anomalous dispersion method (SAD) with data collected at a wavelength optimized for the f″ signal of selenium using heavy-atom phasing and density modification as implemented in SHELXC/D/E (Sheldrick, 2010). Automated model building was then carried out using Buccaneer (Cowtan, 2006). The resulting model, which constituted almost the entire protein chain, was refined against native data for VapD-core using maximum-likelihood methods as implemented in REFMAC5 (Murshudov et al., 2011). This was interspersed with manual corrections to the model using Coot (Emsley et al., 2010). The refined model was then used to solve and partially refine the structure of the (isomorphous) VapD-full to investigate the conformation of the N-terminus of the protein.
Multiple sequence alignment was performed using ClustalW (Thompson et al., 1994), secondary-structure predictions were carried out using the PSIPRED protein structure-prediction server (McGuffin et al., 2000) and three-dimensional structural alignments were made using secondary-structure matching as implemented in Coot (Krissinel & Henrick, 2004).

Sequence comparisons and structure prediction
An amino-acid sequence alignment of 12 virulence-associated proteins (Vaps) from two different host-specific virulence plasmids of R. equi, the horse-specific plasmid (pVAPA1037; VapA and VapC-I) and the pig-specific plasmid (pVAPB1593; VapB and VapJ-M), is shown in Fig. 1(a). These sequences share approximately 26% identity and 63% similarity with that of VapD, indicating that their structures are very similar. In silico secondary-structure analysis of the sequence of the Vap proteins predicted that the 20-50-residue variable regions following the signal peptides at their N-termini are natively disordered, while the remainder of the chains form β-strands and a single α-helix (McGuffin et al., 2000). The sequence alignment highlights some interesting features including a conserved glycine-rich C-terminus and a 5-6-residue insert at residue 50 in VapD and VapE. This insert occurs between predicted secondary-structure elements and would not be expected to disrupt the overall fold. The Vap proteins have no homologues among proteins of known structure deposited in the Protein Data Bank and thus their three-dimensional structures cannot be predicted or modelled.
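Percentage identity between two aligned sequences can be computed as in the sketch below. It assumes one common convention, counting identical residues over the columns in which neither sequence has a gap; the alignment software may use a different denominator. The function name and the toy sequences are hypothetical and are not taken from the Vap alignment. Similarity would additionally require a substitution matrix or residue grouping and is not attempted here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, for illustration only).
toy_identity = percent_identity("GA-TWGG", "GAVTFGG")
```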

Expression, characterization and crystallization of VapD
The vapA and vapD genes of R. equi encode 189-residue and 164-residue polypeptides, of which the first 31 and 30 residues, respectively, constitute a signal peptide that directs secretion of the proteins from the cell. For biochemical and structural characterization, the mature forms of VapA (residues 1-158; VapA-full) and VapD (residues 1-134; VapD-full) were produced in E. coli and purified by nickel-affinity chromatography followed by size-exclusion chromatography. The VapA protein preparations were often heterogeneous as judged by nondenaturing polyacrylamide gel electrophoresis and we were unable to crystallize this protein. VapD-full yielded weakly diffracting crystals. Although these crystals were initially unsuitable for structure determination, we focused our subsequent efforts on VapD as it gave homogeneous protein preparations. In view of the predicted disorder at the N-terminus, a limited proteolysis experiment was carried out in the presence of chymotrypsin (Fig. 2a). Electrospray mass-spectrometric analyses of the protein before and after chymotrypsin treatment showed a molecular-mass difference of 1531 Da, indicating that cleavage had taken place after Leu14 to remove 15 amino-acid residues from the protein. Guided by this observation, a new construct was made to direct the production of VapD with a truncated N-terminus (VapD-core; Δ1-19).
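Because residue numbering in this work refers to the mature proteins, positions in the precursor polypeptides differ by the length of the signal peptide. The sketch below shows the conversion using the signal-peptide lengths stated above; the function and dictionary names are hypothetical.

```python
# Signal-peptide lengths as stated in the text (residues removed from the precursor).
SIGNAL_PEPTIDE_LENGTH = {"VapA": 31, "VapD": 30}


def to_mature(protein, precursor_position):
    """Convert a precursor (signal peptide included) position to mature numbering."""
    offset = SIGNAL_PEPTIDE_LENGTH[protein]
    if precursor_position <= offset:
        raise ValueError("position lies within the signal peptide")
    return precursor_position - offset


def to_precursor(protein, mature_position):
    """Convert a mature-protein position back to precursor numbering."""
    if mature_position < 1:
        raise ValueError("mature numbering starts at 1")
    return mature_position + SIGNAL_PEPTIDE_LENGTH[protein]
```

With these offsets the last residues of the 189-residue and 164-residue precursors map to positions 158 and 134 of mature VapA and VapD, matching the construct boundaries quoted above.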
The molecular characteristics of the full-length and core forms of VapD were assessed using circular-dichroism spectroscopy and size-exclusion chromatography with multi-angle laser light scattering (SEC-MALLS). In the CD spectrum, the molar ellipticity of VapD-full exhibits a shallow minimum at 215-220 nm consistent with the predominance of β-stranded structure (Fig. 2b). The spectrum for VapD-core is very similar, suggesting that the structure is not significantly perturbed by the N-terminal truncation (Fig. 2b). In the SEC-MALLS experiments, samples were fractionated on a gel-filtration column and the absorbance at 280 nm and the refractive index of the eluate were monitored together with the multi-angle laser light scattering of the sample. This enables the weight-average molecular weight (Mw) of species in the eluate to be calculated continuously. Using this analysis, both constructs of VapD were shown to behave as monomers in solution, with experimentally determined molecular masses of 15.3 and 13.4 kDa for the full-length and core proteins, respectively (Fig. 2c).

Table 2. Crystallization conditions, data-collection and refinement statistics.
Columns: Se derivative; VapD-core native; VapD-full native.
Crystallization conditions, protein solution†: Se derivative, 12-20 mg ml⁻¹ VapD-core-SeMet in 20 mM HEPES pH 7.5, 500 mM NaCl, 5 mM TCEP-HCl; VapD-core native, 20 mg ml⁻¹ VapD-core in 20 mM HEPES pH 7.5, 500 mM NaCl; VapD-full native, 12 mg ml⁻¹ VapD-full in 20 mM HEPES pH 7.5, 150 mM NaCl.
Footnotes: $I_i(hkl)$ is the intensity of the ith measurement of a reflection with indexes hkl and $\langle I(hkl)\rangle$ is the statistically weighted average reflection intensity. ¶ $R_{\mathrm{cryst}} = \sum_{hkl} \bigl||F_{\mathrm{obs}}| - |F_{\mathrm{calc}}|\bigr| / \sum_{hkl} |F_{\mathrm{obs}}|$, where $F_{\mathrm{obs}}$ and $F_{\mathrm{calc}}$ are the observed and calculated structure-factor amplitudes, respectively. †† $R_{\mathrm{free}}$ is the $R_{\mathrm{cryst}}$ calculated with 5% of the reflections chosen at random and omitted from refinement. ‡‡ Average geometric restraints are given in parentheses.
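The R-factor definitions in the Table 2 footnotes can be illustrated with a short sketch. This is not the REFMAC5 implementation: the function names, the random seed and the way the free set is selected are assumptions made for illustration, and no scaling between observed and calculated amplitudes is applied.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|) over the supplied reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("need one calculated amplitude per observed amplitude")
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the free set (excluded from refinement)."""
    n_free = max(1, round(n_reflections * free_fraction))
    rng = random.Random(seed)
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)
```

Applying `r_factor` to the working reflections gives R_cryst and applying it to the free reflections gives R_free; because the free set plays no part in refinement, it acts as a cross-validation check.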
Truncation of the N-terminus of VapD and modifications to the crystallization conditions led to improvements in crystal size and, more importantly, diffraction quality. The VapD-core construct was then adapted to introduce internal methionine codons for the purpose of structure solution. Methionine substitutions were made at Leu100 and Val123, guided by the presence of Met residues at the corresponding positions of VapL and VapC, respectively (Fig. 1a). Crystals of VapD-core and VapD-core-SeMet were obtained under similar crystallization conditions (Table 2). In both cases, octyl-β-D-glucoside proved to be an essential crystallization component.

Structure determination and refinement
Data collected from a single crystal of VapD-core-SeMet extending to 2.01 Å spacing (Table 2) were used to solve the structure by single-wavelength anomalous dispersion (SAD). Two Se sites were found using SHELXD (Sheldrick, 2010), consistent with the presence of one VapD molecule in the asymmetric unit of the crystal. The protein structure was then built and refined against native data extending to 1.9 Å spacing collected from a single crystal of VapD-core. During refinement, large peaks of positive difference electron density appeared indicating the presence of octyl-β-D-glucoside bound between the protein molecules. Three octyl-β-D-glucoside molecules were built into the structure, and a fourth potential site was identified but not modelled as the weakness of the electron density suggested low occupancy. The refined structure consists of residues Pro22-Glu134 (Fig. 3a). Residues 20-21 at the N-terminus and the C-terminal His6 tag are not defined in the electron-density maps and are assumed to be disordered. The maps are otherwise of very good quality (Fig. 3b). The modelled solvent content is lower than might be expected for a protein of this size owing to the bound octyl-β-D-glucoside molecules, which cover a significant proportion of the surface area of the protein.
At this point, following modifications to the crystallization and cryoprotection protocols, we were able to collect an X-ray diffraction data set extending to 2.1 Å spacing from a single crystal of VapD-full (Table 2). Preliminary refinement against these new data showed that residues preceding Pro22 were not defined in 2Fo − Fc and Fo − Fc electron-density maps and that the structural model could not be extended beyond that of VapD-core. These data are consistent with the prediction that the N-terminal 20 residues of VapD are natively disordered.

The protein fold of VapD-core
The protein fold of VapD-core consists of a compact eight-stranded β-barrel which is elliptical in cross-section (Figs. 3a and 3c). The strand ordering for the barrel is 1-2-3-8-5-6-7-4. At one end of the barrel the turns between strands are very short, giving rise to a smooth, rounded surface with a distinctly apolar character (Fig. 3c). By contrast, the other end of the barrel has some longer inter-strand regions which protrude from the barrel in the form of a nine-residue α-helix with two flanking loops (2-3 and 6-7; Figs. 3a and 3c). This more complex end of the protein is richer in charged and polar surface residues (Fig. 3c). Notably, in the 2-3 loop there is a group of acidic side chains (Asp47-Asp-Ala-Asp-Glu) followed by three basic side chains (Lys52-Lys-Gly-Lys). This loop-forming segment of the polypeptide appears as an insertion in the multiple sequence alignment that is restricted to VapD and VapE (Fig. 1a). It is the least well ordered part of the VapD-core protein, as is shown by the lack of electron density for some of the side chains. Other parts of the protein surface are strikingly hydrophobic (Fig. 3c).
The C-terminal segment of the polypeptide (residues 124-134) contains six glycines (Figs. 1a and 3d). As the side chains of Ser124, Ile126 and Glu134 project outwards, the only contribution to the protein core from this segment of the polypeptide is made by the side chain of Trp133.
The closed β-barrel architecture of the VapD fold can be classified by the number of strands n = 8 and the shear number S = 10 (Murzin et al., 1994a). Barrels of this class are common and are found in both globular and membrane proteins. Barrels within the same class display similar geometrical features but may have different topologies (Murzin et al., 1994b). The topology of the VapD barrel has not been observed before. The strand order of its antiparallel β-sheet is 1-2-3-8-5-6-7-4, which differs from the meander topology 1-2-3-4-5-6-7-8 typical of this class of β-barrel proteins by the transposition of strands 4 and 8. There are three inter-strand connections capping the ends of the barrel: two (between strands 3 and 4 and strands 7 and 8) are at one end and one, containing the short α-helix (between strands 4 and 5), is at the other end. The VapD barrel is flattened in appearance. The strands 4 and 8 are strongly coiled to create 'corners' between 'flat sides' (Murzin et al., 1994b). The coiling of these strands is facilitated by conserved glycine residues (68, 74, 125, 127, 129 and 131) that adopt extended conformations with φ, ψ values outside the normally allowed β-sheet region. These 'Gly kinks' sharply change the directions of the strands but allow regular hydrogen bonding to be maintained on both sides of the kinked strands (Murzin et al., 1994b).
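The (n, S) classification can be made concrete with the idealized barrel relations usually quoted for this geometric model, tan α = Sa/(nb) and R = b/[2 sin(π/n) cos α], where α is the tilt of the strands relative to the barrel axis and R is the barrel radius. The sketch below assumes the commonly used values a = 3.3 Å (residue spacing along a strand) and b = 4.4 Å (inter-strand distance); these constants and the function names are not given in the text and are assumptions. The model describes a barrel of circular cross-section, so for the flattened VapD barrel the radius is only a nominal mean. The second function checks that the observed strand order differs from a meander by a single transposition.

```python
import math

A_RESIDUE_SPACING = 3.3   # Å along a strand (assumed standard value)
B_STRAND_SPACING = 4.4    # Å between adjacent strands (assumed standard value)


def barrel_geometry(n_strands, shear_number, a=A_RESIDUE_SPACING, b=B_STRAND_SPACING):
    """Return (strand tilt in degrees, radius in Å) for an idealized circular barrel."""
    alpha = math.atan((shear_number * a) / (n_strands * b))
    radius = b / (2.0 * math.sin(math.pi / n_strands) * math.cos(alpha))
    return math.degrees(alpha), radius


def single_transposition(order):
    """Return the pair of strands whose exchange converts a meander into `order`, else None."""
    meander = sorted(order)
    mismatches = [i for i, (x, y) in enumerate(zip(order, meander)) if x != y]
    if len(mismatches) != 2:
        return None
    i, j = mismatches
    if order[i] == meander[j] and order[j] == meander[i]:
        return (meander[i], meander[j])
    return None


tilt_deg, radius = barrel_geometry(8, 10)
swapped = single_transposition([1, 2, 3, 8, 5, 6, 7, 4])
```

For n = 8 and S = 10 these relations give a strand tilt of roughly 43° and a nominal radius of just under 8 Å, and the strand-order check returns strands 4 and 8 as the transposed pair.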

The core of the barrel
As a closed β-barrel structure, VapD has a dense core formed by the side chains of alternate residues on the β-strands, which project into the protein interior (Fig. 4a). Trp133 is at the heart of this core, its bulky side chain forming apolar contacts to residues from six of the β-strands and the α-helix. In addition, its indole NH forms a hydrogen bond to the hydroxyl of Thr88. This is a rare polar interaction in a strikingly hydrophobic protein core. The side chain of Trp133 also packs between the rings of Tyr85 and Phe57, and this aromatic cluster extends to Phe104, Phe42, Phe91, Phe93 and Phe72. The uppermost layer of the protein interior, when viewed as in Fig. 4(a), features a number of buried polar side chains, including those of Gln40, Ser35, Thr65 and Tyr93, that form a network of buried hydrogen bonds (Fig. 4b). Although just five residues long, the 7-8 loop is the longest of the loops at the upper end of the molecule. Asp70 appears to play an important role in satisfying the hydrogen-bonding requirements of the main-chain portion of this loop and in determining its conformation, as shown by the charge-dipole interactions between its side-chain carboxylate and the main-chain amide groups of the four successive residues Ala120, Gly121, Thr122 and Val123 (Fig. 4b). The carboxylate also forms a polar interaction with the most buried member of a set of three water molecules that form a short channel to the surface. Despite its important structural role in VapD, this residue is not conserved in any of the other Vap proteins of R. equi.

Sequence considerations
In Fig. 4(c), the invariant residues in the plasmid-encoded virulence-associated proteins of R. equi (Fig. 1a) are mapped onto the structure of VapD. These residues are asymmetrically distributed in three dimensions, with the majority being located to the left of a diagonal running from the top left to the bottom right through the structure in the view shown in Fig. 4(c). There is a cluster of conserved residues with aromatic and aliphatic side chains that pack around Trp133 in the core, in addition to Thr88, whose side-chain hydroxyl forms a hydrogen bond with the indole NH.
Of the 30 invariant residues, 12 are glycines. Eight of these (three located on strand 3 and a further five located on strand 8) form something of a cluster and contribute to a rather featureless region of the surface. The hydrogen-bonded pair of glycine residues, Gly62 and Gly128, is of special note since non-glycine residues in these positions would have their side chains exposed. Such cross-strand pairs of glycine residues are very rare, presumably because they greatly decrease the stability of the β-sheet. The conservation of these two glycine residues in the R. equi Vap family sequences therefore suggests a functional role. The glycine pair contributes to a 'bald' spot on the side of the barrel which is devoid of side chains and is similar in size to the face of a porphyrin ring. The 'bald' spot in VapD has a smooth flat surface that constitutes a binding site for two octyl-β-D-glucoside molecules (see below), which were introduced in the crystallization solution.
Interestingly, in the only other β-barrel structure containing such a cross-strand pair of 'surface' glycine residues that we have found so far, the histidine porin OpdC (PDB entry 3sy9), there are again two lipid molecules bound to the bald spot around Gly64 and Gly104 in chain B (Eren et al., 2012). The area around the 'bald' spot in VapD is made up of hydrophobic and uncharged polar residues. However, many of the polar groups here form hydrogen bonds to one another, effectively enhancing the nonpolar character of this extended area. In particular, most of the main-chain groups in the capping loops 3-4 and 7-8 and the β-turns 1-2 and 5-6 at one end of the barrel are hydrogen-bonded to each other or to conserved side-chain groups. Moreover, adjacent to the bald spot are the conserved residues Asn94, Asn101 and Asn103, which together with Gln92 in VapD form a closed network of hydrogen bonds (Fig. 3b). Gln92 is conserved but not invariant, frequently being replaced by the isosteric Glu residue in other Vap proteins (Fig. 1a).

Figure 4. Core and conserved residues. (a) Stereoview of the VapD chain represented as a green worm highlighting the side chains of residues (which are labelled) in the protein core, defined here as residues with <10 Å² of accessible surface area. The C atoms and side chains of these residues are shown in cylinder format coloured by atom: carbon, grey; oxygen, red; nitrogen, blue. Side chain-side chain hydrogen bonds are indicated by dashed lines. (b) Local hydrogen-bonding networks in the otherwise apolar core of VapD: left, interactions of the side chains of Ser35, Gln40, Thr65 and Tyr93; right, the role of Asp70 in organizing the 7-8 loop through side-chain carboxylate-main-chain amide interactions with Ala120, Gly121, Thr122 and Val123. Also shown is the interaction of Asp70 with one of three buried water molecules. (c) View of the VapD chain as in (a) depicting the 30 invariant residues from the alignment of Vap proteins shown in Fig. 1. The C atoms of the invariant glycine residues are shown as spheres.

Crystal packing and octyl-β-D-glucoside binding
VapD crystallizes in space group F432 with one molecule of protein per asymmetric unit and a solvent content of 43%. As mentioned above, the appearance of well diffracting crystals was dependent on the presence of octyl-β-D-glucoside (OG) in the crystallization solution. As shown in Fig. 5(a), the three OG molecules that were defined in the asymmetric unit of the structure mediate important contacts between molecules in the lattice. Two of these molecules, OG1 and OG2, make bridging interactions between VapD chains around the crystallographic threefold symmetry axes, generating clusters of six molecules (Fig. 5a). Four OG3 molecules assemble close to the crystallographic fourfold symmetry axes, where they similarly mediate lattice interactions (Fig. 5a).

Figure 5. Octyl-β-D-glucoside binding and crystal packing. (a) Top, the packing of molecules around the crystallographic threefold axis with adjacent molecules in the lattice depicted as translucent ribbons coloured by chain. OG1 and OG2 are shown in cylinder format with C atoms in grey and O atoms in red. Bottom, packing of molecules around the crystallographic fourfold axis with OG3 molecules represented as above. (b) Detail of the binding of OG1 and OG2. Residues surrounding the glycolipid are shown in ball-and-stick format coloured as above except that C atoms are shown in green for one VapD molecule and in light blue for its symmetry mate. Selected residues are labelled. Residues from the symmetry-related molecule are indicated by primes (′). Polar interactions are indicated by dashed lines.
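The solvent content quoted above can be related to the Matthews coefficient V_M (crystal volume per unit of protein mass) through the widely used approximation that the protein occupies a fraction 1.23/V_M of the crystal volume. The sketch below is illustrative only: the function names are hypothetical, the constant 1.23 Å³ Da⁻¹ assumes a typical protein partial specific volume, and the contribution of the bound detergent molecules is ignored.

```python
PROTEIN_VOLUME_PER_DALTON = 1.23  # Å^3 per Da, assumed typical value for proteins


def matthews_coefficient(cell_volume, molecules_per_cell, mass_da):
    """V_M = unit-cell volume / (number of molecules in the cell * molecular mass)."""
    if molecules_per_cell <= 0 or mass_da <= 0:
        raise ValueError("molecule count and mass must be positive")
    return cell_volume / (molecules_per_cell * mass_da)


def solvent_from_matthews(v_m):
    """Approximate solvent fraction for a given Matthews coefficient (Å^3 per Da)."""
    if v_m <= 0:
        raise ValueError("Matthews coefficient must be positive")
    return 1.0 - PROTEIN_VOLUME_PER_DALTON / v_m


def matthews_from_solvent(solvent_fraction):
    """Invert the approximation: Matthews coefficient implied by a solvent fraction."""
    if not 0.0 <= solvent_fraction < 1.0:
        raise ValueError("solvent fraction must lie in [0, 1)")
    return PROTEIN_VOLUME_PER_DALTON / (1.0 - solvent_fraction)


implied_vm = matthews_from_solvent(0.43)
```

Under these assumptions a solvent fraction of 0.43 corresponds to a Matthews coefficient of roughly 2.2 Å³ Da⁻¹.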
The interactions of OG1 and OG2 in the VapD crystal are shown in detail in Fig. 5(b). OG1 resides between adjacent molecules, with the least-squares plane of its pyranose ring lying parallel to both the flat glycine-rich surface, featuring Gly62-Gly63 and Gly127-Gly128-Gly129, of one VapD chain and the planar guanidino group of Arg111 from a neighbouring molecule (Fig. 5b). Multiple polar interactions are formed between the hydroxyl groups of the glucose and Thr130 from one VapD molecule and Arg111 of the neighbouring molecule. A water molecule makes bridging hydrogen bonds from the sugar of OG1 to the 2-OH of OG2, the 5-OH and 6-OH groups of which form hydrogen bonds to Ser107 and Ser90, respectively, while the 3-OH forms a charge-dipole interaction with the side-chain carboxylate of Asp109 in the neighbouring molecule. The eight-carbon aliphatic chain on OG1 extends away from the sugar across the glycine-rich surface. The C8 chain of OG2 initially packs beside that of OG1 before folding back towards the sugar and packing across the surface of Phe104.
The sugar of OG3 is somewhat disordered and does not form direct interactions with the protein molecules. Instead, the VapD interactions are mediated by the aliphatic chains, which pack between the aromatic side chains of Phe71 and Trp73. The indole side chains of symmetry-related Trp73 residues slot between the four OG3 molecules arranged around the fourfold axis.

The β-barrel structure of VapD is likely to be shared by VapA
The sequence alignment shown in Fig. 1(a) suggests that the β-barrel structure of VapD will be shared by the other Vap proteins of R. equi, including the major virulence factor VapA. To provide experimental support for this assertion, we recorded a circular-dichroism spectrum of VapA (data not shown). Analysis of this spectrum using tools provided through the DichroWeb server (Whitmore & Wallace, 2008) suggests that VapA is rich in strands (45%) and turns (22%), with a small proportion of helix (7%). In addition, there is a significant proportion of unordered structure (26%). This spectrum is similar to that recorded for VapD and is consistent with the secondary-structure composition of the VapD crystal structure.

R. equi Vap-like proteins are widely distributed
It was thought for many years that sequences encoding Vap proteins were restricted to R. equi strains harbouring virulence plasmids. However, when comparing the Vap sequences with recent entries in GenBank, it is apparent that R. equi-like vap genes are widely distributed. Thus, they occur in E. coli, Clostridium spp., Halomonas spp., Lacinutrix spp. and Xenorhabdus bovienii, representing the diverse phyla of Firmicutes, Proteobacteria and Bacteroidetes. The sizes of the encoded (putative) proteins fall within the range (150-206 residues) of those found in the R. equi Vaps, with sequence identities ranging from 27 to 42%. It is apparent from the alignment shown in Fig. 1(b) that, as for the Vap proteins of R. equi, the Vap homologues from these diverse species each possess a glycine-rich sequence followed by a tryptophan at their C-termini. However, none of these putative proteins has a recognisable secretion signal peptide at its N-terminus, suggesting that they may be cytoplasmic. The functions of the putative Vap protein homologues have yet to be determined. Interestingly, they are not plasmid-encoded; moreover, they appear to occur as single genes rather than as clusters of homologous genes as found in R. equi.
The mechanisms by which the pathogen evades the host cell's defences are currently unknown, although studies suggest that R. equi is able to halt the normal cell-killing processes of the macrophage by preventing maturation of the phagosome at the early-to-late endocytic stage (Zink et al., 1987; Fernandez-Mora et al., 2005; Toyooka et al., 2005). The important but currently unknown modes of action of the Vap proteins in R. equi pathogenicity make them attractive targets for crystallographic studies.
Given the high sequence similarity shared by these proteins, it is very probable that they exhibit the same essential structural features as VapD, including the eight-stranded β-barrel, the 4-5 α-helix and the 6-7 loop. VapE is the only other protein expected to have the protracted loop between strands 2 and 3 seen in VapD. In the case of VapE, this loop is glycine-rich (Fig. 1a). The conserved C-terminal tryptophan residue is a dominant feature of this protein structure. This residue may have a functional role as well as being an essential structural component. For all of the Vap proteins, the hypervariable regions at the N-termini almost certainly lack ordered structure, although the variety in their length and sequence hints at different functionalities or different interacting partners.

Structure comparisons and functional implications
Despite its novel topology, the VapD fold shows extensive although partial similarity to other β-barrel folds that is detectable by popular structure-similarity search tools such as DALI and PDBeFold. The top-scoring hits are β-barrels of the same structural class (n = 8; S = 10) with simple meander topology, six of the eight strands of which (2, 3, 4, 6, 7 and 8) can be structurally aligned with six strands in the VapD barrel (1, 2, 3, 5, 6 and 7, respectively). Amongst these hits are β-barrel proteins implicated in bacterial virulence (Fig. 6).
One such protein, OmpX (Fig. 6a; PDB code 1qj8; Vogt & Schulz, 1999), belongs to a family of membrane proteins that plays roles in (i) bacterial adhesion to, and entry into, mammalian cells and (ii) resistance to attack by the human complement system. Although the Vap proteins of R. equi are not membrane-spanning, the presence of bound OG molecules in the crystal structure of VapD suggests the possibility of a transient association with the complex mycolic acid-rich cell envelope of this Gram-positive bacterium.
Bradavidin 2, an avidin-like protein from Bradyrhizobium japonicum, binds biotin in a cavity in the barrel interior with the 3-4 loop serving as a lid (Fig. 6b; Leppiniemi et al., 2013). The absence of a cavity in VapD suggests that it is not involved in small-molecule binding. Curiously, in bradavidin and other avidin homologues ligand binding is associated with dimer and tetramer formation and in other instances biotin is bound at subunit interfaces. This is interesting in view of the mode of OG1 and OG2 binding in the subunit interfaces between pairs of VapD molecules in the crystal.
Like VapD, the periplasmic lysozyme inhibitor PliC from Salmonella typhimurium (PDB entry 3oe3; Leysen et al., 2011; Fig. 6c) is secreted through the cytoplasmic membrane, although in this Gram-negative bacterium PliC is retained in the periplasmic space. It is thought that PliC inhibits the bactericidal action of lysozymes which, following permeabilization of the outer membrane, degrade cell-wall peptidoglycan as part of the innate immune response of animals. The structural similarity, their extracellular localization and their functional association with bacterial virulence suggest the interesting possibility that the Vap proteins of R. equi may act as inhibitors of enzymes involved in endosome-associated host defence.
The unique structural features of VapD conserved in the Vap proteins of R. equi provide similar functional insights. The 'bald' spot and its extensive surrounding nonpolar area, to which two OG molecules are bound, suggest the possibility that the R. equi Vap family members make functional interactions with large nonpolar surfaces. By analogy with the binding of antifreeze proteins to particular planes in ice crystals, the 'bald' spot may direct VapD and its relatives to ordered lipid structures. An example of such a structure is the monolayer formed by trehalose 6,6′-dimycolate (TDM), a glycolipid of mycobacteria and R. equi known to be a virulence factor (Hsu et al., 2011; Schabbing et al., 1994; Sydor et al., 2013). Mycolic acids and their derivatives are the main components of the cell envelope of these bacteria. An attractive hypothesis is that the VapA family members facilitate reorganization of this envelope, for example by providing molecular surfaces suitable for the nucleation of the toxic TDM monolayer. Conversely, they may bind to ordered structures in the cell envelope and help to disrupt them.
Figure 7. VapD and internal symmetry. VapD molecules are shown in worm representation with residues 29-77 coloured blue and residues 89-134 coloured red; the remaining residues are in grey. The Cα atoms of residues 29-77 of one molecule were superimposed onto residues 89-134 of a copy of this molecule using the SSM Superpose routine implemented in CCP4mg. (a, b) Views of VapD showing the orientation of the molecules before and after superposition and the direction of the screw axis that relates them. (c) The pair of superposed VapD molecules.
Finally, the two surface residues Tyr39 and Ser90 form a contiguous patch with the 'bald' spot at Gly62 and Gly128, which is conserved not only in the R. equi Vap proteins but also in their relatives from other bacterial species. The role of the conserved hydroxyl groups of these tyrosine and serine residues is not clear, but they may contribute to the recognition of a larger ligand or receptor. Alternatively, they may be the sites of as yet unknown post-translational modifications.

Internal symmetry
The proteins in the Protein Data Bank with the closest structural similarity to VapD have antiparallel β-barrel structures with the all-next-neighbour 1-2-3-4-5-6-7-8 topology. This topology is clearly distinct from that of VapD, where the 1-2-3-8-5-6-7-4 topology gives rise to multiple crossovers and a closed rather than an open barrel. As far as we are aware, this barrel topology has not been observed elsewhere and the VapD fold can thus be described as novel. Interestingly, the β-sheet topology apparent in VapD has pseudo-twofold symmetry. The two halves of the VapD barrel, from strand 1 to strand 4 (residues 29-77) and from strand 5 to strand 8 (residues 89-134), are structurally similar and can be superimposed onto each other with an r.m.s.d. of 1.8 Å for 42 pairs of Cα atoms (Fig. 7). This similarity extends to the turns and loops at one end of the barrel. This suggests the possibility that the VapD fold originated from an ancestral dimeric protein through the duplication of a gene encoding a four-stranded species followed by fusion.
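The pseudo-twofold symmetry of the strand order can be checked with a few lines of Python. This is only an illustrative sketch, not part of the structure analysis: it assumes that the order 1-2-3-8-5-6-7-4 is read cyclically around the closed barrel, and the function names are hypothetical. Relabelling each strand i as i + 4 (modulo 8) swaps the two halves of the chain; if the set of neighbouring strand pairs is unchanged, the topology is symmetric under that swap.

```python
# Illustrative sketch: strand order of VapD read cyclically around the barrel
# (an assumption made for this example).
VAPD_ORDER = [1, 2, 3, 8, 5, 6, 7, 4]


def neighbour_pairs(order):
    """Return the set of unordered neighbouring strand pairs in a closed barrel."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def shift_labels(order, k):
    """Relabel every strand s as s + k, wrapping around within 1..n."""
    n = len(order)
    return [((s - 1 + k) % n) + 1 for s in order]


def symmetric_shifts(order):
    """List the non-trivial label shifts that leave the neighbour pairs unchanged."""
    reference = neighbour_pairs(order)
    return [k for k in range(1, len(order))
            if neighbour_pairs(shift_labels(order, k)) == reference]


def non_sequential_pairs(order):
    """Neighbouring strands that are not adjacent in sequence (crossover connections)."""
    return sorted(tuple(sorted(p)) for p in neighbour_pairs(order)
                  if max(p) - min(p) != 1)


if __name__ == "__main__":
    print("symmetric label shifts:", symmetric_shifts(VAPD_ORDER))
    print("non-sequential neighbours:", non_sequential_pairs(VAPD_ORDER))
```

Under these assumptions the only non-trivial shift that preserves the neighbour pairs is 4, which corresponds to exchanging strands 1-4 with strands 5-8, and the four non-sequential pairings (1-4, 3-8, 4-7 and 5-8) are themselves exchanged in pairs by this operation.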
In the VapD fold, the two halves of the molecule are related by a screw axis coinciding with the barrel axis. This is because twofold rotational symmetry about this axis is not compatible with shear number S = 10 (Murzin et al., 1994a). This results in asymmetric interactions of the two capping loops (3-4 and 7-8) at the end of the barrel. The 3-4 loop effectively extends strand 3 by maintaining the regular hydrogen-bonding interactions with the beginning of strand 8. In contrast, the 7-8 loop does not close the end of the barrel through main chain-main chain hydrogen bonding.
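For orientation, the strand number n and shear number S fix the geometry of an idealised, circular barrel. The sketch below uses the standard relations of the idealised barrel model, tan α = Sa/(nb) and R = b/[2 sin(π/n) cos α], where α is the tilt of the strands relative to the barrel axis and R is the barrel radius. The parameter values a = 3.3 Å (residue spacing along a strand) and b = 4.4 Å (interstrand distance) are assumed typical values, not measurements from the VapD structure, and the function name is hypothetical. Because the VapD barrel is elliptical, the result is at best an approximate average.

```python
import math


def barrel_geometry(n, S, a=3.3, b=4.4):
    """Idealised barrel geometry from strand number n and shear number S.

    a: assumed residue spacing along a strand (angstroms).
    b: assumed distance between neighbouring strands (angstroms).
    Returns (strand tilt in degrees, barrel radius in angstroms).
    """
    alpha = math.atan((S * a) / (n * b))
    radius = b / (2.0 * math.sin(math.pi / n) * math.cos(alpha))
    return math.degrees(alpha), radius


if __name__ == "__main__":
    tilt, radius = barrel_geometry(n=8, S=10)
    print(f"strand tilt = {tilt:.1f} degrees, radius = {radius:.2f} angstroms")
```

With n = 8 and S = 10 these assumed parameters give a strand tilt of roughly 43° and a radius of about 7.9 Å for the idealised circular case.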

Conclusion
In summary, the structure of VapD with its closed β-barrel does not immediately illuminate the function of the virulence proteins of R. equi, although unique aspects of its structure and its similarity to proteins of known function in other pathogens suggest future experiments. The binding of octyl-β-D-glucoside to the protein demonstrates the compatibility of the protein surface with glycolipids which are present in the outer membrane (Garton et al., 2002). It is tempting to speculate that the hydrophobic portion of the barrel becomes buried in the unusual mycolic acid-rich outer surface of the Gram-positive R. equi bacterium. However, among the virulence-associated proteins of R. equi, only VapA remains associated with the cell surface; the others are secreted. Thus, it may be that the hydrophobic surface of VapD merely facilitates the passage of the protein through this complex layer.