Structural characterization of the Streptococcus pneumoniae carbohydrate substrate-binding protein SP0092

The crystal structure of SP0092 was determined at 1.61 Å resolution and reveals a domain-swapped dimer with the monomer subunit in a closed conformation in the absence of ligand.


Introduction
Streptococcus pneumoniae (the pneumococcus) resides asymptomatically in the upper airway tract but can migrate to normally sterile locations to cause diseases such as otitis, pneumonia, sepsis, septicaemia and meningitis (Weiser, 2010; Bogaert et al., 2004). S. pneumoniae relies solely on carbohydrates as a source of carbon and, as these are limited in the nasopharynx, it dedicates over 30% of its transport systems to the uptake of carbohydrates, which are scavenged from host complex glycans (Burnaugh et al., 2008; King, 2010; King et al., 2006; Buckwalter & King, 2012; Bidossi et al., 2012). These transport systems include phosphotransferase systems, ATP-binding cassette (ABC) transporters and porins, which provide the potential to convey up to 32 different carbohydrates (Bidossi et al., 2012).
In ABC transporters the ligand is translocated through the membrane by transmembrane permease domains activated by a pair of conserved cytoplasmic nucleotide-binding domains. In the case of type I and II ABC importers, a substrate-binding protein (SBP) selectively binds the ligand and presents it to the outward-facing side of the transporter for transfer to the transmembrane domains (Hopfner, 2016; Locher, 2016). SBPs are formed by two α/β domains connected by a hinge region, which are interdependent in the apo form (Tang et al., 2007). Upon ligand binding at the interface between the two domains, the protein closes around the ligand in a more rigid conformation; ligand binding in this way has been termed the 'Venus fly trap' mechanism (Mao et al., 1982). As the number of SBP structures determined has increased, the level of structural diversity has concomitantly grown. Six distinct structural groups have been proposed based on structural similarity, size and the presence of notable structural features (Berntsson et al., 2010). This classification has recently been extended to a seventh structural class (G) following the structural characterization of FusA, a fructooligosaccharide SBP from S. pneumoniae (Culurgioni et al., 2016).
Here, we describe the crystal structure of the SBP SP0092 in an atypically closed and ligand-free conformation. SP0092 oligomerizes in solution in a concentration-dependent manner and we propose that dimerization could induce a closed conformation in which ligand binding is modulated. SP0092 belongs to the newly identified 'cluster G' structural subclass of SBPs, possessing the extended fold and large ligand-binding cavity that typify this cluster.

Materials and methods
Macromolecule production
SP0092 39-491 was cloned into the pOPINF vector (OPPF-UK), truncating the first 39 residues comprising the periplasmic localization signal. The native His-tag fusion protein was expressed in Escherichia coli BL21 Rosetta cells by autoinduction using Overnight Express medium (Millipore) supplemented with 1%(v/v) glycerol, while selenomethionine-labelled protein was expressed using SelenoMethionine Medium Complete (Molecular Dimensions) supplemented with 0.5 mM IPTG for induction. Cells were lysed in 0.1 M HEPES pH 7.5, 0.5 M NaCl, 0.02 M imidazole, 10%(v/v) glycerol supplemented with EDTA-free protease inhibitors (Roche) and cleared for 1 h at 100 000g. Cleared lysates were loaded onto an affinity HisTrap HP column (GE Healthcare). The fusion protein was eluted with lysis buffer supplemented with 0.2 M imidazole and, after dilution, was treated with HRV 3C protease overnight at 4 °C. The mixture was loaded onto a HisTrap HP column and the cleaved protein was immediately eluted. The resulting sample was loaded onto a Superdex 200 column equilibrated with 0.02 M MES pH 6.5, 0.2 M NaCl, 2.5%(v/v) glycerol, 0.5 mM TCEP. Fractions of the two peaks observed from gel filtration were collected separately and concentrated to 170 and 154 mg ml⁻¹ for the oligomeric and monomeric states, respectively. Macromolecule-production information is summarized in Table 1.

Size-exclusion chromatography and multiangle light scattering
SP0092 39-491 samples at different protein concentrations were loaded onto a Superdex 200 5/150 GL column equilibrated with running buffer [0.02 M HEPES pH 7.5, 0.2 M NaCl, 2.5%(v/v) glycerol, 0.5 mM TCEP]. Relevant collected fractions were loaded onto an SDS-PAGE gel. Static light-scattering experiments were performed at room temperature using a Superdex 200 Increase 10/300 GL column (GE Healthcare) in-line with a DAWN HELEOS II light-scattering detector (Wyatt). The column was equilibrated with running buffer. Samples of 100 µl protein solution at 5 mg ml⁻¹ were analysed. Data acquisition and analysis were carried out using the ASTRA software.

Crystallization
Initial crystals of SP0092 39-491 were obtained by sitting-drop vapour diffusion at 20 °C by mixing equal volumes of protein (at a concentration of 50 mg ml⁻¹) and a reservoir solution consisting of 20%(w/v) PEG 6000, 0.1 M Tris-HCl pH 8.0, 0.02 M zinc chloride. Optimization of the crystallization conditions resulted in single crystals of about 200 µm in size using the conditions detailed in Table 2. Selenomethionine-labelled SP0092 39-491 yielded similar crystals under the same crystallization conditions.

Data collection and processing
For data collection, crystals were first transferred to a cryoprotectant solution [reservoir buffer supplemented with 25%(v/v) glycerol] and then flash-cooled in liquid nitrogen. Crystal screening and initial crystal characterization were carried out on the I03 and I04 beamlines at Diamond Light Source. Diffraction data for selenomethionine-derivatized SP0092 39-491 crystals were collected at the Se K edge. All data were processed with xia2 and resolution limits were defined using a half-data-set correlation coefficient (CC1/2) limit of 0.5, although the crystals diffracted to 1.48 Å resolution in the [...] (Winter et al., 2013). Data-collection and processing statistics are summarized in Table 3.

Table 1. Macromolecule-production information.
Source organism: S. pneumoniae TIGR4
Forward primer: ATGGTCTAGAAAGCTTTATTTTTTGTTTTTCAAGAATTCATCGTATTGTTTTTGC
Expression vector: pOPINF
Expression host: E. coli BL21 (Rosetta)
Complete amino-acid sequence of the construct produced†
† The initial GP residues are the residual residues of the HRV 3C protease site.
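The CC1/2-based resolution criterion described above can be illustrated with a minimal sketch. It assumes that per-shell statistics are available as (high-resolution limit in Å, CC1/2) pairs; the function name `resolution_cutoff` and the shell values are hypothetical and invented for illustration only. They are not the statistics of this data set and this is not how xia2 implements the criterion internally.

```python
from typing import Optional, Sequence, Tuple


def resolution_cutoff(
    shells: Sequence[Tuple[float, float]], threshold: float = 0.5
) -> Optional[float]:
    """Return the high-resolution limit (in Å) of the last shell, walking from
    low to high resolution, whose CC1/2 is still at or above the threshold.

    Returns None if even the lowest-resolution shell fails the criterion.
    """
    # Larger d-spacing means lower resolution, so sort in descending order.
    ordered = sorted(shells, key=lambda shell: shell[0], reverse=True)
    limit = None
    for d_min, cc_half in ordered:
        if cc_half < threshold:
            break  # stop at the first shell that falls below the threshold
        limit = d_min
    return limit


# Hypothetical shell statistics (d_min in Å, CC1/2), for illustration only.
example_shells = [(3.0, 0.99), (2.5, 0.97), (2.2, 0.90), (2.0, 0.62), (1.8, 0.41)]
print(resolution_cutoff(example_shells))
```

With these invented shells the data would be cut at 2.0 Å, even though weaker reflections extend to 1.8 Å. This mirrors how a crystal can diffract beyond the resolution limit that is finally reported.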

Structure solution and refinement
The SHELX suite was used to determine the selenium substructure (Sheldrick, 2010). Analysis of the data with SHELXC showed a strong anomalous signal to high resolution, with a CC1/2 of 0.28 at 2.35 Å between observed and calculated E values (Schneider & Sheldrick, 2002). Data to 2.5 Å resolution (anomalous CC1/2 of 0.35) were used for the substructure search, which located all seven Se atoms. The atomic model was built automatically with ARP/wARP using starting phases generated by SHELXE. The autotraced model was then completed through iterative cycles of manual model building in Coot (Emsley et al., 2010) and refinement using REFMAC5 in the CCP4 suite (Murshudov et al., 2011; Langer et al., 2008; Winn et al., 2011). The final refinement statistics are reported in Table 4. The final electron density was of high quality for the complete polypeptide chain except for the loop region formed by residues 90-96 (PDB entry 5mlt). The structure was visualized with PyMOL (http://www.schrodinger.com/pymol).

SP0092 oligomerization state
Although the majority of SBPs are monomeric in solution, a few cases of higher order oligomerization states have been detailed (Schumacher et al., 1994, 2004; Friedman et al., 1995; Ramseier et al., 1993). Following the observation of multiple elution peaks from size-exclusion chromatography, we measured the absolute molar mass of purified SP0092 39-491 samples by multiangle light scattering (MALS). At least four different states were detected, in good agreement with the theoretical molecular weights of SP0092 39-491 monomer, dimer, trimer and tetramer species of 49.4, 97.0, 140.8 and 187.2 kDa, respectively (Fig. 1a). To investigate whether the oligomerization is dependent on protein concentration, we analysed the gel-filtration elution profiles of the monomeric and oligomeric samples at different dilutions. Although the main species remained the same at different concentrations, the oligomeric fraction of the monomeric sample increased at higher concentration (from 10 to 13%); conversely, the monomeric fraction of the oligomeric sample increased from 14 to 30% of the total amount when diluted (Fig. 1b). This points towards a dynamic equilibrium between the different species that is dependent on protein concentration (Figs. 1c and 1d).
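The assignment of the four species to monomer, dimer, trimer and tetramer can be checked with simple arithmetic. The sketch below takes the first listed value, 49.4 kDa, as the monomer reference mass; this is an assumption, since the text does not separate measured from theoretical values. The function name `assign_oligomers` is hypothetical.

```python
from typing import List, Sequence, Tuple


def assign_oligomers(
    monomer_kda: float, masses_kda: Sequence[float]
) -> List[Tuple[float, int, float]]:
    """For each mass, return (mass, nearest integer number of subunits,
    percentage deviation from that integer multiple of the monomer mass)."""
    assignments = []
    for mass in masses_kda:
        n_subunits = max(1, round(mass / monomer_kda))
        expected = n_subunits * monomer_kda
        deviation_pct = 100.0 * (mass - expected) / expected
        assignments.append((mass, n_subunits, round(deviation_pct, 1)))
    return assignments


# Values quoted in the text (kDa); 49.4 kDa is assumed to be the monomer reference.
quoted_masses = [49.4, 97.0, 140.8, 187.2]
for mass, n_subunits, deviation in assign_oligomers(49.4, quoted_masses):
    print(f"{mass:6.1f} kDa -> {n_subunits} subunit(s), {deviation:+.1f}% vs n x monomer")
```

Under this assumption each quoted mass rounds to a distinct subunit count from one to four, and each lies within about 6% of the corresponding integer multiple of the monomer mass.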

Crystal structure of SP0092
Both the monomeric and oligomeric species of SP0092 39-491 isolated after size-exclusion chromatography were subjected to extensive crystallization trials, but only the latter yielded crystals and enabled the structure of oligomeric SP0092 39-491 to be determined to 1.61 Å resolution (PDB entry 5mlt).
SP0092 39-491 folds similarly to other substrate-binding proteins, presenting two globular α/β domains linked by a hinge region formed by three loops. The first domain (residues 39-154 and 321-396) is composed of one central β-sheet of four strands surrounded by seven α-helices, two 3₁₀-helices and an additional three-stranded β-sheet. The second domain (residues 155-320 and 394-491) consists of a three-stranded β-sheet enclosed by eight α-helices, two 3₁₀-helices and an extra three-stranded β-sheet.
The most striking feature of the oligomeric SP0092 39-491 structure is the presentation of a domain-swapped dimer. The hinge loop connecting the swapped and main domains is located at residues Gly366 and Lys367, which are positioned between the β15 and β16 strands. The hinge loop is modelled in well defined electron density (Fig. 2c). Apart from this hinge loop, the overall architecture of the two functional monomeric units is identical. A domain-swapped dimer structure has also been observed in the α-keto acid substrate-binding protein TakP (Gonin et al., 2007). However, there is as yet no evidence that a domain-swapped dimer is a functional state of these SBPs.

Structural classification of SP0092
The recent structure determination of the fructooligosaccharide substrate-binding protein FusA from S. pneumoniae allowed a new subclass of SBPs to be defined. This structural subclass, annotated as subclass G, allowed the grouping of four SBP structures, including that of FusA. The members of subclass G are characterized by their larger molecular weight, additional structural elements, an enlarged ligand-binding cavity and a regulatory EF-hand-like calcium-binding site (Culurgioni et al., 2016). SP0092 39-491 possesses all of the features characterizing this subfamily apart from the calcium-binding site and shows approximately 24% sequence identity to the other subclass G members (Fig. 3). Independent structural superpositions of domains I and II, which make up the functional SP0092 39-491 monomer, onto the equivalent domains of the other members of subclass G resulted in a maximum root-mean-square deviation of 2.92 Å for both domains of the monomers. The only prominent difference that is observed in the SP0092 39-491 structure, when compared with the other subclass G members, is in the hinge region between the two α/β domains. In the case of SP0092 39-491 the loop spanning residues 315-319 is reorganized to form an additional helix, α10. This helix is positioned in the central part of the ligand-binding cavity and may play a role in substrate interaction or recognition. Thus, in summary, we propose SP0092 to be a fifth, albeit atypical, member of the structural subclass G of SBPs.
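For readers unfamiliar with the metric, the root-mean-square deviation quoted above can be sketched as follows. This minimal example assumes that the two sets of equivalent atoms have already been paired and optimally superposed; the superposition step itself is not shown. The function name `rmsd` and the coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coordinate], coords_b: Sequence[Coordinate]) -> float:
    """Root-mean-square deviation (same units as the input, e.g. Å) between
    two equally sized, already superposed lists of paired atom coordinates."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical paired coordinates: every atom is displaced by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displaced = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, displaced))
```

Because the value is averaged over all paired atoms, a single reorganized region such as a hinge loop can raise the deviation of an otherwise closely matching domain.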

Carbohydrate-binding cavity
Comparison of the SP0092 functional monomeric unit with the other members of subclass G reveals the subunit to be in a closed conformation even though no ligand is bound. This may be a consequence of the domain-swapped dimer structure. Thus, variation in protein concentration may modulate ligand binding through the formation of a domain-swapped dimer, which presents a closed SBP monomer conformation.
Despite predictions that SP0092 binds carbohydrates including galactose, mannose and N-acetylmannosamine (ManNAc), the nature of the carbohydrate ligand remains unknown (Bidossi et al., 2012). The ligand-binding cavity of SP0092 39-491 has a volume of 2692 Å³, which is comparable to that of the closed ligand cavity of FusA (~2218 Å³; Fig. 4e). Thus, the structure of SP0092 suggests that the SBP has the ability to bind complex oligosaccharides that extend by at least three sugar moieties.

Closing remarks
The pneumococcus relies solely on carbohydrates as a carbon source, with at least seven ABC transporters encoded in the reference genome strain TIGR4 annotated as carbohydrate importers. Here, we have determined the high-resolution crystal structure of the S. pneumoniae SBP SP0092, which delineates a large substrate-binding cavity and an overall structure which shows that it belongs to the newly described structural subclass G of the SBP family. Further structural analyses of the full complement of carbohydrate substrate-binding proteins could aid the investigation of these proteins as potential vaccine candidates and their potential suitability as novel drug-delivery systems (Saxena et al., 2015; Garmory & Titball, 2004; Ahuja et al., 2015).
Note added in proof. During the review of this paper, three entries were released by the PDB describing the SP0092 structure in a monomeric configuration with and without oligosaccharide bound (PDB entries 5swb, 5swa and 5suo).