The structure of KPN03535 (gi|152972051), a novel putative lipoprotein from Klebsiella pneumoniae, reveals an OB-fold

KPN03535 is a protein unique to K. pneumoniae. The crystal structure reveals that KPN03535 represents a novel variant of the OB-fold and is likely to be a DNA-binding lipoprotein.


Introduction
KPN03535 is an orphan protein that is found exclusively in Klebsiella pneumoniae MGH 78578 (an opportunistic human pathogen belonging to the Enterobacteriales order of the Gammaproteobacteria; Galperin et al., 2007; Gill et al., 2006; Frank & Pace, 2008; Ley et al., 2008) and K. pneumoniae 342 (with a three-residue substitution). It consists of 132 residues with a calculated pI of 9.40 and a predicted signal peptide. The N-terminus of KPN03535 has a lipoprotein signature, indicated by the presence of an LSGC motif (von Heijne, 1989) and by predictions from LipoP 1.0 (Juncker et al., 2003). It is a singleton protein that has not been assigned to any Pfam family, but sequence-based fold-prediction methods (Ginalski et al., 2003) suggest similarity to members of the PF01336 family (OB-fold nucleic acid-binding domain). We determined the crystal structure of KPN03535 in order to explore this extremely divergent member of the commonly occurring OB-fold. Structural comparisons show similarities to the OB-fold-containing Cpx-pathway protein NlpE, single-stranded DNA-binding (SSB) proteins, bacterial OB-fold (BOF) proteins and toxin proteins, which enable inferences about function that may now be tested biochemically. This structure should serve as a basis for understanding structure-function relationships in any newly discovered proteins with a similar sequence, such as those identified by whole microbial genome sequencing and metagenomic surveys of the human microbiome.

KPN03535 expression, purification and crystallization
Clones were generated using the Polymerase Incomplete Primer Extension (PIPE; Klock et al., 2008) cloning method. The gene encoding KPN03535 (gi|152972051; Swiss-Prot A6TEE6) was amplified by polymerase chain reaction (PCR) from K. pneumoniae MGH 78578 genomic DNA using PfuTurbo DNA polymerase (Stratagene) and I-PIPE (Insert) primers (forward primer, 5′-ctgtacttccagggcGCTTCTAAAGCCTTTTATTCCGCGGGAG-3′; reverse primer, 5′-aattaagtcgcgttaTTTAACCACCTTGGGATTCTGTAGCGTC-3′; target sequence in upper case) that included sequences for the predicted 5′- and 3′-ends. The expression vector, pSpeedET, which encodes an amino-terminal tobacco etch virus (TEV) protease-cleavable expression and purification tag (MGSDKIHHHHHHENLYFQG), was PCR-amplified with V-PIPE (Vector) primers (forward primer, 5′-taacgcgacttaattaactcgtttaaacggtctccagc-3′; reverse primer, 5′-gccctggaagtacaggttttcgtgatgatgatgatgatg-3′). V-PIPE and I-PIPE PCR products were mixed to anneal the amplified DNA fragments together. Escherichia coli GeneHogs (Invitrogen) competent cells were transformed with the V-PIPE/I-PIPE mixture and dispensed on selective LB-agar plates. The cloning junctions were confirmed by DNA sequencing. Using the PIPE method, the gene segment encoding residues Met1-Leu22 was deleted for expression of soluble protein as these residues were initially predicted to correspond to either a signal peptide using SignalP (Bendtsen et al., 2004) or transmembrane helices using TMHMM-2.0 (Krogh et al., 2001). Expression was performed in selenomethionine-containing medium. At the end of fermentation, lysozyme was added to the culture to a final concentration of 250 µg ml⁻¹ and the cells were harvested and frozen. After one freeze-thaw cycle, the cells were homogenized in lysis buffer [50 mM HEPES pH 8.0, 50 mM NaCl, 10 mM imidazole, 1 mM tris(2-carboxyethyl)phosphine-HCl (TCEP)] and the lysate was clarified by centrifugation at 32 500g for 30 min.
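The PIPE design described above relies on each insert primer carrying a 15-nucleotide 5′ extension (lower case) that is complementary to one end of the amplified vector. As a consistency check on the primer sequences quoted in the text, the following minimal Python sketch tests whether the lower-case extensions of the I-PIPE primers are the reverse complements of the first 15 nucleotides of the V-PIPE primers, and translates the reverse complement of the V-PIPE reverse primer to compare it with the C-terminal end of the purification tag. The helper names are illustrative and are not part of any published pipeline; the primer strings are taken from the text with the line-break hyphens removed.

```python
from itertools import product, takewhile

# Primer sequences as quoted in the text (5' to 3'). Lower case marks the
# vector-complementary extension, upper case the gene-specific target.
I_PIPE_FWD = "ctgtacttccagggcGCTTCTAAAGCCTTTTATTCCGCGGGAG"
I_PIPE_REV = "aattaagtcgcgttaTTTAACCACCTTGGGATTCTGTAGCGTC"
V_PIPE_FWD = "taacgcgacttaattaactcgtttaaacggtctccagc"
V_PIPE_REV = "gccctggaagtacaggttttcgtgatgatgatgatgatg"
TAG = "MGSDKIHHHHHHENLYFQG"

_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, in lower case."""
    return seq.lower().translate(_COMPLEMENT)[::-1]


def extension(primer):
    """Return the leading lower-case run of a primer (the PIPE extension)."""
    return "".join(takewhile(str.islower, primer))


# Standard genetic code, with codons enumerated in t, c, a, g order.
_BASES = "tcag"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(seq):
    """Translate a DNA sequence in frame 1, ignoring any trailing partial codon."""
    seq = seq.lower()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Each insert extension should pair with the start of the opposite vector primer.
fwd_ok = reverse_complement(extension(I_PIPE_FWD)) == V_PIPE_REV[:15]
rev_ok = reverse_complement(extension(I_PIPE_REV)) == V_PIPE_FWD[:15]

# The V-PIPE reverse primer anneals to the end of the tag-coding region.
tag_end = translate(reverse_complement(V_PIPE_REV))

print(fwd_ok, rev_ok, tag_end, TAG.endswith(tag_end))
```

The translated fragment ends in ENLYFQG, the TEV recognition sequence, which is consistent with a single glycine remaining on the protein after tag cleavage.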
The soluble fraction was passed over nickel-chelating resin (GE Healthcare) pre-equilibrated with lysis buffer, the resin was washed with wash buffer [50 mM HEPES pH 8.0, 300 mM NaCl, 40 mM imidazole, 10%(v/v) glycerol, 1 mM TCEP] and the protein was eluted with elution buffer [...]

Table 1
Crystallographic data and refinement statistics for KPN03535 (PDB code 3f1z).
Values in parentheses are for the highest resolution shell.
† $R_{\mathrm{merge}} = \sum_{hkl}\sum_{i} |I_i(hkl) - \langle I(hkl)\rangle| / \sum_{hkl}\sum_{i} I_i(hkl)$. ‡ Typically, the number of unique reflections used in refinement is slightly less than the total number that were integrated and scaled. Reflections are excluded owing to systematic absences, negative intensities and rounding errors in the resolution limits and unit-cell parameters. § $R_{\mathrm{cryst}} = \sum_{hkl} \bigl| |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \bigr| / \sum_{hkl} |F_{\mathrm{obs}}|$, where $F_{\mathrm{calc}}$ and $F_{\mathrm{obs}}$ are the calculated and observed structure-factor amplitudes, respectively. $R_{\mathrm{free}}$ is as for $R_{\mathrm{cryst}}$, but for 5.1% of the total reflections chosen at random and omitted from refinement. ¶ This value represents the total B that includes TLS and residual B components. †† ESU, estimated overall coordinate error (Collaborative Computational Project, Number 4, 1994; Tickle et al., 1998).
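The crystallographic residual defined in the Table 1 footnotes is a ratio of sums over reflections. The following minimal Python sketch shows that arithmetic for R cryst on a handful of made-up amplitudes; the function name and the numbers are purely illustrative and are not data from this structure. R free uses the same expression, evaluated only over the test-set reflections that were omitted from refinement.

```python
def r_factor(f_obs, f_calc):
    """Return sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over matched reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must list the same reflections")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for four reflections (not from PDB entry 3f1z).
toy_f_obs = [100.0, 50.0, 25.0, 25.0]
toy_f_calc = [90.0, 55.0, 30.0, 20.0]

r = r_factor(toy_f_obs, toy_f_calc)
print(f"R = {r:.3f}")
```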

Figure 1
Crystal structure of KPN03535. (a) Stereo ribbon representation of the KPN03535 monomer color-coded from the N-terminus (yellow) to the C-terminus (green). [...]

[...] the same buffer. The flowthrough and wash fractions were combined and concentrated for crystallization trials to 16.1 mg ml⁻¹ by centrifugal ultrafiltration (Millipore). KPN03535 was crystallized by mixing 100 nl protein solution with 100 nl crystallization solution in a sitting drop over a 50 µl reservoir volume using the nanodroplet vapor-diffusion method (Santarsiero et al., 2002) with standard Joint Center for Structural Genomics (JCSG; http://www.jcsg.org) crystallization protocols (Lesley et al., 2002). The crystallization reagent contained 31% polyethylene glycol 600 and 0.1 M CHES pH 9.5. No further cryoprotectant was added to the crystal. A cube-shaped crystal with approximate dimensions 80 × 80 × 80 µm was harvested after 42 d at 293 K for data collection. Initial screening for diffraction was carried out using the Stanford Automated Mounting system (SAM) at the Stanford Synchrotron Radiation Lightsource (SSRL; Menlo Park, California, USA). The diffraction data were indexed in the orthorhombic space group P2₁2₁2₁. The molecular weight and oligomeric state were determined using a 1 × 30 cm Superdex 200 column (GE Healthcare) in combination with static light scattering (Wyatt Technology). The mobile phase consisted of 20 mM Tris pH 8.0, 150 mM NaCl and 0.02%(w/v) sodium azide.

X-ray data collection and structure determination
Single-wavelength anomalous diffraction (SAD) data were collected to 2.46 Å resolution on beamline 9-2 at SSRL at the wavelength corresponding to the peak (λ1) of a selenium absorption edge using the Blu-Ice data-collection environment (McPhillips et al., 2002). A data set was collected at 100 K using a MAR Mosaic 325 CCD detector (Rayonix USA). The SAD data were integrated and reduced using MOSFLM (Leslie, 1992) and scaled with the program SCALA (Collaborative Computational Project, Number 4, 1994). Phasing was performed with SHELXD (Sheldrick, 2008) and autoSHARP (Vonrhein et al., 2007) [20 selenium sites per asymmetric unit, overall FOM (acentric/centric) = 0.34/0.12, overall phasing power (anomalous differences) = 1.2] and automated iterative model building was performed with RESOLVE (Terwilliger, 2003). Model completion and crystallographic refinement were performed with Coot (Emsley & Cowtan, 2004) and REFMAC5 (Collaborative Computational Project, Number 4, 1994) with TLS (one group per monomer) refinement (Winn et al., 2003) and medium NCS restraints for all chains. Data and refinement statistics are summarized in Table 1.
The quality of the crystal structure was analyzed using the JCSG Quality Control server, which verifies the stereochemical quality of the model using AutoDepInputTool (Yang et al., 2004), MolProbity (Davis et al., 2004) and WHATIF 5.0 (Vriend, 1990), the agreement between the atomic model and the data using SFCHECK 4.0 (Vaguine et al., 1999) and RESOLVE (Terwilliger, 2000), the protein sequence using ClustalW (Thompson et al., 1994), atom occupancies using MOLEMAN2 (Kleywegt, 2000) [...] pairs. This analysis also evaluates the difference in $R_{\mathrm{cryst}}/R_{\mathrm{free}}$, the expected $R_{\mathrm{free}}/R_{\mathrm{cryst}}$ and the maximum/minimum B values by parsing the refinement log file and PDB header. Protein quaternary structure analysis was performed using the PISA server (Krissinel & Henrick, 2005). Fig. 1(b) was adapted from an analysis using PDBsum (Laskowski et al., 2005) and all other figures were prepared with PyMOL (DeLano, 2002). Atomic coordinates and experimental structure factors for KPN03535 have been deposited in the PDB under accession code 3f1z. A systematic search for other proteins of similar structure was conducted using several different methods, including the DALI server (Holm et al., 2008), the protein structure comparison service SSM at the European Bioinformatics Institute (http://www.ebi.ac.uk/msd-srv/ssm; Krissinel & Henrick, 2004) and the flexible structure-alignment method FATCAT (Ye & Godzik, 2003).

Overall structure
Residues 1-22 of the full-length protein (1-154) were initially predicted to represent a signal peptide and were removed during cloning. The crystallized protein comprises a glycine left after cleavage of the expression and purification tag followed by KPN03535 residues 23-154. The final model contains ten monomers (chains A-J), two PEG molecules (PEG 600 fragments from the crystallization solution) and 323 water molecules in the asymmetric unit. The ten monomers are almost identical in structure and completeness and superimpose extremely well, with pairwise r.m.s. [...]. The Matthews coefficient (Matthews, 1968) is ~3.2 Å³ Da⁻¹, with an estimated solvent content of ~62%. The Ramachandran plot produced by MolProbity (Davis et al., 2004) shows that 98.5% and 100% of amino acids are in the favored and allowed regions, respectively.
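The solvent content quoted above follows from the Matthews coefficient (Matthews, 1968) through the usual approximation, solvent fraction ≈ 1 − 1.23/V_M, which assumes a typical protein partial specific volume of about 0.74 cm³ g⁻¹. A minimal sketch of that arithmetic is given below. The function names are illustrative, and because the unit-cell volume is not given in this section, the value of 3.2 Å³ Da⁻¹ is taken directly from the text; the cell volume, copy number and molecular weight used in the second call are hypothetical round numbers chosen only to exercise the formula.

```python
def matthews_coefficient(cell_volume_A3, molecules_per_cell, mol_weight_Da):
    """V_M in cubic angstroms per dalton: cell volume / (copies x molecular weight)."""
    return cell_volume_A3 / (molecules_per_cell * mol_weight_Da)


def solvent_fraction(v_m, protein_factor=1.23):
    """Approximate solvent fraction from V_M, assuming ~0.74 cm^3/g for protein."""
    if v_m <= 0:
        raise ValueError("V_M must be positive")
    return 1.0 - protein_factor / v_m


# V_M as reported in the text for this crystal form.
frac = solvent_fraction(3.2)
print(f"solvent content ~{100 * frac:.0f}%")

# Hypothetical inputs (not the KPN03535 unit cell) that also give V_M = 3.2.
toy_v_m = matthews_coefficient(320000.0, 4, 25000.0)
print(f"toy V_M = {toy_v_m:.1f}")
```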
Residues 70-154 of the monomer form the OB-fold, which is composed of a five-stranded β-sheet (β1, β2, β3, β4 and β5) capped by a short α-helix (α) based on the standard OB-fold nomenclature (Murzin, 1993; Fig. 1a). The capping helix is shorter than those observed in most other OB-fold proteins (Fig. 2). Residues 36-69 constitute three additional α-helices (α−2, α−1 and α0) which are not observed in other structures of the same fold. The curved β-sheet forming the β-barrel core of the OB-fold is highly conserved in size and structure, while the largest variations are seen in the three loops (L12, L23 and L45) that extend in different directions from the core and are often functionally important.
Crystal-packing and assembly analysis using PISA (Krissinel & Henrick, 2005), supported by analytical size-exclusion chromatography and static light scattering, suggests that a monomer is the likely oligomeric state. In the crystal structure, the protein assembles as two stacked pentameric rings, formed by loose interdigitation of the 'finger-like' β1-L12-β2 structure, with outer and inner diameters of ~80 Å and ~40 Å, respectively, and a thickness of ~40 Å. The buried surface area of each monomer within each pentamer (~540 Å²) and of each monomer in the interface between the two pentamers (~600 Å²) is low. The quaternary structure analysis does not indicate interactions that are sufficiently strong and extensive to enable complex formation in solution, suggesting that these pentamers could be a crystallization artifact. The N-terminus of each monomer extends into the solvent and probably does not have an impact on the oligomerization state. In the absence of any biochemical data, the functional oligomeric state of the protein remains unknown.

Functional hypotheses
3.2.1. NlpE-like. The only other reported bacterial lipoprotein containing an OB-fold is the C-terminal domain of E. coli NlpE (new lipoprotein E), an outer membrane lipoprotein in Gram-negative bacteria that is involved in the envelope stress response in the Cpx pathway. It activates the Cpx two-component signal-transduction pathway, which is composed of the inner membrane histidine kinase CpxA and the cytoplasmic response regulator CpxR (Raivio & Silhavy, 1997). The Cpx pathway controls the production of the periplasmic protease DegP and other proteins involved in fighting cellular stress (Danese et al., 1995; Raivio et al., 1999). Other proteins are also implicated in the regulation of the Cpx pathway. For example, CpxP with an LTXXQ motif (Pfam PF07813) is involved in feedback inhibition of the Cpx pathway. In K. pneumoniae, a periplasmic CpxP-like protein with the LTXXQ motif, KPN03534, is encoded by the gene neighboring KPN03535. Therefore, KPN03535, like KPN03534, may play a role in the Cpx pathway, similar to NlpE. KPN03535 superimposes fairly well on E. coli NlpE (PDB code 2z4i; Hirano et al., 2007; r.m.s.d. = 3.3 Å, 16% sequence identity, Z score 2.3; Fig. 2a). Despite the extremely low sequence identity, structure-based sequence alignment shows that some residues are conserved in KPN03535 (Arg76, Asp100, Thr105, Lys107, Arg108 and Asn117). However, the functional roles of these residues in NlpE are not known.

Figure 3
Surface-exposed charged and aromatic residues on KPN03535 that may be functionally important if this protein binds DNA or RNA (for clarity, the view of the monomer shown here is different from that shown in Fig. 4 and was obtained by a 180° rotation around a horizontal axis followed by a 180° rotation around a vertical axis). Arg83, Arg84 and Lys85 comprise the positive surface region described in Fig. 4.

[...] Fig. 2c).
Neither NlpE nor the toxins have all three of the N-terminal helices (α0, α−1, α−2) found in KPN03535, but α−2 is observed in cholera toxin (3efx) and α−1 is observed in BOF protein (1nnx). The capping helix in KPN03535 is shorter than in the toxins and NlpE, although it is similar to that observed in BOF protein. The β-strands forming the curved β-barrel in all these structures are of similar length, but with differences in the sizes of the loops that connect the β-strands.
3.2.3. Single-stranded DNA-binding protein, SSB-like. Single-stranded DNA-binding proteins (SSBs) also possess OB-folds and are involved in a multitude of cellular functions, such as DNA replication, transcription, recombination, repair, translation, cold-shock response and maintenance of telomeres (Theobald et al., 2003; Chase & Williams, 1986; Wold, 1997; Meyer & Laine, 1990; Lohman & Ferrari, 1994; Lohman et al., 1996). KPN03535 is structurally similar to OB-fold SSBs, including E. coli SSB (PDB code 1eyg; Raghunathan et al., 2000; r.m.s.d. 2.7 Å; 13% sequence identity; Z score 7.0; Fig. 2d), E. coli PriB (PDB code 1v1q; Liu et al., 2004; r.m.s.d. 2.3 Å; 13% sequence identity; Z score 8.0; Fig. 2e), Thermus thermophilus aspartyl-tRNA synthetase (PDB code 1l0w; Ng et al., 2002; r.m.s.d. 2.6 Å; 11% sequence identity; Z score 9.0; Fig. 2f) and human mitochondrial SSB (PDB code 3ull; Yang et al., 1997; r.m.s.d. 2.7 Å; 8% sequence identity; Z score 7.1). The N-terminal α−1 and α0 secondary-structure elements in KPN03535 are partially conserved in aspartyl-tRNA synthetase, but not in the other structures.

Figure 4
Comparison of the electrostatic surface potentials of monomers of (a) NlpE, (b) shiga toxin, (c) BOF, (d) E. coli SSB, (e) E. coli PriB, (f) T. thermophilus aspartyl-tRNA synthetase and (g) KPN03535. All the figures are in approximately the same orientation and reflect the surface view that would be presented for oligonucleotide binding, as in tRNA synthetase. The figure reveals that the positively charged surface patch (central blue portion in black circles) on KPN03535 most closely resembles that of E. coli PriB and is also similar to that seen in aspartyl-tRNA synthetase. In KPN03535, this positively charged patch is formed by Arg83, Arg84 and Lys85. The corresponding conserved residues are Arg17 and Lys18 in PriB and Arg29 in aspartyl-tRNA synthetase.
Many of the loops in OB-fold ssDNA-binding proteins are involved either in interactions with DNA or in quaternary interactions that result in the various oligomeric forms. For example, loop L45, which makes the most interactions with DNA in PriB (Huang et al., 2006) and aspartyl-tRNA synthetase, is similar to that of KPN03535, but is much longer in the E. coli and human mitochondrial SSBs. Among the surface-exposed Arg, Lys and aromatic residues that could be functionally relevant if KPN03535 were to bind DNA or RNA (Fig. 3), Arg84 and Lys85 of KPN03535 are conserved and correspond to Arg17 and Lys18 in PriB, where Lys18 is involved in ssDNA binding (Huang et al., 2006). Arg83 and Arg99 of KPN03535 are conserved in aspartyl-tRNA synthetase as Arg29 (equivalent to Arg28 in the E. coli aspartyl-tRNA synthetase that binds to tRNA; Eiler et al., 1999) and Arg39. Multiple structural alignment of various OB-fold proteins using the POSA method (Ye & Godzik, 2005) suggests that KPN03535 has a closer relationship to tRNA synthetases than to the BOF protein and is most distant from OB-fold toxins.
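The comparison statistics quoted in this section are easier to weigh side by side. The sketch below simply collects the r.m.s.d., sequence identity and Z score values given in the text for the NlpE and SSB-like matches and orders them by Z score. The `Hit` class is an illustrative container, not output from any of the comparison servers; the values assigned to human mitochondrial SSB are those that complete its entry in the text, and the BOF and toxin matches are omitted because their statistics do not appear in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    name: str
    pdb: str
    rmsd: float        # r.m.s.d. in angstroms
    identity: int      # sequence identity in percent
    z_score: float


# Values transcribed from the text of this section.
HITS = [
    Hit("E. coli NlpE", "2z4i", 3.3, 16, 2.3),
    Hit("E. coli SSB", "1eyg", 2.7, 13, 7.0),
    Hit("E. coli PriB", "1v1q", 2.3, 13, 8.0),
    Hit("T. thermophilus aspartyl-tRNA synthetase", "1l0w", 2.6, 11, 9.0),
    Hit("Human mitochondrial SSB", "3ull", 2.7, 8, 7.1),
]

# Order the matches from highest to lowest Z score.
ranked = sorted(HITS, key=lambda h: h.z_score, reverse=True)

for h in ranked:
    print(f"{h.pdb}  Z={h.z_score:4.1f}  rmsd={h.rmsd:.1f} A  id={h.identity:2d}%  {h.name}")
```

On these numbers the aspartyl-tRNA synthetase and PriB domains rank highest and NlpE ranks lowest, even though NlpE has the highest sequence identity of the five.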
Analysis of the electrostatic surface potential indicates that KPN03535 most closely resembles PriB and aspartyl-tRNA synthetase (Fig. 4), with a prominent positively charged area similar to the DNA-binding region of these two proteins. Interestingly, this patch differs from that observed in E. coli SSB, which reflects the known differences in the ssDNA-binding modes of SSB and PriB. The basic nature of KPN03535 (pI 9.4) also hints at the possibility of oligonucleotide binding.
In conclusion, the crystal structure of KPN03535 reveals a novel divergent member of the prevalent OB-fold and suggests that it is most likely to be a nucleic acid-binding protein. As for the recently solved structure of MPN554 from Mycoplasma pneumoniae (Das et al., 2007), another novel OB-fold with unknown cellular function but with single-stranded DNA-binding properties, the structure of KPN03535 reveals that further exploration of the functionality of the OB-fold is necessary. Bacterial lipoproteins have many important functions and are potential vaccine candidates (Steere et al., 1998;Myers et al., 2007). K. pneumoniae is an opportunistic pathogen that is prevalent in immunocompromised patients in hospitals and in patients with liver disease (Hidron et al., 2008;Pope et al., 2008). Functional inferences that can be drawn from this crystal structure should now allow focused structure-assisted biochemistry to establish the exact molecular and cellular role for this protein.