The structure of the first representative of Pfam family PF06475 reveals a new fold with possible involvement in glycolipid metabolism

PA1994, a Pfam PF06475 (DUF1089) family homolog from P. aeruginosa, reveals remote similarities to lipoprotein localization factors and a conserved putative glycolipid-binding site.

The crystal structure of PA1994 from Pseudomonas aeruginosa, a member of the Pfam PF06475 family classified as a domain of unknown function (DUF1089), reveals a novel fold comprising a 15-stranded β-sheet wrapped around a single α-helix that assembles into a tight dimeric arrangement. The remote structural similarity to lipoprotein localization factors, together with the presence of an acidic pocket that is conserved in DUF1089 homologs, phospholipid-binding proteins and sugar-binding proteins, indicates a role for PA1994 and the DUF1089 family in glycolipid metabolism. Genome-context analysis lends further support to the involvement of this family of proteins in glycolipid metabolism and indicates possible activation of DUF1089 homologs under conditions of bacterial cell-wall stress or host-pathogen interactions.

Introduction
In an effort to extend the structural coverage of proteins for which the biological function is unknown and cannot be deduced by homology (i.e. domains of unknown function; DUFs), targets were selected from Pfam protein family PF06475 (DUF1089). DUF1089 homologs are present in pathogenic actinobacteria, burkholderia, firmicutes and lactobacilli. Here, we report the crystal structure of PA1994, the first structural representative of this family, which was determined using the semi-automated high-throughput pipeline of the Joint Center for Structural Genomics (JCSG; http://www.jcsg.org; Lesley et al., 2002) as part of the NIGMS Protein Structure Initiative (PSI; http://www.nigms.nih.gov/Initiatives/PSI/). The PA1994 gene of Pseudomonas aeruginosa, an opportunistic human pathogen (Gomez & Prince, 2007), encodes a protein with a molecular weight of 21.6 kDa (residues 1-187) and a calculated isoelectric point of 4.9.
We show that global and local structural and chemical similarities to lipid-binding proteins suggest the involvement of PA1994 with the bacterial membrane, while genome-context analysis supports a role for the DUF1089 family in glycolipid metabolism that is likely to be triggered under conditions of osmotic stress or host-pathogen interactions. These structural insights should help to guide future functional studies.

Protein production and crystallization
Clones were generated using the Polymerase Incomplete Primer Extension (PIPE) cloning method (Klock et al., 2008). The gene encoding PA1994 (GenBank NP_250684; gi:15597190; Swiss-Prot Q912B5) was amplified by polymerase chain reaction (PCR) from P. aeruginosa PA01-LAC genomic DNA using PfuTurbo DNA polymerase (Stratagene) and I-PIPE (Insert) primers (forward primer, 5′-ctgtacttccagggcATGAGTCGCGACCGTCTGTACACCTGGG-3′; reverse primer, 5′-aattaagtcgcgttaGAGACGCTGGAAGAGACCCGGGTAATCG-3′; target sequence in upper case) that included sequences for the predicted 5′ and 3′ ends. The expression vector pSpeedET, which encodes an amino-terminal tobacco etch virus (TEV) protease-cleavable expression and purification tag (MGSDKIHHHHHHENLYFQ/G), was PCR-amplified with V-PIPE (Vector) primers (forward primer, 5′-taacgcgacttaattaactcgtttaaacggtctccagc-3′; reverse primer, 5′-gccctggaagtacaggttttcgtgatgatgatgatgatg-3′). V-PIPE and I-PIPE PCR products were mixed to anneal the amplified DNA fragments. Escherichia coli GeneHogs (Invitrogen) competent cells were transformed with the V-PIPE/I-PIPE mixture and dispensed onto selective LB-agar plates. The cloning junctions were confirmed by DNA sequencing. Expression was performed in selenomethionine-containing medium with suppression of normal methionine synthesis. At the end of fermentation, lysozyme was added to the culture to a final concentration of 250 µg ml⁻¹ and the cells were harvested and frozen. After one freeze-thaw cycle, the cells were sonicated in lysis buffer [50 mM HEPES pH 8.0, 50 mM NaCl, 10 mM imidazole, 1 mM tris(2-carboxyethyl)phosphine-HCl (TCEP)] and the lysate was clarified by centrifugation at 32 500g for 30 min.
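The annealing step in PIPE cloning relies on the lower-case 5′ extension of each I-PIPE primer being complementary to the 5′ end of the opposite V-PIPE primer. The following Python sketch checks this for the four primers quoted above. It assumes that the hyphens inside the printed I-PIPE sequences are line-break artifacts and can be removed; the helper names (`reverse_complement`, `vector_overlap`, `anneals_to`) are illustrative only and are not part of any JCSG software.

```python
# Primer sequences as quoted in the text (lower case = vector overlap,
# upper case = target sequence). Hyphens in the printed sequences are
# assumed to be line-break artifacts and have been removed.
I_PIPE_FWD = "ctgtacttccagggcATGAGTCGCGACCGTCTGTACACCTGGG"
I_PIPE_REV = "aattaagtcgcgttaGAGACGCTGGAAGAGACCCGGGTAATCG"
V_PIPE_FWD = "taacgcgacttaattaactcgtttaaacggtctccagc"
V_PIPE_REV = "gccctggaagtacaggttttcgtgatgatgatgatgatg"

_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def vector_overlap(insert_primer: str) -> str:
    """Return the leading lower-case run of an I-PIPE primer."""
    n = 0
    while n < len(insert_primer) and insert_primer[n].islower():
        n += 1
    return insert_primer[:n]


def anneals_to(insert_primer: str, vector_primer: str) -> bool:
    """True if the insert primer's lower-case extension is the reverse
    complement of the 5' end of the given vector primer."""
    overlap = vector_overlap(insert_primer)
    return bool(overlap) and vector_primer.startswith(reverse_complement(overlap))


# The forward insert primer should pair with the reverse vector primer,
# and the reverse insert primer with the forward vector primer.
fwd_ok = anneals_to(I_PIPE_FWD, V_PIPE_REV)
rev_ok = anneals_to(I_PIPE_REV, V_PIPE_FWD)
print(fwd_ok, rev_ok)
```

For the sequences as printed, both 15-nucleotide extensions match the corresponding vector primer ends, which is consistent with the insert being placed directly after the TEV cleavage site of the tag.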
The soluble fraction was passed over nickel-chelating resin (GE Healthcare) pre-equilibrated with lysis buffer, the resin was washed with wash buffer [50 mM HEPES pH 8.0, 300 mM NaCl, 40 mM imidazole, 10%(v/v) glycerol, 1 mM TCEP] and the protein was eluted with elution buffer [20 mM HEPES pH 8.0, 300 mM imidazole, 10%(v/v) glycerol, 1 mM TCEP]. The eluate was buffer-exchanged with HEPES crystallization buffer (20 mM HEPES pH 8.0, 200 mM NaCl, 40 mM imidazole, 1 mM TCEP) using a PD-10 column (GE Healthcare) and incubated with 1 mg TEV protease per 15 mg eluted protein. The protease-treated eluate was passed over nickel-chelating resin (GE Healthcare) pre-equilibrated with HEPES crystallization buffer and the resin was washed with the same buffer. The flowthrough and wash fractions were combined and concentrated to 11.2 mg ml⁻¹ by centrifugal ultrafiltration (Millipore) for crystallization trials. PA1994 was crystallized using the nanodroplet vapor-diffusion method (Santarsiero et al., 2002) with standard JCSG crystallization protocols (Lesley et al., 2002). Sitting drops composed of 200 nl protein solution mixed with 200 nl crystallization solution were equilibrated against a 50 µl reservoir at 277 K for 40 d prior to harvesting. Initial screening for diffraction was carried out using the Stanford Automated Mounting system (SAM; http://smb.slac.stanford.edu/facilities/hardware/SAM/UserInfo) at the Stanford Synchrotron Radiation Lightsource (SSRL; Menlo Park, California, USA). The crystallization reagent that produced the PA1994 crystal used for the structure solution contained 5%(v/v) 2-methyl-2,4-pentanediol (MPD; racemic mixture), 10%(w/v) PEG 6000 and 0.1 M HEPES pH 7.5. Ethylene glycol was added to the crystal as a cryoprotectant to a final concentration of 15%(v/v). A rod-shaped crystal with approximate dimensions of 200 × 20 × 20 µm was mounted in a nylon loop. The diffraction data were indexed in the monoclinic space group C2 (Table 1).
The molecular weight and oligomeric state of PA1994 were determined using a 0.8 × 30 cm Shodex Protein KW-803 column (Thomson Instruments) pre-calibrated with gel-filtration standards (Bio-Rad).

Data collection, structure solution and refinement
Multiple-wavelength anomalous diffraction (MAD) data were collected at SSRL on beamline BL11-1 at wavelengths corresponding to the inflection (λ1), peak (λ2) and high-energy remote (λ3) of a selenium MAD experiment. The data sets were collected at 100 K with an ADSC Q315 CCD detector using the Blu-Ice data-collection environment (McPhillips et al., 2002). The MAD data were integrated and reduced using XDS and then scaled with the program XSCALE (Kabsch, 1993). Phasing was performed with SHELX (Sheldrick, 2008) and AutoSHARP (Bricogne et al., 2003), which resulted in a mean figure of merit of 0.15 with four selenium positions. Two were high occupancy, corresponding to the main selenium positions at residues A143 and B143, whereas the others were low occupancy (20% relative to the primary site), corresponding to an alternate conformation of residue 143 in each monomer (<4.7 Å from the primary site). It should be noted that the presence of only one ordered SeMet site (two conformations) per 188 residues in the protein chain sufficed for successful phasing and model building. Automated model building was performed with ARP/wARP (Cohen et al., 2004) and model completion and refinement were performed with Coot (Emsley & Cowtan, 2004) and REFMAC 5.2 (Winn et al., 2003). Refinement included phase restraints from AutoSHARP and TLS refinement with two TLS groups per chain as suggested by the TLSMD server (Painter & Merritt, 2006). Data reduction and refinement statistics are summarized in Table 1.

Validation and deposition
Analysis of the stereochemical quality of the model was accomplished using AutoDepInputTool (Yang et al., 2004), MolProbity (Davis et al., 2004), SFCHECK 4.0 (Collaborative Computational Project, Number 4, 1994) and WHATIF 5.0 (Vriend, 1990). Protein quaternary-structure analysis was performed using the PISA server (Krissinel & Henrick, 2007). Fig. 1(c) was adapted from an analysis using PDBsum (Laskowski et al., 2005) and all other figures were prepared with PyMOL (DeLano Scientific). Atomic coordinates and experimental structure factors for PA1994 at 1.80 Å resolution have been deposited in the PDB under accession code 2h1t.

Table 1 Summary of crystal parameters, data collection and refinement statistics for PA1994 (PDB code 2h1t).
Values in parentheses are for the highest resolution shell.
Typically, a few reflections were also excluded owing to negative intensities and rounding errors in the resolution limits and unit-cell parameters.
§ $R_{\mathrm{cryst}} = \sum_{hkl} \bigl|\,|F_{\mathrm{obs}}| - |F_{\mathrm{calc}}|\,\bigr| \big/ \sum_{hkl} |F_{\mathrm{obs}}|$, where $F_{\mathrm{calc}}$ and $F_{\mathrm{obs}}$ are the calculated and observed structure-factor amplitudes, respectively.
¶ $R_{\mathrm{free}}$ is the same as $R_{\mathrm{cryst}}$ but for 5.0% of the total reflections chosen at random and omitted from refinement.
†† This represents the total B including both the TLS and residual B components.
‡‡ Estimated overall coordinate error (Collaborative Computational Project, Number 4, 1994; Cruickshank, 1999).
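The residual defined in the Table 1 footnote can be illustrated with a minimal Python sketch. The amplitudes below are toy numbers invented for the example and are not taken from the PA1994 data; the function name `r_factor` is likewise illustrative. R free would be obtained from the same expression evaluated only over the reflections set aside from refinement.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic residual: sum(||Fobs| - |Fcalc||) / sum(|Fobs|),
    taken over paired observed and calculated structure-factor amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


# Toy amplitudes (hypothetical, for illustration only).
toy_f_obs = [100.0, 50.0, 25.0, 25.0]
toy_f_calc = [90.0, 55.0, 30.0, 25.0]

# Differences are 10, 5, 5 and 0, so the residual is 20 / 200.
toy_r = r_factor(toy_f_obs, toy_f_calc)
print(f"R = {toy_r:.3f}")
```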

Overall structure
The crystal structure of PA1994 (Fig. 1a) was determined to 1.80 Å resolution using the multiple-wavelength anomalous dispersion (MAD) method. Refinement statistics are summarized in Table 1.

Figure 1 Crystal structure of PA1994 from P. aeruginosa. (a) Stereo ribbon diagram of the PA1994 monomer color coded from the N-terminus (blue) to the C-terminus (red). Helices [...]

[...] remaining after cleavage of the expression and purification tag, for the terminal selenomethionine (residue 1) of chains A and B or for Ser2 and Arg3 in chain B. The side chains of Arg5 and Glu91 in chain B were omitted owing to weak electron density. The Matthews coefficient (V_M; Matthews, 1968) was 2.5 Å³ Da⁻¹ and the estimated solvent content was 50.1%. A Ramachandran plot produced by MolProbity (Davis et al., 2004) showed that 99.2% of the residues are in favored regions. The two outliers, Pro106 in chains A and B, are actually found in a cis conformation in both chains and have clear electron density.
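The relationship between the Matthews coefficient and the solvent content quoted above can be sketched as follows. The sketch uses the commonly applied approximation that the solvent fraction is 1 - 1.23/V_M (Matthews, 1968), which assumes a typical protein partial specific volume; the unit-cell numbers in the second example are hypothetical, since the cell parameters of Table 1 are not reproduced in this text. Because the reported V_M is rounded to 2.5 Å³ Da⁻¹, the value obtained here (about 51%) only approximates the reported 50.1%, which was presumably calculated from unrounded quantities.

```python
def matthews_coefficient(cell_volume_a3, molecules_per_cell, mol_weight_da):
    """V_M in cubic angstroms per dalton: unit-cell volume divided by the
    total protein mass contained in the unit cell."""
    return cell_volume_a3 / (molecules_per_cell * mol_weight_da)


def solvent_fraction(v_m):
    """Approximate solvent fraction from V_M using 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / v_m


# Using the rounded V_M reported in the text.
approx_solvent = solvent_fraction(2.5)
print(f"approximate solvent content: {100 * approx_solvent:.1f}%")

# Hypothetical cell: 100 000 cubic angstroms holding 4 molecules of 10 kDa.
toy_v_m = matthews_coefficient(100000.0, 4, 10000.0)
print(f"toy V_M: {toy_v_m:.2f}")
```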
SCOP (release 1.75) classifies PA1994 as a single-domain protein with a novel fold termed a spiral β-roll (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.c.bdb.b.b.b.html), with a 15-stranded β-sheet wrapped around a central helix (Fig. 1). The N-terminal half of the sheet is formed by strands β3-β7 supplemented by a β1-strand exchange from the other monomer in the asymmetric unit (Fig. 1b) that hydrogen bonds extensively to the β3 and the shorter β15 strands (Figs. 1a and 1b). This swapping additionally involves strand β2 and results in a large buried dimerization interface of ~3000 Å² per monomer. A short β-strand (β8) and 3₁₀-helix H1 separate the first half of the β-sheet from the more tightly curved C-terminal region (strands β10-β15). Helix H2 and strand β9 are sandwiched between the two halves of the β-sheet in the center of the molecule.
PA1994 can be viewed as consisting of two subdomains: the first half of the β-sheet (β1′, β3-β8) and helix H1 (residues 1-98) compose the first subdomain, which packs against the other subdomain consisting of the second half of the β-sheet (β9-β15) and helix H2 (residues 99-187). Both subdomains are present in DUF1089-family members and a sequence analysis of the family indicates a high degree of conservation in the residues that are implicated in stabilizing both regions of the molecule. Stacking interactions, both intermolecular (Trp9-Pro108′) and intramolecular (Trp57-Phe113), show strict or high conservation. Additionally, conserved stacking interactions are observed in residue pairs involving the H2 helix and both the N-terminal (Trp57-Phe113) and the C-terminal (Pro114-Tyr147) halves of the β-sheet, as well as the conserved binding-pocket residues (Trp9-Pro108′, Trp57-Phe113 and Pro106/Pro108-Phe184; see below).
A search with FATCAT (Ye & Godzik, 2004) identified that the highest structural similarity is with outer membrane proteins (SH3-like barrel fold), NTF2-like proteins (cystatin-like fold) and fatty acid-binding proteins (lipocalin fold). DALI (Holm & Sander, 1995) showed significant hits with a number of different folds, including β-galactosidase (immunoglobulin-like β-sandwich fold), iron-transport proteins (transmembrane β-barrel fold), lipovitellin (lipovitellin-phosvitin complex/β-sheet shell regions fold), tail-associated lysozyme (phage-tail protein fold) and lipoprotein localization factors (prokaryotic lipoprotein localization factor fold). A search using secondary-structure matching (SSM; Krissinel & Henrick, 2004) identified the lipoprotein localization factor LolA (PDB code 1iwl) as the top hit (Z score 2.5, P score 0.0), although the P score indicates a statistically insignificant match. Although PA1994 appears to constitute a new fold, we decided to investigate subfold similarities in an attempt to identify shared structural features that could provide insight into the origin and function of PA1994. The highest structural similarity identified by visual inspection was with lipoprotein localization factors A and B (LolA and LolB) from E. coli, which are highly conserved bacterial proteins that are implicated in lipoprotein sorting and membrane localization (Takeda et al., 2003). Superimposition of PA1994 onto LolA, with an r.m.s.d. of 3.1 Å, reveals that these proteins share the same fold and topology over the 11 β-strands and the central helix, although the sequence identity over 104 aligned residues is not significant at only 5% (Fig. 2a). Differences within the barrel include PA1994 strands β9-β10, which are absent in both lipoprotein localization factors, strand β8 (absent in LolA) and the orientation of the central helix in LolB (Figs. 2a and 2b).
Outside the barrel, the main differences involve an additional N-terminal helix in LolA located at the bottom of the β-barrel and the LolA C-terminal 3₁₀-helix and β-strand (Figs. 2a and 2b). Both of these C-terminal structural elements, which are absent in PA1994, are involved in the specific membrane localization of lipoproteins by LolA (Okuda et al., 2008). No strand-swapping is observed in either LolA or LolB, although the N-terminal β-strand is present in both cases and overlaps with the swapped strand from the PA1994 dimer.
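For readers unfamiliar with the r.m.s.d. quoted for the PA1994-LolA superimposition, the quantity is the root-mean-square distance between paired atoms after superposition. The sketch below is a minimal illustration only: it assumes that the two coordinate sets are already superimposed and paired residue by residue (it does not perform the structural alignment itself, and it is not the procedure used by FATCAT, DALI or SSM), and the coordinates are toy values rather than atoms from either structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of
    (x, y, z) coordinates that are already superimposed and paired."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom in the second set is displaced by 3 angstroms
# along z, so the r.m.s.d. equals the displacement.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
toy_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0), (0.0, 1.0, 3.0)]
toy_rmsd = rmsd(toy_a, toy_b)
print(f"r.m.s.d. = {toy_rmsd:.1f}")
```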

Analysis of a conserved cavity
An analysis of PA1994 using the CastP server (Binkowski et al., 2003) revealed a deep pocket (15 × 6 × 7 Å) enclosed mainly by helix H2 and strand β7, with additional contributions made by strands β10-β12 and the loop between strands β14 and β15. This pocket is lined with conserved hydrophilic residues (Ser107, Thr110, Asn111, Thr112 and Gln145) and contains the hydroxyl group of the invariant Tyr147 in addition to an acidic pocket formed by two invariant aspartates (Asp101 and Asp103; Fig. 3). The pocket is in a similar location to the cavity in LolA that has been shown to bind lipids (Watanabe et al., 2006). However, the binding pocket is hydrophobic in LolA, whereas the PA1994 pocket is acidic, suggesting a more hydrophilic ligand. The entrance to the pocket in PA1994 forms a long and narrow groove (20 × 7 Å) composed of strictly or highly conserved hydrophobic residues (Ile102, Pro106, Pro108, Phe165, Leu170 and Ile178) and also involves the dimerization interface (Trp13), suggesting a hydrophobic component of the ligand and the likely requirement of dimerization for binding. Analytical size-exclusion chromatography in combination with static light scattering indicates that PA1994 is a dimer in solution. Two crystallization-reagent molecules (ethylene glycol and MPD) line both the groove and the pocket, indicating that both regions could be implicated in ligand binding (Fig. 3b). Both LolA and PA1994 contain a cis-proline (Pro89 in LolA and Pro106 in PA1994) at the N-terminal end of the central helix. Because of the relatively low energy barrier between trans and cis conformations, cis-prolines are often involved in function and have been implicated in both protein stabilization (Truckses et al., 1996) and catalysis (Charbonnier et al., 1999), suggesting that this residue might serve a similar purpose in LolA and PA1994.
Taken together, these structural and chemical similarities support a role for PA1994 and the DUF1089 family in glycolipid binding. The extensive dimerization interface observed in the structure, together with the SEC/SLS data, suggests that a dimer is likely to be the biologically relevant oligomeric state of PA1994. The swapped β-strands appear to participate in stabilizing the conserved cavity. Substrate binding might induce large-scale conformational changes, as is the case for the lipid-binding proteins that share structural similarities with PA1994 (Marland et al., 2006; Oguchi et al., 2008; Grochulski, Li et al., 1994).

Genome-context analysis
Glycophospholipids, which are implicated in the synthesis of complex cell-wall structures that enable some pathogens to modulate the response by the host immune system, have been suggested to bind to similar-sized acidic pockets as that observed in PA1994 (Marland et al., 2006). Glycolipids serve as key immunomodulatory molecules in host-pathogen interactions (Nigou et al., 2008) and lipases have been known to act as virulence factors (Smoot, 1997). In addition to their role in pathogenicity, bacterial cell-wall glycolipids are modified in response to variations in temperature, pH and other environmental stressors (Mykytczuk et al., 2007), with changes affecting both the lipid and sugar composition of the membrane (Bengoechea et al., 2002; Tymczyszyn et al., 2005).

Figure 3 An acidic pocket conserved in the DUF1089 family suggests a ligand-binding site. The PA1994 monomers, colored white and blue, are shown as a ribbon diagram and as a surface representation. Invariant residues (Asp101, Asp103 and Tyr147) are indicated, with the conserved Asn111 located behind the pocket labeled in parentheses. The ethylene glycol (EDO) and MPD molecules that line the entrance to the acidic pocket in the crystal are shown in green.

The genome context (http://string.embl.de) of DUF1089-family members additionally supports a role in glycolipid biosynthesis which is likely to be induced under conditions of cell-wall stress or host-pathogen interactions. PA1994 is predicted with a high degree of confidence to be in functional association with a peptidyl prolyl cis-trans isomerase (PA1996), an enzyme that functions as a chaperone and is up-regulated under conditions of cell-wall stress (Muthaiyan et al., 2008). The prolyl cis-trans isomerase could also assist in the folding of PA1994, as Pro106 appears to be involved in stabilization of both the hydrophobic core and the acidic pocket. Similarly, R02764, a DUF1089 homolog from Sinorhizobium meliloti, is predicted to be functionally linked to a glyceraldehyde 3-phosphate dehydrogenase [R02763, normally a cytosolic enzyme involved in energy metabolism that shows pH-dependent association with bacterial cell walls (Antikainen et al., 2007), where it becomes involved in host-pathogen interactions (Schaumburg et al., 2004)], a transketolase (R02762, an enzyme implicated in lipopolysaccharide metabolism; Eidels & Osborn, 1971) and a taurine-uptake ABC transporter (RB0965; taurine is a constituent of the bacterial cell wall that has been implicated in membrane stabilization and recovery from osmotic shock; Yancey, 2005). MT3862, a DUF1089 homolog from Mycobacterium tuberculosis, is also predicted with high confidence to be in functional association with two osmoprotectant proteins (MT3863 and MT3864) implicated in glycine betaine-dependent transport. In addition to its role in maintaining membrane fluidity, glycine betaine acts as a chemical chaperone (Diamant et al., 2001), stabilizing proteins under conditions of environmental stress.
Availability of more DUF1089-member sequences and structures might shed light on the evolutionary history of this intriguing protein family. The information presented here, in combination with further biochemical and biophysical studies, should yield valuable insights into the functional role of PA1994. Models of PA1994 homologs can be accessed at http://www1.jcsg.org/cgi-bin/models/get_mor.pl?key=2h1tA.

Conclusions
The first structural representative of the DUF1089 family reveals a novel fold. Remote global and local similarities to lipid-binding and glycan-binding proteins along with genome-context analysis support a role for PA1994 in glycolipid metabolism that is likely to be induced under conditions of cell-wall stress or host-pathogen interactions.
This work was supported by the National Institute of General Medical Sciences Protein Structure Initiative grant Nos. P50 GM62411 and U54 GM074898. Portions of this research were carried out at the Stanford Synchrotron Radiation Lightsource (SSRL). The SSRL is a national user facility operated by Stanford University on behalf of the US Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program and the National Institute of General Medical Sciences). Genomic DNA from P. aeruginosa PA01-LAC (ATCC No. 47085D) was obtained from the American Type Culture Collection (ATCC). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health.