Structure of the first representative of Pfam family PF04016 (DUF364) reveals enolase and Rossmann-like folds that combine to form a unique active site with a possible role in heavy-metal chelation

The crystal structure of the first representative of the DUF364 family reveals a combination of an N-terminal enolase-like fold and a C-terminal Rossmann-like fold. Analysis of the interdomain cleft, combined with sequence and genome-context conservation among homologs, suggests a unique catalytic site that is likely to be involved in the synthesis of a flavin or pterin derivative.

The crystal structure of Dhaf4260 from Desulfitobacterium hafniense DCB-2 was determined by single-wavelength anomalous diffraction (SAD) to a resolution of 2.01 Å using the semi-automated high-throughput pipeline of the Joint Center for Structural Genomics (JCSG) as part of the NIGMS Protein Structure Initiative (PSI). This protein structure is the first representative of the PF04016 (DUF364) Pfam family and reveals a novel combination of two well known domains (an enolase N-terminal-like fold followed by a Rossmann-like domain). Structural and bioinformatic analyses reveal partial similarities to Rossmann-like methyltransferases, with residues from the enolase-like fold combining to form a unique active site that is likely to be involved in the condensation or hydrolysis of molecules implicated in the synthesis of flavins, pterins or other siderophores. The genome context of Dhaf4260 and its homologs additionally supports a role in heavy-metal chelation.

Introduction
To extend the structural coverage of proteins for which the biological function is unknown and cannot be deduced by homology (i.e. domains of unknown function; DUFs), targets were selected from the Pfam (Finn et al., 2008) protein family PF04016 (DUF364). DUF364 homologs are encountered in proteobacteria, firmicutes, actinobacteria, cyanobacteria, thermotogae and a number of archaea. Here, we report the crystal structure of Dhaf4260, the first structural representative of this family, which was determined using the semiautomated high-throughput pipeline of the Joint Center for Structural Genomics (JCSG; http://www.jcsg.org; Lesley et al., 2002) as part of the NIGMS Protein Structure Initiative (PSI). The Dhaf4260 gene of Desulfitobacterium hafniense DCB-2 encodes a protein with a molecular weight of 27.7 kDa (residues 1-251) and a calculated isoelectric point of 5.6. Desulfitobacterium spp. are anaerobic bacteria that are capable of dehalogenating organic compounds and have been studied for their potential in bioremediation processes (Villemur et al., 2006; El Fantroussi et al., 1998).

Protein production and crystallization
Clones were generated using the Polymerase Incomplete Primer Extension (PIPE) cloning method (Klock et al., 2008). The gene encoding Dhaf4260 (UniProt B8FUJ5, see Supplementary Material 1) was amplified by polymerase chain reaction (PCR) from D. hafniense DCB-2 genomic DNA using PfuTurbo DNA polymerase (Stratagene) and I-PIPE (Insert) primers (forward primer, 5′-ctgtacttccagggcATGTGGGAGATCTATGACGCCATGATC-3′; reverse primer, 5′-aattaagtcgcgttaTTTTTTTATGGTCACCTTCTGTCCCGCG-3′; target sequence in upper case) that included sequences for the predicted 5′ and 3′ ends. The expression vector pSpeedET, which encodes an amino-terminal tobacco etch virus (TEV) protease-cleavable expression and purification tag (MGSDKIHHHHHHENLYFQ/G), was PCR-amplified with V-PIPE (Vector) primers (forward primer, 5′-taacgcgacttaattaactcgtttaaacggtctccagc-3′; reverse primer, 5′-gccctggaagtacaggttttcgtgatgatgatgatgatg-3′). V-PIPE and I-PIPE PCR products were mixed to anneal the amplified DNA fragments together. Escherichia coli GeneHogs (Invitrogen) competent cells were transformed with the V-PIPE/I-PIPE mixture and dispensed onto selective LB-agar plates. The cloning junctions were confirmed by DNA sequencing. Expression was performed in a selenomethionine-containing medium at 310 K with suppression of normal methionine synthesis. At the end of fermentation, lysozyme was added to the culture to a final concentration of 250 µg ml⁻¹, and the cells were harvested and frozen. After one freeze-thaw cycle, the cells were sonicated in lysis buffer [50 mM HEPES pH 8.0, 50 mM NaCl, 10 mM imidazole, 1 mM tris(2-carboxyethyl)phosphine-HCl (TCEP)] and the lysate was clarified by centrifugation at 32 500g for 30 min.
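In PIPE cloning, the lowercase 5′ tails of the I-PIPE primers are designed to be complementary to the 5′ ends of the V-PIPE primers so that the insert and vector products can anneal. As a minimal sketch, using plain string comparison with no thermodynamic modelling, the check below looks for the longest stretch at the primers' 5′ ends that is reverse-complementary. The function names are hypothetical; the primer sequences are copied from the text above.

```python
# Primer sequences as given in the text (lowercase = vector-overlap tail).
I_PIPE_FWD = "ctgtacttccagggcATGTGGGAGATCTATGACGCCATGATC"
I_PIPE_REV = "aattaagtcgcgttaTTTTTTTATGGTCACCTTCTGTCCCGCG"
V_PIPE_FWD = "taacgcgacttaattaactcgtttaaacggtctccagc"
V_PIPE_REV = "gccctggaagtacaggttttcgtgatgatgatgatgatg"

_COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (case-insensitive, returned in lowercase)."""
    return seq.lower().translate(_COMPLEMENT)[::-1]


def five_prime_overlap(insert_primer: str, vector_primer: str) -> int:
    """Length k of the longest 5' segment of insert_primer whose reverse complement
    equals the first k bases of vector_primer, i.e. the two 5' ends can pair antiparallel."""
    a, b = insert_primer.lower(), vector_primer.lower()
    for k in range(min(len(a), len(b)), 0, -1):
        if revcomp(a[:k]) == b[:k]:
            return k
    return 0


# Insert forward tail pairs with vector reverse; insert reverse tail pairs with vector forward.
print(five_prime_overlap(I_PIPE_FWD, V_PIPE_REV))
print(five_prime_overlap(I_PIPE_REV, V_PIPE_FWD))
```

Both pairs share a 15-nt complementary region, which matches the 15 lowercase bases in each I-PIPE primer.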
The soluble fraction was passed over nickel-chelating resin (GE Healthcare) pre-equilibrated with lysis buffer, the resin was washed with wash buffer [50 mM HEPES pH 8.0, 300 mM NaCl, 40 mM imidazole, 10%(v/v) glycerol, 1 mM TCEP] and the protein was eluted with elution buffer [20 mM HEPES pH 8.0, 300 mM imidazole, 10%(v/v) glycerol, 1 mM TCEP]. Since prior testing had revealed that the designed protease site in the expression and purification tag did not cleave with TEV protease, protease was not added to the protein preparation. The eluate was buffer-exchanged with crystallization buffer (20 mM HEPES pH 8.0, 200 mM NaCl, 40 mM imidazole, 1 mM TCEP) using a PD-10 column (GE Healthcare) and concentrated to 5 mg ml⁻¹ by centrifugal ultrafiltration (Millipore).
Dhaf4260 was crystallized at 277 K by mixing 200 nl protein solution with 200 nl crystallization solution and equilibrating against a 50 µl reservoir using the nanodroplet vapor-diffusion method (Santarsiero et al., 2002) with standard JCSG crystallization protocols (Lesley et al., 2002). The crystallization reagent consisted of 1.0 M LiCl and 0.1 M citrate pH 5.0. Ethylene glycol (1,2-ethanediol) was added to the crystal as a cryoprotectant to a final concentration of 20%(v/v). A diamond-shaped crystal of approximate dimensions 100 × 100 × 100 µm was harvested at room temperature after 46 d at 277 K and cryocooled in liquid nitrogen. Initial screening for diffraction was carried out using the Stanford Automated Mounting system (SAM; http://smb.slac.stanford.edu/facilities/hardware/SAM/UserInfo; Cohen et al., 2002) at the Stanford Synchrotron Radiation Lightsource (SSRL, Menlo Park, California, USA). The data were indexed in the hexagonal space group P6₁.
The oligomeric state of Dhaf4260 in solution was determined using a 1 × 30 cm Superdex 200 column (GE Healthcare) coupled with miniDAWN static light-scattering (SEC/SLS) and Optilab differential refractive-index detectors (Wyatt Technology). The mobile phase consisted of 20 mM Tris pH 8.0, 150 mM sodium chloride and 0.02%(w/v) sodium azide.
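SEC/SLS yields a molar mass for each eluting species, and the oligomeric state is then inferred by comparing that mass with multiples of the monomer mass. Below is a minimal sketch of that comparison. It assumes that the monomer mass passed in includes the 19-residue tag (the tagged mass is not given in this section), and the 10% tolerance and test masses are hypothetical. A mixture of states co-eluting in one peak gives a weight-averaged mass that need not match any integer multiple, so the function returns None in that case. The function name is illustrative.

```python
def assign_oligomer(measured_kda, monomer_kda, max_n=8, tolerance=0.10):
    """Return the integer n for which n * monomer_kda is closest to measured_kda,
    or None if the relative deviation exceeds the tolerance (e.g. a mixed peak)."""
    best_n = min(range(1, max_n + 1), key=lambda n: abs(measured_kda - n * monomer_kda))
    expected = best_n * monomer_kda
    deviation = abs(measured_kda - expected) / expected
    return best_n if deviation <= tolerance else None


# Hypothetical values, for illustration only.
print(assign_oligomer(105.0, 27.7))
```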

Data collection, structure solution and refinement
Single-wavelength anomalous diffraction (SAD) data were collected on beamline BL9-2 at the SSRL at a wavelength corresponding to the peak of a selenium SAD experiment. The data set was collected at 100 K with a MAR 325 CCD detector using the Blu-Ice data-collection environment (McPhillips et al., 2002). The SAD data were integrated and reduced using XDS and scaled and merged with the program XSCALE (Kabsch, 1993). Initial substructure solution was performed with SHELX (Sheldrick, 2008) and phases were refined with SOLVE (Terwilliger & Berendzen, 1999), with a mean figure of merit of 0.24 (0.37 to 2.9 Å) for ten selenium sites. Density modification and automated model building were performed with RESOLVE (Terwilliger, 2003) and produced a trace for 443 residues (82%) with 424 side chains built and sequence-assigned. Model completion and refinement were performed with Coot (Emsley & Cowtan, 2004) and REFMAC 5.2 (Winn et al., 2003). Refinement included experimental phase restraints in the form of Hendrickson-Lattman coefficients from SOLVE, loose NCS restraints (positional weights 5.0 and thermal weights 10.0) and TLS refinement with one TLS group per chain. Data-reduction and refinement statistics are summarized in Table 1.
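Refinement progress is monitored with the crystallographic R factor and with R_free, the same quantity evaluated on a randomly chosen subset of reflections (5% in Table 1) that is excluded from refinement. The sketch below computes both from lists of structure-factor amplitudes. It assumes that F_calc has already been scaled to F_obs, which refinement programs such as REFMAC handle internally. The function names and the toy amplitudes are hypothetical.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the supplied reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def choose_free_set(n_reflections, fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the test (free) set."""
    rng = random.Random(seed)
    n_free = round(fraction * n_reflections)
    return set(rng.sample(range(n_reflections), n_free))


def r_work_and_free(f_obs, f_calc, free_set):
    """R over the working reflections and over the held-out free reflections."""
    work = [i for i in range(len(f_obs)) if i not in free_set]
    free = sorted(free_set)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free
```

Because the free reflections never contribute to the fit, a large gap between R_free and R_cryst is the usual warning sign of over-fitting.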

Validation and deposition
Analysis of the stereochemical quality of the model was accomplished using AutoDepInputTool (Yang et al., 2004), MolProbity (Davis et al., 2007), SFCHECK v.4.0 (Collaborative Computational Project, Number 4, 1994) and WHAT IF v.5.0 (Vriend, 1990). Protein quaternary-structure analysis was performed using the PISA server (Krissinel & Henrick, 2007). Fig. 1(b) was adapted from an analysis using PDBsum (Laskowski et al., 2005) and all other figures were prepared with PyMOL (DeLano Scientific). Atomic coordinates and experimental structure factors for Dhaf4260 at 2.01 Å have been deposited in the PDB and are accessible under code 3l5o.
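The Ramachandran analysis used in validation classifies each residue by its backbone torsion angles φ (atoms C(i−1), N, Cα, C) and ψ (atoms N, Cα, C, N(i+1)). The sketch below computes such a dihedral angle from four atomic positions with the standard projection-and-atan2 formula, using the IUPAC sign convention. It is not MolProbity's code: MolProbity additionally compares the angles against empirical contours to decide which residues are favored, allowed or outliers. The helper names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _scale(a, s):
    return tuple(x * s for x in a)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees, -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone phi and psi of one residue from five backbone atom positions."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)
```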

Overall structure
Table 1. Summary of crystal parameters, data-collection and refinement statistics for Dhaf4260 (PDB code 3l5o).
Values in parentheses are for the highest resolution shell.
… (Diederichs & Karplus, 1997).
§ The number of unique reflections used in refinement is typically slightly less than the total number that were integrated and scaled. Reflections were excluded owing to systematic absences, negative intensities and rounding errors in the resolution limits and unit-cell parameters.
¶ $R_{\mathrm{cryst}} = \sum_{hkl} \big| |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \big| \,/\, \sum_{hkl} |F_{\mathrm{obs}}|$, where $F_{\mathrm{calc}}$ and $F_{\mathrm{obs}}$ are the calculated and observed structure-factor amplitudes, respectively.
†† $R_{\mathrm{free}}$ is the same as $R_{\mathrm{cryst}}$ but for 5.0% of the total reflections that were chosen at random and omitted from refinement.
‡‡ This value represents the total B that includes TLS and residual B components.
§§ Estimated overall coordinate error (Collaborative Computational Project, Number 4, 1994; Cruickshank, 1999).

The crystal structure of Dhaf4260 (Fig. 1a) was determined to 2.01 Å resolution using the single-wavelength anomalous diffraction (SAD) method. Data-collection, model and refinement statistics are summarized in Table 1. The final model includes two Dhaf4260 protomers [491 residues; molecule A contains residues 1-102 and 110-251 in addition to three residues from the N-terminal expression and purification tag (residues −2A to 0A), and molecule B contains residues 1-102 and 115-251 and five residues from the N-terminal expression and purification tag (residues −4B to 0B)], six ethylene glycol molecules, four imidazole molecules, two chloride ions and 239 water molecules in the asymmetric unit. The electron density was insufficient to model the loop connecting the N- and C-terminal domains (residues 103-109 in molecule A and residues 103-114 in molecule B) and the remainder of the N-terminal expression and purification tags (residues −18 to −3 in molecule A and −18 to −5 in molecule B). Side-chain atoms of Phe(−2), Phe44, Glu45, Thr46, Arg47, Gln53, Gln90, Asp101, Glu135, Leu137, Arg194, Lys223, Lys237 and Lys239 in chain A and Leu(−4), Tyr(−3), Gln(−1), Ser100, Asp101, SeMet115, Ser116, Gln117, Asn118, Lys121, Lys123, Lys137, Glu153, Lys237, Lys239 and Lys250 in chain B were omitted owing to weak electron density. The Matthews coefficient (V_M; Matthews, 1968) was 2.6 Å³ Da⁻¹ and the estimated solvent content was 53.2%. The Ramachandran plot produced by MolProbity (Davis et al., 2007) showed that 97.5% of the residues were in favored regions and 99.8% were in allowed regions. The single outlier, Gln117 from chain B, was located in a region of poor electron density.
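The Matthews coefficient relates the unit-cell volume to the protein mass it contains, V_M = V_cell / (Z · M), where Z is the number of molecules in the cell. The solvent fraction is commonly estimated as 1 − 1.230/V_M (Matthews, 1968). The sketch below implements both relations, plus the hexagonal cell volume (√3/2)a²c. For P6₁ (six asymmetric units) with two protomers per asymmetric unit, Z = 12. The unit-cell dimensions are not reproduced in this section, so the cell volume in the example is hypothetical, and the function names are illustrative.

```python
import math


def hexagonal_cell_volume(a, c):
    """Unit-cell volume (Å^3) for a hexagonal lattice with a = b and gamma = 120 degrees."""
    return math.sqrt(3.0) / 2.0 * a * a * c


def matthews_coefficient(cell_volume, mw_da, n_asu, n_per_asu):
    """V_M in Å^3/Da: cell volume divided by the total protein mass in the cell."""
    return cell_volume / (mw_da * n_asu * n_per_asu)


def solvent_fraction(vm, protein_factor=1.230):
    """Estimated solvent fraction, 1 - 1.230/V_M (Matthews, 1968)."""
    return 1.0 - protein_factor / vm


# With V_M = 2.6 this gives about 52.7%.
print(round(solvent_fraction(2.6) * 100, 1))
```

The small difference from the reported 53.2% presumably reflects rounding of V_M to one decimal place. A value of about 2.63 Å³ Da⁻¹ gives 53.2%.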
Dhaf4260 is a two-domain α+β protein (Fig. 1). SCOP describes the N-terminal domain (residues 1-102) as adopting an enolase N-terminal domain-like fold (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.e.bca.A.A.html) characterized by three helices (H1-H3) with up-down-up topology and a three-stranded antiparallel β-sheet (β1-β3). The C-terminal domain (residues 110-251) adopts a Rossmann-like fold that is described in SCOP as PLP-dependent transferase-like (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.jg.html). The typical NAD(P)-binding Rossmann fold is characterized by a three-layer α/β/α sandwich with a parallel β-sheet of strand order 321456, but it lacks the additional antiparallel strand observed in Dhaf4260 (strand order 3214567). The PLP-dependent transferase-like fold is characterized by a similar sandwich that contains a seven-stranded mixed β-sheet (β4-β10 in Dhaf4260), with the seventh strand (β10 in Dhaf4260) antiparallel to the rest. However, the sheet topology of the PLP-dependent transferase-like fold (strand order 3245671) only partially matches that of Dhaf4260. Furthermore, the lysine to which the cofactor is linked in the PLP-dependent transferase-like fold is absent in Dhaf4260-like proteins. A search of intact Dhaf4260 with FATCAT (Ye & Godzik, 2004) indicates that the strongest structural similarity is to precorrin-8w methyltransferases [PDB codes 1f38 (Keller et al., 2002) and 2yxd (B. Padmanabhan, Y. Bessho & S. Yokoyama, unpublished work)], with Cα r.m.s.d.s of 3.1 and 3.2 Å over 170 and 173 residues, respectively, and a sequence identity of 9%. These Rossmann-like methyltransferases are involved in the anaerobic pathway of cobalamin (vitamin B12) biosynthesis (Scott & Roessner, 2002; Keller et al., 2002).
The similarity maps to the C-terminal domain of Dhaf4260 and involves both fold and topology, with the exception of the last two strands, which are inverted (the strand order is 3214567 for Dhaf4260 and 3214576 for the precorrin methyltransferases). Other differences include an extra helix between precorrin methyltransferase strands 2 and 3 (equivalent to Dhaf4260 strands 5 and 6), the addition of Dhaf4260 helices H12 and H13, which are replaced by a long hairpin that is involved in tetramerization and ligand binding in the precorrin methyltransferases (Keller et al., 2002), and an additional Dhaf4260 helix H11 in the loop between strands 8 and 9 (strands 5 and 6 in the precorrin methyltransferases; Fig. 2a). In addition, a similar mode of tetramerization is not possible in Dhaf4260 because the corresponding interface is involved in interactions with the N-terminal domain.
Figure 2. Stereo ribbon diagram showing the structural superposition of (a) the C-terminal domain of Dhaf4260 (PDB code 3l5o; residues 110-251; salmon) and precorrin-8w methyltransferase from Methanobacterium thermoautotrophicum (MT0146; PDB code 1f38; residues 1-186; gold) and (b) the N-terminal domain of Dhaf4260 (PDB code 3l5o; residues 1-102; blue) and the enolase N-terminal domain from Saccharomyces cerevisiae (PDB code 4enl; residues 1-139; gray). The precorrin methyltransferase and enolase regions implicated in oligomerization and substrate binding are indicated.

The N-terminal domain of Dhaf4260 shows strong structural similarity to the N-terminal domain of enolase (PDB code 4enl; Lebioda et al., 1989), with a Cα r.m.s.d. of 1.9 Å over 60 residues, but a sequence identity of only 8%. Structural differences involve an extra N-terminal helix (H1) in Dhaf4260 and different orientations of helices H3 and H5 (Fig. 2b). A weak similarity of this domain to several RNA-binding proteins was also observed, including ribosomal protein L22 (PDB code 1bxe; Unge et al., 1998; Cα r.m.s.d. of 3.1 Å over 61 residues with 10% sequence identity) and a double-stranded RNA-specific editase (PDB code 1di2; Ryter & Schultz, 1998; Cα r.m.s.d. of 2.1 Å over 101 residues with 7% sequence identity). Although in both of these latter cases the β-sheet and the long central helix (H4) consistently superimpose well, along with one or two of the outer helices (H2, H3), the connectivity differs, which limits the scope for functional inference.
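The Cα r.m.s.d. values quoted in this section are computed after superposing matched Cα atoms. FATCAT itself performs a flexible alignment that can introduce twists between rigid segments, so the following is only a rigid-body sketch. It assumes that the residue correspondence is already known and uses the Kabsch SVD method with NumPy. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched N x 3 coordinate sets P and Q after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (no reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Because a mirror image cannot be reached by a proper rotation, the determinant check keeps the superposition physically meaningful for protein structures.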
In bacteria, the enolase N-terminal-like fold is found in a number of epimerases and racemases that catalyze stereochemical inversion in biological molecules. The enolase superfamily, which comprises mandelate racemase (MR), muconate-lactonizing enzyme (MLE) and enolases, is a group of functionally related enzymes, each of which is organized into two domains: a substrate-specificity-determining N-terminal capping domain followed by a TIM barrel that contains the metal-ion ligands and acid/base catalysts at the C-terminal ends of the β-strands (Gerlt & Babbitt, 2001). In enolase, the long β3-H1 loop that connects the third strand to the first helix closes over the active site upon substrate binding (Fig. 2b); the corresponding loop in Dhaf4260 (β3-H2) is much shorter. Many enolases are dimers, and the dimerization interface is conserved among prokaryotes and eukaryotes, where dimerization is proposed to promote subunit stability (Kühnel & Luisi, 2001). Some of the residues involved in dimerization are from the N-terminal domain: for example, residues from the first two strands of the enolase N-terminal domain β-sheet and residues preceding helix H1 interact with residues from the enolase C-terminal domain of the adjacent protomer. Such an oligomerization mode is not possible in Dhaf4260 because its C-terminal domain is a Rossmann-like fold rather than the TIM barrel found in enolases.
Size-exclusion chromatography of Dhaf4260 in combination with static light scattering indicates a mixture of oligomerization states, with a tetramer being the predominant quaternary form. However, crystal-packing analysis of the Dhaf4260 structure only supports a monomer or dimer and did not identify any higher order oligomeric state in this crystal form. This discrepancy between the oligomerization state in solution and in the crystal could arise if crystallization selects monomeric or dimeric species from the mixture present in solution, or if the crystallization conditions alter the distribution of states. The presence of the 19-residue N-terminal expression and purification tag might also alter the oligomerization state relative to the wild-type protein. Thus, these results are inconclusive as to the biologically relevant oligomeric state of this protein.

A unique catalytic site
A search of the N-terminal domain of Dhaf4260 against the Pfam database using the remote homology-detection server HHpred (Söding et al., 2005) produced weak hits to a ribosomal RNA methyltransferase family (PF07091; P-value 0.0023 over Dhaf4260 residues 3-26, probability 0.10) and to a family of RNA polymerase II-associated proteins (PF08620; P-value 0.0069 over Dhaf4260 residues 40-88, probability 0.07). The C-terminal domain showed significant homology to PF03446 (P-value 9.5 × 10⁻⁵ over Dhaf4260 residues 123-204, probability 0.91), PF02826 (P-value 7.2 × 10⁻⁵ over residues 119-200, probability 0.81) and PF00670 (P-value 3.3 × 10⁻⁵ over residues 120-200, probability 0.80). All three families contain NAD-binding domains; PF00670 corresponds to S-adenosyl-L-homocysteine (SAH) hydrolases, which are NAD-dependent enzymes of the activated methyl cycle. The residues conserved among all three families and DUF364 are Gly129, Glu148, Thr174 and Asp180 (Dhaf4260 numbering). Residues that additionally show high conservation among Dhaf4260 homologs include Gly37, Gly39, Arg42, Asn83, Thr133 and Thr182. When mapped onto the structure of Dhaf4260, these conserved residues cluster inside a deep pocket (~660 Å³) at the interface between the enolase-like and Rossmann-like domains (Fig. 3a), suggesting that this region serves as an active site and that DUF364 homologs function as enzymes.
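Identifying conserved residues starts from a multiple sequence alignment and a per-column conservation score, which is then mapped back to the residue numbering of a reference sequence. ConSurf uses phylogeny-aware Bayesian rate estimates. The sketch below substitutes a much simpler Shannon-entropy score, with gaps ignored and a hypothetical cut-off of 0.5 bits, to illustrate the column-to-residue mapping. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return math.inf  # an all-gap column is treated as not conserved
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conserved_positions(alignment, reference=0, max_entropy=0.5, gap="-"):
    """Residue numbers (1-based, in the reference sequence) of low-entropy columns."""
    ref = alignment[reference]
    positions, resnum = [], 0
    for i, aa in enumerate(ref):
        if aa == gap:
            continue  # columns where the reference has a gap have no residue number
        resnum += 1
        column = [seq[i] for seq in alignment]
        if shannon_entropy(column, gap) <= max_entropy:
            positions.append(resnum)
    return positions


# Toy alignment (hypothetical sequences).
print(conserved_positions(["MKGAG", "MRGAG", "MKGSG"]))
```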
An aspartate or glutamate residue that interacts with the hydroxyl groups of the ribose is the most highly conserved feature of adenosyl-binding sites (e.g. those binding ATP, NAD and S-adenosyl-L-methionine; Carugo & Argos, 1997). Asp62 could fulfill this role in MT0146 and is superimposable with Glu148, which is strictly conserved among Dhaf4260 homologs (Fig. 3b). Other similarities to Rossmann-like folds (Burroughs et al., 2006) include highly conserved polar residues (Thr174, Asp180 and Thr182) in the two orthogonal helices downstream of strand 4 of the Rossmann-like fold (equivalent to strand 7 in Dhaf4260), and a glycine followed by a hydrophobic or aromatic residue (Gly129 and His130) in the classical loop position between strand 1 and helix 1 of the Rossmann-like fold (equivalent to strand 4 and helix H7 in Dhaf4260) (Figs. 1 and 3b).

Figure 3. The interdomain pocket forms a unique catalytic site. (a) Surface representation of the Dhaf4260 domain interface colored by sequence conservation according to ConSurf (Landau et al., 2005). High conservation among DUF364 homologs is indicated in maroon and low conservation is indicated in turquoise. A docked S-adenosyl-L-homocysteine (SAH) molecule is shown in ball-and-stick representation. Docking was based on its superposition with MT0146 (PDB code 1l3i; Keller et al., 2002). (b) Ribbon representation of Dhaf4260 in the same orientation as in (a). Highly conserved Dhaf4260 residues are shown in ball-and-stick representation and are labeled.
The GGSGG loop that completes the precorrin-binding site in the precorrin methyltransferases, and is implicated in binding S-adenosyl-L-methionine (SAM) through an induced-fit mechanism, is absent from Dhaf4260, suggesting a different ligand and a different reaction mechanism. A GXG-type signature is instead observed in a different loop (Gly37, Gly39 and Arg42) bordering one side of the adenine base of the docked SAH, with Arg42 (Fig. 3b) possibly engaged in a hydrophobic packing interaction similar to that of Arg63 in MT0146. In addition, the Dhaf4260 pocket is both narrower and longer than that of Rossmann-like methyltransferases such as MT0146, suggesting that it catalyzes the modification of a longer substrate or the condensation of two molecules.
A chloride ion is present in this cleft in both molecules in the asymmetric unit and is coordinated by the backbone amide of Trp149 and by solvent. This chloride-binding site is close to the position of the adenosyl ring of the SAH bound in the MT0146 structure. Because the chloride makes only a single protein contact within the pocket, and given the high chloride concentration in the crystallization reagent (1.0 M LiCl), this interaction is unlikely to be functionally relevant.
The corrin ring (four pyrrole subunits) that comprises the core of vitamin B12 is chemically similar to the porphyrin found in hemes, but one of the bridging methylene groups is removed. Uroporphyrinogen III is an intermediate in the biosynthesis of vitamin B12 and also of heme, siroheme, chlorophylls and factor F430 (Scott & Roessner, 2002). Hence, all ligands predicted for Dhaf4260 share chemical and structural similarity with flavin or pterin derivatives.

Genome-context analysis
The genome context (http://string.embl.de) of DUF364 homologs indicates, with a high degree of confidence, a predicted functional association with a number of proteins involved in the transport and chelation of metals and metal oxyanions such as iron (WS1133), tungstate (MTH926), vanadium (RPA1384 and RPA1385) and molybdate (MTH924, Mbar_A1307 and amb0153), as well as with transcriptional regulators (e.g. TetR, LysR, TraR/DksA, CrcB, MerR and PadR) involved in the chemical stress response. Gene-neighborhood associations with ABC transporters (including both ATPase and membrane-spanning permease subunits) are found across a wide phylogenetic range of prokaryotic homologs. This suggests that DUF364 enzymes predominantly act on a soluble substrate, which is likely to be a heavy metal transported by these systems. In this context, DUF364 homologs could function in the condensation or hydrolysis of specific side chains in the synthesis of derivatives of flavins, pterins or similar compounds (e.g. siderophores) that might serve to chelate these metals.
The Dhaf4260 protein family DUF364 (PF04016) contains around 165 homologs that are mostly found in cyanobacteria, actinobacteria, thermotogae and proteobacteria, but are also found in firmicutes and a range of archaea; all of these proteins are approximately 230 residues in length. The availability of further DUF364 member sequences and structures might shed light on the evolutionary history of this intriguing protein family. The information presented here, in combination with further biochemical and biophysical studies, should yield valuable insights into the functional role of Dhaf4260. Models for Dhaf4260 homologs can be accessed at http://www1.jcsg.org/cgi-bin/models/get_mor.pl?key=3l5oA.

Conclusions
The first structural representative of the DUF364 family reveals a novel two-domain organization in which an enolase N-terminal-like fold combines with a C-terminal Rossmann-fold-like domain to form