Structural characterization of DynU16, a START/Bet v1-like protein involved in dynemicin biosynthesis

The crystal structure of DynU16, a protein identified in the dynemicin-biosynthetic gene cluster of Micromonospora chersina, was determined using iodide phasing and reveals a di-domain helix-grip fold.


Introduction
Dynemicin A (DYN; Fig. 1) is a bacterially derived metabolite possessing a ten-membered 1,5-diyn-3-ene core that achieves double-stranded DNA scission via radical-mediated hydrogen abstraction (Sluka et al., 1987; Elbaum et al., 1995). An anthraquinone (ANQ) moiety is appended to the core, with the quinone carbonyl O atoms serving as the activation sites for the reaction cascade, resulting in the formation of a phenyl biradical intermediate following core Bergman cyclization (Elbaum et al., 1995; Nicolaou et al., 1988; Sugiura et al., 1990). The DYN-biosynthetic gene cluster was first reported in 2008, illuminating the conserved core-assembling elements encoded in other enediyne pathways (Gao & Thorson, 2008); however, many of the enzymes do not possess substantial homology to proteins in the Protein Data Bank (Berman et al., 2003), raising more questions about DYN bioconstruction. The enediyne core is synthesized by a type I highly reducing iterative polyketide synthase (iPKS), yielding a linear conjugated polyene upon release from the iPKS by a thioesterase (Gao & Thorson, 2008; Ahlert et al., 2002; Liew et al., 2010). It was recently discovered that the iPKS also participates in the synthesis of the ANQ carbon skeleton, revealing the dual functionality encoded in the iPKS (Cohen & Townsend, 2018). The iPKS duality provides additional evidence supporting the notion that the tailoring enzymes, as opposed to the iPKS and associated thioesterase, govern the carbon-skeleton chemodiversity (Horsman et al., 2010). The downstream cyclases mediating enediyne-core cyclization have not been identified, and several potential candidates appear to be membrane-bound based on bioinformatic analysis, which has impeded their functional and structural characterization. Fortunately, a subset of DYN enzymes share homology with enzymes encoded by aromatic polyketide pathways, which use cytoplasmic mono-domain and di-domain cyclase/aromatases to produce diverse decaketides.
Here, we report, at a resolution of 1.5 Å, the crystal structure of DynU16, a putative di-domain protein homologous to StAR-related lipid-transfer (START/Bet v1-like) domain-containing proteins. Comparative bioinformatic analysis of enediyne-biosynthetic gene clusters reveals homologues in the biosynthetic pathways of anthraquinone-fused enediynes. Moreover, DynU16 shares 30% sequence identity with CalU16, a calicheamicin mono-domain resistance protein; however, DynU16 possesses an additional domain, suggesting divergence in function from the self-sacrificing functionalities (Elshahawi et al., 2014; Singh et al., 2006; Biggins et al., 2003). In search of a di-domain homologue, an aromatase, BexL, was identified from the BE-7585A aromatic polyketide pathway (Caldara-Festin et al., 2015). In contrast to the type II PKS-associated di-domain cyclase/aromatases (CYC/AROs) BexL and StfQ, the DynU16 structure reveals an extended cavity bridging the domains. With only two di-domain structures reported from type II PKS systems, the DynU16 structure expands our limited knowledge of di-domain START/Bet v1-like proteins.

Macromolecule cloning, expression and purification
The DynU16 gene was codon-optimized for Escherichia coli expression and synthesized by Integrated DNA Technologies (Coralville, Iowa, USA). The DynU16 gene was cloned into a pET-28a-derived expression plasmid via Polymerase Incomplete Primer Extension (PIPE) methods (Klock & Lesley, 2009). The PCR products were digested using the DpnI enzyme, mixed and annealed at 315 K. The annealed DynU16 insert and expression plasmid were transformed into NEB High-Efficiency E. coli cells (New England BioLabs, Ipswich, Massachusetts, USA) and plated onto kanamycin LB agar plates. Individual colonies were prepared and sequence-verified using Sanger sequencing (Genewiz, South Plainfield, New Jersey, USA). Positive clones were transformed into NEB E. coli BL21(DE3) expression cells.
A single colony was grown in LB medium supplemented with 50 mg ml⁻¹ kanamycin overnight at 310 K. The cultured cells were inoculated into LB medium containing 50 mg ml⁻¹ kanamycin and grown at 310 K until the OD600 reached ~0.6. Protein synthesis was induced with 1.0 mM isopropyl β-D-1-thiogalactopyranoside (IPTG; GoldBio, St Louis, Missouri, USA) at 291 K for 18 h. The cells were harvested by centrifugation at 5000 rev min⁻¹ at 277 K and stored at 193 K. For purification, the cells were thawed and resuspended in binding buffer consisting of 50 mM HEPES pH 7.5, 200 mM NaCl, 10 mM imidazole. Cell disruption was performed by sonication using a Qsonica sonicator programmed to 40% amplitude with iterative sonications. The lysate was clarified by centrifugation at 18 000 rev min⁻¹ for 45 min and the supernatant was incubated with 3 ml Nickel-NTA Superflow resin (Qiagen, Hilden, Germany) equilibrated with binding buffer on a rotating platform at 277 K. The protein-bound resin was washed with binding buffer supplemented with 30 mM imidazole and the protein was subsequently eluted with 200 mM imidazole. DynU16 was incubated with His-tagged TEV protease for 6 h at room temperature and further purified by reverse-phase nickel IMAC chromatography followed by size-exclusion chromatography on a HiLoad 16/600 column (GE Healthcare, Piscataway, New Jersey, USA). The protein concentration was determined from the absorbance at 280 nm using a calculated molar extinction coefficient of 75 970 M⁻¹ cm⁻¹. Macromolecule-production information is summarized in Table 1.
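The concentration determination described above follows the Beer-Lambert relation A = εcl. A minimal Python sketch is given below, using the extinction coefficient reported in the text; the 1 cm path length, the function names and the molar mass passed in the usage note are illustrative assumptions and are not taken from the original work.

```python
EPSILON_280 = 75970.0  # M^-1 cm^-1, calculated extinction coefficient from the text


def molar_concentration(a280: float,
                        epsilon: float = EPSILON_280,
                        path_cm: float = 1.0) -> float:
    """Return the molar concentration (M) from A280 via A = epsilon * c * l.

    The default 1 cm path length is an assumption, not a value from the paper.
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


def mass_concentration(a280: float,
                       molar_mass_da: float,
                       epsilon: float = EPSILON_280,
                       path_cm: float = 1.0) -> float:
    """Return the concentration in mg/ml (numerically equal to g/l).

    molar_mass_da must be supplied by the user; it is not given in the text.
    """
    return molar_concentration(a280, epsilon, path_cm) * molar_mass_da
```

For example, `molar_concentration(0.7597)` corresponds to roughly 10 µM protein; converting to mg ml⁻¹ requires the molar mass of the cleaved construct, which would have to be calculated from the actual sequence.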

Crystallization and derivatization
DynU16 was concentrated to 15 mg ml⁻¹ using an Amicon Ultra-15 centrifugal filter (Merck KGaA, Darmstadt, Germany) and subjected to high-throughput crystallization screening, which was performed using a Mosquito LCP robot [...] (Supplementary Fig. S1). The crystals were cryoprotected with 20% glycerol and flash-cooled in liquid nitrogen. A heavy-atom derivative was produced by soaking crystals in 1 M potassium iodide for 1.5 min prior to cryoprotection with 10% glycerol and flash-cooling.

Table 1. Macromolecule-production information.

Data collection and processing
The crystals were sent to the LS-CAT 21-ID-D beamline at the Advanced Photon Source (APS) for screening and data collection. A native data set was collected from a crystal that diffracted to a resolution of 1.5 Å and was indexed in space group P3₁21. The data were integrated and scaled with XDS (Kabsch, 2010) within the xia2 (Winter, 2010) data-processing pipeline (Table 3). Data were collected from the heavy-atom-soaked DynU16 crystals at a wavelength of 2.07 Å to enhance the anomalous signal from iodide; the crystals diffracted to 2.3 Å resolution.
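The relationship between the data-collection wavelength, the photon energy and the scattering angle at the resolution limit can be sketched as follows. This is a minimal illustration using E = hc/λ and Bragg's law, λ = 2d sin θ; the function names are hypothetical and the script is not part of the processing pipeline used in this work.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV * Angstrom


def wavelength_to_energy_kev(wavelength_a: float) -> float:
    """Convert an X-ray wavelength in Angstrom to a photon energy in keV."""
    if wavelength_a <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_a


def bragg_two_theta_deg(d_spacing_a: float, wavelength_a: float) -> float:
    """Return the scattering angle 2-theta (degrees) for a given d-spacing."""
    sin_theta = wavelength_a / (2.0 * d_spacing_a)
    if not 0.0 < sin_theta <= 1.0:
        raise ValueError("d-spacing is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


if __name__ == "__main__":
    # Values quoted in the text for the iodide-soaked crystals.
    print(round(wavelength_to_energy_kev(2.07), 2), "keV")
    print(round(bragg_two_theta_deg(2.3, 2.07), 1), "degrees")
```

With the values quoted in the text, a wavelength of 2.07 Å corresponds to a photon energy of about 6 keV, and reflections at 2.3 Å resolution are recorded at a scattering angle of roughly 53°.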

Structure solution and refinement
DynU16 phases were determined by single isomorphous replacement with anomalous scattering (SIRAS) using SHELXC/D/E (Sheldrick, 2010) for anomalous signal assessment and phase calculations within the HKL2MAP graphical interface (Pape & Schneider, 2004). ARP/wARP was used for initial model building (Langer et al., 2008). Coot (Emsley et al., 2010) and phenix.refine (Liebschner et al., 2019) were subsequently used for rebuilding and structure refinement (Table 4). The structure was visualized using a collaborative 3D graphics system. The super algorithm in PyMOL (version 2.4.1; Schrödinger) was used for r.m.s.d. calculations. The structural biology software applications used in this project were compiled and configured by SBGrid (Morin et al., 2013). Coordinates and structure factors were deposited in the worldwide Protein Data Bank (Berman et al., 2003) as PDB entry 6v04.

Overall structure
The DynU16 structure was determined at a nominal resolution of 1.5 Å using SIRAS, revealing a di-domain helix-grip fold (Fig. 2a, Table 4). The final model included 269 protein residues, 282 water molecules and five ions. Residues 0 (left from the tag after cleavage) to 8 at the N-terminus and 278-283 at the C-terminus were disordered and were not modeled. The N-terminal domain comprises an antiparallel β-sheet, two short α-helices and one extended α-helix. The N-terminal (residues 1-139) and C-terminal (residues 150-283) domains are connected by a flexible ten-residue linker (140-149). The sequence and structure similarity between the N- and C-terminal domains suggests that the second domain was acquired through a gene-duplication event. A structure-based backbone-atom alignment of 313 atoms in the N- and C-terminal domains was calculated, giving a root-mean-square atomic position deviation of 2.0 Å with 15% sequence identity over 107 aligned residues (Supplementary Fig. S2). Assessment of the C-terminal domain suggests divergence from the canonical START/Bet v1 fold observed in the N-terminal domain. The C-terminal domain possesses eight instead of six antiparallel β-strands and has adopted additional loops.
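The root-mean-square deviation quoted above is the square root of the mean squared distance between paired atoms. The sketch below is a minimal illustration of that formula only: it assumes that the two coordinate sets have already been paired and superposed, whereas PyMOL's super command carries out the alignment and superposition itself before reporting the r.m.s.d. The function name and type alias are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Return the r.m.s.d. (same units as the input) between paired atoms.

    Assumes the two sets are already superposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))
```

In the comparison described in the text, the two inputs would each hold the 313 paired backbone atoms from the N- and C-terminal domains.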

Putative cavity
The cavity bridges the N- and C-terminal domains, with approximate dimensions of 22 × 12 × 8 Å (Figs. 2b and 2c). One end of the pocket comprises the polar residues Ser153, Ser240, Arg147, Arg227, Arg233, Glu117, Glu149, Asp61, Asp108, Asp151 and Gln120, along with Leu121. The back of the pocket has two clustered cysteines, Cys89 and Cys110, that are not engaged in a disulfide bond and could be involved in substrate stabilization. The opposing region of the cavity is composed of aromatic residues, including Phe57, Phe63, Phe82, Phe128, Trp80, Trp125 and Tyr144. The abundance of aromatic residues localized in one region of the cavity suggests a possible role in carbocation intermediate stabilization if cyclization is catalyzed in the cavity (Dougherty, 1996). The cavity-forming residues frequently reside in loops or β-strands, potentially increasing cavity flexibility and enabling contraction and expansion to accommodate substantial substrate rearrangements. The shape and size of the cavity appear to support the binding of a linear or perhaps a cyclic polyene (Fig. 2c). The cavity openings (Fig. 2b) could accommodate either a free polyene or one tethered to an acyl carrier protein (ACP) domain of the iPKS. However, it has not yet been established whether the enediyne core is synthesized attached to the ACP of the iPKS (Cohen & Townsend, 2018) or whether the intermediates remain free after they have been cleaved from the iPKS by the thioesterase (Annaval et al., 2017).
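For a rough sense of scale, the approximate cavity dimensions quoted above can be turned into crude volume bounds. The sketch below treats the cavity either as a rectangular box or as an ellipsoid inscribed in that box; both shapes are simplifying assumptions made here for illustration and do not represent a cavity volume measured in this work.

```python
import math

# Approximate cavity dimensions from the text, in Angstrom.
LENGTH, WIDTH, HEIGHT = 22.0, 12.0, 8.0


def box_volume(a: float, b: float, c: float) -> float:
    """Volume of the bounding box (an upper bound for a cavity of these dimensions)."""
    return a * b * c


def ellipsoid_volume(a: float, b: float, c: float) -> float:
    """Volume of the ellipsoid whose full axes are a, b and c."""
    return (4.0 / 3.0) * math.pi * (a / 2.0) * (b / 2.0) * (c / 2.0)


if __name__ == "__main__":
    print(box_volume(LENGTH, WIDTH, HEIGHT), "cubic Angstrom (box)")
    print(round(ellipsoid_volume(LENGTH, WIDTH, HEIGHT)), "cubic Angstrom (ellipsoid)")
```

Under these assumptions the estimates fall between roughly 1100 and 2100 Å³; a dedicated cavity-detection program would be needed for a reliable value.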

Sequence and structure comparison
Prior to the structural determination of DynU16, a limited number of structures of di-domain START/Bet v1 enzymes had been reported. BexL and StfQ were identified in type II PKS systems and operate on nonreduced and partially reduced decaketide substrates, respectively. BexL possesses aromatase activity localized to the N-terminal domain and a pseudocatalytic C-terminal domain. StfQ possesses a pseudocatalytic N-terminal domain and cyclase activity localized to the C-terminal domain. In the cases of BexL and StfQ, the pseudocatalytic domains are hypothesized to provide both solubility and structural support (Caldara-Festin et al., 2015). Owing to the limited information on the START-domain family, folds reminiscent of the START helix-grip fold were explored. The nuclear transport factor 2 (NTF2)-like α+β structure adopted by the cyclase SnoaL and the isomerase TsrD provides additional evidence for the speculation that DynU16 harbors cyclase or dehydratase functionality (Fig. 3). Biosynthesis of the nine-membered enediyne antibiotic C-1027 requires SgcJ, which is a mono-domain NTF2-like superfamily member (Huang et al., 2016).
The NCBI BLAST platform (Altschul et al., 1997) was used to explore the sequence space of related enzymes (Supplementary Fig. S3). The search identified 20 uncharacterized di-domain homologues with sequence identity equal to or exceeding 30% and 90% coverage, with E values of less than 2 × 10⁻⁴⁰. Using ESPript (Robert & Gouet, 2014), an extended motif in the N-terminal domain, RxxFxDGDFF, was identified amongst related family members. Structural mapping of the conserved motif locates it to the connection between β-strands 3 and 4 in proximity to the cavity. This region of the cavity aligns with the polyketide-binding location in the bifunctional mono-domain ARO/CYC enzymes TcmN and WhiE (Lee et al., 2012). Assessment of the interactions of the motif with nearby residues identifies salt links between Arg147 and Asp151 and between Arg54 and Asp56, and van der Waals interactions between Phe63 and Trp80, suggesting that this motif may be involved in stabilizing the enzyme during catalysis. Other cyclase systems have been examined for their associated energetics, revealing highly exothermic reactions that could destabilize the enzyme without the proper network of stabilizing forces (Thoma et al., 2004). Additionally, conserved residues were identified in the C-terminal domain, marking a region that potentially mediates protein-protein interactions within the biosynthetic network. While an enzymatic assay is needed to confirm the speculated function of DynU16, we conclude that this protein is likely to participate in the biosynthesis of dynemicin as a cyclase or perhaps at least as a dehydratase of an extended or cyclic ANQ precursor.

[...] under Contract No. DE-AC02-06CH11357. Use of the LS-CAT Sector 21 was supported by the Michigan Economic Development Corporation and the Michigan Technology Tri-Corridor (Grant 085P1000817).
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.