research communications
High-resolution NMR structures of the domains of Saccharomyces cerevisiae Tho1
MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, England
*Correspondence e-mail: mxb@mrc-lmb.cam.ac.uk
THO is a multi-protein complex involved in the formation of messenger ribonuclear particles (mRNPs) by coupling transcription with processing and export. THO is thought to be formed from five subunits, Tho2p, Hpr1p, Tex1p, Mft1p and Thp2p, and recent work has determined a low-resolution structure of the complex [Poulsen et al. (2014), PLoS One, 9, e103470]. A number of additional proteins are thought to be involved in the formation of mRNP in yeast, including Tho1, which has been shown to bind RNA in vitro and is recruited to actively transcribed chromatin in vivo in a THO-complex and RNA-dependent manner. Tho1 is known to contain a SAP domain at the N-terminus, but the ability to suppress the expression defects of the hpr1Δ mutant of THO was shown to reside in the RNA-binding C-terminal region. In this study, high-resolution structures of both the N-terminal DNA-binding SAP domain and C-terminal RNA-binding domain have been determined.
PDB references: Saccharomyces cerevisiae Tho1, SAP domain, 4uzw; C-terminal domain, 4uzx
1. Biological context
The delivery of translationally effective ribonuclear particles (mRNPs) to the cytosol is a complex process in eukaryotes that requires the integration of numerous processes including transcription and processing of pre-mRNA, formation of mRNPs and export from the nucleus (Köhler & Hurt, 2007). A vast array of proteins are involved in these processes and their interactions are carefully controlled to facilitate the delivery of mRNPs to the nuclear pore complex. Loss of control at any point can result in cellular mechanisms degrading mRNPs before they are exported (Houseley et al., 2006).
An essential component of early mRNP biogenesis is the THO complex, which in the yeast Saccharomyces cerevisiae is composed of Tho2p, Hpr1p, Tex1p, Mft1p and Thp2p. The exact mechanism by which it functions is unknown, but it is thought to bind RNA polymerase II during transcription via the polyphosphorylated C-terminal domain of the polymerase (Meinel et al., 2013). THO has also been shown to bind Yra1p and Sub2p to form a complex known as TREX (Strässer et al., 2002). The THO complex also mediates interactions with several additional proteins to stimulate their co-transcriptional recruitment to nascent transcripts (Hurt et al., 2004; Zenklusen et al., 2002). Depletion and/or knockout of individual THO complex components in vivo has revealed that THO is not only involved in mRNP biogenesis but also takes part in preserving genome integrity (Aguilera, 2005; Huertas & Aguilera, 2003).
Tho1 was identified as a multicopy suppressor of hpr1Δ (Jimeno et al., 2002; Piruat & Aguilera, 1998) and was thought to function in a similar manner to the yeast protein Sub2. Studies revealed that Tho1, like Sub2, can assemble onto the nascent transcript during transcription, and that Tho1 and Sub2 can provide alternative pathways for mRNP biogenesis in the absence of a functional THO complex (Jimeno et al., 2006). Deletion of THO1 does not result in a distinct phenotype, and thus the function of Tho1 in vivo remains unclear. However, the ability of Tho1 to suppress hpr1Δ was shown to reside in its RNA-binding C-terminal region. In this study, we have determined the solution structures of both the N-terminal SAP domain, which in other proteins has been shown to bind DNA (Göhring et al., 1997), and the C-terminal domain thought to be responsible for RNA binding. The SAP domain contains a helix–extended-loop–helix motif similar to those found in other members of this family and binds DNA. The C-terminal region adopts a helical fold similar to that of the WHEP RNA-binding domains of metazoan aminoacyl-tRNA synthetases (Cahuzac et al., 2000).
2. Methods and experiments
2.1. Domain architecture of Tho1
The domain architecture of yeast Tho1 was analyzed using JPred (Cuff et al., 1998) and Phyre (Kelley & Sternberg, 2009) to identify regions that are likely to have a discrete fold.
2.2. Expression and purification of Tho1 N-terminal and C-terminal domains
DNA encoding the N- and C-terminal domains of Tho1 was amplified from S. cerevisiae genomic DNA by PCR and cloned into a modified pRSETA (Invitrogen) expression vector that produces proteins fused to the N-terminally His6-tagged lipoyl domain of Bacillus stearothermophilus dihydrolipoamide acetyltransferase. The resulting plasmids were transformed into Escherichia coli C41 (DE3) cells. Cells were grown in 2×TY medium at 37°C to mid-log phase and were induced with 1 mM IPTG. The temperature was then reduced to 22°C and the cells were grown for a further 16 h. Isotopically labelled domains were prepared by growing cells in K-MOPS (Neidhardt et al., 1974) minimal medium containing 15NH4Cl and/or [13C]-glucose. Cells were lysed by sonication and the fusion proteins were purified by Ni2+–NTA affinity chromatography. The purified proteins were dialyzed overnight in the presence of TEV protease, which cleaves the fusion proteins after the lipoyl domain. A second Ni2+–NTA affinity-chromatography step was carried out to remove the lipoyl domain. The domains were further purified by gel filtration using a HiLoad 26/60 Superdex 75 column (GE Healthcare). The elution volumes of both domains were consistent with their being monomeric. Double-deionized water was used to make all buffer solutions.
2.3. NMR spectroscopy
Protein samples for NMR spectroscopy were typically at 1.5 mM in 90% H2O/10% D2O containing 20 mM potassium phosphate pH 6.5, 100 mM NaCl, 5 mM β-mercaptoethanol. All spectra were acquired at 25°C using Bruker DRX800, DRX600 or DMX500 spectrometers equipped with pulsed-field-gradient triple-resonance probes, and were referenced relative to external sodium 2,2-dimethyl-2-silapentane-5-sulfonate (DSS) for proton and carbon signals and to liquid ammonia for nitrogen signals. Assignments were obtained by standard NMR methods using 13C/15N-labelled, 15N-labelled, 10% 13C-labelled and unlabelled protein samples (Bax et al., 1991; Englander & Wand, 1987). Backbone assignments were obtained using the following standard set of two-dimensional and three-dimensional heteronuclear spectra: 1H–15N HSQC, HNCACB, CBCA(CO)NH, HNCACO, HNCO, HBHA(CO)NH and 1H–13C HSQC. Additional assignments were made using two-dimensional TOCSY and DQF–COSY spectra. Distance constraints were derived from two-dimensional NOESY spectra recorded with a mixing time of 120 ms. Torsion-angle constraints were obtained from an analysis of C′, N, Cα, Hα and Cβ chemical shifts using TALOS (Cornilescu et al., 1999). Stereospecific assignments of Hβ resonances, determined from DQF–COSY and HNHB spectra, were confirmed by analyzing the initial ensemble of structures. The Hγ methyl resonances of Val residues and the Hδ methyl resonances of Leu residues were assigned stereospecifically using a fractionally 13C-labelled protein sample (Neri et al., 1989). Once all NOEs had been assigned and initial structures had been calculated, hydrogen-bond constraints were included for backbone amide protons whose signals were still detected after 10 min in a two-dimensional 1H–15N HSQC spectrum recorded in D2O at 278 K. For each of these slowly exchanging hydrogen-bond donors, candidate acceptors were identified in the calculated structures using HBPLUS.
When two or more candidate acceptors were found for the same donor in different structures, the most frequently occurring candidate was selected. Each hydrogen bond was represented by two distance constraints, restraining the (D)H–O(A) distance to 1.5–2.5 Å and the (D)N–O(A) distance to 2.5–3.5 Å. The three-dimensional structures of the domains were calculated using the standard torsion-angle dynamics simulated-annealing protocol in CNS v.1.2 (Brunger, 2007). Residual dipolar couplings were measured for the C-terminal domain aligned in 5% C12E5/1-hexanol. Alignment tensor values for the RDC constraints were calculated using SSIA, and the RDC constraints were incorporated in the final round of structure calculations. Structures were accepted only if no distance violation was greater than 0.25 Å and no dihedral-angle violation was greater than 5°. The final coordinates have been deposited in the Protein Data Bank (PDB entries 4uzw and 4uzx).
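The hydrogen-bond restraint rules and the acceptance criteria described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' scripts: the input format (one dictionary per calculated structure, mapping a donor residue to its candidate acceptor residues, as might be parsed from HBPLUS output), the atom names HN/N/O and all function names are hypothetical, and the restraint lines follow the generic CNS/XPLOR `assign` distance form (target distance, minus-deviation, plus-deviation).

```python
from collections import Counter

# Restraint windows (in Å) quoted in the text for each hydrogen bond.
HO_BOUNDS = (1.5, 2.5)  # donor H to acceptor O
NO_BOUNDS = (2.5, 3.5)  # donor N to acceptor O


def select_acceptors(candidates_per_structure):
    """Pick one acceptor per slowly exchanging donor amide.

    candidates_per_structure: one dict per structure (hypothetical format),
    mapping donor residue -> list of candidate acceptor residues.
    Returns donor -> most frequent acceptor over the ensemble; ties are
    broken by first occurrence (an assumption, not stated in the text).
    """
    tallies = {}
    for structure in candidates_per_structure:
        for donor, acceptors in structure.items():
            tallies.setdefault(donor, Counter()).update(acceptors)
    return {donor: counts.most_common(1)[0][0] for donor, counts in tallies.items()}


def restraint_lines(pairs):
    """Emit two CNS/XPLOR-style distance restraints per hydrogen bond.

    Each window [lo, hi] is written as midpoint, midpoint - lo, hi - midpoint.
    Atom names HN, N and O are assumptions.
    """
    lines = []
    for donor, acceptor in sorted(pairs.items()):
        for atom, (lo, hi) in (("HN", HO_BOUNDS), ("N", NO_BOUNDS)):
            mid = (lo + hi) / 2
            lines.append(
                f"assign (resid {donor} and name {atom}) "
                f"(resid {acceptor} and name O) {mid:.2f} {mid - lo:.2f} {hi - mid:.2f}"
            )
    return lines


def accept_structure(distance_violations, dihedral_violations,
                     max_distance=0.25, max_dihedral=5.0):
    """Accept a structure only if no violation exceeds the stated limits."""
    return (all(v <= max_distance for v in distance_violations)
            and all(v <= max_dihedral for v in dihedral_violations))
```

Writing each window as a midpoint with symmetric deviations (2.0 ± 0.5 Å and 3.0 ± 0.5 Å) reproduces the 1.5–2.5 Å and 2.5–3.5 Å ranges exactly. How ties between equally frequent acceptors were resolved is not stated, so first occurrence is used here only as a placeholder rule.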
To monitor the interaction of the SAP domain with DNA and RNA, 15N HSQC spectra of 200 µM Tho1 SAP domain were recorded in the presence of 200 µM self-complementary DNA oligonucleotides corresponding to either a typical histone cluster scaffold attachment region sequence (5′-AGAAAATAATAAAATAAAACTAGCTATTTTATATTTTTTC-3′) or a random dsDNA sequence (5′-TCCTGATCAGGA-3′). The potential interaction with dsRNA was also tested using the 30-mer dsRNA oligonucleotide 5′-GGACAGCUGUCCCUUCGGGGACAGCUGUCC-3′. The potential interaction of the C-terminal domain with RNA was tested using 15N HSQC spectra recorded for 200 µM C-terminal Tho1 domain in the absence and presence of several RNA and DNA oligonucleotides (each at 200 µM), including 18-mer polyA, 18-mer polyU, 18-mer polyG, 18-mer polyC, the 30-mer dsRNA 5′-GGACAGCUGUCCCUUCGGGGACAGCUGUCC-3′ and the 20-mer ssRNA 5′-CUUGUACAUAGUUGGCCAUA-3′.
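The text does not state how changes in the 15N HSQC spectra were quantified. One common approach, assumed here purely for illustration, is to combine the 1H and 15N shift changes of each assigned amide peak into a single weighted chemical-shift perturbation and flag residues above a cutoff. In this sketch the 15N weighting factor of 0.14, the cutoff and the function names are all assumptions.

```python
import math


def combined_shift_perturbation(free, bound, n_weight=0.14):
    """Weighted amide chemical-shift change for residues seen in both spectra.

    free, bound: dicts mapping residue number -> (delta_H, delta_N) in ppm,
    read from 15N HSQC peaks recorded without and with nucleic acid.
    n_weight scales the 15N change onto the 1H scale; 0.14 is a commonly
    used value and an assumption here, not a value from the paper.
    """
    csp = {}
    for res in sorted(free.keys() & bound.keys()):
        d_h = bound[res][0] - free[res][0]
        d_n = bound[res][1] - free[res][1]
        csp[res] = math.sqrt(d_h ** 2 + (n_weight * d_n) ** 2)
    return csp


def perturbed_residues(csp, threshold):
    """Residues whose combined perturbation exceeds a chosen cutoff (ppm)."""
    return [res for res, value in csp.items() if value > threshold]
```

Peaks that broaden beyond detection on binding are absent from `bound` and are therefore silently skipped by this sketch; in practice they would need to be tracked separately.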
3. Results and discussion
3.1. Cloning and domain-boundary selection
JPred and Phyre both predicted Tho1 to contain two α-helical clusters, with Phyre predicting an additional helical motif at the C-terminus (Fig. 1a). Careful analysis of the sequence and of disorder predictions suggested that the additional α-helix predicted by Phyre would be unlikely to form. A 15N HSQC spectrum of a clone comprising residues 51–218 showed no additional resonances in the regions expected for structured residues (Supplementary Fig. S1). A number of clones were subsequently created to investigate the structures of the domains.
3.2. NMR assignments and data deposition
Two clones comprising residues 1–50 and residues 119–183 of S. cerevisiae Tho1 were used for NMR assignment and structural analysis. We assigned 98% of the backbone resonances (only the N-terminal amide resonance and the amide N atoms of prolines were unassigned). All of the observable side-chain proton resonances were assigned using a combination of homonuclear and triple-resonance experiments as described in §2.3. The 1HN and 15N resonance assignments are labelled on the 1H–15N HSQC spectra (Fig. 2) by one-letter amino-acid code followed by residue number.
3.3. Structural studies of the N-terminal SAP domain
The structure of the SAP domain was determined using CNS v.1.2 from NOE, dihedral-angle and hydrogen-bond restraints. Owing to the compact nature of the domain, residual dipolar couplings were not measured or used. A summary of all conformational constraints and statistics is presented in Table 1. The ensemble of calculated structures and a cartoon representation of the SAP domain are shown in Figs. 3(a) and 3(b), respectively. The SAP domain is composed of two α-helices (residues 9–19 and 27–42) connected by a structured loop, forming the helix–extended-loop–helix (HEH) motif typical of this fold. The N-terminus is structured from residue 2 onwards, whereas the C-terminal tail has few medium-range or long-range NOEs and is disordered. The structures of several SAP domains have been determined previously, most of which have roles in DNA binding and chromosomal reorganization (Aravind & Koonin, 2000). Comparison with known structures using DALI shows that the Tho1 SAP domain is most similar to the SAP domains of SARNP (PDB entry 2do1; RIKEN Structural Genomics/Proteomics Initiative, unpublished work) and HNRNPUL1 (PDB entry 1zrj; RIKEN Structural Genomics/Proteomics Initiative, unpublished work), a protein that is also involved in the nuclear export of mRNA. DNA-binding experiments have revealed that the SAP domain of S. cerevisiae Tho1 has the potential to bind DNA (Jacobsen, 2003), but not dsRNA (Supplementary Fig. S2). Binding of the SAP domain to a random 12-mer dsDNA was investigated (Supplementary Fig. S3) and was found to occur in a manner consistent with other SAP domains (Okubo et al., 2004).
3.4. Structural studies of the C-terminal domain
CNS v.1.2 was used to determine a high-resolution solution structure of the C-terminal domain using NOE, dihedral-angle, hydrogen-bond and residual dipolar coupling (RDC) constraints. A summary of all conformational constraints and statistics is presented in Table 1. The ensemble of calculated structures and a cartoon representation of the C-terminal domain are shown in Figs. 3(c) and 3(d), respectively. The domain is composed of two antiparallel α-helices (residues 122–141 and 147–162) connected by a structured loop. Each helix has a hydrophobic face, and these faces pack against each other. The fold is further stabilized by two leucine residues in the small C-terminal helix that interact with hydrophobic residues at the N-terminal end of helix 1 and the C-terminal end of helix 2. The N-terminus is structured from residue 120 onwards, whilst the C-terminal tail contains a structured loop and a small α-helix (residues 170–175). The residues after Ser179 show no medium-range or long-range NOEs. A structure-comparison search using DALI revealed a similarity (r.m.s.d. of 2.7 Å over 50 residues) between the helix–turn–helix motif formed by the first two helices and the fold of the WHEP RNA-binding domain, which is found in multiple copies in a number of higher eukaryotic aminoacyl-tRNA synthetases. The C-terminal region of Tho1 has been shown to bind RNA (Jimeno et al., 2006), and whilst there are several conserved positively charged residues in the domain (Fig. 1b), the expressed domain (residues 119–183) showed no detectable binding to any of the RNAs tested (Supplementary Fig. S4). The domain may still have the potential to bind RNA, but the exact nature and sequence of the RNA required for binding are unknown. Alternatively, binding may require the contribution of additional residues of Tho1 that were not included in the expression constructs used in this study.
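DALI aligns structures by comparing intramolecular distance matrices; the r.m.s.d. it reports refers to the superposition of the aligned residues. As a minimal sketch of that last quantity only (not DALI's algorithm), the r.m.s.d. of two already-paired coordinate sets, for example Cα atoms, after optimal rigid-body superposition can be computed in pure Python with Horn's closed-form quaternion method, using a small Jacobi eigenvalue routine for the 4 × 4 matrix. Function names are illustrative.

```python
import math


def _centre(coords):
    """Translate a list of (x, y, z) points so that their centroid is at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def _jacobi_eigenvalues(a, sweeps=50, tol=1e-18):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (rotate columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rotate rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def superposed_rmsd(moving, target):
    """R.m.s.d. (input units) of paired points after optimal proper rotation."""
    if len(moving) != len(target) or not moving:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    x, y = _centre(moving), _centre(target)
    # s[a][b] = sum over pairs of x_i[a] * y_i[b]
    s = [[sum(xi[a] * yi[b] for xi, yi in zip(x, y)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = max(_jacobi_eigenvalues(horn))
    e0 = sum(c * c for p in x + y for c in p)
    return math.sqrt(max(e0 - 2.0 * lam, 0.0) / len(x))
```

Only the largest eigenvalue λmax is needed, because for centred coordinates r.m.s.d.² = (Σ|x|² + Σ|y|² − 2λmax)/N, so the rotation matrix itself is never built. Quaternions describe proper rotations only, so a mirror image is not superposed onto its original.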
3.5. Homologues of Tho1
A human protein, CIP29, has been proposed from sequence alignment to be a homologue of yeast Tho1 (Jimeno et al., 2006). CIP29 contains a SAP domain and interacts with DNA, RNA and UAP56, and has therefore been proposed to have a role in transcription, RNA splicing, RNA export or translation (Aravind & Koonin, 2000; Hashii et al., 2004; Leaw et al., 2004; Sugiura et al., 2007; Dufu et al., 2010). CIP29 was initially reported to be a cytokine-induced protein and has been linked to several cancers (Choong et al., 2001; Fukuda et al., 2002; Hashii et al., 2004; Leaw et al., 2004), although its exact function is unknown. Comparison of the sequences of other members of the CIP29/Tho1 family (Fig. 4a) reveals that the hydrophilic faces of the C-terminal ends of both helices in the helix–turn–helix motif are highly conserved. Each helix ends with a glycine residue, which is preceded by a phenylalanine that projects into the solvent (Fig. 4b). The residue preceding the phenylalanine and the residues one and two helical turns back from it are also highly conserved as either arginine or lysine. This produces two very similar potential RNA-binding sites at opposite ends of the domain that could, for example, interact with two copies of the same RNA sequence separated by a particular number of bases or specifically orientated within a structural motif. Inspection of the sequence of the C-terminal region of CIP29 shows that it contains a second, closely spaced copy of this module, which can be readily identified by the presence of the lysine–arginine–phenylalanine–glycine sequence motif (Fig. 4c). Two copies of this motif are also present in CIP29 homologues from other animal species. The Arabidopsis Tho1 homologue MOS11 (Germain et al., 2010), together with homologous proteins from other plant species, also contains two copies of the motif but appears to lack an N-terminal SAP domain.
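The conserved basic pattern described above can be located in a sequence with a short scan. This sketch is hypothetical in its details: by default it searches for the lysine–arginine–phenylalanine–glycine motif (any regular expression whose match contains the Phe can be passed instead), then checks whether the residues assumed to lie one and two helical turns before the residue preceding the Phe are Lys or Arg. The offsets of 4 and 7 residues are an assumption derived from roughly 3.6 residues per α-helical turn, not values from the paper.

```python
import re

# Assumed offsets (in residues) for "one and two helical turns back" from the
# residue preceding the Phe; an alpha-helix has ~3.6 residues per turn.
TURN_OFFSETS = (4, 7)


def scan_basic_motif(seq, motif="KRFG", offsets=TURN_OFFSETS):
    """Find motif matches and test whether the flanking helical positions are Lys/Arg.

    Returns a list of (phe_position, {offset: is_basic}) using 1-based
    positions; offsets falling before the start of the sequence are skipped.
    The motif match must contain an 'F'.
    """
    hits = []
    for m in re.finditer(motif, seq):
        phe = m.start() + m.group().index("F")  # 0-based index of the Phe
        anchor = phe - 1                          # residue preceding the Phe
        flanks = {off: seq[anchor - off] in "KR"
                  for off in offsets if anchor - off >= 0}
        hits.append((phe + 1, flanks))
    return hits
```

For example, the invented sequence `AAKAARAAKRFGAA` (not a Tho1, CIP29 or MOS11 sequence) gives `[(11, {4: True, 7: True})]`: the Phe is residue 11, and the residues 4 and 7 positions before Arg10 are Arg6 and Lys3.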
Given the wide distribution of proteins containing two copies of the domain, it is possible that the C-terminal copy has been lost in Tho1, with only the N-terminal part of the first helix of the second domain being retained in the form of the small third helix, perhaps because it contributes to the stability of the fold. If this were the case, then in proteins where both domains are present they would be expected to be orthogonal to each other. As well as binding RNA, all members of the Tho1/CIP29/MOS11 family characterized to date also bind SUB2/UAP56 DEAD-box RNA helicases. As the C-terminal domain is the only strictly conserved region in this protein family, it may also mediate these interactions.
4. Conclusions
We report here the solution structures of the N-terminal SAP domain and the C-terminal domain of yeast Tho1. These structures provide potential insight into the structures of related domains in the Tho1/CIP29/MOS11 family of proteins. The DNA-binding site of the Tho1 SAP domain was shown to be located similarly to that observed in other SAP domains. RNA binding by the C-terminal domain was investigated, but none was detected with the RNAs tested. Further work will be required to determine exactly which region of yeast Tho1 is responsible for RNA binding (Jimeno et al., 2006). It is possible that RNA binding involves a coupled folding and binding event with a region of Tho1 that was not investigated in this study.
Supporting information
Supporting Information: Supplementary Figures S1-S4. DOI: 10.1107/S2053230X16007597/pq5029sup1.pdf
References
Aguilera, A. (2005). Curr. Opin. Cell Biol. 17, 242–250.
Aravind, L. & Koonin, E. V. (2000). Trends Biochem. Sci. 25, 112–114.
Bax, A., Ikura, M., Kay, L. E., Barbato, G. & Spera, S. (1991). Ciba Found. Symp. 161, 108–119.
Brunger, A. T. (2007). Nature Protoc. 2, 2728–2733.
Cahuzac, B., Berthonneau, E., Birlirakis, N., Guittet, E. & Mirande, M. A. (2000). EMBO J. 19, 445–452.
Choong, M. L., Tan, L. K., Lo, S. L., Ren, E.-C., Ou, K., Ong, S.-E., Liang, R. C. M. Y., Seow, T. K. & Chung, M. C. M. (2001). FEBS Lett. 496, 109–116.
Cornilescu, G., Delaglio, F. & Bax, A. (1999). J. Biomol. NMR, 13, 289–302.
Cuff, J. A., Clamp, M. E., Siddiqui, A. S., Finlay, M. & Barton, G. J. (1998). Bioinformatics, 14, 892–893.
Dufu, K., Livingstone, M. J., Seebacher, J., Gygi, S. P., Wilson, S. A. & Reed, R. (2010). Genes Dev. 24, 2043–2053.
Englander, S. W. & Wand, A. J. (1987). Biochemistry, 26, 5953–5958.
Fukuda, S., Wu, D. W., Stark, K. & Pelus, L. M. (2002). Biochem. Biophys. Res. Commun. 292, 593–600.
Germain, H., Qu, N., Cheng, Y. T., Lee, E., Huang, Y., Dong, O. X., Gannon, P., Huang, S., Ding, P., Li, Y., Sack, F., Zhang, Y. & Li, X. (2010). PLoS Genet. 6, e1001250.
Göhring, F., Schwab, B. L., Nicotera, P., Leist, M. & Fackelmayer, F. O. (1997). EMBO J. 16, 7361–7371.
Hashii, Y., Kim, J. Y., Sawada, A., Tokimasa, S., Hiroyuki, F., Ohta, H., Makiko, K., Takihara, Y., Ozono, K. & Hara, J. (2004). Leukemia, 18, 1546–1548.
Houseley, J., LaCava, J. & Tollervey, D. (2006). Nature Rev. Mol. Cell Biol. 7, 529–539.
Huertas, P. & Aguilera, A. (2003). Mol. Cell, 12, 711–721.
Hurt, E., Luo, M. J., Röther, S., Reed, R. & Strässer, K. (2004). Proc. Natl Acad. Sci. USA, 101, 1858–1862.
Jacobsen, J. O. B. (2003). PhD thesis. Centre of Cambridge University.
Jimeno, S., Luna, R., García-Rubio, M. & Aguilera, A. (2006). Mol. Cell. Biol. 26, 4387–4398.
Jimeno, S., Rondón, A. G., Luna, R. & Aguilera, A. (2002). EMBO J. 21, 3526–3535.
Kelley, L. A. & Sternberg, M. J. (2009). Nature Protoc. 4, 363–371.
Köhler, A. & Hurt, E. (2007). Nature Rev. Mol. Cell Biol. 8, 761–773.
Leaw, C. L., Ren, E. C. & Choong, M. L. (2004). Cell. Mol. Life Sci. 61, 2264–2273.
Meinel, D. M., Burkert-Kautzsch, C., Kieser, A., O'Duibhir, E., Siebert, M., Mayer, A., Cramer, P., Söding, J., Holstege, F. C. P. & Strässer, K. (2013). PLoS Genet. 9, e1003914.
Neidhardt, F. C., Bloch, P. L. & Smith, D. F. (1974). J. Bacteriol. 119, 736–747.
Neri, D., Szyperski, T., Otting, G., Senn, H. & Wüthrich, K. (1989). Biochemistry, 28, 7510–7516.
Okubo, S., Hara, F., Tsuchida, Y., Shimotakahara, S., Suzuki, S., Hatanaka, H., Yokoyama, S., Tanaka, H., Yasuda, H. & Shindo, H. (2004). J. Biol. Chem. 279, 31455–31461.
Piruat, J. I. & Aguilera, A. (1998). EMBO J. 17, 4859–4872.
Poulsen, J. B., Sanderson, L. E., Agerschou, E. D., Dedic, E., Boesen, T. & Brodersen, D. E. (2014). PLoS One, 9, e103470.
Strässer, K., Masuda, S., Mason, P., Pfannstiel, J., Oppizzi, M., Rodriguez-Navarro, S., Rondón, A. G., Aguilera, A., Struhl, K., Reed, R. & Hurt, E. (2002). Nature (London), 417, 304–308.
Sugiura, T., Sakurai, K. & Nagano, Y. (2007). Exp. Cell Res. 313, 782–790.
Zenklusen, D., Vinciguerra, P., Wyss, J.-C. & Stutz, F. (2002). Mol. Cell. Biol. 22, 8241–8253.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.