High-resolution NMR structures of the domains of Saccharomyces cerevisiae Tho1


THO is a multi-protein complex involved in the formation of messenger ribonucleoprotein particles (mRNPs) that couples transcription with mRNA processing and export. THO is thought to be formed from five subunits, Tho2p, Hpr1p, Tex1p, Mft1p and Thp2p, and recent work has determined a low-resolution structure of the complex [Poulsen et al. (2014), PLoS One, 9, e103470]. A number of additional proteins are thought to be involved in mRNP formation in yeast, including Tho1, which has been shown to bind RNA in vitro and is recruited to actively transcribed chromatin in vivo in a THO-complex- and RNA-dependent manner. Tho1 is known to contain a SAP domain at the N-terminus, but its ability to suppress the expression defects of the THO mutant hpr1Δ was shown to reside in the RNA-binding C-terminal region. In this study, high-resolution structures of both the N-terminal DNA-binding SAP domain and the C-terminal RNA-binding domain have been determined.

Biological context
The delivery of translationally competent messenger ribonucleoprotein particles (mRNPs) to the cytosol is a complex process in eukaryotes that requires the coordination of numerous steps, including transcription and processing of pre-mRNA, formation of mRNPs and export from the nucleus (Köhler & Hurt, 2007). A large number of proteins are involved in these steps, and their interactions are carefully controlled to facilitate the delivery of mRNPs to the nuclear pore complex. Loss of control at any point can result in mRNPs being degraded by cellular mechanisms before they are exported (Houseley et al., 2006).
An essential component of early mRNA biogenesis is the THO complex, which in the yeast Saccharomyces cerevisiae is composed of Tho2p, Hpr1p, Tex1p, Mft1p and Thp2p. The exact mechanism by which it functions is unknown, but it is thought to bind RNA polymerase II during transcription via the polyphosphorylated C-terminal domain of the polymerase (Meinel et al., 2013). THO has also been shown to bind Yra1p and Sub2p to form a complex known as TREX (Strässer et al., 2002). The THO complex also mediates interactions with several additional proteins, promoting their co-transcriptional recruitment to nascent mRNA transcripts (Hurt et al., 2004; Zenklusen et al., 2002). Depletion and/or knockout of individual THO complex components in vivo has revealed that THO is involved not only in mRNA biogenesis but also in preserving genome integrity (Aguilera, 2005; Huertas & Aguilera, 2003).
Tho1 was identified as a multicopy suppressor of hpr1Δ (Jimeno et al., 2002; Piruat & Aguilera, 1998) and was thought to function in a similar manner to the yeast protein Sub2. Studies revealed that Tho1, like Sub2, can assemble onto nascent mRNA during transcription and that Tho1 and Sub2 can provide alternative pathways for mRNP biogenesis in the absence of a functional THO complex (Jimeno et al., 2006). THO1 null mutants did not show a distinct phenotype, and thus the function of Tho1 in vivo remains unclear. However, the ability of Tho1 to suppress hpr1Δ was shown to reside in the RNA-binding C-terminal region. Our study has determined the solution structures of both the N-terminal SAP domain, which has been shown to bind DNA in other proteins (Göhring et al., 1997), and the C-terminal domain thought to be responsible for RNA binding. The SAP domain contains a helix-extended-loop-helix motif similar to those found in other members of this family and binds DNA. The C-terminal region adopts a helical fold similar to that of the WHEP RNA-binding domains of metazoan aminoacyl-tRNA synthetases (Cahuzac et al., 2000).

Domain architecture of Tho1
The domain architecture of yeast Tho1 was analyzed using JPred (Cuff et al., 1998) and Phyre (Kelley & Sternberg, 2009) to identify regions that are likely to have a discrete fold.
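Secondary-structure predictors such as JPred report a per-residue state string (for example H for helix, E for strand and - for coil). The sketch below assumes that format and shows one way to extract contiguous predicted helices as candidate domain boundaries. The example string is hypothetical and is not the Tho1 prediction.

```python
from itertools import groupby

def helical_segments(ss, min_len=4):
    """Return 1-based (start, end) ranges of runs of 'H' at least min_len long."""
    segments = []
    pos = 1  # 1-based index of the first residue in the current run
    for state, run in groupby(ss):
        length = sum(1 for _ in run)
        if state == "H" and length >= min_len:
            segments.append((pos, pos + length - 1))
        pos += length
    return segments

# Hypothetical prediction string (not the actual Tho1 output).
prediction = "--HHHHH---HHHH-HH--"
print(helical_segments(prediction))  # [(3, 7), (11, 14)]
```

Grouping neighbouring helices into a single folded domain remains a judgement call. The sequence and disorder analysis described in the next section, for the extra C-terminal helix predicted only by Phyre, is an example of that kind of judgement.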

Expression and purification of Tho1 N-terminal and C-terminal domains
DNA fragments encoding the N- and C-terminal domains of Tho1 were amplified from S. cerevisiae genomic DNA by PCR and cloned into a modified pRSETA (Invitrogen) expression vector that produces proteins fused to the N-terminally His₆-tagged lipoyl domain of Bacillus stearothermophilus dihydrolipoamide acetyltransferase. The resulting plasmids were transformed into Escherichia coli C41(DE3) cells. Cells were grown in 2×TY medium at 37 °C to mid-log phase and induced with 1 mM IPTG. The temperature was then reduced to 22 °C and growth was continued for a further 16 h. Isotopically labelled domains were prepared by growing cells in K-MOPS minimal medium (Neidhardt et al., 1974) containing ¹⁵NH₄Cl and/or [¹³C]-glucose. Cells were lysed by sonication and the fusion proteins were purified by Ni²⁺-NTA affinity chromatography. The purified proteins were dialyzed overnight in the presence of TEV protease, which cleaves the fusion proteins after the lipoyl domain. A second Ni²⁺-NTA affinity-chromatography step was carried out to remove the lipoyl domain. The domains were further purified by gel filtration on a HiLoad 26/60 Superdex 75 column (GE Healthcare). The elution volumes of both domains were consistent with their being monomeric. All buffer solutions were prepared with double-deionized water.
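Whether an elution volume is "consistent with a monomer" is usually judged against a calibration of the column with standard proteins. The sketch below assumes the common linear relation between log₁₀(MW) and the partition coefficient Kav = (Ve − V0)/(Vt − V0). The void volume, total volume and standard elution volumes are hypothetical placeholders, not values from this study.

```python
import math

# Hypothetical column parameters (mL); replace with values measured on the actual column.
V0 = 110.0  # void volume, e.g. determined with Blue Dextran
VT = 320.0  # total bed volume

# Hypothetical calibration standards: (elution volume in mL, molecular mass in kDa).
STANDARDS = [
    (160.0, 44.0),  # ovalbumin
    (175.0, 29.0),  # carbonic anhydrase
    (205.0, 13.7),  # ribonuclease A
    (240.0, 6.5),   # aprotinin
]

def kav(ve, v0=V0, vt=VT):
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)

def fit_calibration(standards):
    """Least-squares fit of log10(MW) = intercept + slope * Kav."""
    xs = [kav(ve) for ve, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def apparent_mw(ve, calibration):
    """Apparent molecular mass (kDa) corresponding to an elution volume."""
    slope, intercept = calibration
    return 10 ** (intercept + slope * kav(ve))

calibration = fit_calibration(STANDARDS)
# Compare with the mass calculated from the construct sequence: a ratio near 1
# is what "consistent with a monomer" usually means.
print(f"apparent mass at 215 mL: {apparent_mw(215.0, calibration):.1f} kDa")
```

Elongated or partly disordered constructs elute earlier than globular proteins of the same mass. The apparent mass from such a calibration is therefore an estimate rather than a measurement.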

NMR spectroscopy
Protein samples for NMR spectroscopy were typically at 1.5 mM in 90% H₂O/10% D₂O containing 20 mM potassium phosphate pH 6.5, 100 mM NaCl, 5 mM β-mercaptoethanol. All spectra were acquired at 25 °C on Bruker DRX800, DRX600 or DMX500 spectrometers equipped with pulsed-field-gradient triple-resonance probes and were referenced relative to external sodium 2,2-dimethyl-2-silapentane-5-sulfonate (DSS) for ¹H and ¹³C signals or liquid ammonia for ¹⁵N signals. Assignments were obtained by standard NMR methods using ¹³C/¹⁵N-labelled, ¹⁵N-labelled, 10% ¹³C-labelled and unlabelled protein samples (Bax et al., 1991; Englander & Wand, 1987). Backbone assignments were obtained from the following standard set of two-dimensional and three-dimensional heteronuclear spectra: ¹H-¹⁵N HSQC, HNCACB, CBCA(CO)NH, HNCACO, HNCO, HBHA(CO)NH and ¹H-¹³C HSQC. Additional assignments were made using two-dimensional TOCSY and DQF-COSY spectra. Distance constraints were derived from two-dimensional NOESY spectra recorded with a mixing time of 120 ms. Torsion-angle constraints were obtained from an analysis of C′, N, Cα, Hα and Cβ chemical shifts using TALOS (Cornilescu et al., 1999). Stereospecific assignments of Hβ resonances, determined from DQF-COSY and HNHB spectra, were confirmed by analyzing the initial ensemble of structures. The Val γ-methyl and Leu δ-methyl resonances were assigned stereospecifically using a fractionally ¹³C-labelled protein sample (Neri et al., 1989). Once all NOEs had been assigned and initial structures had been calculated, hydrogen-bond constraints were included for those backbone amide protons whose signals were still detected after 10 min in a two-dimensional ¹H-¹⁵N HSQC spectrum recorded in D₂O at 278 K. For these hydrogen-bond donors, identified by the H-D exchange experiments, candidate acceptors were identified using HBPLUS. When two or more candidate acceptors were found for the same donor in different structures, the most frequently occurring candidate was used.

To monitor the interaction of the SAP domain with DNA and RNA, ¹⁵N HSQC spectra of 200 µM Tho1 SAP domain were recorded in the presence of 200 µM self-complementary oligonucleotides corresponding to either a typical histone cluster scaffold attachment region sequence (5′-AGAAAATAATAAAATAAAACTAGCTATTTTATATTTTTTC-3′) or a random dsDNA sequence (5′-TCCTGATCAGGA-3′). Potential binding to dsRNA was also tested using the 30-mer dsRNA oligonucleotide 5′-GGACAGCUGUCCCUUCGGGGACAGCUGUCC-3′. Potential binding of the C-terminal domain to nucleic acids was assessed from ¹⁵N HSQC spectra of 200 µM C-terminal Tho1 domain recorded in the absence and presence of several 200 µM RNA and DNA oligonucleotides, including 18-mer polyA, polyU, polyG and polyC, the 30-mer dsRNA 5′-GGACAGCUGUCCCUUCGGGGACAGCUGUCC-3′ and the 20-mer ssRNA 5′-CUUGUACAUAGUUGGCCAUA-3′.
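In titrations of this kind, binding is typically judged from changes in amide ¹H and ¹⁵N chemical shifts between the free and oligonucleotide-containing spectra. The sketch below assumes the widely used combined shift Δδ = √(Δδ_H² + (α·Δδ_N)²) with a scale factor α = 0.14 and a mean + 1 SD cutoff. These choices, the peak-list layout and the example shifts are assumptions for illustration, not the analysis reported here.

```python
import math
import statistics

def combined_shift(d_h, d_n, alpha=0.14):
    """Weighted amide chemical-shift change (ppm); alpha scales the 15N change."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)

def perturbed_residues(free, bound, alpha=0.14, n_sd=1.0):
    """Compare peak lists {residue: (delta_H, delta_N)} given in ppm.

    Returns per-residue combined shift changes and the residues whose change
    exceeds mean + n_sd * (population) standard deviation.
    """
    common = sorted(set(free) & set(bound))  # residues observed in both spectra
    csp = {
        r: combined_shift(bound[r][0] - free[r][0], bound[r][1] - free[r][1], alpha)
        for r in common
    }
    values = list(csp.values())
    cutoff = statistics.mean(values) + n_sd * statistics.pstdev(values)
    return csp, [r for r in common if csp[r] > cutoff]

# Hypothetical peak lists: only residue 30 moves when the oligonucleotide is added.
free = {10: (8.00, 120.0), 11: (8.20, 118.0), 12: (7.90, 121.0), 30: (8.50, 115.0)}
bound = {10: (8.00, 120.0), 11: (8.20, 118.0), 12: (7.90, 121.0), 30: (8.80, 117.0)}
csp, hits = perturbed_residues(free, bound)
print(hits)  # [30]
```

Residues whose peaks broaden beyond detection on binding drop out of `common` in this sketch. They would need to be tracked separately, because line broadening is itself a common sign of interaction.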

Cloning and domain-boundary selection
JPred and Phyre both predicted that Tho1 contains two α-helical clusters, with Phyre predicting an additional helical motif at the C-terminus (Fig. 1a). Careful analysis of the sequence and of disorder predictions suggested that the additional α-helix predicted by Phyre was unlikely to form. Consistent with this, a ¹⁵N HSQC spectrum of a clone comprising residues 51-218 showed no additional resonances in the regions expected for structured residues (Supplementary Fig. S1). A number of clones were then created to investigate the structures of the domains.

NMR assignments and data deposition
Two clones comprising residues 1-50 and residues 119-183 of S. cerevisiae Tho1 were used for NMR assignment and structural analysis. We could assign 98% of the backbone resonances (only the N-terminal amide resonance and the amide N atoms of prolines were unassigned). All observable side-chain proton resonances were assigned using a combination of homonuclear and triple-resonance experiments as described in §2.3. The ¹HN and ¹⁵N resonance assignments are indicated on the ¹H-¹⁵N HSQC spectra (Fig. 2) by the single-letter amino-acid code followed by the residue number.

Figure caption (fragment): Residues with high sequence similarity and identity are shown in closed boxes, with basic, acidic and aliphatic residues coloured blue, red and grey, respectively.

Structural studies of the N-terminal SAP domain
The structure of the SAP domain was determined using CNS v.1.2 from NOE, dihedral-angle and hydrogen-bond restraints. Owing to the compact nature of the domain, residual dipolar couplings were not measured or used. A summary of all conformational constraints and statistics is presented in Table 1. The calculated ensemble of structures and a cartoon representation of the SAP domain are shown in Figs. 3(a) and 3(b), respectively. The SAP domain is composed of two α-helices (residues 9-19 and 27-42) connected by a structured loop, forming the helix-extended-loop-helix (HEH) motif typical of this fold. The domain is structured from residue 2 onwards, whereas the C-terminal tail shows few medium-range or long-range NOEs and is disordered. The structures of several SAP domains have been determined previously, most of which have a role in DNA binding and chromosomal reorganization (Aravind & Koonin, 2000). Comparison with known structures using DALI shows that the Tho1 SAP domain is most similar to the SAP domains of SARNP (PDB entry 2do1; RIKEN Structural Genomics/Proteomics Initiative, unpublished work) and HNRNPUL1 (PDB entry 1zrj; RIKEN Structural Genomics/Proteomics Initiative, unpublished work), a protein that is also involved in the nuclear export of mRNA. DNA-binding experiments have revealed that the SAP domain of S. cerevisiae Tho1 has the potential to bind DNA (Jacobsen, 2003) but not dsRNA (Supplementary Fig. S2). Binding to the random dsDNA 12-mer was also investigated (Supplementary Fig. S3), and the SAP domain was shown to bind in a manner consistent with other SAP domains (Okubo et al., 2004).

Figure 2. Two-dimensional ¹H-¹⁵N HSQC spectra of the N-terminal SAP domain (a) and C-terminal domain (b) of S. cerevisiae Tho1 recorded at pH 6.5 and 293 K. The spectra were recorded on a Bruker DRX 500 MHz spectrometer with 1024 and 256 complex points along the t₂ and t₁ dimensions, respectively. The protein concentration was 1.5 mM in 95% H₂O/5% D₂O. Peaks are labelled with the single-letter amino-acid code followed by the residue number, as established by sequence-specific assignment of the protein backbone.

Structural studies of the C-terminal domain
CNS v.1.2 was used to determine a high-resolution solution structure of the C-terminal domain using NOE, dihedral-angle, hydrogen-bond and residual dipolar coupling (RDC) constraints. A summary of all conformational constraints and statistics is presented in Table 1. The calculated ensemble of structures and a cartoon representation of the C-terminal domain are shown in Figs. 3(c) and 3(d), respectively. The domain is composed of two antiparallel α-helices (residues 122-141 and 147-162) connected by a structured loop. Each helix has a hydrophobic face, and these faces pack against each other. The fold is further stabilized by two leucine residues in the C-terminal helix that interact with hydrophobic residues at the N-terminal end of helix 1 and the C-terminal end of helix 2. The domain is structured from residue 120 onwards, and the C-terminal tail contains a structured loop and a short α-helix (residues 170-175). Residues after Ser179 show no medium-range or long-range NOEs. A structure-comparison search using DALI revealed similarity (r.m.s.d. of 2.7 Å over 50 residues) between the helix-turn-helix motif formed by the first two helices and the fold of the WHEP RNA-binding domain, which is found in multiple copies in a number of higher eukaryotic aminoacyl-tRNA synthetases. The C-terminal region of Tho1 has been shown to bind RNA (Jimeno et al., 2006). However, although the domain contains several conserved positively charged residues (Fig. 1b), the expressed construct (residues 119-183) showed no binding to any of the RNA oligonucleotides tested (Supplementary Fig. S4). The domain may still be able to bind RNA, but the nature and sequence of the RNA required for binding are unknown. Alternatively, binding may require additional residues of Tho1 that were not included in the expression constructs used in this study.
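DALI reports an r.m.s.d. over structurally equivalent Cα atoms after optimal superposition. The sketch below computes that quantity for two coordinate sets that are already paired, using the Kabsch algorithm (NumPy assumed). It takes the residue correspondence, such as the 50 aligned residues, as given. DALI also finds that alignment itself, which this sketch does not attempt.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Minimal r.m.s.d. between paired coordinate sets P and Q (N x 3, in Å).

    Row i of P must correspond to row i of Q (e.g. aligned C-alpha atoms).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the remaining deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Hypothetical C-alpha coordinates and a rigidly moved copy of them.
rng = np.random.default_rng(0)
ca_a = rng.normal(scale=10.0, size=(50, 3))
theta = np.deg2rad(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
ca_b = ca_a @ rot.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(ca_a, ca_b))  # close to 0: a pure rotation plus translation
```

Because the superposition is optimal, the value is a lower bound on the deviation for that residue pairing. A different pairing from a different alignment can give a different r.m.s.d.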

Homologues of Tho1
A human protein, CIP29, has been proposed on the basis of sequence alignment to be a homologue of yeast Tho1 (Jimeno et al., 2006). CIP29 contains a SAP domain and interacts with DNA, RNA and UAP56, and has therefore been proposed to function in transcription, RNA splicing, RNA export or translation (Aravind & Koonin, 2000; Hashii et al., 2004; Leaw et al., 2004; Sugiura et al., 2007; Dufu et al., 2010). CIP29 was initially reported to be a cytokine-induced protein and has been linked to several cancers (Choong et al., 2001; Fukuda et al., 2002; Hashii et al., 2004; Leaw et al., 2004), although its exact function is unknown. Comparison of the sequences of other members of the CIP29/Tho1 family (Fig. 4a) reveals that the hydrophilic faces of the C-terminal ends of both helices in the helix-turn-helix motif are highly conserved. Each helix ends with a glycine residue preceded by a phenylalanine that projects into the solvent (Fig. 4b). The residue preceding the phenylalanine, and the residues one and two helical turns back from it, are also highly conserved as either arginine or lysine. This produces two very similar potential RNA-binding sites at opposite ends of the domain. These could, for example, interact with two copies of the same RNA sequence separated by a particular number of bases, or with two copies specifically oriented within a structural motif. The C-terminal region of CIP29 contains a second, closely spaced copy of this module, which can be readily identified by the lysine-arginine-phenylalanine-glycine (KRFG) sequence motif (Fig. 4c). Two copies of this motif are also present in CIP29 homologues from other animal species. The Arabidopsis Tho1 homologue MOS11 (Germain et al., 2010), together with homologous proteins from other plant species, also contains two copies of the motif but appears to lack an N-terminal SAP domain.
Given the wide distribution of proteins containing two copies of the domain, it is possible that the C-terminal copy has been lost in Tho1. In that case, only the N-terminal part of the first helix of the second domain would have been retained, in the form of the short third helix, perhaps because it contributes to the stability of the fold. If so, in proteins where both domains are present the two would be expected to be oriented orthogonally to each other. As well as binding to mRNA, all members of the Tho1/CIP29/MOS11 family characterized to date also bind to SUB2/UAP56 DEAD-box RNA helicases. As the C-terminal domain is the only strictly conserved region in this protein family, it may mediate these interactions as well.
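The [K/R]-F-G signature described above lends itself to a simple pattern search when scanning homologue sequences for copies of the module. The sketch below uses Python regular expressions. The sequence is hypothetical, and a hit is only a candidate: it still has to be checked against an alignment and the helical context described in the text.

```python
import re

def motif_hits(seq, pattern=r"[KR]FG"):
    """Return (1-based start, matched text) for each non-overlapping match in seq."""
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, seq.upper())]

# Hypothetical sequence fragment carrying two copies of the module.
fragment = "MSAKRFGLLEEAKRFGQ"
print(motif_hits(fragment))           # [(5, 'RFG'), (14, 'RFG')]
print(motif_hits(fragment, r"KRFG"))  # [(4, 'KRFG'), (13, 'KRFG')]
```

The looser `[KR]FG` pattern captures the conserved basic residue, phenylalanine and glycine at the end of each helix. The stricter `KRFG` pattern matches the lysine-arginine-phenylalanine-glycine form used above to spot the second copy in CIP29.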

Conclusions
We report here the solution structures of the N-terminal SAP domain and the C-terminal domain of yeast Tho1. These structures provide insight into the likely structures of related domains in the Tho1/CIP29/MOS11 family of proteins. The DNA-binding site of the Tho1 SAP domain was shown to be located similarly to those observed in other SAP domains. The putative RNA-binding activity of the C-terminal domain was investigated, but no binding was detected. Further work will be required to determine exactly which region of yeast Tho1 is responsible for RNA binding (Jimeno et al., 2006). It is possible that RNA binding involves a coupled folding-and-binding event with a region of Tho1 that was not investigated in this study.