research papers
A tetramerization domain in prokaryotic and eukaryotic transcription regulators homologous to p53
aExperiments Division, ALBA Synchrotron Light Source, Carrer de la Llum 2–26, 08290 Cerdanyola del Vallès, Catalunya, Spain, and bCentro de Biología Molecular `Severo Ochoa' (CSIC–UAM), Universidad Autónoma de Madrid, Calle Nicolás Cabrera 1, Canto Blanco, 28049 Madrid, Spain
*Correspondence e-mail: rboer@cells.es
Transcriptional regulation usually requires the action of several proteins that either repress or activate a promoter of an open reading frame. These proteins can counteract each other, thus allowing tight regulation of the transcription of the corresponding genes, where tight repression is often linked to DNA looping or cross-linking. Here, the tetramerization domain of the bacterial gene repressor Rco from Bacillus subtilis plasmid pLS20 (RcopLS20) has been identified and its structure is shown to share high similarity with the tetramerization domain of the well-known p53 family of human tumor suppressors, despite lacking clear sequence homology. In RcopLS20, this tetramerization domain is responsible for inducing DNA looping, a process that involves multiple tetramers. Consistent with this, it is shown that RcopLS20 can form octamers. This domain was named TetDloop and its occurrence was identified in other Bacillus species. The TetDloop fold was also found in the structure of a transcriptional repressor from Salmonella phage SPC32H. It is proposed that the TetDloop fold arose through divergent evolution and that the TetDloop originates from a common ancestor predating the occurrence of multicellular life.
PDB reference: tetramerization domain of Rco, 8bny
1. Introduction
Proper gene regulation is essential for every organism to adjust the expression profile of its encoded genes to changing environmental conditions. The mechanisms of transcriptional regulation are very different in prokaryotes and eukaryotes, and are generally more complex in the latter. For example, in eukaryotes the genomes are packed in more sophisticated ways and transcriptional regulators rely less on sequence specificity for DNA response elements (REs; Youssef et al., 2019). This higher level of sophistication is most likely related to intercellular communication in multicellular organisms, as becomes clear when considering situations in which disturbed gene regulation causes disease (Lee & Young, 2013). The well studied human tumor suppressor protein p53 illustrates the importance of proper transcriptional regulation (Vogelstein et al., 2000; Lane & Levine, 2010). Mutations in this protein have been associated with tumorigenesis for over four decades (Rivlin et al., 2011; Perri et al., 2016).
The active control of the conformation of the DNA duplex regulates access to promoter regions and thereby makes an important contribution to gene regulation. An example of this is long-range dsDNA looping, which is not to be confused with the formation of short hairpin loops in ssDNA. In long-range looping, two DNA elements are brought into close proximity by introducing a kink or strong bend in the DNA located between the two elements. DNA looping occurs frequently in both prokaryotes and eukaryotes (Cournac & Plumbridge, 2013; Vilar & Saiz, 2005; Morelli et al., 2009). Examples of transcriptional regulation through DNA looping in prokaryotes are found in the metabolic genes ara, gal, lac and deo and in phage systems (for a review, see Matthews, 1992). In addition, the human tumor suppressor protein p53, which represents a large family of homologs in metazoa, induces the DNA bending required for binding (McKinney & Prives, 2002). Interestingly, all of these regulators form homotetramers and have been shown to be essential for DNA looping, even though the folds of the proteins involved do not share any structural similarity. This suggests that the tetrameric state provides a particular functional advantage in the formation of DNA loops, which may be related to the cooperative stabilization conferred by having two DNA-recognition anchoring points at both ends of the loop region.
p53 has been studied extensively, and many functions have been attributed to this protein in processes such as human development, DNA repair and metabolism (Gaglia et al., 2013; Lane & Levine, 2010). It comprises the following domains and regions: two transcription-activation domains (TADs; residues 1–40 and 40–61), a proline-rich region (residues 64–92), a DNA-binding domain (DBD; residues 98–303), a nuclear localization signal-containing region (residues 303–323), an oligomerization domain (residues 323–365) and a C-terminal basic domain (residues 363–393). The TADs are important for the transactivation of different target genes. The DBD recognizes the p53 recognition element (p53RE), which consists of two copies of a 5′-RRRCWWGYYY-3′ sequence (IUPAC nomenclature, where R is A or G, Y is C or T and W is A or T) separated by a spacer of 0–13 bp (Brázda & Coufal, 2017; El-Deiry et al., 1992). The oligomerization domain induces tetramerization and possibly other oligomeric states of p53, as has been amply documented (reviewed in Chène, 2001). We will refer to this domain as p53Tet. The C-terminal domain is highly charged due to its high lysine content and is intrinsically disordered. This domain has a dual function: its positive charge enhances the affinity for DNA, and at the same time it is responsible for recruiting other factors. p53 is also able to form DNA loops and link DNA across large distances through the combined action of the DBD and the p53Tet domain, which itself does not bind DNA but rather links DBD-bound DNAs (Stenger et al., 1994; Kearns et al., 2016; Brázda & Coufal, 2017). Many of the oncogenic mutations occur in the DBD (Perri et al., 2016), but several have been mapped to the tetramerization domain (reviewed in Chène, 2001; see references in Kamada et al., 2011; Petitjean et al., 2007). In addition, mutations in p53Tet cause Li–Fraumeni syndrome, a hereditary disease that confers a high predisposition to develop early-onset neoplasms (Etzold et al., 2015).
Here, we report a structural analog of the p53 tetramerization domain in a transcriptional regulator from the conjugative plasmid pLS20 of the Gram-positive (G+) bacterium Bacillus subtilis. This transcriptional regulator, named Rco (Singh et al., 2013), forms tetramers in solution (Ramachandran et al., 2014; Crespo et al., 2020; Singh et al., 2013) and represses the main conjugation promoter Pc that controls the transcription of genes 28–74. The promoter contains 11 copies of the sequence 5′-CAGTGAAA-3′ and variations thereof (Ramachandran et al., 2014), which are likely to be the recognition sites for RcopLS20. RcopLS20 controls its own expression by regulating the activity of the overlapping and divergently oriented Pr promoter (reviewed in Meijer et al., 2021). The simultaneous regulation of the Pc and Pr promoters involves RcopLS20-mediated DNA looping, which is achieved through the binding of RcopLS20 to two operators separated by 75 bp and located near the Pc and Pr promoters, each containing multiple direct repeats of the Rco-recognition element (rcoRE; Ramachandran et al., 2014). One of the operators, OII, overlaps with the Pc and Pr promoters and contains at least six direct repeats of the rcoRE. The other operator, OI, contains at least four direct repeats of the rcoRE, which are convergently oriented with respect to the rcoREs in operator OII. DNA recognition by Rco is most likely achieved through a conserved helix–turn–helix (HTH) domain at its N-terminus, which is followed by a C-terminal sequence of hitherto unknown function. Given the presence of at least ten rcoREs in the promoter region to which Rco binds, at least two Rco tetramers can be expected to bind to this region.
We have previously solved the structure of RappLS20 (Crespo et al., 2020), the response regulator that binds RcopLS20, thereby activating the conjugation promoter and allowing expression of the conjugation operon (Singh et al., 2013). The aim of this study was to understand the structural mechanism of transcriptional regulation by RcopLS20. For this purpose, we further analyzed the behavior of RcopLS20 and identified its tetramerization domain. We confirm the formation of tetramers and show that RcopLS20 can also form octamers under specific conditions. Furthermore, we present the crystal structure of the tetramerization domain of RcopLS20. The domain encompasses 35 residues of the C-terminal region and is structurally homologous to the tetramerization domain of the human tumor suppressor p53. As this fold is implicated in the formation of DNA loops, we designate domains that have this fold TetDloop. We define a motif for TetDloop and suggest that the TetDloop domain is ubiquitous among all kingdoms of life.
The implications of the occurrence of TetDloop in prokaryotes and eukaryotes are profound. First, the structures suggest that the fold precedes the appearance of multicellular life, a possibility that had not previously been considered (see, for example, Joerger et al., 2014). This implies that these proteins share a common ancestor and hence are ubiquitous. Second, it suggests that a basic paradigm of gene regulation through DNA looping is shared by prokaryotes and eukaryotes, and we believe that the possible parallels between the mechanisms of action of p53 and RcopLS20 should be investigated further.
2. Materials and methods
2.1. Production and purification of RcopLS20
Cloning, expression, isolation and purification of RcopLS20 were performed as described previously (Crespo et al., 2020). Typically, a yield of 20 mg RcopLS20 was obtained from 10 g cell pellet. Purity was assessed to be >95% by SDS–PAGE followed by Coomassie Blue staining. Protein concentration was determined from the absorbance at 280 nm on a Nanodrop 2000 spectrophotometer (ThermoFisher Scientific) using an extinction coefficient ɛ(1%) of 2.93. Protein was used for assays immediately where possible or stored in aliquots at −80°C.
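The concentration measurement above uses the Beer–Lambert law with a percent extinction coefficient, i.e. the A280 of a 1% (10 mg ml−1) solution over a 1 cm path. A minimal sketch of the conversion follows, assuming path-length-normalized absorbances; the optional molar conversion assumes the 20.32 kDa monomer mass quoted for FL RcopLS20 in Section 3.2, and the function names are illustrative only.

```python
"""Protein concentration from A280 using a percent extinction coefficient (sketch)."""

EPS_1PCT = 2.93  # epsilon(1%) for RcopLS20, as given in Section 2.1


def conc_mg_per_ml(a280: float, eps_1pct: float = EPS_1PCT, path_cm: float = 1.0) -> float:
    """Concentration in mg/ml.

    epsilon(1%) is the absorbance of a 10 mg/ml solution over 1 cm, so
    A = eps_1pct * (c / 10) * path  =>  c = 10 * A / (eps_1pct * path).
    """
    return 10.0 * a280 / (eps_1pct * path_cm)


def molar_eps(eps_1pct: float, mw_da: float) -> float:
    """Convert epsilon(1%) to a molar coefficient in M^-1 cm^-1.

    10 mg/ml equals 10 / mw_da mol/l, hence eps_molar = eps_1pct * mw_da / 10.
    """
    return eps_1pct * mw_da / 10.0


if __name__ == "__main__":
    # Hypothetical reading: A280 = 1.0 over a 1 cm path
    print(f"{conc_mg_per_ml(1.0):.3f} mg/ml")  # 3.413 mg/ml
    # Assumes a 20.32 kDa monomer
    print(f"{molar_eps(EPS_1PCT, 20320.0):.0f} M^-1 cm^-1")  # 5954 M^-1 cm^-1
```

Working with ε(1%) avoids needing the molecular weight for routine measurements; the molar form is only needed when mixing protein with DNA at a defined stoichiometry.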
2.2. Crystallization of RcoTetpLS20
RcopLS20 was concentrated to 17 mg ml−1 using an Amicon Ultra 15 ml centrifugal filter (Merck Millipore) with a 10 kDa molecular-weight cutoff in a buffer consisting of 250 mM NaCl, 20 mM Tris pH 8.0. Concentrated RcopLS20 was gently mixed with a previously annealed double-stranded oligonucleotide (Biomers.net, Germany) with forward sequence 5′-GTCAGTGAAAAA-3′ in a 2:1 (protein:DNA) stoichiometry. The crystals giving the highest resolution data were obtained by the sitting-drop vapor-diffusion method at 18°C by equilibration of drops consisting of 100 nl protein solution and 100 nl crystallization buffer [0.1 M HEPES pH 7.5, 28%(v/v) PEG 600] against 100 µl crystallization buffer in the reservoir. The crystals took three months to grow and were cryocooled for X-ray diffraction data collection by direct transfer from the crystallization drop into liquid nitrogen.
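The 2:1 protein:DNA mixing step reduces to simple molar arithmetic. The sketch below assumes, since the text does not say, that the ratio counts Rco monomers (20.32 kDa each) per DNA duplex; the duplex stock concentration and volumes in the example are hypothetical.

```python
"""Protein:DNA mixing arithmetic for cocrystallization (sketch)."""

MONOMER_DA = 20320.0  # FL RcopLS20 monomer mass (20.32 kDa, Section 3.2)


def micromolar(conc_mg_per_ml: float, mw_da: float) -> float:
    """Convert mg/ml (= g/l) to µM."""
    return conc_mg_per_ml / mw_da * 1e6


def dna_stock_volume_ul(protein_ul: float, protein_um: float,
                        dna_stock_um: float, protein_per_dna: float = 2.0) -> float:
    """Volume of duplex stock giving the requested protein:DNA molar ratio.

    protein_ul * protein_um is an amount in pmol; dividing by the ratio gives
    the pmol of duplex required, and dividing by the stock concentration
    (µM = pmol/µl) gives µl.
    """
    dna_pmol = protein_ul * protein_um / protein_per_dna
    return dna_pmol / dna_stock_um


if __name__ == "__main__":
    rco_um = micromolar(17.0, MONOMER_DA)  # roughly 837 µM monomer at 17 mg/ml
    # Hypothetical: 10 µl of protein mixed with a hypothetical 1 mM duplex stock
    print(f"{rco_um:.0f} µM; add {dna_stock_volume_ul(10.0, rco_um, 1000.0):.2f} µl DNA")
```

If the ratio instead referred to tetramers, `protein_um` would simply be divided by four before calling `dna_stock_volume_ul`.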
2.3. Data collection and processing
Data collection was performed at 100 K on the BL13-XALOC beamline at the ALBA Synchrotron Light Source. Data were processed with AutoPROC (Vonrhein et al., 2011) using anisotropic resolution cutoffs (see Table 1). The structure was determined de novo using ARCIMBOLDO (Rodríguez et al., 2009) followed by automated model building in Phenix version 1.12-2829 (Adams et al., 2010). The structure was refined using Phenix interspersed with manual adjustments in Coot version 0.8.9.2205 (Emsley et al., 2010). The data-collection and refinement statistics are given in Table 1. The structure was deposited in the PDB with accession code 8bny.
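‡Rmeas = $\sum_{hkl}\{N(hkl)/[N(hkl)-1]\}^{1/2}\sum_{i}|I_{i}(hkl)-\langle I(hkl)\rangle| \,/\, \sum_{hkl}\sum_{i}I_{i}(hkl)$. §Rcryst = $\sum_{hkl}\bigl||F_{\mathrm{obs}}|-|F_{\mathrm{calc}}|\bigr| \,/\, \sum_{hkl}|F_{\mathrm{obs}}|$. ¶Rfree = $\sum_{hkl\in T}\bigl||F_{\mathrm{obs}}|-|F_{\mathrm{calc}}|\bigr| \,/\, \sum_{hkl\in T}|F_{\mathrm{obs}}|$, where T represents a test set comprising ∼5% of all reflections that were excluded during refinement.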
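The footnoted quality indicators Rmeas, Rcryst and Rfree can be made concrete with a short sketch using their standard definitions. It assumes that singly measured reflections are skipped in Rmeas (the N/(N−1) factor is undefined for them) and computes Rcryst over the working set, i.e. reflections not in the test set T, which is the usual convention; the structure-factor values in the example are invented.

```python
"""Merging and refinement R factors following the standard definitions (sketch)."""
import math
from typing import Dict, List, Sequence, Tuple

HKL = Tuple[int, int, int]


def r_meas(observations: Dict[HKL, List[float]]) -> float:
    """Multiplicity-corrected merging R factor.

    observations maps each unique hkl to all of its measured intensities;
    reflections measured only once are skipped.
    """
    num = den = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean_i = sum(intensities) / n
        num += math.sqrt(n / (n - 1)) * sum(abs(i - mean_i) for i in intensities)
        den += sum(intensities)
    return num / den


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float],
             include: Sequence[bool]) -> float:
    """Sum of ||Fobs| - |Fcalc|| over sum of |Fobs| for reflections flagged in include."""
    pairs = [(fo, fc) for fo, fc, keep in zip(f_obs, f_calc, include) if keep]
    return (sum(abs(abs(fo) - abs(fc)) for fo, fc in pairs)
            / sum(abs(fo) for fo, _ in pairs))


if __name__ == "__main__":
    f_obs = [100.0, 200.0, 150.0, 80.0]
    f_calc = [90.0, 220.0, 140.0, 100.0]
    in_test_set = [False, False, False, True]  # hypothetical test-set flags
    r_work = r_factor(f_obs, f_calc, [not t for t in in_test_set])
    r_free = r_factor(f_obs, f_calc, in_test_set)
    print(f"Rcryst = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because the test-set reflections never enter refinement, Rfree is an unbiased check on overfitting, which is why it is normally somewhat higher than Rcryst.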
2.4. Calculation of an AlphaFold model of full-length RcopLS20
AlphaFold version 2.1.0 (Jumper et al., 2021) was used to generate five models of full-length RcopLS20 (UniProt entry E9RIY8). The five models are essentially the same and the highest ranking model was used for analysis.
2.5. Database searches and figures
BlastP (Altschul et al., 1990) searches were performed using residues Ser124–Asp161 of the RcopLS20 sequence. Figures were prepared using PyMOL version 2.3 (Schrödinger). Superpositions were performed using the built-in `align' function in PyMOL. PDBeFold from EMBL–EBI was used to identify folds similar to that of RcopLS20 in the PDB (Krissinel & Henrick, 2005).
2.6. Size-exclusion chromatography (SEC) assays
25 µg RcopLS20 (25 µl) were injected onto a Superdex 200 Increase 5/150 GL column (GE Healthcare) equilibrated with buffers at different pH values. For pH 5, the column was equilibrated with 500 mM NaCl, 20 mM citrate buffer pH 5; for pH 8, with 500 mM NaCl, 20 mM Tris pH 8; and for pH 10, with 500 mM NaCl, 20 mM glycine–NaOH pH 10. Elution was performed at a flow rate of 0.2 ml min−1 and was continuously monitored at 280 and 260 nm. The molecular weight (MW) was estimated from the elution volume Vel of the detected peaks using an in-house calibration of the relation between log(MW) and Vel for proteins of known MW. The equation derived from this calibration was Vel = −0.6815 log(MW) + 5.1906, with R² = 0.933.
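Using the calibration means inverting it: an observed Vel gives log(MW) = (Vel − 5.1906)/(−0.6815). The sketch below assumes, as the text does not state units, that log is base 10, MW is in Da and Vel is in ml; with these units the equation places 10 to 500 kDa proteins at about 2.5 to 1.3 ml, within the roughly 3 ml bed of a 5 mm × 150 mm column, whereas MW in kDa would not.

```python
"""Apparent MW from SEC elution volume via the Section 2.6 calibration (sketch)."""
import math

SLOPE = -0.6815      # Vel = -0.6815 log(MW) + 5.1906
INTERCEPT = 5.1906


def vel_from_mw(mw_da: float) -> float:
    """Predicted elution volume (ml) for a protein of mass mw_da (assumed Da, log10)."""
    return SLOPE * math.log10(mw_da) + INTERCEPT


def mw_from_vel(vel_ml: float) -> float:
    """Invert the calibration: apparent MW (Da) from an observed elution volume (ml)."""
    return 10 ** ((vel_ml - INTERCEPT) / SLOPE)


if __name__ == "__main__":
    for mw_kda in (10, 85.43, 500):
        print(f"{mw_kda:>7} kDa -> Vel = {vel_from_mw(mw_kda * 1000):.2f} ml")
```

Because the relation is logarithmic, a small error in reading Vel translates into a proportionally larger error in MW, which is worth keeping in mind given the R² of 0.933.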
3. Results
3.1. Crystal structure of the tetramerization domain of RcopLS20
Crystallization trials on apo full-length (FL) RcopLS20 were not successful, but cocrystallization of FL RcopLS20 with various DNA sequences did yield crystals. However, structure determination revealed that the crystals contained only part of RcopLS20, corresponding to residues Val125–Lys159 of the FL protein, and no DNA (Fig. 1). Since the missing part of the protein cannot fit into the asymmetric unit, it is likely that the protein was degraded during the course of the crystallization experiment, which often occurs with multidomain proteins. Indeed, size-exclusion chromatography (SEC) analysis of the protein confirmed that degradation occurred over time (Supplementary Fig. S1), leading to the appearance of fragments with a molecular weight compatible with the crystallized fragment. We show below that the crystallized fragment indeed corresponds to one of the domains of RcopLS20, which we will refer to as RcoTetpLS20. The asymmetric unit contains four monomers forming one crystallographically independent tetramer.
Each monomer consists of an elongated sequence comprising residues Val125–Thr132, which includes a four-amino-acid β-strand formed by residues Arg127–Asp130. This strand is followed by a sharp turn facilitated by the glycine residue Gly133, which is followed by an α-helix comprising residues Glu136–Lys157 (Fig. 1a). The RcoTetpLS20 tetramer consists of a dimer of primary dimers. The primary homodimer is formed by the arrangement of two β-strands from two monomers in an antiparallel fashion and by concomitant antiparallel packing of the helices against one face of the two β-strands (Fig. 1c). The α-helices interact with one of the faces of the β-strands through hydrophobic interactions involving Val131, Phe129, Leu134, Ile139, Val142, Ile146 and Leu149 (Fig. 1b). At the center of the exposed helical face, a cluster of charged residues is formed by Arg141 and Glu145. The hydrophobic Leu148 residues are located at both extremes of this cluster along the α-helix (Figs. 1c and 1d), which connects to the hydrophobic cluster through Leu149. The tetramer is formed through interactions of the Arg141, Glu145 and Leu148 residues, which we will refer to as the REL motif, from the four monomers (Figs. 1c and 1d). The carboxylic acid groups of the Glu145 residues interact through hydrogen bonds (Fig. 1d).
The overall shape of the tetramer is reminiscent of an octagon (Fig. 1e, left panel). The C-terminal ends of the four helices form two pairs of opposed vertices of the octagon, and the remaining four vertices are formed by the N-termini of the β-strands. Thus, the two pairs of N-terminal DBDs are expected to be located on opposite sides of the RcoTetpLS20 domain. The lateral edges of the octagon formed by the β-strands are hydrophobic (Fig. 1e), which is expected to contribute to interactions with the DBDs based on analogy with other structures (see below). The planar faces of the octagonal box are formed by α-helices (Fig. 1e).
3.2. pH-dependent behavior of RcopLS20
RcopLS20 tetramerization seems to be driven mainly by the charged interactions of the REL motif (Figs. 1c and 1d). This prompted us to study the behavior of full-length RcopLS20 at pH 5, pH 8 and pH 10. We found that RcopLS20 tends to form higher order oligomers under alkaline conditions (pH 10; Fig. 1f), whereas under acidic conditions (pH 5) disruption of the tetramer is observed (Fig. 1f). It is likely that the carboxylates of the central Glu145 residues are neutralized by protonation at pH 5. This would disrupt the counter-charge-stabilized hydrogen-bonding network between the clustered Glu145 and Arg141 residues at the tetramerization interface shown in Fig. 1(d).
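How completely Glu145 is neutralized at pH 5 depends on its local pKa, which is not reported here. The Henderson–Hasselbalch sketch below makes that dependence explicit; both pKa values are illustrative assumptions: about 4.25 is a commonly quoted value for a free glutamate side chain, and 6.0 is a hypothetical upshift of the kind that clustered carboxylates can show.

```python
"""Fraction of a carboxylate that is protonated at a given pH (sketch)."""


def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


if __name__ == "__main__":
    # 4.25: typical free Glu side-chain value (assumption);
    # 6.0: hypothetical upshifted value for a clustered carboxylate.
    for pka in (4.25, 6.0):
        row = ", ".join(f"pH {ph}: {fraction_protonated(ph, pka):.3f}" for ph in (5, 8, 10))
        print(f"pKa {pka}: {row}")
```

With the unperturbed value only about 15% of the side chains would be protonated at pH 5, whereas the upshifted value gives about 91%; at pH 8 and 10 both remain essentially fully deprotonated. The sketch does not establish which case applies to Glu145.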
The MWs of the species that form at pH 5, pH 8 and pH 10 were estimated by calibrating the SEC column with a set of proteins of known MW, each eluted individually under equivalent conditions (Fig. 1f). The MW estimates are 56.99 kDa at pH 5, 85.43 kDa at pH 8 and 151.57 kDa at pH 10 (Supplementary Table S1). Given that the MW of FL RcopLS20 is 20.32 kDa, the elution peaks correspond to two to three FL protein molecules at pH 5, four at pH 8 and eight at pH 10 (Supplementary Table S1). It is unlikely that RcopLS20 forms a trimer at pH 5; it is far more likely that protonation of the Glu145 residues at pH 5 disrupts the tetramerization interface, resulting in dimers. The slight deviation of the elution pattern from that expected for a dimer at pH 5 may result from an altered surface charge under acidic conditions, which may affect interactions with the agarose–dextran matrix of the column, or from a mixture of dimers and tetramers that yields an averaged peak position. These data show that the protein prefers to form octamers at pH 10 even at low concentrations, and these octamers can be isolated by SEC. No direct evidence for hexamers, heptamers or complexes larger than octamers was observed by SEC under the conditions tested.
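The oligomer assignments rest on dividing each apparent MW by the monomer mass. The short sketch below reproduces that arithmetic with the values given above; the helper name is illustrative.

```python
"""Apparent SEC molecular weight expressed in monomer equivalents (sketch)."""

MONOMER_KDA = 20.32  # FL RcopLS20
APPARENT_KDA = {5: 56.99, 8: 85.43, 10: 151.57}  # SEC estimates reported above


def monomer_equivalents(apparent_kda: float, monomer_kda: float = MONOMER_KDA) -> float:
    """Ratio of apparent MW to monomer MW; SEC shape effects make this non-integer."""
    return apparent_kda / monomer_kda


if __name__ == "__main__":
    for ph, mw in APPARENT_KDA.items():
        print(f"pH {ph}: {mw} kDa = {monomer_equivalents(mw):.2f} monomers")
```

This yields 2.80, 4.20 and 7.46 monomer equivalents at pH 5, 8 and 10. Since SEC reports hydrodynamic rather than true mass, the assignment of these ratios to a dimer/tetramer mixture, a tetramer and an octamer is an interpretation informed by the structure, not a rounding step.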
3.3. Comparison of the TetDloop folds of structural homologs
A search of the PDB (Berman et al., 2000) for structural homologs of RcoTetpLS20 using PDBeFold (Krissinel & Henrick, 2005) resulted in several significant hits, all of which corresponded to the human tumor suppressor p53 and its homologs (Supplementary Table S2). The basic fold of all these structures consists of a pair of short β-strands followed by an α-helix connected by a kinked loop (Figs. 1a and 2 and Supplementary Fig. S2). Hydrophobic residues line the internal surfaces of the β-strands and the α-helix, thereby forming the hydrophobic core of the structure. Both the β-strands and the α-helix have similar lengths, and structural differences are mainly found in the angle between the β-strands and the α-helix, which appears to be determined by the residues forming the kinked loop. In the structures identified and analyzed here this angle is smaller than 27° when a glycine is present in the loop, whereas it ranges from 34° to 62° when this glycine is lacking, as is the case for Drosophila melanogaster p53 (Dmp53) and CEP-1 (Supplementary Fig. S2). Due to the variation in this angle, the monomers of the structural homologs generally do not superpose, which complicates the detection of structural similarity. Strikingly, the structure of RepSPC32H (Kim et al., 2016), a repressor encoded by Salmonella phage SPC32H, reveals a tetramerization domain that is structurally similar to RcoTetpLS20 and p53Tet. This domain has also been termed CAD and, apart from inducing tetramers in RepSPC32H, it is also responsible for its interaction with the antirepressor Ant (Kim et al., 2016). We propose that it is a TetDloop and will refer to it as such here. This structure did not appear in the PDBeFold search using RcoTetpLS20 as a query as described above. Instead, RepSPC32H was identified as an RcopLS20 analog based on the following shared features. Firstly, the architecture of the two full-length proteins is similar and contains a DBD at the N-terminus and a TetDloop at the C-terminus.
RepSPC32H additionally contains a dimerization domain (MDD) situated between the DBD and the TetDloop (Kim et al., 2016). Secondly, the DBDs are homologous in sequence. Thirdly, RepSPC32H is a transcriptional repressor and, together with its antirepressor Ant, exhibits a regulatory mechanism similar to that of the RappLS20–RcopLS20 pair. RepSPC32H prevents entry into the lytic cycle of the phage by tightly repressing the genes essential for lytic growth. However, the structural mechanism of gene repression by this protein has not been extensively characterized.
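The β-strand/α-helix angles compared in this section can be estimated from Cα coordinates. The text does not describe how the angles were measured, so the sketch below uses one simple, hypothetical approach: each element's axis is taken as the vector between the centroids of its first and last k Cα atoms, and the acute angle between the two axes is reported. The coordinates in the example are idealized and invented; real coordinates would come from a PDB parser.

```python
"""Acute angle between a beta-strand axis and an alpha-helix axis (sketch)."""
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def centroid(points: Sequence[Vec]) -> Vec:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def axis(ca_coords: Sequence[Vec], k: int) -> Vec:
    """Crude axis: centroid of the last k CA atoms minus centroid of the first k.

    For a helix, k = 4 averages over roughly one turn (3.6 residues) and so
    smooths out the helical wobble of individual CA positions.
    """
    start, end = centroid(ca_coords[:k]), centroid(ca_coords[-k:])
    return (end[0] - start[0], end[1] - start[1], end[2] - start[2])


def acute_angle_deg(u: Vec, v: Vec) -> float:
    """Angle between two axes in degrees, ignoring their direction (0 to 90)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(min(1.0, abs(dot) / norm)))


if __name__ == "__main__":
    # Idealized, hypothetical coordinates: a strand along x and a helix axis 30 deg away.
    strand = [(3.3 * i, 0.0, 0.0) for i in range(4)]
    theta = math.radians(30.0)
    helix = [(1.5 * i * math.cos(theta), 1.5 * i * math.sin(theta), 0.0) for i in range(12)]
    print(f"{acute_angle_deg(axis(strand, 1), axis(helix, 4)):.1f} deg")
```

Reporting the acute angle makes the result independent of chain direction, which suits a comparison across homologs whose strands and helices may be traced in either sense.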
3.4. Relative spatial position of the TetDloop and additional domains
The angle between the β-strands and the α-helix in the monomer ultimately determines the relative orientation of the two dimers within the tetramer. This further complicates the detection of structural similarity between homologs. The tetramer interface is entirely hydrophobic for p53 (Fig. 2b) and all of its homologs, except for Dmp53, which has a charged core similar to that of RcoTetpLS20, consisting of interacting glutamate residues lined with arginines in a two-layered configuration, as shown for RcoTetpLS20 in Figs. 1 and 2(a). Interestingly, the β-strand and α-helix are duplicated in the sequence of Dmp53, and the tetramer interface is therefore formed by eight helices, i.e. two helices from each of the four monomers. The exact nature of the amino acids involved in hydrophobic tetramerization interfaces can vary, as exemplified by RepSPC32H and p53: the former uses Phe187 and Phe190 of the four monomers to form this interface, whereas in p53 a cluster of leucine and methionine residues is found.
The quaternary structures of the TetDloop of p53 and RcopLS20 are similar (Fig. 2), despite the difference in the relative orientations of the dimers across the tetramerization interface. The configuration of the interaction between primary dimers causes the N-termini of the TetDloop to point in opposite directions. This is confirmed by the structure of full-length p53 in complex with p300 reported by Ghosh et al. (2019), which shows a planar overall structure with the DBDs interacting pairwise; each pair extends away from the central p53Tet domain in opposite directions (Fig. 3a). The placement of the MDDs in the structure of RepSPC32H is similar (Fig. 3b): two pairs of adjacent MDDs are separated by ∼100 Å, as measured between the far ends of these domains. The highest ranked AlphaFold2 model of a tetramer of full-length Rco (Supplementary Fig. S3a) is consistent with the arrangement of the DBDs of p53 and the MDDs of RepSPC32H shown in Figs. 3(a) and 3(b), respectively. Interestingly, the AlphaFold2 model of the tetramer of the full-length protein (Supplementary Fig. S3b) shows that the tetramerization domain is well predicted for this state.
3.5. Structurally similar proteins show low sequence homology
The vertebrate p53 homologs generally share low sequence homology in regions other than their DNA-binding domains (Ou et al., 2007; Lu & Abrams, 2006). In line with this, RcoTetpLS20 and RepSPC32H also share low sequence homology with these proteins. For example, the residues mediating interactions between the β-strands and the α-helix in the core of the dimer are hydrophobic, but the identities of these residues are not conserved between the different TetDloop domains. Similarly, even within vertebrates the tetramerization interface has diverged among different proteins, leading to hydrophobic and/or charged tetramerization interfaces as described above. It is therefore not surprising that sequence homology is low in these structurally similar proteins.
Despite the low sequence conservation, we used the RcoTetpLS20 domain (RcopLS20 residues Ser124–Asp161) as a query in BlastP searches. As expected from the low sequence homology, none of the structural homologs described above were identified. Instead, nine nonredundant proteins with E-values ranging between 3 × 10⁻⁶ and 2 × 10⁻³ were identified, all of which are encoded by bacteria belonging to the phylum Firmicutes (Fig. 3c). Thus, the TetDloop is also found in other proteins from G+ bacteria related to B. subtilis.
4. Discussion
RcopLS20 is a transcriptional regulator that exerts its function through the formation of a DNA loop by binding to two regions separated by about 75 bp (Ramachandran et al., 2014). The HTH motif identified in the N-terminal region is likely to be involved in recognition of the rcoREs, but the structural basis of looping was not well understood until now. Our present results show that the master regulator RcopLS20, which is crucial in the transcriptional regulation of the conjugation operons of plasmid pLS20, contains a tetramerization domain that is likely to be involved in formation of the DNA loop, and that this domain has high structural homology to the tetramerization domain present in the p53 family of tumor suppressors. The structure reveals that RcoTetpLS20 forms tetramers like the p53 tumor suppressor protein family and suggests how the different DNA-recognition elements could be bridged. Identification of the tetramerization domain based on sequence was hampered by its low sequence homology with similar domains of known function.
RcopLS20 is one of many examples illustrating the intimate link between protein tetramerization and DNA looping or long-range DNA cross-linking observed in both prokaryotic and eukaryotic systems. In prokaryotes, regulators such as those of the gal and lac systems have been extensively studied (Cournac & Plumbridge, 2013; Matthews, 1992), whereas in eukaryotes the p53 homologs (Kearns et al., 2016; Stenger et al., 1994) are well known examples. To date, however, no structural homology has been reported between tetrameric, DNA-loop-inducing proteins from eukaryotes and non-eukaryotes. Remarkably, we found that the TetDloop domain observed in RcopLS20 and p53 is also present in the structure of RepSPC32H from Salmonella phage SPC32H (PDB entry 5d4z). However, structural homology of this domain to existing structures was not detected in the original study (Kim et al., 2016), although a structural comparison with PDB structures using DALI was performed. It is likely that variations in the sequence and in the relative orientation of the β-strands and α-helix hampered detection. In fact, the RepSPC32H linker consists of a short α-helix that is not present in any of the other structures. This connecting α-helix allows the β-strands and the second α-helix to adopt a nearly parallel configuration without the need for a glycine residue. Furthermore, there is no sequence homology between the dimer and tetramer interfaces of the TetDloop of RepSPC32H and those of the other proteins analyzed above.
Despite the low sequence homology, it was possible to discern motifs in the proteins that contain a charged tetramerization interface. In our analysis, a positively charged residue (Arg141 in RcoTetpLS20) is followed approximately one helical turn later (i.e. three or four positions downstream) by a glutamate (Glu145 in RcoTetpLS20). In addition, we find a hydrophobic residue two turns (+7) from the positively charged residue (Leu148 in RcoTetpLS20). We named this distinctive motif the `REL motif'. Furthermore, a glycine residue is often found in the sharp turn between the β-strand and the α-helix and may be important, as it is highly conserved in metazoan homologs of human p53 and in hits found through sequence-homology searches. From our comparison, a glycine appears to be a prerequisite for a small angle between the β-strand and the α-helix, as the lack of this residue leads to differences in the orientation of the helices and β-strands of the respective monomers. It should be noted that variations on this motif exist, as the RepSPC32H structure contains a short additional α-helix perpendicular to the preceding β-strand and the trailing α-helix (Fig. 3b). Variability is also observed for the tetramerization interface joining two dimers, which has evolved to either a completely hydrophobic interface or an interface stabilized by complementary charges.
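The REL motif as described (positive residue at i, Glu at i+3 or i+4, hydrophobic residue at i+7) can be written as a simple sequence scan. This is a sketch, not the search procedure used in this work: the set of residues counted as hydrophobic is an assumption, the glycine in the turn is not checked, and the example peptide is invented and numbered only so that a hit mimics Arg141/Glu145/Leu148.

```python
"""Scan a sequence for the REL motif as described in the text (sketch)."""
from typing import List, Tuple

POSITIVE = set("RK")          # "positively charged residue"
HYDROPHOBIC = set("AVILMFW")  # assumption: residues counted as hydrophobic at +7


def find_rel_motifs(seq: str, first_residue: int = 1) -> List[Tuple[int, int, int]]:
    """Return (positive, Glu, hydrophobic) residue numbers for every REL match.

    A match is a positively charged residue at i, a glutamate at i+3 or i+4
    (about one helical turn) and a hydrophobic residue at i+7 (about two turns).
    first_residue is the residue number of seq[0].
    """
    seq = seq.upper()
    hits = []
    for i, aa in enumerate(seq):
        if aa not in POSITIVE or i + 7 >= len(seq):
            continue
        glu = next((j for j in (i + 3, i + 4) if seq[j] == "E"), None)
        if glu is not None and seq[i + 7] in HYDROPHOBIC:
            hits.append((i + first_residue, glu + first_residue, i + 7 + first_residue))
    return hits


if __name__ == "__main__":
    # Invented peptide, numbered so that a hit mimics Arg141/Glu145/Leu148.
    print(find_rel_motifs("GSRAAAEAALKG", first_residue=139))  # [(141, 145, 148)]
```

A pattern this short will match by chance in many sequences, so in practice it would only be meaningful in combination with a predicted strand–turn–helix element.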
RcopLS20 has the capacity to form higher order complexes, as shown by the results described here and by SAXS analysis of FL RcopLS20 at neutral pH, where oligomerization was shown to be concentration-dependent (Crespo et al., 2020). The pH dependence of octamerization determined here suggests that this interface is also charged, like that of the tetramer. Analysis of the distribution of charges and hydrophobic patches across the surface of RcoTetpLS20 suggests that the octamerization interaction is likely to be mediated by the charged faces of the octagonal tetramer, given that the edges are hydrophobic. Remarkably, inspection of the interactions between symmetry-related molecules in the crystal structure of RepSPC32H reveals contacts between the TetDloops of adjacent tetramers, resulting in octamers that are in accordance with the model of octamerization through the TetDloop proposed for RcopLS20. The model that emerges for DNA binding by the tetramer is one of cooperative binding to recognition sites across the DNA loop. The initial RcopLS20–DNA complex would be formed stochastically, perhaps with the aid of helper proteins that shape the DNA at the turn and through the intrinsic propensity of the DNA to bend at the loop region (Ramachandran et al., 2014). Binding of several repressor tetramers across the loop would then stabilize the loop structure.
The striking structural similarity between the prokaryotic and eukaryotic tetramerization domains, exemplified by RcoTetpLS20 and p53, respectively, raises the question of whether they evolved independently or are derived from a common ancestor. This question cannot be answered straightforwardly, as sequence homology in these short sequences is difficult to detect. However, the conservation of the REL motif in some interfaces, coupled with the structural similarity of the core strand–kink–helix, suggests common ancestry. For example, RcoTetpLS20 and Dmp53 are structurally homologous and both have a charged tetramerization interface, albeit with duplication of the secondary-structure elements in Dmp53 as described above. In addition, a well established cofactor of p53, named Strap, contains a TPR domain (Adams et al., 2012) and interacts with p53 (Jung et al., 2007), just as the TPR-containing RappLS20 interacts with RcopLS20. Thus, Strap could be a functional homolog of RappLS20.
The combination of a shared core structure, a shared tetramerization interface and structurally similar cofactors suggests that convergent evolution is unlikely and strongly favors the hypothesis that all proteins incorporating a TetDloop have diverged from a common ancestor. The presence of the TetDloop motif across the gene pools of all kingdoms of life also supports divergent evolution, as it is unlikely to have arisen through multiple convergent events. We therefore propose that all members of this family stem from a common ancestor that included DBD and TetDloop domains. This would imply that the TetDloop fold predates the emergence of multicellularity some 3–3.5 billion years ago (Grosberg & Strathmann, 2007).
The work described here suggests that a common regulatory mechanism controlling gene transcription, mediated by a tetramerization domain that we call TetDloop, exists in all kingdoms of life. The detection of this domain in sequences and structures is complicated by low sequence conservation and by structural variation of its constituent elements. We describe the motifs that we have observed in the different structures. However, improved bioinformatics tools with greater predictive power for small domains such as TetDloop would help to detect this domain in other proteins that may share the mechanisms it confers.
Supporting information
PDB reference: tetramerization domain of Rco, 8bny
Supplementary Figures and Tables. DOI: https://doi.org/10.1107/S2059798323001298/jb5053sup1.pdf
Acknowledgements
We thank the members of the WJJM laboratory for useful discussions and for providing us with the clones. We thank the staff of the XALOC beamline and the floor coordinators at the ALBA synchrotron (Barcelona, Spain) for their support during data collection. Author contributions were as follows: DRB designed the research; NB, IC and AC performed the research; WJJM and DRB wrote the paper. The authors declare no conflicts of interest.
Funding information
This work was supported by the Ministry of Economy and Competitiveness of the Spanish Government [Grant Nos. PID2019-108778GB-C21 (AEI/FEDER, EU) to WJJM and BIO2016-77883-C2-2-P, which also supported NB, PID2020-117028GB-I00 and FIS2015-72574-EXP (AEI/FEDER, EU) to DRB]. Funding for the open-access charge was provided by the Ministry of Economy and Competitiveness of the Spanish Government (BIO2016-77883-C2-1-P and BIO2016-77883-C2-2-P). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
References
Adams, C. J., Pike, A. C. W., Maniam, S., Sharpe, T. D., Coutts, A. S., Knapp, S., La Thangue, N. B. & Bullock, A. N. (2012). Proc. Natl Acad. Sci. USA, 109, 3778–3783.
Adams, P. D., Afonine, P. V., Bunkóczi, G., Chen, V. B., Davis, I. W., Echols, N., Headd, J. J., Hung, L.-W., Kapral, G. J., Grosse-Kunstleve, R. W., McCoy, A. J., Moriarty, N. W., Oeffner, R., Read, R. J., Richardson, D. C., Richardson, J. S., Terwilliger, T. C. & Zwart, P. H. (2010). Acta Cryst. D66, 213–221.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990). J. Mol. Biol. 215, 403–410.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Brázda, V. & Coufal, J. (2017). Int. J. Mol. Sci. 19, 2737.
Chène, P. (2001). Oncogene, 20, 2611–2617.
Cournac, A. & Plumbridge, J. (2013). J. Bacteriol. 195, 1109–1119.
Crespo, I., Bernardo, N., Miguel-Arribas, A., Singh, P. K., Luque-Ortega, J. R., Alfonso, C., Malfois, M., Meijer, W. J. J. & Boer, D. R. (2020). Nucleic Acids Res. 48, 8113–8127.
Drozdetskiy, A., Cole, C., Procter, J. & Barton, G. J. (2015). Nucleic Acids Res. 43, W389–W394.
El-Deiry, W. S., Kern, S. E., Pietenpol, J. A., Kinzler, K. W. & Vogelstein, B. (1992). Nat. Genet. 1, 45–49.
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.
Etzold, A., Schröder, J. C., Bartsch, O., Zechner, U. & Galetzka, D. (2015). Fam. Cancer, 14, 161–165.
Gaglia, G., Guan, Y., Shah, J. V. & Lahav, G. (2013). Proc. Natl Acad. Sci. USA, 110, 15497–15501.
Ghosh, R., Kaypee, S., Shasmal, M., Kundu, T. K., Roy, S. & Sengupta, J. (2019). Biochemistry, 58, 3434–3443.
Grosberg, R. & Strathmann, R. (2007). Annu. Rev. Ecol. Evol. Syst. 38, 621–654.
Joerger, A. C., Wilcken, R. & Andreeva, A. (2014). Structure, 22, 1301–1310.
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., Back, T., Petersen, S., Reiman, D., Clancy, E., Zielinski, M., Steinegger, M., Pacholska, M., Berghammer, T., Bodenstein, S., Silver, D., Vinyals, O., Senior, A. W., Kavukcuoglu, K., Kohli, P. & Hassabis, D. (2021). Nature, 596, 583–589.
Jung, H., Seong, H.-A. & Ha, H. (2007). J. Biol. Chem. 282, 35293–35307.
Kamada, R., Nomura, T., Anderson, C. W. & Sakaguchi, K. (2011). J. Biol. Chem. 286, 252–258.
Kearns, S., Lurz, R., Orlova, E. V. & Okorokov, A. L. (2016). Nucleic Acids Res. 44, 6185–6199.
Kim, M., Kim, H. J., Son, S. H., Yoon, H. J., Lim, Y., Lee, J. W., Seok, Y.-J., Jin, K. S., Yu, Y. G., Kim, S. K., Ryu, S. & Lee, H. H. (2016). Proc. Natl Acad. Sci. USA, 113, E2480–E2488.
Krissinel, E. & Henrick, K. (2005). Computational Life Sciences, edited by M. R. Berthold, R. C. Glen, K. Diederichs, O. Kohlbacher & I. Fischer, pp. 67–78. Berlin, Heidelberg: Springer.
Lane, D. & Levine, A. (2010). Cold Spring Harb. Perspect. Biol. 2, a000893.
Lee, T. I. & Young, R. A. (2013). Cell, 152, 1237–1251.
Lu, W.-J. & Abrams, J. M. (2006). Cell Death Differ. 13, 909–912.
Matthews, K. S. (1992). Microbiol. Rev. 56, 123–136.
McKinney, K. & Prives, C. (2002). Mol. Cell. Biol. 22, 6797–6808.
Meijer, W. J. J., Boer, D. R., Ares, S., Alfonso, C., Rojo, F., Luque-Ortega, J. R. & Wu, L. J. (2021). Front. Mol. Biosci. 8, 648468.
Mittl, P. R. E., Chène, P. & Grütter, M. G. (1998). Acta Cryst. D54, 86–89.
Morelli, M. J., ten Wolde, P. R. & Allen, R. J. (2009). Proc. Natl Acad. Sci. USA, 106, 8101–8106.
Ou, H. D., Löhr, F., Vogel, V., Mäntele, W. & Dötsch, V. (2007). EMBO J. 26, 3463–3473.
Perri, F., Pisconti, S. & Della Vittoria Scarpati, G. (2016). Ann. Transl. Med. 4, 522.
Petitjean, A., Mathe, E., Kato, S., Ishioka, C., Tavtigian, S. V., Hainaut, P. & Olivier, M. (2007). Hum. Mutat. 28, 622–629.
Ramachandran, G., Singh, P. K., Luque-Ortega, J. R., Yuste, L., Alfonso, C., Rojo, F., Wu, L. J. & Meijer, W. J. J. (2014). PLoS Genet. 10, e1004733.
Rivlin, N., Brosh, R., Oren, M. & Rotter, V. (2011). Genes Cancer, 2, 466–474.
Rodríguez, D. D., Grosse, C., Himmel, S., González, C., de Ilarduya, I. M., Becker, S., Sheldrick, G. M. & Usón, I. (2009). Nat. Methods, 6, 651–653.
Singh, P. K., Ramachandran, G., Ramos-Ruiz, R., Peiró-Pastor, R., Abia, D., Wu, L. J. & Meijer, W. J. J. (2013). PLoS Genet. 9, e1003892.
Stenger, J. E., Tegtmeyer, P., Mayr, G. A., Reed, M., Wang, Y., Wang, P., Hough, P. V. & Mastrangelo, I. A. (1994). EMBO J. 13, 6011–6020.
Vilar, J. M. & Saiz, L. (2005). Curr. Opin. Genet. Dev. 15, 136–144.
Vogelstein, B., Lane, D. & Levine, A. J. (2000). Nature, 408, 307–310.
Vonrhein, C., Flensburg, C., Keller, P., Sharff, A., Smart, O., Paciorek, W., Womack, T. & Bricogne, G. (2011). Acta Cryst. D67, 293–302.
Youssef, N., Budd, A. & Bielawski, J. P. (2019). Methods Mol. Biol. 1910, 3–31.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.