Received 26 August 2013
A family portrait: structural comparison of the Whirly proteins from Arabidopsis thaliana and Solanum tuberosum
DNA double-strand breaks are highly detrimental genomic lesions that routinely arise in genomes. To protect the integrity of their genetic information, all organisms have evolved specialized DNA-repair mechanisms. Whirly proteins modulate DNA repair in plant chloroplasts and mitochondria by binding single-stranded DNA in a non-sequence-specific manner. Although most of the results showing the involvement of the Whirly proteins in DNA repair have been obtained in Arabidopsis thaliana, only the crystal structures of the potato Whirly proteins WHY1 and WHY2 have been reported to date. The present report of the crystal structures of the three Whirly proteins from A. thaliana (WHY1, WHY2 and WHY3) reveals that these structurally similar proteins assemble into tetramers. Furthermore, structural alignment with a potato WHY2-DNA complex reveals that the residues in these proteins are properly oriented to bind single-stranded DNA in a non-sequence-specific manner.
In plant mitochondria and plastids, DNA lesions such as DNA double-strand breaks (DSBs) can be repaired conservatively through homologous recombination (reviewed in Maréchal & Brisson, 2010). Under some conditions, however, the DNA-repair machinery is unable to cope with all DNA damage and non-conservative repairs occur (Abdelnoor et al., 2003; Cappadocia et al., 2010; Kwon et al., 2010; Maréchal et al., 2009; Shedge et al., 2007). Whirly proteins are negative regulators of a non-conservative repair pathway named microhomology-mediated break-induced replication (MMBIR; Maréchal et al., 2009; Cappadocia et al., 2010). Whirly proteins are mainly found in the plant kingdom, where they localize to either chloroplasts (specialized mature plastids responsible for photosynthesis) or mitochondria (reviewed in Desveaux et al., 2005). In solution, the recombinant proteins form tetramers (Desveaux et al., 2002; Cappadocia et al., 2010) that bind single-stranded DNA with low sequence specificity (Cappadocia et al., 2010). In the absence of the chloroplast-directed Whirly proteins, the plastid genome of both Arabidopsis thaliana and Zea mays (maize) becomes unstable and accumulates DNA rearrangements that contain microhomologies at their endpoint junction (Maréchal et al., 2009). Also, the treatment of Arabidopsis plants with ciprofloxacin, an inhibitor of DNA gyrases that induces DSBs in both plastids and mitochondria (Rowan et al., 2010; Parent et al., 2011), results in an increase in microhomology-mediated DNA rearrangements in the plastids of plants lacking WHY1 and WHY3 (Cappadocia et al., 2010). WHY2, the Whirly protein directed to mitochondria in Arabidopsis, also appears to be involved in repressing the MMBIR pathway in the mitochondria, although to a lesser extent (Cappadocia et al., 2010).
The crystal structures of the Solanum tuberosum (potato) WHY1 and WHY2 proteins were solved in the free form (Desveaux et al., 2002; Cappadocia et al., 2008, 2010). These structures revealed that both chloroplast-directed and mitochondria-directed Whirly proteins assemble into tetramers with C4 symmetry. Each Whirly domain consists of two four-stranded β-sheets that stack at a 90° angle and two α-helices. The α-helices constitute the core of the proteins, against which the β-sheets stack in a whirligig-like manner. The crystal structures of WHY2 in complex with different DNA molecules revealed how this protein binds ssDNA with low sequence specificity (Cappadocia et al., 2010). Specifically, the DNA is sandwiched between the β-sheets of adjacent subunits, thereby preventing spurious contacts between the nucleobases and the protein surface. This mode of binding is consistent with the DNA-repair role assigned to the Whirly proteins.
To date, no structure of an Arabidopsis Whirly protein has been elucidated. This is unfortunate, as a large proportion of the results concerning Whirly proteins were obtained using Arabidopsis as a model organism. In this manuscript, we report the crystallization and crystal structure determination of the Arabidopsis proteins WHY1, WHY2 and WHY3.
Unless specified otherwise, the Whirly proteins mentioned hereafter refer to Arabidopsis.
The DNA fragments encoding the Whirly domains of WHY1, WHY2 and WHY3 (UniProt accession Nos. Q9M9S3, Q8VYF7 and Q66GR6-2, respectively) were amplified by PCR and cloned into the pET-21a vector (Novagen). WHY1 (residues 74-241) and WHY3 (residues 78-245) were cloned between the NdeI and NotI restriction sites, thus adding a methionine at the N-termini of the proteins and an AAALEHHHHHH sequence at their C-termini. WHY2 (residues 45-212) was cloned between the NdeI and XhoI restriction sites, thus adding a methionine at the N-terminus of the protein and an LEHHHHHH sequence at its C-terminus. The sequences were confirmed by DNA sequencing.
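The construct design described above can be summarized in a short sketch. The residue ranges and tag sequences are taken from the text; the function names and the placeholder domain sequence are hypothetical and only serve to check the expected construct lengths.

```python
# Tags added by each cloning strategy, as described in the text:
# (N-terminal addition, C-terminal addition).
TAGS = {
    "NdeI/NotI": ("M", "AAALEHHHHHH"),  # used for WHY1 and WHY3
    "NdeI/XhoI": ("M", "LEHHHHHH"),     # used for WHY2
}


def domain_length(first: int, last: int) -> int:
    """Number of residues in an inclusive residue range."""
    return last - first + 1


def expressed_sequence(domain: str, sites: str) -> str:
    """Return the expressed protein: start Met + Whirly domain + tag."""
    n_term, c_term = TAGS[sites]
    return n_term + domain + c_term


if __name__ == "__main__":
    # "X" is a placeholder; the real domain sequences are not reproduced here.
    for name, (first, last), sites in [
        ("WHY1", (74, 241), "NdeI/NotI"),
        ("WHY2", (45, 212), "NdeI/XhoI"),
        ("WHY3", (78, 245), "NdeI/NotI"),
    ]:
        domain = "X" * domain_length(first, last)
        print(name, len(domain), len(expressed_sequence(domain, sites)))
```

Under these assumptions, each Whirly domain spans 168 residues, so the expressed WHY1 and WHY3 constructs contain 180 residues and the WHY2 construct contains 177.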
The expression plasmids were transformed into Escherichia coli strain BL21(DE3). The cells were grown at 310 K in Luria-Bertani broth. When the cells reached an OD₆₀₀ of 0.6-1.0, isopropyl β-D-1-thiogalactopyranoside was added to a final concentration of 1 mM. After cell growth overnight at 303 K, the cells were harvested and lysed by alumina grinding. The lysate was resuspended in 20 mM sodium phosphate pH 7.5, 500 mM NaCl, 25 mM imidazole. The recombinant proteins were purified by applying the supernatant from the cell lysate onto a HisTrap Chelating nickel-affinity column (GE Healthcare). The proteins were further purified using a Superdex 200 16/60 size-exclusion column (GE Healthcare) pre-equilibrated in a buffer consisting of 10 mM Tris-HCl pH 8.0, 100 mM NaCl. The proteins were concentrated using Millipore 10K concentrators and the protein concentration was determined using the Bicinchoninic Acid (BCA) Protein Assay Kit (Pierce). The proteins were diluted to a final concentration of 20 mg ml⁻¹.
Crystals were typically grown from a hanging drop at 296 K by mixing 3 µl purified protein with 3 µl reservoir solution and allowing the system to equilibrate by vapour diffusion. Initial conditions for growing WHY1 crystals were obtained using the PEG Screen (NeXtal). These conditions were refined to 5%(v/v) PEG 3350, 0.2 M potassium acetate, 0.1 M MES pH 5.5. Crystals of WHY2 were obtained using conditions similar to those used to obtain potato WHY2 crystals (Cappadocia et al., 2008). The conditions for WHY2 were refined to 15%(v/v) PEG 3350, 0.1 M MOPS pH 7.0. Initial conditions for growing WHY3 crystals were obtained from the WHY1 conditions and were refined to 14%(v/v) PEG 1000, 0.2 M potassium acetate, 0.1 M sodium citrate pH 4.2.
Crystal cryoprotection was achieved by using a modified mother-liquor solution in which the PEG concentration was raised to 25%(v/v). The crystals were mounted in CryoLoops (Hampton Research) and flash-cooled in a stream of nitrogen gas at 100 K. 360 frames were recorded using an oscillation range of 0.5° and crystal-to-detector distances of 240, 227 and 227 mm for WHY1, WHY2 and WHY3, respectively. Diffraction data were collected using an ADSC Quantum 315 CCD detector on beamline X29 of the National Synchrotron Light Source (NSLS) at Brookhaven National Laboratory (BNL, USA). The data were processed, indexed and scaled using either HKL-2000 (Otwinowski & Minor, 1997) or XDS (Kabsch, 2010) and SCALA (Evans, 1993).
The structures of WHY1 and WHY2 were solved by molecular replacement using PHENIX (Adams et al., 2010) with the crystal structures of potato WHY1 (PDB entry 1l3a; Desveaux et al., 2002) and potato WHY2 (PDB entry 3n1h; Cappadocia et al., 2010) as search models, respectively. The structure of WHY3 was also solved by molecular replacement using the refined structure of WHY1 as a search model. The models were improved by iterative model building in Coot (Emsley et al., 2010) and refinement in PHENIX.
Whirly proteins are typically composed of a transit peptide that targets them to chloroplasts or mitochondria, a Whirly domain that has ssDNA-binding capacity and an acidic aromatic C-terminal tail. As both the transit peptide and the acidic aromatic C-terminal tail are predicted to be flexible in solution and could interfere with the crystallization process, they were excluded from the constructs.
WHY1 crystallized in space group C222₁ with four protein molecules in the asymmetric unit. The crystals of WHY1 had a Matthews coefficient V_M of 2.67 Å³ Da⁻¹ (considering a molecular mass of 19 978.9 Da) and diffracted to 1.88 Å resolution (Table 1). The WHY2 crystals belonged to space group P2₁2₁2₁, had a Matthews coefficient V_M of 2.00 Å³ Da⁻¹ (considering a molecular mass of 19 711.6 Da), diffracted to 1.75 Å resolution and contained four protein molecules in the asymmetric unit (Table 1). For WHY3, a splicing variant of the protein that does not possess a serine residue at position 175 was chosen for structure determination, as crystallization of the protein containing the serine residue led to perfectly twinned crystals that hampered structure determination. WHY3 crystallized in space group P42₁2 with one molecule in the asymmetric unit. These crystals had a Matthews coefficient V_M of 2.58 Å³ Da⁻¹ (considering a molecular mass of 19 990.9 Da) and diffracted to 1.85 Å resolution (Table 1).
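As an illustration of how the Matthews coefficients quoted above relate to crystal packing, the following minimal sketch computes V_M from a unit-cell volume and estimates the solvent fraction with the widely used approximation 1 - 1.23/V_M. The function names are hypothetical, and the unit-cell volumes themselves (listed in Table 1) are not reproduced here, so only the reported V_M values are used in the example.

```python
def matthews_coefficient(cell_volume_A3: float, n_sym_ops: int,
                         n_mol_asu: int, mass_da: float) -> float:
    """V_M in A^3/Da: cell volume per dalton of protein in the unit cell."""
    return cell_volume_A3 / (n_sym_ops * n_mol_asu * mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction, assuming a typical protein
    partial specific volume (the usual 1.23/V_M rule of thumb)."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    # V_M values as reported in the text.
    for name, vm in [("WHY1", 2.67), ("WHY2", 2.00), ("WHY3", 2.58)]:
        print(f"{name}: V_M = {vm:.2f} A^3/Da, "
              f"solvent ~ {100 * solvent_fraction(vm):.1f}%")
```

With this approximation, the reported V_M values correspond to solvent contents of roughly 54, 38.5 and 52% for WHY1, WHY2 and WHY3, respectively.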
WHY1, WHY2 and WHY3 all exhibit the canonical Whirly fold. In the WHY2 structure, the four proteins present in the asymmetric unit form a Whirly tetramer with fourfold symmetry (Fig. 1a). For WHY1 and WHY3, this same quaternary arrangement can also be generated by applying the appropriate crystallographic symmetry (Figs. 1b and 1c). This, together with previous reports of Whirly proteins forming tetramers (Desveaux et al., 2002; Cappadocia et al., 2010), supports the idea that plant Whirly proteins minimally fold as tetramers.
Figure 1
Tetramers of (a) WHY2, (b) WHY1 and (c) WHY3 in cartoon representation. WHY2 tetramers are present in the asymmetric unit. Tetramers of WHY1 and WHY3 were generated by applying the appropriate crystallographic symmetries.
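Generating a tetramer from a single subunit by symmetry can be sketched as follows. This is a simplified illustration that assumes an idealized fourfold axis lying along z and passing through the origin; real crystallographic operators depend on the space group and generally include translations, and the function names here are hypothetical.

```python
import math


def rotate_about_z(point, angle_deg):
    """Rotate an (x, y, z) point about the z axis by angle_deg degrees."""
    x, y, z = point
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return (c * x - s * y, s * x + c * y, z)


def build_tetramer(subunit):
    """Return four copies of a subunit related by 0, 90, 180 and 270 degree
    rotations about z. `subunit` is a list of (x, y, z) coordinates."""
    return [[rotate_about_z(p, 90 * k) for p in subunit] for k in range(4)]


if __name__ == "__main__":
    # Toy subunit with two atoms; coordinates are arbitrary.
    toy_subunit = [(10.0, 0.0, 0.0), (12.0, 3.0, 5.0)]
    for k, copy in enumerate(build_tetramer(toy_subunit)):
        print(k, [tuple(round(v, 3) for v in p) for p in copy])
```

In practice, this step is performed by crystallographic software using the symmetry operators of the relevant space group rather than by a hand-written rotation.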
The Whirly domains of chloroplast-directed Whirly proteins from Arabidopsis and potato reveal good conservation at the sequence level (Fig. 2a). Structurally, WHY1 and WHY3 are also similar, with an r.m.s.d. for equivalent Cα positions of 1.1-1.6 Å between the four WHY1 subunits and WHY3 (Fig. 2b). The determination of the crystal structures of WHY1 and WHY3 offers the possibility of comparing them with the structure of potato WHY1 (PDB entry 1l3a; Fig. 2b). The two WHY1 models display closely related folds, with an r.m.s.d. for equivalent Cα positions of 1.1-1.5 Å when superposing Arabidopsis and potato subunits. For comparison, the r.m.s.d. between individual subunits in potato WHY1 alone varies between 0.8 and 1.0 Å and it varies between 0.4 and 0.8 Å for Arabidopsis WHY1 subunits. The main difference between the structures is a shift of 4-7 Å (depending on the subunit) in the position of the 174-185 loop of WHY1. Together with our previous report (Cappadocia et al., 2010), the present results suggest strong similarity of Arabidopsis WHY1, WHY3 and potato WHY1 at both the sequence and the structure levels.
Figure 2
(a) Sequence alignment of the Whirly domains of chloroplast-directed Whirlies. AtWhy1, Arabidopsis WHY1; AtWhy3, Arabidopsis WHY3; StWhy1, S. tuberosum WHY1. (b) Structural alignment of chloroplast-directed Whirly proteins in cartoon representation with WHY1 in green, WHY3 in cyan and potato WHY1 (PDB entry 1l3a) in orange.
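The r.m.s.d. values quoted for equivalent Cα positions can be illustrated with a minimal sketch. It assumes that the two coordinate sets have already been superposed and that atoms are listed in matching order; the superposition step itself, which is normally handled by structure-alignment software, is not shown, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates that are already superposed and paired."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates in angstroms; not taken from any deposited structure.
    subunit_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    subunit_b = [(0.0, 0.5, 0.0), (3.8, -0.5, 0.0), (7.6, 0.5, 0.0)]
    print(f"r.m.s.d. = {rmsd(subunit_a, subunit_b):.2f} A")
```

Because every pair contributes equally, a single displaced loop (such as the 174-185 loop of WHY1) can raise the overall value noticeably even when the rest of the fold superposes closely.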
The Whirly domains of mitochondria-directed Whirly proteins from Arabidopsis and potato also reveal good conservation at the sequence level (Fig. 3a). The four subunits of WHY2 display a high degree of structural variation, with an r.m.s.d. ranging from 1.4 to 4.8 Å. However, most of this variation is limited to a β-hairpin encompassing residues 74-90, which exhibits significant flexibility (Fig. 3b). Indeed, in the absence of this region WHY2 displays a lower degree of structural variation, with an r.m.s.d. ranging from 0.7 to 1.9 Å. Residues 74 and 90 appear to act as a hinge enabling a near-70° rotation of the β-hairpin relative to the core of the protein. This is the first time that such a large movement has been reported for a protein with a Whirly fold. Except for this β-hairpin, Arabidopsis WHY2 displays great structural similarity to its potato homologue (PDB entry 3n1h), with an r.m.s.d. varying between 0.9 and 1.9 Å for individual subunits (Fig. 3b). This suggests that the mitochondria-directed Whirly proteins adopt similar structures.
Figure 3
(a) Sequence alignment of the Whirly domains of mitochondria-directed Whirlies. AtWhy2, Arabidopsis WHY2; StWhy2, S. tuberosum WHY2. (b) Structural alignment of mitochondria-directed Whirly proteins in cartoon representation with WHY2 in yellow and potato WHY2 (PDB entry 3n1h) in light blue. The black arrows point to the β-hairpin encompassing residues 74-90.
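One simple way to quantify a hinge motion such as the rotation of the β-hairpin is to compare the direction of the hairpin in two subunits after superposing their cores. The sketch below assumes that each direction is given as a vector from a hinge residue to the hairpin tip; this definition, the function name and the example vectors are illustrative assumptions rather than the procedure used for the reported value.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Toy hairpin directions in two superposed subunits (arbitrary values).
    hairpin_a = (1.0, 0.0, 0.0)
    hairpin_b = (math.cos(math.radians(70)), math.sin(math.radians(70)), 0.0)
    print(f"hinge rotation ~ {angle_between(hairpin_a, hairpin_b):.1f} degrees")
```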
Following the elucidation of the crystal structure of potato WHY2 bound to ssDNA, we proposed a general model for the binding of ssDNA by Whirly proteins (Cappadocia et al., 2010). Our present report of the crystal structures of the three Arabidopsis Whirly proteins offers a unique opportunity to verify the actual scope of this model. With this aim, we generated the Whirly tetramers by applying the crystallographic symmetry when necessary and aligned the three Arabidopsis Whirly structures with that of the potato complex (Fig. 4a). Generating the complete tetramer is important as each ssDNA-binding site encompasses two adjacent subunits. Only three clashes were observed between the Arabidopsis Whirly structures and the ssDNA. Importantly, these clashes were only observed for the side-chain moieties and could be prevented in all cases if the side chain adopted a different rotameric configuration. We also observed that the residues involved in ssDNA binding were either conserved or replaced by residues with similar biochemical propensities at equivalent positions in WHY1, WHY2 or WHY3, notably residues equivalent to Phe64, His139 and Lys153 of potato WHY2. The case of residue Trp100 (potato WHY2 nomenclature), however, merits further consideration. This residue interacts with the ssDNA nucleobases through hydrophobic interactions (Fig. 4b). In Arabidopsis WHY2 this residue is replaced by a methionine that can fulfil a similar role. In WHY1 and WHY3 this residue is replaced by an alanine. This enables a rotation of the tryptophan at position 110 that would place this residue in a good conformation to interact with the ssDNA nucleobases through hydrophobic interactions. Globally, our results suggest that all Arabidopsis Whirly proteins can interact with ssDNA through similar binding interfaces.
Figure 4
(a) Alignment of Arabidopsis Whirly ssDNA-binding sites. Proteins are shown in cartoon representation and residues equivalent to those of potato WHY2 (PDB entry 3n1i) that contact ssDNA are shown in stick representation with C atoms in grey. Those of WHY1, WHY2 and WHY3 are shown in cyan, magenta and green, respectively. The potato WHY2 nomenclature was used for clarity. The ssDNA is shown in stick representation with its C atoms in yellow. (b) Close-up of residues Trp100 and Trp110. The representation is similar to that in (a).
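The search for clashes between the aligned Arabidopsis structures and the ssDNA from the potato complex can be sketched as a simple distance check. The 2.2 Å cutoff, the atom labels and the function name are assumptions chosen for illustration; the criterion actually used to define a clash is not stated in the text.

```python
import math


def find_clashes(protein_atoms, dna_atoms, cutoff=2.2):
    """Return (protein_label, dna_label, distance) for every atom pair
    closer than `cutoff` angstroms. Atoms are (label, (x, y, z)) tuples.
    The default cutoff is a hypothetical value, not taken from the paper."""
    clashes = []
    for p_label, p_xyz in protein_atoms:
        for d_label, d_xyz in dna_atoms:
            distance = math.dist(p_xyz, d_xyz)
            if distance < cutoff:
                clashes.append((p_label, d_label, distance))
    return clashes


if __name__ == "__main__":
    # Toy atoms with arbitrary coordinates; labels are illustrative only.
    protein = [("side-chain atom 1", (0.0, 0.0, 0.0)),
               ("side-chain atom 2", (8.0, 0.0, 0.0))]
    dna = [("base atom 1", (1.5, 0.0, 0.0)),
           ("base atom 2", (8.0, 4.0, 0.0))]
    for p_label, d_label, distance in find_clashes(protein, dna):
        print(f"{p_label} clashes with {d_label} at {distance:.2f} A")
```

A check of this kind only flags close contacts; deciding whether a clash can be relieved by an alternative side-chain rotamer, as noted above, requires inspecting the model.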
We have elucidated the structures of the three Arabidopsis Whirly proteins. These structures demonstrate a high degree of structural similarity between plant Whirly proteins but also capture previously unforeseen movements of certain structural elements. The high structural similarity of plant Whirly proteins suggests that the capacity of Whirly proteins to interact with ssDNA is dependent on a preformed DNA-binding platform.
The research carried out at the National Synchrotron Light Source (Brookhaven National Laboratory) was supported by the US Department of Energy, Division of Materials Sciences and Division of Chemical Sciences. Assistance by the X29 beamline personnel is gratefully acknowledged. LC was supported by a scholarship from the Fonds Québécois de la Recherche sur la Nature et les Technologies. This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada to both NB and JS.
Abdelnoor, R. V., Yule, R., Elo, A., Christensen, A. C., Meyer-Gauen, G. & Mackenzie, S. A. (2003). Proc. Natl Acad. Sci. USA, 100, 5968-5973.
Adams, P. D. et al. (2010). Acta Cryst. D66, 213-221.
Cappadocia, L., Maréchal, A., Parent, J.-S., Lepage, E., Sygusch, J. & Brisson, N. (2010). Plant Cell, 22, 1849-1867.
Cappadocia, L., Sygusch, J. & Brisson, N. (2008). Acta Cryst. F64, 1056-1059.
Desveaux, D., Allard, J., Brisson, N. & Sygusch, J. (2002). Nature Struct. Biol. 9, 512-517.
Desveaux, D., Maréchal, A. & Brisson, N. (2005). Trends Plant Sci. 10, 95-102.
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486-501.
Evans, P. R. (1993). Proceedings of the CCP4 Study Weekend. Data Collection and Processing, edited by L. Sawyer, N. Isaacs & S. Bailey, pp. 114-122. Warrington: Daresbury Laboratory.
Kabsch, W. (2010). Acta Cryst. D66, 125-132.
Kwon, T., Huq, E. & Herrin, D. L. (2010). Proc. Natl Acad. Sci. USA, 107, 13954-13959.
Maréchal, A. & Brisson, N. (2010). New Phytol. 186, 299-317.
Maréchal, A., Parent, J.-S., Véronneau-Lafortune, F., Joyeux, A., Lang, B. F. & Brisson, N. (2009). Proc. Natl Acad. Sci. USA, 106, 14693-14698.
Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307-326.
Parent, J.-S., Lepage, E. & Brisson, N. (2011). Plant Physiol. 156, 254-262.
Rowan, B. A., Oldenburg, D. J. & Bendich, A. J. (2010). J. Exp. Bot. 61, 2575-2588.
Shedge, V., Arrieta-Montiel, M., Christensen, A. C. & Mackenzie, S. A. (2007). Plant Cell, 19, 1251-1264.