Structural analysis of DNA–protein complexes regulating the restriction–modification system Esp1396I

Comparison of bound and unbound DNA in protein–DNA co-crystal complexes reveals insights into controller-protein binding and DNA distortion in transcriptional regulation.


Introduction
Bacterial restriction-modification (RM) systems act as a form of primitive immune system and prevent the establishment of foreign DNA (such as bacteriophages and plasmids) within bacteria (Wilson & Murray, 1991). It has been proposed that RM systems play a key role during the process of horizontal gene transfer between bacteria (Akiba et al., 1960). An RM system comprises two complementary enzymes: a methyltransferase (M) to label 'self' DNA and an endonuclease (R) to cleave unlabelled ('non-self') DNA (Wilson & Murray, 1991). The plasmid-borne type II RM system Esp1396I has been well studied both in vitro and in vivo and reveals a temporal control mechanism that employs a controller protein (C-protein) encoded within the RM operon (Cesnaviciene et al., 2003; Bogdanova et al., 2008, 2009). This temporal control is necessary for the correct function of RM systems and to prevent auto-restriction (i.e. endonucleolytic cleavage of the bacterial chromosome and pEsp1396I plasmid).
The controller protein C.Esp1396I, like all other C-proteins studied to date, has been shown to be a homodimeric helix-turn-helix protein that binds to pseudo-symmetrical DNA operator sequences (Ball et al., 2009; McGeehan et al., 2005; Streeter et al., 2004; Kita et al., 2002; Sawaya et al., 2005). In C.Esp1396I and similar systems, it has been proposed that each DNA operator site comprises two 'C-boxes' having pseudo-dyad symmetry with the consensus sequence GACT and a short spacer sequence in between them that generally consists of alternating purine-pyrimidine sequences (Streeter et al., 2004; Knowle et al., 2005; Sorokin et al., 2009). Subsequently, it was found that the only specific contacts between C.Esp1396I and the C-boxes are to the GAC bases, so the C-box is perhaps better described as the trinucleotide GAC (and its symmetry-related sequence GTC) with the two C-boxes being separated by the spacer TATA, at least in the optimal binding site. In addition, there are sequence-specific contacts to a conserved TG motif outside the C-boxes.
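The symmetry relationship between the two C-box half-sites can be checked with a reverse-complement calculation. The following is a minimal sketch for illustration only; the function names are hypothetical and are not taken from the original work.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_dyad_symmetric(seq: str) -> bool:
    """True if a sequence reads the same on both strands (perfect dyad)."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    # The trinucleotide C-box and its symmetry-related partner.
    print(reverse_complement("GAC"))
    # The earlier four-base consensus.
    print(reverse_complement("GACT"))
    # The central spacer of the optimal site is itself dyad-symmetric.
    print(is_dyad_symmetric("TATA"))
```

The same helper can be used to derive the complementary strand of any oligonucleotide written 5' to 3'.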
C.Esp1396I binds to three subtly different DNA sequences with vastly different affinities (Kd between 1 and 230 nM) that are located upstream of the C/R and M genes: OM (which regulates the expression of the M gene), and OL and OR (which together control the expression of both the C and R genes) (Bogdanova et al., 2009; Fig. 1). The X-ray crystal structure of C.Esp1396I has been determined to high resolution as the free protein (Ball et al., 2009) and as various protein-DNA complexes (McGeehan et al., 2008; Ball et al., 2012). All of the C-protein-DNA complex structures reveal distortion of the DNA helix owing to compression of the minor groove, which is either induced or stabilized by the bound C-protein. Owing to symmetry-related averaging of the tetrameric C-protein-DNA complex in the crystal structure (McGeehan et al., 2008), further studies employed just single operator sites, to which a single C-protein dimer bound (McGeehan et al., 2012). The OL sequence yielded the highest resolution C-protein-DNA complex structure to date and showed the binding interface in great detail. The subsequent OM C-protein-DNA complex revealed conformational flexibility within the protein structure, enabling the protein to recognize different sequences but with quite different affinities. In contrast, the DNA was shown to have an almost identical structure in each case, with the overall bend angle being very similar to that of OL and closely resembling that observed in the C/R tetrameric complex.
Here, we present two novel crystal structures that show the operator DNA structure corresponding to the OR binding site, the lowest-affinity site of the three for C.Esp1396I. Each of these two structures, termed 19OR and 25OL, is a nucleoprotein complex comprising a C-protein dimer and a DNA duplex. The 19OR structure includes the entire OR C-protein binding site. The 25OL DNA sequence includes the OL sequence plus half of the OR C-protein binding site. The 25OL complex allows the observation of part of the free (unbound) OR sequence, unlike the previously published 35OL+R complex that has the complete OR sequence. In the latter complex, owing to the high cooperativity between sites, the C-protein forms a tetramer (i.e. two dimers) on the 35OL+R DNA (McGeehan et al., 2008).

Crystallization
Expression and purification of native C.Esp1396I was carried out as described previously (McGeehan et al., 2008). In brief, C.Esp1396I was overexpressed in Escherichia coli strain BL21 (DE3) pLysS using the pET-28b vector to introduce an N-terminal six-histidine sequence. C.Esp1396I was purified using nickel-affinity chromatography and size-exclusion chromatography. Prior to the crystallization trials, the six-histidine tag was removed using thrombin. The DNA oligonucleotides for crystallization of the 19OR complex (5'-TGTGTGATTATAGTCAACA-3' and its complementary strand) and the 25OL complex (5'-ATGTGACTTATAGTCGTGTGATTA-3' and its complementary strand) were synthesized by ATDBio and Eurogentec, respectively, and were purified by RP-HPLC. The complementary oligonucleotides were annealed by heating to 353 K followed by cooling and the duplexes were further purified using gel electrophoresis. Initial cocrystallization was carried out using a HoneyBee X8 crystallization robot (Cronus Technologies) and sparse-matrix screening using the PACT Premier and JCSG+ screens (Molecular Dimensions Ltd) at varying molar ratios of C.Esp1396I to DNA duplex. Crystals of the 19OR complex formed by vapour diffusion in 0.1 M propionic acid, sodium cacodylate and bis-tris propane (PCB) buffer pH 4.0 with 25%(w/v) PEG 3350 at a molar protein:DNA ratio of 1:1. However, these crystals were of insufficient size for diffraction experiments, so a microseeding approach was employed (D'Arcy et al., 2007). This produced much larger crystals in 0.1 M PCB buffer pH 5.0, 25%(w/v) PEG 3350, 10 mM spermidine. The crystals were confirmed to contain both protein and DNA by washing them and subsequently dissolving them in dH2O before taking a UV absorbance spectrum. Crystals of the 25OL complex formed in 0.1 M PCB buffer pH 4.0, 20%(w/v) PEG 1500, 10 mM spermidine at a molar protein:DNA ratio of 2:1.

X-ray diffraction data collection and refinement
The 19OR and 25OL crystals were transferred to a cryoprotectant containing 25%(v/v) glycerol or 20%(v/v) PEG 400, respectively, and flash-cooled in liquid nitrogen. For the 19OR crystal, 180 images of 1° oscillation were collected on beamline I02 at the Diamond Light Source (DLS), Oxfordshire at a wavelength of 0.98 Å using an ADSC Quantum 315r CCD detector at 100 K. For the 25OL crystal, 120 images of 1° oscillation were collected using an ADSC Q4R CCD detector at 100 K on beamline ID14-4 at the ESRF, Grenoble.
The data were processed using either MOSFLM (Leslie, 1992) and AIMLESS (Evans, 2006, 2011) or XDS and XSCALE (Kabsch, 2010) and a molecular-replacement solution was found by Phaser (McCoy et al., 2007) using the native free protein structure as a search model (Ball et al., 2009; PDB entry 3g5g). The DNA was built by hand in Coot (Emsley & Cowtan, 2004) and was subsequently refined using REFMAC5 (Murshudov et al., 2011) and phenix.refine (Afonine et al., 2005). Data-processing and refinement statistics are summarized in Table 1. The completed models were deposited in the PDB with accession codes 4i8t (19OR) and 4iwr (25OL).

X-ray diffraction and structure solution
The 19OR crystals showed weak isotropic diffraction extending to 3 Å resolution. The scaling program AIMLESS (Evans, 2006, 2011; Winn et al., 2011) gave a high Rmerge for the outer shell, but inspection of the electron-density maps and use of the CC1/2 metric (Karplus & Diederichs, 2012) gave a clear indication that the data were acceptable to 3 Å resolution, with a final Rwork and Rfree of 0.28 and 0.36 for the outer shell. The structure was refined in space group C2 with one copy of the complex per asymmetric unit (Fig. 2). The resulting 2Fo − Fc maps were of good quality for the resolution (Fig. 3).
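CC1/2 is the Pearson correlation coefficient between the merged intensities of two randomly chosen halves of the data, and Karplus & Diederichs (2012) relate it to an estimate of the correlation with the true signal, CC* = sqrt(2 CC1/2 / (1 + CC1/2)). The sketch below illustrates the calculation for one resolution shell. The function names are hypothetical, the input is assumed to be two equal-length lists of half-dataset intensities for the same reflections, and it does not reproduce the AIMLESS implementation.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def cc_star(cc_half):
    """CC* = sqrt(2*CC1/2 / (1 + CC1/2)); defined here for 0 <= CC1/2 <= 1."""
    if not 0.0 <= cc_half <= 1.0:
        raise ValueError("CC1/2 must lie between 0 and 1 for this estimate")
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))


if __name__ == "__main__":
    # Invented intensities for one shell, for illustration only.
    half_1 = [120.0, 35.0, 410.0, 8.0, 77.0]
    half_2 = [131.0, 29.0, 395.0, 15.0, 60.0]
    cc_half = pearson_cc(half_1, half_2)
    print(f"CC1/2 = {cc_half:.3f}, CC* = {cc_star(cc_half):.3f}")
```

Because CC1/2 is computed per shell, it can remain informative in an outer shell where Rmerge is already high.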
The best crystals of the 25OL complex diffracted to ~2.3 Å resolution. The structure was refined in space group P3₂ with two copies of the complex per asymmetric unit. The DNA was easily modelled into the electron density for the section that was bound to C.Esp1396I. However, owing to the high degree of flexibility of the additional six base pairs, these were more difficult to model and were primarily based on the positions of the backbone phosphate groups since these gave much higher peaks in the electron density relative to the bases. This flexibility resulted in B factors of approximately 130 Å² in this unbound section of the DNA compared with an average B factor of approximately 15 Å² in the protein-bound portion of the DNA (Supplementary Fig. S1). DNA groove-width analysis (Fig. 4) was performed using the Curves+ server (Lavery et al., 2009).
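A comparison of average B factors between the bound and unbound portions of a DNA chain can be made directly from the ATOM records of a coordinate file. The following is a minimal sketch that relies on the fixed-column PDB format (chain identifier in column 22, residue number in columns 23-26, B factor in columns 61-66). The function name, the chain letter and the residue ranges in the usage example are hypothetical and are not taken from the deposited entries.

```python
def mean_b_factor(pdb_lines, chain, first_res, last_res):
    """Mean B factor (in A^2) of atoms in one chain over a residue range.

    pdb_lines: iterable of text lines in fixed-column PDB format.
    chain: single-character chain identifier.
    first_res, last_res: inclusive residue-number range.
    """
    values = []
    for line in pdb_lines:
        # Only coordinate records carry a B factor.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Skip truncated records that do not reach the B-factor field.
        if len(line) < 66:
            continue
        if line[21] != chain:
            continue
        res_seq = int(line[22:26])
        if first_res <= res_seq <= last_res:
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no atoms matched the requested chain and range")
    return sum(values) / len(values)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical chain and ranges): python bfac.py model.pdb
    with open(sys.argv[1]) as handle:
        lines = handle.readlines()
    print("bound  :", mean_b_factor(lines, "C", 1, 18))
    print("unbound:", mean_b_factor(lines, "C", 19, 25))
```

Averaging over all atoms weights the bases and backbone equally; restricting the average to phosphate atoms would be a reasonable alternative where the bases are poorly defined.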

The 19OR structure
Figure 3. Representative 2Fo − Fc electron-density maps. (a) Base pairs of T14 and C15 of chain C with G6 and A7 of chain D from the 19OR DNA. (b) Base pairs between chain G (C7 and T8) and chain H (A18 and G19) from the 25OL DNA. Hydrogen bonds are shown as dashed lines. 2Fo − Fc electron-density maps are contoured at 0.16 and 0.32 e Å⁻³ for 19OR and 25OL, respectively. The images were generated using PyMOL.
The overall fold of C.Esp1396I in the 19OR structure closely matches that of the free protein structure (PDB entry 3g5g; Ball et al., 2009). However, owing to the limited resolution, not all side chains could be placed with high confidence other than those that are highly ordered and binding to symmetry-related protein chains or to the DNA. Surprisingly, the protein dimer does not bind to the DNA in the usual manner via the helix-turn-helix (HTH) motif; instead, it binds 'end-on' to the DNA helix, resulting in very few protein-DNA interactions (Fig. 2b). This non-biological complex reflects the low intrinsic binding affinity at a single OR site. It is only when a C-protein dimer is bound to the adjacent OL site that the protein binds in the expected manner (as observed in the complex with the 35 bp OL + OR operator DNA). This arises from the high degree of cooperativity that increases the affinity for the OR site by two orders of magnitude when a C-protein dimer is bound at the OL site.
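A two-orders-of-magnitude gain in affinity can be expressed as a change in binding free energy using ΔΔG = −RT ln(fold increase). The sketch below works through this arithmetic. The temperature of 298.15 K is an assumption, since no temperature is given for the affinity measurements here, and the function name is hypothetical.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3


def cooperativity_ddg(fold_increase, temperature_k=298.15):
    """Change in binding free energy (kcal/mol) for a fold increase in affinity.

    A negative value means that binding becomes more favourable.
    temperature_k defaults to 298.15 K, which is an assumed value.
    """
    if fold_increase <= 0:
        raise ValueError("fold_increase must be positive")
    return -R_KCAL * temperature_k * math.log(fold_increase)


if __name__ == "__main__":
    # A 100-fold increase in affinity (two orders of magnitude).
    print(f"{cooperativity_ddg(100.0):.2f} kcal/mol")
```

At this assumed temperature a 100-fold gain corresponds to roughly 2.7 kcal/mol of additional stabilization, which has to be supplied by the protein-protein contacts, the pre-distortion of the DNA, or both.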
In the 19OR crystal structure, each protein dimer contacts four DNA duplexes and two protein subunits belonging to adjacent asymmetric units. The protein-protein contacts involve two tyrosines (Tyr29 from each subunit) stacking against each other in a manner similar to that previously observed, but with the addition of hydrogen bonds between Tyr29 and Glu25 and Asp26 (Ball et al., 2009; McGeehan et al., 2012). The only clear contacts between the C-protein and the DNA occur between the protein side chains and the phosphate groups in the DNA backbone.
The overall conformation of the DNA duplex in the 19OR structure does not conform to the canonical B-form; it is significantly distorted and resembles the biologically bound conformation previously observed in the 19OL structure (Figs. 4 and 5). The overall bend of 42° is a little less than that observed in the biologically bound OL complex (54°), but the DNA retains the reduced minor groove in the central spacer between the two C-boxes, despite the lack of significant interactions with the HTH motif. The bend in the DNA is centred at the TATA sequence between the C-boxes (Fig. 1), as observed in other C-protein complexes. The bent DNA structure that we observe here is most likely to reflect a natural propensity to bend at this sequence, and in biologically relevant complexes is enhanced and stabilized by interactions with the HTH motif, as observed in the tetrameric complex and in the higher affinity OL and OM complexes (McGeehan et al., 2008; Ball et al., 2012).
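One simple way to think about an overall bend angle is as the angle between the local helix-axis directions at the two ends of the duplex, so that straight DNA gives 0°. The sketch below computes that angle from two direction vectors. It is a simplified illustration under that assumption; the text does not state how the bend angles were measured, and the function name and the example vectors are hypothetical.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    # Hypothetical axis directions at the two ends of a bent duplex.
    axis_start = (0.0, 0.0, 1.0)
    axis_end = (0.6, 0.0, 0.8)
    print(f"bend = {angle_between_deg(axis_start, axis_end):.1f} degrees")
```

In practice the axis directions would come from a helical analysis program rather than being typed in by hand.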

The 25OL structure
There were no significant differences between the conformations of the two complexes in the asymmetric unit. The 25OL protein structure (Fig. 2a) closely resembles that of the previously published 19OL protein-DNA complex structure, with an overall r.m.s.d. of 0.48 Å for the main-chain atoms of the protein monomers and 0.92 Å for the corresponding 18 bp of the DNA (Fig. 5). The same specific and nonspecific protein-DNA contacts were visible in the structure. However, owing to the longer DNA component of the complex, the crystal-packing interactions between the proteins are markedly different.
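The r.m.s.d. values quoted above are root-mean-square distances between equivalent atoms in two structures. The sketch below shows the calculation for two matched coordinate lists. It assumes that the two sets have already been superposed and are listed in the same atom order; it does not perform the least-squares fit, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as input) between matched atoms.

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples that are
    assumed to be pre-superposed and in the same atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Invented coordinates for illustration only.
    model_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.2, 0.0)]
    model_2 = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0), (3.1, 0.0, 0.1)]
    print(f"r.m.s.d. = {rmsd(model_1, model_2):.2f} A")
```

Restricting the atom lists to main-chain atoms, or to the atoms of a chosen DNA segment, gives values comparable in kind to those reported for the protein monomers and for the 18 bp of DNA.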
The only observable protein-protein contacts between crystallographic symmetry-related dimers again involve the stacking of Tyr29 side chains, together with a hydrogen bond between Tyr29 and Asp26 of the symmetry-related subunit. There are very few protein-DNA interactions between chains that are not within the biological complex and all involve interactions between protein side chains and phosphate groups on the DNA backbone. The crystallographic DNA-DNA interactions between symmetry-related molecules are limited to stacking between the terminal base pair A1-T25 (chains C and D) and the corresponding A-T base pair of chains G and H. This causes the DNA to form a pseudo-continuous double helix.
The width of the major groove in the 25OL DNA varies from 10 to 15 Å in a sequence-dependent manner (Fig. 4). Likewise, the minor-groove width varies from 2 to 9 Å. The portion of the 25OL structure that contains the first C-box (OL) overlays very closely with the relevant sequence in the 35OL+R tetramer structure, with an r.m.s.d. of 0.92 Å (Fig. 4). The remainder of the DNA that is not bound by the protein also follows a similar path to that of the DNA in the tetrameric complex. It is noteworthy that the major groove that is significantly widened in the centre of the tetrameric 35 bp complex is also widened in the equivalent region of the 25OL complex, even though this region of the DNA is unbound (Figs. 4 and 5).

Discussion
These novel protein-DNA complexes enable comparison of the conformation of the DNA sequence before and after C-protein binding. C-proteins, in common with many helix-turn-helix DNA-binding proteins, bend and distort their DNA-binding sites in order to access the bases for sequence recognition (Kita et al., 2002; Papapanagiotou et al., 2007; McGeehan et al., 2008, 2012; Ball et al., 2012). The 19OR structure presented here shows that even in the absence of specific protein-DNA contacts the DNA sequence at the OR operator is compressed at the minor groove, greatly reducing the energy penalty of DNA distortion following C-protein binding. Using circular dichroism, it has been shown that significant structural deformation of the DNA occurs when the controller protein C.AhdI binds its operator sequence in solution (Papapanagiotou et al., 2007). Presumably, the same will apply to the OL and OM operators of the Esp1396I RM system, which all contain the central TATA sequence.
The observed path of the DNA within the 25OL complex supports the proposal that the binding of the first C-protein to the OL site assists in opening up the major groove of the OR site in preparation for binding the second C-protein dimer, thus compensating for the weaker intrinsic binding affinity of the OR site. This provides a significant component of the observed cooperativity of binding between the two adjacent operator sites, in addition to specific protein-protein contacts between adjacent dimers (McGeehan et al., 2008). A similar mechanism based on DNA distortion has been proposed for the cooperative binding of the QacR transcriptional regulator to its operator site (Schumacher et al., 2002), but in this case there were no additional protein-protein interactions contributing to the cooperativity. The downstream effects of binding one protein dimer on the structure of the adjacent DNA, thereby enhancing its DNA-binding affinity for a second protein dimer, could represent a more general mechanism of transcriptional control.