Structure of the AvrBs3–DNA complex provides new insights into the initial thymine-recognition mechanism

The crystal structure of the AvrBs3–DNA complex is reported.

Transcription activator-like effectors contain a DNA-binding domain organized in tandem repeats. The repeats include two adjacent residues known as the repeat variable di-residue, which recognize a single base pair, establishing a direct code between the dipeptides and the target DNA. This feature suggests this scaffold as an excellent candidate to generate new protein-DNA specificities for biotechnological applications. Here, the crystal structure of AvrBs3 (residues 152-895, molecular mass 82 kDa) in complex with its target DNA sequence is presented, revealing a new mode of interaction with the initial thymine of the target sequence, together with an analysis of both the binding specificity and the thermodynamic properties of AvrBs3. This study quantifies the affinity and the specificity between AvrBs3 and its target DNA. Moreover, in vitro and in vivo analyses reveal that AvrBs3 does not show a strict nucleotide-binding preference for the nucleotide at the zero position of the DNA, widening the number of possible sequences that could be targeted by this scaffold.

Introduction
Transcription activator-like effectors (TALEs) compose a family of virulence proteins that act as transcriptional activator factors in plant cells (Boch & Bonas, 2010). TALEs are organized into three different domains: an N-terminal region involved in protein translocation by the bacterial secretion system , a central DNA-binding domain and a C-terminal region that contains both the nuclear localization signal sequence and an acidic transcriptional activation region. The central DNA-binding domain is composed of an array of tandem repeats which recognizes the DNA target. The repeats contain a conserved sequence of 30-42 residues constituting a new DNA-binding motif (Boch & Bonas, 2010). The number of repeats in the DNA-binding domain of the TALE proteins ranges between 1.5 and 33.5; the last repeat of the DNA-binding domain only contains half of the residues (Boch & Bonas, 2010;Boch et al., 2009). It is most likely that the smaller TALEs are nonfunctional, as a minimum number of 6.5 repeats is needed to induce targetgene expression (Boch et al., 2009). The amino-acid sequence of each repeat is well conserved, with the exception of two contiguous amino acids at positions 12 and 13 known as the repeat variable di-residue (RVD). The DNA bases recognized by each repeat are specified by the amino-acid sequence of the RVDs, establishing a direct code between these pairs of amino acids in each repeat and the nucleotides in the target sequence (Boch et al., 2009;Moscou & Bogdanove, 2009).
More than 20 different RVDs have been identified in the different TALEs examined to date (Cong et al., 2012).
However, some of the dipeptides can bind several bases, promoting a degeneration of the protein-DNA recognition code (Boch et al., 2009;Streubel et al., 2012). Other residues outside the RVD dipeptides do not show a significant effect on basepair specificity (Moscou & Bogdanove, 2009;Morbitzer et al., 2010). Structural data showed that only the residue in position 13 of the RVD makes specific contacts for target DNA recognition, while the amino acid at position 12 seems to stabilize the repeat (Deng et al., 2012;Mak et al., 2012). The simple RVD nucleotide code allows the design of new TALEs generated with new repeat combinations. The assembly of several repeats in redesigned TALEs recognizing new DNA targets has confirmed the modularity of these DNAbinding domains and their use in biotechnological applications Miller et al., 2011). The TALE-DNA interaction orients the protein repeats in the Nterminal to C-terminal direction contacting the 5 0 -3 0 -sense DNA strand. All of the natural targets contain a 5 0 T (also known as T 0 ) preceding the recognized DNA, which has been reported to be important for TALE activity .
Here, we present the crystal structure of the DNA-binding region of AvrBs3 from Xanthomonas campestris bound to its target sequence present in the pepper Bs3 promoter, including the N-terminal region interacting with the initial thymine. The structure reveals a new mode of interaction of this domain with T 0 . These data, together with analysis of the protein-DNA binding in vitro and the activity in vivo, suggest that AvrBs3 is able recognize its target despite the base at position zero.  Crystal structure of the AvrBs3-DNA complex. (a) Sketch of the TALE domain structure. The N-terminus is coloured green and the C-terminus is coloured blue. The central DNA-binding domain contains the RVDs involved in DNA recognition. The sequence of the first repeat (cyan) of AvrBs3 is shown, indicating the position of each amino acid in the repeat. The sequence of the oligonucleotide used in crystallization and the different dipeptides is depicted below. (b) Cartoon representation of the protein-DNA structure perpendicular to the DNA target (upper panel) and along the DNA helix (lower panel). The helices of AvrBs3 are shown as cylinders and repeats À1, 0, 1, 2 and 3 are coloured red, lime, cyan, green and beige, respectively, to indicate the initial DNA domain building blocks. The duplex oligonucleotide is represented in stick mode and the sense-strand sequence is indicated with the corresponding dipeptide for each nucleotide. The dipeptides are represented using the single-letter amino-acid code. (c) Electrostatic surface representation of the AvrBs3 protein in complex with its DNA target. The upper panel depicts the electropositive strip (blue) on the superhelical arrangement running from the amino-terminus to the carboxyl-terminus. The electronegative strip (red) running along the protein in the opposite side is depicted in the lower panel. The sense strand is coloured magenta and the antisense strand is coloured green.

Protein expression and purification
The AvrBs3 used in this study contains some modifications that seem to improve protein expression without altering specificity. The NI RVDs were used to target adenine (except for the first repeat) and one of the NG RVDs differs from the wild-type sequence. The coding cDNA for AvrBs3 was cloned into a pET-derived vector and transformed into Escherichia coli BL21. The cells were grown in LB medium at 310 K and protein expression was induced with 1 mM IPTG for 2 h when the culture reached an OD at 600 nm of 1. The induced cells were disrupted by sonication in buffer A (50 mM HEPES pH 8.0, 150 mM NaCl, 0.5 mM imidazole, 0.5 mM TCEP). Cell debris was removed by centrifugation and the supernatant was loaded onto an Ni-NTA (GE Healthcare) column. After extensive column washing, the protein was eluted with a linear gradient to 500 mM imidazole. Fractions containing the protein were pooled and loaded onto a heparin column (GE Healthcare) equilibrated in buffer B (50 mM HEPES pH 8.0, 150 mM NaCl, 0.5 mM TCEP). The protein was eluted using a linear gradient to 1 M NaCl. The fractions containing the protein were pooled and loaded onto a Superdex 200 (GE Healthcare) gel-filtration column equilibrated in buffer A. Protein fractions were pooled and stored at 193 K.

Crystallization, data collection, structure solution, model building and refinement
The purified AvrBs3 DNA-binding domain was crystallized in complex with a 21-base-pair DNA duplex containing the target sequence of the pepper Bs3 promoter (see Fig. 1a for the oligonucleotide sequence and Supplementary Fig. S1 1 for the protein sequence). The 21 bp DNAs (IDT) for crystallography were annealed by slow-cooling in 25 mM HEPES pH 8.0, 150 mM NaCl at a final duplex concentration of 0.5 mM. A 1.2 molar excess of TALE AvrBS3 relative to the DNA was incubated on ice for 10 min at a protein-DNA concentration of 7 mg ml À1 in a solution consisting of 25 mM HEPES pH 8, 150 mM NaCl, 0.2 mM TCEP. The complex was dialyzed against 20 mM MES pH 6.0, 100 mM NaCl, 5 mM MgCl 2 at 277 K for 1 h. The crystallization was performed with 0.8 ml sitting drops using a Cartesian 4000 XL robot. Optimal quality crystals were grown at 298 K using a 1:1 ratio of complex solution and reservoir solution [100 mM MES pH 6.5, 5-15%(v/v) PEG 3350, 5-15%(v/v) 2-propanol]. Crystals grown after 2-3 d were cryoprotected by adding 30%(v/v) 2-propanol to the mother liquor and were flash-cooled in liquid nitrogen. Diffraction data were collected at 100 K using synchrotron radiation on the PXI-XS06 beamline at SLS, Villigen, Switzerland. Data-processing and scaling were accomplished with XDS (Kabsch, 2010; Table 1). Initial phases were obtained by combining information from a Ta 6 Br 12cluster single-wavelength anomalous diffraction (SAD) data set and molecular-replacement information obtained from a previous partial model obtained using Phaser (McCoy et al., 2007) (see Table 1 and Supplementary Fig. S2). The anomalous Patterson showed the presence of two possible sites, and two Ta 6 Br 12 clusters were found using the SHELX package (Sheldrick, 2008). The search model for molecular replacement was based on a polyalanine backbone of three RVD repeats derived from PDB entry 3v6t (Deng et al., 2012). The initial molecular-replacement phases displayed density that was not well defined in several protein regions. The combination of the heavy-atom cluster phases with the molecular-replacement solution yielded an improved electrondensity map. These initial phases were extended to 2.55 Å resolution using a higher resolution native data set with the AutoBuild routine in PHENIX (Adams et al., 2010). The structure was built and subjected to iterative cycles of model building with Coot (Emsley et al., 2010) and refinement by combining REFMAC (Murshudov et al., 2011) and PHENIX (Adams et al., 2010). Identification and analysis of the protein-DNA hydrogen bonds and van der Waals contacts was performed with the Protein Interfaces, Surfaces and Assemblies service (PISA) at the European Bioinformatics Institute (http://www.ebi.ac.uk/msdsrv/prot_int/pistart.html).

Fluorescence anisotropy
The dissociation constants between the TALE protein and DNA were estimated from the change in fluorescent polar-  ization upon protein addition using oligonucleotides that were 6-FAM-labelled at the their 5 0 -end. The optimal concentration of the 6-FAM-DNAs was determined empirically by measuring the fluorescence polarization of serially diluted 6-FAM-DNA samples . The concentration of the 6-FAM-labelled DNAs ranged between 20 and 40 nM and that of the TALE protein was increased to 1000 nM. Both proteins and DNAs were dialyzed in buffer consisting of 25 mM HEPES pH 8, 150 mM NaCl, 0.2 mM TCEP. After incubation at 298 K for 10 min, the fluorescence polarization was measured in a black 96-well assay plate with a Wallac 1420 VICTOR2 multilabel counter (PerkinElmer). The fitting of the data and the K d calculations were performed as described in Molina et al. (2012). For the competitive binding assay, the concentration of the 24 bp nonspecific DNA duplex (5 0 -TCAGACTTCTCCACAGGAGTCAGA-3 0 ) was 100 mM.

Isothermal titration calorimetry assays
Isothermal titration calorimetry (ITC) experiments were conducted at 298 K using a MicroCal ITC200 instrument (Microcal GE Healthcare, UK). The buffer consisted of 25 mM HEPES pH 8, 150 mM NaCl, 0.2 mM TCEP. To ensure minimal buffer mismatch, protein and DNA samples were dialyzed against the same buffer. The syringe for the ligand contained DNA duplexes in the concentration range 0.06-0.2 mM. The thermostatic cell contained the TALE protein in the concentration range 0.006-0.02 mM. Competitive binding studies were carried out using the strong-binding ligand A (target DNA) as the injectant, with the solution in the cell containing the second competitive ligand B (competitor DNA) as well as the TALE (T). This system then has two equilibria that are displaced with each injection: The values of K B and ÁH B for the competing ligand were first measured in a conventional ITC experiment, and these parameter values are entered as known parameters when determining K A from the results of the competition experiment. For the competition experiment, the total concentration of competitor [B] tot was calculated using the formula where 'K A ' is the estimated association constant of the TALE for the target DNA obtained in the best concentration range (10 5 -10 8 M À1 ) for measurements for ITC. The thermostatic cell contains the TALE protein in the concentration range 0.006-0.01 mM and competitor DNA at a concentration of 0.005 mM. The syringe for the ligand contained the DNA duplex in the concentration range 0.06-0.1 mM. The experiments consisted of a series of 4 ml injections of DNA into 200 ml protein solution in the thermostatic cell with an initial delay of 60 s, a 4 s duration of injection and a spacing between injections of 180 s. The corrected binding isotherms were fitted using a single-site and competitive-binding model nonlinear least-squares analysis with the Origin 7.0 software (MicroCal) to obtain values of the equilibrium binding constant (K A ), stoichiometry (n) and enthalpy changes (ÁH) and the TÁS associated with DNA binding. The K d was the inverse of the calculated K A and the associated error was estimated using an error-propagation calculator (http://laffers.net/tools/ error-propagation-calculator/).

Mating of TALE nuclease (TALEN) expressing clones and screening in yeast
The yeast strain expressing the TALEN to be assayed is mated with a strain harbouring a reporter plasmid containing the chosen target, which is flanked by overlapping truncated lacZ genes (LAC and ACZ). Upon target cleavage, tandemrepeat recombination restores a functional lacZ gene that can be monitored using standard methods. TALENs were gridded on nylon filters covering YPD plates using a high gridding density (about 20 spots cm À2 ). A second gridding process was performed on the same filters to spot a second layer consisting of reporter-harbouring yeast strains for each target. Membranes were placed on solid agar YPD-rich medium and incubated at 303 K overnight to allow mating. Next, the filters were transferred to a synthetic medium lacking leucine and tryptophan with galactose (2%) as a carbon source and were incubated for 5 d at 310 K to select for diploids carrying the expression and target vectors. After 5 d, filters were placed on solid agarose medium with 0.02%(w/v) X-Gal in 0.5 M sodium phosphate buffer pH 7.0, 0.1%(w/v) SDS, 6% dimethylformamide (DMF), 7 mM -mercaptoethanol, 1%(w/v) agarose and incubated at 310 K to monitor -galactosidase activity. Results were analysed by scanning and quantification was performed. -Galactosidase activity is directly associated with the efficiency of homologous recombination. Experiments using several purified I-CreI mutants with various recombination activities in yeast have shown that the recombination efficiency quantified in yeast (Afilter value) is correlated with the cleavage activity in vitro (Arnould et al., 2007;Grizot et al., 2009). 3. Results and discussion 3.1. Overall structure of the AvrBs3-DNA complex The structure of the protein-DNA complex was solved by combining a Ta 6 Br 12 SAD data set and a molecular-replacement solution ( Table 1). The model was refined to 2.55 Å resolution. The crystallized protein includes residues 152-895 ( Supplementary Fig. S1) of AvrBs3 and a 21-base doublestrand oligonucleotide with a T overhang at the 5 0 -end of the sense strand (Figs. 1a and 1b), displaying a relatively unperturbed B-form DNA with an overall wider major groove ( Supplementary Fig. S2). The electron density for the 30 amino-terminal and carboxyl-terminal residues is fuzzy owing to protein flexibility. However, the quality of the electron density is excellent from the first repeat until the middle of repeat 17 in residue 830; in the N-terminal region the electron density is defined such that side chains can be observed from research papers residue 230 onwards. The superhelical arrangement of the 17.5 AvrBs3 repeats is intimately engaged in binding the majorgroove nucleotides of the DNA molecule (Figs. 1b and 2). All of the repeats in the DNA-bound AvrBs3 structure form highly similar two-helix bundles (Fig. 1b). The helices span positions 3-11 and 14-33 in the repeat, locating the RVD (positions 12 and 13; see Fig. 1a) in the loop that joins them. The proline at position 27 creates a kink in the second helix that appears to be critical for sequential packing and association of tandem repeats with the DNA double helix. The protein shows a left-handed packing of the consecutive helices within and between the individual repeats.

Protein-DNA interaction
Interestingly, an electropositive strip runs along one side of the superhelical AvrBs3 arrangement (Deng et al., 2012) Deng et al., 2012) and AvrBs3 (orange) protein structures. The DNA moiety is omitted for clarity. Differences among the three TALE structures in the N-terminal region (including the 0 and À1 repeats) can be observed (see Results and discussion and Supplementary Fig. S5).
an electronegative strip is observed on the opposite side (Fig. 1c). This positive polar band is built by a lysine at position 16 in each repeat and involves nonspecific interactions with the phosphate backbone of the DNA sense strand, whereas the negative band, built mainly by the glutamates at position 4 of the repeats with the collaboration of some of the aspartates at position 13, lies in the neighbourhood of the antisense DNA strand (Fig. 1c). The arrangement of these polar bands along the protein suggests a possible mechanism to facilitate the recognition of the nucleotides in the sense strand by the RVDs while avoiding interference from the nucleotides in the other strand. In fact, the antisense strand does not display contacts with the protein (Supplementary Fig.  S3).
The sequence-specific contacts of TALE AvrBs3 with the DNA are exclusively made by the residue at position 13 in each RVD interacting with the corresponding base on the sense strand ( Supplementary Fig. S4). In contrast, the side chain of the residue at position 12 of each RVD contacts the backbone carbonyl O atom at position 8 in each repeat, constraining the RVD-containing loop. Additionally, the positions within the core of individual repeats are occupied entirely by small aliphatic residues, whereas several positions in the interface between repeats correspond to polar residue pairs.

Dipeptide-DNA interaction
The AvrBs3-DNA structure displays seven HD RVDs. The pair in the first repeat contains a unique HD associated with adenine (Fig. 2a). The other HD dipeptides interact with cytosines along the target sequence. The rest of the adenines are associated with NI dipeptides and the four thymines interact with NG dipeptides (Fig. 1a). The observed contacts for each repeat (Supplementary Figs. S3 and S4) shed light on the molecular basis of their different specificity and fidelity, which has been described via computational and genetic analyses (Moscou & Bogdanove, 2009;Streubel et al., 2012). The HD RVD contacting A 1 displays a hydrogen bond (3.01 Å ) between the side chain of Asp301 in the first RVD and the NH 2 group of the base. The interaction is the same as when the HD recognizes a cytosine. In contrast, His300 in the initial RVD does not interact with the DNA and its side chain contacts the main-chain backbone of the following repeat, stabilizing the interface between the first and the second repeats (Fig. 2a).
The rest of the HD RVDs show the aspartate residues associated with the NH 2 group of the cytosines through hydrogen bonds ranging from 2.95 to 3.5 Å in length along the sense DNA strand. Contacts between cytosine and acidic side chains exclude alternative base recognition via steric and electrostatic clashes (Rohs et al., 2010). The NI dipeptide exhibits an unusual interaction pattern with the other adenines. The aliphatic side chain of the isoleucine residue makes nonpolar van der Waals interactions with the purine ring, and the asparagine residues play a role similar to that of the histidine in the HD dipeptides, stabilizing the inter-repeat interaction. The fact that five of them are grouped in two regions containing three and two consecutive adenines extends these two similar interaction areas along the DNA target. Finally, the NG repeats associate with the thymines through nonpolar van der Waals interactions of the glycine main chain with the methyl group of the base. This interaction is barely observed in T 18 owing to disorder of the last repeat.
The TALE repeats seem to be organized into two regions interacting with the sense stand, whereas the antisense strand does not display any protein contacts. The first region, which is involved in indirect readout, is composed of a lysine and a glutamine at positions 16 and 17 of the repeat and interacts with the sense-strand DNA backbone (Fig. 1a). This arrangement is conserved both in PthXo1 and dHax3. The second region involves the RVDs, which interact directly with the bases. Among the structurally characterized RVDs in the different structures available, NN and HD form hydrogen bonds to their target nucleotides, while NI and NG associate with their target bases through van der Waals interactions. Thus, the energy involved in the different interactions establishes a hierarchy between the different RVDs (Streubel et al., 2012). Nevertheless, even the HD dipeptide, which shows a preferential interaction with cytosine and is one of the energetically selective RVDs, can accommodate adenine through a hydrogen bond to its NH 2 group (Fig. 2a), suggesting a certain promiscuity of the dipeptides even in the energetically more selective RVDs. Although an RVD-nucleotide preference exists, the dipeptide-base interactions do not build a strict binary code since the same dipeptide can interact with different bases, promoting a certain degree of degeneration of the protein-DNA recognition. Therefore, the energetic contribution of the different dipeptides during binding seems to be crucial to generate a selective TALE, suggesting that TALE specificity would depend on the energy balance between the region involved in indirect readout (Fig. 1c) and the contribution of the RVD.

The T 0 recognition
A possible limitation to engineering new recognition sequences in this scaffold arises from the presence of a T at the zero position of the target DNA at the 5 0 -end. This base interacts with the N-terminus of the protein and appears to be critical for the TALE-DNA interaction (Boch et al., 2009;. Although the crystal structure of the TALE dHax3 DNA-binding domain lacks the N-terminal domain (Deng et al., 2012), the structure of the PthXo1-DNA complex suggests that the conserved Trp232 is involved in the recognition mechanism of the T 0 at the 5 0 -end (Mak et al., 2012). However, this residue does not display direct interactions with the base. The N-terminal region of AvrBs3 reveals that two degenerate repeats seem to cooperate to interact with the conserved thymine that precedes the RVD-specified sequence (Fig. 2b). We termed these the 0 and À1 repeats (Fig. 1b, Supplementary Fig. S1) composed of residues 225-254 and 255-288, respectively. They contain an arginine residue (Arg266 and Arg236, respectively) at position 13 that research papers interacts with the DNA (Fig. 2c). The side chains of these residues converge near the adjacent T À1 and T 0 bases, contacting the methyl groups of these bases through van der Waals interactions. Moreover, the side chains of Thr270, Gln305 and Gly267 are involved in a network of hydrogen bonds surrounding the phosphate of T 0 . In contrast to PthXo1, the Trp232 in AvrBs3 is located four positions away from the DNA. This difference between PthXo1 and AvrBs3 arises from a different conformation in the protein section preceding these residues, which is more elongated in the AvrBs3  structure, displacing Trp232 away from the DNA (Fig. 2d and Supplementary Fig. S5).
These differences could arise from the intrinsic flexibility of the TALE repeats, which seem to display a large conformational change (Murakami et al., 2010). This flexibility has also been observed in assemblies of these repeats composing a DNA-binding domain in the absence of nucleic acid by SAXS (Murakami et al., 2010). In addition, the crystal structure of dHax3 without its target DNA (Deng et al., 2012) shows an elongated shape, suggesting that the protein conformation is adjusted to the target DNA and is stabilized upon nucleic acid binding. This flexible behaviour of the TALEs could facilitate DNA binding. On the other hand, the fact that the crystallized proteins lack a section of the N-terminal sequence could favour these conformational changes.

TALE-DNA binding and thermodynamics
To characterize the thermodynamic parameters of the interactions between AvrBs3 and its target DNA, we quantitatively analyzed their association by fluorescence anisotropy (FA) and isothermal titration calorimetry (ITC) (Fig. 3). Oligonucleotides with different lengths containing the target sequences were initially tested (Supplementary Figs. S6 and S7), and the 21 bp probe was selected as the minimum binding length for specific recognition of the TALE AvrBs3 (Fig. 3a). The K d values measured by ITC display higher values by a factor of around 2-4 compared with the FA experiments. This difference is consistent with experimental variations and could arise from the physical properties measured in each approach, which require a different range of concentrations. However, despite these differences both techniques show the same tendencies for the same set of experiments. To examine the ability of the TALE AvrBs3 to discriminate between target DNA and other DNA sequences, we performed both FA and ITC experiments in the presence of a 24 bp nonspecific DNA (see Supplementary Material).
The TALE AvrBs3 shows binding to the Bs3 duplex oligonucleotide with a dissociation constant of 33 nM by FA (Fig. 3b). The TALE-DNA association is not affected when the affinity is measured in the presence of competitor DNA (K d,FA = 36.5 nM; Fig. 3d). The ITC binding measurements show the same behaviour (Figs. 3c and 3e). In addition, the ITC revealed that the protein-DNA association is exothermic (ÁH = À31.6 kcal mol À1 ) and the stoichiometry is close to one. The measurement of the reaction in the presence of competitor DNA (see Materials and methods and Figs. 3d and 3e) showed only minor variations in the thermodynamic parameters, indicating that the TALE AvrBs3 is able to bind its DNA target with high specificity in a spontaneous reaction (Fig. 3f).
3.6. Functional relevance of the nucleotide in position 0 TALE proteins bind to the promoter regions enhancing and modulating the transcription of plant genes (Boch & Bonas, 2010). For example, the PthXo1 binding site is downstream of the TATA box, while the T at position 0 for the AvrBs3 target appears to be part of the TATA box. The initial T 0 position in the target sequence has been reported to be an important nucleotide for TALE function (Boch et al., 2009;Rö mer et al., 2010) and for the binding of the protein to its target (Mahfouz et al., 2011). The recognition of this nucleotide involves the less well conserved repeats À1 and 0. However, we did not observe direct interactions of these repeats with the base of T 0 in the AvrBs3-DNA structure. A similar situation was detected in the PthXo1-DNA structure, in which T 0 does not show direct interactions with the protein (Mak et al., 2012). Instead, in AvrBs3 we observed a new conformation that allows the interaction of the N-terminal domain with T 0 (Figs. 2b and 2c).
To address the preferences of AvrBs3 for the nucleotide at position 0 of its target DNA, we assessed its binding and thermodynamic parameters using Bs3 A 0 , G 0 and C 0 oligo-   nucleotides in which T 0 was substituted by the corresponding base (Figs. 3b, 3c and 3f). TALE AvrBs3 binds Bs3 A 0 and C 0 with a similar K d to the original T 0 . The presence of competitor DNA barely altered the affinity (Figs. 3d, 3e and 3f). Only G 0 displayed an increase of fourfold; however, the binding still showed a reasonable affinity that was hardly disturbed by the presence of competitor DNA. The ITC data support the FA binding measurements, although with small variations in the ÁH of the reaction. Thus, AvrBs3 is able to recognize and bind its DNA target with similar thermodynamic parameters when T 0 is substituted by C 0 . When a bulkier base such as A is found at position 0, the K d of the reaction is slightly higher using both methods (Figs. 3b, 3c and 3f) and the ÁH of the reaction is less negative, indicating that even though the binding reaction is less efficient the presence of the larger base does not hamper binding. Only the presence of G 0 showed an affinity decrease that did not impede target recognition even in the presence of competitor DNA (Figs. 3b-3f). Hence, in vitro AvrBs3 can bind its target sequence including the C 0 , G 0 and A 0 mutations with an affinity similar to the wild-type DNA target. The presence of a bulkier base at the 0 position does not seem to hamper binding to the AvrBs3 target in vitro, which is in agreement with the protein-DNA interactions in this region, which involve only the DNA backbone.
To analyze this effect in vivo, we used a single-strand annealing assay (SSA) to assess whether changes at the zero position of the target sequence could affect the binding of an AvrBs3 fused to a FokI nuclease domain, disturbing its activity (see Materials and methods; Arnould et al., 2007;Cermak et al., 2011;Grizot et al., 2009;Fig. 4a). The homodimeric Bs3 target sequence containing either T 0 , A 0 , C 0 or G 0 was inserted into an episomal plasmid to assess the preference of AvrBs3 for the nucleotide at position 0. The assay showed that AvrBs3-FokI (TALEN) was able to target its site independently of the nucleotide at position 0. Although the T 0 -containing target displays the higher activity (Fig. 4b), our results suggest that T 0 substitution does not seem to exhibit a large effect in TALE binding and only the G 0 target, with the bulkiest base, displayed a marked decrease in activity, in agreement with the in vitro binding measurements, suggesting that this approach could be employed to estimate the efficiency of in vivo applications. Functional analysis in engineered TALE activating the endogenous human NTF3 gene has also shown that constructs containing repeats À1 and 0 can target DNA sequences lacking T 0 in vivo (Miller et al., 2011). Hence, our in vitro data suggest that T 0 may not play an essential role in DNA binding of the TALE AvrBs3, increasing the number of DNA sequences that could be targeted by this scaffold. A possible explanation for the prevalence of T at position zero in the natural TALE targets could be attributed to the AT-rich sequence within the promoter region rather than to selective recognition of the nucleotide at this position.

Concluding remarks
The assembly of a redesigned TALE recognizing new DNA targets has confirmed the modularity of these DNA-binding domains to engineer new specificities. This property makes this scaffold a good candidate to tailor devices that when fused with other catalytic domains, such as nucleases, methylases acetylases etc., could shuttle to specific genome loci a determined activity for controlling gene expression, genome modification or gene repair . However, it is not clear whether the binding mechanism depends on a minimum number of perfect matches within a given DNA target length or perhaps involves differential contributions of different associations between the RVDs and the nucleotides (Boch et al., 2009;Streubel et al., 2012). In this context, activity assays have supported the important influence of T 0 in target recognition (Boch et al., 2009). Only the presence of a purine influences target binding. This effect is emphasized in the case of guanine, the bulkiest base. Therefore, other nucleotides could be accommodated in this position, thus increasing the number of target sequences that may be engineered in this scaffold.