Split green fluorescent protein as a modular binding partner for protein crystallization

A strategy using a new split green fluorescent protein (GFP) as a modular binding partner to form stable protein complexes with a target protein is presented. The modular split GFP may open the way to rapidly creating crystallization variants.

A modular strategy for protein crystallization using split green fluorescent protein (GFP) as a crystallization partner is demonstrated. Insertion of a hairpin containing GFP β-strands 10 and 11 into a surface loop of a target protein provides two chain crossings between the target and the reconstituted GFP, compared with the single connection afforded by terminal GFP fusions. This strategy was tested by inserting this hairpin into a loop of another fluorescent protein, sfCherry. The crystal structure of the sfCherry-GFP(10-11) hairpin in complex with GFP(1-9) was determined at a resolution of 2.6 Å. Analysis of the complex shows that the reconstituted GFP is attached to the target protein (sfCherry) in a structurally ordered way. This work opens the way to rapidly creating crystallization variants by reconstituting a target protein bearing the GFP(10-11) hairpin with a variety of GFP(1-9) mutants engineered for favorable crystallization.

Introduction
Structural characterization of proteins, protein complexes and small molecules is essential to understand cellular functions from enzymology to macromolecular machines. Knowledge of protein structures has led to the redesign of protein function and folding using rational and semi-rational approaches, and has promoted the discovery of new and improved small-molecule drugs (Yeung et al., 2009; Lin et al., 2010). Yet obtaining well ordered crystals, a prerequisite of macromolecular crystallography, remains a major obstacle; as many as 70% of purified proteins fail to crystallize (Terwilliger et al., 2009).
A number of current approaches to improve protein crystallization involve constructing variant forms of the target protein molecule. Examples include engineering proteins with enhanced solubility by site-directed mutagenesis (Nasreen et al., 2006; Eichinger et al., 2007) or directed evolution (Farinas et al., 2001; Pédelacq et al., 2002; Waldo, 2003; Cabantous, Pédelacq et al., 2005) and the removal of disordered regions, often at the N- or C-terminus (Thornton & Sibanda, 1983), by proteolysis (Dong et al., 2007) or targeted deletion (Pantazatos et al., 2004) based on disorder prediction. Proteins may also contain internally disordered regions such as loops or subdomains, which can sometimes be removed, shortened or replaced by a short linker to reduce conformational heterogeneity, thereby increasing crystallization propensity (Kwong et al., 1998, 1999; Derewenda, 2010). Other methods such as surface-entropy reduction (Longenecker et al., 2001; Derewenda, 2004; Cooper et al., 2007) and lysine methylation (Rypniewski et al., 1993; Walter et al., 2006; Kim et al., 2008) drive crystallization by changing the surface properties of proteins and promoting lattice contacts. The surface-entropy reduction method has been successfully applied not only to individual proteins but also to protein-protein complexes and membrane proteins (Berman et al., 2007; Levinson et al., 2008; Yanez et al., 2008; Pornillos et al., 2009; Yip et al., 2005).
Other methods for modifying and potentially improving the crystallization properties of a protein involve connecting it to another protein intended to act as a carrier. Highly soluble proteins have been used as fusion partners to the N-terminus or C-terminus of proteins to enhance their folding and solubility and to mediate crystal contacts (Wiltzius et al., 2009; Kuge et al., 1997; Center et al., 1998; Monné et al., 2008; Ullah et al., 2008; Smyth et al., 2003; Moon et al., 2010). Carrier proteins have been inserted into loops of transmembrane proteins (Engel et al., 2002) and the insertion of T4 lysozyme into a loop of the β2-adrenergic receptor is an example of a successful application of this strategy (Cherezov et al., 2007). Noncovalent crystallization chaperones such as Fab and Fv fragments of antibodies (Kovari et al., 1995; Lange & Hunte, 2002; Lee et al., 2005; Ostermeier et al., 1995; Monroe et al., 2011) and designed ankyrin-repeat protein (DARPin; Monroe et al., 2011) have alternatively been used to produce complexes with target molecules. These complexes often show improved solubility and crystallizability in comparison to the isolated targets (Derewenda, 2010).
Synthetic symmetrization of proteins offers a further approach to expand crystallization opportunities. Variant forms of a target protein molecule are constructed, with each designed to produce a structurally distinct oligomer. Disulfide-based synthetic dimerization (Banatao et al., 2006; Forse et al., 2011) and designed metal-mediated oligomerization have both been demonstrated (Laganowsky et al., 2011). Other examples using different motifs such as leucine zippers to drive the self-association of a target protein have also been shown to promote protein symmetrization and crystallization (Yamada et al., 2007).
With current strategies for expanding the crystallization opportunities for a target protein, the effort required to produce many structural variants is a major challenge. A modular approach could offer important advantages. In particular, an ideal strategy might factor the problem of repeatedly re-engineering a protein of interest into two separate problems: (i) connecting the target protein to a carrier protein and (ii) creating variant forms of the carrier protein. In order to fully separate the two problems, the connection between the target protein and the carrier protein should occur by noncovalent molecular recognition rather than by genetic covalent attachment, so that repeated genetic modification and purification of the protein of interest can be avoided. Additionally, the target protein and the carrier protein should ideally be attached in a way that minimizes the flexibility between them, as too much flexibility would reduce the chances of forming well ordered crystals of the complex. Finally, the structural feature that drives the noncovalent association between the target protein and the carrier protein should ideally be transferable from one target system to another. In this way, one set of variational forms of the carrier protein can be utilized, without continual re-engineering, for a range of target proteins.
In this work, we demonstrate a system that meets the design requirements above and is based on green fluorescent protein (GFP). GFP has been employed before in crystallization experiments based on protein fusions (Suzuki et al., 2010). Also, previous studies on GFP have shown that it can form the basis for a complementation system: fragments composed of either β-strand 11 or a hairpin comprised of β-strands 10 and 11 can reassemble with truncated forms of GFP lacking these segments (Cabantous et al., 2013). We show here that β-strands 10 and 11 of GFP can be inserted as a hairpin into a protruding loop of a target protein, which when complemented by GFP(1-9) gives rise to a well ordered complex with two polypeptide-chain crossings between the two components which is amenable to crystal structure analysis. Prospects for developing the system for general applications are discussed.

Engineering superfolder Cherry (sfCherry)
The monomeric fluorescent protein Cherry (Shaner et al., 2004) was cloned as a C-terminal fusion to ferritin in a modified pET expression plasmid as described by Pédelacq et al. (2006). The N-terminal and C-terminal GFP sequence extensions (residues MVSKG and MDELYK, respectively; Supplementary Fig. S2b) that were added to improve mCherry protein solubility in an earlier study (Shaner et al., 2004) were omitted here to increase the stringency of selection for better solubility and stability. The DNA encoding mCherry was amplified by PCR using vector flanking primers and was subjected to DNA fragmentation and shuffling using published protocols (Stemmer, 1994). The cDNA library plasmid pool was transformed into Escherichia coli BL21 (DE3) Gold (Novagen) competent cells for protein expression. The library was plated on nitrocellulose membranes using two sequential 400-fold dilutions of a 1.0 OD600 nm cell stock frozen in 20% glycerol/Luria-Bertani (LB), yielding ~3 × 10³ colonies per plate. Cells were grown overnight at 305 K and proteins were expressed by transferring the membrane to an LB-agar plate containing 35 µg kanamycin per millilitre of medium and 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) for 3 h at 310 K. Clones displaying the brightest fluorescence (550 nm excitation/610 nm emission) were selected, grown overnight and frozen in 20% glycerol/LB freezer stocks at 193 K. These brightest clones were used as templates for the next round of evolution. After three rounds of directed evolution, the sequences of the constructs were confirmed by DNA sequencing and the brightest clone encoding sfCherry was chosen.

2.2. Insertion of the GFP strands 10-11 hairpin and selection of a clone with a permissive loop
GFP strands 10-11 (DLPDDHYLSTQTILSKDLNEKRDHMVLLEYVTAAGITDAS, ending in the three-residue linker DAS) were inserted into permissive loops of sfCherry by PCR. Primer sequences are included in Supplementary Table S1. Fragments were cloned into pTET ColE1 vector and transformed into E. coli BL21 (DE3) competent cells containing pET GFP strands 1-9 for in vivo testing. In vivo protein expression and solubility screenings were performed as described previously. Frozen cell stocks (1.0 OD600 nm) in 20% glycerol/LB were thawed, diluted 400-fold twice in LB and plated onto a nitrocellulose membrane on selective LB-agar containing 35 µg ml⁻¹ kanamycin (Kan) and 75 µg ml⁻¹ spectinomycin (Spec). After overnight growth at 305 K, the membrane was transferred to a pre-warmed plate containing 0.3 µg ml⁻¹ anhydrotetracycline (AnTet) and 1 mM IPTG for 4 h at 303 K for protein expression screening. For protein solubility testing, the membrane was transferred to a pre-warmed plate containing 0.3 µg ml⁻¹ AnTet for 2 h, returned to its original LB-Kan-Spec plate for 1 h to allow the AnTet to diffuse out and then induced on an LB-Kan-Spec plate with 1 mM IPTG at 303 K for 1 h. The induced plates were illuminated using an Illumatool Lighting System (LightTools Research) equipped with 488/520 nm (for GFP) and 550/610 nm (for sfCherry) excitation/emission filters.
2.3. Expression and refolding of GFP(1-9) and GFP(1-10) fragments
GFP(1-9) and GFP(1-10) proteins were expressed and prepared as described previously. Briefly, 1 l cultures of E. coli BL21(DE3) cells expressing GFP(1-9) or GFP(1-10) constructs were grown until an OD600 nm of 0.5-0.7 was reached, protein expression was induced with 1 mM IPTG and the cells were harvested after 5 h of induction at 310 K. The harvested cells were resuspended in 50 mM Tris pH 7.4, 0.1 M NaCl, 10% glycerol (TNG buffer) and lysed by sonication on ice. Inclusion bodies containing GFP(1-10) and GFP(1-9) were recovered by centrifugation at 20 000g. Inclusion bodies were washed and prepared in individual Eppendorf tubes (~75 mg inclusion bodies per tube) as described previously. Prepared inclusion bodies can be stored at 193 K for at least several months. 75 mg of the washed inclusion bodies prepared in a 1.5 ml Eppendorf tube was unfolded with 1 ml 9 M urea in TNG buffer and refolded by adding 25 volumes of TNG buffer. The soluble solutions were filtered through a 0.2 µm syringe filter and the protein was quantified using the Bio-Rad Protein Assay reagent (Bio-Rad). This refolded protein solution is ready for protein complementation and can also be stored for up to a week at 253 K for later use.
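The dilution step in the refolding protocol can be checked with a few lines of arithmetic. The sketch below assumes that "25 volumes" is counted relative to the 1 ml of urea solution and that the volume of the inclusion-body solids is negligible; the function name is illustrative and is not taken from the original protocol.

```python
def diluted_concentration(stock_molar, stock_volume_ml, added_volumes):
    """Concentration and total volume after adding `added_volumes` volumes
    of diluent to one volume of stock (solids are ignored: an assumption)."""
    final_volume_ml = stock_volume_ml * (1 + added_volumes)
    final_molar = stock_molar * stock_volume_ml / final_volume_ml
    return final_molar, final_volume_ml


# 1 ml of 9 M urea diluted with 25 volumes of TNG buffer
urea_final, volume_final = diluted_concentration(9.0, 1.0, 25)
print(f"final volume: {volume_final:.0f} ml, residual urea: {urea_final:.2f} M")
```

Under these assumptions the refolded sample occupies about 26 ml and retains roughly 0.35 M urea.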

Expression and purification of sfCherry and sfCherry-GFP(10-11)-GFP(1-9) complex
sfCherry with GFP strands 10-11 inserted at position Asp169/Gly170 was subcloned into pET with a noncleavable C-terminal His6 tag. The C-terminal in-frame BamHI site introduced a GS amino-acid motif between sfCherry and the His6 tag. Proteins were expressed in E. coli BL21(DE3) cells under the control of the IPTG-inducible T7 promoter. A 1 l culture of E. coli BL21(DE3) cells expressing sfCherry or sfCherry with GFP strands 10-11 inserted was grown to an OD600 nm of ~0.5-0.7 and induced with 1 mM IPTG for 7 h at 303 K. The harvested cells were suspended in TNG buffer and lysed by sonication on ice for 10 min at 70% duty cycle. The mixture was then centrifuged at 15 000g for 30 min at 283 K to remove cell debris. For sfCherry and sfCherry-GFP(10-11), the supernatant was mixed with pre-equilibrated Talon metal-affinity resin (Clontech) and incubated at room temperature with gentle shaking for 1 h to allow the protein to bind to the resin. The protein bound to the resin was separated from unbound protein by centrifugation at 3000g for 5 min and the resin was washed twice with column buffer before it was packed into a gravity-flow column. The column was then washed with 50 ml column buffer (50 mM sodium phosphate buffer pH 7, 300 mM NaCl, 10% glycerol) followed by 50 ml binding buffer (50 mM sodium phosphate buffer pH 7, 300 mM NaCl, 10% glycerol, 5 mM imidazole) and 20 ml washing buffer (50 mM sodium phosphate buffer pH 7, 300 mM NaCl, 10% glycerol, 20 mM imidazole) to remove unbound and nonspecifically bound proteins, respectively. The purified proteins were eluted with 250 mM imidazole in TNG buffer with a yield of about 40 mg per litre of cell culture. The protein solutions were concentrated and exchanged into the final buffer [20 mM Tris-HCl pH 8, 150 mM NaCl, 1 mM dithiothreitol (DTT)] using an Amicon Ultra-15 centrifugal filter device (10 kDa cutoff; Millipore).
To create the sfCherry-GFP(10-11)-GFP(1-9) protein complex, purified sfCherry-GFP(10-11) was complemented overnight in the cold room with an excess amount of refolded GFP(1-9) such that the amount of GFP(1-9) was not limiting. The protein mixture was applied onto pre-equilibrated Talon metal-affinity resin and the protein complex was subsequently purified using the same purification protocol used for sfCherry and sfCherry-GFP(10-11) as indicated above. For each purification step, the protein elution samples were resolved on a 4-20% gradient Criterion SDS-PAGE gel (Bio-Rad, Hercules, California, USA) and stained using Gel Code Blue stain reagent (Pierce, Rockford, Illinois, USA).

Crystallization
sfCherry (at a concentration of ~25 mg ml⁻¹) and the sfCherry-GFP(10-11)-GFP(1-9) complex (at a concentration of ~22 mg ml⁻¹) were both crystallized using the sitting-drop vapor-diffusion method by mixing 0.15 µl protein stock with 0.15 µl reservoir solution and equilibrating the drop against 30 µl reservoir solution at 298 K. A set of 384 crystallization reagents consisting of Crystal Screen, Crystal Screen 2 (Hampton Research), PACT suite (Qiagen) and JCSG Core Suites I and II (Qiagen) was used to screen for the propensity of crystallization. Subsequent optimization by fine-tuning pH, salt, precipitants and additives was carried out as needed until diffraction-quality crystals were obtained.
For sfCherry crystallization, six conditions from the initial screening, including four closely related conditions from the A and B rows of the PACT suite, appeared to be in the crystallization zone of sfCherry and yielded long clustered needles or rods. The best crystals (~200 × 20 × 10 µm) were obtained from a condition consisting of 0.1 M SPG (succinic acid, phosphate, glycine) buffer pH 5.0, 25%(w/v) PEG 1500. The diffraction data from these crystals contained satellite lattices, but one of the data sets was suitable for structure determination of sfCherry.
For crystallization of the sfCherry-GFP(10-11)-GFP(1-9) complex, clustered plates were observed in initial screening experiments in five conditions from rows E, F and H of the PACT suite. Subsequent optimization, including the use of glycerol as an additive to reduce nucleation, yielded diffraction-quality crystals (100 × 30 × 20 µm) from a condition consisting of 0.1 M bis-tris buffer pH 8.3, 20%(w/v) PEG 3350, 6%(v/v) glycerol. Fluorescence microscopy was used to verify the existence of fluorophores in the crystals. Images of crystals taken under white light and photographs of protein solutions taken with white light and under 488/520 nm and 550/610 nm excitation/emission filters are shown in Supplementary Fig. S3.

Data collection, molecular replacement and refinement
Data were collected from crystals of sfCherry and the sfCherry-GFP(10-11)-GFP(1-9) complex on beamline 5.0.2 at the Advanced Light Source (ALS) and were processed with the HKL-2000 program (Otwinowski & Minor, 1997). The crystals of sfCherry belonged to space group P2₁, with unit-cell parameters a = 85.105, b = 96.294, c = 105.957 Å, β = 104.56°. The data set was processed to 2.0 Å resolution with an Rmerge of 9.5% and a completeness of 97.0%. Cell-content analysis gave a Matthews coefficient of 2.17 Å³ Da⁻¹ and a solvent content of 43% with eight copies of sfCherry in the asymmetric unit. The crystals of the sfCherry-GFP(10-11)-GFP(1-9) complex belonged to space group P2₁2₁2₁, with unit-cell parameters a = 74.360, b = 86.490, c = 167.941 Å. The data set for sfCherry-GFP was processed at 2.6 Å resolution with an Rmerge of 6.5% and a completeness of 98.8%. The Matthews coefficient of the sfCherry-GFP(10-11)-GFP(1-9) complex crystals was 2.70 Å³ Da⁻¹, suggesting a solvent content of 54% with two copies of the complex in the asymmetric unit.
The crystal structure of sfCherry was determined by the molecular-replacement (MR) method using the Phaser program (McCoy et al., 2007) in the PHENIX suite (Adams et al., 2010). The mCherry structure (PDB entry 2h5q; Shu et al., 2006) was used as a search model. Model rebuilding was carried out with AutoBuild (Terwilliger et al., 2008) and refinement with phenix.refine. The final R and Rfree values for sfCherry were 22.2 and 26.5%, respectively.
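The cell-content figures quoted above can be reproduced with a short calculation. The sketch below computes the unit-cell volume from the reported cell parameters and converts each reported Matthews coefficient into a solvent fraction using the conventional approximation 1 - 1.23/V_M. Because the molecular masses are not stated in the text, the code back-calculates the mass per copy implied by the reported V_M instead of assuming one; the function names are illustrative only.

```python
import math


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume in cubic angstroms from cell lengths (A) and angles (deg)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def solvent_fraction(matthews_coefficient, protein_constant=1.23):
    """Conventional estimate: 1 - 1.23 / V_M, with V_M in A^3 per Da."""
    return 1.0 - protein_constant / matthews_coefficient


def implied_mass(volume, matthews_coefficient, symmetry_operators, copies_per_asu):
    """Mass per copy (Da) implied by V_M = V / (Z * M)."""
    return volume / (matthews_coefficient * symmetry_operators * copies_per_asu)


# sfCherry: P2(1) has 2 symmetry operators; eight copies per asymmetric unit
v_cherry = cell_volume(85.105, 96.294, 105.957, beta=104.56)
# Complex: P2(1)2(1)2(1) has 4 symmetry operators; two complexes per asymmetric unit
v_complex = cell_volume(74.360, 86.490, 167.941)

print(f"sfCherry solvent: {solvent_fraction(2.17):.0%}")  # 43%
print(f"complex solvent:  {solvent_fraction(2.70):.0%}")  # 54%
print(f"implied sfCherry mass: {implied_mass(v_cherry, 2.17, 2, 8):.0f} Da")
print(f"implied complex mass:  {implied_mass(v_complex, 2.70, 4, 2):.0f} Da")
```

The solvent fractions agree with the 43% and 54% reported in the text; the implied masses are only a consistency check and should be compared with the masses calculated from the actual construct sequences.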
The crystal structure of the sfCherry-GFP(10-11)-GFP(1-9) complex was also determined with the MR method. Similar procedures and programs as those used in the sfCherry structure determination were employed but with the following differences. The sfGFP (PDB entry 2b3q; Pédelacq et al., 2006) and partially refined sfCherry structures were used as search models. The sequences belonging to strands 10 and 11 of sfGFP were pruned from sfGFP and grafted between the original strands 8 and 9 of sfCherry based on the designed constructs (Fig. 4a). This modified sequence pair was used in model rebuilding with AutoBuild. Reference-structure restraints were used in early stages of refinement and were released at later stages. The refined structure of the sfCherry-GFP(10-11)-GFP(1-9) complex had an R value of 20.5% and a free R value of 24.9%. Detailed data-collection and refinement statistics of sfCherry and the sfCherry-GFP(10-11)-GFP(1-9) complex are listed in Table 1. The atomic coordinates and structure factors are available in the Protein Data Bank under accession codes 4kf4 for sfCherry and 4kf5 for sfCherry-GFP(10-11)-GFP(1-9).
Table 1. Statistics of data collection and refinement for sfCherry (PDB entry 4kf4) and the sfCherry-GFP(10-11)-GFP(1-9) complex (PDB entry 4kf5).

Strategy for modular design
The structure, stability and folding of GFP have been well studied (Ormö et al., 1996; Tsien, 1998; Crameri et al., 1996). Its relatively simple topology, combined with its utility as a fluorescent reporter when correctly folded (Waldo et al., 1999; Pédelacq et al., 2006), has made it an attractive system for reconstitution from separately expressed protein fragments. Following such a strategy, by fusing terminal segments of GFP to a crystallization target the resulting construct might be recombined with the remaining complementary fragment of GFP to create a new complex for crystallization. In the context of crystallization strategies, a challenge presented by typical fusion methods is the flexibility introduced at the site of connection between the two protein components; free torsion angles are present where the polypeptide backbone makes its (single) crossing from one natural protein fold to the other. The value of having the polypeptide chain cross twice instead of once between two connected proteins has been demonstrated in experiments in which T4 lysozyme was inserted into a loop of GPCR membrane proteins, giving a construct that yielded well ordered crystals (Cherezov et al., 2007). The split GFP system [GFP(1-9) + GFP(10-11)] allows a similar advantage. If strands 10 and 11, which ostensibly form a natural hairpin, can be inserted as a long extension into a surface loop of a target protein, then reconstitution with complementary GFP(1-9) should give a tight noncovalent complex with two chain crossings between natural protein folds (Fig. 1). In practice, rational choices for the points of insertion of strands 10-11 into exposed loops might be based on homology models, where available, or on bioinformatic predictions of loops (Lambert et al., 2002; Dovidchenko et al., 2008; Jones, 1999). Here, we chose a target for crystallization for which the structure was known, in order to test the strategy of loop insertion and crystallization in a favorable case.

Cherry fluorescent protein as a target protein
In the present study, the protein chosen as a target for crystallization was superfolder Cherry (sfCherry), a version of red fluorescent protein engineered in our laboratory. sfCherry was chosen as a test protein so that the folding of the target could be monitored by red fluorescence while the GFP reconstitution could be monitored by green fluorescence. The well folding sfCherry protein was created from the fluorescent monomeric Cherry protein (mCherry; Shaner et al., 2004) by directed evolution of mCherry carrying the poorly folding and aggregation-prone bullfrog red-cell H-subunit ferritin as an N-terminal fusion, as described previously (Pédelacq et al., 2006). Owing to the naturally poor folding properties of the ferritin, colonies expressing the initial ferritin-mCherry fusion at 310 K showed only faint fluorescence (Supplementary Fig. S1a). After three rounds of DNA shuffling, during which we selected brighter fluorescent clones expressed at 310 K, we obtained highly fluorescent ferritin-sfCherry protein fusions. E. coli colonies and liquid cultures of cells expressing ferritin-sfCherry fusions after three rounds of directed evolution were about 100-fold brighter than cells expressing ferritin-mCherry at 310 K (Supplementary Fig. S1a). Our new folding-enhanced sfCherry contains six mutations: R36H, K92T, R125L, S147T, K162N and N196D. A native polyacrylamide gel at ~10 mg ml⁻¹ protein concentration indicated that the protein is approximately 50% dimer and 50% monomer (Supplementary Fig. S1b).

Selection of a permissive insertion site in sfCherry
Our strategy of inserting GFP strands 10-11 into a target protein requires that permissive sites be identified. In order to guide the choice of sites that might be permissive for insertion into our target protein, sfCherry, we relied partly on earlier experimental data for circular permutants of superfolder GFP (sfGFP), which has 23.3% sequence identity to sfCherry and a similar structure (Pédelacq et al., 2006). On this basis, the GFP(10-11) hairpin with a short linker of three residues (DAS) was inserted at two different loop sites (Gly52/Pro53 or Asp169/Gly170) of sfCherry. The three-residue linker was included to improve protein solubility as guided by our previous experiments (data not shown). These two sfCherry-GFP(10-11) hairpin constructs were screened for expression and solubility in vivo in E. coli colonies using a complementation assay with GFP(1-9) as previously described for GFP11 and GFP(1-10). The construct with the GFP hairpin inserted at Asp169/Gly170 clearly showed brighter red and green fluorescence compared with the Gly52/Pro53 insertion (Fig. 2a). We concluded that insertion of the GFP hairpin at the permissive site Asp169/Gly170 of sfCherry was the better choice for folding of the target and subsequent binding to GFP(1-9). This construct was chosen for further crystallization and structural characterization.
Figure 1. Principle of the work: insertion of GFP hairpin strands S10 and S11 into a permissive loop of a target protein, followed by reconstitution of the intact GFP by attachment of GFP(1-9) (i.e. the GFP molecule missing the hairpin).
Figure 2. (a) In vivo protein expression (left panel) and solubility screens (right panel) for the sfCherry-GFP(10-11) hairpin inserted at Gly52/Pro53 and Asp169/Gly170. Pictures were taken of the plates after 4 h of co-induction (to monitor protein expression by GFP fluorescence) and after 2 h of induction with anhydrotetracycline (AnTet) followed by 1 h rest and 1 h induction of GFP(1-9) (to monitor soluble protein by GFP fluorescence). Fluorescence from folded sfCherry was monitored using 550 nm excitation/610 nm emission (red fluorescence) and reconstituted GFP fluorescence was monitored using 488 nm excitation/520 nm emission (green fluorescence). Pictures are shown with 0.5 s exposure times for red fluorescence and 0.25 s exposure times for green fluorescence. (b) In vitro sensitivity characterization of sfCherry-GFP(10-11) complementation with GFP(1-9). 20 µl aliquots containing 1.56-200 pmol of sfCherry-GFP(10-11) hairpin were mixed with 180 µl aliquots containing 800 pmol GFP(1-9) to start the complementation. AU, arbitrary fluorescence units. (c) Superimposition of scaled progress curves for complementation of 200, 100, 50, 25, 12.5, 6.25, 3.13 and 1.56 pmol samples. The curves can be superimposed well by linear scaling, indicating that the shape of the progress curves does not depend on the concentration of the tagged protein or the depletion of the pool of unbound GFP(1-9) fragment (see §3).
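The statement that complementation progress curves superimpose under linear scaling can be tested numerically. The sketch below is an illustration, not the analysis used in the study: it fits a single least-squares scale factor that maps each curve onto a reference curve and reports the residual. The curves are synthetic, generated from a made-up saturating shape; no measured data are reproduced here, and the function names are hypothetical.

```python
import math


def scale_factor(reference, curve):
    """Least-squares factor k that minimises sum((reference - k * curve)**2)."""
    numerator = sum(r * c for r, c in zip(reference, curve))
    denominator = sum(c * c for c in curve)
    return numerator / denominator


def rms_residual(reference, curve, k):
    """Root-mean-square difference between the reference and the scaled curve."""
    squared = [(r - k * c) ** 2 for r, c in zip(reference, curve)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic progress curves (NOT data from this study): one shared shape,
# with amplitude proportional to the amount of tagged protein.
times_min = [5.0 * i for i in range(13)]
shape = [1.0 - math.exp(-t / 20.0) for t in times_min]
amounts_pmol = [200.0, 100.0, 50.0, 25.0]
curves = {n: [n * s for s in shape] for n in amounts_pmol}

reference = curves[200.0]
for n in amounts_pmol:
    k = scale_factor(reference, curves[n])
    print(f"{n:6.1f} pmol: scale = {k:.3f}, residual = {rms_residual(reference, curves[n], k):.2e}")
```

With real measurements, a residual that stays small relative to the signal across all amounts would support the conclusion that the curve shape is independent of the tagged-protein concentration.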

Crystal structure of sfCherry alone
To allow subsequent comparisons, the crystal structure of sfCherry (without a loop insertion) was determined at 2.0 Å resolution from the protein expressed in E. coli (Table 1). The Cα superposition of sfCherry and mCherry (PDB entry 2h5q; Shu et al., 2006) has a root-mean-square deviation (r.m.s.d.) of only 0.17 Å for residues 6-223 (Fig. 3a). The chromophore is formed from residues Met66-Tyr67-Gly68 and is buried in the middle of the central helix. Unlike mCherry, sfCherry crystallized as a symmetric dimer. The dimer interface includes the hydrophobic residues Val96, Val104 and Leu125 and the hydrophilic residues Asn23, Glu94, Thr106, Thr108, Thr127 and Asn128. Similar to the AB dimer interface found in the DsRed tetramer (Yarbrough et al., 2001), the sequence Val104, Thr106, Thr108 is central to the dimer interface in the sfCherry structure, in which Thr106A forms a hydrogen bond to its counterpart Thr106B. A sequence alignment of sfCherry, mCherry and DsRed (Supplementary Fig. S2b) suggests that the R125L mutation in sfCherry is likely to contribute to the observed dimerization. In both the DsRed tetramer (PDB entry 1g7k) and the sfCherry structures (this work), either Ile125 (DsRed) or Leu125 (sfCherry) may stabilize the dimer through hydrophobic interactions. In the mCherry structure (PDB entry 2h5q), the bulky charged side chain Arg125 is likely to prevent dimerization by charge repulsion. In the sfCherry structure, the side chain of Asp196 forms hydrogen bonds to Arg220 via Oδ2 and to Thr147 via Oδ1, while the corresponding interactions between Asn196 and Arg220/Ser147 are not present in the mCherry structure (Fig. 3b). This change in the hydrogen-bonding network, together with the R125L mutation, may explain in part why sfCherry is more stable and more tolerant to folding interference compared with mCherry when fused to a poorly folding and aggregation-prone protein such as H-subunit ferritin (Supplementary Fig. S1a).
3.6. Structure of sfCherry with GFP strands 10-11 inserted at Asp169/Gly170 in complex with GFP(1-9)
The structure of sfCherry-GFP(10-11) in complex with GFP(1-9) was determined at 2.6 Å resolution, with final R and Rfree values of 0.205 and 0.247, respectively (Table 1). No major elements of disorder, conformational heterogeneity or anisotropy were observed. The structure of the complex (Fig. 4b) shows sfCherry to be clearly linked to the GFP(10-11) hairpin and that the GFP(10-11) hairpin complements GFP(1-9) to form an intact GFP molecule. The crystal asymmetric unit contains two copies of the complex. With two complexes in the asymmetric unit, and two linking chain segments between the two protein components in each case, there are four linking polypeptide segments. All of these segments are well ordered and clearly visible in the final electron-density map (Fig. 5). Furthermore, the relative orientation of the GFP and sfCherry components in the complex is very similar in the two instances visualized in the asymmetric unit. When the GFP components of the two independent complexes are spatially overlapped, the sfCherry components differ in the two cases by a rotation of only 9° (Fig. 6).
The GFP domains form a dimer in the crystal with local twofold symmetry (Fig. 4b and Supplementary Fig. S4a). The GFP dimer interface is mediated through β-strand 10 (inserted in sfCherry) via the sequence Gln180, Ile182 and Leu183 and the loop Phe145, Asn146 and Ser147 connecting strand 6 and strand 7 of GFP(1-9). Residue Gln180 of GFP β-strand 10 is hydrogen-bonded to the backbone of its counterpart Leu183 via the Nε and Oε atoms. Position Ile182 in the sfCherry-GFP(10-11) hairpin construct corresponds to Ala206 in the folding reporter GFP and to Val206 in sfGFP (Pédelacq et al., 2006). The dimer interface found in the crystal structure of the folding reporter GFP (PDB entry 2b3q) was also mediated through Gln204, Ala206 and Leu207 of strand 10 and Tyr145, Asn146 and Ser147 of the loop connecting strand 6 and strand 7 (Pédelacq et al., 2006), similar to the interface found in our sfCherry-GFP(10-11)-GFP(1-9) complex structure. The sfCherry domains are arranged in the crystal as a dimer that is essentially identical to the dimer formed when crystallized by itself (Fig. 4b and Supplementary Fig. S4b). The crystal structure exhibits strong packing interactions in all three dimensions owing to the dimerization of the reconstituted GFP, the linkage between GFP and sfCherry (creating linkages in the xy plane) and the dimerization of sfCherry (creating linkages in the z direction).
Figure 3. Three-dimensional structures of mCherry and sfCherry. (a) Structure of mCherry (left; PDB entry 2h5q; Shu et al., 2006) and sfCherry (right; this work) showing the locations of sfCherry mutations and the corresponding residues in mCherry. (b) Region of mCherry and sfCherry close to residues
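A rotation such as the 9° difference between the sfCherry components of the two complexes can be extracted from superposition matrices. The sketch below shows one common way to do this, assuming each copy's sfCherry orientation (after overlaying the GFP components) is available as a 3 × 3 rotation matrix; the matrices used here are hypothetical stand-ins built from known angles, not values from the deposited structure, and the helper names are illustrative.

```python
import math


def rotation_about_z(angle_deg):
    """Rotation matrix for a rotation of angle_deg about the z axis."""
    a = math.radians(angle_deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_angle_deg(r):
    """Rotation angle from the trace: cos(theta) = (tr(R) - 1) / 2."""
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


# Hypothetical orientations of sfCherry in copies A and B (illustration only)
orientation_a = rotation_about_z(30.0)
orientation_b = rotation_about_z(39.0)

# Rotation that carries copy A onto copy B
relative = matmul(orientation_b, transpose(orientation_a))
angle = rotation_angle_deg(relative)
print(f"relative rotation: {angle:.1f} degrees")
```

In practice the two orientation matrices would come from a structure-superposition program; only the final trace formula is needed to report a single rotation angle.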

Discussion
The purpose of these experiments was to develop a modular framework for using split GFP as a crystallization partner.
Here, we present a proof-of-principle experiment in which we used GFP reconstitution to monitor the success of GFP hairpin insertion into sfCherry, a red fluorescent protein, and then characterized the atomic structure of the sfCherry-GFP(10-11)-GFP(1-9) protein complex by X-ray diffraction. We note that the GFP(10-11) hairpin described here was originally optimized as a protein-interaction detector with each β-strand separately attached to an interacting protein (Cabantous et al., 2013). Part of this optimization involved eliminating any aggregation and self-assembly between the β-strands. This could potentially destabilize the GFP(10-11) hairpin prior to complementation by GFP(1-9), affecting the stability of target proteins. Despite these caveats, we found a site for insertion of the GFP(10-11) hairpin sequence that did not substantially disrupt the folding of the well folded sfCherry. However, the insertion of the GFP(10-11) hairpin might affect the stability of less stable target proteins. The choice of insertion site might therefore be important in more general applications. For choosing the permissive sites of sfCherry in this study, we relied partly on homology models (below) and partly on our previous experimental data for circular permutants of sfGFP, as indicated in §3.3.

Figure caption: Three-dimensional structure of the sfCherry-GFP(10-11) hairpin complexed with GFP(1-9). (a) The amino-acid sequence of the sfCherry-GFP(10-11) hairpin is colored red for the sfCherry component, blue for the GFP(10-11) hairpin and cyan for the three-residue linker. (b) Structure of the sfCherry-GFP(10-11)-GFP(1-9) complex with the same color scheme used as in the amino-acid sequence. sfCherry forms dimers in the crystal through an interface involving the side chains of Thr106 (shown as spheres).

A homology model obtained for the sfCherry sequence using SWISS-MODEL (Arnold et al., 2006; Guex & Peitsch, 1997; Schwede et al., 2003) has an r.m.s.d. of 0.2 Å for the Cα atoms of residues 6-222 compared with the actual structure that we obtained for sfCherry in this study. The GFP(10-11) hairpin sequence could have been inserted into any of several loop sites of sfCherry based on this homology model; in vivo experiments (§2.2) could have been used to screen for the most permissive site. The GFP(10-11) hairpin sequence reported in this paper is likely to be suitable for insertion into various other target proteins. We are currently engineering the GFP(10-11) hairpin sequence specifically as an insertion in order to minimize the effects that it might have on the stability of target proteins.
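The r.m.s.d. quoted above for the homology model can be illustrated with a minimal sketch. It assumes that the two sets of Cα coordinates have already been paired residue by residue and superimposed by a separate program; the function name `rmsd` and the toy coordinates are illustrative assumptions and are not taken from this study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumption: the two coordinate sets are already superimposed and listed
    in the same residue order; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in angstroms (not from the sfCherry structure):
# every atom of the "model" is displaced by 0.2 along z from the "crystal" atom.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crystal = [(0.0, 0.0, 0.2), (3.8, 0.0, 0.2), (7.6, 0.0, 0.2)]
print(round(rmsd(model, crystal), 3))
```

In practice the superposition step would be carried out first, for example by a least-squares fit in a structure-analysis package, and only the matched Cα atoms would be passed to a function of this kind.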
While some structures have previously been obtained for proteins fused terminally to full-length GFP, using the GFP hairpin insertion as the fusion partner instead has potential benefits for crystallization. The hairpin is small and may be less perturbing of protein folding than a fusion of intact GFP. Further, the hairpin is topologically well suited for insertion into the loops and turns of a target protein. Finally, instead of the single chain crossing afforded by terminal GFP fusions, the hairpin provides two chain crossings between the target and the reconstituted GFP. We expected this to be an important feature, as it should reduce the flexibility between the connected components. This expectation was confirmed by the crystal structure of our complex. We observed that the chain-crossing segments were well ordered. Perhaps more compellingly, the two instances of the complex seen in the asymmetric unit of the crystal suggest that the two connected components, GFP and the sfCherry target protein, sample a rather limited range of relative orientations. The relative orientation of the two components differs by 9° when the two complexes are compared. This appears essentially as a minor hinge motion through the two points of connection; twisting and rotation about the other orthogonal direction are evidently limited by the double connection. The connection therefore appears to be relatively rigid.

Figure caption: Stereoview showing an overlap of the two instances of the complex in the asymmetric unit. When the two GFP components (cyan, bottom) are superimposed, the sfCherry components in the two copies of the complex (yellow and pink) are observed to differ by only a small rotation. The images were created with PyMOL (DeLano, 2002).

Figure 5
A 2mFo − DFc σA-weighted electron-density map (Winn et al., 2011) was calculated using a model that was constructed before any connections between sfCherry and the GFP(10-11) hairpin had been built. The connections between the green arrows (not included in the phasing model) are between residues 168 and 172 and residues 205 and 211. This unbiased map (contoured at 0.5σ in gray and at 1σ in blue) shows clear connections between sfCherry and the GFP(10-11) hairpin that was inserted into sfCherry at an exposed loop.
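A difference in relative orientation between two copies of a complex, such as the value reported above, is commonly expressed as the rotation angle of the residual rotation matrix that remains after one component has been superimposed. The sketch below shows only that final step, using the standard relation between the trace of a rotation matrix and its angle. The function names and the example matrix are assumptions made for illustration; this is not a description of how the value in this study was computed.

```python
import math


def rotation_angle_deg(r):
    """Rotation angle in degrees of a proper 3x3 rotation matrix (nested lists).

    Uses trace(R) = 1 + 2*cos(theta).
    """
    trace = r[0][0] + r[1][1] + r[2][2]
    # Clamp to [-1, 1] so that rounding error cannot push acos out of range.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rotation_about_z(deg):
    """Helper for the demonstration: rotation by `deg` degrees about the z axis."""
    t = math.radians(deg)
    return [
        [math.cos(t), -math.sin(t), 0.0],
        [math.sin(t), math.cos(t), 0.0],
        [0.0, 0.0, 1.0],
    ]


# Hypothetical residual rotation standing in for a small hinge motion.
hinge = rotation_about_z(9.0)
print(round(rotation_angle_deg(hinge), 3))
```

The angle alone does not identify the hinge axis; that would require the rotation axis as well, which a superposition program typically reports alongside the matrix.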
The ease with which the current version of GFP strands 10-11 could be inserted into a test protein, and with which the resulting construct could be crystallized as a complex with GFP(1-9), suggests that the approach may be widely applicable, especially after further optimization of the GFP(10-11) hairpin. The case presented here had the advantage that the target protein had already been structurally characterized, so that the surface loops for insertion could be defined easily. For more realistic applications, homology modeling could be valuable in selecting prospective insertion sites. In the most challenging cases, such as where the target protein has no homologs of known structure, a library of constructs with the hairpin randomly inserted could be created and the in vivo solubility assay with GFP(1-9) (described in this study) could be used to screen for permissive sites.
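The idea of a construct library can be sketched at the sequence level. The example below enumerates candidate insertion positions rather than drawing them at random, which is a simplification of the random-insertion library proposed above. All sequences are placeholders (uppercase for the target, lowercase for the hairpin and linkers), and the function names and linker arguments are hypothetical; none of them are the sequences used in this study.

```python
def insert_hairpin(target, position, hairpin, linker_n="", linker_c=""):
    """Return the target sequence with linker + hairpin + linker inserted
    after `position` residues of the target."""
    if not 0 <= position <= len(target):
        raise ValueError("position must lie within the target sequence")
    return target[:position] + linker_n + hairpin + linker_c + target[position:]


def insertion_library(target, positions, hairpin, linker_n="", linker_c=""):
    """Map each candidate insertion position to its construct sequence."""
    return {
        pos: insert_hairpin(target, pos, hairpin, linker_n, linker_c)
        for pos in positions
    }


# Placeholder sequences for illustration only.
target_seq = "AAAAAAAAAA"
hairpin_seq = "hhhh"
library = insertion_library(target_seq, [3, 6], hairpin_seq, linker_n="g", linker_c="s")
for pos, construct in sorted(library.items()):
    print(pos, construct)
```

Each construct in such a library would then be scored experimentally, for example with the in vivo solubility assay, since sequence manipulation alone says nothing about whether a site is permissive.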
The natural modularity of our split system for crystallization opens the possibility of engineering and testing many variants of GFP(1-9) that might be expected to have distinct crystallization behaviors. In this way, a single target protein construct bearing a GFP(10-11) insertion could be combined with any number of different variants of the GFP(1-9) carrier, leading to greatly expanded chances of crystallization. This strategy would circumvent the labor associated with exhaustively reengineering a protein being targeted for crystallization, since the purified target protein bearing the GFP hairpin could be complemented with different pre-purified GFP(1-9) mutants without further genetic manipulation, protein expression or purification. The strategy shown here could be applied to soluble proteins and also to detergent-solubilized membrane proteins, in the latter case by inserting the GFP(10-11) hairpin into exposed cytoplasmic loops. Another benefit of this system is that GFP can potentially be used as the search model in molecular replacement, making it possible to obtain diffraction phases and electron-density maps even for a target protein with an unknown fold.
Many of the techniques that have been used to vary the crystallization behavior of proteins could be employed to modify the GFP(1-9) carrier. In particular, synthetically symmetrized versions of GFP(1-9) should lead to highly distinct constructs, with each providing essentially independent opportunities for forming lattice contacts during crystallization. The creation of unique GFP(1-9) modules supporting the formation of new lattices and the development of methods to attach them to target proteins via engineered versions of a GFP(10-11) hairpin are ongoing projects in our laboratories.