Diffraction Structural Biology Synchrotron Radiation Creation and Structure Determination of an Artificial Protein with Three Complete Sequence Repeats

Symfoil-4P is a de novo protein exhibiting the threefold symmetrical β-trefoil fold designed based on the human acidic fibroblast growth factor. Three asparagine–glycine sequences of Symfoil-4P were replaced with glutamine–glycine (Symfoil-QG) or serine–glycine (Symfoil-SG) sequences to protect against deamidation, and His-Symfoil-II was prepared by introducing a protease digestion site into Symfoil-QG so that Symfoil-II has three complete repeats after removal of the N-terminal histidine tag. The Symfoil-QG, Symfoil-SG and His-Symfoil-II proteins were expressed in Escherichia coli as soluble proteins and purified by nickel affinity chromatography. Symfoil-II was further purified by anion-exchange chromatography after removing the histidine tag by proteolysis. Both Symfoil-QG and Symfoil-II were crystallized in 0.1 M Tris-HCl buffer (pH 7.0) containing 1.8 M ammonium sulfate as precipitant at 293 K; several crystal forms were observed for Symfoil-QG and Symfoil-II. The maximum diffraction of the Symfoil-QG and Symfoil-II crystals was to 1.5 and 1.1 Å resolution, respectively. Symfoil-II without the histidine tag diffracted better than Symfoil-QG with the N-terminal histidine tag. Although the crystal packing of Symfoil-II is slightly different from that of Symfoil-QG and other crystals of Symfoil derivatives having the N-terminal histidine tag, the refined crystal structure of Symfoil-II showed pseudo-threefold symmetry as expected from other Symfoils. Since the removal of the unstructured N-terminal histidine tag did not affect the threefold structure of Symfoil, the improvement in the diffraction quality of Symfoil-II may be caused by molecular characteristics of Symfoil-II such as molecular stability.


Introduction
Symmetry is one of the important themes in the study of protein structure, function, evolution and design. Although complete structural symmetry is observed in many different natural proteins as homo-oligomeric architectures, structural pseudosymmetry is also observed in some monomeric proteins. These pseudosymmetric architectures are generally hypothesized to be the result of gene duplication and fusion (Sepulveda et al., 1975; Tang et al., 1978; McLachlan, 1979; Inana et al., 1983). Two distinctly different evolutionary models for the emergence of symmetric protein architecture from a primordial peptide motif have been proposed (Mukhopadhyay, 2000; Ponting & Russell, 2000; Liu et al., 2002; Yadid & Tawfik, 2007; Akanuma et al., 2010; Richter et al., 2010).
In a previous report, we described an experimental top-down symmetric deconstruction (TDSD) of symmetric protein architecture (the β-trefoil fold) using human fibroblast growth factor-1 (FGF-1), a 140 amino acid single-domain globular protein exhibiting the characteristic threefold symmetry of the β-trefoil architecture. The TDSD involved sequential introduction of symmetric mutations (targeting core, reverse-turn and β-strand secondary structure, respectively) until a purely threefold symmetric primary structure solution was achieved. Through this approach, we obtained a simplified β-trefoil protein (Symfoil-4P) having a reduced amino acid alphabet size of 16 letters, and enriched in prebiotic amino acids (to 71%) (Lee & Blaber, 2011; Longo et al., 2013).
In order to obtain a Symfoil with more complete symmetry and greater chemical stability, we designed a monomeric protein (Symfoil-II) based on the Symfoil-4P protein (Lee & Blaber, 2011). In the sequence of Symfoil-II, the three asparagine–glycine sequences were removed to improve chemical stability by preventing the formation of charge isomers through deamidation. Furthermore, a protease digestion site was introduced to make the three repeats of Symfoil more complete after removing the N-terminal histidine tag. Here, we report the crystal structure and characteristics of Symfoil-II. Symfoil-II, with its complete threefold axis, may be useful as a scaffold that can capture small C3 symmetric compounds using the threefold axis within the protein.

Site-directed mutagenesis
To construct expression plasmids for the mutants, site-directed mutagenesis of Symfoil-4P in the pET-21a vector (Brych et al., 2001) was carried out by polymerase chain reaction (PCR). PrimeStar Max DNA polymerase (Takara Bio) was used for the PCR. The PCR products were transformed into Escherichia coli HST08 strain (Takara Bio) without ligation. Primers used for PCR are listed in Table 1. For creation of Symfoil-SG and Symfoil-QG, subcloning of the PCR product was repeated three times. In the first reaction, Asn100 was replaced with serine or glutamine. In the second reaction, Asn58 was replaced with serine or glutamine. Finally, primers Cdel_F and Cdel_R were used for deletion of the C-terminal three amino acids. The resulting amino acid sequences of Symfoil-QG and Symfoil-SG are shown in Fig. 1. For preparation of the expression plasmid of His-Symfoil-II, the plasmid template of Symfoil-QG was amplified using primers N_delQG_F and N_delQG_R as listed in Table 1. The DNA sequences of the coding region in all plasmids constructed here were confirmed using an ABI Prism 310 DNA sequencer (Applied Biosystems).
The resultant crude protein solutions were centrifuged at 12000 × g for 20 min. The obtained supernatants were dialyzed against buffer A, and were applied to a HisTrap FF column (5 ml) (GE-Healthcare) equilibrated with buffer A. The column was washed with buffer A containing 20 mM imidazole and eluted with a step gradient of 250 mM imidazole. The eluted fractions were dialyzed against buffer A and passed through a HiTrap Heparin HP column (5 ml) (GE-Healthcare) equilibrated with buffer A to remove impurities. The flow-through fractions were collected and loaded onto a ResourceQ column (3 ml) (GE-Healthcare) equilibrated with buffer A. Elution from the ResourceQ column was achieved with a linear gradient of NaCl.
The histidine tag of His-Symfoil-II was then removed by trypsin (Wako Pure Chemical Industries, Japan), and the product was purified by ResourceQ column chromatography to yield Symfoil-II. Extinction coefficients E280nm (0.1%, 1 cm) of 3.8, 3.8, 2.9 and 3.1 were used to calculate protein concentrations for Symfoil-SG, Symfoil-QG, His-Symfoil-II and Symfoil-II, respectively. The final yield was about 15 mg from 1 l of culture.
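As an illustration of how such extinction coefficients are applied, the following minimal Python sketch converts an absorbance reading at 280 nm into a concentration in mg/ml, using the relation c = A280 × dilution / (E × path length). Only the four coefficients come from the text; the function name, the dilution argument and the example reading are hypothetical.

```python
# E280nm (0.1%, 1 cm) values quoted in the text:
# the absorbance of a 1 mg/ml solution in a 1 cm cuvette.
E280_0P1_PERCENT = {
    "Symfoil-SG": 3.8,
    "Symfoil-QG": 3.8,
    "His-Symfoil-II": 2.9,
    "Symfoil-II": 3.1,
}


def concentration_mg_per_ml(protein, a280, path_cm=1.0, dilution=1.0):
    """Return the protein concentration in mg/ml from an A280 reading.

    a280 is the absorbance of the (possibly diluted) sample, path_cm the
    cuvette path length, and dilution the fold dilution applied before the
    measurement (a hypothetical convenience argument).
    """
    if a280 < 0 or path_cm <= 0 or dilution <= 0:
        raise ValueError("absorbance must be >= 0; path and dilution must be > 0")
    return a280 * dilution / (E280_0P1_PERCENT[protein] * path_cm)


# Hypothetical reading: a 20-fold diluted Symfoil-II sample with A280 = 0.62.
print(round(concentration_mg_per_ml("Symfoil-II", 0.62, dilution=20.0), 2))
```

With these assumed numbers the undiluted sample would be about 4 mg/ml; real measurements would also need a buffer blank, which the sketch omits.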

Gel filtration
To characterize the self-assembly of Symfoil-II, gel filtration chromatography using a Superdex 200 10/300GL column (GE Healthcare) was conducted. The column was equilibrated with 50 mM potassium phosphate buffer (pH 7.5) containing 0.2 M NaCl. The molecular mass of eluted Symfoil-II was determined by multi-angle laser light scattering (SEC-MALLS). Light-scattering analysis was performed using a miniDAWN detector (Wyatt Technologies).

Crystallization
The purified Symfoil proteins were dialyzed against 50 mM sodium phosphate buffer (pH 7.5) containing 100 mM NaCl, 10 mM ammonium sulfate and 0.5 mM EDTA, and then concentrated to 20 mg ml⁻¹. Crystallization was performed by hanging-drop vapor diffusion in 0.1 M Tris-HCl buffer (pH 7.0) containing 1.5-2.0 M ammonium sulfate as precipitant. Drops consisting of 2 µl protein solution and 2 µl mother liquor were equilibrated against 1 ml of reservoir solution at 293 K for one week.

Table 1. Primers used for site-directed mutagenesis (columns: Mutation; Name of primer; Sequence).

Figure 1. Sequence alignment of Symfoil proteins. Mutated sites are boxed. The arrow indicates the thrombin cleavage site newly introduced in Symfoil-II.

Data collection and refinement
Diffraction data of crystals of Symfoil-QG and Symfoil-II were collected using synchrotron radiation (λ = 1.00 Å) at beamlines in SPring-8 and KEK, Japan. The crystals were mounted using a nylon cryo-loop (Hampton Research) and were frozen in a liquid-nitrogen stream at 100 K. Diffraction data were indexed, integrated and scaled using the HKL2000 software package (Otwinowski & Minor, 1997). A molecular replacement search for nonisomorphous space groups was carried out using the program Phaser from the CCP4 suite (McCoy et al., 2007; Winn et al., 2011) with the coordinates of the de novo designed protein Symfoil-4P [Protein Data Bank (PDB) code 3o4d] as a search model. Model building and visualization were performed using the XtalView molecular graphics software (McRee, 1992). The PHENIX software package (Zwart et al., 2008) and the program REFMAC were used for refinement, in which 5% of the data in the reflection files were set aside for Rfree calculations. The ARP/wARP automated procedure was used to add solvent molecules (Lamzin & Wilson, 1993). Atomic models were drawn using the graphics program PyMOL (DeLano, 2002).
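The role of the 5% test set can be pictured with the following sketch, which randomly flags a fraction of reflections and evaluates R = Σ||Fobs| − |Fcalc|| / Σ|Fobs| separately on the working and test sets. This is a generic illustration in plain Python, not the procedure implemented in PHENIX or REFMAC; the function names, the random seed and the toy amplitudes are assumptions, and no scale factor between observed and calculated amplitudes is applied.

```python
import random


def assign_free_flags(n_reflections, fraction=0.05, seed=0):
    """Return a list of booleans; True marks a reflection held out for Rfree."""
    rng = random.Random(seed)
    n_free = max(1, round(n_reflections * fraction))
    free = set(rng.sample(range(n_reflections), n_free))
    return [i in free for i in range(n_reflections)]


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_r_free(f_obs, f_calc, flags):
    """Split the amplitudes by flag and return the pair (Rwork, Rfree)."""
    work = [(o, c) for o, c, f in zip(f_obs, f_calc, flags) if not f]
    free = [(o, c) for o, c, f in zip(f_obs, f_calc, flags) if f]
    return r_factor(*zip(*work)), r_factor(*zip(*free))


# Toy amplitudes (hypothetical): every calculated value is 10% too low.
f_obs = [100.0 + i for i in range(200)]
f_calc = [0.9 * f for f in f_obs]
flags = assign_free_flags(len(f_obs))
print(sum(flags), r_work_and_r_free(f_obs, f_calc, flags))
```

Because the flagged reflections are never used to adjust the model, Rfree serves as a cross-validation statistic, whereas Rwork is computed on the data the model was refined against.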

Results and discussion
Symfoil-II was designed based on the crystal structure of Symfoil-4P to have more perfect sequence repeats, as shown in Fig. 1. We first removed the three NG (Asn-Gly) sequences by changing Asn58 and Asn100 to Ser or Gln and by deleting the Gly141-Asn142-Gly143 sequence to give Symfoil-SG or Symfoil-QG, respectively, to protect against deamidation during the crystallization experiments. Then, we added a thrombin cleavage site and a GQG sequence to the N-terminus of the first sequence repeat of Symfoil-QG to make His-Symfoil-II. After removal of the N-terminal histidine tag of His-Symfoil-II, Symfoil-II is expected to have three complete sequence repeats in one protein, as shown in Fig. 1.
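The substitution step of this design can be sketched in a few lines of Python that locate Asn-Gly motifs in a one-letter sequence and replace the Asn with Gln or Ser. The toy sequence below is hypothetical and is not the Symfoil-4P sequence; the function names are likewise illustrative, and the C-terminal deletion is not modelled.

```python
def find_ng_motifs(sequence):
    """Return the 1-based positions of every Asn that is followed by Gly."""
    return [i + 1 for i in range(len(sequence) - 1) if sequence[i:i + 2] == "NG"]


def replace_asn_in_ng(sequence, replacement="Q"):
    """Replace the Asn of each Asn-Gly motif with the given residue (Q or S)."""
    return sequence.replace("NG", replacement + "G")


# Hypothetical toy sequence, NOT the real Symfoil-4P sequence.
toy = "MKVNGTLDANGSRENGK"
print(find_ng_motifs(toy))          # deamidation-prone Asn positions
print(replace_asn_in_ng(toy, "Q"))  # QG variant
print(replace_asn_in_ng(toy, "S"))  # SG variant
```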
Symfoil-SG, Symfoil-QG and His-Symfoil-II were prepared using the E. coli expression system. The purity of Symfoil-SG, Symfoil-QG and His-Symfoil-II with the N-terminal histidine tag, and of Symfoil-II without the N-terminal histidine tag, was confirmed by SDS-PAGE (Fig. 2). The SDS-PAGE showed that the molecular size of His-Symfoil-II (lanes 5 and 10) was slightly smaller than those of Symfoil-4P (lanes 2 and 7), Symfoil-QG (lanes 3 and 8) and Symfoil-SG (lanes 4 and 9) because of the removal of the N-terminal YKK sequence. After removal of the histidine tag, Symfoil-II became smaller than the other Symfoils, as seen in lane 11 with heat treatment.
Without heat treatment, Symfoil-II appeared much larger (similar in size to its dimer), suggesting that Symfoil-II might form a larger complex. To identify the actual molecular size of Symfoil-II, it was evaluated by gel filtration coupled to a multi-angle light-scattering detector. The molecular weight of Symfoil-II was, however, estimated to be 14 × 10³, which is similar to the theoretical value (13932) for monomeric Symfoil-II calculated from its primary structure. Although the mechanism for the size shift seen in SDS-PAGE is still unclear, stabilization of Symfoil-II against denaturation by SDS may be part of the reason why this band shifts. A melting experiment to examine this further is under way. We thus obtained Symfoil-II with three complete sequence repeats.
We next investigated the effect of the removal of the histidine tag and the Asn-Gly sequences on X-ray diffraction using three independent crystal forms of Symfoil-QG and two independent crystal forms of Symfoil-II. Symfoil-QG crystals diffracted to 2.0, 2.0 and 1.8 Å resolution, respectively, whereas Symfoil-II crystals diffracted to 1.4 and 1.15 Å resolution, respectively, as summarized in Table 2. Symfoil-II crystallized in different space groups. The C2 space group was obtained only for Symfoil-II, and this crystal diffracted to 1.15 Å resolution. The diffraction limits and the Wilson B-factors are shown in Table 2. They indicate that Symfoil-II diffracted better than the other Symfoils, with lower B-values. The improved diffraction and lower B-values of Symfoil-II may be caused by structural stabilization. The close proximity of the N- and C-termini in Symfoil-II may allow an ion pair to form, and this electrostatic interaction may be part of the reason for its stabilization. Evaluation of the stability of Symfoil-II with and without the histidine tag is the next subject to be investigated.
Crystal structures of Symfoil-QG and Symfoil-II were determined and the refinement statistics are summarized in Table 2. The overall structure of Symfoil-II is shown in Fig. 3(a).

Figure 2. SDS-PAGE analysis of the purified Symfoil proteins. In lanes 2 to 6, the sample was not boiled before loading. In lanes 7 to 11, the sample was boiled before loading. Symfoil-4P: lanes 2 and 7; Symfoil-SG: lanes 3 and 8; Symfoil-QG: lanes 4 and 9; His-Symfoil-II: lanes 5 and 10; Symfoil-II: lanes 6 and 11. Protein size markers of Mark12 (Life Technologies) are shown in lanes 1 and 12.

Symfoil-QG and Symfoil-II in the I222 crystal form is 0.39 Å, indicating that the structural difference caused by the removal of the N-terminal sequence is quite small. The location of the N-terminal histidine tag of Symfoil-QG was not determined in any crystals obtained in this study. Electron densities for the loop region connecting the three repeats in Symfoil-QG and Symfoil-II were still invisible, but became clearer in the structure of Symfoil-II determined to 1.15 Å resolution (Fig. 3b). Assuming that Symfoil-II with three complete sequence repeats has an ion pair between the N- and C-termini, the structure has almost perfect threefold symmetry. RMS positional differences after application of the rotation matrix calculated using the structures of each repeat of Symfoil-QG and Symfoil-II were less than 0.37 Å for Symfoil-QG and less than 0.37 Å for Symfoil-II, indicating that both Symfoils have threefold symmetry, including the shape of the central cavity.
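The idea behind such a comparison can be sketched as follows: one repeat is rotated by 120° and the RMS distance to the equivalent atoms of the next repeat is computed. The text derives the rotation matrix from the repeat structures themselves; this simplified sketch instead assumes that the threefold axis is already known and passes through the origin along z. The coordinates and function names are hypothetical.

```python
import math


def rotate_about_axis(point, axis, angle_deg):
    """Rotate a 3D point about a unit axis through the origin (Rodrigues' formula)."""
    x, y, z = point
    ux, uy, uz = axis
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    dot = ux * x + uy * y + uz * z
    # Cross product axis x point.
    cross = (uy * z - uz * y, uz * x - ux * z, ux * y - uy * x)
    return tuple(
        p * c + cr * s + u * dot * (1.0 - c)
        for p, cr, u in zip(point, cross, axis)
    )


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally long lists of 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def threefold_rmsd(repeat_a, repeat_b, axis=(0.0, 0.0, 1.0)):
    """RMSD between repeat_b and repeat_a rotated by 120 degrees about the axis."""
    rotated = [rotate_about_axis(p, axis, 120.0) for p in repeat_a]
    return rmsd(rotated, repeat_b)


# Hypothetical coordinates (in Å): repeat 2 is an exact 120-degree copy of
# repeat 1, shifted by 0.3 Å along z to mimic a small deviation from symmetry.
repeat_1 = [(5.0, 0.0, 1.0), (6.0, 1.5, 2.0), (4.0, 2.0, 3.5)]
repeat_2 = [(x, y, z + 0.3) for x, y, z in
            (rotate_about_axis(p, (0.0, 0.0, 1.0), 120.0) for p in repeat_1)]
print(round(threefold_rmsd(repeat_1, repeat_2), 3))
```

In practice the rotation would be obtained by a least-squares superposition of equivalent atoms rather than assumed, but the resulting RMS value is interpreted in the same way: the smaller it is, the closer the repeats are to exact threefold symmetry.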
In conclusion, we succeeded in preparing an artificial protein having three complete sequence repeats. Symfoil-II showed improved X-ray diffraction, which allowed its structural details to be resolved. We are now attempting to replace the ion pair between the N- and C-termini with an amide bond to prepare a circular Symfoil, which may be useful as a scaffold to capture molecules having C3 symmetry (Gibson & Castaldi, 2006) by virtue of specific interaction with the threefold axis of symmetry present in Symfoil-II.
We thank the staff at SPring-8 and Photon Factory. The synchrotron radiation experiments were performed at the BL38B1 beamline in SPring-8 with the approval of the Japan Synchrotron Radiation Research Institute (proposal No.

Figure 3. Structure of Symfoil-II in space group C2. (a) Overall structure of Symfoil-II represented by a ribbon model. The first repeat (residues 12-53 in Fig. 1) is colored in green, the second repeat (54-95) is colored in cyan and the third repeat (96-137) is colored in orange. (b) Structure of N- and C-terminal residues in Symfoil-II. The 2Fo − Fc electron density map is contoured at the 1.0σ level.