Structural Biology and Crystallization Communications

Wheat germ cell-free expression system as a pathway to improve protein yield and solubility for the SSGCID pipeline

Recombinant expression of proteins of interest in Escherichia coli is an important tool in the determination of protein structure. However, lack of expression and insolubility remain significant challenges to the expression and crystallization of these proteins. The SSGCID program uses a wheat germ cell-free expression system as a rescue pathway for proteins that are either not expressed or insoluble when produced in E. coli. Testing indicates that the system is a valuable tool for these protein targets. Further increases in solubility were obtained by the addition of the NVoy polymer reagent to the reaction mixture. These data indicate that this eukaryotic cell-free expression system has a high success rate and that the addition of specific reagents can increase the yield of soluble protein.


Protein expression
An impediment to the successful production of large quantities of natively folded proteins in Escherichia coli is the tendency of many proteins to become insoluble when overexpressed. To produce quantities of these proteins sufficient for crystallization, additional methods are necessary; in vivo systems using other organisms as well as cell-free systems utilizing extracts from prokaryotic or eukaryotic organisms have been developed (Gräslund et al., 2008; Endo & Sawasaki, 2006). Eukaryotic systems utilize a protein-folding apparatus that has evolved to direct the folding of more complex proteins, while cell-free systems are not dependent on the survival of a cell (Klammt et al., 2006).
Prokaryotic cell-free systems exist, although these systems have demonstrated limited improvement over prokaryotic in vivo methods; the misfolding of proteins remains a significant problem (Hillebrecht & Chong, 2008). The wheat germ cell-free expression system combines the advantages of cell-free and eukaryotic systems and is well suited for expression of difficult-to-express proteins such as disulfide-bond-containing or integral membrane proteins (Endo & Sawasaki, 2006; Kawasaki et al., 2003; Spirin, 2004; Vinarov, Loushin Newman, Tyler et al., 2006; Klammt et al., 2006; Tyler et al., 2005). This system has been used as a rescue pathway for human proteins that are insoluble in both in vivo and in vitro E. coli systems (Langlais et al., 2007).
A recent analysis of in vivo and in vitro expression of Arabidopsis thaliana proteins found that 95-97% of a set of protein targets were soluble when expressed via a wheat germ cell-free system in comparison to 40% when expressed using the E. coli cell-based system (Langlais et al., 2007). These two systems were also tested on a Plasmodium falciparum protein set and while detectable protein was obtained for 30% of the proteins in E. coli, protein was obtained for 75% when expressed in the eukaryotic cell-free system (Tyler et al., 2005). Thus, eukaryotic in vitro systems, specifically the wheat germ cell-free system, hold significant promise.

Solubility
There is extensive literature on the variables leading to insoluble recombinant expression of proteins. Protein aggregation remains a significant problem in E. coli expression systems. Tags used to purify proteins often affect the solubility, and the addition of various tags can lead to the soluble expression of a previously insoluble protein (Gordon et al., 2008; Ohana et al., 2009; Sun et al., 2011). Modification of the sequence, such as the addition of highly acidic sequences, can also solubilize a previously insoluble protein (Zhang et al., 2004) and different tags may affect solubility (Widakowich et al., 2011). Proteins with similar features in their native sequences may have a greater tendency towards solubility when recombinantly expressed in vivo. The frequency of individual amino acids as well as the frequency of different types of amino acids within a protein has been shown to affect in vivo solubility in E. coli. It is likely that secondary structure also plays a role in protein solubility and the tendency to form amyloid bodies in vivo (Idicula-Thomas & Balaji, 2005). If the protein produced in E. coli is primarily insoluble, denaturing and refolding can be attempted. Common denaturing reagents include guanidinium and urea. The refolding process can be aided by the addition of stabilizing agents such as L-arginine (Kudou et al., 2011). Cell-free systems have an additional advantage in the production of soluble protein, as agents that aid in protein folding can be directly added to the translation reaction. Eukaryotic expression systems, including the wheat germ cell-free system, have been demonstrated to raise the solubility of a protein (Dadashipour et al., 2011; Klammt et al., 2006; Langlais et al., 2007).

Protein selection
Proteins in this analysis are a subset of protein targets entered into the Seattle Structural Genomics Center for Infectious Disease (SSGCID) pipeline. The target organisms were NIAID category A-C pathogens. Proteins within target organisms were selected bioinformatically for homology to current drug targets or nominated for structure determination by the scientific community. Proteins were eliminated if they contained more than eight cysteines or a predicted transmembrane domain in the absence of a signal peptide. A total of 44 proteins were used in this analysis, a summary of which can be found in Table 1.

Table 1
Protein set used in this analysis. Columns: Species, Accession ID.

Cloning
DNA of the target proteins was cloned into the pAVA0421 vector via ligation-independent cloning (LIC) and grown on LB-carbenicillin plates. The pAVA0421 vector contains an N-terminal hexahistidine affinity tag (MAHHHHHH) for immobilized metal-ion affinity chromatography (IMAC). Plasmids were purified using a GenElute HP Plasmid Mini-Prep Kit (Sigma-Aldrich, Dallas, Texas, USA) and transformed into E. coli BL21 (DE3) Rosetta cells (EMD Chemicals, San Diego, California, USA) for expression screening. Small-scale protein expression was carried out and evaluated by Western blotting. All constructs were sequenced in the forward direction to confirm that the correct protein target had been cloned.
DNA templates were obtained from the SSGCID pipeline (Myler et al., 2009). Following E. coli in vivo expression trials, PCR products of the target gene including the six-His tag were amplified from the pAVA0421 vector. The PCR products were then cloned into the cell-free expression vector pEU-E01-LIC1 (pEU-LIC), which had previously been modified to accommodate ligation-independent cloning. Targets were PCR-amplified from the prokaryotic expression vector with RedTaq (Sigma, St Louis, Missouri, USA) using the primers F, CTCACCACCACCACCACCATATG, and R, ATCCTATCTTACTCACTTAGCAGCCGGATCCTCGAG, inserted into pEU-LIC using ligation-independent cloning and transformed into Top10 cells (Invitrogen, Carlsbad, California, USA), which were then grown on LB-carbenicillin plates. Individual colonies were screened for insertion via colony PCR. DNA from the positive clones was maxi-prepped (Sigma, St Louis, Missouri, USA) and the full insert was sequenced in both the forward and reverse directions to confirm that the correct sequence had been cloned and that the insert was free of mutations.
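As a quick sanity check on the primers quoted above, the following minimal Python sketch computes the length, GC fraction and reverse complement of each primer and translates the forward primer in a chosen reading frame. It is an illustration only and not part of the published protocol: the helper names are hypothetical, the reverse primer is taken with its line-break hyphen removed, and the reading frame (offset 2) is an assumption chosen because it exposes the histidine codons.

```python
# Standard genetic code with codons enumerated in T, C, A, G order
# (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primer sequences as quoted in the text (reverse primer de-hyphenated).
FORWARD = "CTCACCACCACCACCACCATATG"
REVERSE = "ATCCTATCTTACTCACTTAGCAGCCGGATCCTCGAG"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def translate(seq, frame=0):
    """Translate a DNA sequence from the given 0-based offset.

    Trailing bases that do not fill a complete codon are ignored.
    """
    seq = seq.upper()[frame:]
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    for name, primer in (("F", FORWARD), ("R", REVERSE)):
        print(name, len(primer), round(gc_fraction(primer), 2),
              reverse_complement(primer))
    print("F, frame offset 2:", translate(FORWARD, frame=2))
```

Read from its third base, the forward primer gives six histidine codons followed by ATG, which is consistent with the six-His tag being amplified together with the target gene.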

Expression and solubility testing
Transcription reactions for small-scale screening were performed in PCR strip tubes. In each reaction tube, 2 µg plasmid DNA was mixed with transcription buffer (80 mM HEPES-KOH pH 7.8 containing 20 mM MgCl2, 2 mM spermidine hydrochloride, 10 mM dithiothreitol), 3 mM NTP mix, 2.4 U µl⁻¹ SP6 RNA polymerase and 1.2 U µl⁻¹ RNase inhibitor; RNase-free water was used to bring the final volume to 20 µl. Transcription reactions were then incubated for 4-6 h at 310 K. A Microcon YM-30 filter (Millipore, Billerica, Massachusetts, USA) was used for small-scale mRNA clean-up.
Small-scale translation reactions were performed in 96-well plates and synthesized RNA was added to the translation mixture; large-scale reactions were performed using either the Protemist DT II robot (Cell Free Sciences, Yokohama, Japan), which performs sequential transcription, translation and purification steps, or the Protemist XE robot (Cell Free Sciences, Yokohama, Japan), which performs continuous translation for high yields of protein. Small-scale and large-scale translations were performed using WEPRO1240H (Cell Free Sciences, Yokohama, Japan) cell extract according to the manufacturer's instructions. For large-scale purification, the translation mixture was clarified by centrifugation at 6000 rev min⁻¹ for 30 min at 277 K. The supernatant was purified by IMAC using a HisTrap FF 5 ml column (GE Biosciences, Piscataway, New Jersey, USA) equilibrated with binding buffer consisting of 25 mM HEPES pH 7.0, 300 mM NaCl, 5% glycerol, 30 mM imidazole, 1 mM tris(2-carboxyethyl)phosphine (TCEP) and eluted with 500 mM imidazole in the same buffer. Concentrated pure protein was flash-frozen in liquid nitrogen and stored at 193 K. Additional translations were performed using WEPRO7240H extract (Cell Free Sciences, Yokohama, Japan) according to the manufacturer's instructions. For selected protein targets, NVoy (also known as NV10), a commercially available linear carbohydrate-based polymer of 5 kDa molecular weight (Expedeon, San Diego, California, USA), was used to improve solubility yields. NVoy was added directly to the translation-reaction mixture in the Protemist XE system at a concentration of 1 mg ml⁻¹. Solubility was assessed in the presence and the absence of the NVoy polymer.
From the translation reaction, two 10 µl aliquots were taken. The first aliquot (total protein) was mixed with 10 µl sample buffer. The other aliquot was spun in a microcentrifuge at top speed for 60 s; the supernatant (soluble fraction) was separated from the pellet and mixed with 10 µl sample buffer, while the pellet (insoluble fraction) was resuspended in 20 µl sample buffer. All samples were heated at 368 K for 10 min and 10 µl of each sample was loaded onto a gradient SDS-PAGE gel (Pierce Bioscience, Rockford, Illinois, USA) for analysis. Gels were stained with SimplyBlue SafeStain (Invitrogen, Carlsbad, California, USA) and expression levels were scored by visual inspection of the SDS-PAGE gels and by scaling the intensity of the expected protein bands. The identities of the hexahistidine-tagged proteins were confirmed by Western blotting using an anti-His antibody (Qiagen, Valencia, California, USA). In addition to confirming that the protein was of the correct size, the protein molecular-weight standard (Bio-Rad, Hercules, California, USA) was used as a reference for the quantity of His-tagged protein. Briefly, the approximate quantity of protein in each marker band was estimated and the band intensities of the His-tagged proteins were then compared with the marker proteins. A 'high' expression score (+++) corresponded to greater than 0.75 mg ml⁻¹ target protein in the reaction mixture. A 'medium' expression rating (++) corresponded to more than 0.30 mg ml⁻¹ but less than 0.75 mg ml⁻¹ and a 'low' expression rating (+) corresponded to a visible band of less than 0.30 mg ml⁻¹. A 'no detectable protein' rating (−) corresponded to no detectable expression on either SDS-PAGE or Western blot gels.

Table 2
Summary of protein-expression level and solubility from the cell-free expression system using small-scale expression, the Protemist DT II and the Protemist XE robotic platforms.
Conditions included WEPRO1240H and 7240H cell-free extracts in the presence and absence of NVoy. 1240H, WEPRO1240H extract; 7240H, WEPRO7240H extract; (−), without NVoy; (+), with 1 mg ml⁻¹ NVoy.

Solubility ratings were also determined by visual inspection of the protein bands on SafeStained SDS-PAGE gels using a system similar to that employed for the scoring of expression. The ratio of the band intensities of the paired soluble and insoluble fractions was used to rate solubility. A high solubility score (+++) was assigned when 75-100% of the total protein was in the soluble fraction. A medium solubility rating (++) was assigned when the soluble and the insoluble fractions were approximately equal in band intensity. A low solubility rating (+) was assigned when less than 25% of the total protein intensity was in the soluble fraction. Samples with an absence of detectable soluble protein were assigned an insoluble rating (−).
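The two rating scales described above can be summarized in a short Python sketch. This is an illustrative encoding only, since the ratings in this study were assigned by visual inspection of gels. The function names are hypothetical, and the treatment of values the text leaves undefined is an assumption: concentrations of exactly 0.30 or 0.75 mg ml⁻¹ fall into the lower category, and every soluble fraction from 25% up to (but not including) 75% is treated as 'medium'.

```python
def expression_score(mg_per_ml):
    """Map an estimated target-protein concentration (mg/ml) to a rating.

    Thresholds follow the text; assigning the exact boundary values
    0.30 and 0.75 to the lower category is an assumption.
    """
    if mg_per_ml is None or mg_per_ml <= 0:
        return "-"      # no detectable protein
    if mg_per_ml > 0.75:
        return "+++"    # high
    if mg_per_ml > 0.30:
        return "++"     # medium
    return "+"          # low: visible band below 0.30 mg/ml


def solubility_score(soluble_intensity, insoluble_intensity):
    """Rate solubility from paired band intensities (arbitrary units).

    The text defines high (75-100% soluble), medium (roughly equal
    fractions) and low (<25% soluble); treating the whole 25-75% range
    as medium is an assumption made for this sketch.
    """
    total = soluble_intensity + insoluble_intensity
    if total <= 0 or soluble_intensity <= 0:
        return "-"      # no detectable soluble protein
    fraction = soluble_intensity / total
    if fraction >= 0.75:
        return "+++"
    if fraction >= 0.25:
        return "++"
    return "+"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(expression_score(0.5), solubility_score(80, 20))
```

Encoding the thresholds this way makes the gaps in the written scale explicit: the text does not say how a sample that is, for example, 60% soluble should be rated.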

Small-scale expression
A total of 44 protein targets were analyzed. Two were from eukaryotic organisms (Trypanosoma brucei and Leishmania infantum); the remainder originated in prokaryotes (the genera Anaplasma, Bartonella, Borrelia, Brucella, Burkholderia, Ehrlichia and Mycobacterium). In this set, 20% of the protein targets were soluble when expressed in E. coli, 52% were insoluble and the remaining 22% were not expressed (Table 1). The first stage of the wheat germ cell-free expression pipeline was designed for small-scale screening. During these tests, detectable protein was achieved for 87% of targets based on Western blot analysis (Tables 1 and 2). Of the targets that expressed insoluble protein in E. coli, 83% expressed soluble protein in the cell-free system (Table 1). These data demonstrate that the wheat germ cell-free expression system is effective at producing soluble protein.

Large-scale expression
A set of 29 proteins was selected from the first set of robust small-scale-screened targets for further scaling up in the Protemist DT II using WEPRO1240H extract. (Transcription and translation reactions as well as affinity purification are performed sequentially in the robot.) From this set, 28 targets produced detectable protein, with medium to high quantities of protein for 17 of these targets (Table 2). Additionally, the small-scale expression testing successfully predicted the expression level and degree of solubility of proteins produced on the large scale in the DT II robot (data not shown). Of the 29 targets, five with varying levels of solubility were selected for testing in the Protemist XE robot with WEPRO7240H cell-free extract, which has been optimized for high-level expression of His-tagged proteins using the Protemist XE (Table 3). Of the five proteins tested in this system, large quantities of protein were obtained for four (Table 3, data not shown), although only small quantities of the protein were soluble. Small quantities of the fifth protein were produced, but these quantities were sufficient for crystallization using the microcapillary method (Gerdts et al., 2008; Yadav et al., 2005). Because the soluble yields remained low, additional methods were necessary to produce significant quantities of soluble protein.

Solubility testing
One of the more prominent challenges in the WEPRO7240H expression system is the tendency of protein products to form precipitates: some runs of the Protemist XE resulted in a mixture so turbid that the visible-light sensor was unable to function correctly. Likewise, large-scale expression using WEPRO1240H with the Protemist XE resulted in increased protein yields accompanied by a decrease in solubility (data not shown). One of the advantages of the in vitro system is the ability to add reagents, such as those that help to increase solubility, directly to the translation reaction. One of the factors contributing to protein aggregation is the interaction of hydrophobic patches that are exposed owing to incorrect protein folding. We therefore chose to investigate the use of NVoy polymer in the cell-free system. NVoy consists of a carbohydrate backbone with hydrophobic side chains that mask hydrophobic patches on the protein, thereby limiting the nonspecific interactions that can cause aggregation. The effects of the NVoy polymer on solubility and its impact on expression levels were examined in a series of experiments carried out in the Protemist DT II on a subset of five protein targets (Table 2). This subset consisted of targets for which less than 75% of the protein was soluble when expressed in the DT II or XE; when expressed in E. coli, the protein was predominantly insoluble. WEPRO1240H in the presence and absence of NVoy was tested for five proteins and WEPRO7240H in the presence and absence of NVoy was tested for five proteins, four of which were also used in the WEPRO1240H testing. The NVoy polymer did not decrease the solubility of any of the proteins tested. For two proteins, the expression levels were lower in the presence of NVoy, but this decrease in expression level was accompanied by an increase in solubility (data not shown, Table 3).
One protein, which was completely insoluble when expressed with WEPRO7240H, was over 75% soluble when expressed in the same system in the presence of NVoy. Overall, in seven of the 11 paired NVoy(−)/NVoy(+) comparisons the NVoy reagent increased the percentage of soluble protein, and in half the experiments the reagent more than doubled the yield of soluble protein; for one protein, we were able to obtain more than 1 mg ml⁻¹ soluble protein (Table 3). These tests demonstrate that NVoy does not substantially reduce total protein yields and that in most tests it increased both the quantity and the percentage of soluble protein produced.

Table 3
Summary of large-scale expression data in the presence or absence of NVoy.
Numbers indicate the milligrams of soluble protein purified from each run in a 1 ml reaction cup on the Protemist DT II. (+) and (−) indicate the presence or absence of NVoy at a concentration of 1 mg ml⁻¹.
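The paired NVoy(−)/NVoy(+) comparison described above amounts to asking, for each pair, whether the soluble yield increased and whether it more than doubled. A minimal Python sketch of that tally follows. The function names are hypothetical and the yields in the example are invented for illustration; they are not the values in Table 3.

```python
def compare_pair(without_nvoy_mg, with_nvoy_mg):
    """Classify one NVoy(-)/NVoy(+) pair of soluble-protein yields (mg)."""
    increased = with_nvoy_mg > without_nvoy_mg
    if without_nvoy_mg > 0:
        fold = with_nvoy_mg / without_nvoy_mg
    elif with_nvoy_mg > 0:
        # Protein insoluble without NVoy but soluble with it:
        # the fold change is unbounded.
        fold = float("inf")
    else:
        fold = None  # no soluble protein in either condition
    return {
        "increased": increased,
        "fold_change": fold,
        "more_than_doubled": fold is not None and fold > 2,
    }


def summarize(pairs):
    """Count pairs in which the soluble yield increased or more than doubled."""
    results = [compare_pair(minus, plus) for minus, plus in pairs]
    return {
        "n_pairs": len(results),
        "n_increased": sum(r["increased"] for r in results),
        "n_more_than_doubled": sum(r["more_than_doubled"] for r in results),
    }


if __name__ == "__main__":
    # Hypothetical (NVoy-, NVoy+) soluble yields in mg; NOT data from Table 3.
    example_pairs = [(0.10, 0.25), (0.20, 0.20), (0.0, 0.30), (0.40, 0.50)]
    print(summarize(example_pairs))
```

Treating a protein that is completely insoluble without NVoy as an unbounded fold increase is a design choice in this sketch; reporting absolute yields alongside fold changes avoids overstating such cases.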

Discussion
This analysis of expression of proteins from the SSGCID pipeline in a eukaryotic in vitro system validates the use of the wheat germ cell-free system for expression of proteins that are either not expressed or are primarily insoluble when expressed in E. coli. The quantities of soluble protein were sufficient for microcapillary crystallization as developed by Emerald BioSystems (Gerdts et al., 2008; Yadav et al., 2005), although no crystal structures of any of the protein targets in this set have yet been obtained. Combining the wheat germ cell-free expression system with a crystallization method such as microcapillary crystallization may yield structural data for proteins that have been difficult to express. A lingering concern is the tendency for insolubility to increase as the quantity of protein produced increases. As a result, for most targets similar quantities of soluble protein were obtained from the use of WEPRO1240H (optimized for the DT II) and WEPRO7240H (optimized for high-level expression in Protemist XE) in the absence of additional solubilizing agents. This may reflect the tendency of the protein to form insoluble aggregates at higher concentrations. The addition of NVoy substantially increased the quantities of soluble protein produced, although even in the presence of NVoy the quantities of soluble protein were insufficient for standard crystallization studies. This demonstrates that NVoy can be added to the translation reaction to increase solubility, potentially eliminating the need to denature and refold the protein. The addition of other solubilizing reagents may lead to further increases in solubility.
These data demonstrate the utility of the wheat germ cell-free system for expression of proteins that are insoluble when expressed in E. coli. The addition of NVoy substantially increased the yield of soluble protein; this reagent is therefore likely to improve the soluble yield of proteins that are insoluble when expressed in E. coli.