research papers
Implementation of semi-automated cloning and prokaryotic expression screening: the impact of SPINE
aUnité de Biochimie Structurale, Institut Pasteur, 25–28 Rue du Dr Roux, 75724 Paris CEDEX 15, France, bDepartment of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-109 51 Stockholm, Sweden, cOxford Protein Production Facility and Division of Structural Biology, Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, England, dYork Structural Biology Laboratory, Department of Chemistry, University of York, York YO10 5YW, England, eInstitut de Génétique et de Biologie Moléculaire et Cellulaire, 1 Rue Laurent Fries, BP 163, 67404 Illkirch CEDEX, France, fArchitecture et Fonction des Macromolécules Biologiques UMR6098, CNRS/Universités de Provence/Université de la Méditerranée Parc Scientifique et Technologique de Luminy, Case 932163, Avenue de Luminy, 13288 Marseille CEDEX 09, France, gDivision of Molecular Carcinogenesis, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands, hBijvoet Center for Biomolecular Research, NMR Spectroscopy, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands, iEMBL Hamburg Outstation, Notkestrasse 85, D-22603 Hamburg, Germany, jEMBL Grenoble, c/o ILL, BP 181, 6 Rue Jules Horowitz, F-38042 Grenoble CEDEX 9, France, kMax-Planck-Institute of Biochemistry, Department of Proteomics and Signal Transduction, Am Klopferspitz 18, 82152 Martinsried, Germany, lDepartment of Biotechnology, Royal Institute of Technology, AlbaNova University Centre, S-10691 Stockholm, Sweden, mInstitut de Biochimie et de Biophysique Moléculaire et Cellulaire, UMR8619, Bâtiment 430, Université de Paris-Sud, 91405 Orsay CEDEX, France, and nThe Israel Structural Proteomics Centre, The Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
*Correspondence e-mail: ray@strubi.ox.ac.uk
The implementation of high-throughput (HTP) cloning and expression screening in Escherichia coli by 14 laboratories in the Structural Proteomics In Europe (SPINE) consortium is described. Cloning efficiencies of greater than 80% have been achieved for the three non-ligation-based cloning techniques used, namely Gateway, ligation-independent cloning of PCR products (LIC-PCR) and In-Fusion, with LIC-PCR emerging as the most cost-effective. On average, two constructs have been made for each of the approximately 1700 protein targets selected by SPINE for protein production. Overall, HTP expression screening in E. coli has yielded 32% soluble constructs, with at least one for 70% of the targets. In addition to the implementation of HTP cloning and expression screening, the development of two novel technologies is described, namely library-based screening for soluble constructs and parallel small-scale high-density fermentation.
Keywords: cloning; expression screening.
1. Introduction
High-throughput (HTP) sequencing of eukaryotic, viral and bacterial genomes is providing a huge database of proteins with potential for structure–function analysis. In response to this opportunity, structural proteomics projects have been initiated worldwide with the aim of establishing HTP structure determination on a genome-wide scale. Crucial to this effort has been the development of production technologies for the HTP cloning, expression and purification of recombinant proteins. In all projects, there has been an emphasis on parallel processing for molecular cloning, expression and purification. This has been driven by the need to accommodate relatively large numbers of potential targets for structural biology at an acceptable cost and has led to varying degrees of automation. Most of the groups involved have set up semi-automated liquid-handling systems to carry out some or all of their protocols. However, the protocols can be carried out equally well by hand with appropriate equipment, e.g. multi-channel pipette dispensers. The motivation to implement automation is largely to enable processes to be scaleable and run routinely as error-free operations, leading to greater reproducibility compared with procedures carried out manually. The EU-funded Structural Proteomics In Europe (SPINE; https://www.spineurope.org ) programme has provided the opportunity for method developments at a number of European centres and the exchange of experience during the 3 y of the project.
For structural biology, expression of recombinant proteins in Escherichia coli remains the most widely used approach for obvious reasons of speed, ease of use and low cost. The SPINE project has been no exception and approximately 200 new structures have been obtained from proteins produced in E. coli. The development and implementation of HTP approaches to cloning and expression in E. coli have made a major contribution to this output. In this article, we review the technical developments that have come from SPINE and provide protocols for carrying out cloning and expression screening, including refolding, in a relatively HTP and parallel approach. Apart from sponsoring the establishment of Europe-wide HTP activity, the SPINE programme has enabled the development of novel technology focused at specific sites. In the context of protein production in E. coli, two examples will be reported which demonstrate such innovation. Firstly, a library-based approach to high-density screening for the expression of soluble fragments has been set up at the EMBL laboratories in Grenoble. This builds on recent developments in which random screening strategies are being used to define solubly expressing constructs (reviewed in Hart & Tarendeau, 2006). Secondly, novel systems for small-scale parallel high-density fermentation of E. coli have been developed at the Pasteur Institute (Bellalou et al., manuscript in preparation; Hedrén et al., 2006).
2. Experimental
2.1. Vector construction
Amongst SPINE groups, ligation-dependent, ligation-independent and site-specific recombinatorial cloning have been used to construct the expression vectors required for protein production (see Table 1). Of these methods, the Gateway system of recombinatorial cloning has been the most widely used (six laboratories). This is a modification of the recombination system of phage λ (Hartley et al., 2000; Walhout et al., 2000) that utilizes a minimum set of components of the λ system for in vitro transfer of DNA. Directional cloning of the DNA insert is ensured by using two nearly identical but non-compatible versions of the λ att recombination site. The system works by a two-step process: in the first (BP) step, PCR products are cloned into a generic 'entry vector', whilst in the second (LR) step a recombination reaction transfers the gene of interest into the chosen expression vector. Implementation of Gateway in SPINE has been according to the manufacturer's protocol (https://www.invitrogen.com ) with reaction volumes generally down-scaled, e.g. to 5 µl (BP) and 10 µl (LR) (Vincentelli et al., 2003; Busso et al., 2005). In all cases, new and improved Gateway-compatible vectors have been constructed to customize the system (Table 2). An example of the use of the Gateway system in HTP mode comes from the OPPF, in which 342 PCR products have been cloned into the Gateway expression vector pDEST 14 (InVitrogen) with a success rate of 86% as assessed by PCR screening of constructs. This is consistent with reports from other groups outside SPINE (e.g. Thao et al., 2004). It appears that the overall cloning efficiency of the Gateway system is largely determined by the BP reaction, since the LR recombination step is generally 100% efficient.
The major alternative to Gateway in SPINE has been the use of ligation-independent cloning of PCR products (LIC-PCR), which was developed over 10 y ago (Aslandis & deJong, 1990; Haun et al., 1992). LIC-PCR is based on the use of T4 DNA polymerase in the presence of a single deoxyribonucleotide to produce 12–15 bp overhangs in a PCR product that are complementary to sequences generated in the recipient vector. These extensions anneal sufficiently strongly to allow transformation of E. coli without the need for an in vitro ligation step; the nicks are instead sealed by repair enzymes in the host. LIC-PCR has been successfully implemented in 96-well format by three groups using variants of the original protocol. The LIC-PCR procedure is exemplified by the protocol developed by the York Structural Biology Laboratory (York SPINE partner; Fogg et al., unpublished work) and is representative of those used by other groups in SPINE (Fig. 1). The York LIC-PCR vector has been used to clone 263 PCR products with a success rate of 85%. Finally, a second method for LIC cloning has been implemented by the Oxford partners during the course of SPINE using the In-Fusion technology (Clontech; Berrow et al., manuscript in preparation). The In-Fusion method is insert-sequence independent and enables cloning of PCR products directly into any cloning or expression vector via primer extensions defined by the user. Integration of the insert and vector DNA is achieved in a reaction catalysed by the proprietary In-Fusion enzyme (https://www.clontech.com ). In-Fusion cloning has been used by the Oxford partners to clone a set of 347 PCR products with an efficiency of 89% as assessed by PCR screening of recombinant clones. This is consistent with an efficiency of 85% reported by Hartman et al. (2005) for the cloning of 874 PCR amplicons by In-Fusion. A generic workflow for HTP cloning is shown in Fig. 2 based on the working practice of the Oxford partners.
The 10–15% cloning failure observed for all the strategies outlined above probably arises from a proportion of poor-quality PCR products in the experiments. Typically, in HTP mode only one PCR condition for reaction volume, cycling parameters, amount of template and primer is used to amplify all inserts. This is chosen to give the best overall coverage, but may compromise the yield of particular target sequences leading to subsequent cloning failures. However, these can frequently be rescued in an individual experiment in which PCR conditions are optimized for specific sequences.
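The three reported success rates can be compared with a rough calculation. The sketch below is illustrative only: the clone counts are back-calculated from the rounded percentages quoted above (an assumption, since the raw counts are not given here), and the Wilson score interval is a standard statistical choice rather than an analysis performed by the SPINE groups.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# (method, PCR products attempted, reported success rate) as quoted in the text
REPORTED = [
    ("Gateway", 342, 0.86),
    ("LIC-PCR", 263, 0.85),
    ("In-Fusion", 347, 0.89),
]

def summarize(rows):
    """Return {method: (estimated successes, lower bound, upper bound)}."""
    summary = {}
    for method, n, rate in rows:
        # Assumption: successes are reconstructed from the rounded percentage.
        successes = round(n * rate)
        low, high = wilson_interval(successes, n)
        summary[method] = (successes, low, high)
    return summary

for method, (k, low, high) in summarize(REPORTED).items():
    print(f"{method}: ~{k} positive clones, 95% interval {low:.2f}-{high:.2f}")
```

Under these assumptions the intervals for the three methods overlap, which is in keeping with the view that the residual failures reflect PCR quality rather than the cloning chemistry.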
2.2. Expression screening in 96-well plate format
A major feature of the HTP protein-production pipelines developed in SPINE is the inclusion of a screening step on a relatively small scale to identify constructs suitable in terms of soluble protein yield for scale-up and subsequent protein purification. Given the relatively high cost of the latter steps in terms of time and resources, the screening stage is seen as crucial to the overall process (Table 1). All groups make use of T7-based vectors and the BL21 (DE3) strain of E. coli and its derivatives for expression screening, although the number of different strains evaluated in parallel varies between groups (Aslandis & deJong, 1990; Haun et al., 1992; Quevillon-Cheruel et al., 2003; Folkers et al., 2004; Busso et al., 2005; Vincentelli et al., 2005). Typically, small-scale expression screens are carried out in 96- or 24-well deep-well plates using enriched complex media e.g. TB, 2YT, GS96 (QBiogene) to ensure maximum biomass. Generally, these media support growth to optical densities (OD) of 5–10 OD600 units compared with 2–3 OD600 units in standard Luria Broth (LB). Expression is induced by either the addition of IPTG (0.1–1.0 mM) or by the autoinduction method of Studier (2005).
The expression assay starts with lysis of the harvested cells, followed by separation of the soluble and insoluble fractions. Cell lysis is carried out using standard protocols by either a freeze–thaw cycle followed by treatment with DNAse/lysozyme or by sonication with/without lysozyme or using commercial detergent-based lysis reagents, e.g. BugBuster (Merck). Chemical methods lend themselves to 96-well formats, although sonicators which can accommodate a 96-well plate are available (Misonix) and used by the Strasbourg group for cell lysis in this format. Generally, soluble products are then purified on a small scale using either IMAC magnetic beads (e.g. Qiagen, Novagen) or IMAC resins in filter plates via a His tag on the protein. Standard manufacturers' protocols are used for this step, with most groups using robotic liquid-handling systems to automate the process (Table 1). Two main assay formats have been adopted for detecting soluble protein expression, namely immunodetection using dot-blots of either lysates or IMAC-purified soluble proteins (Knaust & Nordlund, 2001; Cornvik et al., 2005; Vincentelli et al., 2005) or conventional SDS–PAGE of samples (Folkers et al., 2004; Busso et al., 2005). A generic workflow for expression screening based on the Oxford procedures is shown in Fig. 2. Details of each protocol are given in the accompanying paper (Berrow et al., 2006), in which a set of 96 expression constructs were screened by different SPINE groups to benchmark the protocols against each other.
Proteins assessed by screening to be suitable for scale-up in terms of quality and quantity, typically >95% pure by SDS–PAGE with a projected yield of ≥0.5 mg l⁻¹, were purified from 1–2 l E. coli cultures for subsequent structural studies.
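The scale-up criterion stated above can be written as a simple filter. This is a minimal sketch: the `ScreenResult` record and its field names are hypothetical, and only the two thresholds (>95% purity, ≥0.5 mg per litre) are taken from the text.

```python
from dataclasses import dataclass

@dataclass
class ScreenResult:
    construct: str                    # hypothetical construct identifier
    purity_pct: float                 # purity estimated from SDS-PAGE
    projected_yield_mg_per_l: float   # yield projected from the small-scale screen

def suitable_for_scale_up(result, min_purity=95.0, min_yield=0.5):
    """Apply the thresholds quoted in the text: >95% pure and >=0.5 mg per litre."""
    return (result.purity_pct > min_purity
            and result.projected_yield_mg_per_l >= min_yield)

# Illustrative values only, not data from SPINE.
results = [
    ScreenResult("construct-1", 97.0, 1.2),
    ScreenResult("construct-2", 97.0, 0.2),
    ScreenResult("construct-3", 90.0, 5.0),
]
selected = [r.construct for r in results if suitable_for_scale_up(r)]
print(selected)
```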
2.3. Deletion-library construction and screening
Both the EMBL Grenoble and the Stockholm groups have developed methodologies for the construction and screening of gene-deletion libraries for identification of truncated reading frames that express soluble proteins. The Stockholm group combined the Erase-a-base protocol (Promega Corp.) with a colony-lift procedure to identify such constructs (Cornvik et al., 2005). The Grenoble group have adapted library-handling and screening robotics, originally developed for genomics applications, to the purpose of systematically screening gene-truncation and gene-fragmentation libraries. Both groups analyse protein expression at the colony level, which avoids the logistical difficulties of simultaneous protein expression and testing of solubility for thousands of different clones in multi-well plate format. The Grenoble process, ESPRIT (expression of soluble proteins by random incremental truncation), involves the robotic picking of 30 000 individual constructs into 384-well plates followed by the printing of high-density colony arrays on nitrocellulose filters. Colonies are grown at three separate temperatures and protein expression is induced by shifting the filters onto LB agar containing inducing agent (IPTG or arabinose). After lysis and fixing of cellular proteins onto the nitrocellulose filter, soluble protein expression is detected using a fused linear peptide that is efficiently post-translationally modified in vivo only if the protein is both soluble and stable (manuscript in preparation). The geometric format of the printed colony filters permits the use of array-analysis software, originally designed for DNA arrays, for quantification of signal intensity and ranking of clones. Positive clones are then robotically re-arrayed into a single 96-well plate for further confirmation of protein solubility and characterization of the domain boundaries by DNA sequencing.
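The bookkeeping behind the colony-array workflow described above can be sketched as follows. The plate count follows directly from the figures in the text (30 000 clones, 384-well plates, one 96-well plate of hits); the row-major well naming and the ranking by a single intensity value are assumptions made for illustration and do not describe the ESPRIT software.

```python
import math

def plates_needed(n_clones, wells_per_plate=384):
    """Number of plates required to hold n_clones, one clone per well."""
    return math.ceil(n_clones / wells_per_plate)

def well_name(index, cols=24):
    """Map a 0-based position within a plate to a name such as 'A1'.

    Assumes row-major filling of a 16 x 24 (384-well) plate.
    """
    row, col = divmod(index, cols)
    return f"{chr(ord('A') + row)}{col + 1}"

def top_clones(signals, n=96):
    """Return the n clone identifiers with the highest signal intensity."""
    return sorted(signals, key=signals.get, reverse=True)[:n]

print(plates_needed(30000))          # plates required for the picked library
print(well_name(0), well_name(383))  # first and last wells of a 384-well plate
```

In practice the ranking would use quantified array intensities from the image-analysis software; the dictionary passed to `top_clones` stands in for that output.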
2.4. Refolding from inclusion bodies
A number of SPINE partners have developed protocols to recover protein from inclusion bodies. Some of the approaches taken by the Marseille, Orsay and Weizmann groups have been described in detail elsewhere (Vincentelli et al., 2004; Tresaugues et al., 2004; Albeck et al., 2005). Briefly, for those proteins that are expressed exclusively in the insoluble fraction based on the screen, washed inclusion bodies are prepared and solubilized in guanidinium hydrochloride. The His-tagged proteins are then purified under denaturing conditions and protein refolding attempted by dilution into a set of refolding buffers in a 96-well format. Folding of protein is followed by measuring turbidity at 350 or 390 nm and the screen has been fully automated (Vincentelli et al., 2004). Further assessment of refolding and validation of any hits requires scale-up of the process under the conditions identified by the screen and biophysical characterization of the products. Generic methods used to assess authentic folding include size-exclusion chromatography (SEC; monodispersity), circular dichroism (secondary structure) and dynamic light scattering (DLS; monodispersity). Similarly at the Weizmann, His-tagged protein is partially purified in the denatured state by capture on nickel–NTA before dilution into various buffers containing additives such as salts, polar additives, osmolytes, detergent additives and chaotropes at three different pH values (Albeck et al., 2005).
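Hit selection in a plate-based refolding screen of this kind might be sketched as below. The sketch assumes that the 350 or 390 nm readout reports light scattering by aggregated protein, so that a low signal relative to a buffer blank marks a candidate refolding condition; the cut-off value and the well readings are hypothetical and are not taken from the cited protocols.

```python
def refolding_hits(readings, blank, max_increase=0.05):
    """Return wells whose signal exceeds the blank by no more than max_increase.

    readings     -- dict mapping well name to the 350/390 nm reading
    blank        -- reading of buffer without protein
    max_increase -- hypothetical cut-off separating clear from turbid wells
    """
    hits = []
    for well, absorbance in readings.items():
        if absorbance - blank <= max_increase:
            hits.append(well)
    return sorted(hits)

# Illustrative readings for four refolding buffers (not experimental data).
example = {"A1": 0.03, "A2": 0.41, "A3": 0.04, "B1": 0.25}
print(refolding_hits(example, blank=0.02))
```

A clear well shows only that the protein has not aggregated, which is why the hits still require scale-up and biophysical characterization before they can be regarded as refolded.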
As an alternative to screening for refolding, a high level of success for particular groups/classes of proteins is possible using a single condition. For example, a generic approach has been used by the Oxford group for extracellular proteins stabilized by disulfide bonds (Gao et al., 1998). The procedure involves preparation of inclusion bodies on a large scale (from 2 l cultures) followed by solubilization in 6 M guanidine–HCl, 50 mM Tris pH 8, 100 mM NaCl, 10 mM EDTA and 10 mM DTT. Denatured protein is then rapidly diluted into a large volume of refolding buffer containing 200 mM Tris–HCl pH 8, 10 mM EDTA, 1 M L-arginine, 0.1 mM PMSF, 6.5 mM cysteamine and 3.7 mM cystamine.
2.5. High-density parallel fermentation
Most groups have a target yield of at least 5 mg purified protein per litre of E. coli culture and with the general take-up of nanolitre crystallization methods this is usually sufficient protein for crystallization screening (see Berry et al., 2006). Thus, for many projects growing cells in simple shake-flask cultures to 1 l is usually adequate and several different production runs can easily be handled in parallel. However, for some lower yielding but high-value proteins larger volumes would be required and the use of fermenter systems in which growth and productivity can be controlled and optimized becomes appropriate. In the SPINE consortium, dedicated facilities have been established at the EMBL, Hamburg to carry out high-density fermentations (OD600 > 100) up to the 1 l scale using four individual fermenters. In addition, the Strasbourg group use a commercially available system comprising a battery of six small (0.5 l working volume) fermenters (SixFors, Infors). However, there remains the need for a more highly parallel system for high-density fermentation. With these requirements in mind, the Pasteur group have built a computer-controlled battery comprising eight miniaturized fermenters of 80 ml capacity. The reactors consist of glass vessels with a square section, which enables continuous monitoring of the turbidity of the cultures by means of an external mobile optical sensor which moves from one reactor to the next. The temperature of each reactor is controlled by an internal probe and by Peltier elements, which can be programmed independently. The fermenters can be fitted with pH and pO2 probes and with an automated injection system for adding inducer at a defined stage of the culture. Highly efficient oxygen-enriched aeration provided by sintered glass spargers enables the batch-wise cultivation of E. coli to high cell densities using a home-developed enriched medium. OD600 values of 60–100 are routinely achieved.
Finally, a second parallel system has been developed by the Stockholm group which is similar to the SixFors but comprises 12 culture vessels (see Hedrén et al., 2006).
3. Results and discussion
3.1. Vector construction
3.1.1. Choice of cloning method
All the cloning methods have given efficiencies of at least 80% in HTP format. Of these, the Gateway recombinatorial cloning method has been the most widely adopted in the SPINE consortium. However, in using the Gateway system it is important to be aware of the effect the att recombination sequences may have on expression and/or solubility of the cloned protein if they form part of the translated sequence. In a comparison of Gateway vectors in which several genes were expressed as N-His-tagged constructs either with or without att sequences in the translated insert, the Weizmann and Oxford SPINE partners (unpublished) observed that the presence of the att sequence was associated with a marked reduction in the level of soluble expression of several proteins (e.g. Fig. 3). This effect can be avoided by positioning the att sites outside of the ORF, but some of the flexibility of the system is lost since only a single fusion format is possible. Thus, two vector formats have emerged amongst SPINE groups for using Gateway. Firstly, suites of vectors for generating a variety of fusion proteins have been constructed but with protease cleavage sites to enable removal of the att and tag sequence. Secondly, Gateway has been used to insert genes into the vector pDEST14 (InVitrogen) with either short N- or C-terminal His6 tags via att sites immediately downstream of the T7 promoter/enhancer. In this case the RBS is incorporated into the 5′ PCR primers used to amplify the target gene as follows.
This extension would be attached directly to the 5′ gene-specific sequence in the case of a C-His-tagged construct. Alternatively, six histidine codons would be added between the extension and the gene-specific sequence to produce an N-His-tagged construct.
Issues with cost, the uncertainty about the effects of the Gateway recombination sequences on expression and the generally long primers required to avoid these effects have stimulated the development of alternative approaches to HTP cloning. The most widely adopted has been the LIC-PCR system, which has been commercialized by Novagen. The Utrecht group have found that the advantage of this system is that it does not require specialized vectors and reagents are relatively inexpensive, requiring limited amounts of vector (1–3 ng) and insert (1–20 ng) DNA (data from the Utrecht group; Folkers et al., manuscript in preparation). The success rate critically depends on the preparation of high-quality linearized vector, which requires batch checking to ensure high efficiency of cloning. A limitation is that one of four bases has to be pre-selected as the 'lock' in the compatible overhangs and hence the base composition of the annealing regions is limited to the other three bases (Fig. 1). Consequently, the method is not entirely sequence-independent and cannot be used to join any sequence to any other sequence.
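The 'lock' constraint in LIC-PCR can be made concrete with a short sketch. It models the 3′→5′ exonuclease activity of T4 DNA polymerase as digesting the 3′ strand until it reaches the first nucleotide matching the single dNTP supplied, which is retained; the single-stranded 5′ overhang therefore ends just before the complementary base on the primer strand. The primer extension used here is invented for illustration and is not a sequence from any SPINE vector.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def lic_overhang(primer_strand_5p, dntp):
    """Return the 5' single-stranded overhang left after chew-back.

    primer_strand_5p -- 5' end of the PCR product strand carrying the primer
                        extension, written 5'->3'
    dntp             -- the single nucleotide present in the reaction
                        ('A', 'C', 'G' or 'T')

    The opposite strand is digested from its 3' end until the first base equal
    to `dntp`; on the primer strand this position carries the complement.
    """
    lock_base = COMPLEMENT[dntp]
    stop = primer_strand_5p.find(lock_base)
    if stop == -1:
        raise ValueError("no lock base found: chew-back would not be halted")
    return primer_strand_5p[:stop]

# Hypothetical primer extension, treated in the presence of dATP only.
overhang = lic_overhang("GCAGGACCAGCAGTCCATG", "A")
print(overhang, len(overhang))
```

Because the overhang ends at the first lock base, it can by construction contain only the other three bases, which is the limitation noted above.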
The enzyme-free cloning (EFC) method (Tillett & Neilan, 1999; Neilan & Tillett, 2002) has the potential to overcome these limitations. A pairwise comparison of LIC and EFC carried out by the Utrecht group revealed that while for both methods all PCR products (n = 24) were successfully cloned, the latter method appeared superior as a larger percentage of the analysed colonies had the correct insert (91% versus 79%; Folkers et al., manuscript in preparation). In-Fusion cloning also overcomes the limitation since the enzyme-catalysed reaction is sequence-independent. However, In-Fusion cloning is much less widely used than either Gateway or LIC-PCR at present, although a recent comparison of the two methods concluded that In-Fusion was equally efficient for HTP cloning (Marsischky & LaBaer, 2004). The availability of vectors specifically designed for use with In-Fusion by SPINE partners should increase the utility of the system (Table 2). In comparing the three methods for constructing expression vectors, it is clear that there are advantages and disadvantages to all three. LIC-PCR is the system of choice for minimizing the cost per vector since in contrast to Gateway and In-Fusion it does not require the use of relatively expensive specialized enzyme(s). Only Gateway combines ease of use with maximum flexibility in terms of generating multiple vectors from a single PCR product. In-Fusion cloning probably lies between the other two in terms of unit cost and is the only established HTP method to date that is entirely sequence independent.
3.1.2. Choice of fusion tag
HTP protein purification depends on affinity tags to provide a generic strategy. In addition, certain tags have a beneficial effect on protein solubility, especially for the expression of heterologous proteins in E. coli (Smith & Johnson, 1988; Kapust & Waugh, 1999). The most widely used format in the SPINE consortium has been the short His6 tag placed at either the N- or C-terminus of the target gene. In some cases, vectors have been engineered to include a protease cleavage site between the N-terminal tag and the inserted gene to enable removal of the tag post-purification. The most commonly used proteases are that from tobacco etch virus (TEV; Parks et al., 1994) and the 3C protease from human rhinovirus (Cordingley et al., 1989). Both enzymes have highly specific linear recognition sequences that are very rarely encountered in other sequences, minimizing the risk of cleavage within the target.
In SPINE, vectors have been constructed using all three cloning strategies described above to produce His-tagged proteins (Table 2). In addition, several groups have developed suites of Gateway-compatible expression vectors that enable different larger fusion proteins to be constructed from a common entry clone (Table 2). The fusion proteins incorporate N-terminal His tags to combine the potential benefits of improved expression levels/solubility of the fusion partner with a generic purification strategy. In specific cases, expression as a fusion protein has rescued otherwise insoluble proteins. For example, the Marseille group found that the hypothetical protein Rv115 from Mycobacterium tuberculosis was expressed solubly as a His-maltose-binding protein (MBP) fusion, whereas the His-only version was insoluble. The availability of soluble protein enabled subsequent crystallization and structure solution following removal of the His MBP tag (Canaan et al., 2005).
At present, there is no clear consensus as to which fusion partners give the best performance in terms of enhancing expression/solubility, although the balance of the literature (including reports from SPINE groups) suggests that MBP and thioredoxin should be the first choice (Braun et al., 2002; Hammarstrom et al., 2002; Shih et al., 2002; Dyson et al., 2004; Busso et al., 2005). Multiple examples from several SPINE groups, however, have revealed loss of solubility after cleavage, raising doubts about the general usefulness of screening different fusion tags.
3.2. Expression screening
The large-scale and parallel construction of expression vectors either for multiple targets or multiple versions of fewer targets creates a need for parallel expression screening on a small scale. Practical considerations involved in setting up such a screen include the choice of culture conditions (E. coli strain, culture volume and media), cell lysis and protein-detection method for soluble expression. For all these variables the key issue is that for a screen to be useful it has to be predictive of the outcome on a larger scale. Amongst SPINE partners different decisions have been made regarding the format of the E. coli expression screen. In the accompanying paper (Berrow et al., 2006) the different screening protocols are compared by analysing the results of expression screening a common set of 96 vectors. Here, overall results from applying HTP methods in SPINE are presented.
3.2.1. Solubility screening results
Data from the laboratories of each of the authors were collected via a website and analysed (Table 3). It is clear that the technology development described in this report has enabled a very large number of vectors (n = 3847) to be constructed for screening expression in E. coli. In all laboratories at least two constructs have been made for each gene targeted for structural studies. The majority of these constructs comprised either N- or C-terminal His6 tags. Each expression experiment has also been carried out on average three times, giving a total of over 10 000 expression trials. The overall result from all the screening experiments was that 32% of the constructs yielded soluble protein and that 70% of the targets produced at least one soluble construct (Table 3). Most interestingly, the bacterial and human targets gave similar results, with the viral ones performing slightly worse. These data are comparable to screening results reported by other large-scale structural proteomics projects largely focused on microbial genomes (Christendat et al., 2000; Lesley et al., 2002; Chance et al., 2004).
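The headline figures can be checked with a short calculation. The sketch uses the construct count, the repeat number and the 32% soluble fraction quoted above, together with the approximate target count of 1700 from the abstract. The estimate of the per-target success rate assumes that every target has the same number of constructs and that constructs succeed independently; both are simplifying assumptions introduced here, not claims made in the text.

```python
def total_trials(constructs, repeats):
    """Total expression trials when each construct is screened `repeats` times."""
    return constructs * repeats

def naive_target_success(soluble_fraction, constructs, targets):
    """Chance that a target has at least one soluble construct.

    Assumes an equal number of constructs per target and independent outcomes.
    """
    per_target = constructs / targets
    return 1 - (1 - soluble_fraction) ** per_target

trials = total_trials(3847, 3)
estimate = naive_target_success(0.32, 3847, 1700)
print(trials)
print(f"{estimate:.2f}")
```

Under these assumptions the estimate comes out somewhat below the 70% reported in Table 3, so the per-target figure should be read as an observed result rather than as a consequence of the per-construct rate.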
Follow-on results from the SPINE crystallography groups who used the protocols reviewed in this article indicate that approximately 20% of the soluble proteins identified for scale-up by screening gave diffraction-quality crystals. From these, a total of 150 structures have been solved to date. Details of some of this work are provided in accompanying papers in this volume (e.g. Fogg et al., 2006).
3.3. Refolding proteins from inclusion bodies
In the study reported by the Marseille group (Vincentelli et al., 2004) 24 proteins that formed inclusion bodies were subjected to a small-scale refolding screen to identify conditions that solubilized the proteins. Seventeen of the 24 remained soluble in at least one of the 96 refolding buffers and 15 were scaled up and entered for crystallization trials. Of these, five proteins gave crystal hits (some 20% of the starting set), a notable success rate for this set of targets. It will be of interest to test this screening approach with larger numbers of targets in order to assess the general applicability of the method. In an alternative strategy, the group in Oxford have used a simple generic approach to refolding based on the work of Gao et al. (1998), who refolded the extracellular domain of CD8 by rapid dilution into a redox buffer containing arginine. To date, 20 different human and eukaryotic viral proteins have been processed using this protocol. These proteins were selected as either naturally secreted or the ectodomains of cell-surface proteins, all of which formed inclusion bodies following overexpression in E. coli. Twelve of the proteins (60%) were recovered by refolding as soluble protein and seven of these (approximately 60%) have been crystallized, leading to four solved structures (Gao et al., 1998; Mongkolsapaya et al., 1999; Brown et al., 2002; Brown et al., manuscript in preparation; Bahar et al., manuscript in preparation). This indicates that a high success rate (20% of targeted proteins resulting in structures) can be achieved with a single refolding regime for a specific class of protein. However, these examples are limited to secreted proteins and more generally proteins are less amenable to refolding (see, for example, the experiences of Tarbouriech et al., 2006).
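The attrition through the two refolding pipelines can be tabulated from the counts given above. The stage labels are shorthand introduced for this sketch; the counts are those quoted in the text.

```python
def funnel(stages):
    """Return (label, count, % of starting set, % of previous stage) per stage."""
    start = stages[0][1]
    previous = start
    rows = []
    for label, count in stages:
        rows.append((label, count, 100 * count / start, 100 * count / previous))
        previous = count
    return rows

MARSEILLE = [("inclusion-body targets", 24), ("soluble in screen", 17),
             ("scaled up", 15), ("crystal hits", 5)]
OXFORD = [("inclusion-body targets", 20), ("refolded", 12),
          ("crystallized", 7), ("structures", 4)]

for name, stages in (("Marseille", MARSEILLE), ("Oxford", OXFORD)):
    print(name)
    for label, count, of_start, of_prev in funnel(stages):
        print(f"  {label}: {count} ({of_start:.0f}% of start, {of_prev:.0f}% of previous)")
```

Expressing each stage against both the starting set and the preceding stage makes it explicit which denominator a quoted percentage refers to.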
3.4. New technology developments
3.4.1. Library methods
Screening is used widely to solve problems where large numbers of variables make it too complex to rationally predict the solution, e.g. screening of precipitant conditions in protein crystallization. The design of expression constructs is usually a highly rational process involving careful study of sequence alignments and the use of bioinformatic tools. Within the SPINE project, notable successes in terms of crystallizable protein have resulted from screening multiple constructs from a target protein (see, for example, Berry et al., 2006). Such an approach is operationally limited, not least by cost, to <100 variants per target. To go much beyond this requires a library-based approach in which thousands of variants can be produced and assayed. Therefore, in this SPINE workpackage both the Grenoble and Stockholm nodes have designed and validated methods in which large numbers of constructs are randomly generated and then tested for production of soluble protein. These methods offer a rescue route for proteins for which successful constructs cannot be generated by standard means. Additionally, once streamlined, library-based screening may become a primary approach, especially for proteins for which the absence of homologues prevents the generation of sequence alignments. Since these strategies involve the handling of very large numbers of clones (typically 10³–10⁵), automation can greatly improve reliability by carrying out highly repetitive routines such as colony picking whilst aiding sample tracking. A representative result from a high-density colony screen is shown in Fig. 4. Once a panel of hits has been isolated from the library, the constructs can be plugged into the standard structural genomics platforms for expression optimization and purification screening using liquid-handling robots.
3.4.2. High-density parallel fermentation
The eight-vessel micro-fermenter system developed by the Pasteur group has been benchmarked with over 50 different E. coli clones producing various proteins of M. tuberculosis and M. leprae. In most cases, the yield of soluble recombinant protein produced in a 70 ml micro-fermenter culture with the high-density medium was as good as or better than the yield obtained in 1 l shake-flask cultures using LB medium. Furthermore, culture protocols developed in micro-fermenters for optimizing recombinant protein production could be reproducibly scaled up when larger quantities were required. Thus, the micro-fermenter battery has been adopted for routine cell culture at the Pasteur Institute and the possibility of making the system more widely available through commercialization is under discussion. A comparable system has been developed by the Novartis Institute and consists of 96 × 50 ml reaction vessels (Lesley et al., 2002; Lesley & Wilson, 2005). However, the Novartis device does not have the feedback-control systems of the eight-vessel unit, limiting the yield of the system. In addition, the modular design of the Pasteur micro-fermenter would enable a battery of 8 × 12 units to be assembled to provide a degree of parallel operation equivalent to the Novartis system. Within SPINE, the Stockholm group have also developed parallel equipment specifically optimized for structural proteomics; this project is detailed in an accompanying paper (Hedrén et al., 2006).
4. Conclusion
Obtaining recombinant proteins in a soluble form suitable for crystallization remains a major bottleneck for HTP structural biology. However, technical advances in cloning and expression are beginning to accelerate protein-production activity. Within the SPINE consortium most laboratories have implemented HTP cloning and expression screening in E. coli and this has had a major impact on the ability to process multiple constructs in parallel. As a result, soluble expression of at least domains has been obtained for a relatively high proportion of proteins selected for structural studies. For high-value targets, new methods for high-density screening of libraries of deletion mutants offer a way of identifying sub-regions that express in soluble form in E. coli without prior knowledge of domain organization. Overall, the HTP production of recombinant proteins in E. coli remains central to increasing the rate of protein structure solution in a cost-effective manner.
Acknowledgements
We thank all our colleagues for their contributions to the work reviewed in this article. This project is funded by the European Commission as SPINE (Structural Proteomics In Europe) Contract No. QLG2-CT-2002-00988 under the Integrated Programme ‘Quality of Life and Management of Living Resources’.
References
Albeck, S., Burstein, Y., Dym, O., Jacobovitch, Y., Levi, N., Meged, R., Michael, Y., Peleg, Y., Prilusky, J., Schreiber, G., Silman, I., Unger, T. & Sussman, J. L. (2005). Acta Cryst. D61, 1364–1372.
Aslandis, C. & deJong, P. J. (1990). Nucleic Acids Res. 18, 6069–6074.
Berrow, N. S. et al. (2006). Acta Cryst. D62, 1218–1226.
Berry, I. M., Dym, O., Esnouf, R. M., Harlos, K., Meged, R., Perrakis, A., Sussman, J. L., Walter, T. S., Wilson, J. & Messerschmidt, A. (2006). Acta Cryst. D62, 1137–1149.
Betton, J. M. (2004). Biochimie, 86, 601–605.
Braun, P., Hu, Y., Shen, B., Halleck, A., Koundinya, M. & LaBaer, J. (2002). Proc. Natl Acad. Sci. USA, 99, 2654–2659.
Brown, J., Esnouf, R. M., Jones, M. A., Linnell, J., Harlos, K., Hassan, A. B. & Jones, E. Y. (2002). EMBO J. 21, 1054–1062.
Busso, D., Delagoutte-Busso, B. & Moras, D. (2005). Anal. Biochem. 343, 313–321.
Busso, D., Poussin-Courmontagne, P., Rose, D., Ripp, R., Litt, A., Thierry, J.-C. & Moras, D. (2005). J. Struct. Funct. Genomics, 6, 81–88.
Canaan, S., Sulzenbacher, G., Roig-Zamboni, V., Scappuccini-Calvo, L., Frassinetti, F., Maurin, D., Cambillau, C. & Bourne, Y. (2005). FEBS Lett. 579, 215–221.
Chance, M. R., Fiser, A., Sali, A., Pieper, U., Eswar, N., Xu, G., Fajardo, J. E., Radhakannan, T. & Marinkovic, N. (2004). Genome Res. 14, 2145–2154.
Christendat, D. et al. (2000). Nature Struct. Biol. 7, 903–909.
Cordingley, M. G., Register, R. B., Callahan, P. L., Garsky, V. M. & Colonno, R. J. (1989). J. Virol. 63, 5037–5045.
Cornvik, T., Dahlroth, S. L., Magnosdottir, A., Herman, M. D., Knaust, R., Ekberg, M. & Nordlund, P. (2005). Nature Methods, 2, 507–509.
Dyson, M. R., Shadbolt, S. P., Vincent, K. J., Perera, R. L. & McCafferty, J. (2004). BMC Biotechnol. 4, 32.
Fogg, M. J. et al. (2006). Acta Cryst. D62, 1196–1207.
Folkers, G. E., van Buuren, B. N. & Kaptein, R. (2004). J. Struct. Funct. Genomics, 5, 119–131.
Gao, G. F., Gerth, U. C., Wyer, J. R., Willcox, B. E., O'Callaghan, C. A., Zhang, Z., Jones, E. Y., Bell, J. I. & Jakobsen, B. K. (1998). Protein Sci. 7, 1245–1249.
Hammarstrom, M., Hellgren, N., van Den Berg, S., Berglund, H. & Hard, T. (2002). Protein Sci. 11, 313–321.
Hart, D. & Tarendeau, F. (2006). Acta Cryst. D62, 19–26.
Hartley, J. L., Temple, G. F. & Brasch, M. A. (2000). Genome Res. 10, 1788–1795.
Hartman, S., Trinklein, N. D., Anton, E. D., Marticke, S. S., Nguyen, L. & Myers, R. M. (2005). Clonetechniques, Jan. 2005. https://www.clontech.com/clontech/archive/JAN05UPD/tech_notes_cloning.shtml.
Haun, R. S., Serventi, I. M. & Moss, J. (1992). Biotechniques, 13, 515–518.
Hedrén, M., Ballagi, A., Mörtsel, L., Rajkai, G., Stenmark, P., Sturesson, C. & Nordlund, P. (2006). Acta Cryst. D62, 1227–1231.
Kapust, R. B. & Waugh, D. S. (1999). Protein Sci. 8, 1668–1674.
Knaust, R. K. & Nordlund, P. (2001). Anal. Biochem. 297, 79–85.
Lesley, S. A. et al. (2002). Proc. Natl Acad. Sci. USA, 99, 11664–11669.
Lesley, S. A. & Wilson, I. A. (2005). J. Struct. Funct. Genomics, 6, 71–79.
Marsischky, G. & LaBaer, J. (2004). Genome Res. 14, 2020–2028.
Mongkolsapaya, J., Grimes, J. M., Chen, N., Xu, X. N., Stuart, D. I., Jones, E. Y. & Screaton, G. R. (1999). Nature Struct. Biol. 6, 1048–1053.
Neilan, B. A. & Tillett, D. (2002). Methods Mol. Biol. 192, 125–132.
Parks, T. D., Leuther, K. K., Howard, E. D., Johnston, S. A. & Dougherty, W. G. (1994). Anal. Biochem. 216, 413–417.
Quevillon-Cheruel, S. et al. (2003). J. Synchrotron Rad. 10, 4–8.
Quevillon-Cheruel, S., Liger, D., Leulliot, N., Graille, M., Poupon, A., de La Sierra-Gallay, I. L., Zhou, C. Z., Collinet, B., Janin, J. & van Tilbeurgh, H. (2004). Biochimie, 86, 617–623.
Shih, Y. P., Kung, W. M., Chen, J. C., Yeh, C. H., Wang, A. H. & Wang, T. F. (2002). Protein Sci. 11, 1714–1719.
Smith, D. B. & Johnson, K. S. (1988). Gene, 67, 31–40.
Studier, F. W. (2005). Protein Expr. Purif. 41, 207–234.
Tarbouriech, N., Buisson, M., Géoui, T., Daenke, S., Cusack, S. & Burmeister, W. P. (2006). Acta Cryst. D62, 1276–1285.
Teplyakov, A., Obmolova, G., Bir, N., Reddy, P., Howard, A. J. & Gilliland, G. L. (2003). J. Struct. Funct. Genomics, 4, 1–10.
Thao, S., Zhao, Q., Kimball, T., Steven, E., Blommel, P. G., Riters, M., Newman, C. S., Fox, B. G. & Wrobel, R. L. (2004). J. Struct. Funct. Genomics, 5, 267–276.
Tillett, D. & Neilan, B. A. (1999). Nucleic Acids Res. 27, e26.
Tresaugues, L., Collinet, B., Minard, P., Henckes, G., Aufrere, R., Blondeau, K., Liger, D., Zhou, C. Z., Janin, J., van Tilbeurgh, H. & Quevillon-Cheruel, S. (2004). J. Struct. Funct. Genomics, 5, 195–204.
Vincentelli, R., Bignon, C., Gruez, A., Canaan, S., Sulzenbacher, G., Tegoni, M., Campanacci, V. & Cambillau, C. (2003). Acc. Chem. Res. 36, 165–172.
Vincentelli, R., Canaan, S., Campanacci, V., Valencia, C., Maurin, D., Frassinetti, F., Scappuccini-Calvo, L., Bourne, Y., Cambillau, C. & Bignon, C. (2004). Protein Sci. 13, 2782–2792.
Vincentelli, R., Canaan, S., Offant, J., Cambillau, C. & Bignon, C. (2005). Anal. Biochem. 346, 77–84.
Walhout, A. J., Temple, G. F., Brasch, M. A., Hartley, J. L., Lorson, M. A., van den Heuvel, S. & Vidal, M. (2000). Methods Enzymol. 328, 575–592.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.