Pi sampling: a methodical and flexible approach to initial macromolecular crystallization screening

Pi sampling, derived from the incomplete factorial approach, is an effort to maximize the diversity of macromolecular crystallization conditions and to facilitate the preparation of 96-condition initial screens.


Introduction
A crucial aspect of macromolecular crystallographic studies is finding suitable conditions for the crystallization of a sample. This can be difficult because many factors alter the crystallization behaviour of macromolecules, including the type and the concentration of the chemicals employed to formulate the conditions (McPherson, 1990). A condition includes at least a precipitant and most conditions also include a buffer and an additive. During the initial crystallization experiments, the structure of the macromolecule is not known and hence the most efficient formulation cannot be predicted. As a consequence, one should be cautious when making initial assumptions and limiting choices in subsequent optimizations (Rupp, 2003). Nonetheless, the number of initial crystallization conditions cannot be unreasonably large since purified protein is often difficult and expensive to produce in large quantities.
There are essentially two approaches to restrict an initial screen to a limited number of crystallization conditions. Firstly, a sparse-matrix formulation can be used, which consists of an empirically derived combination of components based on known or published crystallization conditions (Jancarik & Kim, 1991). Secondly, an incomplete factorial formulation can be generated in which selected components are combined to prepare new conditions in accordance with principles of randomization and balance (Carter & Carter, 1979). Numerous commercial screens based on these two main approaches are available. Automated systems have been implemented at the Medical Research Council (MRC) Laboratory of Molecular Biology (LMB) to test these as routine initial screens using the 96-well crystallization plate format (Stock et al., 2005). However, for various reasons, many laboratories opt for a minimal screen (Kimber et al., 2003) and still perform at least some aspects of the work manually (Bergfors, 2007).
Here, we present a development based on the incomplete factorial formulation: the Pi sampling method. The name of the method was inspired by the story of Archimedes, who used the 'method of exhaustion' (i.e. an empirical approach) with a 96-sided polygon in order to reach the first good numerical approximation of π (Smith, 1958). Pi sampling uses modular arithmetic to form combinations of three stock solutions across a 96-condition grid. Maximally diverse conditions can be produced by taking into account the properties of the chemicals used in the formulation and the concentrations of the corresponding stock solutions. We have implemented this approach in a web-based application called Pi Sampler: user input consists of the details of up to 36 stock solutions, from which the application generates the formulations for a 96-condition screen. The Pi sampling method is intended to help laboratories to test new crystallization-screen formulations on a day-to-day basis based on the properties of the macromolecules investigated, as has been performed previously with RNA (Doudna et al., 1993).
Firstly, we tested Pi sampling with ten commercially available soluble proteins. For this, we employed the 'Pi minimal screen', which includes a wide variety of well known chemicals frequently used for macromolecular crystallization.
We then investigated the impact of Pi sampling on the crystallization of a G-protein-coupled receptor (GPCR) that had been difficult to crystallize: the adenosine A2A receptor (construct A2AR-GL31). We formulated another Pi screen, the 'Pi-PEG screen', taking into consideration general observations made about crystallization of integral membrane-protein samples. Previous crystallization experiments on another GPCR (the β1-adrenergic receptor) had indicated that the use of simple proprietary screens formulated with poly(ethylene glycol) (PEG) and buffers gave a greater yield of crystals than all commercially available screens, including those geared towards membrane proteins (Warne et al., 2009), and the 2.7 Å resolution structure was solved using conditions optimized from a proprietary screen essentially based on PEGs (Warne et al., 2008). This has been observed previously with other membrane-protein targets (Lemieux et al., 2003). In addition, mixtures of polyethylene glycols have been used successfully to develop a minimal screen (Brzozowski & Walton, 2001) and to study crystal structures of the Kir potassium channel (Clarke et al., 2010). Such mixtures were incorporated into the Pi-PEG screen.

Figure 1
Pi sampling: combinations of stock solutions from three different sets (see also http://pisampler.mrc-lmb.cam.ac.uk/).

Figure 2
Pi sampling: combinations of the stock solutions in a 96-condition plate layout (well A1 is at the top left corner). Each solution of set 1 (ID 1-12) is seen in the eight conditions forming a column of the plate. The π values of set 1 increase from left to right in the screen layout. The positions of the solutions A-L (set 2) shift across five columns and down one row (π values not represented). The positions of solutions M-X (set 3) shift across ten columns and down one row. Gradients of concentration for sets 2 and 3 are represented on the left and right, respectively.

Pi sampling
Pi sampling begins with up to 36 stock solutions, divided into three sets of 12. The first set of solutions is used in the screen at constant concentration. The second and third sets are added according to a gradient between specified minimum and maximum concentrations. Typically, the first set is composed of buffers and the second and third sets are precipitants/additives.
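The gradient between minimum and maximum concentrations can be illustrated with a minimal sketch. It assumes evenly spaced (linear) steps over the eight rows of the plate, which the text does not specify; the function name `linear_gradient` and the example limits (5 and 40, in arbitrary units) are hypothetical and do not come from Pi Sampler.

```python
def linear_gradient(c_min, c_max, rows=8, descending=False):
    """Return one final concentration per plate row.

    Assumption: the steps are evenly spaced between c_min and c_max.
    """
    step = (c_max - c_min) / (rows - 1)
    values = [c_min + i * step for i in range(rows)]
    # Reverse the list when the concentration should fall from top to bottom.
    return values[::-1] if descending else values


# Hypothetical limits: one gradient falling and one rising down the plate.
falling = linear_gradient(5.0, 40.0, descending=True)
rising = linear_gradient(5.0, 40.0)
```

With these example limits the rising gradient steps by 5 units per row, and the falling gradient is its mirror image.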
The combinations of three stock solutions (one from each set) are generated according to Fig. 1, where 1-12 refer to the IDs for solutions of the first set, A-L to those of the second set and M-X to those of the third set. The number in each cell shows which solution of the first set will be combined with the corresponding solutions of the second and third sets. Blank spaces show when no such combinations are generated. Fig. 2 summarizes the distribution of the stock solutions in a standard 96-condition plate layout (i.e. 12 columns and eight rows).
Set 1: each solution (ID 1-12) is seen in the eight conditions forming a column of the plate. A variable π should be associated with the stock solutions. The variable π corresponds to a property of the solution selected (e.g. pH, molecular weight of the main chemical, absorption properties or others). π values increase from left to right in the screen layout.
Set 2: each solution (ID A-L) is represented once in each row. The final concentrations decrease gradually from the top to the bottom of the screen layout, forming a gradient. The distribution of solutions A-L is based on the sequence of π values established for set 1: the positions of the solutions shift across five columns and down one row. Solutions A-L should also be associated with a variable π and hence a sequence is formed for the distribution of the third set of solutions.
Set 3: each solution (ID M-X) is also represented once in each row. The final concentrations increase gradually from the top to the bottom of the screen, forming another gradient. The solutions M-X are distributed with the same modulo arithmetic operation as previously, but with respect to the π values of solutions A-L. For example, solution M is mixed with solution A in the first row, solution F in the second row, solution K in the third row and so on, as shown in Fig. 2. This means that both the second and third sets are arranged according to the same modulo arithmetic operation (5 modulo 12); however, when looking at the plate layout, the positions of solutions M-X shift across ten columns and down one row.
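The distribution described for the three sets can be sketched in a few lines. This is an illustration only, not the Pi Sampler source code: it assumes that well A1 combines solutions 1, A and M, and that the solutions within each set are already sorted by their π values. The function name `pi_layout` is hypothetical.

```python
SET1 = [str(i) for i in range(1, 13)]  # IDs 1-12, one per column
SET2 = list("ABCDEFGHIJKL")            # IDs A-L
SET3 = list("MNOPQRSTUVWX")            # IDs M-X
ROW_LABELS = "ABCDEFGH"                # eight plate rows


def pi_layout(shift=5, n_cols=12, n_rows=8):
    """Map each well name to its (set 1, set 2, set 3) solution IDs."""
    layout = {}
    for r in range(n_rows):
        for c in range(n_cols):
            # Set 2 shifts across `shift` columns per row relative to set 1.
            j = (c - shift * r) % n_cols
            # Set 3 applies the same shift, but relative to the set 2 order,
            # which amounts to 2 * shift = 10 columns per row on the plate.
            k = (j - shift * r) % n_cols
            layout[f"{ROW_LABELS[r]}{c + 1}"] = (SET1[c], SET2[j], SET3[k])
    return layout


layout = pi_layout()
```

Under these assumptions solution M meets A in well A1, F in well B11 and K in well C9, matching the example given in the text, and the 96 three-solution combinations are all distinct.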

Pi Sampler
Pi Sampler can be accessed via the internet at http://pisampler.mrc-lmb.cam.ac.uk/. Users can enter the details of up to 36 stock solutions, including stock concentrations, desired screen concentration ranges and π values. The application then generates a 96-condition screen formulation following the Pi sampling method described above. Formulations, recipes and total required volumes of stock solutions are presented and may conveniently be downloaded in comma-separated values (CSV) format, allowing the user to import them into other software for automated screen making (Cox & Weber, 1987), formulation analysis (Hedderich et al., 2011) and data mining (Kantardjieff & Rupp, 2004). The parameters used to generate the screen can also be saved and uploaded in the same format. Further details and instructions can be found on the website.
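To give an idea of what a recipe involves, the following sketch computes the stock volumes for a single condition by simple dilution (stock volume = total volume × final concentration / stock concentration) and writes them as CSV. The function names, the column headers, the component names and the concentrations are all hypothetical; they do not reproduce the actual Pi Sampler file format.

```python
import csv
import io


def recipe(components, total_volume_ul):
    """Return the volume (in microlitres) of each stock plus water.

    components: list of (name, stock_conc, final_conc) tuples, with both
    concentrations of a given component expressed in the same unit.
    """
    volumes = {
        name: total_volume_ul * final / stock
        for name, stock, final in components
    }
    water = total_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stock solutions are too dilute for this condition")
    volumes["water"] = water
    return volumes


# Hypothetical condition: 100 mM buffer from a 1000 mM stock and
# 20% (w/v) PEG from a 50% (w/v) stock, in a total volume of 1000 microlitres.
volumes = recipe([("buffer", 1000, 100), ("PEG", 50, 20)], 1000)

# Write the recipe as CSV text (the column names are illustrative only).
buffer_out = io.StringIO()
writer = csv.writer(buffer_out)
writer.writerow(["component", "volume_ul"])
for name, volume in volumes.items():
    writer.writerow([name, f"{volume:.1f}"])
csv_text = buffer_out.getvalue()
```

The same calculation, repeated over the 96 conditions and summed per stock solution, would give the total required volumes mentioned above.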

Pi minimal screen preparation and crystallization assays with commercially available soluble proteins
The final formulation of the Pi minimal screen can be found in Table 1. There are 36 starting stock solutions overall. Each solution composing the first set (ID 1-12) is a mixture of an acid with its corresponding base (e.g. HEPES pH 7.5: 1 M HEPES solution mixed with 1 M HEPES sodium salt in order to reach pH 7.5), except for buffer 11 (AMPD mixed with Tris base). Note that this is also true for the precipitant phosphate (phosphate system: sodium dihydrogen phosphate/dipotassium hydrogen phosphate). Values of pH (4.0-9.5) were chosen as the variable π for the first set, whilst arbitrary values were chosen for additives of various natures composing the second set (ID A-L). Eventually, a few conditions were made without additive/buffer because of chemical incompatibilities (Table 1).
Highest purity grade chemicals (Molecular Biology grade when available) were purchased from Sigma-Aldrich to prepare 36 stock solutions. The solutions were mixed in 96 Falcon tubes. The screen was dispensed into 'MRC original plates' (96-well, two-drop, Swissci; Stock et al., 2005).
Commercial proteins that had been crystallized before were chosen to prepare test samples. Protein concentrations were chosen randomly between 7 and 150 mg ml⁻¹ (Table 2). Vapour-diffusion experiments were set up at 295 K, mixing two different sample:condition ratios (1:3 and 3:1) to give a final volume of 400 nl. The plates were then stored at 291 K. A condition was considered to be a hit when at least one of the two corresponding drops contained crystals with well known morphology after one week. Table 3 shows the 'hits per condition' observed and the corresponding results expected for the binomial distribution (see §4.2).

Results
There were 116 crystallization hits overall for the experiments with the Pi minimal screen (Table 2). Some conditions produced hits for several samples (Table 3).
The Pi-PEG screen yielded crystals that diffracted to 3.0 Å resolution for A2AR-GL31 with bound agonist. Fig. 3 [...]
[...] conditions is then accentuated using a number of different concentrations of solutions (Fig. 2). If the first and second sets of solutions are ordered according to physico-chemical properties, the generated screen will be an incomplete factorial sampling of interactions between chemicals with these properties. If the chemicals selected have completely different natures, they can be arranged randomly (see §2.3).
The ordering of the third set of solutions can be used to avoid obvious chemical incompatibilities (e.g. mixing phosphate and magnesium salts). It is also possible to design simpler screens with only two sets of stock solutions.

The Pi minimal screen
In order to check the homogeneity of the hits across the screen with the ten samples, we compared the results obtained with what would be expected if each condition had the same probability of hits overall (Table 3). This can be approximated by a binomial distribution. The probability of success for the binomial distribution is the observed probability for ten attempts: 116/(10 × 96) = 0.12083. The χ² statistic for the data is 3.48. This can be compared with the quantiles of a χ² distribution with two degrees of freedom, which gives a p value of 0.18 (calculations not shown). This χ² test indicates that no conditions are obvious outliers with regard to success or failure. There are, however, a multitude of possible biases implied when proceeding with crystallization experiments (which would be even more accentuated with the use of novel samples); hence, any statistical analysis should be treated with caution. Nonetheless, it is interesting to see that the analysis of the distribution is in accordance with the original approach based on balanced randomization (Carter & Carter, 1979; Rupp, 2003). In addition, none of the conditions of the Pi minimal screen is identical to any of the 7230 conditions from commercial screens stored in the 'PICKScreens' database (Hedderich et al., 2011).
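The figures quoted above can be checked with a short sketch. It computes the per-attempt hit probability, the number of conditions expected to give k hits out of ten samples under the binomial model, and the p value for the reported χ² statistic. The variable and function names are hypothetical. The statistic of 3.48 is taken from the text rather than recomputed, because the observed counts of Table 3 and the way they were binned are not reproduced here. For two degrees of freedom the upper-tail probability of the χ² distribution has the closed form exp(−x/2).

```python
from math import comb, exp

N_CONDITIONS = 96   # conditions in the screen
N_SAMPLES = 10      # proteins tested against each condition
TOTAL_HITS = 116    # hits observed overall

# Observed probability of a hit for one sample in one condition.
p_hit = TOTAL_HITS / (N_CONDITIONS * N_SAMPLES)


def expected_conditions(k):
    """Expected number of conditions giving exactly k hits (binomial model)."""
    prob = comb(N_SAMPLES, k) * p_hit**k * (1 - p_hit) ** (N_SAMPLES - k)
    return N_CONDITIONS * prob


expected = [expected_conditions(k) for k in range(N_SAMPLES + 1)]


def chi2_upper_tail_2df(x):
    """Upper-tail probability of a chi-squared variable with two degrees of freedom."""
    return exp(-x / 2)


# Statistic reported in the text; not recomputed from Table 3.
p_value = chi2_upper_tail_2df(3.48)
```

Rounded to two decimal places, `p_value` agrees with the value of 0.18 reported in the text.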

The Pi-PEG screen
The extent of effects on crystallization for precipitants such as PEGs is correlated with their concentrations (McPherson, 1976) and molecular weights (Forsythe et al., 2002). The Pi-PEG screen covers a wide range of parameters (kinetics of equilibrium, protein stabilization etc.). In addition, the concentrations of the two different PEGs in a condition can be adjusted for condition optimization (Stock et al., 2005) and for crystal cryoprotection (Berejnov et al., 2006). Furthermore, the PICKScreens database shows that the Pi-PEG screen is unique (as for the Pi minimal screen; see §4.2).
Samples of A2AR-GL31 purified in a number of different detergents rarely crystallized in commercially available screens used at the LMB (Stock et al., 2005) and when they did the crystal quality was not sufficient for structure determination. The first quality crystals were recently obtained using the Pi-PEG screen.

Conclusions
We have demonstrated that Pi sampling is a methodical and flexible approach to initial screening for macromolecular crystallization. Two unique screens produced de novo have resulted from this strategy. The Pi minimal screen potentially has an ideal formulation for crystallization of novel soluble protein samples. The Pi-PEG screen is a tailor-made screen for GPCRs and potentially other membrane proteins generated by biasing the formulation towards components known to be essential.
Further screens can be formulated with the Pi Sampler on a day-to-day basis in order to test chemicals and techniques, with the aim of increasing the yield of quality crystals. Also, new crystallization techniques are constantly emerging for macromolecular targets such as membrane proteins and hence formulations with special considerations are required: one may want to formulate screens compatible with the lipidic cubic phase (LCP) concept (Landau & Rosenbusch, 1996) or make extensive use of detergents (Koszelak-Rosenblum et al., 2009).
In order for laboratories to be able to handle many Pi screen formulations and the flow of resulting data, we are working on the integration of Pi Sampler into the 'xtalPiMS' Laboratory Information Management System (LIMS; Morris et al., 2011; see http://www.pims-lims.org).
Thanks to Simon Byrne (Cambridge University Statistics Clinic; http://www.statslab.cam.ac.uk/clinic/) for discussions. Thanks to the LMB members Jan Löwe, John Kendrick-Jones, Christopher Aylett, Chris Tate, Jake Grimmett and Graham Lingley for various contributions. Finally, thanks to Karen Law (MRC Technology), Chris Morris (STFC, funded by CCP4) and Tanja Hedderich (Max Planck Institute). Conflicting commercial interest: we hereby state that we have a conflicting commercial interest in that MRC Technology (http://www.mrctechnology.org/) will commercialize Pi screens under an exclusive licence to Jena Bioscience (http://www.jenabioscience.com/).