A tutorial for learning and teaching macromolecular crystallography
Five experiments have been designed to be used for teaching macromolecular crystallography. The three proteins used in this tutorial are all commercially available; they can be easily and reproducibly crystallized and mounted for diffraction data collection. For each of the five experiments the raw images and the processed data of a sample diffraction data set as well as the refined coordinates and phases are provided for teaching the steps of data processing and structure determination.
Since the beginning of the 1990s, structural biology and macromolecular crystallography in particular have advanced enormously, as can be seen from the growth statistics of the Protein Data Bank (PDB; Berman et al., 2000) shown in Fig. 1. While in the late 1980s one new macromolecular structure was determined every week worldwide, the rate has increased to about 20 new structures per day deposited with the PDB during the year 2007. In April 2008, the total content of the PDB exceeded 50 000 entries, the majority of which (> 85%) were determined by X-ray crystallographic techniques. This massive growth goes along with an increased importance of macromolecular crystallography in basic biological and medicinal research, but is hardly reflected adequately in the relevant biochemistry curricula at most universities around the world. Frequently, structural biology as a subject, or in more general terms the concept of thinking about biological problems in three dimensions, is introduced to students of biology or biochemistry at a relatively late stage in their career so that it is often too late for students to develop a deeper interest in this area. This has been realized and, consequently, there are voices advocating a much earlier introduction to the subject (Jaskólski, 2001). One of the problems, it seems, is that macromolecular crystallography taught as a lecture course tends to be a rather dry subject for students with a pure biology or biochemistry background. Since not all universities have a local macromolecular crystallography research group, access to experimental crystallography courses may be difficult or problematic. In order to address this issue, we have designed and assembled five macromolecular crystallography experiments, which can be carried out with relatively little effort. They are based on three commercially available protein samples, which can be easily and reproducibly crystallized and mounted for diffraction data collection. 
Should access to a data collection facility be impractical or difficult, this step can be skipped and example data sets, which are provided for each project, can be used instead. These can then be integrated and scaled using the available software packages. The resulting diffracted intensities and anomalous differences, which are also provided as part of the tutorial, can then be used for structure determination according to standard crystallographic practice.
The five experiments designed for this tutorial are (i) a sulfur single-wavelength anomalous diffraction (S-SAD) structure determination of cubic insulin, (ii) a bromide multiple-wavelength anomalous diffraction (Br-MAD) structure determination of thaumatin, (iii) a structure determination by molecular replacement using monoclinic lysozyme, (iv) the identification of bound solvent ions in lysozyme using a longer-wavelength data set and (v) the identification of a weakly bound ligand at the active site of lysozyme. These five cases (Fig. 2) represent some of the most common scenarios encountered in structural biology using macromolecular crystallographic techniques.
Insulin regulates the cellular uptake, utilization and storage of glucose, amino acids and fatty acids and inhibits the breakdown of glycogen, protein and fat. It is a two-chain polypeptide hormone produced by the β cells of pancreatic islets (Voet et al., 2006). The two chains comprise a total of 51 amino acids. Three disulfide bonds hold the two chains together: one intra-chain SS bridge between Cys6 and Cys11 in chain A, and two inter-chain SS bridges, one between Cys7 from chain A and Cys7 from chain B, and the other between Cys20 from chain A and Cys19 from chain B (Fig. 2a).
Experimental phase determination using SAD from the S atoms inherently present in nearly all protein molecules has in the past few years experienced a huge boost in popularity. After the initial success with the small protein crambin (46 amino acids, six S atoms) by Hendrickson & Teeter (1981), it took 18 years until the method was revived by Dauter et al. (1999), who were able to demonstrate by the successful structure determination of hen egg-white lysozyme (HEWL) based on the anomalous scattering of the protein S atoms and surface-bound chloride ions alone that the requirements in terms of data quality are less strict than was commonly believed and that they are indeed manageable. Since then, approximately 100 structures have been obtained using this so-called sulfur-SAD or S-SAD approach. In view of the fact that the anomalous signal from the light atoms at the typically used wavelengths in macromolecular crystallography is small, it has been suggested and experimentally verified that diffraction data collection at somewhat longer wavelengths may be beneficial (Djinović Carugo et al., 2005; Mueller-Dieckmann et al., 2005; Weiss et al., 2001). However, while a larger anomalous signal may be obtained at longer X-ray wavelengths, additional experimental complications arising mainly from X-ray absorption have to be dealt with. In this experiment, cubic crystals of bovine insulin are used for experimental phase determination using diffraction data collected at a wavelength of λ = 1.77 Å.
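To give a feeling for how small the sulfur anomalous signal is, the following minimal sketch evaluates the commonly quoted estimate of the expected Bijvoet ratio, ⟨|ΔF|⟩/⟨F⟩ ≈ (2N_A/N_P)^1/2 · f′′/Z_eff, with Z_eff ≈ 6.7 for an average protein atom. The function name, the atom count of roughly eight non-H atoms per residue and the f′′ value of about 0.7 e for sulfur near λ = 1.77 Å are assumptions made for illustration only; tabulated scattering factors should be used for any real estimate.

```python
import math

def expected_bijvoet_ratio(n_anomalous, n_protein_atoms, f_double_prime, z_eff=6.7):
    """Rough estimate of <|dF|>/<F> for n_anomalous scatterers among n_protein_atoms."""
    return math.sqrt(2.0 * n_anomalous / n_protein_atoms) * f_double_prime / z_eff

# Illustrative input for insulin (all three numbers are assumptions):
#   6 S atoms (three disulfide bridges),
#   about 8 non-H atoms per residue for 51 residues,
#   f'' of about 0.7 e for sulfur near 1.77 A (replace by a tabulated value).
ratio = expected_bijvoet_ratio(6, 8 * 51, 0.7)
print(f"expected anomalous signal: {100 * ratio:.1f}% of F")
```

With these inputs the estimate is of the order of 1 to 2% of the structure-factor amplitude, which illustrates why accurate data are needed for S-SAD.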
Thaumatin is a mixture of three intensely sweet proteins isolated from the seed vessel of the Katemfe plant (Thaumatococcus daniellii). It is about 1000 times sweeter than sucrose on a weight basis, 100 000 times on a molar basis, and is therefore used in the food industry as a sweetener. The commercially available thaumatin is a mixture of thaumatin I and thaumatin II with traces of other sweet proteins. The amino acid sequence of thaumatin contains 207 residues, where thaumatin I and II differ from one another in five amino acids only. As it is a mixture, it is hard to examine the ratio between thaumatin I and II in the crystal structure during refinement. All PDB entries for thaumatin (Fig. 2b) are therefore modelled using the thaumatin I sequence (Ko et al., 1994).
Over the past two decades, MAD has been the standard method for de novo structure determination in macromolecular crystallography (Hendrickson & Ogata, 1997; Hendrickson, 1999). In MAD, the wavelength-dependent anomalous scattering properties from heavy atoms that are part of or bound to the macromolecule of interest are utilized. The heavy atoms can be directly incorporated into the protein (e.g. seleno-methionine derivatives or metal-containing proteins) or they can be soaked into the crystal. MAD experiments are carried out at different X-ray energies around an absorption edge of the heavy atom where the anomalous scattering factors of the heavy atom are significantly different from each other. Up to four diffraction data sets are collected, at the peak wavelength where Δf′′ reaches its maximum, at the inflection point wavelength where Δf′ reaches its minimum, and away from the absorption edge at wavelengths at least 100 eV remote from the absorption edge on the high-energy and low-energy sides.
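Because absorption edges are usually tabulated in energy units whereas data collection wavelengths are quoted in Å, a small conversion helper can be useful when planning a MAD experiment. The sketch below uses λ [Å] = 12398.42 / E [eV] and places the two remote energies 100 eV on either side of the edge, as described above. The bromine K-edge energy used here is a nominal tabulated value and an assumption of this example; in practice the peak and inflection point energies are taken from the fluorescence scan of the actual crystal.

```python
HC_EV_ANGSTROM = 12398.42  # h*c expressed in eV * Angstrom

def energy_to_wavelength(energy_ev):
    """Convert a photon energy in eV to a wavelength in Angstrom."""
    return HC_EV_ANGSTROM / energy_ev

def remote_energies(edge_ev, offset_ev=100.0):
    """Return (low-energy remote, high-energy remote) around an absorption edge."""
    return edge_ev - offset_ev, edge_ev + offset_ev

BR_K_EDGE_EV = 13474.0  # nominal Br K-edge energy (assumption, not from the scan)

low_remote, high_remote = remote_energies(BR_K_EDGE_EV)
for label, energy in (("low-energy remote", low_remote),
                      ("edge (nominal)", BR_K_EDGE_EV),
                      ("high-energy remote", high_remote)):
    print(f"{label:>18}: {energy:8.1f} eV = {energy_to_wavelength(energy):.4f} A")
```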
2.3. Molecular replacement of lysozyme
Lysozyme is a 129 amino acid enzyme that dissolves bacterial cell walls by catalysing the hydrolysis of 1,4-β linkages between N-acetylmuramic acid and N-acetyl-D-glucosamine residues in the peptidoglycan layer and between N-acetyl-D-glucosamine residues in chitodextrins. It is abundant in a number of secreted fluids, such as tears, saliva and mucus. Lysozyme is also present in cytoplasmic granules of the polymorphonuclear neutrophils (Voet et al., 2006). In addition, large amounts of lysozyme can be found in egg whites. The crystal structure of HEWL based on crystals belonging to the tetragonal space group P43212 was the first enzyme structure published (Blake et al., 1965). Over the years, HEWL has been crystallized in many crystal forms (for an overview, see Brinkmann et al., 2006) and has become a standard substance for methods development and teaching purposes.
In this experiment, the structure of monoclinic HEWL (Fig. 2c) is determined by molecular replacement (MR) using the structure of tetragonal HEWL as a search model. MR is a method to determine a structure in cases where a similar structure is already known. If the similar structure can be correctly oriented and positioned in the unit cell of the structure to be solved, it can be used as a starting point for phase calculation and refinement. Currently, about two-thirds of all new structures deposited with the PDB (Berman et al., 2000) are solved using MR (Long et al., 2008).
Recently, it has been observed, by examining longer-wavelength diffraction data sets collected from 23 different protein crystals, that most macromolecular crystal structures contain some sort of solvent ions (Mueller-Dieckmann et al., 2007) more or less tightly bound to their surface. This is in stark contrast to the findings in the PDB (Berman et al., 2000), where in most cases only water molecules were modelled to be bound to the surface of the crystallized macromolecules. This suggests that in many structures surface-bound ions might have mistakenly been assigned as water molecules. One way to remedy the situation is to complement every structure determination with a longer-wavelength data set, in which the presence of peaks in the anomalous difference Fourier electron density map is clearly able to distinguish water molecules from ions such as sulfate, phosphate, chloride or potassium. In this experiment, good quality anomalous differences are collected from a tetragonal crystal of HEWL at a wavelength close to that of copper Kα radiation (λ = 1.5418 Å), in order to identify the complete anomalously scattering substructure of the protein (Fig. 2d).
Knowledge of the three-dimensional structure of a drug target protein is the first step in crystallographic structure-based drug design. The architecture of the active site of the target molecule can serve as the basis for the design of potential molecules, which can bind to the active site and may inhibit the enzymatic reaction of the target enzyme. The use of X-ray crystallography nowadays allows the fast and high-throughput screening of a large number of chemically different ligands, which may become potential lead compounds. For this purpose, advantage can be taken of the recent advances towards automation and high throughput in crystallization and data collection. Data collected from crystals of a particular protein soaked with many different ligands can be quickly evaluated using the direct refinement approach, i.e. by skipping the molecular replacement step. This method is also more generally applicable and allows the identification of inhibitors, substrates or products bound to a macromolecule, for example in order to evaluate reaction mechanisms. In this experiment, tetragonal crystals of HEWL grown from HEPES buffer in the presence of 2-methyl-2,4-pentanediol (MPD) are examined in order to identify a weakly bound HEPES molecule at the active site of the enzyme (Fig. 2e).
All crystallization experiments were carried out using the hanging-drop method at room temperature (293 K) in the screw-cap crystallization plates (EasyXtal Tool) available from Nextal (now Qiagen).
Bovine insulin (Mr = 5.73 kDa, Sigma catalogue No. I5500) crystals were prepared by mixing 4 µl of protein solution (20 mg ml−1 insulin in 20 mM Na2HPO4 and 10 mM Na3EDTA pH 10.0–10.6) and 4 µl of reservoir solution containing 225–350 mM Na2HPO4/Na3PO4 (pH 10.0–10.6) and 10 mM Na3EDTA and equilibrating the drop against the reservoir solution. Crystals belonging to the cubic space group I213 with the unit-cell parameter a = 78.0 Å grew within a few days up to a final size of 100–300 µm (Fig. 3a). All chemicals used for the experiment were from Sigma, unless specified otherwise.
Thaumatin (Mr = 22.2 kDa, Sigma–Aldrich catalogue No. T-7638) crystals (Fig. 3b) grew within a few days after mixing 2 µl of protein solution [15 mg ml−1 in 0.1 M N-(2-acetamido)iminodiacetic acid (ADA) pH 6.5] and 2 µl of reservoir solution (0.1 M ADA pH 6.5, 1 M sodium/potassium tartrate) and equilibrating the drop against the reservoir. The tetragonal crystals (space group P41212) exhibit unit-cell parameters a = 57.7, c = 150.2 Å and diffract X-rays to beyond 1.5 Å resolution (Mueller-Dieckmann et al., 2005). All chemicals used for the experiment were from Sigma, unless specified otherwise.
Monoclinic HEWL (Mr = 14.6 kDa, Fluka catalogue No. 62970) crystals (Fig. 3c) were prepared according to a recipe described by Saraswathi et al. (2002) by mixing 12 µl of protein solution (20 mg ml−1 lysozyme in 50 mM sodium acetate pH 4.5) and 12 µl of reservoir solution containing 50 mM sodium acetate pH 4.5 and 4%(w/v) sodium nitrate and equilibrating the drop against the reservoir. Crystals belonging to the monoclinic space group P21 with unit-cell parameters a = 27.5, b = 62.4, c = 59.6 Å, β = 90.5° grew within a few days. All chemicals used for the experiment were from Sigma, unless specified otherwise.
Tetragonal crystals of HEWL were grown as described by Weiss et al. (2000) by mixing 4 µl of protein solution (30 mg ml−1 in water) and 4 µl of reservoir solution containing 50 mM sodium acetate pH 4.6 and 5%(w/v) sodium chloride and equilibrating the drop against the reservoir. Crystals belonging to space group P43212 and exhibiting the usual unit-cell parameters of a = 78.8, c = 37.2 Å appeared within a few days (Fig. 3d). All chemicals used for the experiment were from Sigma, unless specified otherwise.
Another tetragonal crystal form of HEWL was grown by mixing 4 µl of protein solution (30 mg ml−1 in water) and 4 µl of reservoir solution containing 50–100 mM HEPES pH 7.2 and 65–70%(v/v) MPD and equilibrating the drop against the reservoir. The crystals (space group P43212) appeared within a few days and exhibited the unit-cell parameters a = 78.7, c = 37.1 Å. All chemicals used for the experiment were from Sigma, unless specified otherwise.
Prior to the diffraction experiment the crystals were mounted in nylon loops, cryo-protected if necessary and shock-cooled in a cold nitrogen gas stream directly on the beamline. Insulin crystals were cryo-protected in a solution containing 250 mM Na2HPO4/Na3PO4 pH 10.2, 10 mM Na3EDTA and 30%(v/v) glycerol (Sigma–Aldrich). Thaumatin crystals were simultaneously derivatized and cryo-protected by soaking them for a few seconds in a solution containing 1.0 M NaBr and 25%(v/v) glycerol. Monoclinic and tetragonal HEWL crystals (NaCl form) were cryo-protected using dry paraffin oil (Fluka catalogue No. 76235), and tetragonal HEWL crystals (MPD form) were used directly without further cryo-protection. The diffraction data from the five crystals were collected on the beamlines X12 (EMBL Hamburg, c/o DESY, Hamburg, Germany), BL14.1 and BL14.2 (Freie Universität Berlin and BESSY, Berlin-Adlershof, Germany). Beamlines X12 and BL14.1 are equipped with a MAR-Mosaic CCD detector (225 mm) mounted on a MARdtb goniometer system, and beamline BL14.2 is equipped with a smaller MARCCD (165 mm), which is also mounted on a MARdtb.
The wavelengths for data collection were chosen on the basis of the requirements of the project. For the S-SAD experiment on insulin, data were collected at a slightly longer wavelength of λ = 1.77 Å in order to enhance the anomalous signal of the S atoms for the subsequent structure determination steps. The same rationale was applied to the experiment on the identification of a weakly bound ligand in tetragonal lysozyme (λ = 1.70 Å). The weak but significant anomalous signal from the S atom of HEPES helps to identify the bound ligand with confidence. The four wavelengths at which the thaumatin MAD data were collected were determined on the basis of an X-ray fluorescence scan carried out at the bromine K-absorption edge. The scan (see example picture in the accompanying tutorial) can be evaluated manually or automatically using the program CHOOCH (Evans & Pettifer, 2001). The wavelengths chosen for data collection were those at the peak of the bromine absorption edge where Δf′′ is maximal, at the inflection point where Δf′ is minimal, and away from the absorption edge at wavelengths on the high-energy and low-energy sides of the absorption edge. Even though most MAD data collections nowadays consist of three or fewer wavelengths, we decided to include all four wavelengths in the experiment. It is mostly the data set collected at the low-energy side of the edge that is omitted, because it provides close to no additional signal for phasing. Nevertheless, it is instructive to see that the low-energy remote data set is often the one of the highest quality, because the lack of X-ray fluorescence results in a much reduced background in the diffraction images. The wavelengths for the two remaining experiments were chosen arbitrarily. 
With λ = 1.00 Å (for the molecular replacement experiment on monoclinic lysozyme) and λ = 1.54 Å (for the experiment on the identification of bound solvent ions), they merely reflect a typical synchrotron and a typical home source diffraction experiment, respectively.
All data sets were indexed, integrated and scaled using the program XDS (Kabsch, 1993). For teaching purposes, the detailed protocol employed for data processing is described in the following, although much of this information can be gathered from the XDS program manual. XDS needs only one input file called XDS.INP. XDS is able to recognize compressed images, but in the file XDS.INP the image name given must not include the compression-format extension. Furthermore, XDS has a very limited string length (80 characters) to describe the path to the images. Before running XDS, the file XDS.INP has to be edited so that it contains the correct data collection parameters. To let the program automatically determine the space group and the cell parameters, the space-group number in XDS.INP should be set to 0 and the parameter JOB to XYCORR INIT COLSPOT IDXREF. The results of the indexing procedure and the relevant parameters can then be found in the output file IDXREF.LP. For insulin, for instance, the correct space group is I213 (space-group number 199) with the unit-cell parameter a = 78.0 Å. After the determination of the space group and the cell parameters, all images are integrated in a second XDS run with the space-group number set to the correct number and the parameter JOB to DEFPIX XPLAN INTEGRATE CORRECT. If the cell parameters and the space group are known before the first run, one can run XDS with JOB = ALL, which makes a second run unnecessary. After integration, scaling and various corrections, the output file CORRECT.LP contains the statistics for the complete data set. A file named XDS_ASCII.HKL is written out, which contains all the integrated and scaled reflections. The program XSCALE carries out the final scaling and merging steps and writes out a *.ahkl file, which can be converted with XDSCONV for use within the CCP4 suite (Collaborative Computational Project, Number 4, 1994) or other programs. 
XDSCONV creates an input file F2MTZ.INP for the final conversion to the binary mtz format used by CCP4. Alternatively, the XDS_ASCII.HKL file can be converted using the program COMBAT (Collaborative Computational Project, Number 4, 1994) and the resulting mtz file can be used as input file for the CCP4 program SCALA (Collaborative Computational Project, Number 4, 1994) for scaling and merging. A detailed description of the data processing including the relevant lines of the input files is provided in the tutorials, which can be downloaded from http://www.embl-hamburg.de/Xray_Tutorial and http://www.mx.bessy.de/bessy-ws/bessy.html . The data processing statistics obtained with XDS for the various data sets are given in Table 2.
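The two-pass XDS protocol described above can be sketched as a small helper that writes the few keyword lines that change between the indexing run and the integration run. This is only an illustration: the function names, the list of compression suffixes and the image template are hypothetical, the keyword spellings (JOB=, NAME_TEMPLATE_OF_DATA_FRAMES=, SPACE_GROUP_NUMBER=, UNIT_CELL_CONSTANTS=) are those we understand XDS.INP to use and should be checked against the XDS manual, and a real XDS.INP needs many more parameters (detector, beam centre, oscillation range and so on).

```python
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".Z")  # assumed list, for illustration

def clean_template(template, max_len=80):
    """Drop a compression suffix and enforce the path-length limit quoted in the text."""
    for suffix in COMPRESSION_SUFFIXES:
        if template.endswith(suffix):
            template = template[: -len(suffix)]
            break
    if len(template) > max_len:
        raise ValueError(f"image path longer than {max_len} characters: {template}")
    return template

def xds_keywords(template, space_group=0, cell=None, first_pass=True):
    """Return the keyword lines that differ between the two XDS runs."""
    jobs = "XYCORR INIT COLSPOT IDXREF" if first_pass else "DEFPIX XPLAN INTEGRATE CORRECT"
    lines = [
        f"JOB= {jobs}",
        f"NAME_TEMPLATE_OF_DATA_FRAMES= {clean_template(template)}",
        f"SPACE_GROUP_NUMBER= {space_group}",  # 0 lets XDS determine the lattice
    ]
    if cell is not None:
        lines.append("UNIT_CELL_CONSTANTS= " + " ".join(f"{x:.1f}" for x in cell))
    return lines

# Hypothetical image template; '?' marks the digits of the frame number.
template = "../images/insulin_1_???.img.bz2"
indexing_run = xds_keywords(template)
integration_run = xds_keywords(template, space_group=199,
                               cell=(78.0, 78.0, 78.0, 90.0, 90.0, 90.0),
                               first_pass=False)
print("\n".join(indexing_run))
print("\n".join(integration_run))
```

The insulin values (space-group number 199, a = 78.0 Å) are those quoted above; for the other data sets the values found in IDXREF.LP would be used instead.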
Because the focus of the workshop for which this tutorial was originally designed was on crystal mounting, data collection and processing, the actual structure determination was carried out automatically mainly to validate the success of the diffraction experiment. For this purpose, the respective protocols of the EMBL-Hamburg automated crystal structure determination platform Auto-Rickshaw (Panjikar et al., 2005) were utilized. Auto-Rickshaw can be accessed from outside EMBL under http://www.embl-hamburg.de/AutoRickshaw/ (a free of charge registration may be required; please follow the instructions on the web page). Of course, the tutorial may also be used with the emphasis placed on teaching the process of structure determination. In such cases, the subsequent steps of a structure determination can be performed the conventional manual way and the numbers provided here may simply serve as a rough guide.
For the first two experiments (S-SAD on insulin and Br-MAD on thaumatin) phases need to be derived from the measured anomalous differences. This is typically carried out as a sequence of steps. The first step is the calculation of ΔF values in the case of SAD or FA values in the case of MAD. This step can be performed, for instance, with the program SHELXC (Sheldrick et al., 2001; Sheldrick, 2008). SHELXC will also write an input file for the subsequent program SHELXD (Schneider & Sheldrick, 2002), which is often used to determine the anomalously scattering substructure. SHELXD uses a dual-space approach for that purpose. Since there is a random element in starting a job, it can be run many times (a typical number of cycles is 100, but it is often worthwhile to run up to 10 000 cycles). A correct solution can then be identified by looking at the two correlation coefficients CC(All) and CC(weak). Values of about 30.0 and 15.0 or higher, respectively, probably indicate a correct solution, although it may be the case that correct solutions appear with lower values. The next step is the actual phase calculation and the improvement of the phases by density modification. This can be achieved by using the program SHELXE (Sheldrick, 2002), or programs such as MLPHARE and DM (Collaborative Computational Project, Number 4, 1994) or others. At this point, the correct hand of the substructure also has to be established. Once the phases have been determined an electron density map can be calculated, for instance inside the program COOT (Emsley & Cowtan, 2004), and displayed and examined for parts that can be interpreted in terms of an amino acid chain or a secondary structure element. With high enough resolution at hand, however, it is also possible to attempt a completely automatic density interpretation and model building using the program ARP/wARP 7.0 (Perrakis et al., 1999; Morris et al., 2002).
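For students it may help to see what the ΔF values of the first step are in the simplest possible terms. The sketch below takes amplitudes measured for Friedel pairs, forms ΔF = |F(+)| − |F(−)| for each reflection and reports the ratio ⟨|ΔF|⟩/⟨F⟩ as a crude measure of the anomalous signal. It is not a description of what SHELXC does internally; the function names and the two reflections are invented for illustration.

```python
def anomalous_differences(friedel_pairs):
    """Return dF = |F(+)| - |F(-)| for every reflection measured as a Friedel pair."""
    return {hkl: f_plus - f_minus
            for hkl, (f_plus, f_minus) in friedel_pairs.items()}

def anomalous_signal(friedel_pairs):
    """Return <|dF|> / <F>, with F taken as the mean of |F(+)| and |F(-)|."""
    delta_f = anomalous_differences(friedel_pairs)
    mean_abs_df = sum(abs(d) for d in delta_f.values()) / len(delta_f)
    mean_f = sum((f_plus + f_minus) / 2.0
                 for f_plus, f_minus in friedel_pairs.values()) / len(friedel_pairs)
    return mean_abs_df / mean_f

# Invented amplitudes, keyed by Miller indices (h, k, l): (|F(+)|, |F(-)|)
pairs = {
    (1, 0, 0): (102.0, 98.0),
    (0, 1, 0): (199.0, 201.0),
}
print(anomalous_differences(pairs))
print(f"<|dF|>/<F> = {anomalous_signal(pairs):.3f}")
```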
In experiment 3 (MR on monoclinic lysozyme), structure solution will be performed by molecular replacement. In order to make the experiment realistic the search model should ideally be derived from a different crystal form of HEWL. A good model would, for instance, be the high-resolution structure of tetragonal HEWL (PDB code 193l; Vaney et al., 1996). The model should be stripped of all water molecules and used as input to the program MOLREP (Collaborative Computational Project, Number 4, 1994; Vagin & Teplyakov, 1997) or any other molecular replacement program. Since two molecules are expected in the asymmetric unit, the program will try to orient and position the first molecule, fix this one, and then try to orient and position the second. Once this is achieved the model can be subjected to refinement using CNS (Brünger et al., 1998) or REFMAC5 (Murshudov et al., 1997).
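The expectation of two molecules in the asymmetric unit can be checked with a packing-density estimate of the kind commonly used for this purpose, the Matthews coefficient V_M = V_cell / (Z · n · Mr), with the solvent fraction approximated as 1 − 1.23/V_M. The following minimal sketch uses the unit-cell parameters and Mr quoted for monoclinic HEWL in this tutorial; the function names are our own, and Z = 2 symmetry operations for space group P21 is the only value not taken from the text.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit-cell volume (A^3) for a monoclinic cell with unique axis b."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews_coefficient(cell_volume, n_symmetry_ops, n_molecules, mr_dalton):
    """Matthews coefficient V_M in A^3 per Da."""
    return cell_volume / (n_symmetry_ops * n_molecules * mr_dalton)

def solvent_fraction(v_m):
    """Approximate solvent fraction from V_M (standard 1.23 A^3/Da protein term)."""
    return 1.0 - 1.23 / v_m

volume = monoclinic_cell_volume(27.5, 62.4, 59.6, 90.5)
for n in (1, 2):
    v_m = matthews_coefficient(volume, n_symmetry_ops=2, n_molecules=n, mr_dalton=14600.0)
    print(f"{n} molecule(s): V_M = {v_m:.2f} A^3/Da, "
          f"solvent = {100 * solvent_fraction(v_m):.0f}%")
```

Two molecules correspond to a tightly packed crystal with roughly 30% solvent, whereas a single molecule would imply about 65% solvent. The estimate alone does not prove which is correct, but it indicates that both possibilities should be considered in the molecular replacement search.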
For experiments 4 and 5 (bound solvent ions and ligand identification in HEWL), the structure can be determined by just taking another HEWL structure of the same crystal form from the PDB (Berman et al., 2000). Again, a good model would be the PDB entry 193l (Vaney et al., 1996) or the PDB entry 1dpw (Weiss et al., 2000). Since the packing of the molecules is the same and the cell dimensions are very similar, no molecular replacement needs to be performed and one can proceed directly to refinement. It is advisable, however, to first carry out rigid-body refinement at lower resolution, for instance 3.0 Å, and then the normal restrained refinement to the maximum resolution of the data collected. The phases derived from the refined models will then be used to calculate the (2Fobs–Fcalc, αcalc), the (Fobs–Fcalc, αcalc) and the (ΔF, αcalc − 90°) electron density maps, which can be displayed in COOT in order to identify the complete anomalously scattering substructure (experiment 4) or the weakly bound ligand (experiment 5). Alternatively, one may try to use the anomalous differences to attempt experimental phasing for structure determination as described for experiment 1 above.
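The (ΔF, αcalc − 90°) map mentioned above is calculated from one complex Fourier coefficient per reflection: the anomalous difference as amplitude and the model phase retarded by 90° as phase. The sketch below only shows how such a coefficient is assembled; the function name and the numerical values are illustrative assumptions, and the Fourier summation itself is left to the standard map-calculation programs.

```python
import cmath
import math

def anomalous_map_coefficient(f_plus, f_minus, alpha_calc_deg):
    """Return the complex coefficient dF * exp[i(alpha_calc - 90 deg)]."""
    delta_f = f_plus - f_minus                     # anomalous difference
    phase = math.radians(alpha_calc_deg - 90.0)    # model phase shifted by -90 deg
    return delta_f * cmath.exp(1j * phase)

# Invented example: dF = 10 with a calculated phase of 90 degrees
coefficient = anomalous_map_coefficient(110.0, 100.0, 90.0)
print(abs(coefficient), math.degrees(cmath.phase(coefficient)))
```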
The structure was solved using the SAD protocol of Auto-Rickshaw run in the advanced version. The input diffraction data (file XDS_ASCII.HKL) were uploaded and then prepared and converted for use in Auto-Rickshaw using programs of the CCP4 suite. ΔF values were calculated using the program SHELXC. On the basis of an initial analysis of the data, the maximum resolution for substructure determination and initial phase calculation was set to 1.8 Å. All of the six heavy atoms requested were found using the program SHELXD with correlation coefficients CC(All) and CC(weak) of 53.3 and 32.2, respectively, and with a clear drop in occupancy after site No. 6. This indicates that the correct solution was most likely found. The correct hand for the substructure was determined using the programs ABS (Hao, 2004) and SHELXE. Initial phases were calculated after density modification using the program SHELXE and extended to 1.60 Å resolution. 90% of the model was built using the program ARP/wARP 7.0. The complete Auto-Rickshaw run in the advanced version took around 20 min. The model was then further modified using COOT and refined using REFMAC5. Panels (a) and (b) of Fig. 4 show snapshots of the final model superimposed with the anomalous difference electron density map and the experimental electron density map after density modification in SHELXE.
The structure was solved using the 3W-MAD protocol of Auto-Rickshaw using the peak, inflection point and high-energy remote-wavelength data sets and the amino acid sequence of thaumatin. The input diffraction data (file XDS_ASCII.HKL) were uploaded and then prepared and converted using programs of the CCP4 suite. FA values were calculated using the program SHELXC. On the basis of an initial analysis of the data, the maximum resolution for substructure determination and initial phase calculation was set to 2.4 Å. All of the 20 heavy atoms requested were found using the program SHELXD with correlation coefficients CC(All) and CC(weak) of 30.9 and 13.5, respectively. The correct hand for the substructure was determined using the programs ABS and SHELXE. Initial phases were calculated after density modification using the program SHELXE. 99% of the model was built using the program ARP/wARP. The model was then further modified using COOT and refined using REFMAC5. Figs. 5(a) and 5(b) show the experimental map after density modification followed by a model building step in ARP/wARP 7.0 and the model-phased electron density map after the final refinement.
The structure was solved using the MR protocol of Auto-Rickshaw with tetragonal HEWL (PDB code 193l; Vaney et al., 1996) as a starting model. The input diffraction data (file XDS_ASCII.HKL) were uploaded and then prepared and converted using programs of the CCP4 suite. The molecular replacement step was performed using MOLREP with a resolution cut-off of 4.0 Å to find the two molecules in the asymmetric unit. Despite a very high initial R factor of 73% (correlation coefficient 43%), the solution was correct, as was demonstrated by subsequent refinement. This was performed to a resolution of 3.0 Å using the program CNS in four consecutive steps: rigid-body refinement, a minimization step, B-factor refinement and a second minimization step. At this point the R and Rfree values were 24.9 and 33.5%, respectively. Further refinement was carried out in REFMAC5 using all available data to R and Rfree values of 28.3 and 31.5%. The model was completed and further modified using COOT and refined using REFMAC5. Fig. 6 shows the final electron density with some nitrate ions clearly visible.
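Since R and Rfree values are quoted throughout this section, a reminder of their definition may be useful for teaching: R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, where R is summed over the reflections used in refinement and Rfree over the test-set reflections set aside from it. The sketch below is a minimal illustration with invented amplitudes; the function names are our own, and real programs additionally apply an overall scale factor and other corrections.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) for paired amplitude lists."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)

def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by free_flags (True = test set) and return (R, Rfree)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    return r_factor(*zip(*work)), r_factor(*zip(*test))

# Invented amplitudes; the last reflection belongs to the test set
f_obs = [100.0, 200.0, 50.0]
f_calc = [90.0, 220.0, 60.0]
r, r_free = r_work_and_free(f_obs, f_calc, [False, False, True])
print(f"R = {100 * r:.1f}%, Rfree = {100 * r_free:.1f}%")
```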
The structure was solved using the MR protocol of Auto-Rickshaw with tetragonal HEWL (PDB code 193l; Vaney et al., 1996) as a starting model. The input diffraction data (file XDS_ASCII.HKL) were uploaded and then prepared and converted using programs of the CCP4 suite. Because the cell parameters of the model structure and the target structure were very similar, Auto-Rickshaw skipped the MOLREP step and proceeded directly to refinement. The refinement was then performed to a resolution cut-off of 3.0 Å in CNS in four consecutive steps: rigid-body refinement, a minimization step, B-factor refinement and a second minimization step. At this point the R and Rfree values were 22.6 and 32.4%, respectively. Further refinement was then carried out in REFMAC5 using all available data to R and Rfree values of 25.5 and 28.2%. The model was completed using COOT and further refined using REFMAC5. An anomalous difference Fourier electron density map using the final model phases was calculated to identify the bound Cl and Na atoms. Table 3 contains the highest peaks found in the anomalous difference Fourier electron density map (threshold 4.5σ above the mean value of the map). Nine chloride ions can be identified, in addition to the intrinsically present ten S atoms of HEWL. Fig. 7 shows the refined structure of HEWL overlaid with the anomalous map indicating the bound chloride ions.
The structure was solved using the MR protocol of Auto-Rickshaw with tetragonal HEWL (PDB code 1dpw; Weiss et al., 2000) as a starting model. The input diffraction data (XDS_ASCII.HKL) were uploaded and then prepared and converted using programs of the CCP4 suite. Because the cell parameters of the model and the target structure were very similar, Auto-Rickshaw skipped the MOLREP step and proceeded directly to refinement. The refinement was performed to a resolution cut-off of 3.0 Å in CNS in four consecutive steps: rigid-body refinement, a minimization step, B-factor refinement and a second minimization step. At this point the R and Rfree values were 22.2 and 32.8%, respectively. Further refinement was carried out in REFMAC5 using all available data to R and Rfree values of 27.4 and 31.5%. The model was completed using COOT and further refined using REFMAC5. Residual electron density was found for a bound HEPES molecule (Fig. 8). To verify the presence and the position of the ligand, an anomalous difference Fourier electron density map was calculated to identify the S atom (Fig. 8). Fig. 8 also shows the final electron density of HEPES after refinement.
The five experiments have been designed such that they can easily be carried out by beginners in the field, but they are also challenging in some respects. The three proteins used for crystallization are all available commercially, and the crystallization experiments themselves usually work reliably and reproducibly.
The diffraction data collection experiments can be carried out on any X-ray source, be it a synchrotron beamline or a home source. Except for the bromide-MAD experiment, all experiments should work just as well with data collected from a home source at the Cu Kα wavelength. As a matter of fact, the wavelength for the identification of ions bound to the surface of HEWL (experiment 4) was chosen so as to mimic data collection at a typical home source. The resolution for data collection is not very critical. At a synchrotron beamline, all crystals typically diffract much further than 1.8 Å. It is advisable, however, to choose a maximum resolution of at least 2.0 Å for the diffraction experiment, since it makes the following structure determination steps easier. The diffraction data provided with the tutorial were collected as part of a workshop on Diffraction Data Collection Using Synchrotron Radiation (http://www.embl-hamburg.de/workshops/2007/diffraction/ ). The crystals were mounted by the students of the workshop and the data were collected by them, albeit under the guidance and supervision of the tutors. The intention was not to collect perfect data but realistic data sets, which do pose some challenges for data processing and structure determination.
Data processing is mostly straightforward for all the presented cases. The only somewhat challenging project in this respect is the monoclinic HEWL data set, where the β angle is close to 90° and the b and c axes are close in length, so that mis-indexing is easily possible. During the workshop, the only data processing program used was XDS, but any other popular data processing program suite will work as well. It might even be an interesting teaching experiment to have different groups of students use different programs for processing the same data set and have them compare the results, both in terms of the resulting merging statistics and in terms of whether the structure can be determined or not. As a rough guide to what kind of merging statistics can be expected, those obtained from XDS are provided in Table 2.
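When groups of students compare the output of different processing programs, it is useful to compute the merging statistics in one common way. The sketch below evaluates the conventional Rmerge and the redundancy-independent merging R factor (Diederichs & Karplus, 1997; Weiss, 2001) from a table of symmetry-equivalent intensities. The input layout (a dictionary keyed by unique reflection indices) and the example intensities are assumptions made for illustration; they do not correspond to the output format of XDS or of any other program.

```python
from math import sqrt


def merging_statistics(observations):
    """Return (R_merge, R_meas) for {unique hkl: [equivalent intensities]}.

    Reflections measured only once carry no information about internal
    consistency and are skipped in numerator and denominator alike.
    """
    sum_dev = 0.0       # sum over hkl of sum_i |I_i - <I>|
    sum_dev_meas = 0.0  # same, each hkl weighted by sqrt(n / (n - 1))
    sum_int = 0.0       # sum over hkl of sum_i I_i
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean_i = sum(intensities) / n
        dev = sum(abs(i - mean_i) for i in intensities)
        sum_dev += dev
        sum_dev_meas += sqrt(n / (n - 1)) * dev
        sum_int += sum(intensities)
    if sum_int == 0.0:
        raise ValueError("no multiply measured reflections")
    return sum_dev / sum_int, sum_dev_meas / sum_int


# Hypothetical intensities for three unique reflections.
example = {
    (1, 0, 0): [100.0, 110.0, 90.0],
    (0, 1, 0): [50.0, 50.0],
    (0, 0, 1): [70.0],  # measured once, ignored
}
r_merge, r_meas = merging_statistics(example)
print(f"R_merge = {r_merge:.3f}, R_meas = {r_meas:.3f}")
```

Because Rmerge rises with increasing redundancy while the redundancy-independent variant does not, the latter is the fairer number to compare between groups whose data sets differ in multiplicity.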
Structure determination has only been attempted automatically using the Auto-Rickshaw platform (Panjikar et al., 2005), because the focus of the workshop was on crystal mounting, data collection and processing. Needless to say, for teaching purposes the processed data provided can also be used as input to any appropriate set of macromolecular crystallography programs in order to attempt the structure determination in the conventional way step by step.
In the following, some ideas are presented for carrying out the structure determinations and possibly modifying them for teaching purposes.
S-SAD has become increasingly popular in the past decade. Nevertheless, the very small signal that can be obtained from the anomalous scattering of light atoms still places such an experiment at the forefront of what is currently possible with good crystals and good data collection equipment. Among the successful S-SAD cases, cubic insulin is certainly one of the easier examples. Provided that the data are properly processed, the sulfur substructure can be determined straightforwardly. Because of the relatively high resolution of the data (1.6 Å), an interpretable map should be achieved quickly. To make the exercise more demanding with respect to data processing, the redundancy of the data set can be decreased by taking only 90, 120 or 150° instead of the full 180° of data. In doing so, structure determination experiments of increasing difficulty can be created.
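The reduced wedges can be set up by restricting the range of images passed to the processing program. The following sketch converts a rotation wedge into a first and last frame number and prints it in the form of the XDS keyword DATA_RANGE. The oscillation width per image used here (1.0°) is a hypothetical value; the actual value has to be taken from the headers of the images supplied with the tutorial.

```python
def data_range_for_wedge(wedge_deg, osc_deg, first_frame=1):
    """Return (first, last) frame numbers covering wedge_deg of rotation."""
    n_frames = round(wedge_deg / osc_deg)
    if n_frames < 1:
        raise ValueError("wedge is smaller than one oscillation image")
    return first_frame, first_frame + n_frames - 1


OSCILLATION_DEG = 1.0  # hypothetical oscillation width per image

for wedge in (90, 120, 150, 180):
    first, last = data_range_for_wedge(wedge, OSCILLATION_DEG)
    print(f"{wedge:3d} deg: DATA_RANGE= {first} {last}")
```

Starting the wedge at a different first frame gives further variants of the same exercise.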
Quick-soaking of protein crystals using highly concentrated halide solutions is a fast and easy way to produce heavy-atom derivatives for phase determination (Dauter & Dauter, 1999). The thaumatin crystal used for data collection was soaked for a few seconds only, in order to limit the bromide penetration into the crystal. As a consequence, structure solution is successful only when two, three or four wavelengths of the four-wavelength MAD data set are used. Again, various combinations of wavelengths may be tried to make up experiments of increasing difficulty. For more experienced students, it may even be possible to solve the structure based on a single data set (peak or inflection point).
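The possible wavelength combinations can be enumerated systematically, for instance to distribute them among groups of students. The sketch below lists all subsets of a four-wavelength data set, ordered from four wavelengths down to one, on the assumption that fewer wavelengths make the phasing harder. The labels used for the two remote wavelengths are hypothetical and should be replaced by the names of the data sets actually supplied.

```python
from itertools import combinations

# "peak" and "inflection" are named in the text; the remote labels are assumed.
WAVELENGTHS = ("peak", "inflection", "high-energy remote", "low-energy remote")


def phasing_trials(wavelengths):
    """All non-empty wavelength subsets, largest (presumably easiest) first."""
    trials = []
    for size in range(len(wavelengths), 0, -1):
        trials.extend(combinations(wavelengths, size))
    return trials


for number, trial in enumerate(phasing_trials(WAVELENGTHS), start=1):
    print(f"{number:2d}. {len(trial)} wavelength(s): {', '.join(trial)}")
```

For four wavelengths this gives 15 trials: one four-wavelength, four three-wavelength, six two-wavelength and four single-wavelength experiments.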
When crystallizing monoclinic HEWL, one has to be aware of the fact that sometimes triclinic HEWL crystals also appear in the same experiment. These cannot be distinguished from the monoclinic crystals visually, but only when they are mounted for data collection and indexed. The major difficulty in this project stems from the fact that the β angle of the crystal is close to 90° and that the b and c axes are almost equal in length. This makes the indexing of the diffraction images difficult. If indexing fails, different wedges of the data set may be tried. Structure solution by molecular replacement is straightforward, although high R values are obtained initially. The correctness of the solution can, however, easily be assessed by refinement. During the refinement process many triangular features will show up in the electron density maps, indicating surface-bound nitrate ions.
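A simple check on the cell parameters reported by the indexing step can alert students to this situation. The sketch below flags a monoclinic cell whose β angle is close to 90° or whose b and c axes are nearly equal. The tolerances and the example cell are hypothetical values chosen for illustration; they are not the refined parameters of the monoclinic HEWL crystal.

```python
def pseudo_symmetry_warnings(a, b, c, beta, angle_tol=1.0, length_tol=0.05):
    """Return warnings for a monoclinic cell (lengths in A, beta in degrees).

    angle_tol is in degrees; length_tol is a relative difference. Both are
    arbitrary teaching defaults, not values used by any indexing program.
    """
    warnings = []
    if abs(beta - 90.0) <= angle_tol:
        warnings.append("beta is close to 90 deg: the lattice may appear orthorhombic")
    if abs(b - c) / max(b, c) <= length_tol:
        warnings.append("b and c are nearly equal: the two axes may be interchanged")
    return warnings


# Hypothetical cell resembling the situation described in the text.
for message in pseudo_symmetry_warnings(a=28.0, b=63.0, c=60.5, beta=90.8):
    print("WARNING:", message)
```

If both warnings are raised, the space-group assignment and the choice of the unique axis deserve a second look before integration.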
The NaCl form of tetragonal HEWL is probably the most widely used crystal form for teaching in macromolecular crystallography. However, one difficulty here is that the crystals tend to grow very large, which is nice from an aesthetic point of view, but which makes shock-cooling using paraffin oil as a cryoprotectant tricky. The crystals are often damaged and exhibit large diffraction spots and high mosaicity. It is therefore advisable to choose smaller crystals for data collection. Apart from that, the experiment is straightforward and poses no problems in data collection and processing. One aspect to be aware of is that the goal of the experiment is to obtain anomalous signals from light atoms. Therefore, good data quality is essential. This is achieved by collecting a full revolution (360°) around the rotation axis, which results in a highly redundant data set. Should a three- or four-circle diffractometer or kappa-geometry be available, the experiment may be modified by applying different data collection strategies. Rotating the crystal about more than one axis will result in higher data quality and therefore less redundancy will be required to achieve the same signal-to-noise ratio. If the data quality turns out to be exceptionally high, experimental phase determination may be attempted, similar to experiment 1.
This crystal form of HEWL is a little harder to grow than the NaCl form and it grows less reproducibly. Furthermore, the crystals are often grown together, as shown in Fig. 3(e). Other than that, they pose no significant difficulty. The major challenge of this experiment is to identify the ligand, which is weakly bound and probably not fully occupied. This is of course a situation that researchers are often faced with in the pharmaceutical industry, where the binding mode of lead compounds to their target protein may have to be identified although the binding may still be weak. After carefully refining the protein structure the model phases are good enough to reveal residual electron density in the (Fobs–Fcalc, αcalc) map. Additional support for the binding comes from inspecting the (2Fobs–Fcalc, αcalc) and the (ΔF, αcalc − 90°) maps as well. The occupancy of the ligand may be estimated by comparing the peak height of the protein S atoms in the anomalous difference electron density map with the peak height of the S atom of HEPES.
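The occupancy estimate described above amounts to a ratio of peak heights. The sketch below divides the anomalous difference peak height at the HEPES S atom by the mean peak height at the protein S atoms, on the assumption that the protein S atoms are fully occupied and that all S atoms have comparable B factors. The peak heights in the example are hypothetical.

```python
from statistics import mean


def estimate_occupancy(ligand_peak, protein_s_peaks):
    """Rough ligand occupancy from anomalous difference map peak heights.

    Assumes fully occupied protein S atoms and similar B factors for all
    S atoms; the result is capped at 1.0.
    """
    if not protein_s_peaks:
        raise ValueError("at least one protein S peak height is required")
    reference = mean(protein_s_peaks)
    return min(1.0, ligand_peak / reference)


# Hypothetical peak heights in units of the map r.m.s. deviation.
protein_peaks = [9.0, 10.0, 11.0]
hepes_peak = 4.5
print(f"Estimated HEPES occupancy: {estimate_occupancy(hepes_peak, protein_peaks):.2f}")
```

Since a higher B factor also lowers the peak height, the ratio tends to underestimate the occupancy of a mobile ligand and should be treated as a starting value for refinement only.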
The tutorial presented here is not the only one in existence. A number of well known courses and workshops for macromolecular crystallography have been designed around the world. Many of them are run on a regular basis. Just to name a few, there is the M2M course in Hamburg, the RapiData course in Brookhaven and the Cold Spring Harbor Crystallography School. However, all these have in common that only a limited number of students is accepted and that course material beyond lecture notes is not made publicly available. Some tutorials also exist on the Internet, for instance Bernhard Rupp's Crystallography 101 (http://www.ruppweb.org/Xray/101index.html) or Michael Sawaya's tutorial at http://www.doe-mbi.ucla.edu/people/sawaya/. A very good and useful compilation of such tutorials is given on the American Crystallographic Association web site (http://www.amercrystalassn.org/CrystalEducationSites.html). However, these tutorials are geared towards giving an overview of X-ray crystallography as a technique, rather than presenting specific examples to students for self-learning. Furthermore, they do not provide the diffraction data, let alone the raw diffraction images. Of course, diffraction data are available from the PDB (http://www.rcsb.org/pdb) and also, for instance, from the Autostruct web site (http://www.ccp4.ac.uk/autostruct/testdata/index.html) hosted by CCP4 or from the Joint Center for Structural Genomics (JCSG; http://www.jcsg.org/datasets-info.shtml). With the possible exception of the JCSG data, the majority of the data available from these sites are processed data. The few cases where raw diffraction images are available come with little or no commentary. Tutorials for more specialized applications in macromolecular crystallography or for a certain suite of computer programs are also available on the Web. To name them all would be beyond the scope of this contribution.
With the tutorial presented here, which describes (i) an S-SAD structure determination of cubic insulin, (ii) a Br-MAD structure determination of thaumatin, (iii) a structure determination by molecular replacement using monoclinic lysozyme, (iv) the identification of bound solvent ions in lysozyme using a longer-wavelength data set and (v) the identification of a weakly bound ligand at the active site of lysozyme, we have attempted to cover a broad range of macromolecular crystallography experiments. We certainly do not want to compete with any of the courses or tutorials mentioned above, but to the best of our knowledge this tutorial is unique in that it provides a complete description of a crystallographic experiment covering all aspects from crystallization, crystal mounting, data collection and processing to structure determination and interpretation. It provides not only the crystallization recipes but also the sources of all relevant materials, as well as detailed instructions to perform the experiments. In addition, example diffraction data are provided for those who do not have easy access to a diffraction facility, together with information on the various steps from data processing to structure determination. All of the computer programs used and described in the tutorial are freely available for academic users.
The entire tutorial, including the raw diffraction images for all five experiments, the integrated and scaled data, the solved and refined structures, and detailed descriptions of the data processing and structure solution, is available as supplementary information to this paper. All data can also be downloaded from the sites http://www.embl-hamburg.de/Xray_Tutorial and http://www.mx.bessy.de/bessy-ws/bessy.html and are available from the authors upon request. The tutorial has been successfully used during the workshop for which it was designed and created (see http://www.embl-hamburg.de/workshops/2007/diffraction). Furthermore, it has been used and favourably received by about 20 individual researchers and students to date, who have learned about it by word of mouth or from presentations at the 16th Annual Meeting of the German Society of Crystallography and the XXI Congress and General Assembly of the International Union of Crystallography (Faust et al., 2008).
We acknowledge the support of the DGK (German Society of Crystallography) for their financial contribution towards a workshop during which the data were collected and the beamline staff at the BESSY synchrotron in Berlin-Adlershof (Germany) for their help in organizing the workshop.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Blake, C. C. F., Koenig, D. F., Mair, G. A., North, A. C. T., Phillips, D. C. & Sarma, V. R. (1965). Nature (London), 206, 757–761.
Brinkmann, C., Weiss, M. S. & Weckert, E. (2006). Acta Cryst. D62, 349–355.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760–763.
Dauter, Z. & Dauter, M. (1999). J. Mol. Biol. 289, 93–101.
Dauter, Z., Dauter, M., de La Fortelle, E., Bricogne, G. & Sheldrick, G. M. (1999). J. Mol. Biol. 289, 83–92.
Diederichs, K. & Karplus, P. A. (1997). Nat. Struct. Biol. 4, 269–275.
Djinović Carugo, K., Helliwell, J. R., Stuhrmann, H. & Weiss, M. S. (2005). J. Synchrotron Rad. 12, 410–419.
Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126–2132.
Evans, G. & Pettifer, R. F. (2001). J. Appl. Cryst. 34, 82–86.
Faust, A., Panjikar, S., Mueller, U., Parthasarathy, V., Schmidt, A., Lamzin, V. S. & Weiss, M. S. (2008). Acta Cryst. A64, C80.
Hao, Q. (2004). J. Appl. Cryst. 37, 498–499.
Hendrickson, W. A. (1999). J. Synchrotron Rad. 6, 845–851.
Hendrickson, W. A. & Ogata, C. M. (1997). Methods Enzymol. 276, 494–522.
Hendrickson, W. A. & Teeter, M. M. (1981). Nature (London), 290, 107–113.
Jaskólski, M. (2001). J. Appl. Cryst. 34, 371–374.
Kabsch, W. (1993). J. Appl. Cryst. 26, 795–800.
Ko, T.-P., Day, J., Greenwood, A. & McPherson, A. (1994). Acta Cryst. D50, 813–825.
Long, F., Vagin, A. A., Young, P. & Murshudov, G. N. (2008). Acta Cryst. D64, 125–132.
Morris, R. J., Perrakis, A. & Lamzin, V. S. (2002). Acta Cryst. D58, 968–975.
Mueller-Dieckmann, C., Panjikar, S., Schmidt, A., Mueller, S., Kuper, J., Geerlof, A., Wilmanns, M., Singh, R. K., Tucker, P. A. & Weiss, M. S. (2007). Acta Cryst. D63, 366–380.
Mueller-Dieckmann, C., Panjikar, S., Tucker, P. A. & Weiss, M. S. (2005). Acta Cryst. D61, 1263–1272.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.
Panjikar, S., Parthasarathy, V., Lamzin, V. S., Weiss, M. S. & Tucker, P. A. (2005). Acta Cryst. D61, 449–457.
Perrakis, A., Morris, R. J. & Lamzin, V. S. (1999). Nat. Struct. Biol. 6, 458–463.
Saraswathi, N. T., Sankaranarayanan, R. & Vijayan, M. (2002). Acta Cryst. D58, 1162–1167.
Schneider, T. R. & Sheldrick, G. M. (2002). Acta Cryst. D58, 1772–1779.
Sheldrick, G. M. (2002). Z. Kristallogr. 217, 644–650.
Sheldrick, G. M. (2008). Acta Cryst. A64, 112–122.
Sheldrick, G. M., Hauptman, H. A., Weeks, C. M., Miller, R. & Uson, I. (2001). International Tables for Crystallography, Vol. F, edited by M. G. Rossmann & E. Arnold, pp. 333–351. Dordrecht: Kluwer Academic Publishers.
Vagin, A. & Teplyakov, A. (1997). J. Appl. Cryst. 30, 1022–1025.
Vaney, M. C., Maignan, S., Riès-Kautt, M. & Ducruix, A. (1996). Acta Cryst. D52, 505–517.
Voet, D., Voet, J. & Pratt, C. W. (2006). Fundamentals of Biochemistry – Life at the Molecular Level, 2nd ed. Hoboken: John Wiley and Sons Inc.
Weiss, M. S. (2001). J. Appl. Cryst. 34, 130–135.
Weiss, M. S., Palm, G. J. & Hilgenfeld, R. (2000). Acta Cryst. D56, 952–958.
Weiss, M. S., Sicker, T. & Hilgenfeld, R. (2001). Structure, 9, 771–777.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.