CCP4 study weekend
The phase problem
^{a}Centre for Biomolecular Sciences, University of St Andrews, St Andrews, Fife KY16 9ST, Scotland
^{*}Correspondence email: glt2@standrews.ac.uk
Given recent advances in phasing methods, those new to protein crystallography may be forgiven for asking `what problem?'. As many of those attending the CCP4 meeting come from a biological background, struggling with expression and crystallization, this introductory paper aims to introduce some of the basics that will hopefully make the subsequent papers penetrable. What is the `phase' in crystallography? What is `the problem'? How can we overcome the problem? The paper will emphasize that the phase values can only be discovered through some prior knowledge of the structure. The paper will canter through direct methods, molecular replacement, isomorphous replacement and anomalous scattering. As phasing is the most acronymic realm of crystallography, MR, SIR, SIRAS, MIR, MIRAS, MAD and SAD will be expanded and explained in part. Along the way, we will meet some of the heroes of protein crystallography such as Perutz, Kendrew, Crick, Rossmann and Blow who established many of the phasing methods in the UK. It is inevitable that some basic mathematics is encountered, but this will be done as gently as possible.
Keywords: phasing; the phase problem.
1. Introduction
There are many excellent comprehensive texts on phasing methods (Blundell & Johnson, 1976; Drenth, 1994; Rossmann & Arnold, 2001; Blow, 2002), so this introduction to the CCP4 Study Weekend attempts to give an overview of phasing for those new to the field. Many entering protein crystallography are from a biological background, unfamiliar with the details of Fourier summation and complex numbers. The routine incorporation of selenomethionine into proteins and the wide availability of synchrotrons mean that in many cases structure solution has become press-button. This is to be welcomed, but not all structure solutions are plain sailing and it is still useful to have some understanding of what phasing is. Here, we will emphasize the importance of phases, how phases are derived from some prior knowledge of structure and look briefly at phasing methods (direct, molecular replacement and heavy-atom isomorphous replacement). In most phasing methods the aim is to preserve isomorphism, such that the only structural change upon heavy-atom substitution is local and there are no changes in unit-cell parameters or orientation of the protein in the cell. Of course, single- and multiwavelength anomalous diffraction (SAD/MAD) experiments achieve this. Where non-isomorphism does occur, it can be used to provide phase information and we will look at an example where non-isomorphism was used to extend phases.
In the diffraction experiment (Fig. 1), we measure the intensities of waves scattered from planes (denoted by hkl) in the crystal. The amplitude of the wave, |F_{hkl}|, is proportional to the square root of the intensity measured on the detector. To calculate the electron density at a position (xyz) in the unit cell of a crystal requires us to perform the following summation over all the hkl planes, which in words we can express as: electron density at (xyz) = the sum of contributions to the point (xyz) of waves scattered from plane (hkl) whose amplitude depends on the number of electrons in the plane, added with the correct relative phase relationship or, mathematically,
ρ(xyz) = (1/V) Σ_{hkl} |F_{hkl}| exp(iα_{hkl}) exp[−2πi(hx + ky + lz)],
where V is the volume of the unit cell and α_{hkl} is the phase associated with the structure-factor amplitude |F_{hkl}|. We can measure the amplitudes, but the phases are lost in the experiment. This is the phase problem.
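A one-dimensional analogue of this Fourier summation can be sketched in a few lines of Python. This is only an illustration: the amplitudes and phases below are invented, the cell is one-dimensional, and each reflection h is paired with its Friedel mate so that the density is real. It shows that changing the phase of a single wave, with all amplitudes fixed, changes the resulting density.

```python
import math

def density_1d(amplitudes, phases_deg, x, cell_length=1.0):
    """rho(x) = (1/V) * sum over h >= 1 of 2|F_h| cos(2*pi*h*x - alpha_h).

    Each term is the sum of the contributions from h and its Friedel mate -h;
    the F(0) term is omitted, so the density is relative to the mean.
    """
    rho = 0.0
    for h, (amp, phase) in enumerate(zip(amplitudes, phases_deg), start=1):
        rho += 2.0 * amp * math.cos(2.0 * math.pi * h * x - math.radians(phase))
    return rho / cell_length

# Hypothetical amplitudes |F_h| for h = 1, 2, 3 (not taken from any real crystal)
AMPLITUDES = [3.0, 2.0, 1.5]
PHASES_A = [0.0, 90.0, 0.0]     # third wave added with phase 0 degrees
PHASES_B = [0.0, 90.0, 180.0]   # same amplitudes, third wave with phase 180 degrees

def profile(phases_deg, n_points=8):
    """Density sampled at n_points equally spaced fractional coordinates."""
    return [round(density_1d(AMPLITUDES, phases_deg, i / n_points), 3)
            for i in range(n_points)]

if __name__ == "__main__":
    print("phases A:", profile(PHASES_A))
    print("phases B:", profile(PHASES_B))
```

The two printed profiles share the same amplitudes and differ only in one phase, which is the one-dimensional counterpart of the lost information described above.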
1.1. The importance of phases
The importance of phases in producing the correct structure is illustrated in Figs. 2 and 3. In Fig. 2 three `electron-density waves' are added in a unit cell, which shows the dramatically different electron density resulting from adding the third wave with a different phase angle. In Fig. 3, from Kevin Cowtan's Book of Fourier (https://www.ysbl.york.ac.uk/~cowtan/fourier/fourier.html), the importance of phases in carrying structural information is beautifully illustrated. The calculation of an `electron-density map' using amplitudes from the diffraction of a duck and phases from the diffraction of a cat results in a cat: a warning of model-bias problems in molecular replacement!
2. Recovering the phases
There is no formal relationship between the amplitudes and phases; the only relationship is via the molecular structure or electron density. Therefore, if we can assume some prior knowledge of the electron density or structure, this can lead to values for the phases. This is the basis for all phasing methods (Table 1).

2.1. Direct methods
Direct methods are based on the positivity and atomicity of electron density, which lead to phase relationships between the (normalized) structure factors, expressed in terms of E, the normalized structure-factor amplitude; that is, the amplitude that would arise from point atoms at rest. Such relationships imply that once the phases of some reflections are known, or can be given a variety of starting values, then the phases of other reflections can be deduced, leading to a bootstrapping of phase values for all reflections. The requirement for what is, for proteins, very high resolution data (<1.2 Å) has limited the usefulness of ab initio phase determination in protein crystallography, although direct methods have been used to phase proteins of up to ∼1000 atoms. This resolution requirement, the so-called Sheldrick's rule (Sheldrick, 1990), has recently been given a structural basis with respect to proteins (Morris & Bricogne, 2003). However, direct methods are used routinely to find the heavy-atom substructure, such as in Shake-and-Bake (SnB; Miller et al., 1994), SHELXD (Schneider & Sheldrick, 2002) and SHARP (de La Fortelle & Bricogne, 1997), and even for subsequent phase determination from the substructure with programs such as SHELXE (Debreczeni et al., 2003) and ACORN (Foadi et al., 2000).
2.2. Molecular replacement (MR)
When a homology model is available, molecular replacement can be successful, using methods first described by Michael Rossmann and David Blow (Rossmann & Blow, 1962). As a rule of thumb, a sequence identity of >25% and an r.m.s. deviation of <2.0 Å between the C^{α} atoms of the model and the final new structure are normally required, although there are exceptions to this. Patterson methods are usually used to obtain first the orientation of the model in the new unit cell and then the translation of the correctly oriented model relative to the origin of the new unit cell (Fig. 4).
2.3. Isomorphous replacement
The use of heavy-atom substitution was invented very early on by small-molecule crystallographers to solve the phase problem, for example using the isomorphous crystals (same unit cells) of CuSO_{4} and CuSeO_{4} (Groth, 1908). The changes in intensities of some classes of reflections were used by Beevers & Lipson (1934) to locate the Cu and S atoms. It was Max Perutz and John Kendrew who first applied the methods to proteins (Perutz, 1956; Kendrew et al., 1958) by soaking protein crystals in heavy-atom solutions to create isomorphous heavy-atom derivatives (same unit cell, same orientation of protein in cell), which gave rise to measurable intensity changes that could be used to deduce the positions of the heavy atoms (Fig. 5).
In the case of a single isomorphous replacement (SIR) experiment, the contribution of the heavy-atom replacement to the structure-factor amplitude and phases is best illustrated on an Argand diagram (Fig. 6). The amplitudes of a reflection are measured for the native crystal, F_{P}, and for the derivative crystal, F_{PH}. The isomorphous difference, F_{H} ≃ F_{PH} − F_{P}, can be used as an estimate of the heavy-atom structure-factor amplitude to determine the heavy-atom positions using Patterson or direct methods. Once located, the heavy-atom parameters (xyz positions, occupancies and Debye–Waller thermal factors B) can be refined and used to calculate a more accurate F_{H} and its corresponding phase α_{H}. The native protein phase, α_{P}, can be estimated using the cosine rule (Fig. 7), leading to two possible solutions symmetrically distributed about the heavy-atom phase.
This phase ambiguity is better illustrated in the Harker construction (Fig. 8). The two possible phase values occur where the circles intersect. The problem then arises as to which phase to choose. This requires a consideration of phase probabilities.
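The cosine-rule estimate of the native phase can be sketched as follows. Applying the cosine rule to the vector triangle F_{PH} = F_{P} + F_{H} gives α_{P} = α_{H} ± arccos[(F_{PH}^{2} − F_{P}^{2} − F_{H}^{2})/(2F_{P}F_{H})]. The function name and the numbers in the example are illustrative assumptions, and the clamping step is a practical guard against measurement error rather than part of the method.

```python
import cmath
import math

def sir_phase_solutions(f_p, f_ph, f_h, alpha_h_deg):
    """Return the two candidate native phases (degrees) from one derivative.

    f_p, f_ph   : measured native and derivative amplitudes
    f_h         : calculated heavy-atom amplitude
    alpha_h_deg : calculated heavy-atom phase in degrees
    """
    cos_term = (f_ph ** 2 - f_p ** 2 - f_h ** 2) / (2.0 * f_p * f_h)
    # Errors can push the ratio just outside [-1, 1]; clamp so acos is defined.
    cos_term = max(-1.0, min(1.0, cos_term))
    delta = math.degrees(math.acos(cos_term))
    return ((alpha_h_deg + delta) % 360.0, (alpha_h_deg - delta) % 360.0)

if __name__ == "__main__":
    # Hypothetical error-free reflection: native phase 60 deg, heavy atom at 30 deg
    f_p_vec = cmath.rect(100.0, math.radians(60.0))
    f_h_vec = cmath.rect(20.0, math.radians(30.0))
    f_ph = abs(f_p_vec + f_h_vec)
    print(sir_phase_solutions(100.0, f_ph, 20.0, 30.0))
```

The two returned values lie symmetrically either side of α_{H}; only one of them is the true native phase, which is the ambiguity that the Harker construction displays graphically.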
3. Phase probability
In reality, there are errors associated with the measurements of the structure factors and in the heavy-atom positions and their occupancies such that the vector triangle seldom closes. David Blow and Francis Crick introduced the concept of lack of closure (∊) and its use in defining a phase probability (Blow & Crick, 1959) (Fig. 9). Making the assumption that all the errors reside in F_{PH(calc)} and that errors follow a Gaussian distribution, the probability of a phase having a certain value is then
P(α) ∝ exp[−∊^{2}(α)/2E^{2}],
where E^{2} = 〈[F_{PH(obs)} − F_{PH(calc)}]^{2}〉.
Most phasing programs calculate such a probability from 0 to 360° in 10° intervals, say, to produce a phase probability distribution whose shape can be represented by four coefficients of a polynomial, the so-called Hendrickson–Lattman coefficients HLA, HLB, HLC and HLD (Hendrickson & Lattman, 1970). Blow and Crick also showed that an electron-density map calculated with a weighted amplitude representing the centroid of the phase distribution gave the least error. Fig. 10 shows the phase probability distribution for one reflection from an SIR experiment. The centroid of the distribution is denoted by F_{best}, whose amplitude is the native amplitude F_{P} weighted by the figure of merit, m, which represents the mean cosine of the phase error. Modern phasing programs now use maximum-likelihood methods to derive phase probability distributions, as described in Read (2003).
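A minimal sketch of such a calculation is given below, assuming the Gaussian lack-of-closure form of Blow & Crick (1959), P(α) ∝ exp[−∊^{2}(α)/2E^{2}] with ∊(α) = F_{PH(obs)} − |F_{P}exp(iα) + F_{H}|. The function names, the 10° grid and all numerical values are illustrative assumptions and are not taken from any particular phasing program. Distributions from several derivatives are combined by multiplication, and the figure of merit and best phase are taken from the centroid.

```python
import cmath
import math

def phase_probability(f_p, f_ph_obs, f_h, alpha_h_deg, e_rms, step_deg=10):
    """Normalized P(alpha) on a grid from 0 to 360 degrees for one derivative."""
    f_h_vec = cmath.rect(f_h, math.radians(alpha_h_deg))
    probs = []
    for alpha in range(0, 360, step_deg):
        f_ph_calc = abs(cmath.rect(f_p, math.radians(alpha)) + f_h_vec)
        closure = f_ph_obs - f_ph_calc          # lack of closure at this phase
        probs.append(math.exp(-closure ** 2 / (2.0 * e_rms ** 2)))
    total = sum(probs)
    return [p / total for p in probs]

def combine(distributions):
    """Multiply individual phase probabilities point by point and renormalize."""
    product = [math.prod(values) for values in zip(*distributions)]
    total = sum(product)
    return [p / total for p in product]

def centroid(probs, step_deg=10):
    """Return (figure of merit m, best phase in degrees) of a distribution."""
    c = sum(p * cmath.exp(1j * math.radians(i * step_deg))
            for i, p in enumerate(probs))
    return abs(c), math.degrees(cmath.phase(c)) % 360.0

if __name__ == "__main__":
    # Hypothetical reflection: F_P = 100 with true phase 60 degrees
    f_p_vec = cmath.rect(100.0, math.radians(60.0))
    derivatives = [(20.0, 30.0), (25.0, 150.0)]   # (F_H, alpha_H) pairs, invented
    dists = []
    for f_h, alpha_h in derivatives:
        f_ph_obs = abs(f_p_vec + cmath.rect(f_h, math.radians(alpha_h)))
        dists.append(phase_probability(100.0, f_ph_obs, f_h, alpha_h, e_rms=5.0))
    print("SIR (m, best phase):", centroid(dists[0]))
    print("MIR (m, best phase):", centroid(combine(dists)))
```

With a single derivative the distribution is symmetric about α_{H}, so the centroid sits between the two candidate phases with a reduced figure of merit; multiplying in a second derivative removes one of the peaks.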
Fig. 11 shows the electron density of part of the unit cell of the sialidase from Salmonella typhimurium (Crennell et al., 1993) phased on a single mercury derivative. Although the protein–solvent boundary is partly evident, the electron density remains uninterpretable.
The use of more than one heavy-atom derivative in multiple isomorphous replacement (MIR) can break the phase ambiguity, as shown in Fig. 12. The phase probability is obtained by multiplying the individual phase probabilities, as shown in Fig. 13 for the same reflection as in Fig. 10, but this time three heavy-atom derivatives have resulted in a sharp unimodal distribution with a concomitantly high figure of merit.
4. Phase improvement
It is rare that experimentally determined phases are sufficiently accurate to give a completely interpretable electron-density map. Experimental phases are often only the starting point for phase improvement using a variety of methods of density modification, which are also based on some prior knowledge of structure. Solvent flattening, histogram matching and non-crystallographic averaging are the main techniques used to modify electron density and improve phases (Fig. 14). Solvent flattening is a powerful technique that removes negative electron density and sets the value of electron density in the solvent regions to a typical value of 0.33 e Å^{−3}, in contrast to a typical protein electron density of 0.43 e Å^{−3}. Automatic methods are used to define the protein–solvent boundary, first developed by Wang (1985) and then extended into reciprocal space by Leslie (1988). Histogram matching alters the values of electron-density points to concur with an expected distribution of electron-density values. Non-crystallographic symmetry averaging imposes equivalence on electron-density values when more than one copy of a molecule is present in the asymmetric unit. These methods are encoded into programs such as DM (Cowtan & Zhang, 1999), RESOLVE (Terwilliger, 2002) and CNS (Brünger et al., 1998). Density-modification techniques will not turn a bad map into a good one, but they will certainly improve promising maps that show some interpretable features.
Density modification is often a cyclic procedure, involving back-transformation of the modified electron-density map to give modified phases, recombination of these phases with the experimental phases (so as not to throw away experimental reality) and calculation of a new map, which is then modified, and so the cycle continues until convergence. Such methods can also be used to provide phases beyond the resolution for which experimental phase information is available, assuming higher resolution native data have been collected. In such cases, the modified map is back-transformed to a slightly higher resolution on each cycle to provide new phases for higher resolution reflections. The process is illustrated in Fig. 15.
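The cyclic nature of the procedure can be sketched on a one-dimensional grid. This is a deliberately simplified illustration and not how DM, RESOLVE or CNS are implemented: the solvent mask is assumed to be known, solvent points are set to their mean value, negative density in the protein region is truncated to zero, and the recombination with experimental phases and the phase-extension step are omitted. All function names are hypothetical.

```python
import cmath
import math

def structure_factors(density):
    """F_h = sum over x of rho(x) exp(+2*pi*i*h*x/N) on an N-point grid."""
    n = len(density)
    return [sum(rho * cmath.exp(2j * math.pi * h * x / n)
                for x, rho in enumerate(density)) for h in range(n)]

def synthesis(coeffs):
    """rho(x) = (1/N) sum over h of F_h exp(-2*pi*i*h*x/N); real part kept."""
    n = len(coeffs)
    return [sum(f * cmath.exp(-2j * math.pi * h * x / n)
                for h, f in enumerate(coeffs)).real / n for x in range(n)]

def flatten_solvent(density, solvent_mask):
    """Set solvent points to their mean; truncate negative protein density."""
    solvent = [d for d, is_solvent in zip(density, solvent_mask) if is_solvent]
    level = sum(solvent) / len(solvent)
    return [level if is_solvent else max(d, 0.0)
            for d, is_solvent in zip(density, solvent_mask)]

def density_modification(f_obs, start_phases, solvent_mask, n_cycles=10):
    """Cycle: map -> modify -> back-transform -> keep observed amplitudes."""
    phases = list(start_phases)
    for _ in range(n_cycles):
        coeffs = [cmath.rect(amp, phi) for amp, phi in zip(f_obs, phases)]
        modified = flatten_solvent(synthesis(coeffs), solvent_mask)
        # Only the phases of the back-transformed map are carried forward.
        phases = [cmath.phase(f) for f in structure_factors(modified)]
    return phases

if __name__ == "__main__":
    # Toy 'crystal': eight protein grid points followed by eight flat solvent points
    true_density = [0.6, 2.0, 4.0, 1.5, 3.0, 0.8, 2.5, 0.7] + [0.3] * 8
    mask = [False] * 8 + [True] * 8
    f_true = structure_factors(true_density)
    f_obs = [abs(f) for f in f_true]
    # Hypothetical starting phases: true phases with an alternating 0.4 rad error
    start = [cmath.phase(f) + 0.4 * (-1) ** h for h, f in enumerate(f_true)]
    final = density_modification(f_obs, start, mask)
    print([round(p, 2) for p in final])
```

The observed amplitudes are re-imposed on every cycle, so the only quantity that changes is the phase set; whether a given starting set improves depends on how much of the cell is solvent and how good the starting phases are.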
An example of the application of solvent flattening and histogram matching using DM is shown in Fig. 16 for the S. typhimurium sialidase phased using three derivatives.
4.1. Anomalous scattering
The atomic scattering factor has three components: a normal scattering term that is dependent on the Bragg angle and two terms that are not dependent on scattering angle, but on wavelength. These latter two terms represent the anomalous scattering that occurs at the absorption edge when the X-ray photon energy is sufficient to promote an electron from an inner shell. The dispersive term reduces the normal scattering factor, whereas the absorption term is 90° advanced in phase. This leads to a breakdown in Friedel's law, giving rise to anomalous differences that can be used to locate the anomalous scatterers. Fig. 17 shows the variation in anomalous scattering at the K edge of selenium and Fig. 18 the breakdown of Friedel's law.
The anomalous or Bijvoet difference can be used in the same way as the isomorphous difference in Patterson or direct methods to locate the anomalous scatterers. Phases for the native structure factors can then be derived in a similar way to the SIR or MIR case. Anomalous scattering can be used to break the phase ambiguity in a single isomorphous replacement experiment, leading to SIRAS (single isomorphous replacement with anomalous scattering). Note that because of the 90° phase advance of the f″ term, anomalous scattering provides orthogonal phase information to the isomorphous term. In Fig. 19, there are two possible phase values symmetrically located about f″ and two possible phase values symmetrically located about F_{H}. For completeness, the use of multiple isomorphous heavy-atom replacement using anomalous scattering is termed MIRAS.
4.2. MAD
Isomorphous replacement has several problems: non-isomorphism between crystals (unit-cell changes, reorientation of the protein, conformational changes, changes in salt and solvent ions), problems in locating all the heavy atoms, problems in refining heavy-atom positions, occupancies and thermal parameters, and errors in intensity measurements. The use of the multiwavelength anomalous diffraction (MAD) method overcomes the non-isomorphism problems. Data are collected at several wavelengths, typically three, in order to maximize the absorption and dispersive effects. Typically, wavelengths are chosen at the absorption, f″, peak (λ_{1}), at the point of inflection on the absorption curve (λ_{2}), where the dispersive term f′ (which is the derivative of the f″ curve) has its minimum, and at a remote wavelength (λ_{3} and/or λ_{4}). Fig. 20 shows a typical absorption curve for an anomalous scatterer, together with the phase and Harker diagrams.
The changes in structure-factor amplitudes arising from anomalous scattering are generally small and require accurate measurement of intensities. The actual shape of the absorption curve must be determined experimentally by a fluorescence scan on the crystal at the synchrotron, as the environment of the anomalous scatterers can affect the details of the absorption. There is a need for excellent optics for accurate wavelength setting with minimum wavelength dispersion. Generally, all data are collected from a single frozen crystal with high redundancy in order to increase the statistical significance of the measurements, and data are collected with as high a completeness as possible. The signal size can be estimated using equations similar to those derived by Crick and Magdoff for isomorphous changes (Fig. 21), which also shows a predicted signal for the case of two Se atoms in 200 amino acids, calculated using Ethan Merritt's web-based calculator (https://www.bmsc.washington.edu/scatter/AS_index.html). Note that the signal increases with resolution owing to the fall-off of normal scattering with resolution.
An example of MAD phasing is shown in Fig. 22. In this example of an archaeal chromatin modelling protein, Alba (Wardleworth et al., 2002), the protein was expressed in a Met^{−} strain of Escherichia coli and the single methionine was replaced with selenomethionine. Data were collected at three wavelengths around the Se K edge with a 12-fold redundancy to 3.0 Å on the ESRF beamline ID14-4. There were two monomers of 10 kDa in the asymmetric unit and SOLVE was used to determine the Se-atom positions and derive phases. RESOLVE was used to apply density modification to improve the phases.
4.3. SAD
It is becoming increasingly possible to collect data at just a single wavelength, typically at the absorption peak, and use density-modification protocols to break the phase ambiguity and provide interpretable maps (Fig. 23). This so-called SAD (single-wavelength anomalous diffraction) method is described in Dodson (2003).
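Whether a MAD or SAD experiment is feasible depends on the size of the anomalous signal, which can be estimated before data collection. The sketch below uses commonly quoted zero-angle approximations, which may differ in detail from the expressions in Fig. 21: the Bijvoet ratio is taken as (2N_{A}/N_{T})^{1/2} f″/Z_{eff} and the dispersive ratio as (N_{A}/2N_{T})^{1/2} |Δf′|/Z_{eff}. The values Z_{eff} = 6.7 electrons, 7.8 non-H atoms per residue and f″ = 6.0 electrons for selenium at the peak are assumptions made for illustration, not measured quantities.

```python
import math

Z_EFF = 6.7               # assumed effective scattering of an average protein atom (e)
ATOMS_PER_RESIDUE = 7.8   # assumed average number of non-H atoms per residue

def bijvoet_ratio(n_anomalous, n_residues, f_double_prime):
    """Estimated <|dF_anom|>/<|F|> at zero scattering angle."""
    n_total = n_residues * ATOMS_PER_RESIDUE
    return math.sqrt(2.0 * n_anomalous / n_total) * f_double_prime / Z_EFF

def dispersive_ratio(n_anomalous, n_residues, delta_f_prime):
    """Estimated <|dF_disp|>/<|F|> between two wavelengths at zero angle."""
    n_total = n_residues * ATOMS_PER_RESIDUE
    return math.sqrt(n_anomalous / (2.0 * n_total)) * abs(delta_f_prime) / Z_EFF

if __name__ == "__main__":
    # Two Se atoms in 200 residues, with an assumed f'' of 6.0 electrons at the peak
    ratio = bijvoet_ratio(n_anomalous=2, n_residues=200, f_double_prime=6.0)
    print(f"Estimated Bijvoet ratio: {100.0 * ratio:.1f}%")
```

Under these assumptions the expected signal is a few per cent of the mean amplitude, which is why accurate, highly redundant intensity measurements are needed.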
5. Cross-crystal averaging
Protein crystallography is not a black-box technique for every protein; there are still challenges to be had in cases where MAD or SAD techniques cannot be used to derive a high-resolution map. On occasion, two or more crystal forms of a protein are available: low-resolution phases may be known for one crystal form, but high-resolution data for another crystal form may be available. Cross-crystal averaging involves mapping the electron density from the one crystal form into the other; phases can then be derived for the new crystal form and, through averaging of density between crystal forms and possibly phase extension as part of a density-modification procedure, one can bootstrap the phases to high resolution. The procedure is outlined in Fig. 24.
One example of the power of cross-crystal averaging is that of Newcastle disease virus haemagglutinin-neuraminidase (HN), whose structure solution was plagued with non-isomorphism problems (Crennell et al., 2000). Native crystals from the same crystallization drop could have significantly different unit-cell parameters. The protein was derived from virus grown in embryonated chickens' eggs, so SeMet methods were out of the question. Most heavy-atom derivatives were non-isomorphous with native crystals and with one another. A platinum derivative was found that gave a clear peak in an anomalous Patterson, which resulted in an attempt at anomalous phasing, but the signal was just too small with one possibly not fully occupied Pt atom in 100 kDa. The P2_{1}2_{1}2_{1} unit cell had dimensions that varied as follows: a = 70.7–74.5, b = 71.8–87.0, c = 194.6–205.4 Å. In the end, cross-crystal averaging was used to bootstrap from a poor uninterpretable 6.0 Å MIR map out to a clearly interpretable 2.0 Å map (Fig. 25). Four data sets were chosen for cross-crystal averaging in DMMULTI on the following criteria: (i) they were as non-isomorphous as possible to one another and (ii) they were to as high a resolution as possible. These were a pH 7.0 room-temperature data set to 2.8 Å (a = 73.3, b = 78.0, c = 202.6 Å), for which MIR phases were available to 6.0 Å, a pH 6 room-temperature data set to 3.0 Å (a = 72.0, b = 83.9, c = 201.6 Å), a pH 4.6 frozen data set to 2.5 Å (a = 71.7, b = 77.9, c = 198.2 Å) and a pH 4.6 frozen data set to 2.0 Å (a = 72.3, b = 78.1, c = 199.4 Å). The power of the method lies in the fact that the different unit cells are sampling the molecular transform in different places.
Like most things, the idea is not new, and was indeed used by Bragg and Perutz in the early days of haemoglobin (Bragg & Perutz, 1952), where they altered the unit cell of the crystals by controlled dehydration in order to sample the one-dimensional transform of the molecules in the unit cell. This paper is worth a read, if only for the wonderful inclusion of random test data in the form of train times between London and Cambridge!
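The idea that different unit cells sample the molecular transform at different places can be illustrated in one dimension, where reflections of index h from a cell of length a lie at s = h/a in reciprocal space. In the sketch below the two cell lengths are the extremes of the b axis quoted above, while the one-dimensional `molecule' of equal point atoms and the function names are invented purely for illustration.

```python
import cmath
import math

def sampling_points(cell_length, max_index):
    """Reciprocal-space positions s = h / a (1/Angstrom) for h = 1..max_index."""
    return [h / cell_length for h in range(1, max_index + 1)]

def transform_amplitude(s, atom_positions):
    """|G(s)| for equal point atoms at the given positions (Angstrom)."""
    return abs(sum(cmath.exp(2j * math.pi * s * x) for x in atom_positions))

MOLECULE = [0.0, 3.8, 9.1, 15.4]   # hypothetical atom positions along one axis
CELL_LENGTHS = [71.8, 87.0]        # extremes of the b axis quoted in the text

if __name__ == "__main__":
    for a in CELL_LENGTHS:
        samples = [(round(s, 4), round(transform_amplitude(s, MOLECULE), 2))
                   for s in sampling_points(a, 5)]
        print(f"a = {a} A:", samples)
```

The same continuous transform is read out at two interleaved sets of positions, so each additional crystal form contributes amplitudes that the others do not measure.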
6. Conclusion
The phase problem is fundamental and will never go away; however, its solution is now fairly routine thanks to SAD and MAD. The major problems in protein crystallography are now in the molecular biology, protein expression and crystallization, but perhaps most of all in interpreting the biological implications of structure which, after all, is where the fun starts.
Acknowledgements
I have been privileged to have received any understanding of phasing I possess from some excellent teachers. In particular, I would like to thank Stephen Neidle, Tom Blundell and Ian Tickle. I would like to thank Ethan Merritt for allowing me to reproduce graphs from his web site in Figs. 17, 20 and 21.
References
Beevers, C. A. & Lipson, H. (1934). Proc. R. Soc. London Ser. A, 146, 570–582.
Blow, D. M. (2002). Protein Crystallography for Biologists. Oxford University Press.
Blow, D. M. & Crick, F. H. C. (1959). Acta Cryst. 12, 794–802.
Blundell, T. L. & Johnson, L. N. (1976). Protein Crystallography. London: Academic Press.
Bragg, L. & Perutz, M. F. (1952). Proc. R. Soc. London Ser. A, 213, 425–435.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Cowtan, K. D. & Zhang, K. Y. (1999). Prog. Biophys. Mol. Biol. 72, 245–270.
Crennell, S. J., Garman, E. F., Laver, W. G., Vimr, E. R. & Taylor, G. L. (1993). Proc. Natl Acad. Sci. USA, 90, 9852–9856.
Crennell, S., Takimoto, T., Portner, A. & Taylor, G. (2000). Nature Struct. Biol. 7, 1068–1074.
Debreczeni, J. E., Bunkoczi, G., Ma, Q., Blaser, H. & Sheldrick, G. M. (2003). Acta Cryst. D59, 688–696.
Dodson, E. (2003). Acta Cryst. D59, 1958–1966.
Drenth, J. (1994). Principles of Protein X-ray Crystallography. Berlin: Springer-Verlag.
Foadi, J., Woolfson, M. M., Dodson, E. J., Wilson, K. S., Jiaxing, Y. & Chaode, Z. (2000). Acta Cryst. D56, 1137–1147.
Groth, P. (1908). Chemische Kristallographie. Leipzig: Engelmann.
Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136–143.
Kendrew, J. C., Bodo, G., Dintzis, H. M., Parrish, R. G., Wyckoff, H. & Phillips, D. C. (1958). Nature (London), 181, 662–666.
La Fortelle, E. de & Bricogne, G. (1997). Methods Enzymol. 276, 472–494.
Leslie, A. G. W. (1988). In Proceedings of the CCP4 Daresbury Study Weekend. Warrington: Daresbury Laboratory.
Miller, R., Gallo, S. M., Khalak, H. G. & Weeks, C. M. (1994). J. Appl. Cryst. 27, 613–621.
Morris, R. J. & Bricogne, G. (2003). Acta Cryst. D59, 615–617.
Perutz, M. F. (1956). Acta Cryst. 9, 867–873.
Read, R. J. (2003). Acta Cryst. D59, 1891–1902.
Rossmann, M. G. & Arnold, E. (2001). Editors. International Tables for Crystallography, Vol. F. Dordrecht: Kluwer Academic Publishers.
Rossmann, M. G. & Blow, D. M. (1962). Acta Cryst. 15, 24–31.
Schneider, T. R. & Sheldrick, G. M. (2002). Acta Cryst. D58, 1772–1779.
Sheldrick, G. M. (1990). Acta Cryst. A46, 467–473.
Terwilliger, T. C. (2002). Acta Cryst. D58, 1937–1940.
Wang, B.-C. (1985). Methods Enzymol. 115, 90–112.
Wardleworth, B. N., Russell, R. J., Bell, S. D., Taylor, G. L. & White, M. F. (2002). EMBO J. 21, 4654–4662.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.