CCP4 study weekend

BIOLOGICAL
CRYSTALLOGRAPHY
ISSN: 1399-0047

The phase problem

Centre for Biomolecular Sciences, University of St Andrews, St Andrews, Fife KY16 9ST, Scotland
*Correspondence e-mail: glt2@st-andrews.ac.uk

(Received 1 May 2003; accepted 11 August 2003)

Given recent advances in phasing methods, those new to protein crystallography may be forgiven for asking `what problem?'. As many of those attending the CCP4 meeting come from a biological background, struggling with expression and crystallization, this introductory paper aims to introduce some of the basics that will hopefully make the subsequent papers penetrable. What is the `phase' in crystallography? What is `the problem'? How can we overcome the problem? The paper will emphasize that the phase values can only be discovered through some prior knowledge of the structure. The paper will canter through direct methods, isomorphous replacement, anomalous scattering and molecular replacement. As phasing is the most acronymic realm of crystallography, MR, SIR, SIRAS, MIR, MIRAS, MAD and SAD will be expanded and explained in part. Along the way, we will meet some of the heroes of protein crystallography such as Perutz, Kendrew, Crick, Rossmann and Blow who established many of the phasing methods in the UK. It is inevitable that some basic mathematics is encountered, but this will be done as gently as possible.

1. Introduction

There are many excellent comprehensive texts on phasing methods (Blundell & Johnson, 1976; Drenth, 1994; Rossmann & Arnold, 2001; Blow, 2002), so this introduction to the CCP4 Study Weekend attempts only to give an overview of phasing for those new to the field. Many entering protein crystallography are from a biological background, unfamiliar with the details of Fourier summation and complex numbers. The routine incorporation of selenomethionine into proteins and the wide availability of synchrotrons mean that in many cases structure solution has become press-button. This is to be welcomed, but not all structure solutions are plain sailing and it is still useful to have some understanding of what phasing is. Here, we will emphasize the importance of phases, show how phases are derived from some prior knowledge of structure and look briefly at phasing methods (direct methods, molecular replacement and heavy-atom isomorphous replacement). In most phasing methods the aim is to preserve isomorphism, such that the only structural change upon heavy-atom substitution is local and there are no changes in unit-cell parameters or orientation of the protein in the cell. Of course, single- and multi-wavelength anomalous diffraction (SAD/MAD) experiments achieve this. Where non-isomorphism does occur, it can be used to provide phase information, and we will look at an example where non-isomorphism was used to extend phases.

In the diffraction experiment (Fig. 1), we measure the intensities of waves scattered from planes (denoted by hkl) in the crystal. The amplitude of the wave |Fhkl| is proportional to the square root of the intensity measured on the detector. To calculate the electron density at a position (xyz) in the unit cell of a crystal requires us to perform the following summation over all the hkl planes, which in words we can express as: electron density at (xyz) = the sum of contributions to the point (xyz) of waves scattered from plane (hkl) whose amplitude depends on the number of electrons in the plane, added with the correct relative phase relationship or, mathematically,

\[\rho (xyz) = {1 \over V} \sum_{hkl} |F_{hkl}|\exp (i\alpha_{hkl}) \exp [-2\pi i (hx + ky + lz)],\]

where V is the volume of the unit cell and αhkl is the phase associated with the structure-factor amplitude |Fhkl|. We can measure the amplitudes, but the phases are lost in the experiment. This is the phase problem.
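The summation can be tried out numerically. The following is a minimal sketch in Python with NumPy, assuming a one-dimensional unit cell of unit length and three reflections of equal amplitude; the function name `electron_density_1d` and the choice of amplitudes and phases are illustrative and are not taken from any crystallographic package.

```python
import numpy as np


def electron_density_1d(amplitudes, phases_deg, n_points=100):
    """Fourier synthesis for a one-dimensional cell of unit length (V = 1).

    amplitudes[k] and phases_deg[k] belong to reflection h = k + 1.
    The F(000) term is omitted, so the map sits on an arbitrary zero level.
    """
    x = np.arange(n_points) / n_points  # fractional coordinate along the cell
    rho = np.zeros(n_points, dtype=complex)
    for h, (amp, phase) in enumerate(zip(amplitudes, phases_deg), start=1):
        f_h = amp * np.exp(1j * np.deg2rad(phase))
        # Add reflection h and its Friedel mate -h, with F(-h) = conj(F(h)),
        # so that the summed density is real.
        rho += f_h * np.exp(-2j * np.pi * h * x)
        rho += np.conj(f_h) * np.exp(2j * np.pi * h * x)
    return x, rho.real


amplitudes = [1.0, 1.0, 1.0]
# Same amplitudes, but the third wave is added with two different phases.
_, rho_a = electron_density_1d(amplitudes, [0.0, 0.0, 0.0])
_, rho_b = electron_density_1d(amplitudes, [0.0, 0.0, 180.0])
print(np.round(rho_a[:5], 2))
print(np.round(rho_b[:5], 2))
```

The two maps are built from identical amplitudes and differ only in the phase of one reflection, yet they are clearly different functions of x.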

[Figure 1]
Figure 1
The diffraction experiment.

1.1. The importance of phases

The importance of phases in producing the correct structure is illustrated in Figs. 2 and 3. In Fig. 2 three `electron-density waves' are added in a unit cell; the resulting electron density changes dramatically when the third wave is added with a different phase angle. In Fig. 3, from Kevin Cowtan's Book of Fourier (http://www.ysbl.york.ac.uk/~cowtan/fourier/fourier.html), the importance of phases in carrying structural information is beautifully illustrated. The calculation of an `electron-density map' using amplitudes from the diffraction of a duck and phases from the diffraction of a cat results in a cat: a warning of model-bias problems in molecular replacement!

[Figure 2]
Figure 2
(a) The definition of a phase angle α. (b) The result of adding three waves, where the third wave is added with two different phase angles.
[Figure 3]
Figure 3
The importance of phases in carrying information. Top, the diffraction pattern, or Fourier transform (FT), of a duck and of a cat. Bottom left, a diffraction pattern derived by combining the amplitudes from the duck diffraction pattern with the phases from the cat diffraction pattern. Bottom right, the image that would give rise to this hybrid diffraction pattern. In the diffraction pattern, different colours show different phases and the brightness of the colour indicates the amplitude. Reproduced courtesy of Kevin Cowtan.
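The duck-and-cat experiment is easy to imitate. The sketch below is only an analogy under stated assumptions: two random 32 × 32 arrays stand in for the duck and cat images (the real images are not reproduced here), NumPy's fast Fourier transform stands in for diffraction, and the names `hybrid_image` and `correlation` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)
duck = rng.random((32, 32))  # stand-in for the duck image
cat = rng.random((32, 32))   # stand-in for the cat image


def hybrid_image(amplitude_source, phase_source):
    """Combine Fourier amplitudes of one image with phases of another."""
    amplitudes = np.abs(np.fft.fft2(amplitude_source))
    phases = np.angle(np.fft.fft2(phase_source))
    return np.fft.ifft2(amplitudes * np.exp(1j * phases)).real


def correlation(a, b):
    """Pearson correlation coefficient between two images."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]


hybrid = hybrid_image(duck, cat)  # duck amplitudes, cat phases
print("correlation with cat :", round(correlation(hybrid, cat), 2))
print("correlation with duck:", round(correlation(hybrid, duck), 2))
```

With these stand-in images the hybrid resembles the image that supplied the phases far more than the one that supplied the amplitudes.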

2. Recovering the phases

There is no formal relationship between the amplitudes and phases; the only relationship is via the molecular structure or electron density. Therefore, if we can assume some prior knowledge of the electron density or structure, this can lead to values for the phases. This is the basis for all phasing methods (Table 1).

Table 1
Phasing methods

Method                                      Prior knowledge
Direct methods                              ρ ≥ 0, discrete atoms
Molecular replacement                       Homology model
Isomorphous replacement                     Heavy-atom substructure
Anomalous scattering                        Anomalous atom substructure

Density modification (phase improvement)    Solvent flattening
                                            Histogram matching
                                            Non-crystallographic symmetry averaging
                                            Partial structure
                                            Phase extension

2.1. Direct methods

Direct methods are based on the positivity and atomicity of electron density that leads to phase relationships between the (normalized) structure factors, e.g.

\[\alpha_{-{\bf h}} + \alpha_{{\bf h}'} + \alpha_{{\bf h} - {\bf h}'} \simeq 0,\]

\[\tan \alpha_{{\bf h}} = {{\langle E_{{\bf h}'}E_{{\bf h} - {\bf h}'} \sin (\alpha_{{\bf h}'} + \alpha_{{\bf h} - {\bf h}'})\rangle_{{\bf h}'}} \over {\langle E_{{\bf h}'}E_{{\bf h} - {\bf h}'} \cos (\alpha_{{\bf h}'} + \alpha_{{\bf h} - {\bf h}'})\rangle_{{\bf h}'}}},\]

where E represents the normalized structure-factor amplitude; that is, the amplitude that would arise from point atoms at rest. Such equations imply that once the phases of some reflections are known, or can be given a variety of starting values, then the phases of other reflections can be deduced, leading to a bootstrapping of phase values for all reflections. The requirement for what is, for proteins, very high resolution data (better than 1.2 Å) has limited the usefulness of ab initio phase determination in protein crystallography, although direct methods have been used to phase proteins of up to ∼1000 atoms. This resolution requirement, the so-called Sheldrick's rule (Sheldrick, 1990), has recently been given a structural basis with respect to proteins (Morris & Bricogne, 2003). However, direct methods are used routinely to find the heavy-atom substructure, as in Shake-and-Bake (SnB; Miller et al., 1994), SHELXD (Schneider & Sheldrick, 2002) and SHARP (de La Fortelle & Bricogne, 1997), and even for subsequent phase determination from the substructure with programs such as SHELXE (Debreczeni et al., 2003) and ACORN (Foadi et al., 2000).

2.2. Molecular replacement (MR)

When a homology model is available, molecular replacement can be successful, using methods first described by Michael Rossmann and David Blow (Rossmann & Blow, 1962). As a rule of thumb, a sequence identity of >25% is normally required, together with an r.m.s. deviation of <2.0 Å between the Cα atoms of the model and the final new structure, although there are exceptions to this. Patterson methods are usually used to obtain first the orientation of the model in the new unit cell and then the translation of the correctly oriented model relative to the origin of the new unit cell (Fig. 4).

[Figure 4]
Figure 4
The process of molecular replacement.

2.3. Isomorphous replacement

The use of heavy-atom substitution was invented very early on by small-molecule crystallographers to solve the phase problem; for example, the isomorphous crystals (same unit cells) of CuSO4 and CuSeO4 (Groth, 1908). The changes in intensities of some classes of reflections were used by Beevers & Lipson (1934) to locate the Cu and S atoms. It was Max Perutz and John Kendrew who first applied the methods to proteins (Perutz, 1956; Kendrew et al., 1958) by soaking protein crystals in heavy-atom solutions to create isomorphous heavy-atom derivatives (same unit cell, same orientation of protein in cell). These gave rise to measurable intensity changes which could be used to deduce the positions of the heavy atoms (Fig. 5).

[Figure 5]
Figure 5
Two protein diffraction patterns superimposed and shifted vertically relative to one another. One is from native bovine β-lactoglobulin and one from a crystal soaked in a mercury salt solution. Note the intensity changes for certain reflections and the identical unit cells, suggesting isomorphism. (Photo courtesy of Dr Lindsay Sawyer.)

In the case of a single isomorphous replacement (SIR) experiment, the contribution of the heavy-atom replacement to the structure-factor amplitude and phases is best illustrated on an Argand diagram (Fig. 6). The amplitudes of a reflection are measured for the native crystal, |FP|, and for the derivative crystal, |FPH|. The isomorphous difference, |FH| ≃ |FPH| − |FP|, can be used as an estimate of the heavy-atom structure-factor amplitude to determine the heavy-atom positions using Patterson or direct methods. Once located, the heavy-atom parameters (xyz positions, occupancies and Debye–Waller thermal factors B) can be refined and used to calculate a more accurate |FH| and its corresponding phase αH. The native protein phase, αP, can be estimated using the cosine rule (Fig. 7), leading to two possible solutions symmetrically distributed about the heavy-atom phase.

[Figure 6]
Figure 6
Argand diagram for SIR. |FP| is the amplitude of a reflection for the native crystal and |FPH| for the derivative crystal.
[Figure 7]
Figure 7
Estimation of native protein phase for SIR. \(\alpha_{P} = \alpha_{H} \pm \cos^{-1}[(F_{PH}^{2} - F_{P}^{2} - F_{H}^{2})/2F_{P}F_{H}]\).
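The twofold ambiguity that follows from the cosine rule can be checked with a few lines of Python. This is a sketch with invented numbers: a native structure factor of amplitude 100 and phase 40° and a heavy-atom contribution of amplitude 20 and phase 110° are assumed, and the function name `sir_phase_candidates` is hypothetical.

```python
import numpy as np


def sir_phase_candidates(f_p, f_ph, f_h, alpha_h_deg):
    """Return the two native phases (degrees) allowed by the cosine rule."""
    cos_term = (f_ph**2 - f_p**2 - f_h**2) / (2.0 * f_p * f_h)
    # Measurement errors can push the ratio slightly outside [-1, 1].
    cos_term = np.clip(cos_term, -1.0, 1.0)
    delta = np.degrees(np.arccos(cos_term))
    return (alpha_h_deg + delta) % 360.0, (alpha_h_deg - delta) % 360.0


# Simulate an error-free derivative measurement from assumed 'true' values.
f_p, alpha_p_true = 100.0, 40.0
f_h, alpha_h = 20.0, 110.0
f_ph = abs(f_p * np.exp(1j * np.deg2rad(alpha_p_true))
           + f_h * np.exp(1j * np.deg2rad(alpha_h)))

print(sir_phase_candidates(f_p, f_ph, f_h, alpha_h))
```

One of the two candidates is the phase used to simulate the data; the other is its mirror image about the heavy-atom phase, and the amplitudes alone cannot distinguish them.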

This phase ambiguity is better illustrated in the Harker construction (Fig. 8). The two possible phase values occur where the circles intersect. The problem then arises as to which phase to choose. This requires a consideration of phase probabilities.

[Figure 8]
Figure 8
Harker construction for SIR.

3. Phase probability

In reality, there are errors associated with the measurements of the structure factors and in the heavy-atom positions and their occupancies, such that the vector triangle seldom closes. David Blow and Francis Crick introduced the concept of lack of closure (ε) and its use in defining a phase probability (Blow & Crick, 1959) (Fig. 9). Making the assumption that all the errors reside in FPH(calc) and that errors follow a Gaussian distribution, the probability of a phase having a certain value is then

\[P(\alpha_{P}) \propto \exp (-\varepsilon^{2}/2E^{2}),\]

where E² = 〈[FPH(obs) − FPH(calc)]²〉 is the mean-square lack of closure.

[Figure 9]
Figure 9
Phase probability (Blow & Crick, 1959). The lack of closure ε = |FPH(obs)| − |FPH(calc)| = |FPH(obs)| − |[|FP|exp(iαP) + |FH|exp(iαH)]|.

Most phasing programs calculate such a probability from 0 to 360° in 10° intervals, say, to produce a phase probability distribution whose shape can be represented by four coefficients of a polynomial, the so-called Hendrickson–Lattman coefficients HLA, HLB, HLC and HLD (Hendrickson & Lattman, 1970). Blow and Crick also showed that an electron-density map calculated with a weighted amplitude representing the centroid of the phase distribution gave the least error. Fig. 10 shows the phase probability distribution for one reflection from an SIR experiment. The centroid of the distribution is denoted by Fbest, whose amplitude is the native amplitude |FP| weighted by the figure of merit, m, which represents the mean cosine of the phase error. Modern phasing programs now use maximum-likelihood methods to derive phase probability distributions, as described in Read (2003).

[Figure 10]
Figure 10
Phase probability for one reflection in a SIR experiment. Fbest is the centroid of the distribution. The map calculated with |Fbest|exp(iαbest) [or m|FP|exp(iαbest), where m is the figure of merit, 〈cos Δα〉] has least error. m = 0.23 implies a 76° error.
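The Blow and Crick probability and its centroid can be sketched numerically. The example below assumes the Gaussian form P(αP) ∝ exp(−ε²/2E²) sampled every 10°, with invented amplitudes and an assumed r.m.s. lack of closure of 5; `blow_crick_probability` and `centroid` are hypothetical names, and real phasing programs are considerably more sophisticated.

```python
import numpy as np


def blow_crick_probability(f_p, f_ph_obs, f_h, alpha_h_deg, e_rms,
                           step_deg=10.0):
    """Sample P(alpha_P) on a grid of trial native phases (degrees)."""
    alpha = np.arange(0.0, 360.0, step_deg)
    f_ph_calc = np.abs(f_p * np.exp(1j * np.deg2rad(alpha))
                       + f_h * np.exp(1j * np.deg2rad(alpha_h_deg)))
    lack_of_closure = f_ph_obs - f_ph_calc
    prob = np.exp(-lack_of_closure**2 / (2.0 * e_rms**2))
    return alpha, prob / prob.sum()  # normalized to unit total probability


def centroid(alpha_deg, prob):
    """Return (alpha_best in degrees, figure of merit m) of a distribution."""
    c = np.sum(prob * np.exp(1j * np.deg2rad(alpha_deg)))
    return np.degrees(np.angle(c)) % 360.0, np.abs(c)


# Invented SIR example: the true native phase is 40 degrees.
f_p, f_h, alpha_h = 100.0, 20.0, 110.0
f_ph_obs = abs(f_p * np.exp(1j * np.deg2rad(40.0))
               + f_h * np.exp(1j * np.deg2rad(alpha_h)))

alpha, prob = blow_crick_probability(f_p, f_ph_obs, f_h, alpha_h, e_rms=5.0)
alpha_best, fom = centroid(alpha, prob)
print(alpha_best, fom)
```

Because the distribution is bimodal and symmetric about the heavy-atom phase, the centroid phase falls at αH rather than at either of the two likely phases, and the figure of merit is correspondingly low.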

Fig. 11 shows the electron density of part of the unit cell of the sialidase from Salmonella typhimurium (Crennell et al., 1993) phased on a single mercury derivative. Although the protein–solvent boundary is partly evident, the electron density remains uninterpretable.

[Figure 11]
Figure 11
(a) A 2.6 Å SIR electron-density map with the final α-carbon trace of the structure superimposed. \(\rho({\bf x}) = (1/V)\sum m|F_{P}|\exp(i\alpha_{\rm best})\exp(-2\pi i{\bf h}\cdot{\bf x})\). (b) A small section of the map with the final structure superimposed.

The use of more than one heavy-atom derivative in multiple isomorphous replacement (MIR) can break the phase ambiguity, as shown in Fig. 12. The phase probability is obtained by multiplying the individual phase probabilities, as shown in Fig. 13 for the same reflection as in Fig. 10, but this time three heavy-atom derivatives have resulted in a sharp unimodal distribution with a concomitantly high figure of merit.

[Figure 12]
Figure 12
Harker diagram for MIR with two heavy-atom derivatives.
[Figure 13]
Figure 13
Phase probability for one reflection in a MIR experiment. (a) One derivative. (b) Three derivatives. P(αP) ∝ \(\textstyle \prod_{i=1}^{\rm No.\,\,of \,\,derivatives}\exp(-\varepsilon_{i}^{2}/2E_{i}^{2})\).
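The effect of multiplying probabilities from several derivatives can be illustrated in the same spirit. In this sketch two invented derivatives (heavy-atom amplitudes 20 and 25, heavy-atom phases 110° and 340°) are simulated for a reflection whose true native phase is 40°, with an assumed r.m.s. lack of closure of 5 for both; all function names are hypothetical.

```python
import numpy as np


def derivative_amplitude(f_p, alpha_p_deg, f_h, alpha_h_deg):
    """|F_PH| for given native and heavy-atom structure factors."""
    return abs(f_p * np.exp(1j * np.deg2rad(alpha_p_deg))
               + f_h * np.exp(1j * np.deg2rad(alpha_h_deg)))


def combined_probability(f_p, derivatives, e_rms, step_deg=10.0):
    """Multiply Gaussian lack-of-closure probabilities over all derivatives.

    derivatives is a list of (f_ph_obs, f_h, alpha_h_deg) tuples.
    """
    alpha = np.arange(0.0, 360.0, step_deg)
    prob = np.ones_like(alpha)
    for f_ph_obs, f_h, alpha_h_deg in derivatives:
        f_ph_calc = derivative_amplitude(f_p, alpha, f_h, alpha_h_deg)
        prob *= np.exp(-(f_ph_obs - f_ph_calc)**2 / (2.0 * e_rms**2))
    return alpha, prob / prob.sum()


def figure_of_merit(alpha_deg, prob):
    """Length of the centroid vector of the phase distribution."""
    return abs(np.sum(prob * np.exp(1j * np.deg2rad(alpha_deg))))


f_p, alpha_true = 100.0, 40.0
deriv_1 = (derivative_amplitude(f_p, alpha_true, 20.0, 110.0), 20.0, 110.0)
deriv_2 = (derivative_amplitude(f_p, alpha_true, 25.0, 340.0), 25.0, 340.0)

alpha, p_sir = combined_probability(f_p, [deriv_1], e_rms=5.0)
_, p_mir = combined_probability(f_p, [deriv_1, deriv_2], e_rms=5.0)
print("SIR figure of merit:", figure_of_merit(alpha, p_sir))
print("MIR figure of merit:", figure_of_merit(alpha, p_mir))
print("most probable MIR phase:", alpha[np.argmax(p_mir)])
```

Each derivative on its own allows two phases, but only the true phase is common to both, so the product is unimodal and the figure of merit rises.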

4. Phase improvement

It is rare that experimentally determined phases are sufficiently accurate to give a completely interpretable electron-density map. Experimental phases are often only the starting point for phase improvement using a variety of methods of density modification, which are also based on some prior knowledge of structure. Solvent flattening, histogram matching and non-crystallographic averaging are the main techniques used to modify electron density and improve phases (Fig. 14). Solvent flattening is a powerful technique that removes negative electron density and sets the value of electron density in the solvent regions to a typical value of 0.33 e Å⁻³, in contrast to a typical protein electron density of 0.43 e Å⁻³. Automatic methods are used to define the protein–solvent boundary; these were first developed by Wang (1985) and then extended into reciprocal space by Leslie (1988). Histogram matching alters the values of electron-density points to concur with an expected distribution of electron-density values. Non-crystallographic symmetry averaging imposes equivalence on electron-density values when more than one copy of a molecule is present in the asymmetric unit. These methods are encoded into programs such as DM (Cowtan & Zhang, 1999), RESOLVE (Terwilliger, 2002) and CNS (Brünger et al., 1998). Density-modification techniques will not turn a bad map into a good one, but they will certainly improve promising maps that show some interpretable features.

[Figure 14]
Figure 14
Density-modification techniques. (a) Solvent flattening uses automated methods to define the protein–solvent boundary and then modify the solvent electron density to be of equal value. (b) Histogram matching redefines the values of electron-density points in a map to conform to an expected distribution of electron-density values. (c) Non-crystallographic symmetry averaging imposes identical electron-density values on points related by local symmetry, in this case a trimer of ducks that forms the asymmetric unit. The local NCS symmetry operators relating points in duck A to ducks B and C are shown.
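As a toy illustration of solvent flattening, the sketch below works on a one-dimensional `map'. It is not the algorithm of Wang, Leslie or DM: it simply assumes that the solvent fraction is known, takes a local average of the positive density, labels the lowest-valued fraction of the smoothed map as solvent and resets those points to their mean. The function name `solvent_flatten` and all numerical values are hypothetical.

```python
import numpy as np


def solvent_flatten(rho, solvent_fraction, radius=3):
    """Flatten the solvent region of a periodic one-dimensional map.

    Returns the modified map and the boolean solvent mask.
    """
    # Local average of the positive density, wrapping around the cell edges.
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    positive = np.clip(rho, 0.0, None)
    padded = np.concatenate([positive[-radius:], positive, positive[:radius]])
    smoothed = np.convolve(padded, kernel, mode="valid")

    # The lowest-valued fraction of the smoothed map is labelled solvent.
    cutoff = np.quantile(smoothed, solvent_fraction)
    solvent = smoothed <= cutoff

    flattened = rho.copy()
    flattened[solvent] = rho[solvent].mean()  # constant solvent density
    return flattened, solvent


# Noisy test map: 'protein' between grid points 20 and 59, solvent elsewhere.
rng = np.random.default_rng(1)
rho = 0.3 + 0.05 * rng.standard_normal(100)
rho[20:60] += 1.0

flattened, solvent = solvent_flatten(rho, solvent_fraction=0.5)
print("points labelled solvent:", int(solvent.sum()))
```

In a real density-modification program the flattened map would then be back-transformed to give modified phases, which are recombined with the experimental phases.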

Density modification is often a cyclic procedure, involving back-transformation of the modified electron-density map to give modified phases, recombination of these phases with the experimental phases (so as not to throw away experimental reality) and calculation of a new map, which is then modified in turn; the cycle continues until convergence. Such methods can also be used to provide phases beyond the resolution for which experimental phase information is available, assuming higher resolution native data have been collected. In such cases, the modified map is back-transformed to a slightly higher resolution on each cycle to provide new phases for higher resolution reflections. The process is illustrated in Fig. 15.

[Figure 15]
Figure 15
Phase improvement by density modification.

An example of the application of solvent flattening and histogram matching using DM is shown in Fig. 16 for the S. typhimurium sialidase phased using three derivatives.

[Figure 16]
Figure 16
(a) 2.6 Å MIR electron density. (b) Electron density after solvent flattening and histogram matching in DM. The solvent envelope determined by DM is shown in green.

4.1. Anomalous scattering

The atomic scattering factor has three components: a normal scattering term that is dependent on the Bragg angle and two terms that are not dependent on scattering angle, but on wavelength. These latter two terms represent the anomalous scattering that occurs at the absorption edge when the X-ray photon energy is sufficient to promote an electron from an inner shell. The dispersive term reduces the normal scattering factor, whereas the absorption term is 90° advanced in phase. This leads to a breakdown in Friedel's law, giving rise to anomalous differences that can be used to locate the anomalous scatterers. Fig. 17 shows the variation in anomalous scattering at the K edge of selenium and Fig. 18 the breakdown of Friedel's law.

[Figure 17]
Figure 17
Variation in anomalous scattering at the K edge of selenium.
[Figure 18]
Figure 18
Breakdown of Friedel's law when an anomalous scatterer is present. \(f(\theta, \lambda) = f_{0}(\theta) + f'(\lambda) + if''(\lambda)\). \(|F_{hkl}| \neq |F_{\overline{hkl}}|\). ΔF± = |FPH(+)| − |FPH(−)| is the Bijvoet difference.
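The breakdown of Friedel's law is simple to reproduce for a one-dimensional model structure. In the sketch below the atomic positions and scattering factors are invented for illustration; in particular, the values f' = −8 and f'' = 4 for the anomalous scatterer are round numbers chosen for the example and are not tabulated values.

```python
import numpy as np


def structure_factor(h, atoms):
    """F(h) for atoms given as (fractional x, scattering factor) pairs."""
    return sum(f * np.exp(2j * np.pi * h * x) for x, f in atoms)


def bijvoet_difference(h, atoms):
    """|F(+h)| - |F(-h)|, which is zero when Friedel's law holds."""
    return abs(structure_factor(h, atoms)) - abs(structure_factor(-h, atoms))


light_atoms = [(0.10, 6.0), (0.35, 7.0)]  # purely real scattering factors
heavy_normal = [(0.70, 34.0)]             # f0 only
heavy_anomalous = [(0.70, 34.0 - 8.0 + 4.0j)]  # f0 + f' + i f'' (illustrative)

for label, heavy in (("normal", heavy_normal), ("anomalous", heavy_anomalous)):
    diffs = [bijvoet_difference(h, light_atoms + heavy) for h in (1, 2, 3)]
    print(label, np.round(diffs, 3))
```

With purely real scattering factors the two members of each Friedel pair have equal amplitudes; adding the imaginary f'' component to one atom makes them differ.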

The anomalous or Bijvoet difference can be used in the same way as the isomorphous difference in Patterson or direct methods to locate the anomalous scatterers. Phases for the native structure factors can then be derived in a similar way to the SIR or MIR case. Anomalous scattering can be used to break the phase ambiguity in a single isomorphous replacement experiment, leading to SIRAS (single isomorphous replacement with anomalous scattering). Note that because of the 90° phase advance of the f'' term, anomalous scattering provides orthogonal phase information to the isomorphous term. In Fig. 19, there are two possible phase values symmetrically located about f'' and two possible phase values symmetrically located about FH. For completeness, the use of multiple isomorphous heavy-atom replacement using anomalous scattering is termed MIRAS.

[Figure 19]
Figure 19
Harker construction for SIRAS.

4.2. MAD

Isomorphous replacement has several problems: non-isomorphism between crystals (unit-cell changes, reorientation of the protein, conformational changes, changes in salt and solvent ions), problems in locating all the heavy atoms, problems in refining heavy-atom positions, occupancies and thermal parameters, and errors in intensity measurements. The use of the multiwavelength anomalous diffraction (MAD) method overcomes the non-isomorphism problems. Data are collected at several wavelengths, typically three, in order to maximize the absorption and dispersive effects. Typically, wavelengths are chosen at the absorption, f'', peak (λ1), at the point of inflection on the absorption curve (λ2), where the dispersive term (which is the derivative of the f'' curve) has its minimum, and at a remote wavelength (λ3 and/or λ4). Fig. 20 shows a typical absorption curve for an anomalous scatterer, together with the phase and Harker diagrams.

[Figure 20]
Figure 20
MAD phasing. (a) Typical absorption curve for an anomalous scatterer. (b) Phase diagram. |FP| is not measured, so one of the λs is chosen as the `native'. (c) Harker construction.

The changes in structure-factor amplitudes arising from anomalous scattering are generally small and require accurate measurement of intensities. The actual shape of the absorption curve must be determined experimentally by a fluorescence scan on the crystal at the synchrotron, as the environment of the anomalous scatterers can affect the details of the absorption. There is a need for excellent optics for accurate wavelength setting with minimum wavelength dispersion. Generally, all data are collected from a single frozen crystal with high redundancy in order to increase the statistical significance of the measurements, and data are collected with as high a completeness as possible. The signal size can be estimated using equations similar to those derived by Crick and Magdoff for isomorphous changes (Fig. 21). Fig. 21 also shows a predicted signal for the case of two Se atoms in 200 amino acids, calculated using Ethan Merritt's web-based calculator (http://www.bmsc.washington.edu/scatter/AS_index.html). Note that the signal increases with resolution owing to the fall-off of normal scattering with resolution.

[Figure 21]
Figure 21
Estimation of signal size. The expected Bijvoet diffraction ratio is r.m.s.(ΔF±)/r.m.s.(|F|) ≃ \((N_{A}/2N_{T})^{1/2}(2f''_{A}/Z_{\rm eff})\). The expected dispersive ratio is r.m.s.(ΔFΔλ)/r.m.s.(|F|) ≃ \((N_{A}/2N_{T})^{1/2}[|f'_{A}(\lambda_{i}) - f'_{A}(\lambda_{j})|/Z_{\rm eff}]\). NA is the number of anomalous scatterers, NT the total number of atoms in the structure and Zeff is the normal scattering power for all atoms (6.7 e at 2θ = 0).
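These estimates are simple enough to evaluate directly. The sketch below assumes the usual forms of the two ratios, with the Bijvoet ratio depending on 2f'' and the dispersive ratio on the difference in f' between two wavelengths. The figure of about 7.8 non-H atoms per residue and the scattering-factor values are assumptions made for this example and should be replaced by values appropriate to the protein and the measured absorption edge.

```python
import math


def bijvoet_ratio(n_anom, n_total, f_double_prime, z_eff=6.7):
    """Expected r.m.s. Bijvoet difference as a fraction of r.m.s. |F|."""
    return math.sqrt(n_anom / (2.0 * n_total)) * (2.0 * f_double_prime / z_eff)


def dispersive_ratio(n_anom, n_total, f_prime_i, f_prime_j, z_eff=6.7):
    """Expected r.m.s. dispersive difference as a fraction of r.m.s. |F|."""
    return (math.sqrt(n_anom / (2.0 * n_total))
            * abs(f_prime_i - f_prime_j) / z_eff)


# Two Se atoms in a 200-residue protein.
n_residues = 200
atoms_per_residue = 7.8   # assumed average number of non-H atoms per residue
n_total = n_residues * atoms_per_residue

# Illustrative scattering factors (electrons) near the Se K edge.
f_double_prime_peak = 6.0
f_prime_inflection, f_prime_remote = -10.0, -3.0

print("Bijvoet ratio   :", bijvoet_ratio(2, n_total, f_double_prime_peak))
print("Dispersive ratio:", dispersive_ratio(2, n_total, f_prime_inflection,
                                            f_prime_remote))
```

With these assumed values both signals amount to only a few per cent of the structure-factor amplitude, which is why accurate and highly redundant intensity measurements are needed.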

An example of MAD phasing is shown in Fig. 22. In this example of an archaeal chromatin modelling protein, Alba (Wardleworth et al., 2002), the protein was expressed in a Met− strain of Escherichia coli and the single methionine was replaced with selenomethionine. Data were collected at three wavelengths around the Se K edge with a 12-fold redundancy to 3.0 Å on the ESRF beamline ID14-4. There were two monomers of 10 kDa in the asymmetric unit and SOLVE was used to determine the Se-atom positions and derive phases. RESOLVE was used to apply density modification to improve the phases.

[Figure 22]
Figure 22
(a) 3.0 Å electron-density map of Alba after SOLVE. (b) After RESOLVE, with the final α-carbon trace of the structure superimposed.

4.3. SAD

It is becoming increasingly possible to collect data at just a single wavelength, typically at the absorption peak, and use density-modification protocols to break the phase ambiguity and provide interpretable maps (Fig. 23). This so-called SAD (single-wavelength anomalous diffraction) method is described in Dodson (2003).

[Figure 23]
Figure 23
Harker construction for SAD. ΔF± is used to find the substructure of anomalous scatterers, followed by phasing and phase improvement.

5. Cross-crystal averaging

Protein crystallography is not a black-box technique for every protein; there are still challenges to be had in cases where MAD or SAD techniques cannot be used to derive a high-resolution map. On occasion, two or more crystal forms of a protein are available: low-resolution phases may be known for one crystal form, but high-resolution data for another crystal form may be available. Cross-crystal averaging involves mapping the electron density from the one unit cell into the other; phases can then be derived for the new crystal form and through averaging of density between crystal forms and possibly phase extension as part of a density-modification procedure, one can bootstrap the phases to high resolution. The procedure is outlined in Fig. 24.

[Figure 24]
Figure 24
Cross-crystal averaging. Two crystal forms of the same protein: for one form (left) phase information is known to low resolution, whereas for the other form (right) high-resolution data but no phase information are available.

One example of the power of cross-crystal averaging is that of Newcastle disease virus haemagglutinin-neuraminidase (HN), whose structure solution was plagued with non-isomorphism problems (Crennell et al., 2000). Native crystals from the same crystallization drop could have significantly different unit-cell parameters. The protein was derived from virus grown in embryonated chickens' eggs, so SeMet methods were out of the question. Most heavy-atom derivatives were non-isomorphous with native crystals and with one another. A platinum derivative was found that gave a clear peak in an anomalous Patterson, which prompted an attempt at MAD phasing, but the signal was just too small with one, possibly not fully occupied, Pt atom in 100 kDa. The P2₁2₁2₁ unit cell had dimensions that varied as follows: a = 70.7–74.5, b = 71.8–87.0, c = 194.6–205.4 Å. In the end, cross-crystal averaging was used to bootstrap from a poor uninterpretable 6.0 Å MIR map out to a clearly interpretable 2.0 Å map (Fig. 25). Four data sets were selected for cross-crystal averaging in DMMULTI on the following criteria: (i) they were as non-isomorphous as possible to one another and (ii) they were to as high a resolution as possible. These were a pH 7.0 room-temperature data set to 2.8 Å (a = 73.3, b = 78.0, c = 202.6 Å), for which MIR phases were available to 6.0 Å, a pH 6 room-temperature data set to 3.0 Å (a = 72.0, b = 83.9, c = 201.6 Å), a pH 4.6 frozen data set to 2.5 Å (a = 71.7, b = 77.9, c = 198.2 Å) and a pH 4.6 frozen data set to 2.0 Å (a = 72.3, b = 78.1, c = 199.4 Å). The power of the method lies in the fact that the different unit cells are sampling the molecular transform in different places. Like most things, the idea is not new, and was indeed used by Bragg and Perutz in the early days of haemoglobin (Bragg & Perutz, 1952), where they altered the unit cell of the crystals by controlled dehydration in order to sample the one-dimensional transform of the molecules in the unit cell. This paper is worth a read, if only for the wonderful inclusion of random test data in the form of train times between London and Cambridge!

[Figure 25]
Figure 25
Cross-crystal averaging in haemagglutinin-neuraminidase (HN). (a) The unit cell showing the 6.0 Å MIR map derived from eight heavy-atom derivatives contoured at 2.0σ, revealing two blobs corresponding to the two molecules in the asymmetric unit. (b) A section of the 2.0 Å map after phase extension and cross-crystal averaging over four non-isomorphous data sets.

6. Conclusion

The phase problem is fundamental and will never go away; however, its solution is now fairly routine thanks to SAD and MAD. The major problems in protein crystallography are now in the molecular biology, protein expression and crystallization, but perhaps most of all in interpreting the biological implications of structure which, after all, is where the fun starts.

Acknowledgements

I have been privileged to have received any understanding of phasing I possess from some excellent teachers. In particular, I would like to thank Stephen Neidle, Tom Blundell and Ian Tickle. I would like to thank Ethan Merritt for allowing me to reproduce graphs from his web site in Figs. 17, 20 and 21.

References

Beevers, C. A. & Lipson, H. (1934). Proc. R. Soc. London Ser. A, 146, 570–582.
Blow, D. M. (2002). Protein Crystallography for Biologists. Oxford University Press.
Blow, D. M. & Crick, F. H. C. (1959). Acta Cryst. 12, 794–802.
Blundell, T. L. & Johnson, L. N. (1976). Protein Crystallography. London: Academic Press.
Bragg, L. & Perutz, M. F. (1952). Proc. R. Soc. London Ser. A, 213, 425–435.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Cowtan, K. D. & Zhang, K. Y. (1999). Prog. Biophys. Mol. Biol. 72, 245–270.
Crennell, S. J., Garman, E. F., Laver, W. G., Vimr, E. R. & Taylor, G. L. (1993). Proc. Natl Acad. Sci. USA, 90, 9852–9856.
Crennell, S., Takimoto, T., Portner, A. & Taylor, G. (2000). Nature Struct. Biol. 7, 1068–1074.
Debreczeni, J. E., Bunkoczi, G., Ma, Q., Blaser, H. & Sheldrick, G. M. (2003). Acta Cryst. D59, 688–696.
Dodson, E. (2003). Acta Cryst. D59, 1958–1966.
Drenth, J. (1994). Principles of Protein X-ray Crystallography. Berlin: Springer-Verlag.
Foadi, J., Woolfson, M. M., Dodson, E. J., Wilson, K. S., Jia-xing, Y. & Chao-de, Z. (2000). Acta Cryst. D56, 1137–1147.
Groth, P. (1908). Chemische Kristallographie. Leipzig: Engelmann.
Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136–143.
Kendrew, J. C., Bodo, G., Dintzis, H. M., Parrish, R. G., Wyckoff, H. & Phillips, D. C. (1958). Nature (London), 181, 662–666.
La Fortelle, E. de & Bricogne, G. (1997). Methods Enzymol. 276, 472–494.
Leslie, A. G. W. (1988). In Proceedings of the CCP4 Daresbury Study Weekend. Warrington: Daresbury Laboratory.
Miller, R., Gallo, S. M., Khalak, H. G. & Weeks, C. M. (1994). J. Appl. Cryst. 27, 613–621.
Morris, R. J. & Bricogne, G. (2003). Acta Cryst. D59, 615–617.
Perutz, M. F. (1956). Acta Cryst. 9, 867–873.
Read, R. J. (2003). Acta Cryst. D59, 1891–1902.
Rossmann, M. G. & Arnold, E. (2001). Editors. International Tables for Crystallography, Vol. F. Dordrecht: Kluwer Academic Publishers.
Rossmann, M. G. & Blow, D. M. (1962). Acta Cryst. 15, 24–31.
Schneider, T. R. & Sheldrick, G. M. (2002). Acta Cryst. D58, 1772–1779.
Sheldrick, G. M. (1990). Acta Cryst. A46, 467–473.
Terwilliger, T. C. (2002). Acta Cryst. D58, 1937–1940.
Wang, B.-C. (1985). Methods Enzymol. 115, 90–112.
Wardleworth, B. N., Russell, R. J., Bell, S. D., Taylor, G. L. & White, M. F. (2002). EMBO J. 21, 4654–4662.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.
