research papers

BIOLOGICAL
CRYSTALLOGRAPHY
ISSN: 1399-0047

Direct methods and protein crystallography at low resolution

Department of Chemistry, University of Glasgow, Glasgow G12 8QQ, Scotland
*Correspondence e-mail: chris@chem.gla.ac.uk

(Received 4 April 2000; accepted 27 June 2000)

The tools of modern direct methods are examined and their limitations for solving protein structures discussed. Direct methods need atomic resolution data (1.1–1.2 Å) for structures of around 1000 atoms if no heavy atom is present. For low-resolution data, alternative approaches are necessary and these include maximum entropy, symbolic addition, Sayre's equation, group scattering factors and electron microscopy.

1. The tools of direct methods

Direct methods have evolved from the early 1950s to become the method of choice for solving small-molecule crystal structures from diffraction data. In this context, 'small' extends from ten atoms in the asymmetric unit to a small protein with 1000 or more atoms. In this section, the tools that direct methods use and their limitations are examined. This is necessarily brief; for a full description, see Giacovazzo (1998[Giacovazzo, C. (1998). Direct Phasing in Crystallography. Fundamentals and Applications. Oxford University Press.]), Woolfson & Fan (1995[Woolfson, M. M. & Fan, H.-F. (1995). Physical and Non-Physical Methods of Solving Crystal Structures. Cambridge University Press.]) or Fortier (1997[Fortier, S. (1997). Editor. Direct Methods for Solving Macromolecular Structures. Dordrecht: Kluwer.]).

1.1. Normalization

Intensity data are first normalized to give normalized structure factors or E magnitudes,

[| {E_{\bf h}}|^2 = {{k\,| {F_{\bf h}^{\rm obs}}|^2 }\over {\varepsilon _{\bf h}\textstyle \sum \limits_{j = 1}^{N}{f_j^2 \exp({- 2B\sin ^2 \theta /\lambda ^2}) }}}, \eqno (1)]

where k is a scale factor that puts the observed intensities |Fh|² on an absolute scale, εh is the statistical weight for reflection h and fj is the atomic scattering factor for atom j. There are N atoms in the unit cell, with an overall isotropic temperature factor B. B and k need to be determined and this is carried out using Wilson's method (Wilson, 1949[Wilson, A. J. C. (1949). Acta Cryst. 2, 318-321.]). This assumes that the atoms in the unit cell are uniformly and randomly distributed and such an assumption forms the basis of Wilson statistics. Obviously, in proteins and other biological macromolecules this is not the case; at the very least, we have an ordered protein and a disordered solvent volume that really requires a different treatment. Nonetheless, it is possible to obtain useful values of k and B by this method.
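
As an illustration of Wilson's method, the following minimal sketch fits k and B by a straight-line least-squares fit. It assumes the usual Wilson-plot form, in which the mean observed intensity in a resolution shell is (1/k) Σfj² exp(−2B sin²θ/λ²), so that ln(⟨I⟩/Σfj²) is linear in sin²θ/λ². The function name `wilson_fit`, the shell layout and the synthetic numbers are hypothetical and are not taken from any particular program.

```python
import math


def wilson_fit(shells):
    """Least-squares Wilson plot.

    shells: list of (s2, mean_intensity, sum_f2) tuples, one per resolution
    shell, where s2 = sin^2(theta)/lambda^2 and sum_f2 is the sum of f_j^2
    evaluated at that s2.
    Fits ln(<I>/sum_f2) = -ln(k) - 2*B*s2 and returns (k, B).
    """
    xs = [s2 for s2, _, _ in shells]
    ys = [math.log(i_mean / sum_f2) for _, i_mean, sum_f2 in shells]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return math.exp(-intercept), -slope / 2.0


# Synthetic, noise-free shells built from assumed values of k and B.
true_k, true_b = 0.25, 20.0
shells = []
for i in range(1, 11):
    s2 = 0.01 * i
    sum_f2 = 5000.0 * math.exp(-3.0 * s2)  # made-up fall-off of sum f_j^2
    mean_i = sum_f2 * math.exp(-2.0 * true_b * s2) / true_k
    shells.append((s2, mean_i, sum_f2))

k_fit, b_fit = wilson_fit(shells)
print(k_fit, b_fit)
```

With real protein data the plot is far from linear at low resolution, for the reasons given above, so in practice the fit is usually restricted to the higher resolution shells.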

The distribution of E magnitudes depends on whether the space group is centrosymmetric or not and does not depend on structural complexity. In the non-centrosymmetric case, ∼37% of the normalized structure factors are expected to be >1.0, 1.8% >2.0 and only 0.01% >3.0.
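
The percentages quoted above can be checked directly. This short sketch assumes the standard Wilson distributions: for acentric data the fraction of reflections with |E| > t is exp(−t²), and for centric data it is erfc(t/√2). The function names are illustrative only.

```python
import math


def fraction_above_acentric(t):
    """Expected fraction of acentric reflections with |E| > t."""
    return math.exp(-t * t)


def fraction_above_centric(t):
    """Expected fraction of centric reflections with |E| > t."""
    return math.erfc(t / math.sqrt(2.0))


for t in (1.0, 2.0, 3.0):
    print(t,
          round(100.0 * fraction_above_acentric(t), 3),
          round(100.0 * fraction_above_centric(t), 3))
```

Neither expression contains N, which is the sense in which the distribution is independent of structural complexity.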

1.2. Triplets

Each structure-factor magnitude |Fh| has an associated phase angle φh which we wish to determine. Triplets are the fundamental phase relationship in direct methods and they take the form

[\Phi _3 = \varphi _{\bf h} + \varphi _{\bf k} + \varphi _{- {\bf h}- {\bf k}} \simeq \,0. \eqno (2)]

It is obvious that the indices of the three reflections sum to zero. Associated with each triplet is a concentration parameter κh,k

[\kappa _{{\bf h},{\bf k}} = {{2| {E_{\bf h}E_{\bf k}E_{- {\bf h}- {\bf k}}}|}\over {N^{1/2}}}, \eqno (3)]

where N is the number of atoms, assumed equal, in the unit cell, excluding H atoms. Relationship (2) is probabilistic in origin and the Cochran distribution (Cochran, 1955[Cochran, W. (1955). Acta Cryst. 8, 473-478.]) gives us the required formula,

[P(\Phi _3| \kappa _{{\bf h},{\bf k}}) = {1 \over {2\pi I_0 ({\kappa _{{\bf h},{\bf k}}})}}\exp({\kappa _{{\bf h},{\bf k}}\cos \Phi _3 }). \eqno (4)]

I0 is a zeroth-order modified Bessel function of the first kind. The expression [2\pi I_0 ({\kappa _{{\bf h},{\bf k}}})] is a normalizing term. Fig. 1 shows how Bessel functions appropriate to direct methods behave as a function of their argument. The Cochran distribution assumes the validity of Wilson statistics. Fig. 2 shows how the probability distribution (4) varies with the concentration parameter. It can be seen that the mode of the distribution is always zero and that as κh,k decreases the information content of the Cochran distribution also decreases, until at κh,k = 1 very little useful information can be obtained concerning the value of the triplet. κh,k decreases as 1/√N. If the three E magnitudes in the triplet all have values of 2.5, then κh,k = 1 corresponds to N ≈ 1000, i.e. a limit of ∼1000 atoms in the unit cell.

[Figure 1]
Figure 1
The variation of Bessel functions (a) I0(x) (dotted line) and I1(x) (full line) and (b) I1(x)/I0(x) as a function of x in the range 0–5.
[Figure 2]
Figure 2
The Cochran distribution as a function of the concentration parameter κh,k.
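
Equations (3) and (4) are simple to evaluate numerically. The sketch below is illustrative only: I0 is summed from its power series so that nothing beyond the Python standard library is needed, and the function names are hypothetical.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0(x) from the series sum_k (x^2/4)^k / (k!)^2."""
    q = x * x / 4.0
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total


def kappa(e_h, e_k, e_hk, n_atoms):
    """Concentration parameter of equation (3) for N equal atoms."""
    return 2.0 * abs(e_h * e_k * e_hk) / math.sqrt(n_atoms)


def cochran_pdf(phi3, kap):
    """Cochran distribution of equation (4); phi3 in radians."""
    return math.exp(kap * math.cos(phi3)) / (2.0 * math.pi * bessel_i0(kap))


# Three E magnitudes of 2.5 with 1000 equal atoms give kappa close to 1.
print(kappa(2.5, 2.5, 2.5, 1000))
# Ratio of the probability density at 0 to that at pi for kappa = 1 and 5.
for kap in (1.0, 5.0):
    print(kap, cochran_pdf(0.0, kap) / cochran_pdf(math.pi, kap))
```

The ratio printed in the last loop is exp(2κ), which shows how quickly the distribution flattens as κh,k falls towards 1.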

1.3. Quartets

Quartets are the logical extension of triplets and involve four phases instead of three,

[\Phi_{4} = \varphi_{\bf h} + \varphi_{\bf k} + \varphi_{\bf l} + \varphi_{-{\bf h} - {\bf k} - {\bf l}}. \eqno (5)]

The distribution (Schenk, 1973[Schenk, H. (1973). Acta Cryst. A29, 77-82.]; Hauptman, 1975[Hauptman, H. A. (1975). Acta Cryst. A31, 680-687.]) is more complex than that of the triplet. Defining the principal terms

[R_1 = | {E_{\bf h}} |, R_2 = | {E_{\bf k}} |, R_3 = | {E_{\bf l}} |, R_4 = | {E_{- {\bf h}- {\bf k}- {\bf l}}} | \eqno (6)]

and the unique cross-terms

[R_{12} = | {E_{{\bf h}+ {\bf k}}}|, R_{23} = | {E_{{\bf k}+ {\bf l}}}|, R_{31} = | {E_{{\bf l}+ {\bf h}}}|, \eqno (7)]

the required distribution is

[\eqalignno { P(\Phi _4 |\,R_1, R_2, &R_3, R_4, R_{12},R_{23},R_{31}) = \cr & {1 \over L}\exp ({- 2B\cos \Phi _4 })I_0 ({2N^{- 1/2}R_{12}X_{12}})\cr &\ \quad {\times}\ I_0 ({2N^{- 1/2} R_{23}X_{23}})I_0 ({2N^{- 1/2}R_{31}X_{31}}),& (8) }]

where L is a normalizing term (usually determined numerically),

[\eqalignno {& B = (2/N)R_1 R_2 R_3 R_4, \cr & X_{12} = [{R_1^2 R_2^2 + R_3^2 R_4^2 + 2R_1 R_2 R_3 R_4 \cos \Phi _4 }]^{1/2} ,\cr & X_{23} = [{R_2^2 R_3^2 + R_1^2 R_4^2 + 2R_1 R_2 R_3 R_4 \cos \Phi _4 }]^{1/2}, \cr & X_{31} = [{R_3^2 R_1^2 + R_2^2 R_4^2 + 2R_1 R_2 R_3 R_4 \cos \Phi _4 }]^{1/2}.& (9)}]

Three sorts of quartet can be identified as follows.

  • (i) Positive quartets, in which the principal and cross-terms are large. These are strongly correlated with triplets which makes them difficult to use.

  • (ii) Negative quartets, where the principal terms are large and the cross-terms are small.

  • (iii) Enantiomorph-sensitive quartets, where the principal terms are large and the cross-terms have an intermediate value. Both the negative and enantiomorph-sensitive quartets are largely independent of triplets.

Fig. 3 shows typical quartet distributions for these three cases using a structure with 200 equal atoms in the unit cell, all four principal terms set to 3.0 and with varying cross-terms. Note that the reliability of quartets is a function of 1/N and not 1/√N as in the triplet case. This makes the use of these invariants in protein crystallography rather questionable.
[Figure 3]
Figure 3
Quartet distributions for a crystal structure with N = 200. The principal terms are given by R1 = R2 = R3 = R4 = 3.0. (a) A positive quartet with cross-terms R12 = R23 = R31 = 3.0, (b) a negative quartet with cross-terms R12 = R23 = R31 = 0.25, (c) an enantiomorph quartet with cross-terms R12 = R23 = R31 = 0.9.
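
The three cases of Fig. 3 can be reproduced qualitatively from equations (8) and (9). The following sketch evaluates the unnormalized distribution on a grid of Φ4 between 0 and π and reports where it peaks; the normalizing term L is not needed to locate the mode. Function names and the grid size are arbitrary choices for illustration.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0(x) from its power series."""
    q = x * x / 4.0
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total


def quartet_weight(phi4, principal, cross, n_atoms):
    """Unnormalized quartet distribution, equations (8) and (9)."""
    r1, r2, r3, r4 = principal
    r12, r23, r31 = cross
    prod = r1 * r2 * r3 * r4
    big_b = 2.0 * prod / n_atoms
    c = math.cos(phi4)

    def x_term(p, q, r, s):
        # [p^2 q^2 + r^2 s^2 + 2 R1 R2 R3 R4 cos(phi4)]^(1/2), clipped at zero
        return math.sqrt(max(0.0, (p * q) ** 2 + (r * s) ** 2 + 2.0 * prod * c))

    x12 = x_term(r1, r2, r3, r4)
    x23 = x_term(r2, r3, r1, r4)
    x31 = x_term(r3, r1, r2, r4)
    scale = 2.0 / math.sqrt(n_atoms)
    return (math.exp(-2.0 * big_b * c)
            * bessel_i0(scale * r12 * x12)
            * bessel_i0(scale * r23 * x23)
            * bessel_i0(scale * r31 * x31))


def quartet_mode(principal, cross, n_atoms, steps=180):
    """Grid value of phi4 in [0, pi] at which the distribution is largest."""
    grid = [math.pi * i / steps for i in range(steps + 1)]
    weights = [quartet_weight(p, principal, cross, n_atoms) for p in grid]
    return grid[weights.index(max(weights))]


principal = (3.0, 3.0, 3.0, 3.0)
for label, r_cross in (("positive", 3.0), ("negative", 0.25), ("enantiomorph", 0.9)):
    mode = quartet_mode(principal, (r_cross,) * 3, 200)
    print(label, round(math.degrees(mode), 1))
```

With these inputs the positive quartet peaks at 0, the negative quartet at π and the intermediate case at a value between the two, which is what makes the last type sensitive to the enantiomorph.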

1.4. The tangent formula

The tangent formula (Karle & Hauptman, 1956[Karle, J. & Hauptman, H. A. (1956). Acta Cryst. 9, 635-651.]) is a key formula in direct methods that lets us refine phase values and determine new ones. Consider the situation in which we have a series of triplets with a common reflection φh. They can be written

[\eqalignno{ & \varphi _{\bf h} \simeq \varphi _{{\bf h}_2} + \varphi _{{\bf h}- {\bf h}_2} \cr & \varphi _{\bf h} \simeq \varphi _{{\bf h}_3} + \varphi _{{\bf h}- {\bf h}_3} \cr & \varphi _{\bf h} \simeq \varphi _{{\bf h}_4} + \varphi _{{\bf h}- {\bf h}_4}\,\,etc. & (10)}]

Consider also a situation in which all the phases on the RHS of (10) are known at least approximately; the tangent formula then gives us an estimate for φh which in its simplest form is

[\tan \varphi _{\bf h} = {{\textstyle \sum\limits_{\bf k}{| {E_{\bf k}E_{{\bf h}- {\bf k}}}|\,\sin ({\varphi _{\bf k}\, + \varphi _{{\bf h}- {\bf k}}})}}\over {\textstyle \sum\limits_{\bf k}{| {E_{\bf k}E_{{\bf h}- {\bf k}}}|\,\cos ({\varphi _{\bf k}\, + \varphi _{{\bf h}- {\bf k}}})}}}. \eqno (11)]

It can be extended to include quartets of all types and various weighting schemes which help impose stability on a formula that can be prone to instabilities.
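
A minimal sketch of the unweighted tangent formula (11) is given below. It uses `math.atan2` so that the quadrant of φh is resolved from the signs of the numerator and denominator, which the bare tangent cannot do. The function name, the tuple layout and the returned reliability measure (the length of the summed vector) are illustrative assumptions rather than the conventions of any particular program.

```python
import math


def tangent_phase(contributors):
    """Estimate phi_h from equation (11).

    contributors: iterable of (|E_k|, |E_(h-k)|, phi_k, phi_(h-k)),
    with phases in radians.
    Returns (phi_h in (-pi, pi], length of the resultant vector).
    """
    terms = list(contributors)
    num = sum(ek * ehk * math.sin(pk + phk) for ek, ehk, pk, phk in terms)
    den = sum(ek * ehk * math.cos(pk + phk) for ek, ehk, pk, phk in terms)
    return math.atan2(num, den), math.hypot(num, den)


# Two contributors whose phase sums agree (both 1.0 rad) reinforce each other.
phi_h, resultant = tangent_phase([(2.0, 1.5, 0.3, 0.7), (1.8, 2.2, 1.2, -0.2)])
print(phi_h, resultant)
```

When the individual estimates disagree the resultant shrinks, which is one reason why weighting schemes based on it help to stabilize the refinement.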

1.5. Symbolic addition

The triplets (and quartets) form a set of linear equations relating phase values. The technique of symbolic addition (Karle & Karle, 1966[Karle, J. & Karle, I. L. (1966). Acta Cryst. 21, 849-859.]) assigns algebraic symbols to a small number (typically 4–8) of phases that have large associated E magnitudes and which interact strongly through phase relationships. Triplets and quartets are used to determine 50–100 new phases as functions of these symbols; this is the process of symbolic addition. The symbols are converted into numerical values using relationships between them made manifest by the symbolic addition procedure or by giving unassigned symbols permuted values in the range 0–2π. The phases are then extended and refined using either tangent refinement or the Sayre equation (see §1.7).

Symbolic addition is not much used currently for solving small molecules; it has been superseded by methods that are much easier to automate. It does, however, have the virtue of stability when used with macromolecules at low resolution; this is explored further in §3.2.

1.6. The minimal principle

The mode of the Cochran distribution for triplets is always zero. However, the mean can be computed as

[\langle \cos \Phi _3 \rangle = {\textstyle \int\limits_0^{2\pi}} \cos \Phi _3 \,P(\Phi _3 | \kappa_{{\bf h},{\bf k}})\,{\rm d}\Phi _3 = {{ I_1 (\kappa _{{\bf h},{\bf k}})}\over {I_0 (\kappa _{{\bf h},{\bf k}})}}. \eqno (12)]

This expression gives rise to the minimal function (DeTitta et al., 1994[DeTitta, G. T., Weeks, C. M., Thuman, P., Miller, R. & Hauptman, H. A. (1994). Acta Cryst. A50, 203-210.])

[R({\Phi _3 }) = {\textstyle \sum \limits_{{\bf h},{\bf k}}} \kappa _{{\bf h},{\bf k}} \left [\cos T_{{\bf h},{\bf k}} - {{I_1 (\kappa _{{\bf h},{\bf k}})} \over {I_0 (\kappa _{{\bf h},{\bf k}})}} \right] ^2\big/ \textstyle \sum\limits_{{\bf h},{\bf k}}\kappa _{{\bf h},{\bf k}}, \eqno (13)]

where cos Th,k is the value of the cosine of the triplet computed from known phases. The function R(Φ3) serves two purposes: (i) as a formula to refine and estimate new phases by minimizing the difference between the estimated value and the mean of the cosine of the triplet and (ii) as the minimal principle which uses (13) to define the best phase set, i.e. as a figure of merit.
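
The minimal function (13) can be sketched in a few lines. The data layout used here (a list of index triples with their κ values and a dictionary of phases) is hypothetical, and the Bessel functions are summed from their power series to keep the example self-contained.

```python
import math


def bessel_i0(x, terms=60):
    """I0(x) = sum_k (x^2/4)^k / (k!)^2."""
    q = x * x / 4.0
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total


def bessel_i1(x, terms=60):
    """I1(x) = sum_k (x/2)^(2k+1) / (k! (k+1)!)."""
    q = x * x / 4.0
    term = x / 2.0
    total = term
    for k in range(1, terms):
        term *= q / (k * (k + 1))
        total += term
    return total


def minimal_function(triplets, phases):
    """Equation (13).

    triplets: list of (i, j, m, kappa), where phases[i] + phases[j] + phases[m]
    is the triplet invariant of equation (2).
    phases: dict mapping a reflection label to its phase in radians.
    """
    num = den = 0.0
    for i, j, m, kap in triplets:
        cos_t = math.cos(phases[i] + phases[j] + phases[m])
        expected = bessel_i1(kap) / bessel_i0(kap)
        num += kap * (cos_t - expected) ** 2
        den += kap
    return num / den


ratio = bessel_i1(2.0) / bessel_i0(2.0)
triplets = [("h", "k", "-h-k", 2.0)]
# All phases zero: cos T = 1, which overshoots the expected mean cosine.
r_zero = minimal_function(triplets, {"h": 0.0, "k": 0.0, "-h-k": 0.0})
# Phases chosen so that cos T equals the expected mean: R vanishes.
r_best = minimal_function(triplets, {"h": math.acos(ratio), "k": 0.0, "-h-k": 0.0})
print(ratio, r_zero, r_best)
```

The example makes the point that a phase set with every triplet exactly zero is penalized: the function is minimized when the triplet cosines match their expected values, not when they are all unity.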

1.7. The Sayre equation

The Sayre equation (Sayre, 1952[Sayre, D. (1952). Acta Cryst. 5, 60-65.]) is algebraic rather than probabilistic in origin and is derived from the expression for the electron density and its square,

[F_{\bf h} = (\theta / V)\textstyle \sum\limits_{\bf k}{F_{\bf k}}F_{{\bf h}- {\bf k}}. \eqno (14)]

In terms of E magnitudes this takes the form of the Sayre–Hughes (Hughes, 1953[Hughes, E. W. (1953). Acta Cryst. 6, 871.]) equation,

[E_{\bf h} = N^{1/2} \langle {E_{\bf k}E_{{\bf h}- {\bf k}}} \rangle. \eqno (15)]

The Sayre equation can be used in the same way as the tangent formula, but has a more general validity and is not constrained to use only large structure-factor magnitudes.
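
The algebraic origin of the Sayre equation can be seen in a one-dimensional toy model. In the sketch below, which is an idealization and not a realistic structure, equal atoms each occupy a single point of an M-point grid, so the density equals its own square and the discrete analogue of (14) holds exactly, with 1/M playing the role of θ/V. All names and the atom positions are hypothetical.

```python
import cmath
import math


def dft(values):
    """Discrete structure factors F_h = sum_x rho(x) exp(-2 pi i h x / M)."""
    m = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * h * x / m)
                for x, v in enumerate(values))
            for h in range(m)]


def sayre_rhs(f):
    """(1/M) sum_k F_k F_(h-k), with indices taken modulo M."""
    m = len(f)
    return [sum(f[k] * f[(h - k) % m] for k in range(m)) / m for h in range(m)]


grid = 32
density = [0.0] * grid
for x in (3, 11, 18, 26):  # arbitrary positions of four equal point atoms
    density[x] = 1.0

f = dft(density)
residual = max(abs(a - b) for a, b in zip(f, sayre_rhs(f)))
print(residual)
```

The residual is at the level of rounding error for every index, weak reflections included, which illustrates why the equation is not restricted to large structure-factor magnitudes. With real data the atoms are neither equal nor fully resolved and the sum is truncated, so the relationship becomes approximate.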

1.8. Figures of merit

In general, direct methods are multi-solutional: they give rise to multiple phase sets and we need to select those which are most likely to give useful structural information. Figures of merit serve this purpose and are used to rank phase sets. There are numerous such indicators, including the following. (i) The minimal function (13). An optimal phase set will have a minimum value of R(Φ3). (ii) The negative quartet figure of merit (DeTitta et al., 1975[DeTitta, G. T., Langs, D. A., Edmonds, J. W. & Duax, W. L. (1975). Acta Cryst. A31, 472-479.]),

[{\rm NQEST} = {{\textstyle \sum\limits_{\rm negative}B\cos (\varphi _{\bf h} + \varphi _{\bf k} + \varphi _{\bf l} + \varphi _{\bf - h - k - l})}\over {\textstyle \sum\limits_{\rm negative}B }}, \eqno (16)]

where the summation spans all those quartets assumed negative using (8). An optimal phase set should have a minimum value of NQEST.

Usually, several figures of merit are calculated for a given phase set and these are combined to give an overall figure of merit called a CFOM.

1.9. Correlation coefficients

Let Eo be the observed E magnitude and Ec the calculated value from, for example, a variant of the tangent formula; let w be the associated weight. For a set of such magnitudes we can then compute the correlation coefficient CC, which takes many forms. A useful expression from Read (1986[Read, R. J. (1986). Acta Cryst. A42, 140-149.]) is

[\eqalignno {{\rm CC} &= \left (\textstyle \sum wE_o^2 E_c^2 \sum w - \sum wE_o^2 \sum wE_c^2 \right)/ \cr &\quad\ {\big (}\big\{\big[\textstyle \sum wE_o^4 \sum w - \big(\sum wE_o^2\big)^2\big ] \cr &\quad\ {\times}\ \big [\textstyle \sum wE_c^4 \sum w - \big(\sum wE_c^2\big)^2\big] \big\}^{1/2}\big ). & (17)}]

Correlation coefficients lie in the range −1 ≤ CC ≤ 1. They can be used as figures of merit.
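
Equation (17) translates directly into code. The sketch below is a plain transcription with unit weights by default; the function name is illustrative, and the function will fail with a division by zero if either set of magnitudes is constant.

```python
import math


def correlation_coefficient(e_obs, e_calc, weights=None):
    """Weighted correlation coefficient between Eo^2 and Ec^2, equation (17)."""
    if weights is None:
        weights = [1.0] * len(e_obs)
    rows = list(zip(weights, e_obs, e_calc))
    sw = sum(w for w, _, _ in rows)
    so = sum(w * eo ** 2 for w, eo, _ in rows)
    sc = sum(w * ec ** 2 for w, _, ec in rows)
    soc = sum(w * eo ** 2 * ec ** 2 for w, eo, ec in rows)
    soo = sum(w * eo ** 4 for w, eo, _ in rows)
    scc = sum(w * ec ** 4 for w, _, ec in rows)
    numerator = soc * sw - so * sc
    denominator = math.sqrt((soo * sw - so ** 2) * (scc * sw - sc ** 2))
    return numerator / denominator


e_obs = [0.5, 1.0, 1.5, 2.0]
# Perfect agreement, then a set whose squares fall as the observed squares rise.
cc_same = correlation_coefficient(e_obs, e_obs)
e_anti = [math.sqrt(5.0 - eo ** 2) for eo in e_obs]
cc_anti = correlation_coefficient(e_obs, e_anti)
print(cc_same, cc_anti)
```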

1.10. E maps

So far our discussions have involved reciprocal-space quantities; the transform into real space is carried out using E magnitudes via E maps,

[\rho ({\bf x}) \simeq {1 \over V}\textstyle \sum\limits_{\bf h}| E_{\bf h}|\exp(i\varphi _{\bf h})\exp(- 2\pi i{\bf h}\cdot{\bf x}). \eqno (18)]

The use of E magnitudes and the limits we shall impose on the reflections entering the summation in (18) mean that the electron density is only approximate (at the very least, there are serious series-termination errors), but hopefully is sufficient to reveal structural features so that model building can begin.
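
A one-dimensional toy calculation shows how an E map of the form (18) recovers atomic positions when the phases are correct. The sketch below is purely illustrative: three equal point atoms, reflections with |h| ≤ 10, exact phases and a greedy peak search with a minimum separation are all assumptions made for the example.

```python
import cmath
import math


def toy_e_values(positions, h_max):
    """|E_h| and phi_h for equal point atoms at fractional positions (1D)."""
    n = len(positions)
    mags, phases = {}, {}
    for h in range(-h_max, h_max + 1):
        if h == 0:
            continue
        e = sum(cmath.exp(2j * math.pi * h * x) for x in positions) / math.sqrt(n)
        mags[h], phases[h] = abs(e), cmath.phase(e)
    return mags, phases


def e_map(mags, phases, n_grid, volume=1.0):
    """Equation (18) sampled on n_grid points of the 1D cell."""
    rho = []
    for i in range(n_grid):
        x = i / n_grid
        total = sum(mags[h] * cmath.exp(1j * phases[h])
                    * cmath.exp(-2j * math.pi * h * x) for h in mags)
        rho.append(total.real / volume)
    return rho


def pick_peaks(rho, n_peaks, min_sep):
    """Take the highest grid points that are at least min_sep points apart."""
    n_grid = len(rho)
    order = sorted(range(n_grid), key=lambda i: rho[i], reverse=True)
    peaks = []
    for i in order:
        if all(min(abs(i - j), n_grid - abs(i - j)) >= min_sep for j in peaks):
            peaks.append(i)
        if len(peaks) == n_peaks:
            break
    return sorted(peaks)


mags, phases = toy_e_values([0.10, 0.35, 0.70], h_max=10)
rho = e_map(mags, phases, n_grid=100)
peaks = pick_peaks(rho, n_peaks=3, min_sep=5)
print(peaks)  # grid indices of the three strongest separated peaks
```

The ripples around each peak in `rho` are the series-termination errors referred to above; they grow as the resolution limit h_max is reduced.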

1.11. Simplifying the problem

The problem of direct phasing can be simplified by the following heuristic rules.

  • (i) Only the top 8–10Na reflections, ranked on E magnitude, need to be phased, where Na is the number of atoms in the asymmetric unit.

  • (ii) In centrosymmetric space groups, all the reflections are centric with phases restricted to 0 or π. Non-centrosymmetric space groups often have centrosymmetric projections giving rise to centric reflections which have restricted phase choices, e.g. 0, π or ±π/2.

  • (iii) We can tolerate relatively large (∼40°) random phase errors, but only smaller systematic errors.

2. Using the tools to solve crystal structures

There are numerous procedures for solving structures via direct methods. A typical, though somewhat simplified, sequence is as follows.

  • (i) The data are normalized using Wilson's method to give E magnitudes. These are sorted in descending order.

  • (ii) Triplets are generated for the top 10Na reflections. Quartets (usually just the negative ones) are also optionally generated.

  • (iii) The top 8–10Na reflections are given random phases.

  • (iv) The phases are refined to convergence using the tangent formula in one of its many variants.

  • (v) Figures of merit are calculated for this phase set and combined together to give an overall figure of merit CFOM.

  • (vi) Steps (iii)–(v) are repeated 24–1000 times depending on the difficulty and complexity of the structure.

  • (vii) The phase sets are sorted on CFOM.

  • (viii) An E map is computed for the best set and the peaks picked. We then use our knowledge of molecular dimensions and conformations to extract a trial structure. This is the first point at which chemical knowledge is used actively, i.e. the direct-methods procedure is model-free until this point.

  • (ix) The structure is completed and refined in the usual way.

  • (x) If no identifiable fragment can be found then the next ranked phase set from step (vii) is used and steps (viii) and (ix) are repeated. This can be performed for the top ten or more phase sets.

This is shown diagrammatically in Fig. 4.
[Figure 4]
Figure 4
A flow chart for traditional direct methods as used to solve small-molecule structures.
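
Steps (iii)–(vii) of the sequence above can be sketched as a short multi-solution loop. The example below is a toy under stated assumptions: a one-dimensional equal-atom structure, an unweighted tangent refinement written with complex numbers (the phase of Σ Ek Eh−k is the estimate given by the tangent formula), and a simple self-consistency score standing in for the CFOM. None of the names or parameters is taken from an actual program.

```python
import cmath
import math
import random


def toy_magnitudes(positions, h_max):
    """|E_h| for h = 1..h_max from a 1D equal-atom toy structure."""
    n = len(positions)
    return {h: abs(sum(cmath.exp(2j * math.pi * h * x) for x in positions))
            / math.sqrt(n) for h in range(1, h_max + 1)}


def complex_e(mags, phases, h):
    """E_h as a complex number; negative h follows from Friedel's law."""
    sign = 1 if h > 0 else -1
    return mags[abs(h)] * cmath.exp(1j * sign * phases[abs(h)])


def triplet_sums(mags, phases):
    """S_h = sum_k E_k E_(h-k) over all index pairs present in the data."""
    h_max = max(mags)
    sums = {}
    for h in mags:
        total = 0j
        for k in range(-h_max, h_max + 1):
            if k != 0 and k != h and abs(h - k) <= h_max:
                total += complex_e(mags, phases, k) * complex_e(mags, phases, h - k)
        sums[h] = total
    return sums


def tangent_refine(mags, phases, cycles=20):
    """Step (iv): replace every phase by its tangent-formula estimate."""
    for _ in range(cycles):
        phases = {h: cmath.phase(s) for h, s in triplet_sums(mags, phases).items()}
    return phases


def consistency(mags, phases):
    """Toy figure of merit in [-1, 1]: agreement of each phase with its estimate."""
    sums = triplet_sums(mags, phases)
    num = sum((complex_e(mags, phases, h).conjugate() * s).real
              for h, s in sums.items())
    den = sum(mags[h] * abs(s) for h, s in sums.items())
    return num / den


def multisolution(mags, n_trials, seed=0):
    rng = random.Random(seed)
    results = []
    for trial in range(n_trials):
        start = {h: rng.uniform(0.0, 2.0 * math.pi) for h in mags}     # step (iii)
        refined = tangent_refine(mags, start)                          # step (iv)
        results.append((consistency(mags, refined), trial, refined))   # step (v)
    results.sort(key=lambda r: r[0], reverse=True)                     # step (vii)
    return results


mags = toy_magnitudes([0.11, 0.23, 0.48, 0.77], h_max=12)
ranked = multisolution(mags, n_trials=24)
print([round(score, 3) for score, _, _ in ranked[:5]])
```

A single self-consistency score of this kind cannot by itself separate a correct phase set from an over-consistent wrong one, which is why several figures of merit are combined in practice and why map interpretation in step (viii) remains the final test.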

2.1. What is needed for this method to work?

The procedure is usually routine if the following criteria are met.

  • (i) Atomicity. We need intensity data to a resolution of 1.1–1.2 Å.

  • (ii) Completeness. The data must be complete to this resolution.

  • (iii) Accuracy. Accurate data are required.

  • (iv) Complexity. The number of non-H atoms in the asymmetric unit should be <200.

Clearly, none of these criteria apply to most protein data sets, where a resolution of 2 Å is common, where low-angle data may be missing, where accuracy is limited by poor crystalline specimens and where the number of atoms in the asymmetric unit is several thousand.

This latter problem can be overcome using atomicity as a stronger constraint and this gives rise to the computer programs Shake-and-Bake (SnB; Weeks & Miller, 1997[Weeks, C. M. & Miller, R. (1997). Proceedings of the CCP4 Study Weekend. Recent Advances in Phasing, edited by K. S. Wilson, G. Davies, A. W. Ashton & S. Bailey, pp. 139-146. Warrington: Daresbury Laboratory.]) and Half-Bake (Sheldrick, 1997[Sheldrick, G. M. (1997). Proceedings of the CCP4 Study Weekend. Recent Advances in Phasing, edited by K. S. Wilson, G. Davies, A. W. Ashton & S. Bailey, pp. 147-157. Warrington: Daresbury Laboratory.]).

2.2. Shake-and-Bake (SnB)

SnB starts in a conventional way by normalizing the data and generating triplet (and optionally negative quartet) invariants. To assign initial phases, an extension of the random-phase procedure into reciprocal space is made; trial structures are generated by placing random atoms in the unit cell with distance constraints, i.e. atoms may not be closer than 1.5 Å. No angle constraints are applied. A Fourier transform gives phase values which, because of the distance constraint, tend to have lower errors than a simple random-phasing algorithm. Note that an imposition of atomicity is being invoked from the very beginning in this procedure.

The random phases are now refined using either tangent methods or a grid search based on the minimal function in which each phase is modified by a phase shift that minimizes R(Φ3). A new map is generated from these refined phases and this is subjected to a peak-search procedure (again we have atomicity) in which N peaks are selected (for an N-atom problem) subject to the same distance constraint that was used in the initial phase generation. The new peaks give new phases, which are then refined in a cyclical fashion. At convergence, R(Φ3) is stored, a new phase set is generated and the procedure is repeated.

As phase sets accumulate, one looks for a set which has a much lower value of R(Φ3) than the others. This is usually an indication of phase correctness and the atoms corresponding to this solution form the starting point of a traditional completion.

The procedure is shown in flow-chart form in Fig. 5.

[Figure 5]
Figure 5
A flow chart for the SnB program as applied to small proteins with atomic resolution data. (From Weeks & Miller, 1997[Weeks, C. M. & Miller, R. (1997). Proceedings of the CCP4 Study Weekend. Recent Advances in Phasing, edited by K. S. Wilson, G. Davies, A. W. Ashton & S. Bailey, pp. 139-146. Warrington: Daresbury Laboratory.].)
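
The generation of a random trial structure subject to a minimum interatomic distance can be sketched by rejection sampling. The example assumes an orthogonal cell, space group P1 and minimum-image distances across the cell boundaries; these are simplifications chosen for illustration and do not describe how SnB itself is implemented.

```python
import math
import random


def min_image_distance(p, q, cell):
    """Shortest distance (in Å) between p and q in a periodic orthogonal cell."""
    total = 0.0
    for pi, qi, edge in zip(p, q, cell):
        d = abs(pi - qi) % edge
        d = min(d, edge - d)
        total += d * d
    return math.sqrt(total)


def random_trial_structure(n_atoms, cell, min_dist=1.5, seed=None, max_tries=100000):
    """Place n_atoms at random Cartesian positions no closer than min_dist Å."""
    rng = random.Random(seed)
    atoms = []
    tries = 0
    while len(atoms) < n_atoms:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("cell too crowded for the requested constraint")
        candidate = tuple(rng.random() * edge for edge in cell)
        if all(min_image_distance(candidate, other, cell) >= min_dist
               for other in atoms):
            atoms.append(candidate)
    return atoms


cell = (20.0, 25.0, 30.0)  # hypothetical cell edges in Å
atoms = random_trial_structure(50, cell, min_dist=1.5, seed=1)
print(len(atoms))
```

Structure factors calculated from such a model supply the starting phases; no bond-angle or connectivity information is imposed at this stage.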

2.3. Half-Bake

Half-Bake uses the ideas of SnB in a different way but still requires and imposes atomicity. Instead of the minimal function, correlation coefficients (17) and a restricted coefficient

[\textstyle \sum\limits_{E_o \gt E_{\min}}E_c^2 (E_o^2 - 1), \eqno (19)]

(where Emin is typically 1.3–1.5) are used as indicators of phase correctness. The tangent formula is used in phase refinement. Fig. 6 shows the flow chart for this procedure.

[Figure 6]
Figure 6
A flow chart for the Half-Bake computer program as applied to small proteins with atomic resolution data. (From Sheldrick, 1997[Sheldrick, G. M. (1997). Proceedings of the CCP4 Study Weekend. Recent Advances in Phasing, edited by K. S. Wilson, G. Davies, A. W. Ashton & S. Bailey, pp. 147-157. Warrington: Daresbury Laboratory.].)
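
The restricted coefficient (19) is a one-line sum. The sketch below is a direct transcription with an assumed default of Emin = 1.4; the function name and the sample values are hypothetical.

```python
def restricted_coefficient(e_obs, e_calc, e_min=1.4):
    """Equation (19): sum of Ec^2 (Eo^2 - 1) over reflections with Eo > Emin."""
    return sum(ec ** 2 * (eo ** 2 - 1.0)
               for eo, ec in zip(e_obs, e_calc) if eo > e_min)


# Only the first and third reflections exceed Emin and contribute.
value = restricted_coefficient([2.0, 1.0, 1.5], [1.0, 3.0, 2.0])
print(value)
```

Because only strong observed reflections enter the sum, a large value indicates that the calculated magnitudes are large where the observed ones are large.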

Both SnB and Half-Bake have solved structures with N > 1000 and have also become valuable tools in deriving heavy-atom substructures in proteins. In this case, the resolution limit can be substantially relaxed because even at 2 Å atoms such as Se are clearly resolved and the necessary atomicity is still present.

3. Solving protein structures at low resolution using direct methods

For reasons that should now be clear, there is no general solution of the phase problem at low resolution, but the following direct-methods (i.e. model-free) techniques have been explored: (i) maximum entropy, (ii) globular scattering factors, (iii) symbolic addition and (iv) electron microscopy and electron crystallography.

Other techniques such as sphere packings (Andersson, 1999[Andersson, K. M. (1999). J. Appl. Cryst. 32, 530-535.]; Andersson & Hovmöller, 1996[Andersson, K. M. & Hovmöller, S. (1996). Acta Cryst. D52, 1174-1180.]) are outside the scope of this paper and other methods are fully described elsewhere in this issue.

3.1. Maximum entropy

The maximum-entropy (ME) formalism was first applied to the phase problem by Bricogne (1984[Bricogne, G. (1984). Acta Cryst. A40, 410-445.]) and subsequently incorporated in a more general Bayesian statistical approach applied to macromolecules. For a review, see Gilmore (1986[Gilmore, C. J. (1986). Direct Methods for Solving Macromolecular Structures, edited by S. Fortier, pp. 317-321. Dordrecht: Kluwer.]). The ME method is not constrained to the use of Wilson statistics and is stable irrespective of data resolution; it is thus better able to deal with model-free ab initio structure determination at low resolution. Associated with the Bricogne formalism is likelihood as a figure of merit and this is also a resolution-independent indicator of phase correctness of great power.

For an example of the ME method applied to low-resolution electron diffraction data from membrane proteins, see Gilmore et al. (1996[Gilmore, C. J., Nicholson, W. V. & Dorset, D. L. (1996). Acta Cryst. A52, 937-946.]). In this work, two protein structures were solved in projection: Omp F porin and halorhodopsin.

3.1.1. Omp F porin

The structure of Omp F porin from the outer membrane of Escherichia coli (MW = 36 500 Da) was originally determined using images at 3.2 Å resolution by Sass et al. (1989[Sass, H. J., Büldt, G., Beckmann, E., Zemlin, F., Van Heel, M., Zeitler, E., Rosenbusch, J. P., Dorset, D. L. & Massalski, A. (1989). J. Mol. Biol. 209, 171-175.]). Their data were obtained at 100 kV from glucose-embedded samples on a liquid-helium-cooled superconducting cryomicroscope. Most of the diffracted power from these images was contained within a 6 Å limit and so ab initio phasing was carried out to this same limit. There were 42 unique reflections; the plane group of the projection is p31m with a unit cell of side a = 72 Å. The true map using the image-derived phases of Sass et al. is shown in Fig. 7(a). The best map derived from ME phasing is shown in Fig. 7(b). At this resolution, the preferred map has a basis set mean absolute phase error of only 9°. Apart from minor details, this map corresponds closely to that computed with all correct phase angles from the image data; the correlation coefficient is 0.94.

[Figure 7]
Figure 7
Potential maps for Omp F porin: (a) the true map using the image derived phases of Sass et al. (1989[Sass, H. J., Büldt, G., Beckmann, E., Zemlin, F., Van Heel, M., Zeitler, E., Rosenbusch, J. P., Dorset, D. L. & Massalski, A. (1989). J. Mol. Biol. 209, 171-175.]), (b) the best map derived from ME phasing. The mean phase error is 9° and the correlation coefficient between (a) and (b) is 0.94. (From Gilmore et al., 1996[Gilmore, C. J., Nicholson, W. V. & Dorset, D. L. (1996). Acta Cryst. A52, 937-946.].)

3.1.2. Halorhodopsin

Electron-diffraction amplitudes and electron-micrograph-derived crystallographic phases from halorhodopsin to 6 Å resolution were reported by Havelka et al. (1993[Havelka, W. A., Henderson, R., Heymann, J. A. W. & Oesterhelt, D. (1993). J. Mol. Biol. 234, 837-846.]) from frozen hydrated samples. The centrosymmetric tetragonal plane group is p4gm with unit-cell parameter a = 102 Å. Within the 6 Å resolution limit, this corresponds to 76 unique reflections. The true map is shown in Fig. 8(a). Using the ME method, 16 reflections to 9 Å were phased with only one incorrect indication; the corresponding map is shown in Fig. 8(b). The correlation coefficient between these two maps is 0.82.

[Figure 8]
Figure 8
Potential maps for halorhodopsin: (a) the true map, (b) the best ME map, with only one incorrect phase indication and a correlation coefficient with (a) of 0.82. (From Gilmore et al., 1996[Gilmore, C. J., Nicholson, W. V. & Dorset, D. L. (1996). Acta Cryst. A52, 937-946.].)

3.2. Globular scattering factors

Harker (1953[Harker, D. (1953). Acta Cryst. 6, 731-736.]) discussed the problem of normalizing data via the Wilson method when the resolution is lower than atomic. Wilson statistics only hold if the resolution limit (the minimum d spacing) is smaller than the shortest interatomic distance in the crystal. If this is not the case, then the expression

[\langle I \rangle _s = \textstyle \sum\limits_{j = 1}^N {f_j^2 } \eqno (20)]

used by Wilson (where s = sinθ/λ) has to be replaced by

[\langle I \rangle _s = \textstyle \sum\limits_g {F_g^2 }, \eqno (21)]

where Fg is a globular scattering factor. For a sphere,

[F_g (s)\, = {\textstyle \sum\limits_i^N} {f_i(s){{\sin 2\pi sr_i }\over {2\pi sr_i }}} \eqno (22)]

and for G globs in the unit cell,

[F_{\bf h}^{\rm calc} = \textstyle \sum\limits_{g = 1}^G F_g \exp(2\pi i{\bf h}\cdot{\bf r}_{g}). \eqno (23)]

Clearly, this reduces a cell with N atoms to one containing G globs. The associated phase relationships will reflect this by showing a large increase in the concentration parameter. This idea has been used extensively by Dorset (see, for example, Dorset & McCourt, 1999[Dorset, D. L. & McCourt, M. P. (1999). Z. Kristallogr. 214, 652-658.]) in conjunction with symbolic addition to solve a variety of structures at 10–20 Å resolution.
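
The following sketch evaluates the globular scattering factor of equation (22) exactly as it is written above and illustrates the gain in the concentration parameter when N atoms are replaced by G globs, using the 1/√N dependence of equation (3). The atomic scattering factors are passed in as numbers already evaluated at the chosen s; all names and sample values are hypothetical.

```python
import math


def glob_scattering_factor(f_values, radii, s):
    """Equation (22): sum_i f_i(s) sin(2 pi s r_i) / (2 pi s r_i).

    f_values: atomic scattering factors already evaluated at s.
    radii: distances r_i (Å) of the atoms from the centre of the glob.
    """
    total = 0.0
    for f_i, r_i in zip(f_values, radii):
        arg = 2.0 * math.pi * s * r_i
        total += f_i * (math.sin(arg) / arg if arg != 0.0 else 1.0)
    return total


def kappa_gain(n_atoms, n_globs):
    """Factor by which kappa rises for fixed E magnitudes when N becomes G."""
    return math.sqrt(n_atoms / n_globs)


# At s = 0 the glob scatters as the sum of its atoms.
print(glob_scattering_factor([6.0, 6.0], [0.0, 1.0], 0.0))
# At s = 0.25 the atom 1 Å from the centre is damped by sin(pi/2)/(pi/2).
fg = glob_scattering_factor([6.0, 6.0], [0.0, 1.0], 0.25)
print(fg)
# Replacing 3000 atoms by 30 globs raises kappa tenfold for the same E values.
print(kappa_gain(3000, 30))
```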

3.3. Globular structure factors and symbolic addition: beef liver catalase

As an example of this (and of the ME method) see Dorset & Gilmore (1999[Dorset, D. L. & Gilmore, C. J. (1999). Acta Cryst. A55, 448-456.]), which examines beef liver catalase in projection at 9 Å using room-temperature electron-diffraction data. The plane group is pgg, with unit-cell parameters a = 69.7, b = 177 Å. Both the ME formalism and symbolic addition coupled with the Sayre–Hughes equation were used. In addition to using likelihood, the Luzzati figure of merit ⟨Δρ⁴⟩min was also employed, where Δρ = ρ − [\overline \rho] (Luzzati et al., 1972[Luzzati, V., Tardieu, A. & Taupin, D. (1972). J. Mol. Biol. 64, 269-286.]). Note that the minimum value of this figure of merit corresponds to maps with a minimum dynamic range and maximum flatness (rather like entropy); this seems intuitively reasonable under low-resolution conditions.

The results of a symbolic addition calculation in which the best map was selected via ⟨Δρ⁴⟩min are shown in Fig. 9(a), which has a resolution of ∼9 Å. At first sight, Fig. 9(b), derived from ME calculations, shows no resemblance to Fig. 9(a), but it is a Babinet solution. Babinet solutions are those in which all the phase angles are shifted by π, i.e. φh → π + φh, and in real space the resulting map is the inverse of the unshifted one. Babinet solutions are not uncommon when phasing at low resolution in a model-free environment and care needs to be exercised. The Babinet of Fig. 9(b) is shown in Fig. 9(c) and the correspondence between this and Fig. 9(a) is obvious. Finally, the symbolic addition–Luzzati method is combined with the Babinet in Fig. 9(c) to give Fig. 9(d). For comparison, an image-derived solution at 23 Å using data from Akey & Edelstein (1983[Akey, C. W. & Edelstein, S. J. (1983). J. Mol. Biol. 163, 575-612.]) is shown in Fig. 10.

[Figure 9]
Figure 9
Potential maps for beef liver catalase at approximately 9 Å resolution: (a) using symbolic addition and selecting the map with the lowest value of ⟨Δρ⁴⟩, (b) derived from maximum-entropy calculations and selecting the map with the lowest value of ⟨Δρ⁴⟩, (c) the Babinet map of (b), (d) based on a subset of (c) using the same reflections as in (a). The crystallographic b axis is horizontal. (From Dorset & Gilmore, 1999[Dorset, D. L. & Gilmore, C. J. (1999). Acta Cryst. A55, 448-456.].)
[Figure 10]
Figure 10
Potential map for beef liver catalase from image-derived phases at 23 Å by Akey & Edelstein (1983[Akey, C. W. & Edelstein, S. J. (1983). J. Mol. Biol. 163, 575-612.]). (From Dorset & Gilmore, 1999[Dorset, D. L. & Gilmore, C. J. (1999). Acta Cryst. A55, 448-456.].)

3.4. The electron microscope and electron crystallography

The electron microscope is an invaluable tool in low-resolution imaging of biological macromolecules. It is the source of two sorts of data for crystallographic purposes.

  • (i) Phased reflections, where the phase information comes from the Fourier transform of electron-microscope images after suitable filtering. Usually the phases so derived correspond to intensities that have a significantly lower resolution than the diffraction data and there are some significant sources of error in image data arising from radiation damage, curvilinear paracrystalline distortion and transfer-function uncertainties, but they are invaluable.

  • (ii) The electron diffraction data, which are less problematic and have a higher resolution than those in (i). Clearly, to achieve optimum resolution we need to extend the image-derived phases to phase the diffraction intensities and this is a problem that direct methods can address.

Two sorts of situation arise in phase extension as follows.

  • (i) The image phase set is sufficiently large and well distributed in reciprocal space to permit an unambiguous phase-extension procedure without recourse to multi-solution methods, i.e. those that involve phase permutation.

  • (ii) When the basis set is small or in some way inadequate we have a branching problem. This problem arises when phases are selected without exploring the relevant phase space in sufficient detail, so that what appears to be an unambiguous phase choice is no such thing. The methods outlined in §3 can be employed here; for a survey of electron crystallography and these problems in an ME context, see Gilmore (1996[Gilmore, C. J. (1996). Acta Cryst. A52, 561-589.]).

3.5. The use of very low resolution reflections

Traditional wisdom dictates that very low angle reflections in protein crystallography are of minimal value and their use can prevent a successful structure solution. This is effectively refuted by Andersson (1999[Andersson, K. M. (1999). J. Appl. Cryst. 32, 530-535.]) and by the results presented above where the very low order reflections played a key role. To summarize his arguments: the solvent contribution to a given reflection depends on the difference between the electron density of the solvent and that of the protein. At very low resolution, the Babinet principle means that the phases of the solvent are shifted by π relative to the protein (see Fig. 11). The correlation coefficient is close to 100% to 15 Å resolution, but becomes effectively zero at less than 3 Å. This means that no bulk-solvent correction is needed when using only low-angle reflections. When data at low and high resolution are mixed, the magnitude of the solvent vector depends on the solvent–protein contrast and we can write the total structure factor |Fh|total as

[| {F_{\bf h}}|^{\rm total} = | {F_{\bf h}} |^p - k_s | {F_{\bf h}} |^p \exp ({- B_s \sin ^2 \theta /\lambda ^2 }), \eqno (24)]

where ks measures the density ratio of solvent and protein and Bs is the solvent temperature factor. Thus, properly handled, there is no reason to exclude low-order data from ab initio structure determination.
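The behaviour of equation (24) with resolution can be seen in a short numerical sketch. The function name and the values of ks and Bs used below are illustrative assumptions, not values taken from this paper; the only relation added is Bragg's law, which gives sin²θ/λ² = 1/(4d²) for a reflection of spacing d.

```python
import math

def babinet_total_amplitude(f_protein, d_spacing, k_s, b_s):
    """Equation (24): |F|total = |F|p - k_s |F|p exp(-B_s sin^2(theta)/lambda^2).

    f_protein : protein amplitude |F|p
    d_spacing : resolution d of the reflection in Angstrom
    k_s, b_s  : solvent scale and temperature factor (assumed values)
    """
    # Bragg's law: sin(theta)/lambda = 1/(2d)
    s2 = 1.0 / (4.0 * d_spacing ** 2)
    return f_protein * (1.0 - k_s * math.exp(-b_s * s2))

if __name__ == "__main__":
    k_s, b_s = 0.8, 200.0   # illustrative values only
    for d in (50.0, 15.0, 8.0, 3.0, 2.0):
        total = babinet_total_amplitude(100.0, d, k_s, b_s)
        print(f"d = {d:5.1f} A   |F|total = {total:7.2f}")
```

With these assumed parameters the solvent term removes most of the protein amplitude at the lowest resolution and is negligible by about 3 Å, which is the qualitative behaviour described above.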

Figure 11
Solvent and Babinet effects at very low resolution. |Fh|p is the contribution of the protein to the total structure factor |Fh|ps with a phase angle φh and |Fh|s is the solvent contribution with a phase, by Babinet's principle, of φh + π. (Taken from Andersson, 1999[Andersson, K. M. (1999). J. Appl. Cryst. 32, 530-535.].)

Acknowledgements

I wish to acknowledge invaluable and stimulating discussions with Klas Andersson and Doug Dorset and support from Eastman-Kodak (Rochester), EPSRC and BBSRC.

References

Akey, C. W. & Edelstein, S. J. (1983). J. Mol. Biol. 163, 575–612.
Andersson, K. M. (1999). J. Appl. Cryst. 32, 530–535.
Andersson, K. M. & Hovmöller, S. (1996). Acta Cryst. D52, 1174–1180.
Bricogne, G. (1984). Acta Cryst. A40, 410–445.
Cochran, W. (1955). Acta Cryst. 8, 473–478.
DeTitta, G. T., Langs, D. A., Edmonds, J. W. & Duax, W. L. (1975). Acta Cryst. A31, 472–479.
DeTitta, G. T., Weeks, C. M., Thuman, P., Miller, R. & Hauptman, H. A. (1994). Acta Cryst. A50, 203–210.
Dorset, D. L. & Gilmore, C. J. (1999). Acta Cryst. A55, 448–456.
Dorset, D. L. & McCourt, M. P. (1999). Z. Kristallogr. 214, 652–658.
Fortier, S. (1997). Editor. Direct Methods for Solving Macromolecular Structures. Dordrecht: Kluwer.
Giacovazzo, C. (1998). Direct Phasing in Crystallography. Fundamentals and Applications. Oxford University Press.
Gilmore, C. J. (1986). Direct Methods for Solving Macromolecular Structures, edited by S. Fortier, pp. 317–321. Dordrecht: Kluwer.
Gilmore, C. J. (1996). Acta Cryst. A52, 561–589.
Gilmore, C. J., Nicholson, W. V. & Dorset, D. L. (1996). Acta Cryst. A52, 937–946.
Harker, D. (1953). Acta Cryst. 6, 731–736.
Hauptman, H. A. (1975). Acta Cryst. A31, 680–687.
Havelka, W. A., Henderson, R., Heymann, J. A. W. & Oesterhelt, D. (1993). J. Mol. Biol. 234, 837–846.
Hughes, E. W. (1953). Acta Cryst. 6, 871.
Karle, J. & Hauptman, H. A. (1956). Acta Cryst. 9, 635–651.
Karle, J. & Karle, I. L. (1966). Acta Cryst. 21, 849–859.
Luzzati, V., Tardieu, A. & Taupin, D. (1972). J. Mol. Biol. 64, 269–286.
Read, R. J. (1986). Acta Cryst. A42, 140–149.
Sass, H. J., Büldt, G., Beckmann, E., Zemlin, F., Van Heel, M., Zeitler, E., Rosenbusch, J. P., Dorset, D. L. & Massalski, A. (1989). J. Mol. Biol. 209, 171–175.
Sayre, D. (1952). Acta Cryst. 5, 60–65.
Schenk, H. (1973). Acta Cryst. A29, 77–82.
Sheldrick, G. M. (1997). Proceedings of the CCP4 Study Weekend. Recent Advances in Phasing, edited by K. S. Wilson, G. Davies, A. W. Ashton & S. Bailey, pp. 147–157. Warrington: Daresbury Laboratory.
Weeks, C. M. & Miller, R. (1997). Proceedings of the CCP4 Study Weekend. Recent Advances in Phasing, edited by K. S. Wilson, G. Davies, A. W. Ashton & S. Bailey, pp. 139–146. Warrington: Daresbury Laboratory.
Wilson, A. J. C. (1949). Acta Cryst. 2, 318–321.
Woolfson, M. M. & Fan, H.-F. (1995). Physical and Non-Physical Methods of Solving Crystal Structures. Cambridge University Press.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.
