research papers
Direct methods and protein crystallography at low resolution
^{a}Department of Chemistry, University of Glasgow, Glasgow G12 8QQ, Scotland
^{*}Correspondence email: chris@chem.gla.ac.uk
The tools of modern direct methods are examined and their limitations for solving protein structures discussed. Direct methods need atomic-resolution data (1.1–1.2 Å) for structures of around 1000 atoms if no heavy atom is present. For low-resolution data, alternative approaches are necessary and these include maximum entropy, symbolic addition, Sayre's equation, group scattering factors and electron microscopy.

Keywords: direct methods; maximum entropy.
1. The tools of direct methods
Direct methods have evolved from the early 1950s to become the method of choice for solving small-molecule crystal structures from diffraction data. In this context, 'small' extends from around ten atoms to a small protein with 1000 or more atoms. In this section, the tools that direct methods use and their limitations are examined. This is necessarily brief; for a full description, see Giacovazzo (1998), Woolfson & Fan (1995) or Fortier (1997).

1.1. Normalization
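Intensity data are first normalized to give normalized structure factors or E magnitudes,
$$E_{h}^{2} = \frac{k\,F_{h}^{2}}{\varepsilon_{h}\sum_{j=1}^{N} f_{j}^{2}\,\exp(-2B\sin^{2}\theta/\lambda^{2})}, \tag{1}$$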
where k is a scale factor that puts the observed intensities F_{h}^{2} on an absolute scale, ε_{h} is the epsilon factor for reflection h and f_{j} is the atomic scattering factor for atom j. There are N atoms in the unit cell with an overall isotropic temperature factor B. B and k need to be determined, and this is done using Wilson's method (Wilson, 1949). This assumes that the atoms in the unit cell are uniformly and randomly distributed, an assumption that forms the basis of Wilson statistics. Obviously, in proteins and other biological macromolecules this is not the case; at the very least, there is an ordered protein and a disordered solvent volume, which really require different treatments. Nonetheless, it is possible to obtain useful values of k and B by this method.
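A minimal sketch of Wilson's method and of normalization (1), in Python with NumPy. It assumes that ⟨k I_{h}/(ε_{h}Σ_{j}f_{j}^{2})⟩ = exp(−2Bs²) within each resolution shell (s = sinθ/λ), fits ln⟨I/(εΣf²)⟩ against s² by least squares, and is checked on synthetic acentric intensities. The function names, the equal-count shell scheme and the test values are illustrative assumptions, not the procedure of any particular program.

```python
import numpy as np

def wilson_plot(s, intensity, eps, sum_f2, n_shells=20):
    """Estimate the scale k and overall isotropic B by Wilson's method.

    s      : sin(theta)/lambda for each reflection
    sum_f2 : sum over the N atoms of f_j(s)**2 for each reflection
    Assumes ln<I/(eps*sum_f2)> = -ln k - 2 B s^2 in each resolution shell.
    """
    s2 = np.asarray(s, float) ** 2
    ratio = np.asarray(intensity, float) / (np.asarray(eps, float) * np.asarray(sum_f2, float))
    # Shells holding roughly equal numbers of reflections
    edges = np.quantile(s2, np.linspace(0.0, 1.0, n_shells + 1))
    shell = np.clip(np.searchsorted(edges, s2, side="right") - 1, 0, n_shells - 1)
    x = np.array([s2[shell == i].mean() for i in range(n_shells)])
    y = np.array([np.log(ratio[shell == i].mean()) for i in range(n_shells)])
    slope, intercept = np.polyfit(x, y, 1)
    return np.exp(-intercept), -slope / 2.0          # (k, B)

def normalize(s, intensity, eps, sum_f2, k, B):
    """Squared E magnitudes from equation (1)."""
    s = np.asarray(s, float)
    return k * np.asarray(intensity, float) / (
        np.asarray(eps, float) * np.asarray(sum_f2, float) * np.exp(-2.0 * B * s**2))

# Synthetic data: exponentially distributed (acentric) intensities
rng = np.random.default_rng(0)
s = np.sqrt(rng.uniform(0.0, 0.25, 20000))           # s^2 up to 0.25, i.e. d_min = 1 A
eps = np.ones_like(s)
sum_f2 = np.full_like(s, 500.0)
k_true, B_true = 2.0, 15.0
I = eps * sum_f2 * np.exp(-2.0 * B_true * s**2) / k_true * rng.exponential(1.0, s.size)

k, B = wilson_plot(s, I, eps, sum_f2)
E2 = normalize(s, I, eps, sum_f2, k, B)
print(f"k = {k:.2f}, B = {B:.1f} A^2, <E^2> = {E2.mean():.3f}")
```

In real protein data the plot is far from linear at low resolution, for the solvent-related reasons given above, so in practice the fit is usually restricted to the higher-resolution shells.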
The distribution of E magnitudes depends on whether the structure is centrosymmetric or not, but it does not depend on structural complexity. In the non-centrosymmetric case, ∼37% of the normalized structure factors are expected to be >1.0, 1.8% >2.0 and only 0.01% >3.0.
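The quoted percentages follow from the acentric Wilson distribution, for which the fraction of reflections with |E| > x is exp(−x²); the corresponding centric result is erfc(x/√2). The Python sketch below evaluates both and, as a check that the result does not depend on the structure, compares the acentric value with E magnitudes computed for a hypothetical random P1 structure of 200 equal point atoms (a test structure of our own choosing).

```python
import math
import numpy as np

def frac_acentric(x):
    # Non-centrosymmetric case: P(|E| > x) = exp(-x^2)
    return math.exp(-x * x)

def frac_centric(x):
    # Centrosymmetric case: P(|E| > x) = erfc(x / sqrt(2))
    return math.erfc(x / math.sqrt(2.0))

for x in (1.0, 2.0, 3.0):
    print(f"|E| > {x}: acentric {100 * frac_acentric(x):.2f}%, "
          f"centric {100 * frac_centric(x):.2f}%")

# Random P1 structure of equal point atoms: E_h = N^(-1/2) sum_j exp(2 pi i h.x_j)
rng = np.random.default_rng(42)
n_atoms = 200
xyz = rng.random((n_atoms, 3))
hkl = rng.integers(-20, 21, size=(5000, 3))
hkl = hkl[np.any(hkl != 0, axis=1)]                 # drop the 000 reflection
E = np.abs(np.exp(2j * np.pi * hkl @ xyz.T).sum(axis=1)) / math.sqrt(n_atoms)
sim_frac = float(np.mean(E > 1.0))
print(f"simulated fraction with |E| > 1: {sim_frac:.3f}")
```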
1.2. Triplets
Each structure-factor magnitude F_{h} has an associated phase angle φ_{h}, which we wish to determine. Triplets are the fundamental phase relationships in direct methods and they take the form
$$\Phi_{h,k} = \varphi_{-h} + \varphi_{k} + \varphi_{h-k} \approx 0. \tag{2}$$
The indices of the three reflections sum to zero. Associated with each triplet is a concentration parameter κ_{h,k},
$$\kappa_{h,k} = \frac{2}{N^{1/2}}\,|E_{h}E_{k}E_{h-k}|, \tag{3}$$
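where N is the number of atoms in the unit cell (assumed equal, and excluding H atoms). The approximate equality in (2) implies a probabilistic relationship, and the Cochran distribution (Cochran, 1955) gives us the required formula,
$$P(\Phi_{h,k}) = \frac{1}{2\pi I_{0}(\kappa_{h,k})}\exp(\kappa_{h,k}\cos\Phi_{h,k}). \tag{4}$$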
I_{0} is the modified Bessel function of the first kind of order zero, and 1/[2πI_{0}(κ_{h,k})] is a normalizing term. Fig. 1 shows how the Bessel functions used in direct methods behave as a function of their argument. The Cochran distribution assumes the validity of Wilson statistics. Fig. 2 shows how the probability distribution (4) varies with the concentration parameter. The mode of the distribution is always zero, and as κ_{h,k} decreases the information content of the Cochran distribution also decreases, until at κ_{h,k} = 1 very little useful information can be obtained concerning the value of the triplet. κ_{h,k} decreases as 1/N^{1/2}: if the three E magnitudes in the triplet all have values of 2.5, then κ_{h,k} = 31.25/N^{1/2}, which falls to 1 at N ≈ 980. This corresponds to a limit of ∼1000 atoms in the unit cell.
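The numbers in the last paragraph can be reproduced with a few lines of Python. The sketch below evaluates the concentration parameter (3) and the Cochran distribution (4) using NumPy's modified Bessel function `np.i0`, checks that (4) integrates to one, and solves κ_{h,k} = 1 for N when all three E magnitudes equal 2.5. The function names are our own.

```python
import numpy as np

def kappa(E1, E2, E3, n_atoms):
    # Concentration parameter (3): 2 |E_h E_k E_{h-k}| / N^(1/2)
    return 2.0 * abs(E1 * E2 * E3) / np.sqrt(n_atoms)

def cochran_pdf(phi, kap):
    # Cochran distribution (4): exp(kappa cos Phi) / (2 pi I0(kappa))
    return np.exp(kap * np.cos(phi)) / (2.0 * np.pi * np.i0(kap))

def atom_limit(E, kappa_min=1.0):
    # N at which a triplet of three equal magnitudes E has kappa = kappa_min
    return (2.0 * E**3 / kappa_min) ** 2

phi = np.linspace(-np.pi, np.pi, 2000, endpoint=False)
dphi = phi[1] - phi[0]
for kap in (1.0, 3.0, 10.0):
    area = float(cochran_pdf(phi, kap).sum() * dphi)             # should be 1
    ratio = float(cochran_pdf(0.0, kap) / cochran_pdf(np.pi, kap))  # = exp(2 kappa)
    print(f"kappa = {kap:4.1f}: area = {area:.6f}, P(0)/P(pi) = {ratio:.1f}")

print(f"N limit for |E| = 2.5: {atom_limit(2.5):.0f}")
print(f"kappa for |E| = 2.5 and N = 1000: {kappa(2.5, 2.5, 2.5, 1000):.2f}")
```

The ratio P(0)/P(π) = exp(2κ) gives a quick feel for Fig. 2: at κ = 1 a triplet value near zero is only about seven times more probable than one near π.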
1.3. Quartets
Quartets are the logical extension of triplets and involve four phases instead of three,
$$\Phi_{q} = \varphi_{h} + \varphi_{k} + \varphi_{l} + \varphi_{-h-k-l}. \tag{5}$$
The distribution (Schenk, 1973; Hauptman, 1975) is more complex than that of the triplet. Defining the principal terms
$$|E_{h}|,\quad |E_{k}|,\quad |E_{l}|,\quad |E_{h+k+l}| \tag{6}$$
and the unique cross-terms
$$|E_{h+k}|,\quad |E_{h+l}|,\quad |E_{k+l}|, \tag{7}$$
the required distribution is
where L is a normalizing term (usually determined numerically),
Three sorts of quartet can be identified as follows.

1.4. The tangent formula
The tangent formula (Karle & Hauptman, 1956) is a key formula in direct methods that lets us refine phase values and determine new ones. Consider the situation in which we have a series of triplets with a common phase φ_{h}. They can be written
$$\varphi_{h} \approx \varphi_{k_{j}} + \varphi_{h-k_{j}}, \qquad j = 1, \dots, r. \tag{10}$$
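Consider also a situation in which all the phases on the right-hand side of (10) are known, at least approximately; the tangent formula then gives us an estimate for φ_{h} which in its simplest form is
$$\tan\varphi_{h} = \frac{\sum_{j}|E_{k_{j}}E_{h-k_{j}}|\,\sin(\varphi_{k_{j}} + \varphi_{h-k_{j}})}{\sum_{j}|E_{k_{j}}E_{h-k_{j}}|\,\cos(\varphi_{k_{j}} + \varphi_{h-k_{j}})}. \tag{11}$$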
It can be extended to include quartets of all types and various weighting schemes which help impose stability on a formula that can be prone to instabilities.
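A minimal Python sketch of the simplest tangent formula (11). `np.arctan2` takes the signs of numerator and denominator into account, so the estimate lands in the correct quadrant; the length of the weighted resultant, returned alongside, is proportional to the reliability index usually called α_{h}. Weighting by |E_{k}E_{h−k}| rather than by κ_{h,k} changes only the common factor 2|E_{h}|/N^{1/2} and so leaves the phase estimate unchanged. The argument names are illustrative.

```python
import numpy as np

def tangent_formula(E_k, phi_k, E_hk, phi_hk):
    """Estimate phi_h from r triplets phi_h ~ phi_k + phi_{h-k} (equation 11).

    E_k, E_hk     : magnitudes |E_k| and |E_{h-k}| for each contributing triplet
    phi_k, phi_hk : current phases (radians) of the same reflections
    Returns (phi_h in [0, 2 pi), length of the weighted resultant).
    """
    w = np.abs(np.asarray(E_k, float) * np.asarray(E_hk, float))
    t = np.asarray(phi_k, float) + np.asarray(phi_hk, float)
    S = np.sum(w * np.sin(t))
    C = np.sum(w * np.cos(t))
    return float(np.mod(np.arctan2(S, C), 2.0 * np.pi)), float(np.hypot(S, C))

# Three equally weighted triplets whose phase sums (0.4, 0.5, 0.6 rad) scatter about 0.5 rad
phi_h, resultant = tangent_formula([2.0, 2.0, 2.0], [0.1, 0.2, 0.3],
                                   [1.5, 1.5, 1.5], [0.3, 0.3, 0.3])
print(f"phi_h = {phi_h:.3f} rad, resultant = {resultant:.3f}")
```

A naive `np.arctan(S / C)` would fold a phase near π + 0.5 back to 0.5; using `arctan2` avoids that.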
1.5. Symbolic addition
The triplets (and quartets) form a set of linear equations relating phase values. The technique of symbolic addition (Karle & Karle, 1966) assigns algebraic symbols to a small number (typically 4–8) of phases that have large associated E magnitudes and which interact strongly through phase relationships. Triplets and quartets are used to determine 50–100 new phases as functions of these symbols; this is the process of symbolic addition. The symbols are converted into numerical values using relationships between them made manifest by the symbolic addition procedure, or by giving unassigned symbols permuted values in the range 0–2π. The phases are then extended and refined using either the tangent formula or the Sayre equation (see §1.7).
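Giving unassigned symbols permuted values amounts to enumerating a small grid of trial phase sets. The Python sketch below assumes three symbols a, b and c, each given the four quadrant values π/4, 3π/4, 5π/4 and 7π/4 (a common choice for non-centrosymmetric structures), and evaluates a few hypothetical symbolic phase expressions, each an integer combination of the symbols plus a constant. The reflection labels and expressions are invented for illustration.

```python
import itertools
import math

# Hypothetical output of symbolic addition: phase = c_a*a + c_b*b + c_c*c + constant
PHASE_EXPRESSIONS = {
    "h1": ((1, 0, 0), 0.0),        # phi = a
    "h2": ((0, 1, 0), 0.0),        # phi = b
    "h3": ((1, 1, 0), 0.0),        # phi = a + b
    "h4": ((1, -1, 1), math.pi),   # phi = a - b + c + pi
}
QUADRANT_VALUES = (math.pi / 4, 3 * math.pi / 4, 5 * math.pi / 4, 7 * math.pi / 4)

def trial_phase_sets(expressions, values=QUADRANT_VALUES, n_symbols=3):
    """Yield (symbol values, {label: phase in [0, 2 pi)}) for every permutation of symbol values."""
    for combo in itertools.product(values, repeat=n_symbols):
        phases = {
            label: (sum(c * v for c, v in zip(coeffs, combo)) + const) % (2 * math.pi)
            for label, (coeffs, const) in expressions.items()
        }
        yield combo, phases

trial_sets = list(trial_phase_sets(PHASE_EXPRESSIONS))
print(len(trial_sets), "trial phase sets")   # 4**3 = 64 combinations
```

Each trial set would then be extended and refined, and the results ranked with figures of merit; the number of sets grows as 4^n with n symbols, which is one reason only a handful of symbols is used.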
Symbolic addition is now little used for solving small-molecule structures; it has been superseded by methods that are much easier to automate. It does, however, have the virtue of stability when used with macromolecules at low resolution; this is explored further in §3.2.
1.6. The minimal principle
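The mode of the Cochran distribution for triplets is always zero. However, the expected value of cos Φ_{h,k} can be computed as
$$\langle\cos\Phi_{h,k}\rangle = \frac{I_{1}(\kappa_{h,k})}{I_{0}(\kappa_{h,k})}, \tag{12}$$
where I_{1} is the modified Bessel function of the first kind of order one.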
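This expression gives rise to the minimal function (DeTitta et al., 1994)
$$R(\Phi_{3}) = \frac{\sum_{h,k}\kappa_{h,k}\,\left[\cos T_{h,k} - I_{1}(\kappa_{h,k})/I_{0}(\kappa_{h,k})\right]^{2}}{\sum_{h,k}\kappa_{h,k}}, \tag{13}$$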
where cos T_{h,k} is the cosine of the triplet computed from the current phases. The function R(Φ_{3}) serves two purposes: (i) as a formula to refine and estimate new phases, by minimizing the differences between the computed cosines of the triplets and their expected values, and (ii) as the minimal principle, which uses (13) to define the best phase set, i.e. as a figure of merit.
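A minimal Python sketch of the minimal function (13). NumPy provides I_{0} but not I_{1}, so the ratio I_{1}(κ)/I_{0}(κ) in (12) is computed here from the integral representation I_{n}(κ) = (1/2π)∫exp(κ cos t) cos(nt) dt over one period, for which the rectangle rule on a uniform periodic grid is very accurate. The triplet convention T_{h,k} = φ_{k} + φ_{h−k} − φ_{h} assumes φ_{−h} = −φ_{h} (no anomalous scattering); the reflection labels and κ values are hypothetical.

```python
import numpy as np

def bessel_ratio(kap, n_grid=1024):
    """I1(kappa)/I0(kappa) via I_n(x) = (1/2pi) * integral_{-pi}^{pi} exp(x cos t) cos(n t) dt."""
    t = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    kap = np.atleast_1d(np.asarray(kap, float))[:, None]
    w = np.exp(kap * (np.cos(t) - 1.0))      # common factor exp(-kappa) cancels in the ratio
    return (w * np.cos(t)).mean(axis=1) / w.mean(axis=1)

def minimal_function(phases, triplets, kappas):
    """R(Phi_3) of equation (13).

    phases   : dict label -> phase (radians)
    triplets : list of (h, k, h_minus_k) labels; T = phi_k + phi_{h-k} - phi_h
    kappas   : concentration parameter of each triplet
    """
    kappas = np.asarray(kappas, float)
    cos_t = np.array([np.cos(phases[k] + phases[hk] - phases[h]) for h, k, hk in triplets])
    expected = bessel_ratio(kappas)
    return float(np.sum(kappas * (cos_t - expected) ** 2) / np.sum(kappas))

phases = {"h": 0.3, "k": 0.1, "h-k": 0.2, "h2": 1.0, "k2": 2.5}
triplets = [("h", "k", "h-k"), ("h2", "k2", "h-k")]
kappas = [2.0, 0.8]
print(f"I1(1)/I0(1) = {bessel_ratio(1.0)[0]:.5f}")
print(f"R(Phi_3) = {minimal_function(phases, triplets, kappas):.4f}")
```

With SciPy available, `scipy.special.i1(k) / scipy.special.i0(k)` would give the same ratio directly.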
1.7. The Sayre equation
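The Sayre equation (Sayre, 1952) is algebraic rather than probabilistic in origin and is derived from the expression for the electron density and its square. For a structure of equal, resolved atoms,
$$F_{h} = \frac{\theta_{h}}{V}\sum_{k}F_{k}F_{h-k}, \tag{14}$$
where V is the unit-cell volume and θ_{h} is the ratio of the scattering factor of an atom to that of the 'squared' atom. In terms of E magnitudes this takes the form of the Sayre–Hughes (Hughes, 1953) equation,
$$E_{h} = N^{1/2}\,\langle E_{k}E_{h-k}\rangle_{k}. \tag{15}$$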
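The algebraic character of the Sayre–Hughes equation can be checked on a toy example. The Python sketch below builds a hypothetical one-dimensional cell of N equal point atoms placed on distinct points of an M-point grid (our stand-in for the 'resolved atoms' condition), computes E_{h} = N^{−1/2}Σ_{j}exp(2πihx_{j}) over one full period of h, and confirms that E_{h} = N^{1/2}⟨E_{k}E_{h−k}⟩_{k} holds to rounding error when the average runs over that full period. With a finite set of measured reflections the average is incomplete, and the equation then holds only approximately.

```python
import numpy as np

M, n_atoms = 64, 5                                   # grid points per cell, number of atoms
rng = np.random.default_rng(1)
sites = rng.choice(M, size=n_atoms, replace=False)   # distinct grid points: atoms resolved
x = sites / M                                        # fractional coordinates

h = np.arange(M)                                     # one full period of indices
E = np.exp(2j * np.pi * np.outer(h, x)).sum(axis=1) / np.sqrt(n_atoms)

# Sayre-Hughes: E_h = N^(1/2) <E_k E_{h-k}>_k, averaging over every k in the period
k = np.arange(M)
E_sayre = np.array([np.sqrt(n_atoms) * np.mean(E[k] * E[(hh - k) % M]) for hh in range(M)])

print("max |E_h - Sayre-Hughes estimate| =", np.max(np.abs(E - E_sayre)))
```

The identity holds because averaging exp[2πik(x_{j} − x_{l})] over a full period of k vanishes unless j = l, which requires the atoms to sit at distinct positions.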
The Sayre equation can be used in the same way as the tangent formula, but has a more general validity and is not constrained to use only large structure-factor magnitudes.

1.8. Figures of merit
In general, direct methods are multisolutional: they give rise to multiple phase sets and we need to select those which are most likely to give useful structural information. Figures of merit serve this purpose and are used to rank phase sets. There are numerous such indicators, including the following.
(i) The minimal function (13). An optimal phase set will have a minimum value of R(Φ_{3}).
(ii) The negative quartet figure of merit NQEST (DeTitta et al., 1975), computed over all those quartets assumed negative using (8). An optimal phase set should have a minimum value of NQEST.
Usually, several figures of merit are calculated for a given phase set and these are combined into an overall figure of merit, the CFOM.
1.9. Correlation coefficients
Let E_{o} be the observed E magnitude and E_{c} the value calculated from, for example, a variant of the tangent formula, and let w be the associated weight. For a set of such magnitudes we can then compute a correlation coefficient (CC), which can take many forms. A useful expression from Read (1986) is
Correlation coefficients lie in the range −1 ≤ CC ≤ 1 and can be used as figures of merit.
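A minimal Python sketch, assuming the common weighted Pearson form of the correlation coefficient between E_{o} and E_{c}; this may differ in detail from Read's expression. For unit weights it reduces to the ordinary Pearson coefficient returned by `np.corrcoef`. The function name and test magnitudes are our own.

```python
import numpy as np

def weighted_cc(E_obs, E_calc, w=None):
    """Weighted Pearson correlation coefficient between observed and calculated E magnitudes."""
    Eo = np.asarray(E_obs, float)
    Ec = np.asarray(E_calc, float)
    w = np.ones_like(Eo) if w is None else np.asarray(w, float)
    sw = w.sum()
    num = sw * np.sum(w * Eo * Ec) - np.sum(w * Eo) * np.sum(w * Ec)
    den = np.sqrt((sw * np.sum(w * Eo**2) - np.sum(w * Eo) ** 2)
                  * (sw * np.sum(w * Ec**2) - np.sum(w * Ec) ** 2))
    return float(num / den)

E_obs = np.array([2.1, 1.7, 1.3, 0.9, 0.4])
E_calc = np.array([1.9, 1.8, 1.1, 1.0, 0.2])
print(f"CC (unit weights)     = {weighted_cc(E_obs, E_calc):.3f}")
print(f"CC (weights = E_obs)  = {weighted_cc(E_obs, E_calc, w=E_obs):.3f}")
```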
1.10. E maps
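So far our discussion has involved reciprocal-space quantities; the transform into real space is carried out using E magnitudes via E maps,
$$E(\mathbf{r}) = \frac{1}{V}\sum_{h}|E_{h}|\exp(i\varphi_{h})\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{r}). \tag{16}$$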
The use of E magnitudes and the limits we shall impose on the reflections entering the summation in (16) mean that the map is only an approximation to the electron density (at the very least, there are serious series-termination errors), but it is hopefully sufficient to reveal structural features so that model building can begin.
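A one-dimensional illustration of an E map, as a Python sketch under simplifying assumptions of our own: three hypothetical equal point atoms, only the 15 lowest-order reflections (an artificial resolution cut-off) and exact phases. Each reflection h is combined with its Friedel mate −h, giving terms 2|E_{h}|cos(2πhx − φ_{h}); the F(000) term and the 1/V factor are omitted. The three highest maxima fall within a grid step or two of the atomic positions, although series termination leaves ripples between them.

```python
import numpy as np

def e_map_1d(E_mag, phi, h, n_grid=200):
    """E(x) = sum_{h>0} 2|E_h| cos(2 pi h x - phi_h); h and -h combined, F(000) omitted."""
    x = np.arange(n_grid) / n_grid
    rho = (2.0 * E_mag * np.cos(2.0 * np.pi * np.outer(x, h) - phi)).sum(axis=1)
    return x, rho

atoms = np.array([0.10, 0.35, 0.70])             # hypothetical atomic positions
h = np.arange(1, 16)                             # resolution cut-off: 15 reflections
E = np.exp(2j * np.pi * np.outer(h, atoms)).sum(axis=1) / np.sqrt(atoms.size)
x, rho = e_map_1d(np.abs(E), np.angle(E), h)

# Local maxima on the periodic grid; keep the three highest as atom candidates
is_peak = (rho > np.roll(rho, 1)) & (rho > np.roll(rho, -1))
idx = np.flatnonzero(is_peak)
top = np.sort(x[idx[np.argsort(rho[idx])[-3:]]])
print("highest peaks at x =", top)
```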
1.11. Simplifying the problem
The problem of direct phasing can be simplified by the following heuristic rules.

2. Using the tools to solve crystal structures
There are numerous procedures for solving structures via direct methods. A typical, though somewhat simplified, sequence is as follows.

2.1. What is needed for this method to work?
The procedure is usually routine if the following criteria are met.

This latter problem can be overcome by using atomicity as a stronger constraint, and this gives rise to the computer programs Shake-and-Bake (SnB; Weeks & Miller, 1997) and Half-Bake (Sheldrick, 1997).
2.2. Shake-and-Bake (SnB)
SnB starts in a conventional way by normalizing the data and generating triplet (and optionally negative quartet) invariants. To assign initial phases, the random-phase procedure is extended into real space: trial structures are generated by placing random atoms in the unit cell subject to a distance constraint, i.e. atoms may not be closer than 1.5 Å. No angle constraints are applied. A Fourier transform gives phase values which, because of the distance constraint, tend to have lower errors than those from a simple random-phasing algorithm. Note that atomicity is imposed from the very beginning of this procedure.
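The random-atom start can be sketched in Python as simple rejection sampling. The sketch assumes an orthorhombic cell and space group P1, places atoms one at a time at random fractional coordinates, and rejects any position within 1.5 Å of an atom already placed (minimum-image convention). SnB itself handles general cells and symmetry, and its exact procedure may differ; the function names and test cell are hypothetical.

```python
import numpy as np

def min_distance(frac, cell):
    """Shortest interatomic distance (A) in an orthorhombic P1 cell, minimum-image convention."""
    frac = np.asarray(frac, float)
    d = frac[:, None, :] - frac[None, :, :]
    d -= np.round(d)
    dist = np.linalg.norm(d * np.asarray(cell, float), axis=-1)
    np.fill_diagonal(dist, np.inf)
    return float(dist.min())

def random_trial_structure(n_atoms, cell, d_min=1.5, rng=None, max_tries=100_000):
    """Random fractional coordinates for n_atoms atoms, no two closer than d_min (A)."""
    rng = np.random.default_rng() if rng is None else rng
    cell = np.asarray(cell, float)
    placed = []
    for _ in range(max_tries):
        if len(placed) == n_atoms:
            break
        trial = rng.random(3)
        if placed:
            d = np.asarray(placed) - trial
            d -= np.round(d)                                 # minimum image
            if np.min(np.linalg.norm(d * cell, axis=1)) < d_min:
                continue                                     # reject: too close
        placed.append(trial)
    if len(placed) < n_atoms:
        raise RuntimeError("could not place all atoms; cell too crowded for d_min")
    return np.asarray(placed)

cell = (20.0, 25.0, 30.0)
atoms = random_trial_structure(50, cell, rng=np.random.default_rng(3))
print(atoms.shape, f"shortest distance = {min_distance(atoms, cell):.2f} A")
```

A structure-factor calculation on these atoms would then supply the starting phases.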
The random phases are now refined using either tangent methods or a grid search based on the minimal function, in which each phase is modified by the phase shift that minimizes R(Φ_{3}). A new map is generated from these refined phases and subjected to a peak search (again imposing atomicity) in which N peaks are selected (for an N-atom problem), subject to the same distance constraint that was used in the initial phase generation. The new peaks give new phases, which are then refined in a cyclical fashion. At convergence, R(Φ_{3}) is stored, a new phase set is generated and the procedure is repeated.
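The peak-search step, which keeps the N highest peaks subject to the same 1.5 Å constraint, can be sketched as a greedy selection. The Python sketch assumes the map peaks have already been located (fractional coordinates plus heights) in an orthorhombic cell and accepts peaks in order of decreasing height unless they lie too close to a peak already accepted. This greedy rule is our assumption, not a documented detail of SnB, and the peak list is hypothetical.

```python
import numpy as np

def select_peaks(frac_xyz, heights, n_atoms, cell, d_min=1.5):
    """Indices of up to n_atoms peaks, highest first, no two closer than d_min (A)."""
    frac_xyz = np.asarray(frac_xyz, float)
    cell = np.asarray(cell, float)
    chosen = []
    for i in np.argsort(heights)[::-1]:              # decreasing peak height
        if chosen:
            d = frac_xyz[chosen] - frac_xyz[i]
            d -= np.round(d)                         # minimum-image convention
            if np.min(np.linalg.norm(d * cell, axis=1)) < d_min:
                continue                             # too close to an accepted peak
        chosen.append(int(i))
        if len(chosen) == n_atoms:
            break
    return chosen

# Hypothetical peak list in a 20 A cubic cell; the second peak is only 1 A from the first
peaks = [[0.00, 0.00, 0.00], [0.05, 0.00, 0.00], [0.50, 0.50, 0.50], [0.25, 0.75, 0.10]]
heights = [10.0, 9.0, 8.0, 7.0]
print(select_peaks(peaks, heights, n_atoms=2, cell=(20.0, 20.0, 20.0)))   # [0, 2]
```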
As phase sets accumulate, one looks for a set which has a much lower value of R(Φ_{3}) than the others. This is usually an indication of phase correctness and the atoms corresponding to this solution form the starting point of a traditional completion.
2.3. Half-Bake
Half-Bake uses the ideas of SnB in a different way but still requires and imposes atomicity. Instead of the minimal function, correlation coefficients (17) and a restricted coefficient
(where E_{min} is typically 1.3–1.5) are used as indicators of phase correctness. The tangent formula is used for phase refinement. Fig. 6 shows the flow chart for this procedure.
Both SnB and Half-Bake have solved structures with N > 1000 and have also become valuable tools for determining heavy-atom substructures in proteins. In this case, the resolution limit can be substantially relaxed because, even at 2 Å, atoms such as Se are clearly resolved and the necessary atomicity is still present.
3. Solving protein structures at low resolution using direct methods
For reasons that should now be clear, there is no general solution of the protein phase problem at low resolution, but the following direct-methods (i.e. model-free) techniques have been explored: (i) maximum entropy, (ii) globular scattering factors, (iii) symbolic addition and (iv) electron microscopy and electron crystallography. Other techniques such as sphere packings (Andersson, 1999; Andersson & Hovmöller, 1996) are outside the scope of this paper, and other methods are fully described elsewhere in this issue.
3.1. Maximum entropy
The maximum-entropy (ME) formalism was first applied to the phase problem by Bricogne (1984) and subsequently incorporated in a more general Bayesian statistical approach applied to macromolecules. For a review, see Gilmore (1986). The ME method is not constrained to the use of Wilson statistics and is stable irrespective of data resolution; it is thus better able to deal with model-free ab initio phasing at low resolution. The Bricogne formalism also provides the likelihood as a figure of merit, a powerful and resolution-independent indicator of phase correctness.

For an example of the ME method applied to low-resolution electron diffraction data from membrane proteins, see Gilmore et al. (1996). In this work, two protein structures were solved in projection: Omp F porin and halorhodopsin.
3.1.1. Omp F porin
The structure of Omp F porin from the outer membrane of Escherichia coli (MW = 36 500 Da) was originally determined using images at 3.2 Å resolution by Sass et al. (1989). Their data were obtained at 100 kV from glucose-embedded samples on a liquid-helium-cooled superconducting cryomicroscope. Most of the diffracted power from these images was contained within a 6 Å limit, so ab initio phasing was carried out to this same limit. There were 42 unique reflections; the plane group of the projection is p31m, with a unit cell of side a = 72 Å. The true map using the image-derived phases of Sass et al. is shown in Fig. 7(a). The best map derived from ME phasing is shown in Fig. 7(b). At this resolution, the preferred map has a mean absolute phase error for the basis set of only 9°. Apart from minor details, this map corresponds closely to that computed with all the correct phases from the image data; the map correlation coefficient is 0.94.
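Phase errors such as the 9° quoted above must be computed modulo 360°: a phase of 350° compared with 10° is in error by 20°, not 340°. A minimal Python sketch, assuming both phase sets already refer to the same origin and enantiomorph (in practice the permissible origin shifts of the plane group must be allowed for first); the function name is our own.

```python
import numpy as np

def mean_abs_phase_error(phi_a, phi_b, degrees=True):
    """Mean |phi_a - phi_b| with each difference wrapped into [-180, 180) deg or [-pi, pi)."""
    period = 360.0 if degrees else 2.0 * np.pi
    d = np.asarray(phi_a, float) - np.asarray(phi_b, float)
    d = (d + period / 2.0) % period - period / 2.0   # wrap to the nearest equivalent difference
    return float(np.mean(np.abs(d)))

print(mean_abs_phase_error([350.0, 10.0, 90.0], [10.0, 350.0, 100.0]))
```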
3.1.2. Halorhodopsin
Electron-diffraction amplitudes and electron-micrograph-derived crystallographic phases from halorhodopsin to 6 Å resolution were reported by Havelka et al. (1993) from frozen hydrated samples. The plane group is the centrosymmetric tetragonal group p4gm, with unit-cell parameter a = 102 Å. Within the 6 Å resolution limit, this corresponds to 76 unique reflections. The true map is shown in Fig. 8(a). Using the ME method, 16 reflections to 9 Å were phased with only one incorrect phase; the corresponding map is shown in Fig. 8(b). The map correlation coefficient between these two maps is 0.82.
3.2. Globular scattering factors
Harker (1953) discussed the problem of normalizing data via the Wilson method when its resolution was less than atomic. Wilson statistics only hold if the resolution is less than the shortest interatomic distance in the crystal. If this is not the case, then the expression
used by Wilson (where s = sinθ/λ) has to be replaced by
where F_{g} is a globular scattering factor. For a sphere,
and for G globs in the unit cell,
Clearly, this reduces a cell with N atoms to one containing G globs. The associated phase relationships will reflect this by showing a large increase in the concentration parameter. This idea has been used extensively by Dorset (see, for example, Dorset & McCourt, 1999) in conjunction with symbolic addition to solve a variety of structures at 10–20 Å resolution.
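The gain from replacing atoms by globs can be sketched numerically. The Python sketch below assumes the glob is a uniform sphere of radius R, whose scattering factor normalized to 1 at s = 0 is 3(sin x − x cos x)/x³ with x = 4πRs and s = sinθ/λ, and assumes equal globs so that (3) applies with N replaced by G. The radius, atom count and glob count are illustrative values, not taken from this paper.

```python
import numpy as np

def sphere_glob_factor(s, radius):
    """Scattering factor of a uniform sphere (radius in A), normalized to 1 at s = 0.

    F_g = 3 (sin x - x cos x) / x^3 with x = 4 pi R s and s = sin(theta)/lambda.
    """
    x = 4.0 * np.pi * radius * np.atleast_1d(np.asarray(s, float))
    small = x < 1e-3
    xs = np.where(small, 1.0, x)                        # placeholder avoids 0/0
    exact = 3.0 * (np.sin(xs) - xs * np.cos(xs)) / xs**3
    return np.where(small, 1.0 - x**2 / 10.0, exact)    # Taylor series near x = 0

def kappa_gain(n_atoms, n_globs):
    """Factor by which kappa in (3) grows when N equal atoms become G equal globs."""
    return np.sqrt(n_atoms / n_globs)

R = 15.0                                                # illustrative glob radius, A
s_zero = 4.493409457909064 / (4.0 * np.pi * R)          # first zero: tan x = x
print(f"first zero of F_g at d = {1.0 / (2.0 * s_zero):.1f} A")
print(f"kappa gain for 4000 atoms -> 8 globs: {kappa_gain(4000, 8):.1f}")
```

Since κ scales as 1/N^{1/2}, collapsing thousands of atoms into a handful of globs raises κ by more than an order of magnitude, which is the increase referred to above; the price is that the model is only valid at resolutions where F_g is still well above its first zero.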
3.3. Globular structure factors and symbolic addition: beef liver catalase
As an example of this (and of the ME method), see Dorset & Gilmore (1999), which examines beef liver catalase in projection at 9 Å using room-temperature electron-diffraction data. The plane group is pgg, with unit-cell parameters a = 69.7, b = 177 Å. Both the ME formalism and symbolic addition coupled with the Sayre–Hughes equation were used. In addition to the likelihood, the Luzzati figure of merit 〈Δρ^{4}〉_{min} was also employed, where Δρ = ρ − 〈ρ〉 (Luzzati et al., 1972). Note that the minimum value of this figure of merit corresponds to maps of maximum flatness (rather like entropy); this seems intuitively reasonable under low-resolution conditions.
The results of a symbolic addition calculation in which the best map was selected via 〈Δρ^{4}〉_{min} are shown in Fig. 9(a), which has a resolution of ∼9 Å. At first sight, Fig. 9(b), derived from ME calculations, shows no resemblance to Fig. 9(a), but it is a Babinet solution. Babinet solutions are those in which all the phase angles are shifted by π, i.e. φ_{h} → π + φ_{h}, so that in real space the map is the contrast-reversed inverse of the correct one. Babinet solutions are not uncommon when phasing at low resolution in a model-free environment, and care needs to be exercised. The Babinet inverse of Fig. 9(b) is shown in Fig. 9(c), and its correspondence with Fig. 9(a) is obvious. Finally, the symbolic addition–Luzzati result is combined with the Babinet-inverted map of Fig. 9(c) to give Fig. 9(d). For comparison, an image-derived solution at 23 Å using data from Akey & Edelstein (1983) is shown in Fig. 10.
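Why can a Babinet solution score as well as the correct one? In the Python sketch below, a hypothetical one-dimensional map built from random structure factors, shifting every phase by π simply negates the map (apart from the omitted F(000) term) while leaving the amplitudes untouched. Any figure of merit that depends only on the amplitudes, or on even moments of Δρ such as the Luzzati 〈Δρ^{4}〉, therefore cannot tell the two apart. The sketch also illustrates the flatness argument: for a fixed 〈Δρ^{2}〉, 〈Δρ^{4}〉 ≥ 〈Δρ^{2}〉², with equality only for a two-level map.

```python
import numpy as np

def map_1d(F, h, n_grid=128):
    """rho(x) = sum_{h>0} 2|F_h| cos(2 pi h x - phi_h); F(000) and 1/V omitted."""
    x = np.arange(n_grid) / n_grid
    return (2.0 * np.abs(F) * np.cos(2.0 * np.pi * np.outer(x, h) - np.angle(F))).sum(axis=1)

def luzzati_m4(rho):
    """Luzzati figure of merit <(rho - <rho>)^4>."""
    d = rho - rho.mean()
    return float(np.mean(d**4))

rng = np.random.default_rng(7)
h = np.arange(1, 9)
F = rng.normal(size=h.size) + 1j * rng.normal(size=h.size)
rho = map_1d(F, h)
rho_babinet = map_1d(F * np.exp(1j * np.pi), h)        # every phase shifted by pi

print("Babinet map equals -rho:", np.allclose(rho_babinet, -rho))
print(f"<drho^4>: {luzzati_m4(rho):.3f} (original), {luzzati_m4(rho_babinet):.3f} (Babinet)")

# Flatness: a two-level map reaches the lower bound <drho^4> = <drho^2>^2
two_level = np.where(np.arange(128) < 64, 1.0, -1.0)
print(luzzati_m4(two_level), float(np.mean(two_level**2)) ** 2)
```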
3.4. The electron microscope and electron crystallography
The electron microscope is an invaluable tool in the low-resolution imaging of biological macromolecules. It is the source of two sorts of data for crystallographic purposes.

Two sorts of situation arise in phase extension as follows.
3.5. The use of very low resolution reflections
Traditional wisdom dictates that very low-angle reflections in protein crystallography are of minimal value and that their use can prevent a successful structure solution. This is effectively refuted by Andersson (1999) and by the results presented above, where the very low-order reflections played a key role. To summarize his arguments: the solvent contribution to a given reflection depends on the difference between the electron density of the solvent and that of the protein. At very low resolution, the Babinet principle means that the phases of the solvent contribution are shifted by π relative to those of the protein (see Fig. 11). The solvent contribution is close to 100% to 15 Å resolution, but becomes effectively zero below 3 Å. This means that no bulk-solvent correction is needed when using only low-angle reflections. When mixing data at low and high resolution, the magnitude of the solvent vector depends on the solvent–protein contrast, and we can write the total structure factor F_{h}^{total} as
where k_{s} measures the density ratio of solvent and protein and B_{s} is the solvent temperature factor. Thus, properly handled, there is no reason to exclude low-order data from ab initio structure determination.
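The resolution dependence of the solvent term can be sketched with a Babinet bulk-solvent model. The Python sketch below assumes the commonly used form F_{h}^{total} = F_{h}^{protein}[1 − k_{s}exp(−B_{s}s²)], with s = sinθ/λ = 1/(2d), and illustrative values k_{s} = 0.8 and B_{s} = 200 Å² that are not taken from this paper. The negative sign of the solvent term expresses the π phase shift relative to the protein, and the term fades rapidly as d decreases.

```python
import numpy as np

def babinet_factor(d, k_s=0.8, B_s=200.0):
    """1 - k_s exp(-B_s s^2) with s = 1/(2d); multiplies the protein structure factor.

    k_s (solvent/protein density ratio) and B_s (A^2) are illustrative assumptions.
    """
    s2 = 1.0 / (2.0 * np.asarray(d, float)) ** 2
    return 1.0 - k_s * np.exp(-B_s * s2)

for d in (50.0, 15.0, 6.0, 3.0, 2.0):
    f = float(babinet_factor(d))
    print(f"d = {d:4.1f} A: solvent term = {1.0 - f:.3f} of |F_protein|, total factor = {f:.3f}")
```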
Acknowledgements
I wish to acknowledge invaluable and stimulating discussions with Klas Andersson and Doug Dorset and support from Eastman Kodak (Rochester), EPSRC and BBSRC.
References
Akey, C. W. & Edelstein, S. J. (1983). J. Mol. Biol. 163, 575–612.
Andersson, K. M. (1999). J. Appl. Cryst. 32, 530–535.
Andersson, K. M. & Hovmöller, S. (1996). Acta Cryst. D52, 1174–1180.
Bricogne, G. (1984). Acta Cryst. A40, 410–445.
Cochran, W. (1955). Acta Cryst. 8, 473–478.
DeTitta, G. T., Langs, D. A., Edmonds, J. W. & Duax, W. L. (1975). Acta Cryst. A31, 472–479.
DeTitta, G. T., Weeks, C. M., Thuman, P., Miller, R. & Hauptman, H. A. (1994). Acta Cryst. A50, 203–210.
Dorset, D. L. & Gilmore, C. J. (1999). Acta Cryst. A55, 448–456.
Dorset, D. L. & McCourt, M. P. (1999). Z. Kristallogr. 214, 652–658.
Fortier, S. (1997). Editor. Direct Methods for Solving Macromolecular Structures. Dordrecht: Kluwer.
Giacovazzo, C. (1998). Direct Phasing in Crystallography. Fundamentals and Applications. Oxford University Press.
Gilmore, C. J. (1986). Direct Methods for Solving Macromolecular Structures, edited by S. Fortier, pp. 317–321. Dordrecht: Kluwer.
Gilmore, C. J. (1996). Acta Cryst. A52, 561–589.
Gilmore, C. J., Nicholson, W. V. & Dorset, D. L. (1996). Acta Cryst. A52, 937–946.
Harker, D. (1953). Acta Cryst. 6, 731–736.
Hauptman, H. A. (1975). Acta Cryst. A31, 680–687.
Havelka, W. A., Henderson, R., Heymann, J. A. W. & Oesterhelt, D. (1993). J. Mol. Biol. 234, 837–846.
Hughes, E. W. (1953). Acta Cryst. 6, 871.
Karle, J. & Hauptman, H. A. (1956). Acta Cryst. 9, 635–651.
Karle, J. & Karle, I. L. (1966). Acta Cryst. 21, 849–859.
Luzzati, V., Tardieu, A. & Taupin, D. (1972). J. Mol. Biol. 64, 269–286.
Read, R. J. (1986). Acta Cryst. A42, 140–149.
Sass, H. J., Büldt, G., Beckmann, E., Zemlin, F., Van Heel, M., Zeitler, E., Rosenbusch, J. P., Dorset, D. L. & Massalski, A. (1989). J. Mol. Biol. 209, 171–175.
Sayre, D. (1952). Acta Cryst. 5, 60–65.
Schenk, H. (1973). Acta Cryst. A29, 77–82.
Sheldrick, G. M. (1997). Proceedings of the CCP4 Study Weekend. Recent Advances in Phasing, edited by K. S. Wilson, G. Davies, A. W. Ashton & S. Bailey, pp. 147–157. Warrington: Daresbury Laboratory.
Weeks, C. M. & Miller, R. (1997). Proceedings of the CCP4 Study Weekend. Recent Advances in Phasing, edited by K. S. Wilson, G. Davies, A. W. Ashton & S. Bailey, pp. 139–146. Warrington: Daresbury Laboratory.
Wilson, A. J. C. (1949). Acta Cryst. 2, 318–321.
Woolfson, M. M. & Fan, H.-F. (1995). Physical and Non-Physical Methods of Solving Crystal Structures. Cambridge University Press.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.