$\sigma_R^2$, a reciprocal-space measure of the quality of macromolecular electron-density maps

A reciprocal-space measure of the quality of macromolecular crystallographic phases based on the variance of the local roughness of the map is presented.


Introduction
A key step in the determination of macromolecular crystal structures, either by direct methods or by more traditional MAD or MIR approaches, is the evaluation of the quality of an electron-density map. In applying direct methods to macromolecular crystal structure determination, statistical relationships derived from characteristics of small-molecule structures (e.g. Sheldrick, 1990; Weeks et al., 1995; Hauptman, 1997) are typically used to discriminate between possible phase sets. In the MAD or MIR approaches, the crystallographer typically examines an electron-density map manually and equates its interpretability with its quality. There would be considerable utility in having objective measures of the quality of electron-density maps which include as many features of macromolecular crystals as possible. Such measures could be used to choose between possible phase sets in ab initio methods and between possible heavy-atom solutions in the MIR and MAD methods. Additionally, if the measure of quality could be expressed in a simple reciprocal-space formulation, the measure could be used to improve phase quality or even to determine phases ab initio.
One measure of the quality of macromolecular electron-density maps which has been proposed is an automated analysis of the connectivity of electron-density maps (Baker et al., 1993). This approach works well for evaluation of a map, but unfortunately it has proven difficult to use in phase improvement. We have recently demonstrated that an evaluation of the distinction between solvent and protein regions can be a very powerful criterion for scoring electron-density maps (Terwilliger & Berendzen, 1999a,b). Our approach is based on the well known observation that macromolecular crystals typically contain distinct regions of protein (where the local variation of electron density from point to point is very high) and solvent (where the electron density is essentially constant). This observation has been the basis of widely used solvent-flattening procedures (Wang, 1985; Xiang et al., 1993; Podjarny et al., 1987; Abrahams et al., 1994; Zhang & Main, 1990).
We have used the difference between protein and solvent regions to generate an objective measure of the quality of a macromolecular electron-density map. Firstly, we calculated the local r.m.s. electron density near each grid point in the asymmetric unit, omitting the $F_{000}$ term in the Fourier synthesis. In this way, the local r.m.s. density is very small in the solvent region but large in the protein region. We then determined the standard deviation of this local r.m.s. density over the entire asymmetric unit and used it as a figure of merit of the phasing. Maps which have a uniform distribution of local r.m.s. density have low values of the standard deviation; those with distinct protein and solvent regions have higher values. We have found this measure very useful in differentiating between heavy-atom solutions in the MIR and MAD approaches, as well as in identification of the hand of heavy-atom solutions when anomalous differences have been measured (Terwilliger & Berendzen, 1999a).
Although it is difficult to express the standard deviation of local r.m.s. electron density in a reciprocal-space formulation, a very closely related characteristic, the variance of the local roughness, can be calculated readily. Here, we define this variance of the local roughness as the overall variance of the local variance of electron density in a map, and show how it can be calculated in reciprocal space. The expression we derive is suitable as a figure of merit for phase-quality evaluation, for phase improvement and for ab initio phasing methods.
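The real-space figure of merit described in this paragraph can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation of Terwilliger & Berendzen (1999a): the names `box_mean` and `local_rms_std` are hypothetical, the map is taken to be a periodic array covering a whole unit cell rather than the asymmetric unit, a box is centered on every grid point, and the $F_{000}$ term is omitted by subtracting the map mean.

```python
import numpy as np

def box_mean(a, width):
    """Periodic moving average over a cube `width` grid units on an edge.

    `width` is assumed to be odd so that the box is centered on a grid point.
    """
    half = width // 2
    out = np.asarray(a, dtype=float)
    for axis in range(out.ndim):
        # Separable box filter: average shifted copies along one axis at a time.
        out = sum(np.roll(out, s, axis=axis) for s in range(-half, half + 1)) / width
    return out

def local_rms_std(rho, width=5):
    """Standard deviation over the cell of the local r.m.s. density."""
    rho = np.asarray(rho, dtype=float)
    rho = rho - rho.mean()                      # omit the F000 term
    local_rms = np.sqrt(box_mean(rho ** 2, width))
    return float(local_rms.std())

# Toy comparison (synthetic data, for illustration only):
# a map that is rough everywhere versus one with a flat "solvent" half.
rng = np.random.default_rng(0)
n = 20
noise = rng.normal(size=(n, n, n))
mask = np.zeros((n, n, n))
mask[: n // 2] = 1.0                            # "protein" occupies half the cell
score_uniform = local_rms_std(noise)
score_two_region = local_rms_std(noise * mask)
print(score_uniform, score_two_region)
```

In this toy case the map with distinct rough and flat regions receives the higher score, which is the behaviour the figure of merit relies on.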

Theory
In our previous work, we calculated the standard deviation of local r.m.s. electron density in a map. It was calculated using a grid with spacing approximately one-third of the resolution of the map in boxes five grid units on an edge, and the standard deviation of the local r.m.s. density was obtained from overlapping boxes throughout the asymmetric unit of the crystal (Terwilliger & Berendzen, 1999a). We found that the precise size and overlap of the boxes had only small effects on the calculation. Here, we use a closely related but more generalizable approach, in which the overall variance of the local roughness of electron density is calculated. Instead of using overlapping boxes to determine the variation of local mean-square density from point to point in the cell, we use a windowing function to define the region over which the local variance (roughness) of electron density is calculated. Any windowing function could be used for this purpose, but a particularly convenient one is a Gaussian function.
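The local roughness in a map, $r(\mathbf{x})$, can be represented by the weighted variance of electron density in a region defined by a windowing function centered at $\mathbf{x}$:
$$r(\mathbf{x}) = \int g(\mathbf{x}-\mathbf{x}')\,[\rho(\mathbf{x}') - \bar{\rho}(\mathbf{x})]^2\,d\mathbf{x}', \tag{1}$$
$$r(\mathbf{x}) = \int g(\mathbf{x}-\mathbf{x}')\,\rho^2(\mathbf{x}')\,d\mathbf{x}' - \bar{\rho}^2(\mathbf{x}), \tag{2}$$
where $\bar{\rho}(\mathbf{x})$ is the mean local electron density, given by
$$\bar{\rho}(\mathbf{x}) = \int g(\mathbf{x}-\mathbf{x}')\,\rho(\mathbf{x}')\,d\mathbf{x}', \tag{3}$$
and $g(\mathbf{x})$ is an arbitrary windowing function; (2) follows from (1) when $g(\mathbf{x})$ integrates to unity. If the windowing function is a three-dimensional Gaussian function with unit volume and a variance (for each of the components $x$, $y$, $z$) of $\sigma^2$ then it can be expressed as
$$g(\mathbf{x}) = (2\pi\sigma^2)^{-3/2}\exp\left(-\|\mathbf{x}\|^2/2\sigma^2\right). \tag{4}$$
The variance ($\sigma_R^2$) of this local roughness of electron density over the entire unit cell is then given by
$$\sigma_R^2 = \frac{1}{V}\int_V [r(\mathbf{x}) - \langle r\rangle]^2\,d\mathbf{x} = \langle r^2\rangle - \langle r\rangle^2, \tag{5}$$
where $\langle r\rangle = (1/V)\int_V r(\mathbf{x})\,d\mathbf{x}$ and $V$ is the volume of the unit cell.
To calculate the variance of local roughness of the electron density, $\sigma_R^2$, in reciprocal space, we use the facts that the first term on the right-hand side of (2) represents the convolution of $\rho^2(\mathbf{x})$ and $g(\mathbf{x})$, and that $\bar{\rho}(\mathbf{x})$ in (2) is in turn the convolution of $\rho(\mathbf{x})$ and $g(\mathbf{x})$. The electron density $\rho(\mathbf{x})$, assumed to be a real function, and the squared electron density $\rho^2(\mathbf{x})$ can be expressed as (cf. Bracewell, 1986)
$$\rho(\mathbf{x}) = \sum_{\mathbf{h}} F_{\mathbf{h}}\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{x}), \tag{6}$$
$$\rho^2(\mathbf{x}) = \sum_{\mathbf{h}} B_{\mathbf{h}}\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{x}), \tag{7}$$
respectively, where $\mathbf{h} = (h\mathbf{a}^*, k\mathbf{b}^*, l\mathbf{c}^*)$ and the reciprocal lattice vectors are $\mathbf{a}^*$, $\mathbf{b}^*$ and $\mathbf{c}^*$. The coefficients $B_{\mathbf{h}}$ can be calculated from the structure factors $F_{\mathbf{h}}$ using the relation
$$B_{\mathbf{h}} = \sum_{\mathbf{k}} F_{\mathbf{k}}F_{\mathbf{h}-\mathbf{k}}, \tag{8}$$
summing over all values of $\mathbf{k}$. The Gaussian function $g(\mathbf{x})$ can be readily expressed in Fourier space; it appears as the temperature factor in the Fourier transform of a Gaussian distribution of electron density about an atom, for example. An expression for a Gaussian centered at the origin with unit volume and a variance of $\sigma^2$ is
$$g(\mathbf{x}) = \frac{1}{V}\sum_{\mathbf{h}} G_{\mathbf{h}}\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{x}), \tag{9}$$
where
$$G_{\mathbf{h}} = \exp(-2\pi^2\sigma^2 S_{\mathbf{h}}^2) \tag{10}$$
and $S_{\mathbf{h}}$ is the magnitude of the scattering vector, $\|\mathbf{h}\| = 2\sin\theta/\lambda$.
As $\bar{\rho}(\mathbf{x})$ in (3) is the convolution of $\rho(\mathbf{x})$ and $g(\mathbf{x})$, we can write
$$\bar{\rho}(\mathbf{x}) = \sum_{\mathbf{h}} Q_{\mathbf{h}}\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{x}), \tag{11}$$
where the coefficients $Q_{\mathbf{h}}$ are simply the original structure factors $F_{\mathbf{h}}$ damped by the exponential factors $G_{\mathbf{h}}$,
$$Q_{\mathbf{h}} = F_{\mathbf{h}}G_{\mathbf{h}}. \tag{12}$$
The second term on the right-hand side of (2) can now be expressed, by analogy with (7) and (8), as
$$\bar{\rho}^2(\mathbf{x}) = \sum_{\mathbf{h}} B^{\mathrm{AVG}}_{\mathbf{h}}\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{x}), \tag{13}$$
where the coefficients $B^{\mathrm{AVG}}_{\mathbf{h}}$ are based on the damped structure factors $Q_{\mathbf{k}}$ in (12),
$$B^{\mathrm{AVG}}_{\mathbf{h}} = \sum_{\mathbf{k}} Q_{\mathbf{k}}Q_{\mathbf{h}-\mathbf{k}}. \tag{14}$$
Next, as the first term on the right-hand side of (2) is the convolution of $\rho^2(\mathbf{x})$ and $g(\mathbf{x})$, we can write
$$\int g(\mathbf{x}-\mathbf{x}')\,\rho^2(\mathbf{x}')\,d\mathbf{x}' = \sum_{\mathbf{h}} T_{\mathbf{h}}\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{x}), \tag{15}$$
where the coefficients $T_{\mathbf{h}}$ are given by
$$T_{\mathbf{h}} = B_{\mathbf{h}}G_{\mathbf{h}}. \tag{16}$$
We can now express the local roughness of a map (1) in the form
$$r(\mathbf{x}) = \sum_{\mathbf{h}} R_{\mathbf{h}}\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{x}), \tag{17}$$
where the coefficients $R_{\mathbf{h}}$ are given by
$$R_{\mathbf{h}} = B_{\mathbf{h}}G_{\mathbf{h}} - B^{\mathrm{AVG}}_{\mathbf{h}}. \tag{18}$$
The desired variance $\sigma_R^2$ in (5) is composed of two parts, the mean value of $r^2(\mathbf{x})$ and the square of the mean value of $r(\mathbf{x})$ over the unit cell. The mean value of $r(\mathbf{x})$ over the unit cell is simply the $\mathbf{h} = (0, 0, 0)$ term of its corresponding transform, $R_{000}$. Similarly, the mean value of $r^2(\mathbf{x})$ is given by the $\mathbf{h} = (0, 0, 0)$ term of its transform. Using Parseval's theorem (cf. Bracewell, 1986), the mean value of $r^2(\mathbf{x})$ can be expressed in the form
$$\langle r^2\rangle = \frac{1}{V}\int_V r^2(\mathbf{x})\,d\mathbf{x} = \sum_{\mathbf{h}} R_{\mathbf{h}}R_{-\mathbf{h}}, \tag{19}$$
where the integral is taken over the unit-cell volume. As $r(\mathbf{x})$ is real, $R_{-\mathbf{h}} = R_{\mathbf{h}}^*$ and
$$\langle r^2\rangle = \sum_{\mathbf{h}} |R_{\mathbf{h}}|^2. \tag{20}$$
Finally, the variance of local roughness ($\sigma_R^2$) in (5) is obtained by subtracting $\langle r\rangle^2 = R_{000}^2$ from (20):
$$\sigma_R^2 = \sum_{\mathbf{h}\neq 0} |R_{\mathbf{h}}|^2. \tag{21}$$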
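The chain from structure factors to the variance of local roughness can be checked numerically on a toy map. The sketch below is an illustration under several assumptions that are not part of the paper: a cubic cell sampled on a periodic grid, a hypothetical function name `sigma_r2_reciprocal`, and FFTs used purely as a convenient way to obtain the coefficients (the reciprocal-space formulation itself does not require them). NumPy's forward transform uses the opposite sign convention in the exponent, which only relabels $\mathbf{h}$ as $-\mathbf{h}$ and leaves the variance unchanged.

```python
import numpy as np

def sigma_r2_reciprocal(rho, cell_edge, sigma):
    """Variance of local roughness from Fourier coefficients (cubic cell assumed).

    Returns (sigma_r2, r), where r is the real-space local roughness map
    obtained from the same coefficients, kept only for cross-checking.
    """
    n = rho.shape[0]
    ntot = rho.size
    rho0 = rho - rho.mean()                      # omit the F000 term
    F = np.fft.fftn(rho0) / ntot                 # coefficients of rho
    B = np.fft.fftn(rho0 ** 2) / ntot            # coefficients of rho^2

    # |h|^2 in reciprocal-length units for a cubic cell of edge cell_edge
    freq = np.fft.fftfreq(n, d=cell_edge / n)
    hx, hy, hz = np.meshgrid(freq, freq, freq, indexing="ij")
    G = np.exp(-2.0 * np.pi ** 2 * sigma ** 2 * (hx ** 2 + hy ** 2 + hz ** 2))

    Q = F * G                                    # damped structure factors
    rho_bar = np.fft.ifftn(Q).real * ntot        # local mean density
    B_avg = np.fft.fftn(rho_bar ** 2) / ntot     # coefficients of rho_bar^2
    R = B * G - B_avg                            # coefficients of local roughness

    sigma_r2 = np.sum(np.abs(R) ** 2) - np.abs(R[0, 0, 0]) ** 2
    r = np.fft.ifftn(R).real * ntot
    return float(sigma_r2), r

# Toy map (synthetic): rough "protein" in half the cell, flat "solvent" elsewhere.
rng = np.random.default_rng(1)
n, cell_edge = 16, 48.0
rho = rng.normal(size=(n, n, n))
rho[n // 2:] = 0.0
sigma_r2, r = sigma_r2_reciprocal(rho, cell_edge, sigma=6.0)
print(sigma_r2, r.var())
```

The sum over coefficients and the real-space variance of `r` agree to rounding error, which is Parseval's theorem applied to the local roughness.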

Discussion
Equation (21) is a representation in reciprocal space of $\sigma_R^2$, the variance of the local roughness of electron density in a Fourier synthesis. In the case of macromolecular crystals containing well defined regions of protein and solvent, this variance tends to be very high, as protein-containing areas of the unit cell are very rough and solvent-containing areas are very smooth (Terwilliger & Berendzen, 1999a). Consequently, the value of this variance can be used as a measure of the relative qualities of various possible phase sets for a macromolecular structure.
The variance of local roughness, $\sigma_R^2$, in (21) is given by the sum of squares of the coefficients $R_{\mathbf{h}}$, other than $R_{000}$, in the Fourier synthesis for the local roughness, $r(\mathbf{x})$. This is equivalent to noting that $\sigma_R^2$ is simply the overall mean-square value of the local roughness after subtracting its overall average value, $R_{000}$. The coefficients $R_{\mathbf{h}}$ for the local roughness, given in (18), each contain two terms, $B_{\mathbf{h}}G_{\mathbf{h}}$ and $B^{\mathrm{AVG}}_{\mathbf{h}}$. The first term, $B_{\mathbf{h}}G_{\mathbf{h}}$, consists of coefficients in the Fourier series expression (15) for the local mean-square electron density. The second term, $B^{\mathrm{AVG}}_{\mathbf{h}}$, consists of coefficients in the Fourier series expression for the local mean electron density, squared. The difference corresponds to the local variance of the electron density, which we describe as local roughness.
An important feature of (21) is that only the low-order terms are large. This is a consequence of the presence of the exponential terms $G_{\mathbf{h}}$ multiplying the $B_{\mathbf{h}}$ terms in (18) and multiplying the $F_{\mathbf{h}}$ terms in (12). Because of this, $\sigma_R^2$ in (21) is, to a first approximation, the sum of the squares of the lowest-order terms in the Fourier series (7) describing $\rho^2(\mathbf{x})$. The magnitudes of these low-order terms describe how well defined the regions of the unit cell are which contain low and high values of $\rho^2(\mathbf{x})$. If the distribution of $\rho^2(\mathbf{x})$ is relatively uniform in the unit cell, then the low-order terms in this Fourier series will be small. If the distribution is highly non-uniform then the low-order terms, and hence $\sigma_R^2$, will be large.
Equation (21) has several important properties which should be emphasized. The most significant is that the exponential term limits the range of $\mathbf{h}$ over which the terms in the summation are large to those with small $\|\mathbf{h}\|$. This means that evaluating $\sigma_R^2$ can be rapid. The calculation of each $B_{\mathbf{h}}$ in (8) or $B^{\mathrm{AVG}}_{\mathbf{h}}$ in (14) requires just one pass through all reflections. As only small values of $\mathbf{h}$ make a large contribution to $\sigma_R^2$, a relatively small number of passes through the reflections are necessary to calculate $\sigma_R^2$. The potential rapidity of calculation of $\sigma_R^2$ means that Monte Carlo methods or methods based on the genetic algorithm could potentially be used to optimize $\sigma_R^2$ even in cases with large numbers of reflections. If a windowing function other than a Gaussian is used, or if the Gaussian function has a very narrow width, however, the number of terms needed to accurately evaluate $\sigma_R^2$ would not necessarily be small. In general, the calculation of $\sigma_R^2$ using the low-order terms in (21) corresponds to truncation of the spectrum of the windowing function at some resolution.
The second significant aspect of (21) is that the value of $\sigma_R^2$ depends on the crystallographic phases in an easily calculable way. It is straightforward to differentiate (21) with respect to individual phases. This means that matrix methods can be used to adjust the phases to maximize $\sigma_R^2$. As reflections only interact significantly in (8) with other reflections which differ in $\mathbf{k}$ by a small number, such matrix methods would have to involve at most only a fraction of the elements in the matrix and possibly just diagonal elements. This kind of approach could be used to combine the maximization of $\sigma_R^2$ with that of other direct-methods figures of merit to improve the ability of direct methods to solve macromolecular structures.
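The statement that each coefficient of the squared density needs just one pass through the reflections can be made concrete with a direct summation of $B_{\mathbf{h}} = \sum_{\mathbf{k}} F_{\mathbf{k}}F_{\mathbf{h}-\mathbf{k}}$. The sketch below is illustrative only: the function name `b_coefficient`, the dictionary layout (Miller indices mapped to complex structure factors, Friedel mates stored explicitly, space-group symmetry ignored) and the synthetic data are assumptions. An FFT on a small grid is used solely to cross-check the direct sum.

```python
import numpy as np

def b_coefficient(h, refl):
    """B_h = sum over k of F_k * F_(h-k), in a single pass over the reflections."""
    total = 0j
    for k, f_k in refl.items():
        partner = (h[0] - k[0], h[1] - k[1], h[2] - k[2])
        f_p = refl.get(partner)
        if f_p is not None:                      # skip unmeasured partners
            total += f_k * f_p
    return total

# Synthetic band-limited data: indices in {-1, 0, 1}^3, F000 omitted.
rng = np.random.default_rng(0)
refl = {}
for idx in np.ndindex(3, 3, 3):
    hkl = tuple(i - 1 for i in idx)
    if hkl == (0, 0, 0) or hkl in refl:
        continue
    f = complex(rng.normal(), rng.normal())
    refl[hkl] = f
    refl[tuple(-i for i in hkl)] = f.conjugate() # Friedel mate

# Cross-check against an FFT of the squared density on an 8^3 grid
# (fine enough that the squared density is not aliased).
n = 8
F_grid = np.zeros((n, n, n), dtype=complex)
for hkl, f in refl.items():
    F_grid[hkl[0] % n, hkl[1] % n, hkl[2] % n] = f
rho = np.fft.fftn(F_grid).real                   # rho(x) = sum_h F_h exp(-2 pi i h.x)
B_fft = np.fft.ifftn(rho ** 2)                   # coefficients of rho^2

h = (1, 0, -2)
print(b_coefficient(h, refl), B_fft[h[0] % n, h[1] % n, h[2] % n])
```

Only the few low-order `h` for which the Gaussian damping factor is appreciable would need to be evaluated this way, so the cost scales as the number of retained terms times the number of reflections.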
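As an indication of how such a derivative might look, the following is a sketch for the simplest case only. It assumes space group P1 and an acentric reflection $\mathbf{k}$ with $F_{\mathbf{k}} = |F_{\mathbf{k}}|\exp(i\varphi_{\mathbf{k}})$, so that changing $\varphi_{\mathbf{k}}$ affects only $F_{\mathbf{k}}$ and its Friedel mate $F_{-\mathbf{k}}$; symmetry-related reflections and centric phase restrictions are not treated. It uses the definitions of the Fourier coefficients of $\rho^2(\mathbf{x})$ ($B_{\mathbf{h}}$), of the damped structure factors ($Q_{\mathbf{h}} = F_{\mathbf{h}}G_{\mathbf{h}}$) and of the local roughness ($R_{\mathbf{h}}$) given in the Theory section.

```latex
\begin{align*}
\frac{\partial F_{\mathbf{k}}}{\partial\varphi_{\mathbf{k}}} &= iF_{\mathbf{k}},
\qquad
\frac{\partial F_{-\mathbf{k}}}{\partial\varphi_{\mathbf{k}}} = -iF_{-\mathbf{k}},\\
\frac{\partial B_{\mathbf{h}}}{\partial\varphi_{\mathbf{k}}}
  &= 2i\left(F_{\mathbf{k}}F_{\mathbf{h}-\mathbf{k}} - F_{-\mathbf{k}}F_{\mathbf{h}+\mathbf{k}}\right),\\
\frac{\partial B^{\mathrm{AVG}}_{\mathbf{h}}}{\partial\varphi_{\mathbf{k}}}
  &= 2i\left(Q_{\mathbf{k}}Q_{\mathbf{h}-\mathbf{k}} - Q_{-\mathbf{k}}Q_{\mathbf{h}+\mathbf{k}}\right),\\
\frac{\partial R_{\mathbf{h}}}{\partial\varphi_{\mathbf{k}}}
  &= G_{\mathbf{h}}\frac{\partial B_{\mathbf{h}}}{\partial\varphi_{\mathbf{k}}}
   - \frac{\partial B^{\mathrm{AVG}}_{\mathbf{h}}}{\partial\varphi_{\mathbf{k}}},\\
\frac{\partial\sigma_R^2}{\partial\varphi_{\mathbf{k}}}
  &= 2\sum_{\mathbf{h}\neq 0}\operatorname{Re}\left(R_{\mathbf{h}}^{*}\,
     \frac{\partial R_{\mathbf{h}}}{\partial\varphi_{\mathbf{k}}}\right).
\end{align*}
```

Because $G_{\mathbf{h}}$ is negligible except at small $\|\mathbf{h}\|$, only the low-order $\mathbf{h}$ contribute appreciably to the last sum, and for each of them only the reflections $\mathbf{h}\pm\mathbf{k}$ are coupled to $\mathbf{k}$.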
As (21) is essentially a reciprocal-space formulation of the real-space measure of map quality which we have already examined in detail (Terwilliger & Berendzen, 1999a), most of the properties of the two formulations will be very similar. In Fig. 1, we present a set of model calculations using (21) to evaluate electron-density maps in reciprocal space. A set of 6200 model data from 20 to 3.0 Å was generated based on coordinates from a dehalogenase enzyme from Rhodococcus species ATCC 55388 (American Type Culture Collection, 1992) determined recently in our laboratory. The protein contains 316 amino-acid residues and crystallizes in space group $P2_12_12$ with unit-cell dimensions a = 94, b = 80, c = 43 Å and one molecule in the asymmetric unit (J. Newman, personal communication). Fig. 1(a) shows results for a total of 2000 phase sets generated from the model data, with phase errors ranging from 0 to 150°. These model data sets were analyzed using (21) with a value of $\sigma$ = 6 Å and including all 364 terms for which the exponential term $G_{\mathbf{h}}$ in (10) has a value of 0.0001 or larger. The logarithm of the variance in local roughness, $\log(\sigma_R^2)$, is plotted in Fig. 1(a) as a function of the cosine of the mean phase error in the data set. For phase sets with $\langle\cos(\Delta\varphi)\rangle$ of ~0.3 or greater, the logarithm of the variance in local roughness is quite closely related to the phase accuracy. For phase sets with lower $\langle\cos(\Delta\varphi)\rangle$, there is only a small correlation.
Fig. 1(b) shows the practical implications of the data in Fig. 1(a) and also illustrates that only low-order terms in (21) are necessary for calculating $\sigma_R^2$. In Fig. 1(b), the data in Fig. 1(a) are analyzed to estimate the probability that a correct choice of the better of two phase sets can be determined from the logarithm of the variance of local roughness. Fig. 1(b) shows analyses of four groups of 2000 phase sets each. In each of the four analyses, a different minimum value of the exponential term $G_{\mathbf{h}}$ was used, ranging from 0.0001 to 0.1. To obtain Fig. 1(b), the data in Fig. 1(a) were grouped into pairs of sets differing by 0.1 ± 0.05 units in $\langle\cos(\Delta\varphi)\rangle$. Each member of each set was compared with each member of the paired set, and the fraction of times that the member with the higher value of $\log(\sigma_R^2)$ also had the higher value of $\langle\cos(\Delta\varphi)\rangle$ was plotted. Fig. 1(b) shows that, as expected in phase sets with very low phase accuracy ($\langle\cos(\Delta\varphi)\rangle$ < 0.25), the value of $\log(\sigma_R^2)$ leads to only a 50% chance of choosing the better of two phase sets which differ in accuracy. For phase sets with values of $\langle\cos(\Delta\varphi)\rangle$ from 0.25 to 0.4, however, the probability of choosing the better of two phase sets differing by this amount increases from 0.6 to 0.9. The 58 lowest order terms in the series in (21) give almost the same likelihood of making a correct choice as the 364 lowest order terms. This means that high-order terms can be ignored without a substantial effect.
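The pairwise comparison behind Fig. 1(b) can be sketched as follows. This is one possible reading of the procedure described in the text, not the code used for the figure: the function name `fraction_correct`, its parameters and the way phase sets are binned by their mean cosine of phase error are assumptions, and the arrays at the end are synthetic placeholders rather than the model data.

```python
import numpy as np

def fraction_correct(mean_cos, score, lo, hi, diff=0.1, tol=0.05):
    """Fraction of pairs in which the higher score picks the more accurate set.

    mean_cos : mean cosine of the phase error of each phase set
    score    : figure of merit of each phase set (e.g. log of the variance)
    lo, hi   : range of mean_cos for the less accurate member of each pair
    diff, tol: pairs are formed from sets differing by diff +/- tol in mean_cos
    """
    mean_cos = np.asarray(mean_cos, dtype=float)
    score = np.asarray(score, dtype=float)
    lower = np.where((mean_cos >= lo) & (mean_cos < hi))[0]
    n_pairs = 0
    n_right = 0
    for i in lower:
        d = mean_cos - mean_cos[i]
        partners = np.where(np.abs(d - diff) <= tol)[0]   # more accurate sets
        n_pairs += partners.size
        n_right += np.count_nonzero(score[partners] > score[i])
    return n_right / n_pairs if n_pairs else float("nan")

# Synthetic sanity checks: a score that tracks accuracy perfectly, and one
# that is perfectly anti-correlated with it.
cos_values = np.linspace(0.0, 1.0, 21)
perfect = fraction_correct(cos_values, cos_values, 0.2, 0.4)
inverted = fraction_correct(cos_values, -cos_values, 0.2, 0.4)
print(perfect, inverted)
```

A score unrelated to phase accuracy would give a fraction near 0.5, which corresponds to the 50% chance quoted for the least accurate phase sets.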

Conclusions
The reciprocal-space formulation presented here has major advantages compared with the real-space calculations carried out previously (Terwilliger & Berendzen, 1999a). These are that the variance $\sigma_R^2$ can be calculated without a Fourier transform and that potentially phases can be adjusted to maximize the variance. The rapid calculation of variance means that it can be used as a measure of the quality of phases in many different trials, and the potential for maximization of the variance means that it can be used in phase improvement and possibly even ab initio phasing algorithms. The most powerful means for phase improvement for macromolecules without non-crystallographic symmetry is at present solvent flattening (Wang, 1985; Xiang et al., 1993; Podjarny et al., 1987; Abrahams et al., 1994; Zhang & Main, 1990). Carrying out this type of procedure requires that the electron-density map be of sufficiently high quality that an envelope defining the boundary between protein and solvent can be reliably calculated. Equation (21) provides a means for improving phases even before the boundary is clearly defined. Maximizing $\sigma_R^2$ will maximize the distinction between protein and solvent regions without requiring a knowledge of where each is located. Consequently, (21) may be useful in cases where solvent flattening is not effective, as well as providing a complementary approach in cases where phases are good to begin with.
Figure 1. Calculation of variance of local roughness using (21). (a) The logarithm of $\sigma_R^2$ is plotted for 2000 model phase sets, as described in the text. The abscissa is $\langle\cos(\Delta\varphi)\rangle$, the mean value of the effective figure of merit of the phase set. (b) The probability of choosing the better of two phase sets which differ in quality by 0.1 units of $\langle\cos(\Delta\varphi)\rangle$ is plotted for model data obtained as in (a), using the 364 lowest order terms (diamonds), 249 lowest order terms (triangles), 145 lowest order terms (squares) or 58 lowest order terms (crosses), as described in the text.
There are several aspects of the reciprocal-space formulation which remain to be optimized. One is the choice of the windowing function. We have chosen a Gaussian function, but the derivation we carried out is independent of the windowing function and any function could have been used. A Gaussian is particularly convenient because it results in strongly damped coefficients that become very small for all but small values of $\|\mathbf{h}\|$. Other windowing functions, however, might yield better measures of the quality of the electron-density map, and a survey of other functions might improve the algorithm. Another possibility might be to construct a histogram of values of $\sigma_R^2$ from many solved protein structures which could in turn be used to construct a data-likelihood model for estimation of phase errors. Such an approach could be considerably more powerful than the one described here because it would give probability information which could be combined in a Bayesian approach with other sources of phase information.