Statistical density modification with noncrystallographic symmetry
^{a}Mail Stop M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
^{*}Correspondence email: terwilliger@lanl.gov
Statistical density modification is a technique for phase improvement through a calculation of the posterior probability of the phases, given experimental phase information and expectations about features of the electron-density map. The technique can take advantage of both estimates of electron density in the map and uncertainties or probability distributions for those estimates. For crystals with noncrystallographic symmetry (NCS), this allows the use of the expected similarity of electron density at NCS-related points without requiring an implicit assumption that these regions are identical.
Keywords: density modification; noncrystallographic symmetry.
1. Introduction
Noncrystallographic symmetry (NCS) can be a powerful aid in improving the quality of macromolecular electron-density maps (Bricogne, 1974; Rossmann, 1972; Kleywegt & Read, 1997). When present, NCS is often used along with solvent flattening (Wang, 1985) as a constraint on the electron density in a map, resulting indirectly in an improvement of the phases. Largely because it is difficult to do otherwise, in this process the NCS is generally treated as if it were exact, even if it might not be or if NCS-related density might be more similar in one region than in another. In the holographic method of Szöke et al. (1997), the possibility of imperfect NCS was introduced into the density-modification process by using a cost function to describe the expected similarity of NCS copies. In the method of Abrahams & Leslie (1996), imperfect NCS was also considered in weighting the various NCS copies during averaging.
Recently, we developed a method for improving crystallographic phases through the use of expectations about the electron density in a map that can take advantage of both the estimates of electron density in the map and uncertainties or probability distributions for those estimates (Terwilliger, 1999, 2000). This `statistical density-modification' technique (previously known as `maximum-likelihood density modification') combines experimental phase probabilities with phase probabilities derived from the expectations about the electron-density map to yield posterior (combined) phase probabilities. The key elements in this method are the use of a map-probability function to describe the plausibility of an electron-density map and the calculation of derivatives of the probability function to describe how this plausibility would change if an individual phase were changed.
The map-probability function in statistical density modification consists of the integral over all points in the map of a local log probability of the map. In turn, the local log probability of the map is the logarithm of the a priori probability of the value of the electron-density map at that point. The a priori probability distributions for plausible values of the electron density at each point in the map can come from any source, including the flatness of the solvent, the expected range of electron densities in the region of a macromolecule and, in the present case, NCS. As the map-probability function uses probability distributions rather than expected electron densities, it takes into account both the expected electron density (the mean of the probability distribution) and uncertainties in this expectation (through the distribution itself). In the case of NCS, this means that statistical density modification can take NCS into account without requiring an implicit assumption that the NCS is exact or even that the deviations from perfect NCS are the same everywhere in the region where NCS applies.
2. Methods
2.1. Identification of NCS operators
NCS operators were identified using the NCS in heavy-atom sites (Terwilliger, 2002). Additionally, approximate centers of the regions where they would apply were estimated from the centroids of the coordinates of each of the n_{NCS} sets of heavy-atom sites that could be related to another set of heavy-atom sites through NCS. These NCS operations were then checked by determining the covariance of density in regions related by the NCS operators as a function of distance d from the (approximate) centers of the regions where NCS applies, fitting this covariance to a Gaussian centered at d = 0, extrapolating the covariance to d = 0 and only including NCS if the extrapolated covariance 〈ρ_{i}ρ_{j}〉 was at least 0.1 times the mean-square value of the electron density in the map (it was typically 0.5 to 2 times the mean-square value of the map).
2.2. Identification of `NCS asymmetric unit'
The region over which NCS applies and which is repeated n_{NCS} times in the asymmetric unit of the crystal was identified using the covariance in a fashion similar to that used for checking the NCS operators and similar to the automatic method described by Cowtan (1998). A local mean covariance of density among the n_{NCS} regions of NCS was used to identify this region. For each point on a grid centered at the center of one of the presumed regions where NCS applies, the mean value of the covariance of density 〈ρ_{i}ρ_{j}〉 within a sphere of radius r around all pairs of points related by NCS to this one was calculated. The radius r was typically taken to be the same as the radius used for smoothing the squared electron density in mask calculation for solvent flattening (Wang, 1985). The NCS asymmetric unit was then defined by sequentially testing all points on the grid, starting with those close to the origin and then moving further away. If the point had a mean covariance of density greater than a cutoff c_{MIN} and was not related by crystallographic symmetry or NCS to any other point already in the NCS asymmetric unit, it was included. The cutoff c_{MIN} was chosen by testing a range of values and picking the one that yielded approximately the same fraction of the asymmetric unit within the n_{NCS} regions of NCS as was expected to be within the macromolecule (i.e. not solvent) region of the unit cell.
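The two covariance-based tests described in Sections 2.1 and 2.2 can be illustrated with a minimal Python sketch. It is not the RESOLVE implementation: the function names are hypothetical, the Gaussian fit is done as a log-linear least-squares fit (which requires positive covariances), and the cutoff scan ignores the exclusion of symmetry-related points.

```python
import numpy as np

def extrapolated_covariance(d, cov):
    """Fit cov(d) = A * exp(-d**2 / (2 * s**2)) and return A, the value at d = 0.

    Assumption: the fit is linear least squares on ln(cov) versus d**2,
    so every covariance value must be positive.
    """
    d = np.asarray(d, dtype=float)
    cov = np.asarray(cov, dtype=float)
    if np.any(cov <= 0):
        raise ValueError("log-linear Gaussian fit needs positive covariances")
    slope, intercept = np.polyfit(d**2, np.log(cov), 1)
    return float(np.exp(intercept))

def accept_ncs_operator(cov_at_zero, mean_square_density, min_ratio=0.1):
    """Keep the operator if the extrapolated covariance is at least
    min_ratio times the mean-square density of the map."""
    return cov_at_zero >= min_ratio * mean_square_density

def choose_cutoff(local_cov, expected_fraction, candidates):
    """Pick the candidate cutoff whose selected fraction of grid points
    is closest to the expected macromolecule fraction."""
    local_cov = np.asarray(local_cov, dtype=float)
    errors = [abs(np.mean(local_cov > c) - expected_fraction) for c in candidates]
    return candidates[int(np.argmin(errors))]
```

For example, `accept_ncs_operator(extrapolated_covariance(d, cov), mean_square)` applies the 0.1 threshold quoted in Section 2.1 to a set of hypothetical covariance measurements `cov` at distances `d`.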
2.3. Estimation of expected electrondensity probability distribution from NCS
The electron density ρ_{i} at a point i related by NCS to n_{NCS} − 1 other points in the asymmetric unit was estimated from the weighted mean of the density at the n_{NCS} − 1 NCS-related points. The weights and the uncertainty in this estimate were derived from the local covariance of density 〈ρ_{i}ρ_{j}〉 mentioned above as follows. A simple error model for the relationship between the densities ρ_{i} and ρ_{j} at two NCS-related points i and j was used,
ρ_{i} = x + z_{i}, (1)
ρ_{j} = x + z_{j}, (2)
where x corresponds to the contribution to electron density that is shared by NCS-related points and z_{i} and z_{j} correspond to the contributions that are unique to NCS copies i and j (including both errors in the map and true differences). The expected value of x^{2} can then be estimated from the covariance of ρ_{i} and ρ_{j},
〈x^{2}〉 ≃ 〈ρ_{i}ρ_{j}〉, (3)
where the average is taken over all points in the NCS asymmetric unit and all pairs of NCS copies i and j. The expected value of z_{i}^{2} can then be estimated using (1), again averaged over the NCS asymmetric unit,
〈z_{i}^{2}〉 ≃ 〈ρ_{i}^{2}〉 − 〈x^{2}〉. (4)
Using this error model, if ρ_{i} is used as an estimate of the electron density shared by all NCS copies for this point (x), then the expected variance in this estimate of x is given by 〈z_{i}^{2}〉.
This variance was used as the weighting term for averaging the n_{NCS} − 1 values of electron density used to estimate the value of x for each point in the NCS asymmetric unit and for estimating the variance in this estimate. This estimate of x and its variance then formed the basis for a Gaussian probability distribution of the expected electron density at this point derived from NCS.
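The error model and the weighted average can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the author's code: the function names are hypothetical, the maps are assumed to have zero mean and mutually uncorrelated shared and unique contributions (so that the mean product of densities estimates 〈x^{2}〉), and the weighting is taken to be standard inverse-variance weighting.

```python
import numpy as np

def shared_and_unique(rho_i, rho_j):
    """Estimate <x^2> as <rho_i rho_j> and <z_i^2> as <rho_i^2> - <x^2>.

    rho_i and rho_j hold the density at NCS-related points of copies i and j.
    """
    rho_i = np.asarray(rho_i, dtype=float)
    rho_j = np.asarray(rho_j, dtype=float)
    x2 = np.mean(rho_i * rho_j)
    z2_i = np.mean(rho_i**2) - x2
    return x2, z2_i

def ncs_target(rho_others, var_others):
    """Inverse-variance weighted mean of the densities at the n_NCS - 1
    related points, with the variance of that mean (assumed weighting)."""
    rho_others = np.asarray(rho_others, dtype=float)
    weights = 1.0 / np.asarray(var_others, dtype=float)
    mean = np.sum(weights * rho_others) / np.sum(weights)
    variance = 1.0 / np.sum(weights)
    return mean, variance
```

The pair returned by `ncs_target` plays the role of the mean and variance of the Gaussian distribution of expected density derived from NCS at one point.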
Several modifications to the simple model resulting in (4) were made to take into account local variations in similarity among NCS-related molecules and to reduce bias in the estimation of 〈x^{2}〉 arising from the iterative nature of density modification. The estimates of 〈x^{2}〉, the mean-square electron density shared by all NCS copies, are calculated in (3) as an average over the NCS asymmetric unit. In this formulation, all NCS copies and all points within the NCS asymmetric unit have the same value of 〈x^{2}〉. To take into account variation in the overall similarity between pairs of NCS-related molecules, an estimate c_{ij} = 〈x^{2}〉_{ij} was obtained separately for each pair. Then, to take into account local variations in similarity among NCS-related molecules, a local estimate of 〈x^{2}〉 corresponding to the local mean-square density in common between all NCS-related copies was obtained by using a locally averaged value of c(x) = 〈ρ_{i}ρ_{j}〉 in (3), where the local average was taken over a sphere with the same radius r used above in the identification of the NCS asymmetric unit. Then, the overall estimate of 〈x^{2}〉 for a particular pair of NCS-related points in molecules i and j was the product of c_{ij} and c(x).
In this density-modification procedure, the electron density at points in n_{NCS} − 1 NCS-related copies of the NCS asymmetric unit is used to estimate the expected density at points in the remaining copy. After one or more cycles of density modification, the density at all n_{NCS} copies of the NCS asymmetric unit tends to become increasingly similar. This is desirable of course, as the point of using NCS in density modification is to take advantage of the fact that the density in the various NCS copies really is more similar than is found in the initial map. However, in practice the density-modification procedure can sometimes make the NCS copies even more similar than they really are, leading to an overestimate of 〈x^{2}〉 and an underestimate of the variance. To obtain a less biased estimate of the variance, a cross-validation approach was used. At the start of the density-modification procedure several cycles were carried out without including NCS information in the phasing process, but including the calculation of 〈x^{2}〉. This overall covariance estimate was used as a `free' or unbiased estimate throughout the density-modification procedure. Then, to account for local variation in the covariance of density 〈x^{2}〉, the overall values were multiplied for each cycle of density modification by the current ratio of the locally averaged value of 〈ρ_{i}ρ_{j}〉 (as described above) to the overall mean value of 〈ρ_{i}ρ_{j}〉.
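The local rescaling of the `free' estimate might look like the following sketch. The names are hypothetical, the two density arrays are assumed to be sampled on the same periodic grid after applying the NCS operator, and the spherical average is computed by circular convolution with FFTs, which is only one convenient way to do it.

```python
import numpy as np

def spherical_local_average(values, radius):
    """Average over a sphere of `radius` grid units with periodic wrap-around."""
    values = np.asarray(values, dtype=float)
    # Signed integer grid offsets along each axis (0, 1, ..., -2, -1).
    offsets = [np.rint(np.fft.fftfreq(n) * n) for n in values.shape]
    grids = np.meshgrid(*offsets, indexing="ij")
    kernel = (sum(g**2 for g in grids) <= radius**2).astype(float)
    kernel /= kernel.sum()
    # Circular convolution via FFT; the kernel is symmetric about the origin.
    return np.real(np.fft.ifftn(np.fft.fftn(values) * np.fft.fftn(kernel)))

def local_shared_variance(free_x2, rho_i, rho_j, radius):
    """Scale the cross-validated overall <x^2> by the ratio of the local
    average of rho_i * rho_j to its overall mean."""
    product = np.asarray(rho_i, dtype=float) * np.asarray(rho_j, dtype=float)
    local = spherical_local_average(product, radius)
    return free_x2 * local / product.mean()
```

Because the averaging kernel sums to one, the mean of the returned array equals `free_x2`, so the rescaling redistributes the overall estimate across the map without changing its average.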
2.4. Combination of information from NCS with a priori probability distribution of electron density in the macromolecule region
Even in the absence of NCS, a substantial amount of information exists on the expected distribution of electron density at points in the region of the macromolecule. As discussed earlier (Terwilliger, 2000), the a priori probability distribution of electron density in this region can be expressed in terms of the distribution found for model electron density, ρ_{M},
where the coefficients w_{k}, d_{k} and are estimated by fitting p(ρ_{M}) to the model electron density. A similar description can be obtained for the solventcontaining region of the model map. The a priori probability distribution for electron density in the macromoleculecontaining region of an experimental map with errors can then be described by
where the coefficients β and are estimated by fitting (6) and the corresponding equation for the solvent region (with the same values of β and and different values of w_{k}, c_{k} and ) to the electron density in the experimental map. For simplicity we rewrite (6) as
where
The NCSbased information consists of an estimate, ρ′, of the density at each point in the map, with an associated variance σ^{2},
or
where,
Combining this with (7) yields the expression
where the coefficients , and are given by
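The combination step amounts to multiplying each Gaussian term of the a priori distribution by the Gaussian derived from NCS, which again gives a sum of Gaussians. The sketch below illustrates that identity numerically. The symbol names (`w`, `c`, `var` for the weights, centers and variances of the prior terms, `rho_ncs` and `var_ncs` for the NCS estimate and its variance) and the use of unnormalized Gaussians are assumptions made for the example, not notation confirmed by the text.

```python
import numpy as np

def combine_with_ncs(w, c, var, rho_ncs, var_ncs):
    """Coefficients of the product of a sum of Gaussians with one Gaussian.

    Each term w*exp(-(rho-c)^2/(2*var)) times exp(-(rho-rho_ncs)^2/(2*var_ncs))
    is again a Gaussian in rho with the weight, center and variance returned.
    """
    w, c, var = (np.asarray(a, dtype=float) for a in (w, c, var))
    total = var + var_ncs
    w_new = w * np.exp(-(c - rho_ncs)**2 / (2.0 * total))
    c_new = (c * var_ncs + rho_ncs * var) / total
    var_new = var * var_ncs / total
    return w_new, c_new, var_new

def mixture(rho, w, c, var):
    """Unnormalized sum of Gaussians evaluated at the densities `rho`."""
    rho = np.asarray(rho, dtype=float)[..., None]
    w, c, var = (np.asarray(a, dtype=float) for a in (w, c, var))
    return np.sum(w * np.exp(-(rho - c)**2 / (2.0 * var)), axis=-1)
```

Each combined center is a variance-weighted compromise between the prior center and the NCS estimate, and prior terms far from the NCS estimate are down-weighted, so a large NCS variance leaves the prior essentially unchanged.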
3. Results and discussion
3.1. Automatic identification and verification of NCS
The procedure for identifying (Terwilliger, 2002) and testing for NCS described here was tested by applying it to phases obtained from four MAD experiments and one SAD experiment on crystals with twofold, threefold, fourfold and sixfold NCS. In each case, the SOLVE software (Terwilliger & Berendzen, 1999) was used to identify selenium sites and calculate a starting electron-density map. The MAD data sets included a nucleotide diphosphate kinase with nine selenium sites from Pyrobaculum aerophilum (Pédelacq et al., 2002), a hypothetical protein with 16 selenium sites from P. aerophilum (J.-D. Pédelacq, E. Liong & T. C. Terwilliger, unpublished work), a red fluorescent protein with 26 selenium sites (Yarbrough et al., 2001) and a formate dehydrogenase with 12 selenium sites from P. aerophilum (T. S. Peat, J. M. Newman, G. S. Waldo & T. C. Terwilliger, unpublished work; PDB entry 1qp8). The SAD data set was 2-aminoethylphosphonate transaminase with 66 selenium sites (Chen et al., 2000).
Table 1 lists these crystals, with the number of NCS copies in the asymmetric unit, the number of NCS operators found from the selenium sites by the automatic procedure we developed recently (Terwilliger, 2002) and the number of NCS operators found after comparing the electron density at the potentially NCS-related positions. In all five cases, the NCS could be identified correctly from the heavy-atom sites and confirmed using the electron density in the map. In one of the cases (the dehydrogenase from P. aerophilum), the NCS in the selenium sites could only be detected when the allowed deviation of the sites from perfect NCS was increased from the default value of 1.4 Å (half the resolution) to 3.0 Å.

As a further test, the procedure for identifying and verifying NCS was applied to three MAD data sets and one MIR data set in which the crystals had no NCS (Table 1). The MAD data sets were gene 5 protein (Skinner et al., 1994), the armadillo repeat region from β-catenin (Huber et al., 1997) and initiation factor 5A from P. aerophilum (Peat et al., 1998), and the MIR data set was dehalogenase from Rhodococcus (Newman et al., 1999). In each case, the two-step procedure of searching for NCS in the heavy-atom sites and verifying any NCS that was present in the heavy-atom sites resulted correctly in the conclusion that no NCS was present. In two cases (β-catenin and dehalogenase), the correct conclusion was drawn after NCS was found in the heavy-atom sites but not in the electron-density map.
3.2. Incorporation of NCS in statistical density modification
Table 2 illustrates the utility of NCS information in the context of statistical density modification, comparing the quality of the final electron-density maps with and without the inclusion of NCS information. The quality of the maps was assessed using two measures. One was the mean effective figure of merit of the phases, calculated from the mean cosine of the phase difference between the phases from this procedure and the phases from a refined model. The other was the correlation coefficient of the map calculated from this procedure with the map calculated using phases from the refined model.
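Both quality measures are simple to compute once phases and maps are available as arrays. The following Python sketch uses hypothetical function names and makes two assumptions: the mean cosine is unweighted (the source does not say whether any amplitude weighting was applied) and the map correlation is the ordinary Pearson correlation over all grid points.

```python
import numpy as np

def mean_cosine_phase_difference(phases_a_deg, phases_b_deg):
    """Unweighted mean cosine of the phase difference (phases in degrees)."""
    a = np.asarray(phases_a_deg, dtype=float)
    b = np.asarray(phases_b_deg, dtype=float)
    return float(np.mean(np.cos(np.radians(a - b))))

def map_correlation(map_a, map_b):
    """Pearson correlation coefficient between two maps on the same grid."""
    a = np.ravel(np.asarray(map_a, dtype=float))
    b = np.ravel(np.asarray(map_b, dtype=float))
    return float(np.corrcoef(a, b)[0, 1])
```

Because the cosine is periodic, phase differences do not need to be wrapped into a fixed range before averaging.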

The extent of phase improvement resulting from the use of the NCS information varied considerably from case to case (Table 2). The starting correlation of the map obtained from MAD or SAD phasing with the map calculated from the refined models and the starting mean cosine of the phase error are listed for each. Additionally, the corresponding values for the maps obtained after statistical density modification, with and without NCS, are shown. In both of the cases with twofold symmetry, the NCS information improved the phasing very slightly. This was a little surprising considering that the twofolds were not parallel to axes in either case and the weights on NCS were sufficient to bring the correlation coefficients between NCS copies to 0.81 (formate dehydrogenase) and 0.93 (the hypothetical protein).
In the case of threefold symmetry, the inclusion of NCS made a very large difference, increasing the correlation of the resulting map with the map calculated from the refined model from 0.65 to 0.77. Somewhat surprisingly, in the cases with four and six NCS-related molecules the inclusion of NCS had a relatively small effect. On the other hand, the phases were very accurate even without NCS (correlations with the model map of 0.90 and 0.79, respectively), so in these cases there might simply not be much additional information available from the inclusion of NCS. This conclusion was tested by creating a test set of phases from the red fluorescent protein data set using just four of the 26 sites for phasing. As anticipated, beginning with a map correlation of 0.29, the inclusion of NCS had a very large effect, raising the final map correlation from 0.33 to 0.85, nearly as high as that obtained using all the selenium sites for phasing.
3.3. Estimation of variances in estimates of electron density from NCS
A key element of the statistical method for density modification is the ability to specify a probability distribution for the expected electron density in a map. In the case of NCS, this means that the method can, in principle, take into account the similarity of NCS-related molecules. It can also take into account the differences between NCS-related copies in a map that arise from errors in phases. In the present implementation of NCS in statistical density modification, (4) is used to estimate the uncertainty in target values of electron density used as prior information based on NCS. We tested whether these estimates are optimal by carrying out a systematic investigation of the phase improvement obtained when these variance estimates are multiplied by each of a range of values from 0.1 to 150.
Fig. 1 shows the result of this test applied to the NDP-kinase data, with the threefold NCS applied. When the NCS variance estimates were multiplied by a large scale factor (150), so that NCS was effectively not used in density modification, the final correlation of the NCS-related regions of the map after density modification was just 0.66 and the correlation between the density-modified map and the map based on the refined structure of NDP-kinase was 0.66 (Fig. 1). At the other extreme, when the variance estimates were multiplied by a small scale factor (0.1), so that NCS was strongly emphasized in density modification, the final correlation of the NCS-related regions of the map was 0.97, but the correlation between the density-modified map and the map based on the refined structure of NDP-kinase was still only 0.68. At intermediate values of the scale factor (e.g. 0.8–1.2), corresponding closely to the estimates of the variance obtained with (4), NCS was included in density modification, but in a balanced way. The resulting final correlation of the NCS-related regions of the map was 0.94–0.95 and the final map correlation with the model map was 0.76–0.77. Overall, Fig. 1 indicates that the estimates of variances obtained using (4) are very close to optimal in carrying out statistical density modification with NCS.
4. Conclusions
The map-probability function in statistical density modification provides a way to incorporate information using a different probability distribution of expected values of electron density for every point in a map. This flexibility means that it is not necessary to assume that all NCS-related copies of a molecule in a crystal are identical, or even to assume that all parts of a single molecule are equally similar to the NCS-related parts of another molecule. The extent of local similarity among NCS-related molecules can be assessed using the local correlation of density, and bias in these estimates owing to the iterative nature of density modification can be reduced by estimating the overall correlation by cross-validation.
The methods described here are implemented in the software RESOLVE (Terwilliger, 2000), available from https://solve.lanl.gov.
Acknowledgements
The author is grateful to the NIH and the PHENIX software-development consortium for generous support, to Li-Wei Hung for helpful discussions, to Thomas Peat for use of the formate dehydrogenase data prior to publication and to Osnat Herzberg for the use of the aminoethylphosphonate transaminase data prior to publication.
References
Abrahams, J. P. & Leslie, A. G. W. (1996). Acta Cryst. D52, 30–42.
Bricogne, G. (1974). Acta Cryst. A30, 395–405.
Chen, C. C. H., Kim, A., Zhang, H., Howard, A. J., Sheldrick, G. M., Dunaway-Mariano, D. & Herzberg, O. (2000). Am. Crystallogr. Assoc. Annu. Meet., Abstract 02.06.03.
Cowtan, K. (1998). Acta Cryst. D54, 750–756.
Huber, A. H., Nelson, W. J. & Weis, W. I. (1997). Cell, 90, 871–882.
Kleywegt, G. J. & Read, R. J. (1997). Structure, 5, 1557–1569.
Newman, J., Peat, T. S., Richard, R., Kan, L., Swanson, P. E., Affholter, J. A., Holmes, I. H., Schindler, J. F., Unkefer, C. J. & Terwilliger, T. C. (1999). Biochemistry, 38, 16105–16114.
Peat, T. S., Newman, J., Waldo, G. S., Berendzen, J. & Terwilliger, T. C. (1998). Structure, 6, 1207–1214.
Pédelacq, J.-D., Piltch, E., Liong, E. E., Berendzen, J., Kim, C.-Y., Rho, B.-S., Park, M. S., Terwilliger, T. C. & Waldo, G. S. (2002). In the press.
Rossmann, M. G. (1972). The Molecular Replacement Method. New York: Gordon & Breach.
Skinner, M. M., Zhang, H., Leschnitzer, D. H., Guan, Y., Bellamy, H., Sweet, R. M., Gray, C. W., Konings, R. N. H., Wang, A. H.-J. & Terwilliger, T. C. (1994). Proc. Natl Acad. Sci. USA, 91, 2071–2075.
Szöke, A., Szöke, H. & Somoza, J. R. (1997). Acta Cryst. A53, 291–313.
Terwilliger, T. C. (1999). Acta Cryst. D55, 1863–1871.
Terwilliger, T. C. (2000). Acta Cryst. D56, 965–972.
Terwilliger, T. C. (2002). Acta Cryst. D58, 2213–2215.
Terwilliger, T. C. & Berendzen, J. (1999). Acta Cryst. D55, 849–861.
Wang, B.-C. (1985). Methods Enzymol. 115, 90–112.
Yarbrough, D., Wachter, R. M., Kallio, K., Matz, M. V. & Remington, S. J. (2001). Proc. Natl Acad. Sci. USA, 98, 462–467.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.