Statistical density modification using local pattern matching

Statistical density modification can make use of local patterns of density found in protein structures to improve crystallographic phases.

A method for improving crystallographic phases is presented that is based on the preferential occurrence of certain local patterns of electron density in macromolecular electron-density maps. The method focuses on the relationship between the value of electron density at a point in the map and the pattern of density surrounding this point. Patterns of density that can be superimposed by rotation about the central point are considered equivalent. Standard templates are created from experimental or model electron-density maps by clustering and averaging local patterns of electron density. The clustering is based on correlation coefficients after rotation to maximize the correlation. Experimental or model maps are also used to create histograms relating the value of electron density at the central point to the correlation coefficient of the density surrounding this point with each member of the set of standard patterns. These histograms are then used to estimate the electron density at each point in a new experimental electron-density map using the pattern of electron density at points surrounding that point and the correlation coefficient of this density to each of the set of standard templates, again after rotation to maximize the correlation. The method is strengthened by excluding any information from the point in question from both the templates and the local pattern of density in the calculation. A function based on the origin of the Patterson function is used to remove information about the electron density at the point in question from nearby electron density. This allows an estimation of the electron density at each point in a map, using only information from other points in the process. The resulting estimates of electron density are shown to have errors that are nearly independent of the errors in the original map using model data and templates calculated at a resolution of 2.6 Å.
Owing to this independence of errors, information from the new map can be combined in a simple fashion with information from the original map to create an improved map. An iterative phase-improvement process using this approach and other applications of the image-reconstruction method are described and applied to experimental data at resolutions ranging from 2.4 to 2.8 Å.

Introduction
Electron-density maps corresponding to macromolecules such as proteins have features that differ in fundamental ways from those found in maps calculated with random phases. These differences have been used in many ways, ranging from improving the accuracy of crystallographic phases to evaluating the quality of electron-density maps. For example, maps corresponding to proteins often have large regions of relatively featureless solvent and large regions containing polypeptide chains, while a map calculated with random phases has similar fluctuations in density everywhere (Bricogne, 1974). This observation is the basis of the powerful solvent-flattening approach (Bricogne, 1974; Wang, 1985) as well as methods for evaluating the quality of macromolecular electron-density maps (e.g. Terwilliger & Berendzen, 1999). Similarly, the presence of non-crystallographic symmetry in macromolecular electron-density maps has been useful in phase improvement (Bricogne, 1974; Rossmann, 1972; Kleywegt & Read, 1997). Additionally, maps corresponding to macromolecules can be interpreted in terms of atomic models, providing a powerful basis for map-quality evaluation and improvement (Agarwal & Isaacs, 1977; Lunin & Urzhumtsev, 1984; Lamzin & Wilson, 1993; Perrakis et al., 1997, 1999, 2001; Morris et al., 2002). On a statistical level, the density in the protein region of a macromolecular electron-density map has a distribution that is very different to that in a map calculated with random phases. This has been extensively used in histogram matching and related methods for phase improvement (Harrison, 1988; Lunin, 1988; Zhang & Main, 1990; Goldstein & Zhang, 1998; Nieh & Zhang, 1999; Cowtan, 1999).
In this work, the focus is on local patterns of density that are common in macromolecular structures. Macromolecules are built from small regular repeated units and the packing of these units is highly constrained owing to van der Waals interactions. Owing to the regularity of macromolecules on a local scale, their electron-density maps have local features that are distinctive and very different from those of maps calculated from random phases (Main & Wilson, 2000; Wilson & Main, 2000; Colovos et al., 2000). This property has been used to evaluate the quality of electron-density maps and to improve phases at low resolution. Main & Wilson (2000) and Wilson & Main (2000) use histogram and wavelet analysis to improve electron density in low-resolution maps by requiring the wavelet coefficients to be similar to those of model structures. Colovos et al. (2000) analyze the local features of high- and medium-resolution electron-density maps and compare them with those of model maps to evaluate the quality of the maps and suggest that their approaches may also be useful for phase improvement.
We recently developed a method for density modification that consisted of the identification of the locations of helical or other highly regular features in an electron-density map, followed by statistical density modification using an idealized version of this density as the 'expected' electron density nearby (Terwilliger, 2001). This method was shown to yield some phase improvement, but suffered the serious disadvantage that after an initial cycle the features that were initially identified became greatly accentuated and few new features could be found. We suspect that this is a consequence of the inherent feedback in the method, where a feature in the original electron density that partially matches a helical template is restrained to look like this template, making it an even better match for the template in the next round (even if the true density in the region is not helical). We have therefore developed a very different approach to using the information inherent in local features of an electron-density map which does not have this feedback and which therefore might have substantially improved capability for phase improvement.
Here, we show that the local patterns of density surrounding any point in a map can be used to estimate the electron density at that point. This observation makes it possible to begin with an electron-density map with errors, to obtain a new estimate of the density at each point in the map without using the density at that point and thereby to construct a new estimate of electron density that has errors which are nearly uncorrelated with the errors in the original map. This 'recovered image' of the electron density has many uses, including phase improvement and evaluation of map quality.

Estimation of electron density from local patterns in a map
The central approach of this work is to use the density surrounding each point in a map to construct a new estimate of electron density at that point. There are three overall steps. The first two create templates and evaluate statistics of these templates using data from experimental or model maps, with and without additional errors. The third applies these results to other maps. In the applications described here, we have used density-modified experimental maps obtained from MAD or SAD data at a resolution of 2.6 Å to create the templates and histograms, but a similar procedure could be carried out using either experimental or model maps at any resolution. In the first step, $N$ templates of averaged density are created. These templates were based on the local density in a density-modified experimental protein electron-density map and are grouped by correlation coefficient. Secondly, the relationship between the density at point $x$ and the template which has the highest correlation with the density near $x$ is tabulated using additional density-modified experimental electron-density maps. Finally, the method is applied to other experimental maps. The density near each point $x$ in a map is used to construct a new estimate of the density at $x$. In this process, the local density is corrected in a way that removes the information about the density at $x$ from all its neighbors.

Removal of information about density at x from local density
In our approach, the goal is to obtain an estimate of the value of the electron density at a point $x$ in the unit cell in such a way that the new estimate has errors that are not correlated with the errors in the original electron-density map at $x$. To do this, the method uses information from the electron density at points surrounding the point $x$ in obtaining a new estimate of the value of the electron density at $x$. One way to remove the information about the electron density at $x$ would simply be to consider the electron density in a spherical shell around the point $x$. If the inner radius of the shell were large enough, then the values of electron density inside the shell would be relatively uncorrelated with the electron density at $x$. The choice of an inner radius, however, is not obvious because the electron-density map is a Fourier sum of terms with widely varying spatial frequencies. Consequently, there is significant correlation between the value of the electron density at point $x$ and the values at points even as far away as the resolution of the map. Additionally, it is disadvantageous to exclude all density close to $x$ in the calculations because the patterns to be considered are very local.
An alternative method is to create a local density function for points near $x$ that has values that are similar to the electron density near $x$, but that are adjusted in such a way that the values are uncorrelated with the electron density at $x$. This modified local density $g_x(\Delta x)$ will depend on the coordinate difference $\Delta x$ between each point near $x$ and $x$. The function $g_x(\Delta x)$ is a function of both $x$ and $\Delta x$ and therefore must be calculated separately for each point $x$ and offset $\Delta x$ in the map. We would like the value of the function $g_x(\Delta x)$ to be generally similar to the value of the electron density at $x + \Delta x$, which we will represent by $\rho(x + \Delta x)$. As $\Delta x$ is increased, we would like $g_x(\Delta x)$ to become very close to $\rho(x + \Delta x)$. That is, we would like

$$g_x(\Delta x) \approx \rho(x + \Delta x), \qquad (1)$$

$$g_x(\Delta x) \rightarrow \rho(x + \Delta x) \quad \text{for large } \Delta x. \qquad (2)$$

We would also like the function $g_x(\Delta x)$ to be uncorrelated everywhere with the value of the electron density at $x$, given by $\rho(x)$. The function $g_x(\Delta x)$ gives modified values of the density at $x + \Delta x$. We would like to be able to say that if we compare the modified density at $x + \Delta x$ [given by $g_x(\Delta x)$] with the density at $x$ [given by $\rho(x)$], these quantities should be unrelated [that is, $g_x(\Delta x)$ does not contain information about the value of $\rho(x)$]. One way to specify this is to require that for any offset $\Delta x$, if we go through the entire map and calculate $g_x(\Delta x)$ for each point $x$, then $g_x(\Delta x)$ and $\rho(x)$ are to be uncorrelated,

$$\langle g_x(\Delta x)\,\rho(x) \rangle_x = 0. \qquad (3)$$

A final desirable property of $g_x(\Delta x)$ for the current purpose is to have its value at $\Delta x = 0$ be equal to the mean value of $g_x(\Delta x)$ for nearby points $\Delta x$. The reason this is desirable is that we would like to compare local patterns to a template based on the correlation of densities and have no contribution from the mean value of local density. Setting the value of $g_x(\Delta x)$ to any fixed value (e.g. zero) at $\Delta x = 0$ would introduce a contribution that comes from the mean value of local density $\rho(x)$ to the correlation between $g_x(\Delta x)$ and a template.
A way to remove information about the mean value of local density is to specify the requirement that

$$g_x(0) = \langle g_x(\Delta x) \rangle_{\Delta x}, \qquad (4)$$

where all values of $\Delta x$ in the region to be used later in calculations of correlations of densities are considered in the averaging. A function $g_x(\Delta x)$ that has all these properties is

$$g_x(\Delta x) = \rho(x + \Delta x) - W(\Delta x)\,[\rho(x) - \langle \rho(x + \Delta x) \rangle_{\Delta x}], \qquad (5)$$

where the weighting function $W(\Delta x)$ is given by

$$W(\Delta x) = \frac{U(\Delta x)}{1 - \langle U(\Delta x) \rangle_{\Delta x}} \qquad (6)$$

and where the function $U(\Delta x)$ is the normalized value of the Patterson function near the origin, calculated from the electron-density map itself using the relation

$$U(\Delta x) = \frac{\langle \rho(x)\,\rho(x + \Delta x) \rangle_x}{\langle \rho^2(x) \rangle_x}. \qquad (7)$$

In essence, $g_x(\Delta x)$ is equal to the value of the electron density at $x + \Delta x$, after correction for the difference between $\rho(x)$, the value of the electron density at $x$, and $\langle \rho(x + \Delta x) \rangle_{\Delta x}$, the mean of nearby values, all using the weighting function $W(\Delta x)$. It can be verified by substitution that both (3) and (4) are satisfied by this function. Additionally, (1) and (2) are satisfied because the normalized rotationally averaged Patterson function is normally quite small everywhere except near the origin and normally becomes very small for points far from the origin.
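The removal of information about the central point can be checked numerically. The following is a minimal sketch on a one-dimensional periodic toy map, not the three-dimensional implementation used in this work. It assumes the forms $g_x(\Delta x) = \rho(x + \Delta x) - W(\Delta x)[\rho(x) - \langle\rho(x + \Delta x)\rangle_{\Delta x}]$ and $W(\Delta x) = U(\Delta x)/[1 - \langle U(\Delta x)\rangle_{\Delta x}]$, which follow from the stated requirements; the function names and the toy map are illustrative only.

```python
import numpy as np

def patterson_origin(rho, offsets):
    """U(dx) = <rho(x) rho(x+dx)>_x / <rho(x)^2>_x on a periodic 1-D grid."""
    products = [np.mean(rho * np.roll(rho, -d)) for d in offsets]
    return np.array(products) / np.mean(rho * rho)

def modified_local_density(rho, offsets):
    """Return g[i, j] = g_x(dx) for grid point x = i and offset dx = offsets[j]."""
    U = patterson_origin(rho, offsets)
    W = U / (1.0 - U.mean())  # assumed form of the weighting function
    # shifted[i, j] = rho(x_i + offsets[j]); np.roll(rho, -d)[i] equals rho[i + d]
    shifted = np.stack([np.roll(rho, -d) for d in offsets], axis=1)
    local_mean = shifted.mean(axis=1, keepdims=True)  # <rho(x + dx)>_dx
    return shifted - W[None, :] * (rho[:, None] - local_mean)

# Toy map: smoothed random noise with zero mean (illustrative data only).
rng = np.random.default_rng(0)
noise = rng.standard_normal(256)
rho = sum(np.roll(noise, s) for s in range(-2, 3))
rho = rho - rho.mean()
offsets = list(range(-3, 4))  # local region, including dx = 0
g = modified_local_density(rho, offsets)

# <g_x(dx) rho(x)>_x for each offset; should vanish to rounding error.
residual_correlation = (g * rho[:, None]).mean(axis=0)
```

In this sketch the averages over $\Delta x$ include the central point, so the value of `g` at zero offset equals the local mean of `g` at every grid point.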

Local pattern identification
The first step in the procedure for density modification by pattern matching is to obtain templates that correspond to common patterns of local electron density. These templates are generated using the local electron density near each point $x$ in density-modified experimental electron-density maps, modified to remove information from the central point $x$, as described in the previous section. The maps can be calculated at any resolution, but a set of templates is normally associated with a particular resolution (typically $d_{\min} = 2.6$ Å). The approach used here to obtain templates is hierarchical. First, three separate sets of $N_{\max}$ (typically 40) templates are generated using only points in an electron-density map that have low, medium or high electron density. A subset (typically 40) of these templates that have low mutual correlation is then selected. Finally, an even smaller subset of $N_{\mathrm{final}}$ (typically 20) templates is chosen from this group in order to maximize the predictive power of the templates while maintaining a fixed number of total templates.
To generate a set of templates, each grid point in an electron-density map is considered, one at a time, only including points that are associated with low, medium or high electron density (defined relative to $\bar{\rho}$ and $\sigma$, the mean and standard deviation of the map), depending on the set of templates to be created. For each appropriate grid point ($x$), the modified local electron density $g_x(\Delta x)$ is calculated for all neighboring points within a radius $r_{\max}$ (typically, $r_{\max} = 2$ Å when $d_{\min} = 2.6$ Å). This modified electron density is compared with all existing templates using the correlation coefficient of density in the template with the modified local density as a measure of similarity. The grid used is normally the same grid as is used for all FFT, NCS-averaging and other density calculations and is typically between 1/6 and 1/4 of the resolution of the map. The number of points typically used in a template is approximately 100. For each existing template, $N_{\mathrm{rot}}$ different rotations of the template are considered so as to attempt to match the modified local density in any orientation and the highest correlation coefficient of the match for all rotations of the template is noted. In the examples considered here, we use a total of $N_{\mathrm{rot}} = 158$ rotations to sample the possible three-dimensional rotations of an object with a rotation of about 50° relating neighboring orientations. If the correlation coefficient of the local modified electron density at this point $x$ with an existing template $k$ is greater than $CC_{\min}$ (typically, $CC_{\min} = 0.85$), then the local modified density at this point is included in the definition of template $k$ by rotating the density to match the current template $k$ and including the rotated local modified density in the average density for this template. If the local modified electron density does not have a correlation with any existing template greater than $CC_{\min}$, then the local modified density is used to start a new template.
Once $N_{\max}$ templates have been created (typically, $N_{\max} = 40$), then the local modified density at each subsequent point is included in whichever template it matches most closely.
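The clustering just described can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used in this work: each local pattern is a flat vector of sampled density, each 'rotation' is represented by a precomputed index permutation of the sample points (a hypothetical stand-in for three-dimensional rotation with interpolation), and a pattern that exceeds $CC_{\min}$ for several templates is assigned to the best-matching one.

```python
import numpy as np

def corr(a, b):
    """Correlation coefficient of two sampled densities."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def best_match(pattern, template, rotations):
    """Highest CC over all rotations; returns (cc, pattern rotated to match)."""
    scored = [(corr(pattern[perm], template), i) for i, perm in enumerate(rotations)]
    cc, i = max(scored)
    return cc, pattern[rotations[i]]

def build_templates(patterns, rotations, cc_min=0.85, n_max=40):
    """Greedy clustering: average patterns into templates after rotation."""
    sums, counts = [], []
    for p in patterns:
        matches = [best_match(p, s / n, rotations) for s, n in zip(sums, counts)]
        k = max(range(len(matches)), key=lambda i: matches[i][0]) if matches else None
        # Join the best template if it matches well enough, or if no further
        # templates may be started; otherwise start a new template.
        if k is not None and (matches[k][0] > cc_min or len(sums) >= n_max):
            sums[k] = sums[k] + matches[k][1]
            counts[k] += 1
        else:
            sums.append(np.array(p, dtype=float))
            counts.append(1)
    return [s / n for s, n in zip(sums, counts)]

# Toy data: two pattern families on 8 points, with cyclic shifts as 'rotations'.
n = 8
rotations = [np.roll(np.arange(n), s) for s in range(n)]
base_a = np.cos(2 * np.pi * np.arange(n) / n)
base_b = np.cos(4 * np.pi * np.arange(n) / n)
rng = np.random.default_rng(1)
patterns = [np.roll(base, rng.integers(n)) + 0.01 * rng.standard_normal(n)
            for base in (base_a, base_b) for _ in range(10)]
templates = build_templates(patterns, rotations)
```

With these toy data the two families cannot be superimposed by any shift, so two templates are produced, each an average of ten rotated patterns.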
By repeating the generation of templates using points in the electron-density map that have low, medium and high density, a relatively diverse set of templates is created. Next, a subset (typically 1/3) of these is chosen based on mutual correlation coefficients in order to obtain a set of templates with the minimum possible similarity to each other. To do this, the correlation coefficients of all pairs of templates are calculated and the template with the highest correlation to another template is eliminated. The process is repeated until the desired number of templates is obtained. The final selection of templates based on predictive power is carried out after analyzing the statistics associated with each of the $N_{\max}$ templates obtained at this stage, as described in a later section.
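The elimination step can be sketched as below, given a precomputed matrix of mutual correlation coefficients. The text does not say which member of the most similar pair is removed; this sketch assumes the member that is more similar to the remaining templates is dropped, and the function name is illustrative.

```python
import numpy as np

def prune_templates(cc, n_keep):
    """Return indices of templates kept after removing the most redundant ones.

    cc is a symmetric matrix of mutual correlation coefficients
    (after rotation to maximize each correlation).
    """
    keep = list(range(cc.shape[0]))
    while len(keep) > n_keep:
        sub = cc[np.ix_(keep, keep)].astype(float)
        np.fill_diagonal(sub, -np.inf)  # ignore self-correlation
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        # Assumed tie-break: drop whichever member of the most similar pair
        # is more similar, in total, to all the other templates.
        rest_i = np.delete(sub[i], [i, j]).sum()
        rest_j = np.delete(sub[j], [i, j]).sum()
        keep.pop(i if rest_i > rest_j else j)
    return keep

# Toy example: templates 0 and 1 are nearly identical.
cc = np.array([[1.00, 0.95, 0.10, 0.20],
               [0.95, 1.00, 0.30, 0.40],
               [0.10, 0.30, 1.00, 0.05],
               [0.20, 0.40, 0.05, 1.00]])
kept = prune_templates(cc, 3)
```

Here template 1 is removed because it duplicates template 0 and is also the more similar of the two to templates 2 and 3.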

Statistics of local patterns: general approach
The second overall step in this process is to identify the relationships between the correlation of each template with local modified density in a map and the value of the electron density at $x$. This is performed for experimental maps both with and without added errors. There are many possible ways to describe these relationships, but a simple approach used here is to break it down into two parts.
The first part consists of an examination of the statistics of high-quality experimental maps. We have found that the electron density at a point $x$ in a map is quite strongly dependent on the two templates $k$ and $l$ that have the highest ($k$) and next-highest ($l$) correlation coefficients with the local modified density at $x$. That is, for electron-density maps of proteins, the probability distribution $p(\rho \mid k, l)$ can be very informative about the electron density $\rho$ at $x$.
The second part is to consider the relationship between maps with and without added errors. The approach is to begin with the observed correlation coefficients of all the templates at a point $x$ to a map that contains errors and then to use these in a calculation of the probability that a particular pair of templates $k$ and $l$ would have the highest two correlation coefficients in the corresponding high-quality map. In this case, the statistics of density for the high-quality maps $p(\rho \mid k, l)$ obtained above can then be applied.
To carry this process out, a second set of probabilities is needed. These are the probabilities $p(CC_k \mid CC_{\mathrm{obs},k})$ that the correlation coefficient for template $k$ to a point $x$ in a high-quality map would have the value $CC_k$, given the observation that this template has a correlation coefficient of $CC_{\mathrm{obs},k}$ to the same point in a map with additional errors. To account for differing levels of error in the experimental map, these probabilities are tabulated as a function of the overall figure of merit of the map with errors.
To apply these probability distributions to data near the point $x$ in a new ('observed') electron-density map, the correlation coefficient of each template $k$ to the local modified density near $x$ is first determined (once again, after trying many rotations and choosing the one for each template that maximizes the correlation coefficient). This set of correlation coefficients $\{CC_{\mathrm{obs}}\}$ and the two probability distributions $p(\rho \mid k, l)$ and $p(CC_k \mid CC_{\mathrm{obs},k})$ can then be combined as follows to obtain an estimate of the electron density $\rho$ at $x$ in a high-quality version of the same map.
If we somehow knew which two templates $k$ and $l$ have the highest correlation coefficients to the local modified density near $x$ in a high-quality version of the new 'observed' map, then we could use our probability distribution $p(\rho \mid k, l)$ directly to estimate the probability distribution for $\rho$. We do not know the identity of $k$ and $l$, but suppose instead that we had probabilities, $p(k, l \mid \{CC_{\mathrm{obs}}\})$, for each possible pair $k$ and $l$ based on the correlation coefficients observed for the 'observed' map. Combining these, we could write that

$$p(\rho \mid \{CC_{\mathrm{obs}}\}) = \sum_{k,l} p(\rho \mid k, l)\, p(k, l \mid \{CC_{\mathrm{obs}}\}), \qquad (8)$$

where the sum is over all possible pairs of templates $k$ and $l$. An estimate of the electron density at $x$ can then be obtained from the weighted mean

$$\rho_{\mathrm{est}} = \int \rho\, p(\rho \mid \{CC_{\mathrm{obs}}\})\, \mathrm{d}\rho. \qquad (9)$$

The probability, $p(k, l \mid \{CC_{\mathrm{obs}}\})$, that the pair $k$ and $l$ have the highest correlation coefficients to the local modified density near $x$ in a high-quality version of the 'observed' map can in turn be estimated from the observed correlation coefficients of all the templates to this map, $\{CC_{\mathrm{obs}}\}$, in several steps. We separate the probability into two parts, one for the probability that template $k$ has the highest correlation and one for the probability that template $l$ has the next highest, given that template $k$ has the highest correlation,

$$p(k, l \mid \{CC_{\mathrm{obs}}\}) = p(l \mid k, \{CC_{\mathrm{obs}}\})\, p(k \mid \{CC_{\mathrm{obs}}\}). \qquad (10)$$

We can now estimate the probability that template $k$ has the highest correlation with the (non-existent) high-quality version of the 'observed' map. We will integrate over all possible values of $CC_k$, the correlation of template $k$ with the high-quality map. For each value of $CC_k$, we will calculate the probability that this is indeed the value of the correlation of template $k$, given by $p(CC_k) = p(CC_k \mid CC_{\mathrm{obs},k})$, and the probability that all other templates have a correlation coefficient less than $CC_k$,

$$p(k \mid \{CC_{\mathrm{obs}}\}) = \int p(CC_k) \prod_{j \neq k} p(CC_j < CC_k)\, \mathrm{d}CC_k, \qquad (11)$$

where the integral is over all values of $CC_k$.
The probability that template $l$ has the next-highest correlation is given by

$$p(l \mid k, \{CC_{\mathrm{obs}}\}) = \int p(CC_l) \prod_{j \neq k,l} p(CC_j < CC_l)\, \mathrm{d}CC_l. \qquad (12)$$

Statistics of local patterns: tabulating histograms
An important part of this step consists of generating histograms of values for the electron density at $x$ as a function of the correlation coefficients of the $N_{\max}$ templates with the local modified density at $x$. Each of the $N_{\max}$ templates is compared with the modified local density at all points in a set of high-quality maps. At each point $x$, the two templates $k$ and $l$ that have the highest and next-highest correlation coefficients, respectively, with the local modified density at $x$ are identified (after rotation to maximize this value). The value of the (unmodified) electron density $\rho(x)$ is then tabulated as a function of $k$ and $l$. These histograms are then normalized to yield an estimate of the probability distribution, $p(\rho \mid k, l)$.
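The tabulation of $p(\rho \mid k, l)$ can be sketched as below, assuming the rotation-maximized correlation coefficients of every template at every map point are already available as an array. The bin count, the density range and the random stand-in data are illustrative assumptions, not values from this work.

```python
import numpy as np

def tabulate_density_histograms(cc, rho, n_bins=20, rho_range=(-2.0, 4.0)):
    """Histogram rho(x) by the best (k) and second-best (l) template at x.

    cc  : (n_points, n_templates) rotation-maximized correlation coefficients
    rho : (n_points,) unmodified density at each point, e.g. in units of sigma
    Returns p[k, l, b], an estimate of p(rho in bin b | k, l), and bin edges.
    """
    n_templates = cc.shape[1]
    order = np.argsort(-cc, axis=1)      # templates sorted by decreasing CC
    k, l = order[:, 0], order[:, 1]
    edges = np.linspace(rho_range[0], rho_range[1], n_bins + 1)
    b = np.clip(np.digitize(rho, edges) - 1, 0, n_bins - 1)
    counts = np.zeros((n_templates, n_templates, n_bins))
    np.add.at(counts, (k, l, b), 1.0)    # accumulate, allowing repeated indices
    totals = counts.sum(axis=2, keepdims=True)
    # Normalize each populated (k, l) histogram; unpopulated pairs stay at zero.
    p = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return p, edges

# Illustrative random data standing in for a real map.
rng = np.random.default_rng(2)
cc_demo = rng.uniform(-1.0, 1.0, size=(500, 4))
rho_demo = rng.normal(0.0, 1.0, size=500)
p_rho_kl, rho_edges = tabulate_density_histograms(cc_demo, rho_demo)
```

Densities outside the chosen range are placed in the first or last bin, a choice made here only to keep the sketch short.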
The second part of this step is to obtain probability distributions, $p(CC_k \mid CC_{\mathrm{obs},k})$, relating the correlation coefficient value, $CC_{\mathrm{obs},k}$, observed for a particular template at a point $x$ in a map that contains added errors to the correlation coefficient, $CC_k$, that would be observed for the identical template at the identical point $x$ in the corresponding map without any added errors. These probability distributions are calculated by using paired sets of high-quality experimental maps with and without added errors. At each point in a map, the correlation coefficient of each template $k$ to the map without added errors, $CC_k$, and the correlation to the map with added errors, $CC_{\mathrm{obs},k}$, are noted. Normalization of the resulting histograms leads to an estimate of the probability, $p(CC_k \mid CC_{\mathrm{obs},k})$, that $CC_k$ is the correlation to the map without added errors if the value $CC_{\mathrm{obs},k}$ is observed in the map with added errors. This calculation is repeated for maps with varying levels of additional errors by creating simulated phase sets with Gaussian distributions of phase errors with varying overall values of the cosine of phase error, $\langle \cos \Delta\varphi \rangle$, ranging typically from 0.5 to 0.8. In application to a new 'observed' map, the probability distribution obtained using data with added phase errors with a mean cosine $\langle \cos \Delta\varphi \rangle$ similar to the figure of merit of the experimental map is used.
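Given the two tabulated distributions, the estimate of density at one point can be sketched as follows. The sketch assumes that the correlation coefficients have been discretized onto a common grid, so that the integrals over $CC_k$ and $CC_l$ become sums; because ties between grid values lose a little probability in that case, the template probabilities are renormalized here, a choice of this sketch rather than of the method. All function and variable names are illustrative.

```python
import numpy as np

def pair_probabilities(pcc):
    """Estimate P[k, l] = p(k, l | {CC_obs}) from per-template distributions.

    pcc[j, c] = p(CC_j = grid[c] | CC_obs,j); each row sums to 1.
    """
    n_templates = pcc.shape[0]
    below = np.cumsum(pcc, axis=1) - pcc   # below[j, c] = p(CC_j < grid[c])
    # Probability that template k has the highest correlation.
    p_k = np.array([np.sum(pcc[k] * np.prod(np.delete(below, k, axis=0), axis=0))
                    for k in range(n_templates)])
    p_k = p_k / p_k.sum()
    P = np.zeros((n_templates, n_templates))
    for k in range(n_templates):
        # Probability that l is next highest, given that k is highest.
        p_l = np.zeros(n_templates)
        for l in range(n_templates):
            if l != k:
                others = np.delete(below, [k, l], axis=0)
                p_l[l] = np.sum(pcc[l] * np.prod(others, axis=0))
        if p_l.sum() > 0:
            P[k] = p_k[k] * p_l / p_l.sum()
    return P

def estimate_density(P, p_rho_kl, rho_centers):
    """Mix p(rho | k, l) over template pairs, then take the weighted mean."""
    p_rho = np.einsum("kl,klb->b", P, p_rho_kl)
    return float(p_rho @ rho_centers), p_rho

# Toy case: template 0 is certainly best and template 1 certainly second.
pcc = np.zeros((3, 5))
pcc[0, 4] = pcc[1, 2] = pcc[2, 0] = 1.0
P = pair_probabilities(pcc)
p_rho_kl = np.zeros((3, 3, 2))
p_rho_kl[0, 1] = [0.25, 0.75]
rho_est, p_rho = estimate_density(P, p_rho_kl, np.array([0.0, 2.0]))
```

In the toy case all the probability falls on the pair (0, 1), so the estimate is simply the mean of the histogram tabulated for that pair.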

Selection of templates based on predictive power
The final selection of $N_{\mathrm{final}}$ templates is based on predictive power. A subset of $N_{\mathrm{final}}$ templates is selected from the $N_{\max}$ templates obtained earlier using high-quality electron-density maps. The subset is selected to maximize the correlation between the electron density calculated using (9) and the electron density in the maps. The histograms that form the basis of (9) are calculated from experimental density for one set of proteins and the correlation is calculated for another. The pair of templates that yields the highest correlation is first identified. Then, one by one, the template that increases this correlation by the largest amount is added to the group, until $N_{\mathrm{final}}$ templates are chosen.
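This greedy forward selection can be sketched as below. The `score` argument is a hypothetical callable standing in for the full calculation (build histograms from one set of maps with the given subset of templates, then return the correlation of estimated and actual density in another map); the toy score used here is illustrative only.

```python
from itertools import combinations

def select_templates(candidates, score, n_final):
    """Pick the best pair, then add one template at a time by greatest gain."""
    candidates = list(candidates)
    best_pair = max(combinations(candidates, 2), key=lambda pair: score(list(pair)))
    chosen = list(best_pair)
    while len(chosen) < n_final:
        remaining = [t for t in candidates if t not in chosen]
        chosen.append(max(remaining, key=lambda t: score(chosen + [t])))
    return chosen

# Toy score: each template contributes a fixed amount of predictive power.
weights = {0: 0.5, 1: 0.4, 2: 0.3, 3: 0.1}
selected = select_templates(weights, lambda subset: sum(weights[t] for t in subset), 3)
```

Because every candidate subset requires a fresh evaluation of the score, the cost of this procedure grows with the square of the number of candidate templates for the first step and linearly for each later step.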

Indexing the rotations for each template to reduce computational requirements
The slowest step in applying the procedures described here consists of calculating the maximum correlation of local modified density with each of the $N_{\mathrm{final}}$ templates, considering as many as 158 rotations of each template (or local density) for each point. We have developed a simple indexing system that reduces the number of rotations that need to be considered for each template. The index for a point $x$ is based on the density at $M$ points near $x$ (typically, $M = 9$ and the points are chosen to be approximately uniformly distributed on a sphere of radius $0.9 r_{\max}$ centered at $x$). Point $m$ is given a local index $i_m$ from 0 to 3, based on the local density at that point relative to $\sigma$, the r.m.s. of the entire map. An overall index $I$ is then calculated for the local density by combining the local indices $i_m$ of the $M$ nearby points. Next, the relationship between the index $I$ and the best rotation is tabulated for each of the templates using high-quality experimental maps containing added errors. For each point in each map used above to calculate statistics of the correlation of templates with local modified density, the index $I$ is calculated and the optimal rotation is noted for each template. An indexing table is then constructed in which each index $I$ is associated with a list of preferred rotations for each template. The table is constructed so that about 95% of the time the optimal rotation for a given template is contained in the list. This indexing procedure reduces the number of rotations that need to be considered by about a factor of five. Other indexing methods could be applied that might further reduce the number of rotations to be considered (e.g. Funkhouser et al., 2003).
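One possible form of such an index is sketched below for a single template. The density thresholds and the base-4 encoding of the overall index are hypothetical choices made for illustration, since only the range of $i_m$ (0 to 3) and the use of $\sigma$ are given here, and the 95% coverage is applied separately to each index value rather than over the whole table.

```python
import numpy as np
from collections import Counter, defaultdict

# Hypothetical thresholds, in units of sigma, separating the four density levels.
THRESHOLDS = np.array([-0.5, 0.5, 1.5])

def local_index(density_at_points, sigma):
    """Combine per-point levels i_m (0..3) into one integer index I."""
    i_m = np.searchsorted(THRESHOLDS * sigma, density_at_points)
    return int(np.sum(i_m * 4 ** np.arange(len(i_m))))  # assumed base-4 encoding

def build_rotation_table(indices, best_rotations, coverage=0.95):
    """Map each index I to the most frequent optimal rotations seen for it."""
    by_index = defaultdict(Counter)
    for index, rotation in zip(indices, best_rotations):
        by_index[index][rotation] += 1
    table = {}
    for index, counter in by_index.items():
        total, running, preferred = sum(counter.values()), 0, []
        for rotation, n in counter.most_common():
            preferred.append(rotation)
            running += n
            if running >= coverage * total:
                break
        table[index] = preferred
    return table

# Four sample points with densities of -1, 0, 1 and 2 sigma give levels 0, 1, 2, 3.
demo_index = local_index(np.array([-1.0, 0.0, 1.0, 2.0]), 1.0)
# Rotation 7 was optimal in 19 of 20 training points that had index 5.
demo_table = build_rotation_table([5] * 20, [7] * 19 + [3])
```

When a new point is analyzed, only the rotations listed for its index would be tried for the template, at the cost of occasionally missing the true optimum.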

Using local patterns to create a new estimate of electron density
The pattern of density near a point $x$ in an electron-density map can be analyzed using (8) to produce a probability distribution, $p(\rho \mid \{CC_{\mathrm{obs}}\})$, for the electron density at $x$. The estimate from (9) of density at $x$, $\rho_{\mathrm{est}}$ (and the uncertainty in this estimate, $\sigma_{\mathrm{est}}$, if desired), can then be used to construct a new estimate of the electron density in the map. This 'recovered image' of the electron-density map can be visualized with or without smoothing, can be used as a target for statistical density modification or can be combined directly with the original electron-density map to obtain an improved map.
We have used an iterative procedure to combine the information from the recovered image with the information present in an experimental electron-density map (Fig. 1). In the first cycle, the starting phase probabilities are experimental values and in all cycles the amplitudes are experimental values. In each cycle, the starting phases and amplitudes are first subjected to density modification (e.g. statistical density modification or other related methods) to obtain the best possible electron-density map without using any pattern-based information. Secondly, this density-modified map is analyzed for local patterns and an image of the map is recovered. Thirdly, the density in the recovered image is used all by itself to estimate phase probabilities. This third step is carried out here using statistical density modification as described below, but could be performed using $\sigma_A$-based methods (Read, 1986). Finally, the phase probabilities from the recovered image are combined with the original experimental phase probabilities to yield the starting phase probabilities for the next cycle. The process is iterated until changes in the density-modified map from cycle to cycle are small (typically one to five cycles). The density-modified map from the final cycle is then suitable for interpretation.

Using statistical density modification to estimate phases based on a target electron-density function
Statistical density modification is a procedure for calculating crystallographic phase probabilities based on the agreement of the map resulting from these phases with prior expectations. Any set of prior expectations about the map can be included in this procedure. In particular, if an estimate of electron density is available for all points in the map (e.g. the recovered image obtained in the procedure described above), then this estimate can be used as prior information about the map. In this procedure, observed values of the amplitudes of structure factors are used and an estimate of uncertainty in the electron density is required. This procedure is used to estimate phase probabilities from a recovered image, where the expected electron density is simply the best estimate from (9) and the uncertainty is taken to be a constant everywhere given by the r.m.s. of a map calculated with the observed structure-factor amplitudes.

Results and discussion
Removing information about electron density at x from the local electron density
An important aspect of the pattern-matching density-modification method presented here is that it is designed to yield an estimate of the electron density that has errors uncorrelated with the errors in the original map. This is accomplished by using only information from the region around a point $x$ to estimate the density at $x$ and not including any information about the density at $x$ in the process, as described in §2. Fig. 2 illustrates this process of removing information about electron density at $x$. Fig. 2(a) shows a section of a density-modified MAD electron-density map for initiation factor 5A (IF5A; Peat et al., 1998) in the region near a particular point $x$ (the point $x$ is designated by a star at the center of the figure). Note that the density at $x$ is positive in this case. In Fig. 2(b), the density is adjusted to remove the information about the density at $x$ from $x$ and from all neighboring points. This calculation essentially consists of subtracting the origin of a normalized Patterson function corresponding to this map, multiplied by the value of the density at $x$ minus the mean local density, from all neighboring points, as described in §2. This calculation has the effect of setting the value of the density at $x$ to the mean density in the local region, setting the density very near $x$ to intermediate values and leaving the value of points far from $x$ unchanged.

Common local patterns in protein electron-density maps
The analysis of local patterns in electron-density maps was carried out using the density-modified MAD electron-density map from IF5A, calculated at a resolution of 2.6 Å (PDB code 1bkb; Berman et al., 2000; Peat et al., 1998). This was a very clear map with a correlation coefficient to the map calculated from the final refined model of IF5A of 0.82. Local patterns were analyzed for regions centered on each point in this grid, only considering points within 2.5 Å of an atom in the model. Local patterns were identified as described in §2 using the modified local density surrounding each point. This approach removes information about the density at $x$ from the nearby density. The patterns are selected after considering rotations about the central point, so any rotational differences between templates are not significant in determining their features.
The final templates were chosen on the basis of their predictive power. The $N_{\max} = 40$ templates that were initially created using the model electron-density map for IF5A were then compared with all points in two other density-modified experimental electron-density maps, the armadillo repeat of β-catenin (Huber et al., 1997) and red fluorescent protein (Yarbrough et al., 2001), and correlation coefficients for each template at each point were obtained. The same 40 templates were then compared in the same way with the IF5A map. Finally, subsets of the 40 templates were considered. For each subset of templates, the β-catenin and red fluorescent protein electron-density maps were used to generate histograms and the IF5A map was used to compare the estimates of electron density obtained using (9) with IF5A electron density. In the first cycle of identifying templates, all pairs of templates were considered and the pair yielding the highest correlation was chosen. In subsequent cycles, the additional template that yielded the greatest improvement in correlation was chosen. Fig. 3(a) (open circles) shows the correlation of estimated and model density as a function of the number of templates used. Much of the information is contained in just two templates and almost all the rest is in the first 20. Based on this observation, we have used 20 templates for the remainder of this work.
The fundamental property of macromolecular electron-density maps that is used in our approach is that different local patterns of density in these maps are associated with different values of the density at their central point. The open circles in Fig. 3(a) show that such an association exists and that only a small number of templates are needed to describe it. We next tested whether a similar association exists for random maps. The closed triangles in Fig. 3(a) were obtained in the same way as the open circles, except that all the maps were calculated after randomizing all the crystallographic phases. The closed triangles show that there is essentially no association between local patterns of density and density at their central points for the random maps. This means that the correlation between patterns and densities at their central points is a feature of protein-like maps and not a feature of maps with random phases.
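A randomized-phase control map of this kind can be illustrated with a discrete Fourier transform. The following is a sketch under stated assumptions, not the procedure used in this work: it treats the map as a real array on a grid with no space-group symmetry (so centric-reflection phase restrictions are ignored), and the function name `randomize_phases` and its `seed` parameter are hypothetical.

```python
import numpy as np


def randomize_phases(density: np.ndarray, seed: int = 0) -> np.ndarray:
    """Return a real map with the Fourier amplitudes of `density`
    and randomized phases (P1 grid, illustrative only)."""
    rng = np.random.default_rng(seed)
    structure_factors = np.fft.fftn(density)
    amplitudes = np.abs(structure_factors)

    # The transform of real white noise supplies random phases that obey
    # phi(-h) = -phi(h), so the inverse transform is real.
    phases = np.angle(np.fft.fftn(rng.standard_normal(density.shape)))

    # Keep the original phase of the zero-frequency term so that the
    # mean of the map is unchanged.
    origin = (0,) * density.ndim
    phases[origin] = np.angle(structure_factors[origin])

    randomized = np.fft.ifftn(amplitudes * np.exp(1j * phases))
    return randomized.real
```

The output has the same amplitudes as the input, so any difference in the association between local patterns and central density can be attributed to the phases alone.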
An important part of the present approach was the removal of information about the density at a point x in the analysis of the patterns surrounding x using (5). The reason for doing this was to obtain an estimate of the density at point x that is independent of the current value of density at that point. Fig. 3(b) shows that this choice of methods is also important for discriminating between patterns that arise from noise and those that arise from protein-like features. Fig. 3(b) was calculated in exactly the same way as Fig. 3(a), except that the local density was not adjusted to remove information about the value of the density at the central point and a completely new set of templates and statistics was used, reflecting this different approach. This was accomplished by not applying (5) to the local density. The open circles in Fig. 3(b) show that if the local density is not adjusted to remove information about the central point, then templates can be obtained that give a very high correlation between the value of the density calculated from (9) and the actual density. However, this correlation is likely to be almost entirely due to the fact that information about the central point is included in both the templates and the correlations. Supporting this interpretation, the closed triangles in Fig. 3(b) show that randomized maps give essentially the same correlations as protein electron-density maps when the information about the central point is not removed from the calculations.

Figure 3 (a) For the open circles, the templates were derived from the IF5A map, the histograms from β-catenin and red fluorescent protein maps and the model density and recovered density were from the IF5A map. For the closed triangles, phases were randomized for all three maps before carrying out the calculations. (b) As in (a), except that the local density was not adjusted to remove information about the density at the central point, so that g_x(Δx) = ρ(x + Δx).
Figs. 4(a) and 4(b) show contours of density corresponding to the N_max = 20 templates obtained. The templates are arranged in order of decreasing contribution to the estimates of density. The patterns are very simple, typically containing one to three spherical or extended regions of positive density and one or more rings or regions of negative density in various relations to the central point. Some of the pairs of templates are similar (for example, Nos. 17 and 18) and, as shown in Fig. 3, the number could be reduced further with just a small reduction in predictive power. The patterns found in some of the templates are related in a simple way to atomic coordinates in the structures used to generate the templates. For example, Fig. 2 shows the density surrounding a point located near a C atom, the junction of three chains of atoms. This density, after removing the information about the density right at this point, is most similar to pattern No. 12 in Fig. 4, which consists of a curved lobe of density adjacent to the origin.
The core of the method described here is the association of different templates with different expected values of electron density at the point that is at the center of the templates. The electron density near a point x in a map is compared with the 20 templates and the two templates that match the density most closely are identified. The procedure is first performed with high-quality experimental maps to associate pairs of templates with expected density and then with an observed map to estimate the values of electron density in a high-quality version of the observed map. In order to use as much information as possible, the process is carried out in a probabilistic fashion, considering the possibility that any pair of patterns might best match the density in a high-quality version of the observed map.

Figure 4 Templates of local density calculated at a resolution of 2.6 Å. The templates are arranged in order of decreasing contribution to the information about the density at the central point. The sections shown are 8 × 8 Å; only the spherical region 4 Å in diameter at the center of each figure is used in the pattern-matching process. Contours at +1.5σ (a) and −1.5σ (b, templates in the same orientation as in a) are shown.
The 20 patterns are each associated with different average values of density at their central points. For example, template No. 1 contains two spherical regions of positive density situated on opposite sides of the origin. At locations where this pattern is the one that best matches the density in model maps, the mean density at the central point is about −0.3 ± 0.6 (on an arbitrary scale with the mean of the map equal to zero). Template No. 12 contains a curved lobe of positive density immediately adjacent to the origin. Template No. 12 is associated with a mean density of about 0.6 ± 0.9. Table 1 lists the density associated with locations where each of the 20 templates best matches the local modified density in model maps.
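A per-template summary of this kind (mean and spread of the central density at the points where each template is the best match) can be tabulated as in the sketch below. It is illustrative only and is simpler than the histogram-based estimate of (9), which uses pairs of templates and their correlation coefficients. The function name `density_by_best_template` and its inputs are hypothetical: `best_template` holds the index of the best-matching template at each grid point and `central_density` the density at that point.

```python
import numpy as np


def density_by_best_template(best_template, central_density, n_templates):
    """Mean, standard deviation and count of central density per template.

    Templates that are never the best match get NaN for mean and
    standard deviation and a count of zero.
    """
    best_template = np.asarray(best_template, dtype=int)
    central_density = np.asarray(central_density, dtype=float)

    counts = np.bincount(best_template, minlength=n_templates)
    sums = np.bincount(best_template, weights=central_density,
                       minlength=n_templates)
    sums_sq = np.bincount(best_template, weights=central_density ** 2,
                          minlength=n_templates)

    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
        # Population variance; clipped at zero to absorb rounding error.
        variances = np.maximum(sums_sq / counts - means ** 2, 0.0)
    return means, np.sqrt(variances), counts
```

Applied to a map normalized to a mean of zero, the first two returned arrays correspond to the "mean ± spread" form in which the values for templates No. 1 and No. 12 are quoted above.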

Reconstructing model electron density using correlations with local patterns
The templates shown in Fig. 4 and the density typically associated with them listed in Table 1 can be used to reconstruct an image of an electron-density map. Fig. 5 shows an example using model data so that errors can be readily analyzed. Fig. 5(a) shows a section of model electron density with errors calculated using the structure of gene 5 protein (PDB code 1vqb; Skinner et al., 1994) at a resolution of 2.6 Å. The errors in the phases were adjusted so that the map had a correlation coefficient to the perfect map of 0.81. The estimated electron density reconstructed from this map is shown in Fig. 5(b) and a version of this density, smoothed with a radius of 1.5 Å, is shown in Fig. 5(c). Finally, phases were estimated using statistical density modification based on the model structure-factor amplitudes from the reconstructed density (Fig. 5d). The reconstructed density has a correlation coefficient to the original (model) map of 0.19, the smoothed image has a correlation of 0.38 and the map calculated with phases obtained from the reconstructed density and model amplitudes has a correlation coefficient of 0.46.

Table 1 Templates of local electron density calculated at a resolution of 2.6 Å.

As model data were used to obtain the images in Fig. 5, it is possible to analyze the errors in the recovered image and determine whether they are in fact independent of the errors in the original map. The errors in electron-density maps are somewhat complicated as they come from errors in phase angles. A simplified error model in which the values of the electron density in two maps $y_1(x)$ and $y_2(x)$ have correlated errors is assumed for the present analysis. For convenience, in this analysis the maps $y_1(x)$ and $y_2(x)$ are each normalized to an r.m.s. value of unity and a mean of zero.
In this error model, each map has a component that is related to $t(x)$, the true density in a perfect map (also normalized in the same way), each map has a component $c(x)$ that is an error term unrelated to $t(x)$ but that is the same in the two maps and each map has an independent error term, $e_1(x)$ or $e_2(x)$. As this is model data, we know the values of $t(x)$ as well as the values of $y_1(x)$ and $y_2(x)$,

$$y_1(x) = \alpha_1 t(x) + c(x) + e_1(x), \tag{13}$$

$$y_2(x) = \alpha_2 t(x) + c(x) + e_2(x). \tag{14}$$

In this model case, the coefficients $\alpha_1$ and $\alpha_2$ can be estimated from the known maps $t(x)$, $y_1(x)$ and $y_2(x)$,

$$\alpha_1 \simeq \langle y_1(x)\, t(x) \rangle, \tag{15}$$

$$\alpha_2 \simeq \langle y_2(x)\, t(x) \rangle. \tag{16}$$

We can then estimate the correlation of errors $\mathrm{CC}_{\mathrm{errors}}$ with the relation

$$\mathrm{CC}_{\mathrm{errors}} \simeq \frac{\langle [y_1(x) - \alpha_1 t(x)][y_2(x) - \alpha_2 t(x)] \rangle}{\{ \langle [y_1(x) - \alpha_1 t(x)]^2 \rangle \langle [y_2(x) - \alpha_2 t(x)]^2 \rangle \}^{1/2}}. \tag{17}$$

Using (17), we find that the correlation coefficient of the errors in the starting map with the errors in the recovered map in Fig. 5(b) is −0.01. The same calculation for the recovered smoothed map in Fig. 5(c) leads to a correlation coefficient of the errors of −0.02. Similarly, the calculation for the map in Fig. 5(d) obtained using phases calculated from the recovered image and model amplitudes leads to a correlation of errors of −0.04. This indicates that the errors in the recovered image are not correlated with the errors in the original map.
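The error-correlation estimate of (15) to (17) can be written compactly for maps stored as arrays. The sketch below is illustrative; the function names `normalize` and `error_correlation` are hypothetical, and the three inputs are assumed to be sampled on the same grid. Each map is first normalized to zero mean and unit r.m.s., the scale of the true map in each observed map is estimated as in (15) and (16), and the residuals are then correlated as in (17).

```python
import numpy as np


def normalize(density):
    """Shift a map to zero mean and scale it to unit r.m.s."""
    density = np.asarray(density, dtype=float)
    density = density - density.mean()
    return density / np.sqrt(np.mean(density ** 2))


def error_correlation(y1, y2, t):
    """Correlation of the errors in maps y1 and y2, given the true map t."""
    y1, y2, t = normalize(y1), normalize(y2), normalize(t)

    # Scale of the true density in each map, as in (15) and (16).
    scale1 = np.mean(y1 * t)
    scale2 = np.mean(y2 * t)

    # Residual error maps after removing the component related to t.
    err1 = y1 - scale1 * t
    err2 = y2 - scale2 * t

    # Correlation coefficient of the two error maps, as in (17).
    return np.mean(err1 * err2) / np.sqrt(np.mean(err1 ** 2) * np.mean(err2 ** 2))
```

For two maps whose errors are entirely independent, this function returns a value near zero; if the maps share a common error component of the same variance as their independent errors, it returns a value near 0.5.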
We have found that the independence of errors is not as perfect when density-modified phases are used. To examine this, we started with model phases and amplitudes, introduced errors into the phases, leading to an electron-density map with a correlation to the perfect map of 0.6, and then carried out statistical density modification on this map (not including any local pattern information), leading to a density-modified map with a correlation to the perfect map of 0.83. This density-modified map was then analyzed for local patterns as described above. In this case the smoothed recovered image had a correlation to the perfect map of 0.50. The correlation of errors with the density-modified map was 0.21, considerably higher than in the case where the map used for pattern identification had completely random errors. This suggests that the method might not be quite as effective when used on density-modified maps as on experimental maps.

Reconstructing electron density from density-modified experimental maps using correlations with local patterns
The analysis described above was carried out with electron density calculated from models so that the error analysis could be performed in detail. We next applied the method to electron density obtained from a MAD experiment so that its utility with real data could be examined. The electron density obtained after applying statistical density modification to three-wavelength MAD data from gene 5 protein (PDB code 1vqb; Skinner et al., 1994) was used as the starting point for this analysis. This RESOLVE electron-density map had a correlation coefficient of 0.79 to the model density calculated from PDB entry 1vqb. Fig. 6(a) shows a section through this density-modified map. Local pattern analysis was applied to this map as described above. Fig. 6(b) shows the image that was recovered from this map, Fig. 6(c) shows a smoothed version of this image and Fig. 6(d) shows the map obtained using phases calculated from the recovered image and observed structure-factor amplitudes. The recovered image in Fig. 6(b) has a correlation of 0.25, the smoothed recovered image in Fig. 6(c) has a correlation of 0.42 and the map calculated using phases from the recovered image in Fig. 6(d) has a correlation of 0.52.

Figure 6 Template-matching using gene 5 protein MAD data. As in Fig. 5, but using experimental MAD data instead of model data.
An approximate version of the error analysis described in the previous section for Fig. 5 was carried out for the maps in Fig. 6. In this analysis, the 'true' density was taken to be the density calculated from the model of gene 5 protein (PDB code 1vqb). The correlation of the errors in the starting RESOLVE map in Fig. 6(a) with the errors in the recovered image in Fig. 6(b) was 0.15 and the correlation of the errors in the starting RESOLVE map with the errors in the smoothed recovered image in Fig. 6(c) was 0.23. The correlation of errors in the map calculated using phases from the recovered image in Fig. 6(d) with the errors in the starting RESOLVE map was 0.36. This means that the errors are not highly correlated in this analysis, but that they are also not completely independent. Part of the correlation of 'errors' could arise because the 'true' density is not known and the errors are estimated using model density for gene 5 protein. Consequently, any errors in this model density would lead to correlation of 'errors' in all the maps in this analysis.
3.5. Combination of phase information from local pattern identification with experimental phase information
Fig. 6(d) shows an electron-density map calculated using observed structure-factor amplitudes for gene 5 protein and phase probabilities obtained using statistical density modification on the reconstructed image in Fig. 6(b). These phase probabilities were then combined with the original phase probabilities from the three-wavelength MAD experiment to yield a set of phase probabilities and a new electron-density map. The original SOLVE electron-density map (Terwilliger & Berendzen, 1999) using experimental phases is shown in Fig. 7(a). This map has a correlation with the model gene 5 protein map of 0.56. The electron-density map calculated from combined phases is shown in Fig. 7(b). This new electron-density map has a correlation to the model map of 0.65. Finally, the combined phases and the experimental structure-factor amplitudes were used in statistical density modification using the same parameters as those used to obtain the original RESOLVE phase probabilities. The resulting map is shown in Fig. 7(c); it is very similar to the original RESOLVE map shown in Fig. 6(a), but is slightly improved, with a correlation to the model gene 5 protein map of 0.82 (compared with 0.79 for the original RESOLVE map).
A key element of the process used here is to remove information about the density at each point x from the analysis of patterns of density around x. We tested the importance of this step by repeating the entire process of generating templates and histograms and then applying them to the gene 5 protein MAD data, but without removing this information. In this case, the recovered image had a higher correlation with the model map than in the test case described above (0.55 compared with 0.25) and the smoothed recovered image had a correlation of 0.59, compared with 0.42. On the other hand, the correlation of errors between the recovered image and the starting RESOLVE map was also much higher (0.68 compared with 0.15), as was the correlation of errors between the smoothed recovered image and the starting RESOLVE map (0.85 compared with 0.23). Finally, the resulting combined phases were used as a starting point for density modification, but in this case no improvement in the final map was obtained (correlation coefficient with the model map of 0.79 in both cases), supporting the idea that this step is an important element in the process.

Figure 7 Phase improvement using template matching on gene 5 protein MAD data. (a) SOLVE electron-density map for gene 5 protein. (b) Electron-density map calculated using observed structure-factor amplitudes and combined phases. The combined phases consisted of the SOLVE phase estimates combined with the phases estimated using statistical density modification based on the reconstructed density shown in Fig. 6(b). (c) RESOLVE electron-density map after one cycle of statistical density modification starting with the map shown in (b). All contours are at 0.8σ.
3.6. Iterative local pattern identification and density modification
Fig. 1 illustrates an iterative process for phase improvement based on the local pattern identification described here. In this process, the pattern-identification step is always carried out on the best available map and then the resulting phase information is combined with experimental phase information to yield an improved starting point for density modification. The first cycle in this iterative process for phase improvement is identical to the process described above. Subsequent cycles simply iterate the process.

Figure 8 Phase improvement using template matching on nusA SAD data. (a) RESOLVE electron-density map for nusA protein calculated without pattern matching. (b), (c) and (d) Electron-density maps after one, three and five cycles of density modification including pattern matching, respectively. All contours are at 1.5σ.

Table 2 Application of iterative statistical density modification with local pattern recognition. For each experimental data set, density modification was carried out using default inputs for RESOLVE and phase probabilities calculated using SOLVE (Terwilliger & Berendzen, 1999). The process shown in Fig. 1 was then carried out, including the identification and use of local patterns of density. Non-crystallographic symmetry was not included in any density-modification procedures in these tests. The correlation coefficients of the resulting electron-density maps to those calculated with phases obtained from the refined models of each structure are listed. Additionally, the number of residues that could be automatically modeled and assigned to sequence and the number that could be modeled (whether or not assigned to sequence) with RESOLVE (Terwilliger, 2003a,b) using default parameters are listed. As the number of residues obtained with automated model building is somewhat sensitive to the parameters and details of the methods used, models were built with versions 2.02, 2.03, 2.04 and 2.05 of RESOLVE and the average numbers of residues built are reported.

Fig. 8(a) shows the RESOLVE electron-density map for nusA obtained without using local pattern matching. Figs. 8(b), 8(c) and 8(d) show the density-modified map after one, three and five cycles using local pattern matching. The correlation coefficient of the starting RESOLVE electron-density map with a map calculated from the refined model of nusA is 0.65; the map after five cycles has a correlation of 0.85. Table 2 summarizes the results of applying this process to experimental data from crystals of several different proteins. The greatest improvement in map quality was obtained for cases where the original RESOLVE map had a correlation with the model map of less than 0.7, with smaller improvements obtained when the RESOLVE map was better than this.
To provide a rough measure of the utility of the method, the automatic model-building capability of RESOLVE was applied to the maps obtained for each structure with and without information from local patterns (Table 2). The percentage of main-chain residues built was essentially the same with and without information from local patterns for all the structures except nusA, for which it increased from 49 to 56% with the use of local patterns. On the other hand, the percentage of residues assigned to sequence and side chains built increased, on average, from 11 to 24% for those structures where the map correlation was considerably improved (UTP-synthase, nusA, NDP-kinase). This indicates that the map improvement can be enough to make a significant difference in the ability of automated procedures to build a complete atomic model.

Structure
Although the templates used in this procedure were calculated using data to 2.6 Å, the procedure is not strongly dependent on resolution. Using the nusA data as a test case, the effect of resolution was examined by truncating the analysis at resolutions of 2.4 (all data), 2.6, 2.8 and 3.0 Å, respectively. The correlations of the original RESOLVE maps at each of these resolutions with the model maps calculated at the same resolutions were similar (0.65, 0.66, 0.69 and 0.69, respectively), as were the correlations of the final maps obtained by density modification including the local pattern information (0.85, 0.85, 0.85 and 0.86, respectively).

Prospects
We have shown here that local features of electron-density maps can be used as an important source of information in a density-modification procedure. The improvements in map quality obtained using the information from local patterns range from none (0.87 to 0.87 for β-catenin) to small (from 0.79 to 0.82 in correlation coefficient for gene 5 protein) to very substantial (from 0.65 to 0.85 in correlation coefficient for nusA).
The computational requirements of the methods are moderate. Carrying out a complete set of five cycles of pattern identification and density modification using local patterns takes 90 min on a Compaq 833 MHz Alpha for the 'hypothetical' protein from P. aerophilum listed in Table 2 (494 amino acids); standard density modification without using local pattern information takes about 5 min. Memory requirements are moderate as well: the libraries of patterns and indexing tables are large and (along with other parts of the software) require approximately 700 MB of swap space or more.
There are many additional applications of the procedures that we have developed here. A key aspect of the methods is that the image that is recovered from an electron-density map has errors that are relatively uncorrelated with those in the original map. This allows the use of the recovered image in phase improvement in the moderate-resolution range demonstrated here. It is also possible that the same approaches could be used for low-resolution as well as very high resolution phasing and phase extension. Additionally, the independence of errors means that an image recovered from a random map will have little or no correlation to the original map, while an image recovered from a map that has protein-like features will have a correlation. Consequently, the method could be used to evaluate the quality of protein electron-density maps. Similarly, points that are in the solvent region of a crystal will have local features unlike those found in the protein region and the methods described here could be used to distinguish the protein from solvent regions.
A weakness of the pattern-matching approach developed here is that it cannot readily distinguish protein-like features that are the result of systematic bias or errors in a map from those that actually reflect protein structure. This may be reflected in the small but significant correlation of errors between the density-modified model gene 5 protein map and its recovered image described above. Perhaps more importantly, it means that the method in its present form is not as well suited to improving maps that contain significant bias towards protein-like patterns of density, such as those obtained using phases from an atomic model, as it is to improving maps in which the errors are essentially random, such as those obtained by experiment.
A useful extension of the methods described here will be to recalculate the templates and histograms using data in various resolution ranges and using various radii for the regions considered in obtaining templates and to apply the appropriate set to experimental data. The effects of the grid spacing used in calculations could also be investigated. Correlations to more than two templates could be used in (8) in estimates of local density (although our preliminary investigations indicated that using a third template added very little information to the calculation). In each of the cases described here, the templates and histograms were obtained from model maps calculated at a resolution of 2.6 Å. The use of templates at varying resolutions could potentially extend the applicability of the method to a wider resolution range. Other extensions include examining the patterns in different classes of protein structures and in crystals that contain other structures such as nucleic acids or various ligands.