- 1. Introduction
- 2. Maximum-likelihood density modification
- 3. Map-likelihood phasing
- 4. Bias and map-likelihood phases
- 5. Prime-and-switch phasing to remove model bias
- 6. Convergence of map-likelihood phasing
- 7. The bias ratio αB
- 8. Examples of map-likelihood phasing
- 9. Applications of map-likelihood and prime-and-switch phasing
- References
research papers
Map-likelihood phasing
Bioscience Division, Mail Stop M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
*Correspondence e-mail: terwilliger@lanl.gov
The recently developed technique of maximum-likelihood density modification [Terwilliger (2000), Acta Cryst. D56, 965–972] allows a calculation of phase probabilities based on the likelihood of the electron-density map to be carried out separately from the calculation of any prior phase probabilities. Here, it is shown that phase-probability distributions calculated from the map-likelihood function alone can be highly accurate and that they show minimal bias towards the phases used to initiate the calculation. Map-likelihood phase probabilities depend upon expected characteristics of the electron-density map, such as a defined solvent region and expected electron-density distributions within the solvent region and the region occupied by a macromolecule. In the simplest case, map-likelihood phase-probability distributions are largely based on the flatness of the solvent region. Though map-likelihood phases can be calculated without prior phase information, they are greatly enhanced by high-quality starting phases. This leads to the technique of prime-and-switch phasing for removing model bias. In prime-and-switch phasing, biased phases such as those from a model are used to prime or initiate map-likelihood phasing, then final phases are obtained from map-likelihood phasing alone. Map-likelihood phasing can be applied in cases with solvent content as low as 30%. Potential applications of map-likelihood phasing include unbiased phase calculation from molecular-replacement models, iterative model building, unbiased electron-density maps for cases where 2Fo − Fc or σA-weighted maps would currently be used, structure validation and ab initio phase determination from solvent masks, non-crystallographic symmetry or other knowledge about expected electron density.
Keywords: map-likelihood phasing.
1. Introduction
Density-modification techniques are a firmly established and important tool for macromolecular crystallography. These methods include such powerful approaches as solvent flattening, non-crystallographic symmetry averaging, histogram matching, phase extension, entropy maximization and iterative model building (Abrahams, 1996; Béran & Szöke, 1995; Bricogne, 1984, 1988; Cowtan & Main, 1993, 1996; Giacovazzo & Siliqi, 1997; Goldstein & Zhang, 1998; Gu et al., 1997; Lunin, 1993; Perrakis et al., 1997; Podjarny et al., 1987; Prince et al., 1988; Refaat et al., 1996; Roberts & Brünger, 1995; Rossmann & Arnold, 1993; Szöke, 1993; Szöke et al., 1997; Terwilliger, 2000; van der Plas & Millane, 2000; Vellieux et al., 1995; Wang, 1985; Wilson & Agard, 1993; Xiang et al., 1993; Zhang & Main, 1990; Zhang, 1993; Zhang et al., 1997). The central basis of these approaches is that prior knowledge about expected values of the electron density in part or all of the unit cell can be a very strong constraint on the crystallographic structure factors. For example, prior knowledge about electron density often consists of the identification of a region where the electron density is flat owing to the presence of disordered solvent (Wang, 1985). Real-space information of this kind has generally been used to improve the quality of crystallographic phases obtained by other means, such as multiple isomorphous replacement or multiwavelength experiments, but phase information from such real-space constraints can sometimes be so powerful as to be useful in ab initio phase determination (Béran & Szöke, 1995; van der Plas & Millane, 2000; Wang et al., 1998; Roversi et al., 2000).
2. Maximum-likelihood density modification
We recently developed maximum-likelihood density modification, a method for carrying out density modification in which the phasing information coming from various sources is explicitly kept separate (Terwilliger, 1999, 2000). This separation of phasing information allowed a statistical formulation for density modification that was very straightforward and avoided major existing difficulties with density modification. In maximum-likelihood density modification, the total likelihood of a set of structure factors {Fh} is defined in terms of three quantities: (i) any prior knowledge from other sources about these structure factors, (ii) the likelihood of measuring the observed set of structure factors {FhOBS} if this set of structure factors were correct and (iii) the likelihood that the map resulting from this set of structure factors {Fh} is consistent with our prior knowledge about this and other macromolecular structures. This can be written as
$$LL(\{F_h\}) = LL^{o}(\{F_h\}) + LL^{OBS}(\{F_h\}) + LL^{MAP}(\{F_h\}) \qquad (1)$$
where LL({Fh}) is the log-likelihood of a possible set of crystallographic structure factors Fh, LLo({Fh}) is the log-likelihood of these structure factors based on any information that is known in advance, such as the distribution of intensities of structure factors (Wilson, 1949), LLOBS({Fh}) is the log-likelihood of these phases given the experimental data alone and LLMAP({Fh}) is the log-likelihood of the electron-density map resulting from these phases. In this formulation, density modification consists of maximizing the total likelihood given by (1).
We showed previously that the total likelihood in (1) could be maximized efficiently by an iterative procedure in which a probability distribution for each phase is calculated independently of those for all other phases in each cycle (Terwilliger, 1999, 2000). In one cycle of optimization, an electron-density map is calculated using current estimates of the structure factors. Each reflection is then considered separately from the others and a phase-probability distribution for that reflection is calculated from the variation of the total likelihood in (1) with the phase (or phase and amplitude) of that structure factor.
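The final step of such a cycle can be sketched in a few lines. The example below is only an illustration, assuming that the total log-likelihood has already been evaluated at regularly sampled phases for a single reflection. The function name `phase_distribution` and the toy log-likelihood values are hypothetical and are not taken from any published implementation.

```python
import math

def phase_distribution(log_likelihoods, phases_deg):
    """Convert sampled log-likelihoods into normalized phase probabilities.

    Returns (probabilities, centroid_phase_deg, figure_of_merit).
    """
    # Subtract the maximum before exponentiating for numerical stability.
    ll_max = max(log_likelihoods)
    weights = [math.exp(ll - ll_max) for ll in log_likelihoods]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Centroid of exp(i*phi) under the distribution: its argument is the
    # centroid phase and its modulus is the figure of merit.
    a = sum(p * math.cos(math.radians(phi)) for p, phi in zip(probs, phases_deg))
    b = sum(p * math.sin(math.radians(phi)) for p, phi in zip(probs, phases_deg))
    centroid = math.degrees(math.atan2(b, a)) % 360.0
    fom = math.hypot(a, b)
    return probs, centroid, fom

# Toy input (hypothetical values): a log-likelihood peaked at 90 degrees.
phases = list(range(0, 360, 5))
ll_total = [2.0 * math.cos(math.radians(phi - 90.0)) for phi in phases]
probs, centroid, fom = phase_distribution(ll_total, phases)
```

The centroid phase and figure of merit obtained this way are the quantities needed to calculate a figure-of-merit-weighted map for the next cycle.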
3. Map-likelihood phasing
In previous applications of the maximum-likelihood density-modification approach, phase information was derived from a combination of experimental probabilities and from the characteristics of the map (Terwilliger, 1999, 2000). In principle, however, experimentally derived or other prior phase information need not necessarily be included in the density-modification procedure. Instead, phase information could be derived from the agreement of the map with expectations alone.
The overall procedure for one cycle of map-likelihood phasing has five basic steps, which are based on the methods used for maximum-likelihood density modification (Terwilliger, 2000). First, a starting set of phases is used to calculate a figure-of-merit-weighted electron-density map. This map is important because a comparison of this map with expected electron-density distributions in the unit cell will form the basis for the determination of phase probabilities. Next, the expectations about the electron-density distributions in this map are evaluated. As described in more detail below, this can consist of probability distributions for electron density in the protein and solvent regions along with probability estimates of whether each point in the map is within the protein or solvent region, for example. These probability distributions are crucial for defining the prior expectations about the electron-density map and therefore the log-likelihood of the map. Third, the log-likelihood of this map and the first and second derivatives of this log-likelihood with respect to electron density at each point in the map are calculated. These derivatives will be used to predict how the log-likelihood of the map will change as the electron density in the map is changed. Fourth, using the chain rule and an FFT-based algorithm, the first and second derivatives of the log-likelihood of the map with respect to structure factors are calculated. Fifth, for each reflection k the variation of the log-likelihood of the map with the phase (or phase and amplitude) of the reflection is estimated from these derivatives. This is the key step in map-likelihood phasing. Through the use of the derivatives of the log-likelihood of the map with respect to the structure factor for reflection k, map-likelihood phasing allows relative probabilities to be assigned for each possible value of the phase of reflection k. These phase probabilities are used to obtain a new estimate of the structure factor for reflection k.
In this calculation of the phase-probability distribution for reflection k, ordinarily the measured amplitude is kept fixed and the allowed phases for this reflection are sampled at regular intervals (typically increments of 5° for acentric reflections). The log-likelihood of the map is approximated in terms of a Taylor series based on the derivatives with respect to structure factors as described previously (Terwilliger, 2000), with the addition of a cross-term in the Taylor series as suggested by Cowtan (2000). To the extent that this approximation is accurate (that is, that higher order terms do not contribute substantially), this phase-probability calculation estimates how the log-likelihood of the map will vary with the phase of reflection k without regard to the value of the phase that was used to calculate the original electron-density map. Once all five steps in map-likelihood phasing are carried out, it is possible to calculate a new figure-of-merit-weighted electron-density map using the newly estimated phase-probability distributions. These phases can then be used to initiate a new cycle of map-likelihood phasing. As the phases are modified in this fashion, it is useful to update the analysis of the probability estimates for whether each point in the map is in the protein or solvent region and any other analyses based on the map.
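The fifth step can be illustrated with a short sketch. It assumes that the derivatives of the map log-likelihood with respect to the real part (A) and imaginary part (B) of one structure factor are already available, and it evaluates a second-order Taylor expansion at phases sampled every 5° with the amplitude held fixed. The function name, the parameterization in A and B and the numerical derivative values are all illustrative assumptions. In particular, the mixed second derivative plays the role of a cross-term here, and whether it corresponds exactly to the cross-term of Cowtan (2000) is not established by the text.

```python
import math

def map_ll_vs_phase(amplitude, f_current, grad, hess, step_deg=5):
    """Second-order Taylor estimate of the change in map log-likelihood
    for each trial phase of one reflection, with its amplitude held fixed.

    f_current : complex structure factor used to compute the current map
    grad      : (dLL/dA, dLL/dB) for the real (A) and imaginary (B) parts
    hess      : (d2LL/dA2, d2LL/dAdB, d2LL/dB2); the middle entry is the cross-term
    """
    g_a, g_b = grad
    h_aa, h_ab, h_bb = hess
    result = []
    for phi in range(0, 360, step_deg):
        trial = amplitude * complex(math.cos(math.radians(phi)),
                                    math.sin(math.radians(phi)))
        d_a = trial.real - f_current.real
        d_b = trial.imag - f_current.imag
        delta_ll = (g_a * d_a + g_b * d_b
                    + 0.5 * (h_aa * d_a ** 2
                             + 2.0 * h_ab * d_a * d_b
                             + h_bb * d_b ** 2))
        result.append((phi, delta_ll))
    return result

# Hypothetical derivative values for a single acentric reflection.
f_now = complex(10.0, 0.0)  # current estimate: amplitude 10, phase 0 degrees
curve = map_ll_vs_phase(10.0, f_now, grad=(0.0, 0.3), hess=(-0.01, 0.0, -0.01))
best_phase, best_ll = max(curve, key=lambda item: item[1])
```

Because every trial phase is evaluated from the same derivatives, the phase used to calculate the current map (0° in this example) receives no special treatment beyond being the expansion point.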
The effect of each cycle in this procedure is to obtain a probability distribution for each phase independently of all the others, based on the agreement of the electron-density map with expectations. In the phase-probability calculations, all possible values of the phases are considered without any preference for the values used in the previous cycle.
The iteration of phasing and analysis of the map is ordinarily repeated until phase changes are minimal. As described below, however, in some cases where there is relatively little phase information available from the map-likelihood function it is useful to end the iterative process before complete convergence. Also, in some cases iterations of this procedure lead to oscillations in which the changes in the structure-factor estimate for reflection k are strongly anticorrelated from one cycle to the next. In such cases a damping factor (typically 0.5) can be applied to the changes from one cycle to the next to reduce the oscillations.
Map-likelihood phasing is related to the methods of Béran & Szöke (1995) and to the use of NCS to obtain phases starting from a solvent mask (Braik et al., 1994; Kleywegt & Read, 1997; van der Plas & Millane, 2000; Wang et al., 1998) in which crystallographic phases are obtained by matching the electron density in a part of the unit cell to a target value. The method of Béran & Szöke (1995), which employs simulated annealing to find a set of phases consistent with constraints on electron density, was shown to be capable of ab initio phase determination using a solvent mask. Similarly, high-order non-crystallographic symmetry has been shown to be sufficient to determine crystallographic phases starting just from a solvent mask (Braik et al., 1994; Kleywegt & Read, 1997; van der Plas & Millane, 2000; Wang et al., 1998). The approach described here and in Terwilliger (2000) differs from the methods of Béran & Szöke (1995), van der Plas & Millane (2000) and Wang et al. (1998) in that probabilistic descriptions of the expected electron density are used, allowing a calculation of phase-probability distributions, rather than searching for a set of phases that is consistent with constraints.
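The damping of cycle-to-cycle changes can be sketched as follows. The text does not specify exactly which quantity is damped, so this example assumes that the damping factor is applied to the change in each complex structure-factor estimate. The function name `damped_update` and the numerical values are hypothetical.

```python
def damped_update(previous, proposed, damping=0.5):
    """Move each structure-factor estimate only part of the way from its
    previous value towards the newly proposed value."""
    return [old + damping * (new - old) for old, new in zip(previous, proposed)]

# Hypothetical estimates (complex numbers) for three reflections.
previous = [complex(4.0, 0.0), complex(0.0, 2.0), complex(-1.0, 1.0)]
proposed = [complex(0.0, 4.0), complex(0.0, -2.0), complex(-1.0, 1.0)]
updated = damped_update(previous, proposed)
```

With a damping factor of 0.5, an estimate that would otherwise flip back and forth between two values on alternate cycles is instead moved halfway, which suppresses this kind of oscillation.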
The phase information coming from the map-likelihood function LLMAP({Fh}) comes from the agreement of the electron-density map with prior expectations about that map. This agreement depends on the phase of each reflection, in the context of the phases of all other reflections. In the implementation used in maximum-likelihood density modification, the probability (based on the map likelihood) for a particular reflection that the phase has a value φ is given by the relative likelihood of the map obtained with this value of the phase. For example, a simple map-likelihood function might be based on defined regions containing the macromolecule and containing disordered solvent. A value of the phase for a particular reflection k that leads to a map with a relatively flat solvent is more likely to be correct than a phase that does not.
In a more general case, a map-likelihood function can be defined that describes solvent and `protein' regions of the electron-density map and probability distributions for electron density in each such region. The probability of a particular phase for a particular reflection can then be estimated from how well the resulting map matches these expected characteristics. The concept can also be extended further to include non-crystallographic symmetry. A map-likelihood function could be constructed that reflects the extent to which symmetry-related density in the map is indeed similar, for example.
A formulation of the map log-likelihood function LLMAP({Fh}) that follows this approach (Terwilliger, 2000) can be written as the integral over the map of a local log-likelihood of electron density, LL[ρ(x, {Fh})],
$$LL^{MAP}(\{F_h\}) \propto \int_V LL[\rho(\mathbf{x}, \{F_h\})]\, d^3x \qquad (2)$$
where the integral is taken over the map volume V and this local log-likelihood of electron density describes the plausibility of the map at each point.
The local log-likelihood function, in turn, can be expressed in terms of whether the point is in the solvent or protein regions and the expected electron-density distributions in each case. As it is often uncertain whether a particular point x is in the protein or solvent region, it is useful to write the local map-likelihood function as the sum of conditional probabilities dependent on which environment the point is located in,
$$LL[\rho(\mathbf{x}, \{F_h\})] = \ln\{p[\rho(\mathbf{x})|\mathrm{PROT}]\, p_{\mathrm{PROT}}(\mathbf{x}) + p[\rho(\mathbf{x})|\mathrm{SOLV}]\, p_{\mathrm{SOLV}}(\mathbf{x})\} \qquad (3)$$
where pPROT(x) is the probability that x is in the protein region, p[ρ(x)|PROT] is the conditional probability for ρ(x) given that x is in the protein region and pSOLV(x) and p[ρ(x)|SOLV] are the corresponding quantities for the solvent region. The probability that x is in the protein or solvent regions can be estimated by a modification of the methods of Wang (1985) and Leslie (1987) as described earlier (Terwilliger, 1999) or by other probability-based methods (Roversi et al., 2000).
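A minimal numerical sketch of this local log-likelihood is given below. It assumes, purely for illustration, that the conditional densities p[ρ(x)|PROT] and p[ρ(x)|SOLV] are single Gaussians with hypothetical means and widths, and it replaces the integral over the map by a sum over grid points. None of the function names or parameter values comes from the original implementation.

```python
import math

def gaussian(value, mean, sigma):
    """Normal probability density."""
    z = (value - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def local_log_likelihood(rho, p_prot, prot=(0.3, 0.5), solv=(0.0, 0.1)):
    """ln{ p[rho|PROT] p_PROT + p[rho|SOLV] p_SOLV } at one grid point.

    prot and solv are (mean, sigma) pairs for the assumed Gaussian densities.
    """
    p_solv = 1.0 - p_prot
    likelihood = (gaussian(rho, *prot) * p_prot
                  + gaussian(rho, *solv) * p_solv)
    return math.log(likelihood)

def map_log_likelihood(densities, prot_probabilities):
    """Sum of the local log-likelihood over grid points (a discrete stand-in
    for the integral over the map)."""
    return sum(local_log_likelihood(rho, p)
               for rho, p in zip(densities, prot_probabilities))

# Four grid points that are almost certainly solvent (p_PROT = 0.05).
p_prot = [0.05] * 4
flat = map_log_likelihood([0.02, -0.01, 0.00, 0.01], p_prot)
noisy = map_log_likelihood([0.30, -0.25, 0.20, -0.35], p_prot)
```

In this toy case the flat set of solvent densities gives a higher map log-likelihood than the noisy set, which is the behaviour that drives phases towards a flat solvent region.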
The probability distributions for electron density given that a point is in the protein or solvent regions are central to map-based phasing. They define the expectations about electron density in the map. These expectations about electron-density distributions in the map are not derived from `perfect' maps, but rather from the current electron-density map. There are several reasons for doing this (Terwilliger, 2000). The key reason is that it is unreasonable to expect any value of the phase for a particular reflection to lead to a map matching expectations of a perfect map because the map has large errors from all the other reflections. In particular, the correct value of the phase for reflection k can be expected to reduce the variation in the solvent region only slightly, not to make it perfectly flat. The amount by which the solvent can be expected to be flattened by adjusting just one reflection is dependent on the overall noise in the map. In effect, the expectations about the electron-density map include not just the features of a perfect map, but also effects of the errors in all of the structure factors other than the one under consideration. Consequently, for a starting phase set with large phase errors, the target probability distribution of electron density in the solvent region is very broad, while for a starting phase set that is very accurate this distribution can be very narrow.
Because the targeted features of the electron-density map are only weakly defined for poor starting phase sets but are more precisely defined for accurate ones, the phase information coming from the map-likelihood function becomes stronger as the phases improve. In essence, the more accurate the starting phases and the less noise in the map, the more precisely the phase of a particular reflection can be expected to lead to a map that matches the characteristics of a perfect map, and the more precisely the values of each phase can be determined.
4. Bias and map-likelihood phases
Somewhat paradoxically, although the quality of the starting phase set is an important factor in determining the phase information that comes from the map, the phase probability for a reflection obtained from map-likelihood phasing is completely unbiased with respect to the prior probabilities for that phase. On the other hand, the map-likelihood phase probability for a reflection can be biased by a model used to calculate all starting phases.
To see how the map-likelihood phase for a reflection can be unbiased with respect to prior probabilities for that phase, consider using map-likelihood phasing to obtain a probability distribution for the phase of reflection k. In order to make the situation clear, the procedure described will be a little simpler than the one used in practice. First, calculate an electron-density map using all reflections other than k. This map clearly has no bias towards the prior value for reflection k, as reflection k was not even used to obtain the map. Now examine all possible phases of the reflection k in question. For each phase, add to the map the electron density that would result from reflection k with this phase. Then compare the characteristics of the resulting electron-density map with the ones that we expect, given the location of solvent and macromolecule and given the expected distributions of electron density in solvent and protein regions. Some values of the phase of reflection k will generally lead to more plausible maps than others. This defines our probability distribution for the phase of reflection k and the process has made no use whatsoever of any prior information about this reflection. Consequently, the resulting phases are completely unbiased with respect to any prior information about reflection k. In practice, this cross-validation procedure is carried out with all the reflections at once employing an approximation and an FFT-based method (Terwilliger, 2000). The resulting phase-probability distributions are essentially the same as the ones described above, however.
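The simplified leave-one-out procedure can be tried on a one-dimensional toy crystal. The sketch below makes several assumptions for illustration only: Gaussian atoms at hypothetical positions, a known solvent mask, error-free phases for all reflections other than k, and a Gaussian model for solvent density in which grid points are treated as independent. It is not the FFT-based approximation used in practice.

```python
import cmath
import math

N_GRID = 200                       # grid points along the 1D cell
H_MAX = 40                         # highest reflection index used
ATOMS = [0.10, 0.17, 0.22, 0.30]   # fractional coordinates (hypothetical)
ATOM_SIGMA = 0.02                  # Gaussian atom width, in cell fractions
SOLVENT = [i for i in range(N_GRID) if 0.45 <= i / N_GRID <= 0.95]

def structure_factor(h):
    """F_h for Gaussian atoms of unit weight."""
    falloff = math.exp(-2.0 * (math.pi * ATOM_SIGMA * h) ** 2)
    return falloff * sum(cmath.exp(2j * math.pi * h * x) for x in ATOMS)

def wave(amplitude, phase_rad, h):
    """Density contributed by reflection h and its Friedel mate."""
    return [2.0 * amplitude * math.cos(2.0 * math.pi * h * i / N_GRID - phase_rad)
            for i in range(N_GRID)]

def solvent_variance(density):
    values = [density[i] for i in SOLVENT]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def leave_one_out_distribution(k, sigma_solvent=0.5, step_deg=5):
    """Phase probabilities for reflection k from solvent flatness alone."""
    # Map from every reflection except k (true phases for the others here).
    partial = [0.0] * N_GRID
    for h in range(1, H_MAX + 1):
        if h != k:
            f_h = structure_factor(h)
            for i, value in enumerate(wave(abs(f_h), cmath.phase(f_h), h)):
                partial[i] += value
    amplitude = abs(structure_factor(k))
    phases = list(range(0, 360, step_deg))
    log_l = []
    for phi in phases:
        trial = wave(amplitude, math.radians(phi), k)
        full = [p + t for p, t in zip(partial, trial)]
        # Gaussian model for solvent density: flatter solvent, higher likelihood.
        log_l.append(-len(SOLVENT) * solvent_variance(full)
                     / (2.0 * sigma_solvent ** 2))
    top = max(log_l)
    weights = [math.exp(v - top) for v in log_l]
    total = sum(weights)
    return phases, [w / total for w in weights]

k = 3
phases, probs = leave_one_out_distribution(k)
best_phase = phases[probs.index(max(probs))]
true_phase = math.degrees(cmath.phase(structure_factor(k))) % 360.0
```

No phase for reflection k enters the calculation of `partial`, so the resulting distribution cannot be biased towards any prior value of that phase. The distribution is very sharp in this toy case because all other phases are exact. With noisy phases for the other reflections, the expected solvent variation (`sigma_solvent`) would have to be larger and the distribution would broaden accordingly.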
Each individual phase-probability distribution obtained with map-likelihood phasing is independent of the prior phase-probability distribution for that reflection. Nevertheless, there are kinds of bias that can affect map-likelihood phasing. If the set of phases used to initiate map-likelihood phasing has been adjusted as a whole in a way that leads to a relatively flat solvent region, for example, then the first few cycles of map-likelihood phasing will tend to find these starting phases to be probable ones (because they lead to a flat solvent when combined with all the other starting phases) even if these starting phases are incorrect. This situation can occur for example if a model has been used to calculate the starting phases, as the solvent region will tend to be relatively flat even if the model is not entirely correct. It can also occur if the phases have been refined in order to flatten the solvent region. Fortunately, as described below, this type of model bias is generally removed by iterative application of map-likelihood phasing.
As described above, other approaches to using expectations about electron-density distributions in a map for determining crystallographic phases without including phase-probability distributions from other sources have been demonstrated (Wang et al., 1998; van der Plas & Millane, 2000; Béran & Szöke, 1995). Each of these approaches begins with no prior phase information and is designed to result in an ab initio phase determination. These approaches could be modified to begin with a starting phase set as described here for map-likelihood phasing; however, the probability-based approach described here is more general and can include a variety of expectations about the map. Additionally, map-likelihood phasing leads to phase-probability distributions rather than phases consistent with expectations, so that optimally weighted maps can be calculated.
5. Prime-and-switch phasing to remove model bias
Model bias is a very serious problem in macromolecular crystallography (Adams et al., 1999; Hodel et al., 1992; Kleywegt, 2000). A bias in phases that leads to electron-density patterns that are incorrect, yet look like features of a macromolecule, is very difficult to detect. Such a bias is much more serious than an equivalent amount of noise in a map that is distributed in a random fashion in the unit cell. Bias of this kind commonly occurs when crystallographic phases are calculated based on a model that contains atoms that are incorrectly placed. Maps that are based on these phases tend to show peaks at the positions of these atoms even if the correct electron density would not.
Model bias in an electron-density map does not necessarily mean that the phases are very inaccurate. Relatively accurate phases, calculated from a largely correct model with some atoms in incorrect locations, can still lead to peaks at the coordinates of the incorrectly placed atoms. In this sense, the phases are biased. In the sense that the phases are close to the true phases, the phases are still relatively accurate. As in many situations, there is an important trade-off between accuracy and bias in the calculation of electron-density maps. In the crystallographic case, this trade-off is fundamentally related to the difference between errors in electron density that are randomly distributed about the unit cell and those that are focused on certain locations in the cell. In many situations, the most accurate map (the one with the minimum expected mean-square error, for example) will be one that is based on all available information. Unfortunately, in the crystallographic situation the errors in such a map can be highly non-random. As mentioned above, high peaks can be obtained at the specific positions of atoms in a model used to calculate phases, even if those atoms are incorrectly placed. Such a map, though accurate, can be highly misleading. A map that is less accurate, but that does not suffer from this bias, could in many cases be far more informative.
Many methods for reducing model bias in electron-density maps have been developed. One of the most widely used approaches is the σA method of Read (1986), in which the weighting and amplitudes of structure factors (but not the phases) are optimized for minimizing effects of model bias. As the phases remain based on the model, σA weighting retains some model bias (Hodel et al., 1992). Another important method is the use of omit maps, in which all atoms in a region of the unit cell in the model are removed before using the model to calculate phases. This method reduces model bias, but leads to electron-density maps that are intrinsically much noisier than those calculated with all atoms present. Omit maps can still contain some model bias despite the omission of atoms in a region of space, as refinement can adjust the parameters describing all the other atoms in such a way as to leave a `memory' of the coordinates of the omitted atoms (Hodel et al., 1992). This memory in omit maps corresponds to the model bias described above that can occur in the first few cycles of map-likelihood phasing. The residual bias in omit maps can be reduced by simulated annealing of the partial model (Hodel et al., 1992), if the resolution of the data and the accuracy of the starting model allow atomic refinement. In general terms, this corresponds to the iterative application of map-likelihood phasing to remove residual bias. Refinement of the model structure can also be used to reduce model bias even in cases where σA-weighted electron-density maps are not interpretable (Adams et al., 1999).
The technique of prime-and-switch phasing takes advantage of the lack of bias in map-likelihood phasing and the strong dependence of the accuracy of map-likelihood phases on the quality of the phases used to initiate the process discussed above. In this technique, all available phase information, including any coming from a model, is used to initiate map-likelihood phasing. The model phases are then set aside and not used further. As discussed above, model-based phases can be relatively accurate and biased at the same time. Owing to their accuracy, they can be useful in initiating map-likelihood phasing. Owing to their bias, setting them aside during map-likelihood phasing can reduce the bias in the final phases. As prime-and-switch phasing does not use all the phase information available, the final phases could be less accurate than a set that could be obtained using all this information. As shown below, in most cases any loss of accuracy is compensated for by a corresponding decrease in bias.
Map-likelihood phasing has the potential for producing electron-density maps that have little or no bias, as the phase probabilities for each reflection are independent of the prior phases for that reflection. However, as described above, it is possible for map-likelihood phasing to be biased by a starting phase set that has a systematic bias, for example by a starting set of incorrect phases that has a relatively flat solvent region. The iteration of cycles of map-likelihood phasing is a useful tool in reducing or eliminating this bias. The reason for expecting that an iterative application of map-likelihood phasing would remove the bias present in a single cycle is that the bias for an individual reflection comes from the set of starting phases as a whole. Once many of the phases in the set are substantially changed, the bias might be greatly reduced.
6. Convergence of map-likelihood phasing
There are two general cases that could arise in carrying out iterative cycles of map-likelihood phasing. If the solvent content or the extent of non-crystallographic symmetry is high, then the phases are likely to be well determined and simple iterative map-likelihood phasing would be effective. If the solvent content is low and non-crystallographic symmetry is lacking, however, the phases might not be entirely determined by the map-likelihood function. In this case, it might be necessary to trade off a small bias towards the starting phase set in order to obtain a well defined set of phases. One way to carry out such a trade-off is simply to end the iteration of map-likelihood phasing before complete convergence. This is generally the preferable alternative in practice because, as shown below, map-likelihood phases often change very rapidly during early cycles then only very gradually after that.
An alternate procedure is to introduce a small weighting towards the model-based prior phases. This introduction of some model-based phase information has several effects. One is to reintroduce some bias into the final phases. A second is to stabilize the phasing process. A third is (at least potentially) to increase the overall accuracy of the phases. The degree of bias towards the starting phase set in map-likelihood phasing can be adjusted using a weight on the prior phase probabilities. In cases where the phase information in the map is insufficient to fully define the phases (such as substantially less than 50% solvent content with no non-crystallographic symmetry), it is sometimes useful to trade off a small amount of bias in order to increase the stability of the iterative phasing process. This can typically be accomplished with a weighting of a few percent on the prior phase-probability distribution.
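One way such a weighting could be applied is sketched below. The text does not state how the weight enters the calculation, so this example assumes that the prior log-likelihood for each sampled phase is multiplied by the weight before being added to the map log-likelihood. The function name and the toy distributions are hypothetical.

```python
import math

def combine_with_weighted_prior(ll_map, ll_prior, prior_weight=0.03):
    """Phase probabilities from the map log-likelihood plus a down-weighted
    prior log-likelihood, both sampled at the same phases."""
    combined = [m + prior_weight * p for m, p in zip(ll_map, ll_prior)]
    top = max(combined)
    weights = [math.exp(v - top) for v in combined]
    total = sum(weights)
    return [w / total for w in weights]

phases = list(range(0, 360, 5))
# Hypothetical inputs: a weak map term peaked at 60 degrees and a sharp
# model-based prior peaked at 120 degrees.
ll_map = [1.0 * math.cos(math.radians(phi - 60)) for phi in phases]
ll_prior = [10.0 * math.cos(math.radians(phi - 120)) for phi in phases]

map_only = combine_with_weighted_prior(ll_map, ll_prior, prior_weight=0.0)
lightly_weighted = combine_with_weighted_prior(ll_map, ll_prior, prior_weight=0.03)
```

With a weight of zero the distribution is determined by the map alone. With a weight of a few percent the most probable phase shifts part of the way towards the prior, which illustrates the small, controlled bias described above.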
7. The bias ratio αB
A useful measure of the degree of bias towards model-based or other prior phases used to initiate prime-and-switch phasing or used as a source of combined phase information is the bias ratio αB, defined as
$$\alpha_B = \frac{\langle \cos(\varphi_{\mathrm{PRIOR}} - \varphi_{\mathrm{MAP}}) \rangle}{\langle m_{\mathrm{PRIOR}} \rangle \langle m_{\mathrm{MAP}} \rangle} \qquad (4)$$
where φPRIOR is the centroid phase based on prior phase information, φMAP is that based on the map-likelihood function, 〈mPRIOR〉 and 〈mMAP〉 are estimates of 〈cos(φPRIOR − φTRUE)〉 and 〈cos(φMAP − φTRUE)〉, the mean cosines of the phase differences between the true phase φTRUE and φPRIOR or φMAP, respectively, and the averages are taken over all reflections.
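If the estimates of 〈mPRIOR〉 and 〈mMAP〉 are reasonably accurate, the bias ratio αB can be a useful measure of the extent of correlation between the prior phases φPRIOR and the map-likelihood phases φMAP, compared with the correlation expected for two independent sources of phasing. The numerator 〈cos(φPRIOR − φMAP)〉 is a measure of the actual correlation between prior and map-likelihood phases. If these sources of phase information are independent, then the errors in each are independent and we can write that
$$\langle \cos(\varphi_{\mathrm{PRIOR}} - \varphi_{\mathrm{MAP}}) \rangle \simeq \langle \cos(\varphi_{\mathrm{PRIOR}} - \varphi_{\mathrm{TRUE}}) \rangle \langle \cos(\varphi_{\mathrm{MAP}} - \varphi_{\mathrm{TRUE}}) \rangle \qquad (5)$$
or
$$\langle \cos(\varphi_{\mathrm{PRIOR}} - \varphi_{\mathrm{MAP}}) \rangle \simeq \langle m_{\mathrm{PRIOR}} \rangle \langle m_{\mathrm{MAP}} \rangle \qquad (6)$$
leading to a bias ratio αB of about unity. In contrast, if the sources of phase information are not independent, such as might occur if the bias in the model-based phases was not completely removed by iterative map-likelihood phasing, then the correlation between prior and map-likelihood phases will typically be greater than when they are independent,
$$\langle \cos(\varphi_{\mathrm{PRIOR}} - \varphi_{\mathrm{MAP}}) \rangle > \langle m_{\mathrm{PRIOR}} \rangle \langle m_{\mathrm{MAP}} \rangle \qquad (7)$$
and the bias ratio αB will generally be greater than unity.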
The utility of the bias ratio is dependent on having reasonable estimates of the figures of merit of phasing 〈mPRIOR〉 and 〈mMAP〉 for the prior and map-likelihood phases. If these are overestimated, for example, then the bias ratio will be underestimated. In an extreme case, a bias ratio of substantially less than unity can be used as a diagnostic for overestimated values of the figures of merit of one or both of these sources of phasing. The bias ratio can potentially be used in a third approach for handling situations where the map-likelihood phase information is insufficient in itself to fully define the phases. In this approach, iterations of map-likelihood phasing are carried out until the bias ratio reaches a value of approximately unity, indicating that much of the bias from the prior model-based phases has been removed.
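The bias ratio is straightforward to compute once centroid phases and mean figures of merit are available. The sketch below takes αB as the mean cosine of the difference between prior and map-likelihood phases divided by the product of the two mean figures of merit, as described above. The function names and the four-reflection data set are hypothetical.

```python
import math

def mean_cos_difference(phases_a_deg, phases_b_deg):
    """Mean cosine of the phase difference over all reflections."""
    n = len(phases_a_deg)
    return sum(math.cos(math.radians(a - b))
               for a, b in zip(phases_a_deg, phases_b_deg)) / n

def bias_ratio(phi_prior_deg, phi_map_deg, m_prior, m_map):
    """alpha_B = <cos(phi_PRIOR - phi_MAP)> / (<m_PRIOR> <m_MAP>)."""
    return mean_cos_difference(phi_prior_deg, phi_map_deg) / (m_prior * m_map)

# Hypothetical centroid phases (degrees) for four reflections; each
# map-likelihood phase differs from the prior phase by 60 degrees.
phi_prior = [10.0, 100.0, 200.0, 300.0]
phi_map = [70.0, 40.0, 260.0, 240.0]
alpha_b = bias_ratio(phi_prior, phi_map, m_prior=0.8, m_map=0.5)
```

Here the mean cosine of the phase difference is 0.5 and the product of the figures of merit is 0.4, so αB is 1.25, somewhat above the value of unity expected for independent sources of phase information.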
8. Examples of map-likelihood phasing
8.1. Separation of experimental and map-likelihood phase information
Fig. 1 illustrates how the phase information in density modification can be separated into experimentally derived phase information and map-likelihood phase information. Fig. 1(a) shows an experimental electron-density map of initiation factor 5a (Peat et al., 1998). Fig. 1(b) shows an electron-density map calculated from the map-likelihood phase probabilities obtained on the first cycle of density modification using the experimental map in Fig. 1(a) as a starting point. The crystals of initiation factor 5a contain about 60% solvent, so the phasing information that can be obtained from the map likelihood is very substantial. The map-likelihood phased map in Fig. 1(b) is clearly of equivalent or higher quality than the experimental map in Fig. 1(a). This is quite remarkable when it is recognized that the phase probabilities for the map in Fig. 1(b) are obtained simply by matching calculated and expected electron-density distributions in the solvent and protein regions. Fig. 1(c) illustrates that the accuracy of the starting phase set used in map-likelihood phasing has a substantial effect on the final phase-probability distributions. In this panel, the starting phases were those obtained after density modification with the program RESOLVE (Terwilliger, 2000) was applied to the data used in Fig. 1(a). This electron-density map is of even higher quality than those in Fig. 1(a) or Fig. 1(b).
8.2. Convergence and phase improvement in map-likelihood phasing
In order to evaluate the range of applicability of map-likelihood phasing and the utility of iterative phase improvement with this technique, several tests were carried out with model data, where the quality of phasing could readily be assessed. Figs. 2 and 3 illustrate the convergence properties of map-likelihood phasing as a function of the percentage of the unit cell that is occupied by disordered solvent. Model data sets were constructed based on the refined structure of dehalogenase enzyme from Rhodococcus (Newman et al., 1999) to a resolution of 3 Å as described in Terwilliger (2000). To simulate varying amounts of solvent, varying numbers of water molecules and C-terminal residues were left out of the phase calculations. This led to models with disordered solvent content ranging from 31% (as in the actual crystals) to 73%. Starting phase sets with simulated errors were constructed and used along with the model amplitudes in map-likelihood phasing. In these simulations, a mask defining the solvent and protein regions was calculated from the atomic coordinates in the model, defining all points within 2.5 Å of an atom as being within the protein region. In each test, 20 cycles of phase calculation followed by figure-of-merit weighted map calculation were carried out. For each cycle, the mean true figure of merit, given by the mean cosine of the phase error, 〈cos Δφ〉, is plotted.
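The quality measure plotted in these tests can be written down compactly. The following is a minimal Python sketch, assuming phases are held in degrees in NumPy arrays; the function name `mean_true_fom` and the example values are illustrative and are not part of RESOLVE.

```python
import numpy as np

def mean_true_fom(phases_deg, true_phases_deg):
    """Mean cosine of the phase error, <cos(delta phi)>, over all reflections."""
    delta = np.deg2rad(np.asarray(phases_deg, dtype=float)
                       - np.asarray(true_phases_deg, dtype=float))
    return float(np.mean(np.cos(delta)))

# Illustrative values only (not data from the paper).
true_phases = np.array([10.0, 95.0, 200.0, 310.0])
perfect = mean_true_fom(true_phases, true_phases)          # identical phases
shifted = mean_true_fom(true_phases + 60.0, true_phases)   # every phase off by 60 degrees
```

A value of 1 corresponds to perfect phases and a value near 0 to random phases; a uniform 60° error gives 0.5.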
Fig. 2(a) shows the effect of the percentage of the cell occupied by the macromolecule and by 'solvent' (actually simply absence of atoms in these simulations) on the phases obtained from map-likelihood phasing starting with very poor initial phases. The starting mean true figure of merit in each case was 0.32 and the data extended to a resolution of 3 Å. For simulations with about 50% solvent or greater, each cycle of map-likelihood phasing resulted in phases that were at least as accurate as those in the previous cycle, with convergence essentially complete within 20 cycles. For those with 39% solvent, the phases became slightly worse with map-likelihood phasing compared with the starting phases; for the case with 31% solvent they were considerably worse. Fig. 2(b) illustrates the effect of high-resolution and low-resolution cutoffs on the quality of the phasing for the simulation with 53% solvent shown in Fig. 2(a). When all the data from 2.8 to 20 Å are included, the final mean true figure of merit was 0.56. When data from only 2.8 to 6 Å are included, the resulting true figure of merit decreases to only 0.44 and when data from only 2.8 to 5 Å are included, to only 0.34. Conversely, when data from only 5 to 20 Å are included, the resulting true figure of merit is only 0.28 and, as high-resolution data to 2.8 Å are included, this increases to 0.56.
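The resolution cutoffs in Fig. 2(b) amount to selecting reflections by d-spacing. A minimal sketch follows, assuming for simplicity an orthogonal unit cell (the general case needs the full reciprocal metric tensor). The function names, cell dimensions and Miller indices are hypothetical and chosen only for illustration.

```python
import numpy as np

def d_spacing_orthogonal(hkl, cell_lengths):
    """d-spacing (in Å) for an orthogonal cell: 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2.

    The (0, 0, 0) term must be excluded by the caller.
    """
    hkl = np.asarray(hkl, dtype=float)
    cell = np.asarray(cell_lengths, dtype=float)
    inv_d_sq = ((hkl / cell) ** 2).sum(axis=-1)
    return 1.0 / np.sqrt(inv_d_sq)

def select_resolution_range(hkl, cell_lengths, d_min, d_max):
    """Boolean selection of reflections with d_min <= d <= d_max."""
    d = d_spacing_orthogonal(hkl, cell_lengths)
    return (d >= d_min) & (d <= d_max)

# Hypothetical cell and reflections.
cell = (40.0, 50.0, 60.0)
hkl = np.array([[1, 0, 0], [3, 0, 0], [0, 10, 0], [0, 0, 30]])
keep = select_resolution_range(hkl, cell, d_min=2.8, d_max=20.0)
```

Here the reflections have d-spacings of 40, about 13.3, 5 and 2 Å, so only the second and third fall inside the 2.8 to 20 Å range.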
Fig. 3 expands on the simulation shown in Fig. 2, illustrating the stability and convergence of phasing beginning with phases with varying errors, for solvent contents of 31, 47 and 73%. In the case of 31% solvent content, for all starting phase sets the quality of phases generally decreased with each cycle of map-likelihood phasing, although when the starting true figure of merit was about 0.6 or greater, the overall phasing process was relatively stable. In contrast, for the simulation with 47% solvent the quality of phases increased slightly with each cycle. Starting from phase sets with a true figure of merit of about 0.45 or greater, all of the test simulations converged to phase sets with similar true figures of merit of about 0.6. For 73% solvent, the quality of the phases reached the same very high true figure of merit of about 0.8, regardless of the true figure of merit of the starting set of phases in the range 0.3–0.8.
Fig. 4 illustrates the effect of errors in the definition of solvent and protein regions on phasing. The simulations in this figure were carried out in the same way as those in Fig. 2, except that the mask used was based on a model that was missing about 10% of the atoms, so that about 10% of the 'protein' region was classified as 'solvent'. The quality of the map-likelihood phases obtained was lower than that obtained with the correct mask; even so, in the cases with about 50% or greater solvent content the phase quality with map-likelihood phasing improves over the starting phase set.
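The mask rule used in these simulations (every grid point within 2.5 Å of an atom counts as protein, everything else as solvent) can be sketched as follows. This is an illustrative brute-force version that assumes an orthogonal cell with periodic boundaries; the function name `protein_mask` and the one-atom example are hypothetical, and RESOLVE's own mask code is not reproduced here.

```python
import numpy as np

def protein_mask(atom_xyz, cell_lengths, grid_shape, radius=2.5):
    """Return a boolean grid that is True within `radius` Å of any atom.

    Assumes an orthogonal cell; distances use the minimum-image convention.
    """
    cell = np.asarray(cell_lengths, dtype=float)
    axes = [np.arange(n) * (length / n) for n, length in zip(grid_shape, cell)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # shape (nx, ny, nz, 3)
    mask = np.zeros(grid_shape, dtype=bool)
    for xyz in np.atleast_2d(np.asarray(atom_xyz, dtype=float)):
        delta = grid - xyz
        delta -= cell * np.round(delta / cell)   # wrap to the nearest periodic image
        mask |= (delta ** 2).sum(axis=-1) <= radius ** 2
    return mask

# One atom at the centre of a 10 Å cubic cell sampled on a 1 Å grid.
mask = protein_mask([[5.0, 5.0, 5.0]], (10.0, 10.0, 10.0), (10, 10, 10))
solvent_fraction = 1.0 - mask.mean()
```

Leaving atoms out of `atom_xyz` shrinks the protein region in the same way as the deliberately incomplete model used for Fig. 4.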
8.3. Ab initio phase determination with map-likelihood phasing
Fig. 3(c) showed that in cases with very high solvent content (73%), map-likelihood phasing yielded very substantial phase improvements and converged to essentially the same point regardless of the starting phase set used. Fig. 5 explores this further by illustrating the phase quality obtained by map-likelihood phasing as a function of solvent content, beginning with zero phase information (a blank map), but with a perfect solvent mask calculated from the atomic model. Fig. 5 shows that in cases with 66 and 73% solvent, map-likelihood phasing is sufficient in itself to determine crystallographic phases with high accuracy. In the model cases with 59 and 53% solvent, modest phase quality was obtained. These results are similar to those obtained by Béran & Szöke (1995) using a very different approach (simulated annealing) to find phase sets for model data that are compatible with defined solvent and protein regions.
It should be noted that although the map-likelihood approach was successful in ab initio phasing when using model data, tests carried out so far with experimental data have not resulted in substantial phase improvement. Presumably, this is because of complications from measurement errors and from the smaller differentiation between solvent and protein regions in real crystals compared with the model data sets examined here.
8.4. Reduction of model bias with prime-and-switch phasing
A very important feature of map-likelihood phasing is the potential for reducing or eliminating model bias in electron-density map calculations through the technique of prime-and-switch phasing. Test cases with model data were set up in order to examine how thoroughly model bias could be removed using prime-and-switch phasing and how this depended on the solvent content of the crystal. Additionally, the effect of including some prior phase information on bias and map quality for various solvent contents was examined.
Model data sets were constructed using the refined structure of dehalogenase enzyme from Rhodococcus (Newman et al., 1999) and leaving out varying numbers of water molecules and atoms from the C-terminus to simulate varying amounts of solvent content as in Fig. 2. These models were considered the 'correct' structures in the tests. Then, from each correct model, a 'molecular replacement' model was constructed by varying the coordinates of atoms in the correct model by an r.m.s.d. of 1.4 Å, using a function that varied sinusoidally in space so that the connectivity of the molecule remained intact. Next, all the atoms in the molecular-replacement model that were placed incorrectly were identified by noting the value of the electron density in a 'perfect' map calculated with structure factors based on the correct model. All those atoms in the molecular-replacement model that were in density from −0.5σ to 0.5σ were considered to be incorrectly placed. From 20 to 30% of the atoms in the molecular-replacement models were incorrectly placed according to this criterion. The mean density at coordinates of these incorrectly placed atoms in the perfect electron-density maps for the simulations with various solvent content ranged from 0.03σ to 0.06σ and the mean density at the coordinates of atoms in the correct model in the perfect electron-density map ranged from 1.7σ to 2.9σ, with the higher values corresponding to higher solvent contents (in which most of the cell is solvent, so the ratio of peak height to the r.m.s. density of the map is higher even with perfect data).
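The construction of the test models can be sketched in code. The exact sinusoidal function is not given in the text, so the form used below (each coordinate shifted by a sine of another coordinate, with a hypothetical 25 Å wavelength, then rescaled to the target r.m.s.d.) is an assumption made purely for illustration, as are the function names and the random test coordinates.

```python
import numpy as np

def sinusoidal_perturbation(xyz, target_rmsd=1.4, wavelength=25.0):
    """Shift atoms by a smooth sinusoidal function of position, scaled to target_rmsd (Å).

    Because the shift varies slowly in space, neighbouring atoms move together
    and the connectivity of the chain is preserved.
    """
    xyz = np.asarray(xyz, dtype=float)
    # Shift along x depends on z, along y on x, along z on y (assumed form).
    shift = np.sin(2.0 * np.pi * np.roll(xyz, 1, axis=1) / wavelength)
    rmsd = np.sqrt((shift ** 2).sum(axis=1).mean())
    return xyz + shift * (target_rmsd / rmsd)

def incorrectly_placed(density_in_sigma, cutoff=0.5):
    """Flag atoms whose density in the perfect map lies between -cutoff and +cutoff sigma."""
    return np.abs(np.asarray(density_in_sigma, dtype=float)) <= cutoff

# Hypothetical coordinates standing in for a 'correct' model.
rng = np.random.default_rng(1)
correct_model = rng.uniform(0.0, 50.0, size=(200, 3))
mr_model = sinusoidal_perturbation(correct_model)
achieved_rmsd = np.sqrt(((mr_model - correct_model) ** 2).sum(axis=1).mean())

# Hypothetical perfect-map density values (in sigma) at four atoms of the perturbed model.
flags = incorrectly_placed([1.8, 0.2, -0.4, 0.9])
```

The density values at atomic positions would in practice come from interpolating the perfect map; that step is omitted here.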
In these tests of model bias, the overall accuracy of the electron-density maps was assessed from the normalized mean value of electron density at the coordinates of atoms in the correct model. The model bias was assessed from the normalized mean value of electron density at coordinates of incorrectly placed atoms in the molecular-replacement model used in phasing. Fig. 6(a) shows the overall accuracy and model bias obtained by prime-and-switch phasing (with no prior phase information included in probability calculations) as a function of the solvent content in the model crystals. For comparison, the accuracy and model bias for σA-weighted maps based on the same data are shown. The overall accuracy of both the σA-weighted and prime-and-switch phased maps was quite high in all cases, with the prime-and-switch phased maps showing greater accuracy in all cases except at very low solvent content. The σA-weighted maps had mean values of electron density at coordinates of atoms in the correct model ranging from 0.9σ (31% solvent) to 1.8σ (73% solvent), while the prime-and-switch phased maps had mean values of electron density at coordinates of atoms in the correct model ranging from 0.9σ (31% solvent) to 2.6σ (73% solvent).
The level of bias was very different in the two methods. The σA-weighted maps had mean values of electron density at coordinates of incorrectly placed atoms in the molecular-replacement model ranging from 0.5σ (31% solvent) to 1.1σ (73% solvent). In contrast, the map-likelihood phased maps had values ranging from just 0.01σ (31% solvent) to 0.13σ (73% solvent), only slightly higher than the values of 0.03σ to 0.06σ found for a perfect map. Overall, the fractional bias, the ratio of the mean values of electron density at incorrectly placed to correctly placed atoms, for σA-weighted maps was in the range 0.5–0.6 for all values of solvent content (Fig. 6b). The fractional bias using prime-and-switch phasing was in the range 0.03–0.09 for all values of the solvent content, indicating that bias was nearly eliminated in all cases.
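As a small worked example, the fractional bias can be recomputed from the rounded mean densities quoted in the text. The function name is illustrative; the inputs are the quoted values in units of σ.

```python
def fractional_bias(mean_density_incorrect, mean_density_correct):
    """Ratio of mean map density at incorrectly placed atoms to that at correct atoms."""
    return mean_density_incorrect / mean_density_correct

# Rounded values (in units of sigma) quoted in the text.
sigma_a_31 = fractional_bias(0.5, 0.9)         # sigma-A weighted map, 31% solvent
sigma_a_73 = fractional_bias(1.1, 1.8)         # sigma-A weighted map, 73% solvent
prime_switch_73 = fractional_bias(0.13, 2.6)   # prime-and-switch map, 73% solvent
```

With these rounded inputs the σA-weighted maps give roughly 0.56 and 0.61 and the prime-and-switch map at 73% solvent gives 0.05; small differences from the plotted values are to be expected because the inputs are rounded.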
Fig. 7 illustrates the relationship between including model-based phase information and the resulting bias in the electron-density map. The overall quality of maps and fractional bias (as in Fig. 6) for map-likelihood phasing with 31, 47 and 73% solvent and including varying amounts of prior phase information, ranging from zero weight on prior phases to equal weighting of prior phases and map-likelihood phases, are shown. For the simulations with solvent content of 31 and 47%, the overall quality of the maps generally increases as expected with inclusion of prior phase information and then slowly decreases, with mean electron density at coordinates of atoms in the perfect model with 31% solvent increasing from 0.89 (zero prior phase information) to 1.09 (10% weight on prior information). When equal weight is placed on the prior information, overall quality decreases slightly, indicating that the prior phase-probability distributions may not be quite optimal. For the simulation with 73% solvent, inclusion of prior phase information had only a small and generally negative effect on the overall accuracy of phasing. This is presumably owing to the very high amount of unbiased phase information in the map-likelihood function in this case of high solvent content.
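One way to picture a 'weight on prior phases' is as a scale factor applied to the log-probability of the prior before it is added to the map-likelihood log-probability. This section does not spell out how the weights are applied, so the sketch below is an assumption for illustration only; the von Mises-like distributions for a single reflection are hypothetical.

```python
import numpy as np

PHI = np.deg2rad(np.arange(0.0, 360.0, 5.0))  # phase sampling grid

def combine_phase_information(log_p_prior, log_p_map, prior_weight):
    """Return (centroid phase in degrees, figure of merit) of the combined distribution.

    Assumption: the prior enters as prior_weight * log P_prior added to log P_map.
    """
    log_p = prior_weight * np.asarray(log_p_prior) + np.asarray(log_p_map)
    p = np.exp(log_p - log_p.max())   # subtract the maximum for numerical stability
    p /= p.sum()
    centroid = np.sum(p * np.exp(1j * PHI))
    return float(np.rad2deg(np.angle(centroid)) % 360.0), float(np.abs(centroid))

# Hypothetical single-reflection distributions (not taken from the paper).
log_prior = 2.0 * np.cos(PHI - np.deg2rad(0.0))   # model-based prior favours 0 degrees
log_map = 2.0 * np.cos(PHI - np.deg2rad(90.0))    # map likelihood favours 90 degrees

phase_no_prior, fom_no_prior = combine_phase_information(log_prior, log_map, 0.0)
phase_equal, fom_equal = combine_phase_information(log_prior, log_map, 1.0)
```

With zero weight the centroid phase is set by the map-likelihood term alone (90° here); with equal weight it is pulled halfway towards the prior (45° here) and the figure of merit rises, which is the trade-off between added information and added bias that Fig. 7 explores.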
Fig. 8 illustrates the convergence of the map-likelihood phasing procedure as a function of the solvent content of the unit cell. In an ordinary application of map-likelihood phasing about 50 cycles of iteration would be carried out. In order to examine the convergence properties in more detail, 400 cycles were carried out for each simulation, with weights on the prior phase information ranging from zero to unity. The procedure converges rapidly for the cases with 73% solvent, requiring fewer than 50 cycles for essentially complete convergence. In the cases with 53% and with 31% solvent, convergence was not fully achieved even after 400 cycles. These are cases in which one of the simple procedures discussed above would be applicable: stopping the iterative phasing procedure before full convergence, or introducing a limited amount of prior phase information to stabilize the phasing process.
8.5. Structure validation
An important application of map-likelihood phasing is likely to be structure validation (Wilson et al., 1998; Kleywegt, 2000). An unbiased method of comparing a model with amplitudes of structure factors that can identify specific places in the structure that are not fully compatible with the data would be of great help in structure validation. The map-likelihood phasing method is well suited to this task as it produces phase probabilities that are essentially unbiased by the starting phase set. Fig. 9 illustrates an example of this. The structure of gene 5 protein has been determined several times, and one of the earlier structures, refined at the moderate resolution of 2.3 Å (PDB entry 2gn5; Berman et al., 2000; Brayer & McPherson, 1983), differed in the loops and consequently in the register of the β-strands from structures determined at the higher resolution of 1.8 Å (PDB entry 1vqb; Skinner et al., 1994) and by NMR (Folkers et al., 1994).
Structure validation of PDB entry 2gn5 was carried out in two steps. The data used consisted of the atomic model 2gn5 and measured structure factors from 20 to 2.6 Å. First, the atomic model 2gn5 was used to calculate model phases and the σA approach of Read (1986) was used to estimate phase probability distributions for all of the structure factors. A region of the σA-weighted map containing the loop at residues 64–67 of gene 5 protein is shown in Figs. 9(a) and 9(b). In Fig. 9(a), the atomic model from PDB entry 2gn5 is overlaid on the map and in Fig. 9(b) the higher-resolution atomic model 1vqb is overlaid on the map. Somewhat surprisingly, considering the difference in register between the models, in general this map agrees quite well with both structures. In the region from residues 64–67, neither model fits it perfectly and neither is entirely incompatible with the map. Next, the σA-weighted phases were used to initiate map-likelihood phasing and five cycles of solvent-mask identification, each with ten minor cycles of phase optimization, were carried out. In this map-likelihood phasing process, the σA-weighted starting phases were only used to initiate the first cycle of phasing and were not used in phase-probability calculations or in subsequent cycles of phasing. The crystals of gene 5 protein contain about 40% solvent. Figs. 9(c) and 9(d) illustrate the same region shown in Figs. 9(a) and 9(b), this time with the prime-and-switch based phasing. Once again, overall the map agrees relatively well with both structures, but the prime-and-switch based phasing results in a map that is clearly more consistent with the higher resolution structure 1vqb. Figs. 9(c) and 9(d) illustrate, for example, that in the region of residues 64–67, this map shows connectivity that is in excellent agreement with the higher-resolution atomic model 1vqb, even though it is derived from the model 2gn5.
9. Applications of map-likelihood and prime-and-switch phasing
The technique of map-likelihood phasing has potential applications in many situations in X-ray crystallography. The critical characteristics of map-likelihood phasing are (i) that it derives phase information from the agreement of features of the electron-density map with expectation and (ii) that it produces phase (or amplitude and phase) probability information that is minimally biased by the starting phase set. The phases it produces are complementary to those obtained by experimental (e.g. MIR, MAD) approaches because the source of phase information is completely separate (e.g. solvent flatness versus MAD measurements). For the same reason, phases are also complementary to phases calculated from a model or partial model by σA-based (Read, 1986) or related approaches. Prime-and-switch phasing is a special case of map-likelihood phasing in which an accurate but potentially biased source of prior phase information such as might be obtained from an atomic model is used to initiate map-likelihood phasing but then is not used further in the phasing process.
The characteristics of map-likelihood phasing make it suited for a diverse set of applications, including minimally biased phase calculations from search models in the method of molecular replacement (Rossmann, 1990, 1995), iterative model-building (Perrakis et al., 1999), structure validation (Wilson et al., 1998) and ab initio phase determination from solvent masks or (Béran & Szöke, 1995; Rossmann, 1995; van der Plas & Millane, 2000; Wang et al., 1998).
The approach is applicable to any situation in which phase probabilities unbiased by a starting phase set are desirable and in which some characteristics of the electron-density map can be anticipated in advance. It is most readily applied to cases where a starting set of phases exists although, as shown above, this is not required.
The accuracy of the phases obtained using map-likelihood phasing can be expected to depend largely on two factors. One is the extent of constraints that are known in advance about the electron-density map. If the structure contains a very large amount of solvent, for example, then much phase information can be obtained because electron density in the solvent region is very highly constrained. The other is the quality of the starting phase information. In an extreme case, if the phases of all reflections with significant intensities except one were known perfectly, then the phase of the final reflection could be determined perfectly because only the perfect phase would lead to a perfectly flat solvent region. In general, the higher the quality of starting phase information, the better defined the resulting probability distributions.
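The extreme case described above can be checked numerically on a toy problem. The sketch below is an illustration under strong simplifying assumptions (a one-dimensional 'crystal', error-free amplitudes, an exactly flat solvent and a perfect mask); the function name and the random density are hypothetical, and this is not the map-likelihood function used in RESOLVE. All structure factors except one Friedel pair are kept at their true values, and the phase of the remaining reflection is scanned for the value that makes the solvent region flattest.

```python
import numpy as np

def scan_phase_for_flat_solvent(rho_true, solvent, h, n_steps=360):
    """Scan the phase of reflection h and return (best trial phase, true phase) in radians."""
    f_true = np.fft.fft(rho_true)
    amplitude = np.abs(f_true[h])
    trial_phases = np.arange(n_steps) * (2.0 * np.pi / n_steps)
    solvent_variance = np.empty(n_steps)
    for i, phi in enumerate(trial_phases):
        f_trial = f_true.copy()
        f_trial[h] = amplitude * np.exp(1j * phi)
        f_trial[-h] = np.conj(f_trial[h])          # Friedel mate keeps the map real
        rho_trial = np.fft.ifft(f_trial).real
        solvent_variance[i] = np.var(rho_trial[solvent])
    best = trial_phases[np.argmin(solvent_variance)]
    return best, np.angle(f_true[h])

n = 64
rng = np.random.default_rng(0)
rho = np.zeros(n)
rho[: n // 2] = rng.random(n // 2)     # 'protein': arbitrary positive density
solvent = np.arange(n) >= n // 2       # 'solvent': exactly flat (zero) density

best_phase, true_phase = scan_phase_for_flat_solvent(rho, solvent, h=3)
error_deg = np.rad2deg(np.abs(np.angle(np.exp(1j * (best_phase - true_phase)))))
```

In this idealized setting the flattest solvent is obtained at the trial phase closest to the true one, so the recovered phase is limited only by the 1° sampling of the scan. With real data, measurement errors and a non-flat solvent weaken this constraint.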
Acknowledgements
The author would like to thank Joel Berendzen for discussion and the NIH and the US Department of Energy for generous support. Map-likelihood phasing is available in version 2.0 of the program RESOLVE, available from https://resolve.lanl.gov.
References
Abrahams, J. P. & Leslie, A. G. W. (1996). Acta Cryst. D52, 30–32.
Adams, P. D., Pannu, N. S., Read, R. J. & Brunger, A. T. (1999). Acta Cryst. D55, 181–190.
Béran, P. & Szöke, A. (1995). Acta Cryst. A51, 20–27.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Braik, K., Otwinowski, Z., Hegde, R., Boisvert, D. D., Joachimiak, A., Horwich, A. L. & Sigler, P. B. (1994). Nature (London), 371, 578–586.
Brayer, G. D. & McPherson, A. (1983). J. Mol. Biol. 169, 565–596.
Bricogne, G. (1984). Acta Cryst. A40, 410–445.
Bricogne, G. (1988). Acta Cryst. A44, 517–545.
Cowtan, K. D. (2000). Acta Cryst. D56, 1612–1621.
Cowtan, K. D. & Main, P. (1993). Acta Cryst. D49, 148–157.
Cowtan, K. D. & Main, P. (1996). Acta Cryst. D52, 43–48.
Folkers, P. J. M., Nilges, M., Folmer, R. H. A., Konings, R. N. H. & Hilbers, C. W. (1994). J. Mol. Biol. 236, 229–246.
Giacovazzo, C. & Siliqi, D. (1997). Acta Cryst. A53, 789–798.
Goldstein, A. & Zhang, K. Y. J. (1998). Acta Cryst. D54, 1230–1244.
Gu, Y., Zheng, C., Zhao, Y., Ke, H. & Fan, H. (1997). Acta Cryst. D53, 792–794.
Hodel, A., Kim, S.-H. & Brünger, A. T. (1992). Acta Cryst. A48, 851–858.
Kleywegt, G. J. (2000). Acta Cryst. D56, 249–265.
Kleywegt, G. J. & Read, R. J. (1997). Structure, 5, 1557–1569.
Leslie, A. G. W. (1987). Proceedings of the CCP4 Study Weekend, pp. 25–31. Warrington: Daresbury Laboratory.
Lunin, V. Y. (1993). Acta Cryst. D49, 90–99.
Newman, J., Peat, T. S., Richard, R., Kan, L., Swanson, P. E., Affholter, J. A., Holmes, I. H., Schindler, J. F., Unkefer, C. J. & Terwilliger, T. C. (1999). Biochemistry, 38, 16105–16114.
Peat, T. S., Newman, J., Waldo, G. S., Berendzen, J. & Terwilliger, T. C. (1998). Structure, 15, 1207–1214.
Perrakis, A., Morris, R. & Lamzin, V. S. (1999). Nature Struct. Biol. 6, 458–463.
Perrakis, A., Sixma, T. K., Wilson, K. S. & Lamzin, V. S. (1997). Acta Cryst. D53, 448–455.
Plas, J. L. van der & Millane, R. P. (2000). Proc. SPIE, 4123, 249–260.
Podjarny, A. D., Bhat, T. N. & Zwick, M. (1987). Annu. Rev. Biophys. Biophys. Chem. 16, 351–373.
Prince, E., Sjolin, L. & Alenljung, R. (1988). Acta Cryst. A44, 216–222.
Read, R. J. (1986). Acta Cryst. A42, 140–149.
Refaat, L. S., Tate, C. & Woolfson, M. M. (1996). Acta Cryst. D52, 252–256.
Roberts, A. L. U. & Brünger, A. T. (1995). Acta Cryst. D51, 990–1002.
Rossmann, M. G. (1990). Acta Cryst. A46, 73–82.
Rossmann, M. G. (1995). Curr. Opin. Struct. Biol. 5, 650–655.
Rossmann, M. G. & Arnold, E. (1993). International Tables for Crystallography, Vol. B, edited by U. Shmueli, pp. 230–258. Dordrecht: Kluwer Academic Publishers.
Roversi, P., Blanc, E., Vonrhein, C., Evans, G. & Bricogne, G. (2000). Acta Cryst. D56, 1316–1323.
Skinner, M. M., Zhang, H., Leschnitzer, D. H., Guan, Y., Bellamy, H., Sweet, R. M., Gray, C. W., Konings, R. N. H., Wang, A. H.-J. & Terwilliger, T. C. (1994). Proc. Natl Acad. Sci. USA, 91, 2071–2075.
Szöke, A. (1993). Acta Cryst. A49, 853–866.
Szöke, A., Szöke, H. & Somoza, J. R. (1997). Acta Cryst. A53, 291–313.
Terwilliger, T. C. (1999). Acta Cryst. D55, 1863–1871.
Terwilliger, T. C. (2000). Acta Cryst. D56, 965–972.
Terwilliger, T. C. & Berendzen, J. (1999). Acta Cryst. D55, 849–861.
Vellieux, F. M. D. A. P., Hunt, J. F., Roy, S. & Read, R. J. (1995). J. Appl. Cryst. 28, 347–351.
Wang, B.-C. (1985). Methods Enzymol. 115, 90–112.
Wang, J. M., Hartling, J. A. & Flanagan, J. M. (1998). J. Struct. Biol. 124, 151–163.
Wilson, A. J. C. (1949). Acta Cryst. 2, 318–321.
Wilson, C. & Agard, D. A. (1993). Acta Cryst. A49, 97–104.
Wilson, K. S., Butterworth, S., Dauter, Z., Lamzin, V. S., Walsh, M., Wodak, S., Pontius, J., Richelle, J., Vaguine, A., Sander, C., Hooft, R. W. W. & Vriend, G. (1998). J. Mol. Biol. 276, 417–436.
Xiang, S., Carter, C. W. Jr, Bricogne, G. & Gilmore, C. J. (1993). Acta Cryst. D49, 193–212.
Zhang, K. Y. J. (1993). Acta Cryst. D49, 213–222.
Zhang, K. Y. J., Cowtan, K. D. & Main, P. (1997). Methods Enzymol. 277, 53–64.
Zhang, K. Y. J. & Main, P. (1990). Acta Cryst. A46, 41–46.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.