research papers
Decision-making in structure solution using Bayesian estimates of map quality: the PHENIX AutoSol wizard
(a) Los Alamos National Laboratory, Los Alamos, NM 87545, USA, (b) Lawrence Berkeley National Laboratory, One Cyclotron Road, Building 64R0121, Berkeley, CA 94720, USA, and (c) Department of Haematology, University of Cambridge, Cambridge CB2 0XY, England
*Correspondence e-mail: terwilliger@lanl.gov
Estimates of the quality of experimental maps are important in many stages of structure determination of macromolecules. Map quality is defined here as the correlation between a map and the corresponding map obtained using phases from the final refined model. Here, ten different measures of experimental map quality were examined using a set of 1359 maps calculated by re-analysis of 246 solved MAD, SAD and MIR data sets. A simple Bayesian approach to estimation of map quality from one or more measures is presented. It was found that a Bayesian estimator based on the skewness of the density values in an electron-density map is the most accurate of the ten individual Bayesian estimators of map quality examined, with a correlation between estimated and actual map quality of 0.90. A combination of the skewness of electron density with the local correlation of r.m.s. density gives a further improvement in estimating map quality, with an overall correlation of 0.92. The PHENIX AutoSol wizard carries out automated structure solution based on any combination of SAD, MAD, SIR or MIR data sets. The wizard is based on tools from the PHENIX package and uses the Bayesian estimates of map quality described here to choose the highest quality solutions after experimental phasing.
Keywords: structure solution; scoring; Protein Data Bank; phasing; decision-making; PHENIX; experimental electron-density maps.
1. Introduction
Structure solution in macromolecular crystallography is a multi-step procedure in which more than one plausible possibility often exists at the conclusion of each step. At the start of the process, one or more MAD, SAD, SIR or MIR data sets are collected and reduced to a list of indices and structure-factor amplitudes (Leslie, 1992; Kabsch, 1993; Otwinowski & Minor, 1997; Pflugrath, 1999). Even at this stage there are often several possibilities for the space group that must be considered. For each possible space group the process continues with finding a substructure containing heavy atoms or anomalously scattering atoms (Grosse-Kunstleve & Adams, 2003; Schneider & Sheldrick, 2002; Terwilliger & Berendzen, 1999a,b; Weeks et al., 2003). There is often more than one plausible substructure at this stage. For example, in space groups that are not chiral the two possible hands of the substructure cannot normally be distinguished. Furthermore, for MAD data sets there may be alternative solutions found by searching for the substructure using different data sets (from various wavelengths or combining data from different wavelengths using FA values; Terwilliger, 1994). Similarly, for MIR data sets there may also be substructures found for several different derivatives. In addition to these intrinsic possibilities, it is possible that more than one set of parameters or even more than one set of software might be used to generate possible solutions. The potential heavy-atom substructures found are then used to calculate the phases of structure factors, which are in turn used as the starting point for density modification (Wang, 1985) and subsequent model building (e.g. Perrakis et al., 1999; Terwilliger et al., 2008). Normally, one of the best indications of map quality is that the map can be interpreted in terms of an atomic model.
If every possibility at every stage were investigated fully by calculating maps, carrying out density modification and model building, the process might take many hours or days to complete. To speed up the process, the possibilities at each stage are generally ranked, with only the highest ranked possibilities being considered for the next step. This approach can be efficient, but if it is to yield the best solution at the end it requires a reliable method for deciding which members of a set of solutions are of the highest quality.
The definition of `quality' when applied to electron-density maps normally refers to the correlation between the values of electron density in the map and the values of electron density in a hypothetical `true' map for the same structure. In this work, when tests are carried out to assess various measures of map quality, the `true' quality or map correlation is calculated between the map in question and a map obtained using measured amplitudes but with phases calculated from a refined model of the corresponding structure. Maps that have a high map correlation as defined in this way are generally more useful for model building and interpretation than those with a low map correlation. However, it should be noted that map correlation is not a perfect way to assess the utility of a map, as low-resolution terms are generally stronger and therefore have a higher relative contribution to the correlation than high-resolution terms, while the high-resolution terms are generally essential for the interpretation of a map. Consequently, a map could have a moderately high correlation to a model map, based largely on low-resolution terms, yet not be interpretable.
A number of methods for evaluating the quality of experimental macromolecular electron-density maps have been developed. The methods can generally be grouped into real-space calculations and reciprocal-space calculations.
Real-space methods are based on an examination of the electron-density map and generally answer the question `Does this map look like an electron-density map of a macromolecule?' There are many distinctive features of macromolecular electron-density maps that can be used to answer this question. A good map may be expected to have continuous chains of density (Baker et al., 1993). It may have local patterns of density that reflect shapes and interatomic spacings common to macromolecules (Colovos et al., 2000; Terwilliger, 2003). It may have a distribution of electron densities with a positive skewness, reflecting the large number of points with moderate or low electron density, the lack of points with negative density and the points with very positive electron density located near atoms in the structure (Podjarny, 1976; Lunin, 1993). There may be a large variation (contrast) in the local r.m.s.d. of electron density, reflecting regions of the structure containing the macromolecule (with high local variation) and solvent (with low local variation; Terwilliger & Berendzen, 1999a; Sheldrick, 2002). The contiguous nature of the regions of relatively flat solvent may be detected from the correlation of local r.m.s.d. at one point in a map with that at neighboring points (Terwilliger & Berendzen, 1999b). If noncrystallographic symmetry (NCS) is present in the structure, then the correlation of NCS-related density can be detected (Cowtan & Main, 1998; Vellieux et al., 1995; Terwilliger, 2002a).
Reciprocal-space methods for evaluation of map quality generally address questions involving structure factors and expectations about the structure such as the model for the solvent region or for the heavy-atom substructure. One such question is simply `Given the anomalously scattering atom model and the observed data, what is the expected correlation between the experimental map and the true map?' The value of the figure of merit of phasing (Blow & Crick, 1959; Terwilliger & Berendzen, 1999a,b), when estimated correctly, is similar in magnitude to the correlation between the experimental and true maps and can be used as an estimate of this correlation. Another question addresses the data and the expectations about the electron-density map: `Is the amplitude of each structure factor consistent with the value expected based on the amplitudes and phases of all other reflections and the model of the solvent region?' This question can be answered based on the R factor in the first cycle of density modification (which reflects the agreement between each measured amplitude and an estimate of that amplitude based on all other amplitudes and phases along with expectations about features in the map; Cowtan & Main, 1996; Terwilliger, 2001). A related question can be asked about the phases: `If a phase is estimated from the model of the solvent region, measured amplitudes of structure factors and the experimental values of all other phases, is this phase correlated with its experimentally determined value?' This question can be answered using the correlation of experimental phases with map probability phases obtained in statistical density modification (Terwilliger, 2001). A third question that might be asked is `Do the phases calculated using only the highest peaks in the map match the experimental phases?' This question can be answered by truncating the density at a high level, calculating phases from the map and comparing these with the experimental phases (Baker et al., 1993).
It is important to note that the measures of map quality are analyzed here for their utility in estimating the qualities of experimental electron-density maps, as opposed to maps that have been calculated using a partially correct model or maps that have had density modification applied. An important difference between experimental maps and those obtained using a model or based on density modification is that in the latter cases the maps have been specifically adjusted in order to maximize one or more of the properties that are being measured. For example, density modification typically flattens the solvent region of the map. Similarly, a map calculated from a model will tend to have a high skewness of the density values and a high connectivity of high electron density. Some of these measures may also be useful in these two other important cases, but the values of each measure corresponding to a particular quality of map are likely to be substantially different.
In this work, we implement ten different measures of quality of experimental electron-density maps, develop a simple Bayesian approach to estimating map quality from each and show how the individual estimates can be combined to yield useful overall estimates of map quality. These map-quality estimates are incorporated into the PHENIX AutoSol wizard and are used to make decisions during automated structure solution.
2. Materials and methods
2.1. Structure solution with the PHENIX AutoSol wizard
The PHENIX AutoSol wizard carries out structure solution for SAD/MAD or MIR/SIR/SIRAS data and any combination of these. If data representing more than one heavy-atom substructure were available, the data were grouped into `data sets' with common heavy-atom substructures. All the structure solutions described here had been carried out previously and refined structures were available in each case. Default values were used here for most parameters, but the number and type of anomalous and heavy-atom scatterers and initial values of scattering factors were taken from this prior work.
2.1.1. Analysis with phenix.xtriage
Each available set of data was analyzed using phenix.xtriage (Zwart et al., 2005) for circumstances such as translational noncrystallographic symmetry, unexpectedly strong or weak reflections or groups of reflections, or anisotropic overall atomic displacement parameters that may complicate structure solution. The data were corrected for anisotropy before structure solution was carried out if the overall anisotropy correction yielded values that were highly anisotropic (by default, defined as greater than a 1.5-fold ratio among the values of the parameters along the three principal reciprocal axes and greater than 20 Å² difference between the highest and lowest values). If an anisotropy correction was applied, then the resulting corrected data were used for structure solution only and not for refinement (as an anisotropy correction is applied as part of the refinement process itself).
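The default anisotropy criterion quoted above can be written as a small predicate. This is a minimal sketch, not the phenix.xtriage implementation: the function name, its arguments and the handling of non-positive values are assumptions made for illustration, and only the two thresholds (1.5-fold ratio, 20 Å² difference) come from the text.

```python
def needs_anisotropy_correction(b_principal, max_ratio=1.5, max_diff=20.0):
    """Decide whether data look 'highly anisotropic' by the default test.

    b_principal: overall displacement-parameter values (in A^2) along the
    three principal reciprocal axes (hypothetical input for this sketch).
    Both conditions must hold: ratio > max_ratio AND spread > max_diff.
    """
    b_min, b_max = min(b_principal), max(b_principal)
    spread = b_max - b_min
    if b_min <= 0.0:
        # Assumption: the ratio is ill-defined for non-positive values,
        # so this sketch falls back on the difference test alone.
        return spread > max_diff
    return (b_max / b_min) > max_ratio and spread > max_diff
```

For example, values of 20, 25 and 45 Å² satisfy both conditions (ratio 2.25, difference 25 Å²), whereas 100, 110 and 125 Å² do not (ratio 1.25).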
2.1.2. Substructure solution with HySS
For each data set (i.e. a MAD or SAD data set or an SIR data set) possible heavy-atom substructures were found using the hybrid substructure search (HySS; Grosse-Kunstleve & Adams, 2003) from isomorphous, anomalous or dispersive differences or from FA values (Terwilliger, 1994). The high-resolution limit used for the search was typically 3 Å. By default, HySS was run multiple times on each data set using a different random seed each time and the solution with the highest correlation coefficient between structure factors calculated from the heavy-atom model and the structure-factor differences or FA values was kept. The correlation coefficient was also used, along with the number of sites found, to determine whether to continue searching. Normally, the search was carried out ten times unless the expected number of sites was found and a correlation of 0.3 was obtained. By default, if no solution was found with a correlation of at least 0.2 at a particular resolution, then up to two additional high-resolution limits were tested in steps of 1 Å (e.g. using a high-resolution limit of 3 Å followed if necessary by high-resolution limits of 4 and 5 Å).
2.1.3. Phasing with Phaser and SOLVE and map evaluation
Each potential heavy-atom substructure found above (along with its inverse) was used to calculate phases with Phaser (for SAD phasing; McCoy et al., 2004) or SOLVE (for MAD, SIR and MIR phasing; Terwilliger & Berendzen, 1996, 1997, 1999a,b). (In the examples shown in this work and in PHENIX versions up to v.1.3 the hand of the space group was fixed; later versions of PHENIX automatically invert chiral space groups when considering the inverse of the substructure.) The resulting phases and amplitudes of structure factors, along with weights (the figure of merit of phasing), were used to calculate experimental electron-density maps using a high-resolution limit of 2.5 Å (or lower if data were not available to this resolution). The high-resolution limit was applied in order to reduce the effects of variable resolution limits on the features of electron-density maps. These maps were evaluated with the measures of map quality described in this work and the overall Bayesian estimate of quality was used to rank solutions. In cases where two solutions have very similar heavy-atom parameters (r.m.s.d. among heavy-atom coordinates of less than 1/10 of the high-resolution limit of the data), only the solution with the higher estimate of quality was considered. The estimate of uncertainty in the map quality was used to identify solutions that might plausibly (5% possibility or greater) be the best solution and normally all such solutions were considered at each step. By default, up to three of the highest ranking solutions (six for MIR structures) for the heavy-atom substructure were used to calculate phases and weights at the full available resolution of the data and for density modification.
In the structure determinations carried out below for development of the map evaluation criteria, rankings were instead obtained using a Z-score procedure (Terwilliger & Berendzen, 1999a,b) based only on the skewness of the electron density (as defined below).
2.1.4. Statistical density modification with RESOLVE
The experimental phases obtained above were used as a starting point for statistical density modification using RESOLVE (Terwilliger, 2000).
In statistical density modification with the PHENIX AutoSol wizard, a probabilistic estimate of the boundary between macromolecule and solvent is identified in two ways and that leading to the lower R factor in density modification is used. The first method (Wang, 1985) is based on the local r.m.s. density, smoothing the squared density using a sphere (Leslie, 1987) with a smoothing radius (rsmooth) given by an empirically derived formula (chosen by optimizing parameters carrying out density modification using model data),
where dmin is the high-resolution limit of the data and 〈m〉 is the mean figure of merit of phasing. The second method for solvent-boundary identification uses a comparison of histograms of density based on model maps calculated with partially randomized phases with local histograms of density in the experimental map to assign a probability that each point in the map is part of the macromolecule or part of the solvent region. In both cases a probabilistic solvent boundary is obtained (Terwilliger, 1999).
Noncrystallographic symmetry (NCS) is used in density modification if it is detected based on the heavy-atom substructure and the presence of correlated density at NCS-related positions in the electron-density map (Terwilliger, 2002a,b). The value of rsmooth described above is used as a smoothing radius in a local correlation map to identify the region over which NCS holds (Vellieux et al., 1995).
2.1.5. Model building with RESOLVE
After density modification, the PHENIX AutoSol wizard carries out automated model building using a single cycle of building with the PHENIX AutoBuild wizard (Terwilliger et al., 2008) or using rapid methods for building secondary structure of proteins and nucleic acids (T. Terwilliger, unpublished work). Initially, a secondary-structure-only model is built into each map. The correlation between a map calculated from the model and the density-modified map is then determined. If the value of the map–model correlation is less than a preset value (typically 0.35), then the building procedure is repeated with a standard cycle of building using the methods in the PHENIX AutoBuild wizard. If a map–model correlation of a given value (typically 0.20) or greater is obtained for at least one solution, then the top solution is identified as that with the highest value of the map–model correlation. If a lower map–model correlation is obtained, then the top solution is identified (see below) based on the Bayesian estimates of quality using the skewness of electron density (skew) and the correlation of local r.m.s. density (r2RMS).
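The decision rules in this paragraph can be sketched as follows. This is an illustration only and not the wizard's code: the function names and the dictionary keys (`name`, `model_cc`, `bayes_cc`) are hypothetical, and only the typical thresholds of 0.35 and 0.20 are taken from the text.

```python
def needs_full_build(model_cc, cc_rebuild=0.35):
    """True if the quick secondary-structure model fits too poorly and a
    standard building cycle should be run (typical threshold 0.35)."""
    return model_cc < cc_rebuild


def pick_top_solution(solutions, cc_accept=0.20):
    """Choose the top solution from a list of dicts (hypothetical layout).

    Each dict carries 'name', 'model_cc' (map-model correlation) and
    'bayes_cc' (Bayesian estimate of map quality from skew and r2RMS).
    """
    best_by_model = max(solutions, key=lambda s: s["model_cc"])
    if best_by_model["model_cc"] >= cc_accept:
        # At least one solution reached the threshold: trust the models.
        return best_by_model["name"], "map-model correlation"
    # Otherwise fall back on the Bayesian estimate of map quality.
    best_by_bayes = max(solutions, key=lambda s: s["bayes_cc"])
    return best_by_bayes["name"], "Bayesian estimate"
```

With this layout, a set of solutions in which no model reaches a correlation of 0.20 is ranked purely on the Bayesian estimate.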
2.2. Evaluation of measures of map quality
A set of measures of map quality were applied to experimental maps (or structure-factor amplitudes, phases and weights) obtained from real but re-enacted structure determinations. Each of the structures considered had been determined previously, so that phases from a refined model could be used with measured amplitudes to calculate a model map to use as a standard. The `true' quality of each map was taken to be the correlation with the corresponding standard map calculated at the same nominal resolution. Each measure of quality was applied to each map and the resulting scores were saved along with the corresponding `true' quality. The structure-solution process was automatically carried out by the PHENIX AutoSol wizard and each experimentally phased map that was obtained during the structure-solution process was examined in this way. To reduce the number of near-duplicate solutions considered, all solutions for a structure that had nearly identical values of the map correlation to the standard map (within a range of ±0.0005 in map correlation) were considered to be the same and only the first was used in the analysis. For comparisons involving two possible enantiomers of a solution, the two enantiomers of a solution sometimes differed only slightly (i.e. the heavy-atom substructure was nearly centrosymmetric). In these analyses of enantiomeric pairs, only those that differed by an r.m.s.d. of at least 0.5 Å were considered.
For analysis of map quality, electron-density maps and structure factors were calculated using a high-resolution limit of 2.5 Å (if data were available to that resolution), as described above for the PHENIX AutoSol wizard. Before applying each of the measures of map quality, the experimental maps were normalized to a mean of zero and a variance of unity. They were then adjusted in two steps to reduce the contribution from high density at the coordinates of heavy-atom sites. (The high density at heavy-atom sites might otherwise lead to high values for the skewness, NCS correlation, contrast and possibly other measures.) Firstly, the electron density within a radius (r) of each heavy-atom site used in phasing (where r was given by twice the resolution of the data or 5 Å, whichever was greater) was limited to values less than or equal to twice the r.m.s. (2σ) of the map. Secondly, the electron density everywhere in the map was limited to values in the range −5σ to +5σ. This modified map is referred to below as the normalized truncated experimental electron-density map.
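The normalization and two-step truncation described above can be sketched with NumPy. This is a minimal illustration under stated assumptions: the map is a plain array, and `site_mask` (a boolean array marking grid points within the radius r of any heavy-atom site) is a hypothetical input whose construction is not shown.

```python
import numpy as np


def heavy_atom_radius(d_min):
    """Radius r around each heavy-atom site: twice the resolution or 5 A,
    whichever is greater (as stated in the text)."""
    return max(2.0 * d_min, 5.0)


def normalize_and_truncate(rho, site_mask, site_cap=2.0, global_cap=5.0):
    """Return a 'normalized truncated' map.

    1. Scale to zero mean and unit variance (so sigma = 1 afterwards).
    2. Cap density near heavy-atom sites at +2 sigma.
    3. Clip all density to the range -5 sigma .. +5 sigma.
    """
    rho = np.asarray(rho, dtype=float)
    z = (rho - rho.mean()) / rho.std()
    z = np.where(site_mask, np.minimum(z, site_cap), z)
    return np.clip(z, -global_cap, global_cap)
```

Note that the caps are expressed in units of the r.m.s. of the normalized map, so the result is no longer exactly of unit variance.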
Weighted electron-density maps were calculated in the PHENIX environment (Adams et al., 2002) using RESOLVE (Terwilliger, 2000) on a grid with a spacing of 1/3 of the high-resolution limit of the data or finer. Map correlations were obtained by calculating the correlation coefficient of a pair of maps at all the grid points in the asymmetric unit of the unit cell. Model–map correlations were calculated in the same way, except that one map was calculated from the model and an overall B factor (b_overall) was adjusted to maximize the correlation. This correlation was further maximized by adjusting a parameter (rFFT) representing the radius around atoms in the model to be included in FFT-based density calculations (typically about equal to the high-resolution limit of the data). For protein chains, an increment in isotropic thermal factors (beta_b) for each bond between side-chain atoms and the Cβ atom was also applied to maximize the correlation.
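A map correlation of this kind is the ordinary Pearson correlation taken over grid points. The sketch below assumes both maps are sampled on the same grid and supplied as arrays; the function name is illustrative and the adjustment of B factors or rFFT is not included.

```python
import numpy as np


def map_correlation(map_a, map_b):
    """Pearson correlation coefficient between two maps on the same grid."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))
```

Because the means are removed and the result is normalized, the value is unchanged by any overall scale or offset applied to either map.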
2.3. Real-space map-quality measures
The measures of map quality used in this work are described in this and the following section and are summarized in Table 1.
2.3.1. Skewness of electron density
The skewness (skew) of each normalized truncated map (as described in §2.2) was calculated using the relation
$$\mathrm{skew} = \frac{\langle \rho^{3} \rangle}{\langle \rho^{2} \rangle^{3/2}},$$
where the electron density (ρ) was calculated at all the grid points in the asymmetric unit. This quantity reflects the skewness of the density values in the map.
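As an illustration, the skewness of a set of density values can be computed as below. The sketch assumes the standard third-moment definition, ⟨ρ³⟩/⟨ρ²⟩^(3/2) about the mean; the function name is hypothetical and the code is not taken from RESOLVE.

```python
import numpy as np


def skewness(rho):
    """Third central moment divided by the 3/2 power of the variance."""
    rho = np.asarray(rho, dtype=float)
    d = rho - rho.mean()  # no-op for a map already at zero mean
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)
```

A symmetric distribution of densities gives a value of zero, while a map dominated by a few strongly positive points gives a positive value.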
2.3.2. Contrast of electron density
The contrast between the r.m.s. (root-mean-square) density in the solvent region and the r.m.s. density in the macromolecular region was calculated from the standard deviation of the local r.m.s. density over the entire asymmetric unit (Terwilliger & Berendzen, 1999a; Sheldrick, 2002). The normalized truncated density described in §2.2 was first squared. The squared density was then smoothed by averaging all values within a moving sphere with radius (r) given by the larger of 6 Å or twice the high-resolution limit of the data. The standard deviation (s) of the smoothed squared density was then calculated. To compensate for the effect of the solvent fraction in the crystal (f) on the resulting value, the standard deviation (s) calculated above was multiplied by the factor $[(1 - f)/f]^{1/2}$ to yield the contrast c,
$$c = s\left[\frac{1 - f}{f}\right]^{1/2}.$$
The correction factor $[(1 - f)/f]^{1/2}$ was chosen because it leads to a value of 1 for the contrast for a map for which the entire solvent region has zero variance and the nonsolvent region has a constant and nonzero variance.
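The contrast calculation can be sketched as follows. Several simplifications are assumptions of this sketch rather than statements about RESOLVE: the map is treated as a periodic array on an orthogonal grid, the moving-sphere average is done by FFT convolution, and the radius is supplied in grid units (the radius in Å divided by the grid spacing).

```python
import numpy as np


def smoothing_radius(d_min):
    """Sphere radius in A: the larger of 6 A or twice the resolution."""
    return max(6.0, 2.0 * d_min)


def sphere_smooth(values, radius_vox):
    """Average within a periodic moving sphere (radius in grid units)."""
    axes = [np.minimum(np.arange(n), n - np.arange(n)) for n in values.shape]
    dx, dy, dz = np.meshgrid(*axes, indexing="ij")
    kernel = (dx ** 2 + dy ** 2 + dz ** 2 <= radius_vox ** 2).astype(float)
    kernel /= kernel.sum()
    # Circular convolution; the kernel is symmetric about the origin.
    return np.real(np.fft.ifftn(np.fft.fftn(values) * np.fft.fftn(kernel)))


def contrast(rho_norm, radius_vox, solvent_fraction):
    """c = s * sqrt((1 - f) / f), with s the standard deviation of the
    smoothed squared density and f the solvent fraction."""
    squared = np.asarray(rho_norm, dtype=float) ** 2
    s = sphere_smooth(squared, radius_vox).std()
    return float(s * np.sqrt((1.0 - solvent_fraction) / solvent_fraction))
```

In the limiting case described in the text (solvent with zero variance, macromolecule with constant variance, overall variance of unity) this function returns 1, which is a convenient check of the correction factor.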
2.3.3. Correlation of local r.m.s. density
The presence of contiguous flat solvent regions in a map was detected using the correlation coefficient of the smoothed squared electron density, calculated as described above, with the same quantity calculated using half the value of the smoothing radius, yielding the correlation of r.m.s. density, r2RMS. In this way the local value of the r.m.s. density within a small local region (typically within a radius of 3 Å) is compared with the local r.m.s. density in a larger local region (typically within a radius of 6 Å). If there were a large contiguous solvent region and another large contiguous region containing the macromolecule, the local r.m.s. density in the small region would be expected to be highly correlated with the r.m.s. density in the larger region. On the other hand, if the `solvent' region were broken up into many small flat regions, then this correlation would be expected to be smaller.
2.3.4. Flatness of the solvent region
A normalized truncated electron-density map was partitioned between regions of solvent and macromolecule as described in §2.1.4. The r.m.s. electron density in the solvent region (r.m.s.SOLVENT) and in the region of the macromolecule (r.m.s.PROT) were then calculated. The flatness (F) of the solvent region was expressed as the difference between the two,
2.3.5. Number of regions enclosing high density
A threshold of density (t) was found such that 5% of the volume of the asymmetric unit of the crystal had a density greater than this threshold t. All the grid points in the map above the threshold t were marked. The number of discrete regions (Nregions) containing marked points was then counted. For this purpose, a discrete region was defined as a set of all marked grid points that can be connected by tracing from one adjacent marked grid point to another (including symmetry-related marked grid points). To partially compensate for the fact that lower resolution maps have fewer grid points, the number of regions was multiplied by the high-resolution limit of the data used to calculate the map (dmin). To further compensate for the volume of the asymmetric unit containing the macromolecule, the number of regions was then divided by the fraction of the asymmetric unit that contains macromolecule (f) and the volume of the asymmetric unit (V) to yield the normalized number of regions per unit volume (Nr),
$$N_{r} = \frac{N_{\mathrm{regions}}\, d_{\min}}{f\,V}.$$
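Counting discrete regions above a threshold is a connected-component problem. The sketch below makes assumptions not stated in the text: points are taken as adjacent when they share a face, and periodic wrap-around of the array stands in for symmetry (that is, only lattice translations are handled). The function names are illustrative.

```python
from collections import deque

import numpy as np

# Face-sharing neighbours (an assumption of this sketch).
_STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def count_regions(rho, top_fraction=0.05):
    """Count connected regions among the top 5% of grid points."""
    rho = np.asarray(rho, dtype=float)
    threshold = np.quantile(rho, 1.0 - top_fraction)
    marked = rho > threshold
    seen = np.zeros_like(marked)
    shape = marked.shape
    n_regions = 0
    for start in zip(*np.nonzero(marked)):
        if seen[start]:
            continue
        n_regions += 1  # new region: flood-fill everything connected to it
        seen[start] = True
        queue = deque([start])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in _STEPS:
                nb = ((i + di) % shape[0], (j + dj) % shape[1], (k + dk) % shape[2])
                if marked[nb] and not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
    return n_regions


def normalized_regions(n_regions, d_min, f_macro, volume):
    """Multiply by d_min, divide by macromolecule fraction and volume."""
    return n_regions * d_min / (f_macro * volume)
```

A well-connected map concentrates its highest density in a few long regions, so lower normalized counts would be expected for better maps.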
2.3.6. Overlap of NCS-related density
If NCS was found in the heavy-atom substructure for a solution, then the map was examined for the presence of correlated density at NCS-related locations in the map (Cowtan & Main, 1998; Vellieux et al., 1995). The overlap (ONCS) between density at NCS-related locations was used to evaluate noncrystallographic symmetry,
$$O_{\mathrm{NCS}} = \langle \rho_{i}\,\rho_{j} \rangle,$$
where ρi and ρj are density at NCS-related locations in the asymmetric unit and the average is either within a sphere with radius rsmooth (as described above for identifying the solvent boundary) or over a region within the asymmetric unit. The values of density ρi used were those from the normalized truncated map described above. The region where NCS applies was identified as a contiguous region in which the local mean of the overlap is at least cMIN, where this cutoff cMIN was selected to yield a total volume occupied by all NCS copies that was approximately the same as the total volume (f) occupied by the macromolecule in the asymmetric unit (Terwilliger, 2002a). For the purposes of evaluating a map, the mean value of the overlap of NCS density, ONCS, was calculated over this entire NCS region. If the value of the overlap found was less than OMIN (typically, OMIN = 0.3), the NCS was ignored.
2.4. Reciprocal-space map-quality measures
2.4.1. R factor and phase correlation from statistical density modification
The amplitudes and phases of structure factors calculated using statistical density modification, but without including the experimental phase probabilities, can be compared with the observed amplitudes and experimental phases (Cowtan & Main, 1996; Terwilliger, 2001). These comparisons yield an R value (RDENMOD) for the amplitudes and a mean cosine of the phase difference (mDENMOD) for the phases.
2.4.2. Figure of merit of phasing
The mean figure of merit of phasing (〈m〉) was used directly from Phaser (for SAD phasing calculations; McCoy et al., 2004) or SOLVE (for MIR and MAD calculations; Terwilliger & Berendzen, 1999a,b) as an estimate of the quality of a map.
2.4.3. Density truncation (peak-picking)
The number of non-H atoms (n) in the asymmetric unit was roughly estimated from the fraction of the asymmetric unit that contains macromolecule (f) and the volume of the asymmetric unit (V) using an approximate average atomic volume Vo = 19 Å³ (Stroud & Fauman, 1995) and the relation n = fV/Vo. The highest 3n/4 grid points in the asymmetric unit of the electron-density map were then identified and C atoms were placed at these grid points. A map was calculated from these C atoms and the correlation (r2TRUNCATION) with the original map was obtained after adjusting an overall thermal factor to maximize this correlation.
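The arithmetic behind the peak count is simple enough to show directly. In this sketch the function names are hypothetical and the volume is taken to be in Å³; only the relation n = fV/Vo, the value Vo = 19 Å³ and the factor 3/4 come from the text.

```python
def estimate_atoms(f_macro, volume, v_atom=19.0):
    """Rough number of non-H atoms: n = f * V / Vo (volumes in A^3)."""
    return f_macro * volume / v_atom


def n_peak_points(f_macro, volume, v_atom=19.0):
    """Number of highest grid points to keep: 3n/4, rounded to an integer."""
    return int(round(0.75 * estimate_atoms(f_macro, volume, v_atom)))
```

For example, with f = 0.5 and V = 76 000 Å³ the estimate is n = 2000 atoms, so the 1500 highest grid points would be used as pseudo-C atoms.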
2.5. Bayesian estimates of map quality
A simple Bayesian approach was used to create estimators of map quality based on one or more of the measures of map quality described in §§2.3 and 2.4. For each measure (e.g. skew), the comparison of maps with the corresponding solved structures yielded a list of values of `true' map correlation (r2MODEL) and the measure of quality (e.g. skew). A two-dimensional histogram was created to represent the joint distribution p(r2MODEL, skew). The distributions were sampled with 30 bins for each variable, with the range of allowed values of each ranging from −0.1 to 1.1. Any values obtained outside this range were put in the closest available bin. To compensate for the fact that insufficient data (1359 maps) were present to generate an accurate value for all 900 bins, the values of p(r2MODEL, skew) were smoothed using a Gaussian smoothing algorithm in which p(r2MODEL, skew) was convolved with a Gaussian function G(r) with a radius (σ) of three bins {G(r) ∝ exp[−(u² + v²)/(2σ²)]}, reducing the effective number of bins to about 100.
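The construction of the smoothed joint histogram can be sketched with NumPy. The binning (30 bins from −0.1 to 1.1, out-of-range values placed in the closest bin) and the Gaussian radius of three bins follow the text; the kernel truncation at 3σ, the zero padding at the edges and the final renormalization are assumptions of this sketch, as are the function names.

```python
import numpy as np

N_BINS, LO, HI = 30, -0.1, 1.1


def to_bin(values):
    """Map values to bin indices, clamping out-of-range values."""
    scaled = (np.asarray(values, dtype=float) - LO) / (HI - LO) * N_BINS
    return np.clip(np.floor(scaled).astype(int), 0, N_BINS - 1)


def joint_histogram(r_model, measure, sigma_bins=3.0):
    """Smoothed, normalized estimate of p(r2MODEL, measure).

    Rows index r2MODEL bins; columns index bins of the quality measure.
    """
    hist = np.zeros((N_BINS, N_BINS))
    np.add.at(hist, (to_bin(r_model), to_bin(measure)), 1.0)
    # Separable Gaussian kernel, truncated at 3 sigma (sketch assumption).
    half = int(3 * sigma_bins)
    x = np.arange(-half, half + 1)
    g = np.exp(-(x ** 2) / (2.0 * sigma_bins ** 2))
    g /= g.sum()

    def smooth_1d(v):
        return np.convolve(v, g, mode="same")

    smooth = np.apply_along_axis(smooth_1d, 0, hist)
    smooth = np.apply_along_axis(smooth_1d, 1, smooth)
    return smooth / smooth.sum()
```

Because a two-dimensional isotropic Gaussian factorizes, two one-dimensional passes are equivalent to the single two-dimensional convolution described in the text.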
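To estimate the value of map quality (r2MODEL) from a new observation of the quality measure (skew), Bayes' rule (Hamilton, 1964) was used,
$$p(r^{2}_{\mathrm{MODEL}} \mid \mathrm{skew}) = \frac{p_{o}(r^{2}_{\mathrm{MODEL}})\; p(\mathrm{skew} \mid r^{2}_{\mathrm{MODEL}})}{A}, \qquad (7a)$$
where the normalization factor A assures that the integrated probability for r2MODEL is unity and is given by
$$A = \int p_{o}(r^{2}_{\mathrm{MODEL}})\; p(\mathrm{skew} \mid r^{2}_{\mathrm{MODEL}})\; \mathrm{d}r^{2}_{\mathrm{MODEL}}. \qquad (7b)$$
Equation (7a) says that the (posterior) probability of a particular value of r2MODEL, given the measurement skew, is the prior probability of r2MODEL [po(r2MODEL)] multiplied by the conditional probability [p(skew|r2MODEL)] of measuring this value of skew given that r2MODEL is the correct value, divided by a normalization factor. We calculated the conditional probability p(skew|r2MODEL) in (7a) from the joint probability distribution p(r2MODEL, skew) using the relation
$$p(\mathrm{skew} \mid r^{2}_{\mathrm{MODEL}}) = \frac{p(r^{2}_{\mathrm{MODEL}}, \mathrm{skew})}{\int p(r^{2}_{\mathrm{MODEL}}, \mathrm{skew})\; \mathrm{d}\,\mathrm{skew}}. \qquad (7c)$$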
For the present work, we assume that the prior probability distribution po(r2MODEL) is uniform on [0, 1].
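On a binned joint distribution, Bayes' rule reduces to a few array operations. The sketch below assumes the joint histogram is stored with r2MODEL bins as rows and measure bins as columns; the function names and the treatment of bins lying outside [0, 1] are illustrative choices, not the PHENIX implementation.

```python
import numpy as np


def uniform_prior(n_bins=30, lo=-0.1, hi=1.1):
    """Prior that is flat for bin centres in [0, 1] and zero elsewhere."""
    centers = lo + (np.arange(n_bins) + 0.5) * (hi - lo) / n_bins
    inside = (centers >= -1e-9) & (centers <= 1.0 + 1e-9)
    return inside.astype(float)


def posterior(joint, measure_bin, prior):
    """p(r2MODEL | measure) for one observed bin of the quality measure."""
    joint = np.asarray(joint, dtype=float)
    # Conditional p(measure | r2MODEL): normalize each r2MODEL row.
    row_sums = joint.sum(axis=1, keepdims=True)
    cond = np.divide(joint, row_sums, out=np.zeros_like(joint), where=row_sums > 0)
    unnormalized = np.asarray(prior, dtype=float) * cond[:, measure_bin]
    total = unnormalized.sum()  # plays the role of the factor A
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total
```

For a 2 × 2 joint distribution [[0.4, 0.1], [0.1, 0.4]] with a flat prior, observing the second measure bin gives posterior probabilities of 0.2 and 0.8 for the two r2MODEL bins.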
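If several measures of map quality (e.g. skew and the contrast c) have been measured, then the estimates can be combined using the same approach:
$$p(r^{2}_{\mathrm{MODEL}} \mid \mathrm{skew}, c) = \frac{p_{o}(r^{2}_{\mathrm{MODEL}})\; p(\mathrm{skew}, c \mid r^{2}_{\mathrm{MODEL}})}{A}, \qquad (8a)$$
where A is again the normalization factor,
$$A = \int p_{o}(r^{2}_{\mathrm{MODEL}})\; p(\mathrm{skew}, c \mid r^{2}_{\mathrm{MODEL}})\; \mathrm{d}r^{2}_{\mathrm{MODEL}}. \qquad (8b)$$
We approximate the probability distribution p(skew, c|r2MODEL) as the product of the two two-dimensional conditional probabilities that we have estimated above,
$$p(\mathrm{skew}, c \mid r^{2}_{\mathrm{MODEL}}) \simeq p(\mathrm{skew} \mid r^{2}_{\mathrm{MODEL}})\; p(c \mid r^{2}_{\mathrm{MODEL}}), \qquad (9)$$
which amounts to assuming that the skewness and contrast c are conditionally independent for a given fixed r2MODEL value.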
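To obtain the estimated value and variance of r2MODEL given a set of observations of predictor variables (e.g. skew, c), we used the probability distribution given by (8a) and calculated the mean 〈r2MODEL〉 and the variance σ² about it,
$$\langle r^{2}_{\mathrm{MODEL}} \rangle = \int r^{2}_{\mathrm{MODEL}}\; p(r^{2}_{\mathrm{MODEL}} \mid \mathrm{skew}, c)\; \mathrm{d}r^{2}_{\mathrm{MODEL}},$$
$$\sigma^{2} = \int \left(r^{2}_{\mathrm{MODEL}} - \langle r^{2}_{\mathrm{MODEL}} \rangle\right)^{2} p(r^{2}_{\mathrm{MODEL}} \mid \mathrm{skew}, c)\; \mathrm{d}r^{2}_{\mathrm{MODEL}}.$$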
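Combining several measures and summarizing the result can be sketched on binned distributions as follows. The inputs are assumptions of this sketch: each likelihood is an array over r2MODEL bins holding p(measure|r2MODEL) for the observed value of one measure, and `centers` holds the r2MODEL value at the centre of each bin. The product over likelihoods is the conditional-independence approximation described above.

```python
import numpy as np


def combine(likelihoods, prior):
    """Posterior over r2MODEL bins from several independent measures."""
    post = np.array(prior, dtype=float)
    for lk in likelihoods:
        post = post * np.asarray(lk, dtype=float)  # independence assumption
    return post / post.sum()


def mean_and_variance(post, centers):
    """Estimated r2MODEL and its variance from a binned posterior."""
    centers = np.asarray(centers, dtype=float)
    mean = float(np.sum(post * centers))
    var = float(np.sum(post * (centers - mean) ** 2))
    return mean, var
```

Two measures that agree sharpen the posterior: with a flat prior and two identical likelihoods [0.2, 0.8], the probability of the second bin rises from 0.8 to about 0.94.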
An improved estimate of the conditional probability distributions such as p(skew, c|r2MODEL) could potentially be obtained by calculating the covariance of the variables skew and c for each fixed value of r2MODEL and assuming a normal distribution of skew and c for this fixed value of r2MODEL. This formulation differs from that in (9) by including correlations between skew and c instead of assuming that they are zero and also through the assumption of normality in the distributions of skew and c for fixed r2MODEL. Leaving out the fixed value of r2MODEL for clarity, representing the two-dimensional vector (skew, c) as x = (skew, c) and the mean values of skew and c for this value of r2MODEL as u = (〈skew〉, 〈c〉), we can write (Hamilton, 1964)
$$p(\mathbf{x}) = \frac{1}{2\pi\,|\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}\,(\mathbf{x} - \mathbf{u})^{T}\,\Sigma^{-1}\,(\mathbf{x} - \mathbf{u})\right], \qquad (10)$$
where Σ is the covariance matrix with elements σij representing the variation of skew and c around their means 〈skew〉 and 〈c〉,
$$\sigma_{ij} = \langle (x_{i} - u_{i})(x_{j} - u_{j}) \rangle.$$
To test this approach we used the data described above, but grouped in bins of r2MODEL. The observations in each bin of r2MODEL were analyzed using (11a)–(11d) based on the values of the N predictor variables (skew, c…) for all the observations in that bin to obtain an approximation of the conditional probability distribution p(skew, c|r2MODEL) for that bin. This set of approximations (one for each bin of r2MODEL) was then used in (8) to estimate r2MODEL for individual sets of observations of the N predictor variables. This approach gave correlations that were at most marginally improved over those obtained using estimates of the conditional probability distribution p(skew, c|r2MODEL) based on (9). For example, using skew and correlation of local r.m.s. density (r2RMS) as predictor variables and analyzing the same data shown in Table 3 (but without cross-validation), the overall correlation coefficient between the true values of r2MODEL and estimates obtained using (9) (in which independence of skew and r2RMS is assumed) was 0.925. Using (10) (assuming Gaussian distributions for skew and r2RMS) and setting the covariance terms to zero (assuming independence of skew and r2RMS) yielded a value of 0.926; the same analysis but including the covariance terms yielded a value of 0.927. As this approach did not significantly improve the correlation, it was not used. Fig. 1(c) suggests that the assumption of normality in the distributions of the predictor variables (e.g. skew and r2RMS) for fixed r2MODEL is not well justified. This may partially explain the poor performance of this approach.
2.6. Structures and data used
Data from 48 structures in the PHENIX library of MAD, SAD and MIR data sets were used along with 198 MAD and SAD structures from the Joint Center for Structural Genomics (JCSG; https://www.jcsg.org ). The structures from the PHENIX library included 1029B (PDB code 1n0e; Chen et al., 2004), 1038B (1lql; Choi et al., 2003), 1063B (1lfp; Shin et al., 2002), 1071B (1nf2; Shin, Roberts et al., 2003), 1102B (1l2f; Shin, Nguyen et al., 2003), 1167B (1s12; Shin et al., 2005), aep-transaminase (1m32; Chen et al., 2002), armadillo (3bct; Huber et al., 1997), calmodulin (1exr; Wilson & Brunger, 2000), cobd (1kus; Cheong et al., 2002), cp-synthase (1l1e; Huang et al., 2002), cyanase (1dw9; Walsh et al., 2000), epsin (1edu; Hyman et al., 2000), flr (1bkj; Tanner et al., 1996), fusion-complex (1sfc; Sutton et al., 1998), gene-5 (1vqb; Skinner et al., 1994), gere (1fse; Ducros et al., 2001), gpatase (1ecf; Muchmore et al., 1998), granulocyte (2gmf; Rozwarski et al., 1996), groEL (1oel; Braig et al., 1995), group2-intron (1kxk; Zhang & Doudna, 2002), hn-rnp (1ha1; Shamoo et al., 1997), ic-lyase (1f61; Sharma et al., 2000), insulin (2bn3; Nanao et al., 2005), lysozyme (unpublished results; CSHL Macromolecular Crystallography Course), mbp (1ytt; Burling et al., 1996), mev-kinase (1kkh; Yang et al., 2002), myoglobin (A. Gonzales, personal communication), nsf-d2 (1nsf; Yu et al., 1998), nsf-n (1qcs; Yu et al., 1999), p32 (1p32; Jiang et al., 1999), p9 (1bkb; Peat et al., 1998), pdz (1kwa; Daniels et al., 1998), penicillopepsin (3app; James & Sielecki, 1983), psd-95 (1jxm; Tavares et al., 2001), qaprtase (1qpo; Sharma et al., 1998), rab3a (1zbd; Ostermeier & Brunger, 1999), rh-dehalogenase (1bn7; Newman et al., 1999), rnase-p (1nz0; Kazantsev et al., 2003), rnase-s (1rge; Sevcik et al., 1996), rop (1f4n; Willis et al., 2000), s-hydrolase (1a7a; Turner et al., 1998), sec17 (1qqe; Rice & Brunger, 1999), synapsin (1auv; Esser et al., 1998), synaptotagmin (1dqv; Sutton et al., 1999), tryparedoxin (1qk8; Alphey et al., 1999), ut-synthase (1e8c; Gordon et al., 2001) and vmp (1l8w; Eicken et al., 2002).
The structures from the JCSG included PDB (Bernstein et al., 1977; Berman et al., 2000) entries 1o1x (Xu et al., 2004), 1vjf, 1vjr, 1vk4, 1vk8, 1vk9, 1vkd, 1vkn, 1vl0, 1vl5, 1vli, 1vlo, 1vly, 1vm8, 1vmg, 1vmi, 1vp8, 1vpm, 1vpz (Rife et al., 2005), 1vqr (Xu, Schwarzenbacher, McMullan et al., 2006), 1vqs, 1vqy, 1vqz, 1vr0 (DiDonato et al., 2006), 1vr3 (Xu, Schwarzenbacher, Krishna et al., 2006), 1vr5, 1vr8 (Xu, Krishna et al., 2006), 1vrm (Han et al., 2006), 1z82, 1z85, 1zbt, 1zej, 1zh8, 1zko, 1ztc, 1zx8 (Jin et al., 2006), 1zy9, 1zyb, 2a3n, 2aam, 2aml, 2ax3, 2b8n (Schwarzenbacher et al., 2006), 2etd, 2ets, 2evr, 2f4i, 2f4l, 2fg0, 2fg9, 2fna, 2ftr, 2fup, 2fur, 2g0w, 2gb5, 2gc9, 2gf6, 2gfg, 2ghr (Zubieta et al., 2007), 2gno, 2go7, 2gpi, 2gpj, 2grj, 2gvh, 2h1q, 2h1t, 2h9f, 2hcf, 2hh6, 2hhz, 2hi0, 2hq7, 2hq9, 2hr2, 2hsz, 2huh, 2hx1, 2hx5, 2hxv, 2i02, 2i8d, 2i9w, 2ig6, 2ii1, 2ilb, 2isb, 2it9, 2itb, 2nuj, 2o08, 2o2g, 2o2x, 2o2z, 2o3l, 2o62, 2oa2, 2oaf, 2oc6, 2od5, 2ogi, 2oh1, 2oh3, 2oik, 2ooj, 2ook, 2op5, 2opl, 2oqm, 2ord, 2osd, 2otm, 2ou3, 2ou5, 2ou6, 2own, 2oyo, 2ozg, 2ozj, 2p10, 2p1a, 2p7i, 2p8j, 2pbl, 2peb, 2pfw, 2pg4, 2pgc, 2pke, 2pn1, 2pq7, 2pr7, 2prr, 2prv, 2pv4, 2pv7, 2pwn, 2py6, 2pyq, 2pyx, 2q02, 2q04, 2q0t, 2q14, 2q3l, 2q78, 2q7x, 2q9k, 2q9r, 2qe6, 2qe9, 2qez, 2qg3, 2qhp, 2qj8, 2ql8, 2qml, 2qpx, 2qr6, 2qtp, 2qtq, 2qw5, 2qww, 2qwz, 2qyv, 2r01, 2r0x, 2r1i, 2r3b, 2r44, 2r4i, 2r9v, 2ra9, 2ras, 2rcc, 2rcd, 2rd9, 2rdc, 2re3, 2re7, 2rfp, 2rgq, 2rha, 2rhm, 2rij, 2ril, 2rkh, 3b5e, 3b5o, 3b77, 3b7f, 3b81, 3b8l, 3bb5, 3bb9, 3bcw, 3bdd and 3bde.
3. Results and discussion
3.1. Measures of map quality
A key goal of this work was to identify one or more quality measures of maps or of structure factors that are simple to calculate and that can yield accurate estimates of the qualities of the corresponding electron-density maps. Table 1 lists six measures of map quality examined here that are based on the features of the maps (real-space measures) and Table 2 lists four additional measures that depend on the structure factors and phases used to calculate maps. The measures were chosen to represent a range of possible measures that cover many important features of electron-density maps and structure factors.
To evaluate these measures of map quality, we carried out a re-analysis of data for 246 previously solved MAD, SAD and MIR structures, creating electron-density maps during the structure-determination process and analyzing them with each of the measures in Tables 1 and 2. As the structures are all known, the `true' map quality for each map could be calculated as the correlation coefficient r2MODEL between each map and the corresponding map obtained using phases calculated from the refined model of the structure (after any necessary origin shifts are applied) using the PHENIX tool phenix.get_cc_mtz_mtz.
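As an illustration of the quantity involved, the following sketch computes a correlation coefficient between two maps. It assumes that both maps are already sampled on the same grid with any origin shift applied; it is not the phenix.get_cc_mtz_mtz tool, which works from structure-factor files, and the function name map_correlation is hypothetical.

```python
import numpy as np

def map_correlation(map_a, map_b):
    """Pearson correlation coefficient between two maps on the same grid."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    a = a - a.mean()                   # remove the mean density of each map
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

# Small synthetic grid: a map and a rescaled, offset copy of it.
demo_map = np.arange(24.0).reshape(2, 3, 4) ** 2
cc_same = map_correlation(demo_map, 2.0 * demo_map + 3.0)
cc_inverted = map_correlation(demo_map, -demo_map)
```

Because the means are subtracted and the result is normalized, the value does not depend on the overall scale or offset of either map.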
For each of the 246 data sets, the PHENIX AutoSol wizard was used to scale the data, calculate anomalous or isomorphous differences and identify potential heavy-atom solutions. As both hands of the heavy-atom substructure would normally be considered, at least two sets of heavy-atom solutions were generally obtained for each data set. Additionally, as MIR and MAD data sets have more than one set of anomalous or isomorphous differences, these data sets generally yielded additional heavy-atom solutions. Also for MIR and MAD structure determinations, difference Fourier analysis was used to generate even more heavy-atom solutions. Consequently, there were a total of 1359 heavy-atom solutions analyzed in this work even though there were only 246 data sets.
Figs. 1(a)–1(j) show the values of each measure plotted against r2MODEL for 1359 maps based on structures calculated from the MAD, SAD and MIR data listed in §2.6. The maps represent the phases obtained at several stages in structure solution. Some were calculated using heavy-atom solutions found from anomalous or isomorphous differences or from FA values with HySS (Grosse-Kunstleve & Adams, 2003). Others were calculated using the corresponding substructures with inverted hand. Others were obtained from difference Fourier (MIR) and anomalous difference Fourier (MAD) analyses. In the case of MIR, a large number of additional solutions were obtained by combinations of partial solutions from different derivatives.
The general features of the plots in Fig. 1 are illustrated by a discussion of Fig. 1(a), which shows the skewness of electron density (skew) in experimental maps as a function of the true map quality r2MODEL. In Fig. 1(a) the purple squares correspond to data sets with a nominal resolution lower than 2 Å and the black diamonds to data sets with resolutions of 2 Å or higher. (Note that the data for all these calculations were truncated at a resolution of 2.5 Å, so that most resolution-dependent differences are likely to be the consequence of data-set-dependent decreases of intensities with resolution rather than the resolution of the data.)
Fig. 1(a) shows that the skewness of the electron density depends strongly on the map quality, as represented by the correlation of the density in the map with that of a model map (r2MODEL). The skewness is approximately zero for maps with a correlation in the range 0.0 < r2MODEL < 0.2. It increases slightly for maps with correlations in the range 0.2 < r2MODEL < 0.4 and then increases substantially for maps with higher correlations (r2MODEL > 0.4). The standard deviation of the values of the skewness is about 0.05–0.10 over most ranges of map correlation. For example, for values of map correlation with r2MODEL < 0.2 the mean skewness is −0.02 and the standard deviation is 0.07 and for values of map correlation with 0.4 < r2MODEL < 0.5 the mean skewness is 0.14 with a standard deviation of 0.06. For values of map correlation with 0.6 < r2MODEL < 0.7 the mean skewness is 0.38 with a standard deviation of 0.10. Another way to view these relationships is to note that the difference (0.16) in the mean values of the skewness between values of map correlation of r2MODEL < 0.2 and values of map correlation in the range 0.4 < r2MODEL < 0.5 is about twice the standard deviation of the skewness in either range. This means that the skewness can be expected to differentiate between maps with model correlations r2MODEL of zero and 0.4, but that it cannot differentiate them correctly all of the time. This can also be seen directly from Fig. 1(a), in which some of the values of skewness for maps with model correlations r2MODEL near 0.4 are lower than values for maps with near-zero values of r2MODEL.
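The arithmetic behind this comparison can be written out explicitly. The sketch below uses the means and standard deviations quoted above; the final step, which converts the separation into a probability of mis-ordering two maps, additionally assumes that the skewness is normally distributed within each range of r2MODEL, an assumption made here only for illustration.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mean_low, sd_low = -0.02, 0.07   # maps with r2MODEL < 0.2
mean_mid, sd_mid = 0.14, 0.06    # maps with 0.4 < r2MODEL < 0.5

difference = mean_mid - mean_low         # difference in mean skewness
ratio_low = difference / sd_low          # separation in units of sd_low
ratio_mid = difference / sd_mid          # separation in units of sd_mid

# Probability that a moderate-quality map has a lower skewness than a
# low-quality map, if both skewness values were normally distributed.
p_misordered = normal_cdf(-difference / sqrt(sd_low**2 + sd_mid**2))
```

Under these assumptions the two groups are separated by a little more than two standard deviations, and a few percent of such pairs of maps would be ranked in the wrong order by skewness alone.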
The maps represented in Fig. 1(a) that are based on high-resolution data sets (<2 Å) have values of skewness that are similar to those of lower resolution data sets. This similarity is most likely to reflect the fact that all the data in these calculations were truncated at a resolution of 2.5 Å.
Several of the other nine measures of map quality examined have relationships to model map correlation similar to those described above for the skewness. The contrast (c; Fig. 1b), correlation of local r.m.s. density (r2RMS; Fig. 1c) and flatness of the solvent region (F; Fig. 1d) in particular show very similar behavior, except that none of these discriminates as well as the skewness between maps of moderate quality (correlations r2MODEL near 0.4) and those of very low quality with correlations near zero. These three measures are related, as they are all based on the presence of solvent and nonsolvent regions in the crystal. However, the calculations differ in that the contrast (c) does not require knowledge of the solvent boundary while the flatness (F) does. Additionally, the correlation of local r.m.s. density reflects the contiguous nature of the solvent region while the contrast (c) and flatness (F) reflect the presence of a solvent region, whether contiguous or not.
A somewhat different behavior is shown by the number of contiguous regions (Nr) required to enclose the highest 5% of density in a map (Fig. 1e). This measure decreases with increasing map quality, but only slightly, so that it is not a strong discriminator between maps of low and moderate quality.
The overlap of NCS-related density (Fig. 1f) is a measure which, as implemented here, only applies to maps where NCS can be identified from the symmetry present in the heavy-atom sites. It is therefore different from the measures discussed so far and cannot be used as a general measure of map quality. It is nevertheless useful in differentiating between maps with very high model map correlations (r2MODEL) and those that have lower model map correlation.
Figs. 1(g) and 1(h) show the phase correlations (mDENMOD) and R factors (RDENMOD) obtained from the first cycle of statistical density modification using the same structure factors, phases and weights that were used to calculate the electron-density maps analyzed in Figs. 1(a)–1(f). In the first cycle of statistical density modification with RESOLVE (Terwilliger, 2000), estimates of the phase and amplitude of a reflection k were obtained using only information from all the other reflections in the data set. The amplitude and phase for reflection k from the density-modification procedure can then be compared with the experimentally observed amplitude and the `experimental' phase (derived using isomorphous or anomalous differences) to yield an R factor for density modification (RDENMOD) and a mean cosine of the phase difference (mDENMOD). Fig. 1(g) shows that, as expected, the R factor for density modification decreases with increasing map quality, while Fig. 1(h) shows that the phase correlation increases over the same range.
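A minimal sketch of these two statistics is given below. It assumes unweighted sums, phases in radians and a single overall scale factor between the two sets of amplitudes; the weighting and scaling actually used in RESOLVE are not described here, and the function name density_modification_stats is hypothetical.

```python
import numpy as np

def density_modification_stats(f_obs, phi_exp, f_dm, phi_dm):
    """Return (R factor, mean cosine of phase difference) for one cycle.

    f_obs, phi_exp: observed amplitudes and experimental phases (radians).
    f_dm, phi_dm:   amplitudes and phases from density modification.
    """
    f_obs = np.asarray(f_obs, dtype=float)
    f_dm = np.asarray(f_dm, dtype=float)
    # Assumption: one overall scale factor brings f_dm onto the scale of f_obs.
    scale = f_obs.sum() / f_dm.sum()
    r_factor = np.abs(f_obs - scale * f_dm).sum() / f_obs.sum()
    delta_phi = np.asarray(phi_dm, dtype=float) - np.asarray(phi_exp, dtype=float)
    return float(r_factor), float(np.cos(delta_phi).mean())

f_demo = [10.0, 20.0, 30.0]
phi_demo = [0.1, 1.0, 2.0]
r_perfect, m_perfect = density_modification_stats(f_demo, phi_demo, f_demo, phi_demo)
r_shifted, m_shifted = density_modification_stats(
    f_demo, phi_demo, [5.0, 10.0, 15.0], [p + np.pi for p in phi_demo])
```

Perfect agreement gives an R factor of zero and a mean cosine of one, while phases that are all wrong by π give a mean cosine of minus one even when the amplitudes agree after scaling.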
Fig. 1(i) shows that the correlation of pseudo-maps calculated using dummy atoms placed at the highest peaks in a map with their corresponding original maps (r2TRUNCATION) is weakly related to the quality of the map. It seems possible that more sophisticated methods of map skeletonization (Baker et al., 1993) might be more useful in map evaluation than our simple measure.
Finally, Fig. 1(j) shows that the mean figure of merit of phasing (〈m〉) is related to the quality of the map, but that there are many maps with very low correlation to the corresponding model maps that nevertheless have high mean figures of merit. This relationship can be understood by considering that the figures of merit of phasing of two maps that are calculated using the same data but opposite enantiomers of the heavy-atom substructure are normally identical for SAD phasing if all the anomalous scatterers are of the same type. Typically, one of these maps may have a high correlation to the model map while the other may have a very low correlation.
Overall, Fig. 1 shows that several measures of map quality based on different features of the map and on the structure factors and phases leading to the map have strong relationships to the quality of the electron-density map, with the skewness of electron density clearly being one of the best indicators of map quality.
In Fig. 1(a) there is one point at (r2MODEL = 0.03, skew = 0.31) that is quite far from all the others, with a value of the skewness that is far greater than that of all the other points with very small values of r2MODEL. This point corresponds to a heavy-atom solution found during the analysis of data from PDB entry 2re3 which yields an electron-density map that is incorrect but not at all random. The crystal has translational noncrystallographic symmetry and the electron density in the electron-density map for this solution is offset from that of the correct map by an origin shift that is noncrystallographic. Consequently, our analysis of the two maps, which only allows crystallographic translational offsets, shows a near-zero correlation of the maps despite considerable similarity (a correlation of 0.73 when offset). We note that the translation involved does correspond to a real difference: if the coordinates of PDB entry 2re3 are shifted by this translation (0, 0.735, 0) in P43212 the amplitudes of the structure factors do change and the R factor based on experimental amplitudes for the model in this position is 0.53, compared with a value of 0.23 for the deposited model. Note that this solution also appears in Fig. 2(a) at the position (0.60, 0.03) and in Fig. 2(b) at the position (0.62, 0.03), where it is again an outlier.
3.2. Estimation of map quality using features of the map and of the structure factors used to calculate the map
Fig. 1 showed that each of the six different features of electron-density maps and the four characteristics of structure factors we examined depend in some way on the quality of the corresponding map. We used the Bayesian approach described in §2.5 to use this information to estimate map quality from these ten features. The general idea of this approach is very simple. Imagine that a particular map has been examined, yielding a value of the skewness of electron density of 0.20. Considering the plot in Fig. 1(a), it is reasonable to conclude that this map is very likely to have a correlation (r2MODEL) with the corresponding model map in the range 0.4 < r2MODEL < 0.6, because nearly all examples in Fig. 1(a) with a skewness of about 0.20 are in this range. Equation (7a) is a mathematical way to make this statement. Equation (8a) is a similar statement, except that it includes more than one measure of map quality. As described in §2.5, we assume here that the various measures of map quality (skewness, contrast etc.) are independent. This allows the simple calculation in (8a) to be used to estimate r2MODEL from several measures of map quality.
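The idea can be sketched in a few lines of code. The example below assumes a binned (histogram) representation of the probability distributions, add-one smoothing of the counts and the posterior mean as the estimate; these are illustrative choices, the training data are synthetic and the function names train and estimate are hypothetical. Multiplying the per-measure tables together is where the independence assumption enters.

```python
import numpy as np

def _bin(values, edges):
    """Index of the bin containing each value, clipped to the valid range."""
    return np.clip(np.digitize(values, edges) - 1, 0, len(edges) - 2)

def train(r, measures, r_edges, m_edges):
    """Tabulate p(r bin) and p(measure bin | r bin) from solved cases.

    r: (n,) true map qualities; measures: (n, k) values of k measures;
    m_edges: list of k arrays of bin edges, one per measure.
    """
    n_r = len(r_edges) - 1
    r_bin = _bin(r, r_edges)
    prior = np.bincount(r_bin, minlength=n_r) + 1.0     # add-one smoothing
    prior /= prior.sum()
    tables = []
    for j, edges in enumerate(m_edges):
        counts = np.ones((n_r, len(edges) - 1))         # add-one smoothing
        np.add.at(counts, (r_bin, _bin(measures[:, j], edges)), 1.0)
        tables.append(counts / counts.sum(axis=1, keepdims=True))
    return prior, tables

def estimate(x, prior, tables, r_edges, m_edges):
    """Posterior mean and standard deviation of r given the measure values x."""
    post = prior.copy()
    for xj, table, edges in zip(x, tables, m_edges):
        post = post * table[:, int(_bin(xj, edges))]    # independence assumed
    post /= post.sum()
    centers = 0.5 * (r_edges[:-1] + r_edges[1:])
    mean = float(post @ centers)
    sd = float(np.sqrt(post @ (centers - mean) ** 2))
    return mean, sd

# Synthetic training set in which skewness rises once r exceeds about 0.2.
rng = np.random.default_rng(1)
r_true = rng.uniform(0.0, 0.9, 2000)
skew = 0.8 * np.clip(r_true - 0.2, 0.0, None) + rng.normal(0.0, 0.05, r_true.size)
r_edges = np.linspace(0.0, 1.0, 11)
m_edges = [np.linspace(-0.3, 0.8, 23)]
prior, tables = train(r_true, skew[:, None], r_edges, m_edges)

lo_est, lo_sd = estimate([0.0], prior, tables, r_edges, m_edges)
hi_est, hi_sd = estimate([0.4], prior, tables, r_edges, m_edges)
```

The standard deviation of the posterior distribution gives an uncertainty estimate that can be compared with the actual prediction error.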
Fig. 2(a) shows the results of using (7a) to estimate r2MODEL from the skewness of electron density. In Fig. 2(a) the abscissa is the Bayesian estimate of r2MODEL using the skewness of electron density and the ordinate is the true value of r2MODEL. To ensure that the parameters in the Bayesian estimator did not contain information on the specific cases being tested, a cross-validation procedure was used in which all solutions for the structure being examined were excluded when constructing the Bayesian estimators. Fig. 2(a) shows that in cases where the true value of r2MODEL is in the range 0.0 < r2MODEL < 0.2, the estimates of r2MODEL all have very similar values of about 0.1. This can be understood from Fig. 1(a), in which the skewness is seen to be insensitive to values of r2MODEL in this range. The Bayesian estimates of r2MODEL for low values of skewness are all close to the midpoint of this range, as they are simply the average of plausible values of r2MODEL given the observation of the value of the skewness. For higher values of r2MODEL, the estimates of r2MODEL are closer to the true values. Overall, the correlation coefficient between the Bayesian estimates and the true values of r2MODEL is 0.90 and the r.m.s. error in prediction of r2MODEL is 0.10. As a check on our procedures, we note that the mean uncertainty estimate for r2MODEL obtained from the Bayesian procedure was 0.11, which is quite similar to the actual r.m.s. error in prediction of r2MODEL of 0.10.
Table 3 summarizes the accuracy of the Bayesian estimates of map quality based on each of the measures described in Tables 1 and 2 (with the exception of the overlap of NCS density, which is not included because it does not apply to most of the maps in our tests). For each measure, Table 3 lists the values of the correlation coefficient between the Bayesian estimates and the true map quality (r2MODEL) along with the r.m.s. prediction error in r2MODEL. Overall, the skewness of electron density, with a correlation coefficient between Bayesian estimates and true values of r2MODEL of 0.90, is the most reliable indicator of map quality, with the correlation of local r.m.s. density being the next best (correlation of 0.85) and with contrast, flatness of solvent region and density-modification phase correlations and R factor giving only slightly poorer predictions of r2MODEL, with correlations in the range 0.75–0.80.
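The cross-validation scheme, in which every solution belonging to the structure under test is withheld, can be sketched as follows. The fit and predict arguments are hypothetical placeholders for any estimator (a straight-line fit stands in for the Bayesian estimator in the usage lines), and the accuracy function returns the two summary statistics quoted in the text.

```python
import numpy as np

def leave_one_structure_out(structure_ids):
    """Yield (train_mask, test_mask), holding out all solutions of one structure."""
    ids = np.asarray(structure_ids)
    for s in np.unique(ids):
        test = ids == s
        yield ~test, test

def cross_validated_estimates(structure_ids, x, r, fit, predict):
    """Estimate r for each solution from a model trained without its structure."""
    est = np.empty(len(r))
    for train_mask, test_mask in leave_one_structure_out(structure_ids):
        model = fit(x[train_mask], r[train_mask])
        est[test_mask] = predict(model, x[test_mask])
    return est

def accuracy(est, r):
    """Correlation coefficient and r.m.s. error of the estimates."""
    cc = float(np.corrcoef(est, r)[0, 1])
    rms = float(np.sqrt(np.mean((est - r) ** 2)))
    return cc, rms

# Toy data: three structures with two solutions each and an exact linear relation.
ids = np.array(["a", "a", "b", "b", "c", "c"])
x_demo = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
r_demo = 2.0 * x_demo
est_demo = cross_validated_estimates(
    ids, x_demo, r_demo,
    fit=lambda x, r: np.polyfit(x, r, 1),
    predict=lambda model, x: np.polyval(model, x))
cc_demo, rms_demo = accuracy(est_demo, r_demo)
```

Grouping the hold-out by structure rather than by individual solution matters because solutions for the same data set (for example, the two hands) are not independent of one another.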
To identify an optimal combination of measures for estimation of map quality, we began with the best single measure (skew) and used (9) to combine information from each of the other measures. The measure giving the best prediction of r2MODEL in combination with the skewness of electron density was the correlation of local r.m.s. density (r2RMS; Table 3). Fig. 2(b) shows how the estimates of map quality obtained using just the correlation of r.m.s. electron density compare with actual map quality and Fig. 2(c) shows estimates based on both skewness and correlation of r.m.s. electron density. The correlation of r.m.s. density was the next-best single predictor after skew; in addition, the correlation of prediction errors from these two variables was relatively low (0.61; Table 4). The assumptions in (9) are therefore relatively well justified and it is not surprising that the resulting estimator is improved over that using just the skewness of electron density. This process was continued but no further improvement was obtained in the Bayesian estimator. The optimized combination of measures based on skewness and correlation of local r.m.s. density yielded a correlation coefficient between the Bayesian estimates and true values of r2MODEL of 0.92 and an r.m.s. prediction error of 0.09 (Table 3 and Fig. 2c).
3.3. Identification of the hand of heavy-atom substructures using measures of map quality
A particularly important application of measures of map quality is the identification of the hand of heavy-atom substructures. The hand of the heavy-atom substructure cannot normally be identified directly during substructure determination by methods such as the HySS procedure (Grosse-Kunstleve & Adams, 2003) used here. Consequently, some procedure is needed for identifying which hand of the heavy-atom substructure is correct. Figs. 3(a)–3(i) compare the values obtained for nine measures of map quality based on 353 pairs of heavy-atom substructures with correct and inverted handedness from the 186 data sets in this work for which the space group was not chiral (structures with chiral space groups were excluded so the hand of the space group could be fixed in this analysis). The mean figure of merit of phasing is not shown because it is essentially identical for the two hands of the substructure in all the cases examined. The 706 maps represented by these 353 pairs are a subset of the 1359 maps used in the calculations shown in Fig. 1.
It is somewhat remarkable that these nine measures of map quality all give very good discrimination between the correct and incorrect hands of heavy-atom substructures (Fig. 3 and Table 5), even though they are not all so useful in estimating the absolute quality of maps (Table 3). The best discrimination between correct and incorrect hands is obtained with the skewness of electron density (Fig. 3a), as expected from the high correlation of estimates of map quality based on skewness with actual map quality (Table 3). Using the skewness of electron density to make decisions on handedness (Fig. 3a), 98% of decisions (in cases where the quality of the maps for the two hands differs by at least 0.05) would correctly identify the map with the higher quality (Table 5). Note that for SIR or MIR data without anomalous differences none of these techniques can identify the correct hand because the inverse hand of the heavy atoms leads to a map that is inverted in hand but is otherwise identical. A similar argument would partially apply in cases where an anomalous signal is present but is weak. This situation is presumably the cause of the large number of MIR-derived points near the diagonal of the panels in Fig. 3.
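The way such a success rate can be tallied is sketched below. The function name hand_decision_accuracy and the numbers in the usage lines are hypothetical; only the 0.05 threshold on the difference in true map quality is taken from the text.

```python
import numpy as np

def hand_decision_accuracy(score_a, score_b, quality_a, quality_b, min_diff=0.05):
    """Fraction of pairs in which the higher score picks the better map.

    Only pairs whose true map qualities differ by at least min_diff are counted.
    Returns (fraction correct, number of pairs counted).
    """
    ds = np.asarray(score_a, dtype=float) - np.asarray(score_b, dtype=float)
    dq = np.asarray(quality_a, dtype=float) - np.asarray(quality_b, dtype=float)
    decisive = np.abs(dq) >= min_diff
    if not decisive.any():
        return float("nan"), 0
    correct = np.sign(ds[decisive]) == np.sign(dq[decisive])
    return float(correct.mean()), int(decisive.sum())

# Four hypothetical pairs of hands: skewness as score, r2MODEL as true quality.
frac_correct, n_counted = hand_decision_accuracy(
    score_a=[0.30, 0.10, 0.20, 0.05], score_b=[0.00, 0.00, 0.25, 0.04],
    quality_a=[0.60, 0.50, 0.50, 0.30], quality_b=[0.10, 0.10, 0.10, 0.28])
```

In this toy example the last pair is excluded because the two maps differ in quality by less than 0.05, and two of the remaining three decisions are correct.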
3.4. Identification of the highest quality density-modified map for a structure
The scoring procedures described above are based on an analysis of the phases and structure-factor amplitudes corresponding to an experimental electron-density map. Prior to final map interpretation, however, the experimentally determined phases of structure factors are normally optimized by density modification (Wang, 1985). Several additional parameters are required for density modification, including identification of noncrystallographic symmetry (if any), solvent content and the solvent region. It seemed possible that these parameters might not always be chosen optimally and the best experimental maps might not always lead to the best density-modified maps. Consequently, some additional method of scoring the density-modified maps might be useful.
To investigate this possibility, we carried out structure solution with the data sets used in Fig. 1, this time with default parameters in the PHENIX AutoSol wizard including Bayesian estimates of experimental map quality based on the skewness of electron density (skew) and the correlation of local r.m.s. density (r2RMS). For each structure, the final steps were to carry out density modification with RESOLVE (Terwilliger, 2000) on the top-ranked solution or solutions and then to build a preliminary atomic model. In cases where there was one solution that was much better than all others (see §2), then only that solution was used in density modification. However, in most cases there were multiple solutions with similar Bayesian estimates of quality and up to three (MAD, SAD) or six (MIR) of these were used in density modification.
Fig. 4(a) shows the relationship between the qualities of experimental maps and the qualities of the corresponding density-modified maps for 569 experimental maps for 246 data sets. For experimental maps of high quality (correlation with model map over 0.6), the quality of the density-modified map is generally (but not always) very high, typically ranging from 0.75 to 0.90. For very poor experimental maps (correlation with model map of less than 0.2) the density-modified maps were also uniformly poor (typical map correlation of 0–0.1). On the other hand, for experimental maps of moderate quality (map correlation between 0.2 and 0.5) the quality of the density-modified maps varies over a wide range (from about 0.1 to about 0.9).
Much of the variability in density modification for experimental maps of moderate quality illustrated in Fig. 4(a) could arise from the intrinsic differences in solvent content, noncrystallographic symmetry, type of experiment and resolution between the different structures. To examine this, we have plotted in Fig. 4(b) the true map qualities of density-modified maps for all 176 pairs of solutions from Fig. 4(a) that are from the same structure, use the same number of noncrystallographic symmetry operators (if any) in density modification and have values of true experimental map correlation within 0.05 of each other. In Fig. 4(b) each point corresponds to one pair of solutions. The abscissa is the value of density-modified map quality for the solution with the higher value of experimental map quality and the ordinate is the density-modified map quality for the solution with lower experimental map quality. Each member of such a pair has identical solvent content, resolution, actual number of noncrystallographic symmetry operators identified and experiment type and differs only slightly in true experimental map quality. Fig. 4(b) shows that when all these factors are controlled the quality of pairs of density-modified maps is very similar in most cases, but substantial differences in the qualities of the density-modified maps remain in some cases.
The remaining variation in effects of density modification illustrated in Fig. 4(b) suggests that it might be useful to carry out a final ranking of solutions based on a measure of quality of the corresponding density-modified maps. We used the map–model correlation between density-modified maps and the preliminary atomic models built with the PHENIX AutoSol wizard as such a measure of quality. Table 6 shows the utility of this map–model correlation in identifying the solution with the best density-modified map for each of the 149 structures used in Fig. 4(a) in which there was more than one solution tested by density modification and model building and in which the model-building process yielded a model with a model–map correlation of at least 0.20.
The first row in Table 6 provides a background for this analysis by considering the use of our Bayesian estimates of experimental map quality to identify the best solutions. In Table 6 experimental map quality and density-modified map quality are examined separately. Using the Bayesian estimates (which are based on the experimental maps), the best experimental map for a particular structure could be identified 91% of the time. The worst error in identification of the best experimental map corresponded to a difference in map correlation of 0.29. Next, density-modified maps were examined. The solution with the highest Bayesian estimate of experimental map quality led to the best density-modified map in 88% of cases; however, the worst error in identification of the best density-modified map corresponded to a very large difference in map correlation of 0.58.
Using the map–model correlation for the model built into the density-modified maps in decision-making the situation is reversed, with the best experimental map identified only 87% of the time and the best density-modified map identified 92% of the time. Further, the density-modified map yielding the highest map–model correlation was never worse than the very best density-modified map obtained by more than a difference in correlation of 0.26, showing that the model–map correlation is a useful criterion for final ranking of solutions. Overall, Table 6 indicates that model–map correlation is an improvement over Bayesian estimates of experimental map quality for the identification of the best density-modified map.
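The two statistics used in this comparison (how often the top-scored solution is the best one, and the largest loss in map correlation when it is not) can be tallied as in the sketch below. The function name evaluate_ranking and the example values are hypothetical, and counting a success only when the chosen solution has exactly the best quality is an assumption.

```python
def evaluate_ranking(structures):
    """Summarize how well a score identifies the best solution per structure.

    structures: list of (scores, qualities) pairs, one pair per structure,
    where each element is a list with one entry per candidate solution.
    Returns (fraction of structures where the top-scored solution is the best,
             worst loss in quality caused by trusting the score).
    """
    hits, worst_loss = 0, 0.0
    for scores, qualities in structures:
        chosen = max(range(len(scores)), key=scores.__getitem__)
        loss = max(qualities) - qualities[chosen]
        hits += loss == 0
        worst_loss = max(worst_loss, loss)
    return hits / len(structures), worst_loss

# Three hypothetical structures with two or three candidate solutions each.
demo_structures = [
    ([0.5, 0.7], [0.4, 0.8]),             # score picks the best map
    ([0.9, 0.2], [0.3, 0.6]),             # score picks the worse map
    ([0.1, 0.2, 0.3], [0.1, 0.2, 0.9]),   # score picks the best map
]
frac_best, worst = evaluate_ranking(demo_structures)
```

The same function can be applied with either the experimental or the density-modified map quality as the reference, which is the distinction drawn in Table 6.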
3.5. Using the PHENIX AutoSol wizard to redetermine structures from the PHENIX structure library
To test the utility of the Bayesian estimates of map quality in the overall context of structure solution, we carried out automated structure determinations on all 48 MAD, SAD and MIR structures in the PHENIX structure library with the PHENIX AutoSol wizard. The structures in this library range from relatively straightforward cases of SAD and MAD to considerably more complex cases that involved combinations of SAD or MAD with MIR and difficult-to-solve heavy-atom substructures. In the tests carried out here, only one source of phase information was used for each structure (i.e. MAD, SAD or MIR), except in the case of the fusion-complex structure (PDB code 1sfc; Sutton et al., 1998), in which SAD and SIR data were combined.
To evaluate the overall contribution of the Bayesian scoring approach described here to structure solution, we compared the qualities of the final density-modified maps obtained with the PHENIX AutoSol wizard using each of three different methods of making decisions during the heavy-atom solution and phasing steps of structure solution. The first method (`perfect scoring') was to use the actual correlation coefficient of each experimental map with the corresponding idealized map (using phases from a refined model) to decide which map was best during structure solution. Once density-modified maps had been calculated, the correlations of those maps with the idealized map were used for the final ranking. The second method (`Bayesian scoring') was to use the Bayesian estimates based on the combination of the skewness of electron density and the correlation of local r.m.s. density for decision-making during structure solution. Once density-modified maps had been calculated, a model was built and the correlation between this model and the density-modified map was used for final ranking. The third method (`random scoring') was to use random scores for decision-making during structure solution and then to use the model–map correlation for the final ranking. Fig. 5(a) illustrates these comparisons for MAD structure determinations, Fig. 5(b) illustrates them for SAD structure determinations and Fig. 5(c) for MIR structure determinations.
For MAD, SAD and MIR structure determinations the decision-making procedure using Bayesian estimates of experimental map quality and model–map correlations as estimates of density-modified map quality led to density-modified electron-density maps that were very similar in quality to those obtained using the decision-making process based on actual map quality (Fig. 5). This indicates that the quality of the final density-modified maps produced by the PHENIX AutoSol wizard is essentially as good as it can be with any decision-making system, given the algorithms and parameters used to find heavy-atom sites and to carry out phasing, density modification and model building in the wizard.
In addition to the `perfect scoring' and `Bayesian scoring' approaches shown in Fig. 5, the figure includes density-modified map quality for solutions obtained using random scores for experimental maps (but still using model–map correlation to evaluate final density-modified maps). Each `random scoring' value is the average of ten runs with differing random seeds, so they represent an average value of the quality of final maps obtained with random scoring of experimental maps. The quality of these maps is generally lower than that of those obtained with either of the other two methods, showing that the scoring is contributing important information to the structure-determination process.
Although Fig. 5 indicates that the quality of the final maps obtained with the PHENIX AutoSol wizard is essentially as good as it can be with the structure-solution algorithms in the wizard, it is likely that the number of solutions that need to be examined at each stage in structure solution could be lowered if improved estimates of experimental map quality were available. The default parameters in the PHENIX AutoSol wizard defining the number of solutions to keep at each stage were chosen to be large enough that the best solution was generally in the set that was considered at each stage using the 48 MAD, SAD and MIR data sets examined in Fig. 5. If improved scoring methods are developed, then a systematic re-examination of these default parameters would probably be useful. In the meantime, modifying these parameters to include larger or smaller numbers of solutions at each stage may be useful in cases that are more challenging or that are more straightforward, respectively.
The skewness of electron-density values in an electron-density map has been recognized for some time as a potential indicator of the quality of the map (Podjarny, 1976; Lunin, 1993). As the skewness of a map is not a familiar quantity to most crystallographers, we illustrate it for `poor' and `good' experimental electron-density maps. Both maps were based on experimental data for aep-transaminase (PDB code 1m32; Chen et al., 2002) and were obtained during the course of automated analysis of these data with the PHENIX AutoSol wizard. The poor map was calculated using an incorrect set of heavy-atom sites and the good map was calculated using a largely correct set of heavy-atom sites. Fig. 6 shows histograms of the number of grid points in each map with various values of electron density. The x axis in Fig. 6 corresponds to electron density in a map normalized to the r.m.s. in the map after subtracting the mean of the map from all values. The dotted lines in Fig. 6 illustrate the fraction of grid points in the poor map that correspond to each value of normalized electron density. It may be seen that this histogram of densities from a poor map has a very nearly Gaussian shape. This poor map had a skewness of 0.004 and its correlation to a map based on the refined model of the structure was 0.04. In contrast, the solid lines in Fig. 6 illustrate the fraction of grid points in the good map corresponding to various values of electron density. This histogram differs from that derived from the poor map in that it is not symmetrical. The peak is slightly negative of the origin and it has a distinct tail on the positive side of the peak. This good map had a skewness of 0.4 and its correlation to the map based on the refined model was 0.66. Note that the differences in shapes of the histograms based on poor or good maps can be rather small, as in Fig. 6. Nevertheless, the skewness can usually be estimated very accurately because there are typically tens of thousands of grid points in the maps, so that the shapes of the histograms are very precisely defined.
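As a rough illustration of the quantity discussed here, the sketch below computes the skewness of a set of density values after subtracting the mean and normalizing to the r.m.s., as for the x axis of Fig. 6. It assumes the standard definition of skewness (the mean cube of the normalized values). The two sets of values are synthetic stand-ins rather than real maps: a Gaussian sample plays the role of a `poor' map and a Gaussian sample with an added positive tail plays the role of a `good' map.

```python
import math
import random


def skewness(values):
    """Mean cube of the values after subtracting the mean and dividing by the r.m.s."""
    n = len(values)
    mean_value = sum(values) / n
    centered = [v - mean_value for v in values]
    rms = math.sqrt(sum(c * c for c in centered) / n)
    return sum((c / rms) ** 3 for c in centered) / n


rng = random.Random(0)
n_grid_points = 50000  # "tens of thousands of grid points"

# Synthetic 'poor' map: symmetric, Gaussian-distributed values.
poor_map = [rng.gauss(0.0, 1.0) for _ in range(n_grid_points)]

# Synthetic 'good' map: Gaussian values plus a positive tail (assumed shape).
good_map = [rng.gauss(0.0, 1.0) + 0.8 * rng.expovariate(1.0)
            for _ in range(n_grid_points)]

poor_skew = skewness(poor_map)
good_skew = skewness(good_map)
print(f"poor-map skewness: {poor_skew:.3f}")
print(f"good-map skewness: {good_skew:.3f}")
```

With this many points the sampling uncertainty of the skewness is small (of the order of 0.01 for a Gaussian sample), which is why even modest asymmetry in the histogram can be detected reliably.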
4. Conclusions
Each of the ten measures of the quality of experimental electron-density maps evaluated here has some utility in estimating the true quality of these maps. These measures of map quality reflect a wide range of characteristics (Tables 1 and 2), ranging from the flatness of the solvent region typically found in macromolecular structures to the connectivity of regions of high electron density corresponding to the chains of polymers in these structures. Overall, the skewness of electron density stands out as the best of these measures (Table 3 and Fig. 2). Used in a simple Bayesian estimator, the correlation between map quality estimated with the skewness of electron density and true map quality is about 0.90, while the next-best estimator (the correlation of local r.m.s. density) gives a correlation of only 0.85. Combining the two yields the most useful estimator we have developed, with a correlation between estimated and actual map quality of 0.92 and an r.m.s. prediction error in map quality of 0.09.
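The idea of combining two measures in a Bayesian estimator can be sketched as follows. This is a minimal illustration and not the estimator implemented in the wizard. It assumes a flat prior on map quality Q over a grid from 0 to 1, Gaussian likelihoods for each measure given Q, and independence of the measures, and it reports the posterior mean. The likelihood models (`skewness` with mean 0.6Q, `local_rms_correlation` with mean 0.7 + 0.3Q, and their standard deviations) are invented for the example and are not taken from the tables.

```python
import math


def gaussian(x, mean, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_mean_quality(observations, models, q_grid, prior=None):
    """Posterior mean of Q given observed measures, treating the measures as independent.

    observations: dict mapping measure name -> observed value
    models: dict mapping measure name -> (mean_function_of_Q, sigma)
    """
    if prior is None:
        prior = [1.0] * len(q_grid)  # flat prior (assumption)
    weights = []
    for q, p in zip(q_grid, prior):
        weight = p
        for name, value in observations.items():
            mean_fn, sigma = models[name]
            weight *= gaussian(value, mean_fn(q), sigma)  # independence: multiply likelihoods
        weights.append(weight)
    total = sum(weights)
    return sum(q * w for q, w in zip(q_grid, weights)) / total


# Hypothetical likelihood models, for illustration only.
models = {
    "skewness": (lambda q: 0.6 * q, 0.08),
    "local_rms_correlation": (lambda q: 0.7 + 0.3 * q, 0.05),
}
q_grid = [i / 100 for i in range(101)]

est_poor = posterior_mean_quality({"skewness": 0.004}, models, q_grid)
est_good = posterior_mean_quality({"skewness": 0.4}, models, q_grid)
est_combined = posterior_mean_quality(
    {"skewness": 0.4, "local_rms_correlation": 0.90}, models, q_grid)

print(f"skewness 0.004 alone:         {est_poor:.2f}")
print(f"skewness 0.4 alone:           {est_good:.2f}")
print(f"skewness 0.4 + local r.m.s.:  {est_combined:.2f}")
```

Multiplying the likelihoods is what encodes the independence assumption. A second measure that agrees with the first narrows the posterior, while correlated measures would be over-counted by this simple product.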
With the exception of the mean figure of merit of phasing, which does not depend on the hand of the heavy-atom substructure, all the measures of map quality analyzed are remarkably good discriminators between maps calculated using the correct and inverse hands of the heavy-atom substructure (Fig. 3).
The PHENIX AutoSol wizard uses a combination of the skewness of electron density and the correlation of local r.m.s. density to form a Bayesian estimator of map quality. It then decides which heavy-atom substructures to pursue based on these map-quality estimates. Once density-modified maps are available, a model is built into the maps and the map–model correlation is used to identify the best overall solutions. This process yields density-modified electron-density maps of approximately the same overall quality as those obtainable with a perfect decision-making system (Fig. 5).
Our Bayesian estimates of map quality, while highly useful in evaluating experimental maps, are nevertheless not the best indicators of the quality of the corresponding density-modified maps. The map–model correlation obtained after preliminary model building is a better indicator of the quality of density-modified maps (Fig. 4 and Table 6).
In this work, we have ignored the resolution-dependence of the measures of map quality. This is made possible in part by the use of a high-resolution limit of 2.5 Å for all the calculations of map quality and is generally justified by the relatively small remaining resolution-dependence of most of the measures of map quality (Fig. 1). Nevertheless, it seems possible that some improvement in estimation of map quality might be obtained by including the resolution-dependence (or the effective overall isotropic displacement factor) of the data in the analysis. Additionally, we have assumed independence of the various measures of map quality in equation (8a). We were not able to improve the estimates of map quality using a simple covariance-matrix approach to combining estimates of map quality, but other more sophisticated approaches, together with a much larger set of sample data, might lead to improved estimates of map quality.
Acknowledgements
The authors would like to thank the NIH Protein Structure Initiative for generous support of the PHENIX project (1P01 GM063210). This work was supported in part by the US Department of Energy under Contract No. DE-AC02-05CH11231. RJR is supported by a Principal Research Fellowship from the Wellcome Trust (UK). The authors are grateful to the Joint Center for Structural Genomics for making raw data available at https://www.jcsg.org and to the many researchers who contributed their data to the PHENIX structure library. The worksheets used to generate the figures and the data and scripts used to calculate the values in the figures and tables are available at https://solve.lanl.gov/pub/solve/scoring_2009.
References
Adams, P. D., Grosse-Kunstleve, R. W., Hung, L.-W., Ioerger, T. R., McCoy, A. J., Moriarty, N. W., Read, R. J., Sacchettini, J. C., Sauter, N. K. & Terwilliger, T. C. (2002). Acta Cryst. D58, 1948–1954.
Alphey, M. S., Leonard, G. A., Gourley, D. G., Tetaud, E., Fairlamb, A. H. & Hunter, W. N. (1999). J. Biol. Chem. 274, 25613–25622.
Baker, D., Bystroff, C., Fletterick, R. J. & Agard, D. A. (1993). Acta Cryst. D49, 429–439.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, I. N., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.
Blow, D. M. & Crick, F. H. C. (1959). Acta Cryst. 12, 794–802.
Braig, K., Adams, P. D. & Brünger, A. T. (1995). Nature Struct. Biol. 2, 1083–1094.
Burling, F. T., Weis, W. I., Flaherty, K. M. & Brünger, A. T. (1996). Science, 271, 72–77.
Chen, C. C. H., Zhang, H., Kim, A. D., Howard, A., Sheldrick, G. M., Mariano-Dunaway, D. & Herzberg, O. (2002). Biochemistry, 41, 13162–13169.
Chen, S., Jancrick, J., Yokota, H., Kim, R. & Kim, S.-H. (2004). Proteins, 55, 785–791.
Cheong, C. G., Bauer, C. B., Brushaber, K. R., Escalante-Semerena, J. C. & Rayment, I. (2002). Biochemistry, 41, 4798–4808.
Choi, I.-G., Shin, D. H., Brandsen, J., Jancarik, J., Busso, D., Yokota, H., Kim, R. & Kim, S.-H. (2003). J. Struct. Funct. Genomics, 4, 31–34.
Colovos, C., Toth, E. A. & Yeates, T. O. (2000). Acta Cryst. D56, 1421–1429.
Cowtan, K. D. & Main, P. (1996). Acta Cryst. D52, 43–48.
Cowtan, K. & Main, P. (1998). Acta Cryst. D54, 487–493.
Daniels, D. L., Cohen, A. R., Anderson, J. M. & Brünger, A. T. (1998). Nature Struct. Biol. 5, 317–325.
DiDonato, M. et al. (2006). Proteins, 65, 771–776.
Ducros, V. M., Lewis, R. J., Verma, C. S., Dodson, E. J., Leonard, G., Turkenburg, J. P., Murshudov, G. N., Wilkinson, A. J. & Brannigan, J. A. (2001). J. Mol. Biol. 306, 759–771.
Eicken, C., Sharma, V., Klabunde, T., Lawrenz, M. B., Hardham, J. M., Norris, S. J. & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 21691–21696.
Esser, L., Wang, C. R., Hosaka, M., Smagula, C. S., Sudhof, T. C. & Deisenhofer, J. (1998). EMBO J. 17, 977–984.
Gordon, E., Flouret, B., Chantalat, L., van Heijenoort, J., Mengin-Lecreulx, D. & Dideberg, O. (2001). J. Biol. Chem. 276, 10999–11006.
Grosse-Kunstleve, R. W. & Adams, P. D. (2003). Acta Cryst. D59, 1966–1973.
Hamilton, W. C. (1964). Statistics in Physical Science. New York: The Ronald Press Company.
Han, G. W. et al. (2006). Proteins, 64, 1083–1090.
Huang, C.-C., Smith, C. V., Glickman, M. S., Jacobs, W. R. Jr & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 11559–11569.
Huber, A. H., Nelson, W. J. & Weis, W. I. (1997). Cell, 90, 871–882.
Hyman, J., Chen, H., Di Fiore, P. P., De Camilli, P. & Brunger, A. T. (2000). J. Cell. Biol. 149, 537–546.
James, M. N. & Sielecki, A. R. (1983). J. Mol. Biol. 163, 299–361.
Jiang, J., Zhang, Y., Krainer, A. R. & Xu, R. M. (1999). Proc. Natl Acad. Sci. USA, 96, 3572–3577.
Jin, K. K. et al. (2006). Proteins, 63, 1112–1118.
Kabsch, W. (1993). J. Appl. Cryst. 26, 795–800.
Kazantsev, A. V., Krivenko, A. A., Harrington, D. J., Carter, R. J., Holbrook, S. R., Adams, P. D. & Pace, N. R. (2003). Proc. Natl Acad. Sci. USA, 100, 7497–7502.
Leslie, A. G. W. (1987). Acta Cryst. A43, 134–136.
Leslie, A. G. W. (1992). Jnt CCP4/ESF–EACBM Newsl. Protein Crystallogr. 26.
Lunin, V. Y. (1993). Acta Cryst. D49, 90–99.
McCoy, A. J., Storoni, L. C. & Read, R. J. (2004). Acta Cryst. D60, 1220–1228.
Muchmore, C. R., Krahn, J. M., Kim, J. H., Zalkin, H. & Smith, J. L. (1998). Protein Sci. 7, 39–51.
Nanao, M. H., Sheldrick, G. M. & Ravelli, R. B. G. (2005). Acta Cryst. D61, 1227–1237.
Newman, J., Peat, T. S., Richard, R., Kan, L., Swanson, P. E., Affholter, J. A., Holmes, I. H., Schindler, J. F., Unkefer, C. J. & Terwilliger, T. C. (1999). Biochemistry, 38, 16105–16114.
Ostermeier, C. & Brunger, A. T. (1999). Cell, 96, 363–374.
Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307–326.
Peat, T. S., Newman, J., Waldo, G. S., Berendzen, J. & Terwilliger, T. C. (1998). Structure, 6, 1207–1214.
Perrakis, A., Morris, R. & Lamzin, V. S. (1999). Nature Struct. Biol. 6, 458–463.
Pflugrath, J. W. (1999). Acta Cryst. D55, 1718–1725.
Podjarny, A. D. (1976). PhD thesis. Weizmann Institute of Science.
Rice, L. M. & Brunger, A. T. (1999). Mol. Cell, 4, 85–95.
Rife, C. et al. (2005). Proteins, 61, 449–453.
Rozwarski, D. A., Diederichs, K., Hecht, R., Boone, T. & Karplus, P. A. (1996). Proteins, 26, 304–313.
Schneider, T. R. & Sheldrick, G. M. (2002). Acta Cryst. D58, 1772–1779.
Schwarzenbacher, R. et al. (2006). Proteins, 65, 243–248.
Sevcik, J., Dauter, Z., Lamzin, V. S. & Wilson, K. S. (1996). Acta Cryst. D52, 327–344.
Shamoo, Y., Krueger, U., Rice, L. M., Williams, K. R. & Steitz, T. A. (1997). Nature Struct. Biol. 4, 215–222.
Sharma, V., Grubmeyer, C. & Sacchettini, J. C. (1998). Structure, 6, 1587–1599.
Sharma, V., Sharma, S., Hoener zu Bentrup, K., McKinney, J. D., Russell, D. G., Jacobs, W. R. Jr & Sacchettini, J. C. (2000). Nature Struct. Biol. 7, 663–668.
Sheldrick, G. M. (2002). Z. Kristallogr. 217, 644–650.
Shin, D. H., Lou, Y., Jancarik, J., Yokota, H., Kim, R. & Kim, S.-H. (2005). J. Struct. Biol. 152, 113–117.
Shin, D. H., Nguyen, H. H., Jancarik, J., Yokota, H., Kim, R. & Kim, S.-H. (2003). Biochemistry, 42, 13429–13437.
Shin, D. H., Roberts, A., Jancarik, J., Yokota, H., Kim, R., Wemmer, D. E. & Kim, S.-H. (2003). Protein Sci. 12, 1464–1472.
Shin, D. H., Yokota, H., Kim, R. & Kim, S.-H. (2002). Proc. Natl Acad. Sci. USA, 99, 7980–7985.
Skinner, M. M., Zhang, H., Leschnitzer, D. H., Guan, Y., Bellamy, H., Sweet, R. M., Gray, C. W., Konings, R. N., Wang, A. H. & Terwilliger, T. C. (1994). Proc. Natl Acad. Sci. USA, 91, 2071–2075.
Stroud, R. M. & Fauman, E. B. (1995). Protein Sci. 4, 2392–2404.
Sutton, R. B., Ernst, J. A. & Brunger, A. T. (1999). J. Cell Biol. 147, 589–598.
Sutton, R. B., Fasshauer, D., Jahn, R. & Brünger, A. T. (1998). Nature (London), 395, 347–353.
Tanner, J. J., Lei, B., Tu, S. C. & Krause, K. L. (1996). Biochemistry, 35, 13531–13539.
Tavares, G. A., Panepucci, E. H. & Brunger, A. T. (2001). Mol. Cell, 8, 1313–1325.
Terwilliger, T. C. (1994). Acta Cryst. D50, 11–16.
Terwilliger, T. C. (1999). Acta Cryst. D55, 1863–1871.
Terwilliger, T. C. (2000). Acta Cryst. D56, 965–972.
Terwilliger, T. C. (2001). Acta Cryst. D57, 1763–1775.
Terwilliger, T. C. (2002a). Acta Cryst. D58, 2082–2086.
Terwilliger, T. C. (2002b). Acta Cryst. D58, 2213–2215.
Terwilliger, T. C. (2003). Acta Cryst. D59, 1688–1701.
Terwilliger, T. C. & Berendzen, J. (1996). Acta Cryst. D52, 749–757.
Terwilliger, T. C. & Berendzen, J. (1997). Acta Cryst. D53, 571–579.
Terwilliger, T. C. & Berendzen, J. (1999a). Acta Cryst. D55, 501–505.
Terwilliger, T. C. & Berendzen, J. (1999b). Acta Cryst. D55, 1872–1877.
Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61–69.
Turner, M. A., Yuan, C. S., Borchardt, R. T., Hershfield, M. S., Smith, G. D. & Howell, P. L. (1998). Nature Struct. Biol. 5, 369–376.
Vellieux, F. M. D. A. P., Hunt, J. F., Roy, S. & Read, R. J. (1995). J. Appl. Cryst. 28, 347–351.
Walsh, M. A., Otwinowski, Z., Perrakis, A., Anderson, P. M. & Joachimiak, A. (2000). Structure, 8, 505–514.
Wang, B.-C. (1985). Methods Enzymol. 115, 90–112.
Weeks, C. M., Adams, P. D., Berendzen, J., Brunger, A. T., Dodson, E. J., Grosse-Kunstleve, R. W., Schneider, T. R., Sheldrick, G. M., Terwilliger, T. C., Turkenburg, M. G. & Usón, I. (2003). Methods Enzymol. 374, 37–82.
Willis, M. A., Bishop, B., Regan, L. & Brunger, A. T. (2000). Structure Fold. Des. 8, 1319–1328.
Wilson, M. A. & Brunger, A. T. (2000). J. Mol. Biol. 301, 1237–1256.
Xu, Q. et al. (2004). Proteins, 56, 171–175.
Xu, Q., Krishna, S. S. et al. (2006). Proteins, 65, 777–782.
Xu, Q., Schwarzenbacher, R., Krishna, S. S. et al. (2006). Proteins, 64, 808–813.
Xu, Q., Schwarzenbacher, R., McMullan, D. et al. (2006). Proteins, 62, 292–296.
Yang, D., Shipman, L. W., Roessner, C. A., Scott, A. I. & Sacchettini, J. C. (2002). J. Biol. Chem. 277, 9462–9467.
Yu, R. C., Hanson, P. I., Jahn, R. & Brünger, A. T. (1998). Nature Struct. Biol. 5, 803–811.
Yu, R. C., Jahn, R. & Brünger, A. T. (1999). Mol. Cell, 4, 97–107.
Zhang, L. & Doudna, J. A. (2002). Science, 295, 2084–2088.
Zubieta, C. et al. (2007). Proteins, 68, 999–1005.
Zwart, P. H., Grosse-Kunstleve, R. W. & Adams, P. D. (2005). CCP4 Newsl. 43, contribution 7.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.