To B or not to B: a question of resolution?
`And thus the native hue of resolution is sicklied o'er with the pale cast of thought', Hamlet, act 3 scene 1.
In choosing and refining any crystallographic structural model, there is tension between the desire to extract the most detailed information possible and the necessity to describe no more than what is justified by the observed data. A more complex model is not necessarily a better model. Thus, it is important to validate the choice of parameters as well as validating their refined values. One recurring task is to choose the best model for describing the displacement of each atom about its mean position. At atomic resolution one has the option of devoting six model parameters (a `thermal ellipsoid') to describe the displacement of each atom. At medium resolution one typically devotes at most one model parameter per atom to describe the same thing (a `B factor'). At very low resolution one cannot justify the use of even one parameter per atom. Furthermore, this aspect of the structure may be described better by an explicit model of bulk displacements, the most common of which is the translation/libration/screw (TLS) formalism, rather than by assigning some number of parameters to each atom individually. One can sidestep this choice between atomic displacement parameters and TLS descriptions by including both treatments in the same model, but this is not always statistically justifiable. The choice of which treatment is best for a particular structure refinement at a particular resolution can be guided by general considerations of the ratio of model parameters to the number of observations and by specific statistics such as the Hamilton R-factor ratio test.
Since at least the time of the 14th-century logician William of Occam, scientific models have been scrutinized for the possible flaw of being overly complex. A succinct modern formulation of Occam's `razor' is the admonition by Albert Einstein that `everything should be made as simple as possible, but not simpler'. Our confidence in a complicated model with many adjustable parameters is weakened if it is supported by only a small number of data points. Conversely, we assign only weak utility to a simplistic model that fails to explain obvious patterns in a very rich set of observations. Unfortunately, it is not always a straightforward task to decide whether a model is too complex or too simple.
Applying Occam's razor to structural models in biological crystallography is rarely straightforward. Although the number of observations is large, the number of parameters required to describe a biological macromolecule is also very large. If the diffraction measurements are limited to 2 Å resolution then there are typically about eight intensity measurements available for each non-H atom in a protein crystal structure. If one models each atom as having a position [x, y, z] and a thermal parameter Biso, this corresponds to ∼2 observations per model parameter (Fig. 1). As Lyle Jensen observed in earlier days of macromolecular crystallography, `The problem is overdetermined and there is no in principle reason why refinement should not be possible' (Jensen, 1974). The number of observations falls as the cube of the resolution, however, so that the problem ceases to be overdetermined at lower resolutions. Thus, in practice, model refinement is possible only if the experimental observations are supplemented with restraints on the model's geometric and other properties. In a typical protein refinement at less than atomic resolution the number of restraints can be much larger than the number of observations. Sufficiently strong restraints can force the model to become numerically well behaved during refinement. The restraints mitigate, but do not remove, the concern that inclusion of unjustified parameters in the model being refined can degrade the model quality through overfitting. A simple model with fewer restraints may still be better than a more complex highly restrained model, particularly when low resolution limits the observation-to-parameter ratio.
How, then, can one decide whether a model has been sufficiently restrained, whether it has been overfitted and whether it is too complex? I will first consider some general approaches and then examine a series of examples in which there is a choice between a simpler model and a more complex model. In all of these examples the difference in model complexity arises from different parameterization of the description of atomic displacements. The simplest such treatment is to assign only an overall description Uoverall that applies equally to all atoms. This results in a model that contains three parameters [x, y, z] for each atom plus one to six global parameters depending on whether Uoverall is isotropic or anisotropic. Thus, for an N-atom structure the simplest model contains (3N + 1) parameters. The most complex model that we will consider is a treatment that assigns each atom an individual 3 × 3 symmetric tensor Uij describing a thermal ellipsoid, which yields a total of 9N model parameters. Between these two extremes are hybrid models that contain some combination of individual per-atom isotropic terms Biso and translation/libration/screw (TLS) group descriptions of bulk displacement (Schomaker & Trueblood, 1968; Winn et al., 2001; Painter & Merritt, 2006).
Until recently, the issue of how to model atomic displacement was often reduced to a rough rule of thumb that at very high resolution one should model anisotropy of individual atoms using a six-parameter thermal ellipsoid, at very low resolution one should assign a shared B factor to groups of atoms and for everything in between one should refine an isotropic B factor for each atom. Even aside from the vagueness of where to draw the boundaries for `very high' and `very low' resolution, the introduction of TLS as an alternative description of atomic displacement has made this rule of thumb obsolete. Unfortunately, it sometimes seems to have been replaced by an assumption that the best treatment at all resolutions is to include both individual isotropic B factors and some number of TLS groups. I advocate that rather than following any such rule of thumb, the best treatment of displacements and anisotropy should be validated for each structure based on the experimental data and refinement statistics.
In validating an existing structural model we are confirming that it does not conflict with the experimental data and equally that it does not conflict with prior knowledge. Validation is essential to assure confidence in the scientific conclusions drawn on the basis of the model. Comprehensive overviews of crystallographic model validation may be found elsewhere (Kleywegt, 2000, 2009; Chen et al., 2010). Here, we will only touch briefly upon issues related to validation of B factors or other descriptions of atomic displacement.
The agreement of the model with the experimental data is conventionally quantified as a crystallographic residual R, which may be calculated using intensities (I) or structure factors (F). Several variants of R are in common use. The conventional unweighted residual R calculated for F is given by
R is closely related to the target function being minimized during refinement. Therefore, if the same observations contributing to refinement are used to calculate R, its value will normally decrease as a consequence of refinement. In order to test for overfitting, a fraction of the reflections may be omitted from the target function during refinement. Two residuals are then calculated. The reflections used in refinement are used to calculate Rwork, and the remaining reflections omitted from refinement are used to calculate a corresponding residual Rfree. If Rfree does not decrease in parallel with Rwork as a result of refinement, this is an indication of overfitting (Brünger, 1992, 1997; Tickle et al., 1998, 2000). It is important to note that in this context Rfree is being used to validate choices affecting the progress of a refinement, for example the strength of restraint weights, rather than to validate the selection of a model to be refined.
A related quantity, the use of which will be explored below, is Hamilton's generalized residual RG,
Most validation tests assess agreement of the model with prior knowledge. These tests encompass everything from detecting local problems such as a single poorly modeled residue to global issues such as implied inconsistency with the known biological properties of the molecule. For example, there is much prior knowledge about bond lengths and angles in organic molecules and about the joint distribution of the paired torsion angles [φ, ψ] at each peptide linkage in a protein. Significant deviation from these expectations for a particular residue may indicate a local problem, reducing our confidence in local features of the model, without necessarily implying that the overall model is poor.
Analogous prior expectations can be applied to detect local problems in atomic displacement parameters (ADPs). The tensor Uij describing anisotropic displacement of a particular atom, whether refined directly or derived from inclusion in a TLS group, should be positive definite. Bonded atoms are expected to exhibit similar atomic vibrations; in particular, the vibrational components along their mutual bond are expected to be equal (Hirshfeld, 1976; Rosenfield et al., 1978). The overall distribution of anisotropy for atoms in a protein crystal structure is expected to be approximately Gaussian, with a mean axial ratio in the range 0.45–0.55 (Merritt, 1999a; Zucker et al., 2010). If atomic displacements within a protein chain are described by segmenting the chain into multiple TLS groups, then the vibration of atoms at the junction of two adjoining TLS groups is expected to be described consistently by both sets of TLS parameters (Zucker et al., 2010). One resource that provides validation tests based on agreement with prior expectations for these various properties of displacement parameters is the PARVATI server http://www.bmsc.washington.edu/parvati .
There is a possible problem in validating B factors. Before we can identify prior expectations for their distribution, we must first establish the physical meaning of individual values. The IUCr defines both isotropic and anisotropic ADPs as representing `atomic motion and possible static displacive disorder' (Trueblood et al., 1996). Under this interpretation the [x, y, z] coordinates of an atom represent its true mean position, and the ADP values represent displacement about this mean position. Because we understand this displacement as arising from physical vibration, we can establish an expected distribution of ADP values based on models of physically reasonable modes of vibration. We expect that vibrational modes involving multiple atoms will lead to correlated displacement of those atoms and hence to correlated ADP values.
However, some programs also use or allow large B values to represent general uncertainty that a portion of the structure has been correctly modeled. Under this interpretation the nominal [x, y, z] coordinates may not be correct at all, and the `B value' is a measure of relative confidence rather than displacement about some mean position. Although it could be argued that being somewhere else entirely is an example of `displacive disorder' allowed by the IUCr definition, such interpretation is of little use in establishing prior expectations or validation criteria. From this perspective, general uncertainty and the known presence of multiple possible locations of the atom or group in question are both represented better by occupancy <1 rather than by an arbitrarily large B factor. This distinction is particularly important if the B values are used to determine, refine or validate the assignment of TLS groups.
While validation of the stereochemistry and other physical properties of a model after refinement is essential, it is not the end of the story. It neither asks nor answers the question `was this the best model to refine?'. In particular, it does not address the question of whether a simpler model would suffice. I have already noted that a more complex model is expected to yield a better residual R after refinement, and that a failure to reduce Rfree in parallel is an indicator for overfitting. However, even if the more complex model yields both lower R and lower Rfree, we can still ask whether this improvement is statistically significant.
One approach is to compare the residuals obtained experimentally for the new structure with either empirical or theoretical expectations for the conventional R and Rfree obtained for a model of this size and complexity (Kleywegt & Brünger, 1996; Tickle et al., 2000). In order to derive a quantitative significance level, it is preferable to replace the conventional residuals with variants whose statistical properties are better defined. If one replaces the conventional residual R with the generalized residual RG (equation 2), then it is possible to derive significance by consideration of the ratio of the R factors for the simple and the complex models (Hamilton, 1965). Hamilton's original formulation considered the case in which a simpler model was related to a more complex model by the addition of a set of linear constraints. Furthermore, Hamilton was concerned with the typical crystallographic problems of the day, for which both the number of observations and the number of parameters were small and the weighting factor wi used in refinement was the same for all reflections.
Bacchi et al. (1996) reformulated this approach for application to macromolecular models, where both the number of observations and the number of parameters are much larger and both the simple and complex models are refined with restraints. With a slight change in notation, we may restate the reformulated significance test as follows.
Now consider two refined models with residuals RG(1) and RG(2) and degrees of freedom DF(1) and DF(2). Let model 2 be the more complex model; by which we mean that it has more parameters and/or fewer restraints. By (3) above, the complex model has fewer degrees of freedom than the simpler model, so DF(1)/DF(2) is always greater than one. The simpler model is expected to have a higher R factor than the more complex model, in which case the ratio RG(1)/RG(2) will also be greater than one. However, the lower R factor for the more complex model indicates a significant improvement only if this ratio also satisfies
Note that the number of degrees of freedom depends on an effective restraint weight defined such that weffective = 0 corresponds to ignoring the restraints and weffective = 1 corresponds to treating each restraint as a full constraint analogous to adding one observation or reducing the parameter count by one parameter. Because we will be considering model pairs that differ only in their treatment of ADPs, we further subdivide the restraints into geometric restraints present in both models and ADP restraints that may be present in only one of the two models,
A major difficulty in applying the Hamilton R-factor ratio test is that the value of weffective is in general unknown. In some cases the analysis can proceed nevertheless by evaluating (4) across the entire range of possible values for weffective (Bacchi et al., 1996). If the test for significance yields the same result when evaluated at both extreme values of weffective then we can accordingly either accept or reject the more complex model even though the exact value of [DF(1)/DF(2)]1/2 remains unknown. One extreme, weffective = 0, corresponds to unrestrained refinement. Evaluation of (4) at this extreme is straightforward. The other extreme is bounded by weffective < 1, but 1 is a very weak upper bound that could only be reached if all restraints were independent. In practice, the restraints applied during macromolecular refinement are far from independent (there are many more restraints than there are parameters) and are assigned a fractional weight during refinement in order to balance their contribution to the overall residual. As a result, weffective << 1.
It is possible that one could derive a good estimate for weffective based on the deviation of the restrained parameters from their target restraint values at the end of refinement, i.e. the largest deviations are expected when the refinement is unrestrained (weffective = 0) and the smallest deviations, possibly zero, are expected when the restraints are so tight that they act as constraints. However, no quantitative procedure for making such an estimate has yet been developed. Nevertheless, for the examples presented below we use this argument to set an upper bound on the possible values of wgeom and wADP. Since a fully constrained geometric model would require no more than one constraint per coordinate, we set an upper bound on the limiting condition max(wgeom) = 3 × Natoms/Ngeom_restraints. Similarly, a fully constrained set of isotropic ADP values would require at most one constraint per atom, so we set an upper bound on the limiting condition max(wADP) = Natoms/NADP_restraints. These are weak upper bounds, a fact that can be seen empirically by noting that refinement of the restrained model does not normally converge to a model in which the parameter values fully conform to the restraint targets as they would for true constraints.
The upper bound for wADP is especially weak, because all of the restraints applied to isotropic ADPs during refinement contain a multiplicative term of the form [B(atom i) − B(atom j)], where B is either the individual isotropic Biso or the residual per-atom contribution to a TLS model Bresid. Thus, for isotropic ADPs a fully constrained model satisfying these restraints would have equal B terms for all atoms. The limiting case of a model refined with max(wADP) as defined above converges to being identical to a simpler model with a single Boverall and perhaps one or more TLS groups. Thus, we know that in practice wADP << max(wADP) both because the R factors for the simple and complex models are different and because the refined B values are not, in fact, identical.
The number of observations available per model parameter becomes an increasing concern for lower resolution data. At 3 Å resolution, the number of available observations is insufficient to support refinement of four parameters per atom in the absence of additional restraints (Fig. 1). Depending on the individual structure being modeled, the available data may or may not justify refinement of separate ADPs for each atom even in the presence of restraints. Table 1 and Fig. 2 illustrate the use of refinement statistics to guide the choice between refining a conventional model with four parameters per atom (x, y, z, Biso) or a simpler model containing no per-atom displacement parameters. We chose PDB entry 3hzr (Merritt et al., 2011) as a representative 3 Å resolution structure to use for this example. The 3hzr model contains three dimers in the asymmetric unit, comprising a total of 2262 protein residues with no water molecules or other nonprotein atoms.
We first consider the choice between two very simple models that contain no per-atom displacement parameters. The simpler of the two models contains six parameters Uijoverall. The slightly more complex alternative is a pure TLS model containing one TLS group to describe each protein chain for a total of 120 ADPs (six protein chains, one TLS group per chain, 20 parameters per TLS group). The more complex model yields substantially lower residuals R and Rfree (Table 1). The corresponding Hamilton R-factor ratio is 0.2827/0.2366 = 1.19. Although we do not know the exact value of wgeom, in this case the ratio DF(1)/DF(2) is insensitive to this unknown parameter and is strictly less than 1.19 over the entire range of possible values for wgeom (Fig. 2a). Therefore, the improvement in residuals for the more complex model is significant and we choose the pure TLS model over the simpler alternative model.
We next compare the pure TLS model in turn to a more complex model with no TLS but containing one ADP, Biso, for each atom. The conventional R factor yielded by refinement is nearly the same for both models, but Rfree is considerably higher for the more complex model (Table 1). Therefore, in this case examination of the conventional R factors already indicates that the more complex model is not justified. Let us see what the Hamilton R-factor ratio test indicates. For this test case RG(1)/RG(2) = 1.04 and the criterion in (4) could only be satisfied for values of wADP very near its limiting value Natoms/NADP_restraints (Fig. 2b). However, we know that wADP is not near the limiting case of fully constrained Biso values, because that would correspond to a model in which all ADP values are nearly equal. That is, the limiting case of maximal wADP is equivalent to the model with a single overall description Uoverall, which we have already considered and rejected. For values of wgeom and wADP away from their limiting maxima, the Hamilton test indicates rejection of the more complex model with individual Biso parameters in favor of the simpler pure TLS model.
This set of tests does not inevitably yield the same decision (that one should use a pure TLS model) when applied to other 3 Å resolution structure refinements. Although we selected 3hzr as representative, it has at least two features that are atypical. Its solvent content is 56%, which is higher than average and results in a slightly higher number of observations per atom than most 3 Å resolution structures. This would tend to increase our expectation that the more complex Biso model might be statistically justified. Counteracting this tendency, the structure exhibits atypically extreme overall anisotropy (Amean = 0.30, σA = 0.15), perhaps owing to loose lattice packing. The simple TLS model allows description of this anisotropy, whereas the more complex Biso model does not. This raises the question whether in this particular case the Biso model is a failure not because of the larger number of parameters, but because it fails to account for anisotropy. We will next test whether a more complex hybrid model that includes both Biso terms and TLS terms is statistically justified.
It has become increasingly common to model the ADPs in a macromolecular structure using both an individual Biso parameter for each atom and some form of TLS model to describe anisotropic displacement of those same atoms. Let us continue examination of the 3hzr refinement to evaluate the justification for such a hybrid model at the low end of the resolution range where it might be applicable (Fig. 3a). The simpler model in this case is the same pure TLS model with one TLS group per chain used in Fig. 2(b). The more complex model in this case includes these same TLS groups and in addition contains individual Biso terms for the protein atoms. The surfaces in Fig. 3(a) are remarkably similar to that in Fig. 2(b) and the conclusion is the same. The more complex model is statistically justified only if we believe that the Biso parameters are so tightly restrained that they are close to functioning as constraints.
Fig. 3(b) shows the application of the same significance test using as a test case the structure of a homolog to 3hzr that was determined at 2.32 Å resolution (PDB entry 3m5w ; Center for Structural Genomics of Infectious Diseases, unpublished work). The refinement statistics for 3m5w are given in Table 2. In contrast to the case of 3hzr , the hybrid model for 3m5w is superior to the pure TLS model with one TLS group per chain for all possible restraint weights except the unrealistic set wADP = wgeom ≃ 0 corresponding to unrestrained refinement.
It may seem natural to use an analogous significance test to determine whether or not it is justified to add a TLS description to a model that has already been refined with individual Biso parameters. However, the Hamilton R-factor ratio is only a weak test for this purpose because the change in the overall number of parameters is very small. That is, there are typically already thousands of ADP parameters; adding 20 more for each TLS group is a very small incremental change. For a structure with thousands of atoms per chain, associating an additional 20 TLS parameters with each chain will yield a test criterion [DF(1)/DF(2)]1/2 on the order of 1.001. Thus, according to the R-factor ratio criterion, the addition of TLS can be justified by any marginal improvement in the residuals. In the particular case of 3m5w , the simple TLS model describing each protein chain by a single TLS group yields only a slight improvement in the conventional R and Rfree (Table 2) and the corresponding Hamilton R-factor ratio is only RG(1)/RG(2) = 1.01. Nevertheless, this is larger than the test criterion for all possible values of the effective restraint weights, justifying acceptance of the hybrid model.
If true atomic resolution data have been measured, it is both justifiable and informative to refine a structural model containing anisotropic ADPs Uij for each atom (Schneider, 1996; Howard et al., 2004). As the available resolution falls off from this extreme, the number of observations eventually becomes insufficient to support such a complex model and simpler alternative models should be considered. It is instructive to see whether the R-factor ratio test is capable of indicating this resolution-dependent breakdown in the validity of a fully anisotropic model. One way to explore this is to conduct a set of parallel refinements that use the same starting model and differ only in the resolution of the data used. Fig. 4 shows the result of three such parallel refinements using as a test case human carbonic anhydrase II. When data to 1.3 Å resolution are used (Fig. 4a), the Hamilton test clearly indicates that it is justified to select a fully anisotropic model rather than a simpler hybrid model. If the data are limited to 1.7 Å resolution (Fig. 4c), the same test clearly indicates that the fully anisotropic model is not justified, and thus the simpler hybrid model is preferable. Given the weak bounds we are able to place on wgeom and wADP, it is perhaps not surprising that the analysis is indecisive at the intermediate resolution of 1.5 Å (Fig. 4b). Over most of the range of the effective restraint weights in this intermediate case the R-factor ratio test indicates we should reject use of the fully anisotropic model, but rejection is not indicated if wADP lies near its upper bound.
One could of course consider choosing a purely isotropic model even at very high resolution. Continuing with the use of carbonic anhydrase as a test case, Table 3 lists the outcomes of refining isotropic, hybrid and anisotropic models against 1.5 Å resolution data. Comparison of the fully isotropic model to the fully anisotropic model using the R-factor ratio test at this resolution yields an inconclusive result similar to that in Fig. 4(b). However, applying the R-factor ratio test to directly compare the purely isotropic model with the hybrid model clearly indicates that the hybrid model is preferred (not shown).
In cases where application of the Hamilton R-factor ratio test indicates that a more complex model should be rejected, can one find empirical evidence of defects in the rejected model? To address this question, we chose as a test case the well studied structure of human carbonic anhydrase II. Diffraction data for this structure are available to better than 0.90 Å resolution. We had previously refined atomic resolution models for this structure using several protocols (Behnke et al., 2010). One of these was a 0.95 Å resolution refinement using SHELXL (Sheldrick & Schneider, 1997) that included full-matrix estimation of the final error in both the coordinates and the anisotropic ADP terms Uij (PDB entry 1lug ). The 1lug model was chosen as a reference gold standard for assessing the accuracy of model ADPs obtained from refinement using data truncated to successively lower resolution limits. This is an idealized test case, as both the data and the starting model taken into refinement at lower resolutions are unrealistically good. That is, an atomic resolution data set truncated to, say, 1.8 Å is of better quality than a typical 1.8 Å resolution data set. Furthermore, the starting model taken into refinement included features identified in the original atomic resolution refinement, for example alternate conformations and partial-occupancy water sites, that would not typically be part of a model initially determined at lower resolution. For these reasons it is probable that this idealized test underestimates the typical degradation in the accuracy of model parameters at any specific resolution. Nevertheless, the statistical signatures of increasing model degradation as the data available for refinement decrease should parallel that expected for less ideal data.
Fig. 5 shows the conventional crystallographic residuals R and Rfree resulting from refinement of the same starting model using the 1lug 0.9 Å resolution data truncated successively to eight different resolution limits from 1.1 to 1.8 Å. At each resolution, three different models were refined, differing in their treatment of ADPs. The simplest, isotropic, model contained one ADP (Biso) for each atom. The most complex, fully anisotropic, model contained six ADPs (Uij) for each atom. The third model was a hybrid in which each atom was assigned an individual isotropic parameter Biso and in addition the protein chain was divided into 16 segments each described by a set of 20 TLS parameters. The net anisotropic displacement of each atom in the hybrid model is thus the sum of contributions from the TLS description for the group to which it belongs and from the individual atomic Biso.
Note that at every resolution both R and Rfree are highest for the isotropic model and lowest for the anisotropic model. Thus, if one were using only the existence of a drop in Rfree as a guide to model selection the fully anisotropic model would be chosen even at the poorest resolution, 1.8 Å. As we saw in Fig. 4, this is contradicted by the Hamilton R-factor ratio test, which indicates that the decrease in R for the fully anisotropic model is not statistically significant for the lower resolution refinements and thus should be rejected. Because we have a gold standard available for comparison, we can directly assess the validity of ADPs obtained at lower resolutions by comparing them atom-by-atom with the gold standard anisotropic ADPs in the atomic resolution 1lug model. We will use two statistical measures in this comparison: SUV (Merritt, 1999b) and the Kullback–Leibler divergence (Kullback & Leibler, 1951).
A symmetric form of the Kullback–Leibler divergence between the three-dimensional Gaussian density distributions described by U and V can be calculated using the equation KLUV = trace(UV−1 + VU−1 − 2I) (Murshudov et al., 2011). In the present case, U is the tensor of gold-standard ADPs for a particular atom and V is a lower resolution anisotropic model for that same atom. The value of KLUV is zero if U = V and increases without bound as the difference between the two distributions increases. The lower set of bars in Fig. 6 shows the median value of KLUV obtained by comparing the ADPs V for every atom in each resolution-limited refinement with the gold-standard ADPs U for that same atom in the gold standard. This test shows that the refined ADPs in the resolution-limited anisotropic model refinements become increasingly divergent from the gold standard as the resolution limit becomes more severe. Although the numerical value of the Kullback–Leibler divergence does not by itself tell us at what resolution the model has diverged `too far' from the gold standard, it does allow us to test at what point the ADPs obtained by refinement of a fully anisotropic model become worse than those obtained by refinement of a hybrid TLS model. As seen in Fig. 6, the ADPs from the hybrid TLS model are closer to the gold standard than the ADPs from the fully anisotropic model starting at 1.5 Å resolution.
The statistic SUV is based on the real-space correlation coefficient between two electron-density distributions described by the pair of ADP tensors U and V. A value of SUV > 1 indicates that the electron-density distribution described by U correlates better with the anisotropic distribution described by V than it does with an isotropic distribution. A value of SUV < 1 indicates that the anisotropic model V has worse correlation than an isotropic model. Values of SUV very near to 1 indicate that the agreement of the isotropic and anisotropic models with the gold standard is approximately the same. The fraction of protein atoms in each of these categories is shown in the upper set of bars in Fig. 6.
As one would expect, full anisotropic refinement against data minimally truncated from 0.9 to 1.1 Å resolution does not substantially reduce the agreement of the refined ADP values with the gold standard. Anisotropic treatment at this resolution is better than isotropic treatment for about 81% of the atoms and is no worse for another 17%. The quality of the refined ADP values degrades as the data are further truncated. By 1.6 Å, only 22% of the atoms are described better by an anisotropic model than by an isotropic model, and at this resolution the refined anisotropic ADPs for 32% of the atoms are actually a worse model for the true atomic resolution structure than an isotropic model. We can again compare this with similar analysis of refinement using a hybrid TLS model. In concordance with the analysis based on Kullback–Leibler divergence, the agreement of the hybrid model with the gold standard matches or exceeds that of the fully anisotropic model starting with the 1.5 Å resolution-limited refinement (Fig. 6).
Thus, evaluation of the refined models using either of two measures, SUV or Kullback–Leibler divergence, illustrates that as the resolution decreases the ADPs yielded by fully anisotropic refinement become invalid even though the refinement may remain numerically stable and the internal model statistics appear acceptable. This resolution-dependent breakdown in validity could not be detected by inspection of Rfree, which is lower for the fully anisotropic model than for either the isotropic or hybrid TLS models across the entire resolution range examined (1.1–1.8 Å). In contrast to this, the Hamilton R-factor ratio test is consistent with both empirical assessments in indicating that for this idealized test case the choice of a fully anisotropic model ceases to be justified at roughly 1.5 Å resolution.
6. Refinement protocols
The model statistics in Tables 1, 2 and 3 are the result of refinement using REFMAC v.5.6.0095 (Murshudov et al., 2011). In all cases the starting point for model comparisons was generated by subjecting the corresponding PDB entry coordinates to automated refinement in REFMAC with an isotropic B factor for each atom, no TLS treatment and default settings for all restraint weights. This coordinate set was then used as input for parallel refinements using alternative treatments for atomic displacements. Each refinement protocol was run first using a fixed overall geometric weighting term set to the value used in generating the starting model. If necessary, the refinement was then re-run using a manually adjusted value for the overall geometric weighting term chosen to yield deviations from ideality of the bond lengths and angles of the final model for that refinement protocol close to those of the starting model. Refinement of 3hzr used strong NCS restraints relating the six independent chains. Refinement of 3m5w used no NCS restraints. The refinements of 1lug in Table 3 were all conducted against data truncated to 1.5 Å resolution.
The parallel refinements of 1lug shown in Figs. 5 and 6 all used as a starting point the coordinates and isotropic ADPs from the 1.5 Å isotropic model shown in Table 3. The hybrid models additionally included a 16-group TLS model whose initial parameter values were taken from the 1.5 Å hybrid model shown in Table 3. In each case refinement consisted of 15 cycles of positional and ADP refinement; for the hybrid models, this was preceded by 15 cycles of TLS refinement. For anisotropic refinements, the along-bond ADP restraint RBON was set to 0.1. REFMAC was allowed to set the overall geometric weighting term automatically. The control settings for the individual refinements within a protocol (isotropic, hybrid, anisotropic) differed only in the resolution limit of the data used in refinement.
Hamlet tempered his initial resolve by thinking about the significance of the alternatives available to him. My hope is that the examples presented here will encourage crystallographers to do likewise. Before final acceptance of a structural model, even one that has been refined and validated, it is good to consider whether a simpler alternative model is available. Although the current discussion focuses on alternative treatments of B factors, this advice also applies to other model choices such as the treatment of noncrystallographic symmetry.
I have also taken the opportunity to explore the use of the largely neglected Hamilton R-factor ratio test as one approach to judging whether a more complex structural model is statistically justified. Widespread adoption of this test faces two hurdles: the key residual RG is not reported by commonly used refinement programs and the test itself is weakened by the lack of a precise estimate for the effective restraint weights. The first hurdle can be easily overcome. For example, the PDB_REDO project is implementing automated evaluation of the Hamilton R-factor ratio for model selection (Joosten et al., 2012). It may also be possible to lower the second hurdle by extending the argument advanced above to set weak bounds on wADP and wgeom so that it yields tighter bounds.
The choice `to B' or `not to B' is brought into sharp focus at both high and low resolution by the availability of TLS as an alternative description of atomic displacements. At high resolution the difference in number of parameters between a full anisotropic model and a hybrid Biso + TLS model is more than a factor of two. At low resolution the difference in number of parameters between a model with individual Biso terms and a pure TLS model is even larger. As we saw for the case of 3hzr in Figs. 2 and 3, the answer at 3 Å resolution is sometimes `not to B'.
Similarly, two different empirical assessments of ADP model quality using an atomic resolution structure determination as a gold standard illustrate that a drop in Rfree is not a sufficient indication for choosing the more complex full anisotropic model at high resolution. The Hamilton R-factor ratio test, however, correctly indicates for the test case examined that a fully anisotropic model ceases to be valid at roughly 1.5 Å resolution. This particular resolution should not be interpreted as a new rule-of-thumb! Both the data quality and the starting model used in the idealized test case were unrealistically good for their nominal resolution. It seems likely that for a more representative starting model refined against more typical experimental data, the critical resolution at which a hybrid Biso + TLS model becomes preferred to a full anisotropic model will lie closer to atomic resolution. In any case, the analysis summarized in Figs. 4, 5 and 6 reinforces the recommendation that statistical validation is desirable before accepting a model with a hugely larger number of parameters, even if it yields a decrease in Rfree.
This work was supported by NIH award R01GM080232. Fig. 1 was inspired by a similar figure on a web page created by Konrad Hinsen. I thank Garib Murshudov for suggesting the use of Kullback–Leibler divergence to compare anisotropic ADPs.
Bacchi, A., Lamzin, V. S. & Wilson, K. S. (1996). Acta Cryst. D52, 641–646. CrossRef CAS IUCr Journals
Behnke, C. A., Le Trong, I., Godden, J. W., Merritt, E. A., Teller, D. C., Bajorath, J. & Stenkamp, R. E. (2010). Acta Cryst. D66, 616–627. Web of Science CrossRef IUCr Journals
Brünger, A. T. (1992). Nature (London), 355, 472–475. PubMed Web of Science
Brünger, A. T. (1997). Methods Enzymol. 277, 366–396. CrossRef PubMed CAS Web of Science
Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12–21. Web of Science CrossRef CAS IUCr Journals
Hamilton, W. C. (1965). Acta Cryst. 18, 502–510. CrossRef CAS IUCr Journals Web of Science
Hirshfeld, F. L. (1976). Acta Cryst. A32, 239–244. CrossRef IUCr Journals Web of Science
Howard, E., Sanishvili, R., Cachau, R., Mitschler, A., Chevrier, B., Barth, P., Lamour, V., Van Zandt, M., Sibley, E., Bon, C., Moras, D., Schneider, T., Joachimiak, A. & Podjarny, A. (2004). Proteins, 55, 792–804. CrossRef PubMed CAS
Jensen, L. H. (1974). Annu. Rev. Biophys. Bioeng. 3, 81–93. CrossRef CAS PubMed
Joosten, R. P., Joosten, K., Murshudov, G. N. & Perrakis, A. (2012). Acta Cryst. D68, 484–496. Web of Science CrossRef CAS IUCr Journals
Kleywegt, G. J. (2000). Acta Cryst. D56, 249–265. Web of Science CrossRef CAS IUCr Journals
Kleywegt, G. J. (2009). Acta Cryst. D65, 134–139. Web of Science CrossRef CAS IUCr Journals
Kleywegt, G. J. & Brünger, A. T. (1996). Structure, 4, 897–904. CrossRef CAS PubMed Web of Science
Kullback, S. & Leibler, R. A. (1951). Ann. Math. Stat. 22, 79–86. CrossRef Web of Science
Merritt, E. A. (1999a). Acta Cryst. D55, 1109–1117. Web of Science CrossRef CAS IUCr Journals
Merritt, E. A. (1999b). Acta Cryst. D55, 1997–2004. Web of Science CrossRef CAS IUCr Journals
Merritt, E. A., Arakaki, T. L., Gillespie, R., Napuli, A. J., Kim, J. E., Buckner, F. S., Van Voorhis, W. C., Verlinde, C. L., Fan, E., Zucker, F. & Hol, W. G. (2011). Mol. Biochem. Parasitol. 177, 20–28. CrossRef CAS PubMed
Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367. Web of Science CrossRef CAS IUCr Journals
Painter, J. & Merritt, E. A. (2006). Acta Cryst. D62, 439–450. Web of Science CrossRef CAS IUCr Journals
Rosenfield, R. E., Trueblood, K. N. & Dunitz, J. D. (1978). Acta Cryst. A34, 828–829. CrossRef CAS IUCr Journals
Schneider, T. R. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 133–144. Warrington: Daresbury Laboratory.
Schomaker, V. & Trueblood, K. N. (1968). Acta Cryst. B24, 63–76. CrossRef CAS IUCr Journals Web of Science
Sheldrick, G. M. & Schneider, T. R. (1997). Methods Enzymol. 277, 319–343. CrossRef PubMed CAS Web of Science
Tickle, I. J., Laskowski, R. A. & Moss, D. S. (1998). Acta Cryst. D54, 547–557. CrossRef CAS IUCr Journals
Tickle, I. J., Laskowski, R. A. & Moss, D. S. (2000). Acta Cryst. D56, 442–450. Web of Science CrossRef CAS IUCr Journals
Trueblood, K. N., Bürgi, H.-B., Burzlaff, H., Dunitz, J. D., Gramaccioli, C. M., Schulz, H. H., Shmueli, U. & Abrahams, S. C. (1996). Acta Cryst. A52, 770–781. CrossRef CAS Web of Science IUCr Journals
Winn, M. D., Isupov, M. N. & Murshudov, G. N. (2001). Acta Cryst. D57, 122–133. Web of Science CrossRef CAS IUCr Journals
Zucker, F., Champ, P. C. & Merritt, E. A. (2010). Acta Cryst. D66, 889–900. Web of Science CrossRef CAS IUCr Journals
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.