A robust bulk-solvent correction and anisotropic scaling procedure
Lawrence Berkeley National Laboratory, One Cyclotron Road, Building 64R0121, Berkeley, CA 94720, USA
*Correspondence e-mail: pafonine@lbl.gov
A reliable method for the determination of bulk-solvent model parameters and an overall anisotropic scale factor is of increasing importance as structure determination becomes more automated. Current protocols require the manual inspection of results in order to detect errors in the calculation of these parameters. Here, a robust method for determining bulk-solvent and anisotropic scaling parameters in macromolecular refinement is described. The implementation of a maximum-likelihood target function for determining the same parameters is also discussed. The formulas and corresponding derivatives of the likelihood function with respect to the solvent parameters and the components of the anisotropic scale matrix are presented. These algorithms are implemented in the CCTBX bulk-solvent correction and scaling module.
Keywords: bulk-solvent correction; anisotropic scaling.
1. Introduction
Analysis of the Protein Data Bank (PDB; Bernstein et al., 1977; Berman et al., 2000) shows that macromolecular crystals contain a significant amount of disordered solvent. The total solvent content varies around a mean of 55%, with a lower bound of approximately 20% and an upper bound of approximately 95%. The contribution of this bulk solvent to the diffracted amplitudes becomes non-negligible at lower resolution (d > 8.0 Å). In the past, it has been common practice to truncate the low-resolution data and use only middle- and high-resolution shells for crystallographic calculations. More recently, it has been demonstrated that low-resolution data are very important for electron-density map analysis (Urzhumtsev, 1991), crystallographic refinement (Kostrewa, 1997) and the translation search in the molecular-replacement method (Urzhumtsev & Podjarny, 1995; Fokine & Urzhumtsev, 2002b). For a review and a more complete set of references see, for example, Jiang & Brünger (1994), Badger (1997) and Urzhumtsev (2000).
Jiang & Brünger (1994) demonstrated that a flat bulk-solvent model (Phillips, 1980) is the most reliable model and proposed an algorithm for calculation of the parameters. This involves the calculation of a solvent mask and the determination of two bulk-solvent parameters, ksol and Bsol. Fokine & Urzhumtsev (2002a) analyzed the distribution of bulk-solvent parameters and provided a more physical insight for this model. Alternatively, an exponential model for correcting for the effects of bulk solvent (Moews & Kretsinger, 1975; Tronrud, 1997) can be used. This is available in some programs: SHELX (Sheldrick & Schneider, 1997), REFMAC (Murshudov et al., 1997; REFMAC also provides the option for the flat bulk solvent described above) and TNT (Tronrud, 1997). However, it has been shown that this method is only correct at very low resolution (lower than 15 Å) and inappropriate at higher resolution (Podjarny & Urzhumtsev, 1997). Therefore, in this work we only consider the flat bulk-solvent model.
The bulk-solvent parameters ksol and Bsol are usually determined along with an overall scale factor between observed and calculated structure factors. It was demonstrated that the use of an anisotropic overall scale factor is physically more appropriate and can significantly reduce both the R and Rfree factors (Sheriff & Hendrickson, 1987; Murshudov et al., 1998). The criterion traditionally used to attain this goal is

$$LS = \frac{1}{N}\sum_{\mathbf{s}}\left(|F^{obs}_{\mathbf{s}}| - k\,|F^{model}_{\mathbf{s}}|\right)^{2}, \tag{1}$$

where $N = \sum_{\mathbf{s}} |F^{obs}_{\mathbf{s}}|^{2}$ is a normalization factor (Brünger et al., 1989; Jiang & Brünger, 1994). The model structure factors

$$\mathbf{F}^{model}_{\mathbf{s}} = k_{aniso}\left(\mathbf{F}^{calc}_{\mathbf{s}} + \mathbf{F}^{bulk}_{\mathbf{s}}\right) \tag{2}$$

accumulate the structure factors from the atomic model Fcalc (macromolecule plus ordered solvent), the contribution from the bulk solvent

$$\mathbf{F}^{bulk}_{\mathbf{s}} = k_{sol}\exp\left(-B_{sol}s^{2}/4\right)\mathbf{F}^{mask}_{\mathbf{s}} \tag{3}$$

and the overall anisotropic scale factor $k_{aniso}$. This scale factor can be either in exponential form (Sheriff & Hendrickson, 1987) with six parameters to be determined, as implemented in CNS (Brünger et al., 1998) and REFMAC (Murshudov et al., 1998),

$$k_{aniso} = \exp\left(-\mathbf{h}^{t}A^{-1}B_{cart}A^{-t}\mathbf{h}/4\right), \tag{4}$$

or a linear function of 12 parameters as implemented in SHELXL (Usón et al., 1999; Parkin et al., 1995). In this work, we consider only the exponential form of the anisotropic scale factor (4).
The scale k is chosen such that the derivative of LS with respect to k is zero,

$$k = \frac{\sum_{\mathbf{s}} |F^{obs}_{\mathbf{s}}|\,|F^{model}_{\mathbf{s}}|}{\sum_{\mathbf{s}} |F^{model}_{\mathbf{s}}|^{2}},$$

which is a necessary condition to make LS minimal (Brünger et al., 1989). Here, h is a column vector with the Miller indices of a reflection, ht is the transposed vector, Bcart, the overall anisotropic scale matrix, has the same units and conversion rules as Bcart defined in equations (2), (3b) and (7) of Grosse-Kunstleve & Adams (2002), A is an orthogonalization matrix, ksol and Bsol are the flat bulk-solvent model parameters, s2 = htG*h, where G* is the reciprocal-space metric tensor, and Fmask are the structure factors calculated from a molecular mask (a binary function with zero values in the protein region and unit values in the solvent region). The use of Bcart makes it straightforward to apply the isotropic component of the tensor to both Bsol and the atomic isotropic B factors in order to compensate for the high correlation of these parameters with the overall anisotropic scale matrix.
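The quantities defined above can be illustrated with a minimal Python sketch. It assumes the flat bulk-solvent contribution ksol exp(−Bsol s2/4) Fmask, the exponential anisotropic factor exp(−qtBcartq/4) with q = A−th supplied by the caller, and a least-squares residual normalized by the sum of squared observed amplitudes. The function names `f_model`, `aniso_scale` and `ls_target` are hypothetical and are not part of the CCTBX API.

```python
import math


def aniso_scale(q, b_cart):
    """Exponential anisotropic scale factor exp(-q^t B_cart q / 4).

    q is assumed to be the reciprocal-space vector A^{-t} h in Cartesian
    coordinates; b_cart is a symmetric 3x3 matrix given as nested lists.
    """
    quad = sum(q[i] * b_cart[i][j] * q[j] for i in range(3) for j in range(3))
    return math.exp(-quad / 4.0)


def f_model(f_calc, f_mask, s_sq, k_sol, b_sol, k_aniso=1.0):
    """Complex model structure factor for one reflection.

    F_model = k_aniso * (F_calc + k_sol * exp(-B_sol * s^2 / 4) * F_mask)
    """
    f_bulk = k_sol * math.exp(-b_sol * s_sq / 4.0) * f_mask
    return k_aniso * (f_calc + f_bulk)


def ls_target(f_obs, f_mod):
    """Return (LS, k) for observed amplitudes and complex model structure factors.

    k is the scale that zeroes dLS/dk; LS is normalized by sum(|F_obs|^2)
    (an assumed normalization).
    """
    amplitudes = [abs(fm) for fm in f_mod]
    k = (sum(fo * fm for fo, fm in zip(f_obs, amplitudes))
         / sum(fm * fm for fm in amplitudes))
    norm = sum(fo * fo for fo in f_obs)
    ls = sum((fo - k * fm) ** 2 for fo, fm in zip(f_obs, amplitudes)) / norm
    return ls, k
```

For example, `ls_target([2.0, 4.0, 6.0], [1+0j, 2j, -3+0j])` gives a scale of 2 and a residual of zero, because the observed amplitudes are exactly twice the model amplitudes.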
The correction for bulk solvent and scaling is usually the first step in a crystallographic refinement protocol. If a least-squares-based procedure is chosen, where a target function of form (1) is used in optimization of atomic model parameters, then the use of the same target function for the determination of the scaling and bulk-solvent parameters is well justified. However, if the maximum-likelihood-based strategy is chosen (Bricogne, 1991; Pannu & Read, 1996; Bricogne & Irwin, 1996; Murshudov et al., 1997), the use of function (1) for bulk-solvent and scale-parameter determination is less justified. In this case, it is more natural to also determine the bulk-solvent and anisotropic scale parameters from the likelihood function, allowing all the parameters to be optimized using the same criterion. The use of a likelihood function for the determination of bulk-solvent parameters has been discussed by Blanc et al. (2004).
It has been observed that the determination of bulk-solvent parameters is a numerically challenging problem (Jiang & Brünger, 1994; Fokine & Urzhumtsev, 2002a). Inclusion of the anisotropic overall scale factor makes the problem even more complicated. Some possible reasons for this are the following.
|
Therefore, it is not surprising to find 95 models in the PDB (see selection criteria below; scoring performed August 2004) with bulk-solvent parameters beyond the physically meaningful range discussed in Fokine & Urzhumtsev (2002a).
In this paper, we describe a robust protocol for the determination of bulk-solvent and anisotropic scaling parameters using both maximum-likelihood and least-squares target functions and its implementation in the Computational Crystallographic Toolbox (CCTBX; Grosse-Kunstleve et al., 2002).
2. The maximum-likelihood target function and its derivatives with respect to bulk-solvent parameters and components of the anisotropic scale matrix
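The negative logarithm of the maximum-likelihood function (Lunin & Skovoroda, 1995), which is implemented in CCTBX as one of the crystallographic target functions for structure refinement, can be presented as

$$ML = \sum_{\mathbf{s}} \Psi\left(|F^{obs}_{\mathbf{s}}|, |F^{calc}_{\mathbf{s}}|; \alpha_{\mathbf{s}}, \beta_{\mathbf{s}}\right) \tag{5}$$

with

$$\Psi = \begin{cases} -\ln\dfrac{2|F^{obs}_{\mathbf{s}}|}{\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}} + \dfrac{|F^{obs}_{\mathbf{s}}|^{2} + \alpha_{\mathbf{s}}^{2}|F^{calc}_{\mathbf{s}}|^{2}}{\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}} - \ln I_{0}\!\left(\dfrac{2\alpha_{\mathbf{s}}|F^{obs}_{\mathbf{s}}||F^{calc}_{\mathbf{s}}|}{\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}}\right), & \text{acentric reflections,} \\[2ex] -\dfrac{1}{2}\ln\dfrac{2}{\pi\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}} + \dfrac{|F^{obs}_{\mathbf{s}}|^{2} + \alpha_{\mathbf{s}}^{2}|F^{calc}_{\mathbf{s}}|^{2}}{2\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}} - \ln\cosh\!\left(\dfrac{\alpha_{\mathbf{s}}|F^{obs}_{\mathbf{s}}||F^{calc}_{\mathbf{s}}|}{\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}}\right), & \text{centric reflections.} \end{cases} \tag{6}$$

Here, $|F^{calc}_{\mathbf{s}}|$ is the calculated structure-factor magnitude for the reflection s from the available atomic model and $I_{0}$ is the modified Bessel function of the first kind of order zero. The coefficient ∊s depends on the three-dimensional index s and on the space group and is equal to the number of symmetry operations that, when applied to the vector s, leave it unchanged. The parameters αs and βs accumulate the uncertainties in atomic coordinates and temperature factors (Lunin & Urzhumtsev, 1984; Read, 1986, 1990, 2001; Lunin & Skovoroda, 1995; Pannu & Read, 1996; Urzhumtsev et al., 1996). It is worth noting that the scale coefficient between observed and calculated structure factors, if not introduced explicitly, is also accumulated in these two parameters.
The explicit introduction of the anisotropic scale factor and the contribution from the bulk solvent into (5) can be realised by replacing $\mathbf{F}^{calc}_{\mathbf{s}}$ with $\mathbf{F}^{model}_{\mathbf{s}}$ as defined in (2),

$$ML = \sum_{\mathbf{s}} \Psi\left(|F^{obs}_{\mathbf{s}}|, |F^{model}_{\mathbf{s}}|; \alpha_{\mathbf{s}}, \beta_{\mathbf{s}}\right). \tag{7}$$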
The derivatives of Ψ with respect to the six anisotropic scale-matrix elements Bcart and the solvent parameters ksol and Bsol required for first-derivative minimization methods such as LBFGS (Liu & Nocedal, 1989) are provided in Appendix A.
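As an illustration of how such a target can be evaluated per reflection, the following Python sketch computes a negative log-likelihood term. It assumes the Rice-type amplitude distribution with variance ∊β commonly associated with the α, β parametrization of Lunin & Skovoroda (1995); this form, and the names `psi`, `log_i0` and `log_cosh`, are assumptions made for this example and are not taken from the CCTBX source. The logarithms of I0 and cosh are evaluated in an overflow-safe way because their arguments can be large for strong reflections.

```python
import math


def log_i0(x, n=2000):
    """ln I0(x), using I0(x) = (1/pi) * integral_0^pi exp(x cos t) dt.

    The factor exp(x) is pulled out of the integral so that large x does not
    overflow; the remaining integral is evaluated with the midpoint rule.
    """
    x = abs(x)
    step = math.pi / n
    total = sum(math.exp(x * (math.cos((i + 0.5) * step) - 1.0)) for i in range(n))
    return x + math.log(total / n)


def log_cosh(x):
    """ln cosh(x) evaluated without overflow for large |x|."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)


def psi(f_obs, f_mod, alpha, beta, eps=1.0, centric=False):
    """Negative log-likelihood of one reflection (assumed Rice-type form).

    f_obs and f_mod are amplitudes (f_obs > 0); eps is the symmetry
    coefficient of the reflection.
    """
    eb = eps * beta
    if centric:
        return (-0.5 * math.log(2.0 / (math.pi * eb))
                + (f_obs ** 2 + alpha ** 2 * f_mod ** 2) / (2.0 * eb)
                - log_cosh(alpha * f_obs * f_mod / eb))
    return (-math.log(2.0 * f_obs / eb)
            + (f_obs ** 2 + alpha ** 2 * f_mod ** 2) / eb
            - log_i0(2.0 * alpha * f_obs * f_mod / eb))
```

The full target is then the sum of `psi` over all reflections, with `f_mod` taken as the amplitude of the scaled, solvent-corrected model structure factor.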
3. Algorithm for determination of ksol, Bsol and Bcart
Fokine & Urzhumtsev (2002a) have shown that the bulk-solvent parameters ksol and Bsol are distributed around 0.35 e Å−3 and 46 Å2 and the physically reasonable range for these parameters can be approximately defined as ksol ∈ (0.1, 0.8) and Bsol ∈ (10, 80). These observations make it possible to implement a systematic search procedure for the determination of ksol and Bsol, therefore making the whole protocol very robust and insensitive to the potential minimization problems mentioned above.
Fig. 1 outlines the algorithm implemented in CCTBX using the likelihood function. Starting from zero values for ksol, Bsol and Bcart, the values for α and β (Lunin & Skovoroda, 1995) are calculated using cross-validation data with smoothing over resolution shells using spline functions (Lunin & Skovoroda, 1997). The value of the ML function (7) is evaluated at this initial point. In the next step, a grid-search procedure is applied in order to find ksol and Bsol: for each trial pair (ksol, Bsol) the parameters α, β are updated and the value of ML is recalculated. The set of (α, β, ksol, Bsol) with the minimum value of the function ML is then selected. The LBFGS minimization algorithm is used to optimize ML with respect to the six components of the Bcart tensor with the parameters for α, β, ksol and Bsol found in the previous step held constant. Symmetry restrictions are applied to the elements of Bcart (Sheriff & Hendrickson, 1987); however, they can optionally be turned off. The value of the ML function is evaluated again in order to determine if the procedure has converged; convergence has taken place when the difference of the target function between two steps is less than a certain tolerance value. This tolerance value is fixed as 1% of the relative drop in the target function value. Otherwise, the procedure is repeated starting with the set of parameters obtained in the previous step until convergence is reached.
For reasons of efficiency, the sampling step used in the grid-search procedure is quite coarse. For example, Bsol is by default varied within the range 10–80 Å2 with a sampling step of 5 Å2. Finer sampling can be used, but increases the computational time. The parameters ksol and Bsol obtained in such a way are then used as the start values for the next calculations, which are the same as above but with the grid search for ksol and Bsol replaced with the LBFGS minimization. This allows ksol and Bsol to be determined more precisely. However, if the minimization fails the best parameters from the previous step are retained. The procedure using the LS function (1) as a criterion is implemented in a similar way. The default parameters for the mask calculation are rsolv = 1.0 Å and rshrink = 1.0 Å and the grid step is the highest resolution of the data divided by 4 (for the definition of these parameters, see Jiang & Brünger, 1994).
It should be emphasized that all available data are used throughout the procedure without any partitioning by resolution.
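The grid-search step can be sketched as follows. This is a simplified illustration and not the CCTBX implementation: it uses a least-squares residual as the criterion, omits the α, β updates and the LBFGS optimization of Bcart, and assumes a ksol grid of 0.10–0.80 e Å−3 in steps of 0.05 (the text specifies only the default Bsol sampling). The names `ls_residual`, `grid_search`, `K_GRID` and `B_GRID` are hypothetical.

```python
import math

# Trial values: B_sol sampling as stated in the text; k_sol sampling is an assumption.
K_GRID = [0.10 + 0.05 * i for i in range(15)]   # 0.10 ... 0.80 e/A^3
B_GRID = [10.0 + 5.0 * i for i in range(15)]    # 10 ... 80 A^2


def ls_residual(f_obs, f_calc, f_mask, s_sq, k_sol, b_sol):
    """Normalized least-squares residual with the optimal overall scale applied."""
    f_mod = [abs(fc + k_sol * math.exp(-b_sol * s / 4.0) * fm)
             for fc, fm, s in zip(f_calc, f_mask, s_sq)]
    k = (sum(fo * fm for fo, fm in zip(f_obs, f_mod))
         / sum(fm * fm for fm in f_mod))
    return (sum((fo - k * fm) ** 2 for fo, fm in zip(f_obs, f_mod))
            / sum(fo * fo for fo in f_obs))


def grid_search(f_obs, f_calc, f_mask, s_sq, k_values=K_GRID, b_values=B_GRID):
    """Return (target, k_sol, b_sol) with the lowest residual over the grid.

    The search starts from k_sol = B_sol = 0 (no solvent correction), so the
    result is never worse than the uncorrected model.
    """
    best = (ls_residual(f_obs, f_calc, f_mask, s_sq, 0.0, 0.0), 0.0, 0.0)
    for k_sol in k_values:
        for b_sol in b_values:
            target = ls_residual(f_obs, f_calc, f_mask, s_sq, k_sol, b_sol)
            if target < best[0]:
                best = (target, k_sol, b_sol)
    return best
```

Because every trial pair is evaluated independently, the search cannot be trapped by a poor starting point; the pair it returns then serves as the start value for a gradient-based optimization.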
4. Numerical tests
The goal of this test was to compare the performance of the two proposed algorithms, with the least-squares (1) and maximum-likelihood (7) target functions, using simulated models of different quality with simulated experimental data.
We used the model of a Fab fragment of a monoclonal antibody (Fokine et al., 2000) which consists of 439 amino acid residues and 213 water molecules. The crystals belong to space group P212121, with unit-cell parameters a = 72.24, b = 72.01, c = 86.99 Å. The values of Fobs were simulated by the amplitudes of structure factors calculated from the complete exact model at 2.2 Å resolution. The contributions of bulk solvent with ksol = 0.25 e Å−3 and Bsol = 55.0 Å2 and anisotropy with the diagonal elements (4, 8, −6) Å2 were added to these structure factors in accordance with (2) and (3). Random errors with mean values in the range 0.0–0.6 Å were then introduced into the atomic coordinates of the complete exact model. Incomplete models were obtained by random deletion of 5 and 10% of atoms from the ensemble of models with errors; this generated a total of 21 models.
Fig. 2 shows the distribution of bulk-solvent parameters obtained using (1) and (7) as the target functions. With the exception of two pairs, all pairs of ksol and Bsol obtained with the likelihood target are within the physically reasonable range and, depending on the model quality, relatively close to the exact values of 0.25 e Å−3 and 55.0 Å2. In contrast, most of the solvent parameters calculated using the least-squares function are outside the correct range, with some values for Bsol reaching 200 Å2. This is not unexpected as the least-squares target does not include any mechanism to correct for model incompleteness and hence all eight adjustable parameters, ksol, Bsol and Bcart, model the contribution from bulk solvent and anisotropy along with the model errors and incompleteness. For the likelihood-based target function, the distribution parameters α and β compensate for model errors and incompleteness. It is the high correlation between all of the model parameters which makes it necessary to develop the thorough and robust algorithm described in the previous section.
5. Tests with experimental data
In order to evaluate this new procedure for bulk-solvent correction and anisotropic scaling, we selected all `problem' models from the PDB, i.e. those with physically unreasonable values for the flat bulk-solvent model parameters. The exact selection criteria were structures solved by X-ray diffraction with the flat bulk-solvent model used, ksol < 0.1 or ksol > 1.0 e Å−3 and Bsol < 10 or Bsol > 100 Å2. This selected 95 models. The further demand for experimental data and cross-validation flags (`test' set of reflections) combined with an evaluation of the overall data correctness reduced the selected number of models to 35.
In most cases the new procedure yields physically reasonable parameters using both LS and ML target functions (Fig. 3). However, for some models (for example, PDB codes 1jh7, 1k33, 1kk7, 1lee, 1r30 and 2gwx) the parameters ksol and Bsol were outside the reasonable range, which may indicate insufficient data or poor model quality. In such cases the procedure sets the parameters to the best found in the search grid in step I (Fig. 1).
In order to evaluate the model improvement arising from more reasonable bulk-solvent parameters, R factors versus resolution were calculated for all selected models and a typical example for one model (PDB code 1jj1) is presented in Fig. 4(a). The use of corrected parameters significantly improves the fit for the low-resolution data, while the R factor calculated with the unreasonable parameters, taken from the PDB file, is 6% higher in the lowest resolution shell and about 11% higher for the case where no correction was performed. Analogous calculations were performed using the maximum-likelihood target function (Fig. 4b). Again, the parameters determined with the new method improve the likelihood target function compared with calculations with incorrect parameters or without any scaling and solvent correction.
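A comparison of R factors versus resolution of the kind described above can be sketched in Python as follows. The sketch assumes the conventional definition R = Σ||Fobs| − k|Fmodel|| / Σ|Fobs| evaluated in shells with equal numbers of reflections; the binning scheme and the function name `r_by_shell` are assumptions for illustration and may differ from the procedure used to prepare Fig. 4.

```python
import math


def r_by_shell(d_spacings, f_obs, f_mod, scale=1.0, n_shells=10):
    """Return a list of (d_max, d_min, R) tuples, lowest resolution shell first.

    d_spacings are in Angstrom; f_obs and f_mod are amplitudes; scale is the
    overall scale factor applied to the model amplitudes.
    """
    order = sorted(range(len(d_spacings)), key=lambda i: d_spacings[i], reverse=True)
    size = math.ceil(len(order) / n_shells)
    rows = []
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        num = sum(abs(f_obs[i] - scale * f_mod[i]) for i in idx)
        den = sum(f_obs[i] for i in idx)
        rows.append((d_spacings[idx[0]], d_spacings[idx[-1]], num / den))
    return rows
```

Running the function once with corrected and once with uncorrected model amplitudes and comparing the first rows shows the effect of the bulk-solvent correction on the lowest resolution shell.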
In addition, tests were performed in order to compare the calculation of flat bulk-solvent and anisotropic scaling parameters in selected programs that provide this option (Fig. 5). In many cases CNS1.1 performs significantly better than CNS1.0 (Fig. 5a). This is because the bulk-solvent correction procedure in CNS1.1 was improved by changing the initial values for ksol and Bsol from zero to the observed mean values (Fokine & Urzhumtsev, 2002a), 0.35 e Å−3 and 46.0 Å2, respectively. In some cases CNS1.1 gives similar or slightly worse results than CCTBX (Fig. 5a). However, there are cases where the new procedure gives noticeably better results than both CNS1.0 and CNS1.1 (Fig. 5b). Finally, analogous calculations of flat bulk-solvent correction and anisotropic scaling with REFMAC using the SCALE SIMPLE option gave similar results to those seen with CNS1.0.
6. Conclusions
A robust method for the determination of anisotropic scale factor and flat bulk-solvent model parameters is required as structure determination becomes more automated. The new method we have described here, in combination with the likelihood function for optimization of the parameters, will minimize the occurrence of errors. The robustness of the algorithm has been demonstrated on 35 structures selected from the PDB where unreasonable bulk-solvent parameters were reported. In most of these cases the new procedure found values close to those typically observed in refined structures. In our tests, the new procedure is as good as or better than CNS1.1 or REFMAC in determining optimum parameters for typical structures and works significantly better for `problem' structures.
These new algorithms are implemented in the CCTBX bulk-solvent correction and scaling module. CCTBX is available as open-source software at https://cctbx.sourceforge.net. All results presented are based on the CCTBX source code bundle with the version tag 2005_03_02_2358.
APPENDIX A
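The derivatives of the maximum-likelihood target function with respect to bulk-solvent parameters and components of the anisotropic scale matrix
Given the function Ψ defined by (6), its derivatives with respect to the six anisotropic scale-matrix elements (Bcart)ij can be obtained following the chain rule,

$$\frac{\partial\Psi}{\partial (B_{cart})_{ij}} = \frac{\partial\Psi}{\partial |F^{model}_{\mathbf{s}}|}\,\frac{\partial |F^{model}_{\mathbf{s}}|}{\partial (B_{cart})_{ij}},$$

where the function $\partial\Psi/\partial|F^{model}_{\mathbf{s}}|$ is defined below.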
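The calculation of derivatives with respect to the bulk-solvent parameters ksol and Bsol requires more attention. We can define a complex-valued function z = u + g(p)v, where u and v are complex variables and g(p) is a real function of the real argument p. Remembering that |z| = (z*z)1/2 and using the chain rule, one can obtain the derivative with respect to p as

$$\frac{\partial |z|}{\partial p} = \frac{\mathrm{Re}(z^{*}v)}{|z|}\,\frac{\mathrm{d}g}{\mathrm{d}p}.$$

Replacing u, v and g(p) with $\mathbf{F}^{calc}_{\mathbf{s}}$, $\mathbf{F}^{mask}_{\mathbf{s}}$ and ksolexp(−Bsols2/4), and denoting the anisotropic scale factor (4) by $k_{aniso}$, the desired derivatives are

$$\frac{\partial\Psi}{\partial k_{sol}} = \frac{\partial\Psi}{\partial |F^{model}_{\mathbf{s}}|}\,k_{aniso}\exp\left(-B_{sol}s^{2}/4\right)w_{\mathbf{s}}, \qquad \frac{\partial\Psi}{\partial B_{sol}} = -\frac{s^{2}}{4}\,\frac{\partial\Psi}{\partial |F^{model}_{\mathbf{s}}|}\,k_{aniso}\,k_{sol}\exp\left(-B_{sol}s^{2}/4\right)w_{\mathbf{s}},$$

where

$$w_{\mathbf{s}} = \frac{\mathrm{Re}\left[\left(\mathbf{F}^{calc}_{\mathbf{s}} + \mathbf{F}^{bulk}_{\mathbf{s}}\right)^{*}\mathbf{F}^{mask}_{\mathbf{s}}\right]}{\left|\mathbf{F}^{calc}_{\mathbf{s}} + \mathbf{F}^{bulk}_{\mathbf{s}}\right|}$$

and

$$\frac{\partial\Psi}{\partial |F^{model}_{\mathbf{s}}|} = \begin{cases} \dfrac{2\alpha_{\mathbf{s}}^{2}|F^{model}_{\mathbf{s}}|}{\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}} - \dfrac{2\alpha_{\mathbf{s}}|F^{obs}_{\mathbf{s}}|}{\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}}\,\dfrac{I_{1}(X_{\mathbf{s}})}{I_{0}(X_{\mathbf{s}})}, \quad X_{\mathbf{s}} = \dfrac{2\alpha_{\mathbf{s}}|F^{obs}_{\mathbf{s}}||F^{model}_{\mathbf{s}}|}{\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}}, & \text{acentric reflections,} \\[2ex] \dfrac{\alpha_{\mathbf{s}}^{2}|F^{model}_{\mathbf{s}}|}{\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}} - \dfrac{\alpha_{\mathbf{s}}|F^{obs}_{\mathbf{s}}|}{\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}}\tanh\!\left(\dfrac{\alpha_{\mathbf{s}}|F^{obs}_{\mathbf{s}}||F^{model}_{\mathbf{s}}|}{\epsilon_{\mathbf{s}}\beta_{\mathbf{s}}}\right), & \text{centric reflections.} \end{cases}$$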
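The derivative of |z| with respect to a real parameter is easy to get wrong, so a numerical check is useful. The following Python sketch assumes z = u + ksol exp(−Bsol s2/4) v, computes the analytical derivatives of |z| with respect to ksol and Bsol, and compares them with central finite differences. The function names `abs_z`, `grad_abs_z` and `finite_difference` are hypothetical and serve only to illustrate the check.

```python
import math


def abs_z(u, v, k_sol, b_sol, s_sq):
    """|z| for z = u + k_sol * exp(-B_sol * s^2 / 4) * v (u, v complex)."""
    return abs(u + k_sol * math.exp(-b_sol * s_sq / 4.0) * v)


def grad_abs_z(u, v, k_sol, b_sol, s_sq):
    """Analytical (d|z|/dk_sol, d|z|/dB_sol) from d|z|/dp = Re(z* v) / |z| * dg/dp."""
    e = math.exp(-b_sol * s_sq / 4.0)
    z = u + k_sol * e * v
    w = (z.conjugate() * v).real / abs(z)
    d_k = e * w                          # dg/dk_sol = exp(-B_sol s^2 / 4)
    d_b = -0.25 * s_sq * k_sol * e * w   # dg/dB_sol = -(s^2 / 4) * g
    return d_k, d_b


def finite_difference(u, v, k_sol, b_sol, s_sq, h_k=1e-6, h_b=1e-4):
    """Central finite-difference estimates of the same two derivatives."""
    d_k = (abs_z(u, v, k_sol + h_k, b_sol, s_sq)
           - abs_z(u, v, k_sol - h_k, b_sol, s_sq)) / (2.0 * h_k)
    d_b = (abs_z(u, v, k_sol, b_sol + h_b, s_sq)
           - abs_z(u, v, k_sol, b_sol - h_b, s_sq)) / (2.0 * h_b)
    return d_k, d_b
```

Multiplying these derivatives by the anisotropic scale factor and by the derivative of Ψ with respect to the model amplitude gives the gradient components needed by a first-derivative minimizer.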
Acknowledgements
This work was supported in part by the US Department of Energy under Contract No. DE-AC03-76SF00098 and NIH/NIGMS grant 1P01GM063210. We thank Andrey Fokine (Purdue University) and Alexander Urzhumtsev (LCM3B Lab, France) for useful discussions.
References
Badger, J. (1997). Methods Enzymol. 277, 344–352.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.
Blanc, E., Roversi, P., Vonrhein, C., Flensburg, C., Lea, S. M. & Bricogne, G. (2004). Acta Cryst. D60, 2210–2221.
Bricogne, G. (1991). Acta Cryst. A47, 803–829.
Bricogne, G. & Irwin, J. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 85–92. Warrington: Daresbury Laboratory.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Brünger, A. T., Karplus, M. & Petsko, G. A. (1989). Acta Cryst. A45, 50–61.
Fokine, A. V., Afonine, P. V., Mikhailova, I. Yu., Tsygannik, I. N., Mareeva, T. Yu., Nesmeyanov, V. A., Pangborn, W., Li, N., Duax, W., Siszak, E. & Pletnev, V. Z. (2000). Russ. J. Bioorg. Chem. 26, 512–519.
Fokine, A. & Urzhumtsev, A. (2002a). Acta Cryst. D58, 1387–1392.
Fokine, A. & Urzhumtsev, A. (2002b). Acta Cryst. A58, 72–74.
Grosse-Kunstleve, R. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 477–480.
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.
Jiang, J.-S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100–115.
Kostrewa, D. (1997). CCP4 Newsl. 34, 9–22.
Liu, D. C. & Nocedal, J. (1989). Math. Program. 45, 503–528.
Lunin, V. Y. & Skovoroda, T. P. (1995). Acta Cryst. A51, 880–887.
Lunin, V. & Skovoroda, T. (1997). In Validation and Refinement of Macromolecular Structures, Porto, Portugal, August 29–30, 1997, Collected Abstracts.
Lunin, V. Y. & Urzhumtsev, A. (1984). Acta Cryst. A40, 269–277.
Moews, P. C. & Kretsinger, R. H. (1975). J. Mol. Biol. 91, 201–228.
Murshudov, G. N., Davies, G. J., Isupov, M., Krzywda, S. & Dodson, E. J. (1998). CCP4 Newsl. Protein Crystallogr. 35, 37–43.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.
Pannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659–668.
Parkin, S., Moezzi, B. & Hope, H. (1995). J. Appl. Cryst. 28, 53–56.
Phillips, S. E. V. (1980). J. Mol. Biol. 142, 531–554.
Podjarny, A. D. & Urzhumtsev, A. G. (1997). Methods Enzymol. 276, 641–658.
Read, R. J. (1986). Acta Cryst. A42, 140–149.
Read, R. J. (1990). Acta Cryst. A46, 900–912.
Read, R. J. (2001). Acta Cryst. D57, 1373–1382.
Sheldrick, G. M. & Schneider, T. R. (1997). Methods Enzymol. 277, 319–343.
Sheriff, S. & Hendrickson, W. A. (1987). Acta Cryst. A43, 118–121.
Tronrud, D. E. (1997). Methods Enzymol. 277, 306–319.
Urzhumtsev, A. (1991). Acta Cryst. A47, 794–801.
Urzhumtsev, A. G. (2000). CCP4 Newsl. 38, 38–49.
Urzhumtsev, A. G. & Podjarny, A. D. (1995). Acta Cryst. D51, 888–895.
Urzhumtsev, A., Skovoroda, T. P. & Lunin, V. Y. (1996). J. Appl. Cryst. 29, 741–744.
Usón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H. J. & Sheldrick, G. M. (1999). Acta Cryst. D55, 1158–1167.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.