research papers
A robust bulk-solvent correction and anisotropic scaling procedure
^{a}Lawrence Berkeley National Laboratory, One Cyclotron Road, Building 64R0121, Berkeley, CA 94720, USA
^{*}Correspondence email: pafonine@lbl.gov
A reliable method for the determination of bulk-solvent model parameters and an overall anisotropic scale factor is of increasing importance as structure determination becomes more automated. Current protocols require the manual inspection of results in order to detect errors in the calculation of these parameters. Here, a robust method for determining bulk-solvent and anisotropic scaling parameters in macromolecular refinement is described. The implementation of a maximum-likelihood target function for determining the same parameters is also discussed. The formulas and corresponding derivatives of the likelihood function with respect to the solvent parameters and the components of the anisotropic scale matrix are presented. These algorithms are implemented in the CCTBX bulk-solvent correction and scaling module.
Keywords: bulk-solvent correction; anisotropic scaling.
1. Introduction
Analysis of the Protein Data Bank (PDB; Bernstein et al., 1977; Berman et al., 2000) shows that macromolecular crystals contain a significant amount of disordered solvent. The total solvent content varies around a mean of 55%, with a lower bound of approximately 20% and an upper bound of approximately 95%. The contribution of this bulk solvent to the diffracted amplitudes becomes non-negligible at lower resolution (d > 8.0 Å). In the past, it has been common practice to truncate the low-resolution data and use only middle- and high-resolution shells for crystallographic calculations. More recently, it has been demonstrated that low-resolution data are very important for electron-density map analysis (Urzhumtsev, 1991), crystallographic refinement (Kostrewa, 1997) and the translation search in the molecular-replacement method (Urzhumtsev & Podjarny, 1995; Fokine & Urzhumtsev, 2002b). For a review and more complete set of references see, for example, Jiang & Brünger (1994), Badger (1997) and Urzhumtsev (2000).
Jiang & Brünger (1994) demonstrated that a flat bulk-solvent model (Phillips, 1980) is the most reliable model and proposed an algorithm for calculation of the parameters. This involves the calculation of a solvent mask and the determination of two bulk-solvent parameters, k_{sol} and B_{sol}. Fokine & Urzhumtsev (2002a) analyzed the distribution of bulk-solvent parameters and provided more physical insight into this model. Alternatively, an exponential model for correcting for the effects of bulk solvent (Moews & Kretsinger, 1975; Tronrud, 1997) can be used. This is available in some programs: SHELX (Sheldrick & Schneider, 1997), REFMAC (Murshudov et al., 1997; REFMAC also provides the option for the flat bulk solvent described above) and TNT (Tronrud, 1997). However, it has been shown that this method is only correct at very low resolution (lower than 15 Å) and inappropriate at higher resolution (Podjarny & Urzhumtsev, 1997). Therefore, in this work we only consider the flat bulk-solvent model.
The bulk-solvent parameters k_{sol} and B_{sol} are usually determined along with an overall scale factor between observed and calculated structure factors. It was demonstrated that the use of an anisotropic overall scale factor is physically more appropriate and can significantly reduce both the R and R_{free} factors (Sheriff & Hendrickson, 1987; Murshudov et al., 1998). The criterion traditionally used to attain this goal is
$$LS = \frac{1}{N}\sum_{s}\left(|F_{s}^{obs}| - k\,|F_{s}^{model}|\right)^{2}, \qquad (1)$$
where N = Σ_{s}|F_{s}^{obs}|^{2} is a normalization factor (Brünger et al., 1989; Jiang & Brünger, 1994), the model structure factors
$$F_{s}^{model} = k_{aniso}\left(F_{s}^{calc} + F_{s}^{bulk}\right) \qquad (2)$$
accumulate structure factors from the atomic model F^{calc} (macromolecule plus ordered solvent), the contribution from the bulk solvent
$$F_{s}^{bulk} = k_{sol}\exp\left(-B_{sol}s^{2}/4\right)F_{s}^{mask} \qquad (3)$$
and the overall anisotropic scale factor k_{aniso}, which can be either in exponential form (Sheriff & Hendrickson, 1987) with six parameters to be determined, as implemented in CNS (Brünger et al., 1998) and REFMAC (Murshudov et al., 1998),
$$k_{aniso} = \exp\left(-h^{t}A^{-1}B_{cart}A^{-t}h/4\right), \qquad (4)$$
or a linear function of 12 parameters as implemented in SHELXL (Usón et al., 1999; Parkin et al., 1995). In this work, we consider only the exponential form of the anisotropic scale factor (4).
The scale k is chosen such that the derivative of LS with respect to k is zero, k = Σ_{s}|F_{s}^{obs}||F_{s}^{model}| / Σ_{s}|F_{s}^{model}|^{2}, which is a necessary condition for LS to be minimal (Brünger et al., 1989). Here h is a column vector with the Miller indices of a reflection, h^{t} is the transposed vector, B_{cart}, the overall anisotropic scale matrix, has the same units and conversion rules as B_{cart} defined in equations (2), (3b) and (7) of Grosse-Kunstleve & Adams (2002), A is an orthogonalization matrix, k_{sol} and B_{sol} are the flat bulk-solvent model parameters, s^{2} = h^{t}G*h, where G* is the reciprocal-space metric tensor, and F^{mask} are the structure factors calculated from a molecular mask (a binary function with zero values in the protein region and unit values in the solvent region). The use of B_{cart} makes it straightforward to apply the isotropic component of the tensor to both B_{sol} and the atomic isotropic B factors in order to compensate for the high correlation of these parameters with the overall anisotropic scale matrix.
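The least-squares criterion and its optimal overall scale can be illustrated with a few lines of Python. This is only a minimal sketch, not the CCTBX implementation: it assumes the conventional form LS = Σ(|F^{obs}| − k|F^{model}|)^{2} / Σ|F^{obs}|^{2}, takes the anisotropic scale as a precomputed number per reflection, and uses hypothetical function names (`f_model`, `ls_target`).

```python
import math


def f_model(f_calc, f_mask, s_sq, k_aniso, k_sol, b_sol):
    """Complex model structure factor of one reflection.

    f_calc, f_mask: complex structure factors of the atomic model and the mask.
    s_sq: s^2 of the reflection; k_aniso: anisotropic scale, assumed precomputed.
    """
    f_bulk = k_sol * math.exp(-b_sol * s_sq / 4.0) * f_mask
    return k_aniso * (f_calc + f_bulk)


def ls_target(f_obs, f_mod):
    """Return (LS, k) for observed amplitudes and complex model structure factors."""
    a_mod = [abs(f) for f in f_mod]
    # Scale k for which the derivative of LS with respect to k vanishes.
    k = sum(o * m for o, m in zip(f_obs, a_mod)) / sum(m * m for m in a_mod)
    n = sum(o * o for o in f_obs)  # normalization factor N
    ls = sum((o - k * m) ** 2 for o, m in zip(f_obs, a_mod)) / n
    return ls, k
```

When the observed amplitudes are an exact multiple of the model amplitudes, the function returns that multiple as k and a residual of zero, which is a convenient sanity check before the routine is applied to real data.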
The correction for bulk solvent and scaling is usually the first step in a crystallographic refinement protocol. If a least-squares-based procedure is chosen, where a target function of form (1) is used in optimization of atomic model parameters, then the use of the same target function for the determination of the scaling and bulk-solvent parameters is well justified. However, if the maximum-likelihood-based strategy is chosen (Bricogne, 1991; Pannu & Read, 1996; Bricogne & Irwin, 1996; Murshudov et al., 1997), the use of function (1) for bulk-solvent and scale-parameter determination is less justified. In this case, it is more natural to also determine the bulk-solvent and anisotropic scale parameters from the likelihood function, allowing all the parameters to be optimized using the same criterion. The use of a likelihood function for the determination of bulk-solvent parameters has been discussed by Blanc et al. (2004).
It has been observed that the determination of bulk-solvent parameters is a numerically challenging problem (Jiang & Brünger, 1994; Fokine & Urzhumtsev, 2002a). Inclusion of the anisotropic overall scale factor makes the problem even more complicated. Some possible reasons for this are the following.

Therefore, it is not surprising to find 95 models in the PDB (see selection criteria below; scoring performed August 2004) with bulk-solvent parameters beyond the physically meaningful range discussed in Fokine & Urzhumtsev (2002a).
In this paper, we describe a robust protocol for the determination of bulk-solvent and anisotropic scaling parameters using both maximum-likelihood and least-squares target functions and its implementation in the Computational Crystallographic Toolbox (CCTBX; Grosse-Kunstleve et al., 2002).
2. The maximum-likelihood target function and its derivatives with respect to bulk-solvent parameters and components of the anisotropic scale matrix
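The negative logarithm of the maximum-likelihood function (Lunin & Skovoroda, 1995), which is implemented in CCTBX as one of the crystallographic target functions for structure refinement, can be presented as
$$ML = \sum_{s}\Psi\left(|F_{s}^{calc}|\right), \qquad (5)$$
with
$$\Psi(F) = \begin{cases} -\ln\left(\dfrac{2|F_{s}^{obs}|}{\epsilon_{s}\beta_{s}}\right) + \dfrac{|F_{s}^{obs}|^{2} + \alpha_{s}^{2}F^{2}}{\epsilon_{s}\beta_{s}} - \ln I_{0}\left(\dfrac{2\alpha_{s}|F_{s}^{obs}|F}{\epsilon_{s}\beta_{s}}\right), & \text{acentric reflections,} \\[2ex] \dfrac{1}{2}\ln\left(\dfrac{\pi\epsilon_{s}\beta_{s}}{2}\right) + \dfrac{|F_{s}^{obs}|^{2} + \alpha_{s}^{2}F^{2}}{2\epsilon_{s}\beta_{s}} - \ln\cosh\left(\dfrac{\alpha_{s}|F_{s}^{obs}|F}{\epsilon_{s}\beta_{s}}\right), & \text{centric reflections,} \end{cases} \qquad (6)$$
where I_{0} is the modified Bessel function of order zero.
Here, |F_{s}^{calc}| is the calculated structure-factor magnitude for the reflection s from the available atomic model. The coefficient ∊_{s} depends on the three-dimensional index s and on the space group and is equal to the number of symmetry operations that, when applied to the vector s, leave it unchanged. The parameters α_{s} and β_{s} accumulate the uncertainties in atomic coordinates and temperature factors (Lunin & Urzhumtsev, 1984; Read, 1986, 1990, 2001; Lunin & Skovoroda, 1995; Pannu & Read, 1996; Urzhumtsev et al., 1996). It is worth noting that the scale coefficient between observed and calculated structure factors, if not introduced explicitly, is also accumulated in these two parameters.
The explicit introduction of the anisotropic scale factor and the contribution from the bulk solvent into (5) can be realised by replacing |F_{s}^{calc}| with |F_{s}^{model}| as defined in (2),
$$ML = \sum_{s}\Psi\left(|F_{s}^{model}|\right). \qquad (7)$$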
The derivatives of Ψ with respect to the six anisotropic scale-matrix elements B_{cart} and the solvent parameters k_{sol} and B_{sol} required for first-derivative minimization methods such as L-BFGS (Liu & Nocedal, 1989) are provided in Appendix A.
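As an illustration of how a per-reflection likelihood term of this kind might be evaluated, the following pure-Python sketch assumes the Rice-distribution form that is customary in the likelihood literature: for acentric reflections Ψ = −ln(2F^{obs}/∊β) + (F^{obs2} + α^{2}F^{2})/∊β − ln I_{0}(2αF^{obs}F/∊β), with the analogous cosh expression for centric reflections. The function names `bessel_i0` and `psi` are hypothetical and this is not the CCTBX code; the series for I_{0} is adequate for moderate arguments only, and a production implementation would use a scaled Bessel function to avoid overflow.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0 from its power series (moderate x only)."""
    q = x * x / 4.0
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= q / (k * k)  # term_k = q**k / (k!)**2
        total += term
    return total


def psi(f_obs, f_mod, alpha, beta, eps=1.0, centric=False):
    """Negative log-likelihood contribution of one reflection (assumed form)."""
    eb = eps * beta
    x = alpha * f_obs * f_mod / eb
    if centric:
        return (0.5 * math.log(math.pi * eb / 2.0)
                + (f_obs ** 2 + alpha ** 2 * f_mod ** 2) / (2.0 * eb)
                - math.log(math.cosh(x)))
    return (-math.log(2.0 * f_obs / eb)
            + (f_obs ** 2 + alpha ** 2 * f_mod ** 2) / eb
            - math.log(bessel_i0(2.0 * x)))


def ml_target(f_obs, f_mod, alpha, beta, eps, centric_flags):
    """Sum of psi over all reflections; all arguments are equal-length lists."""
    return sum(psi(o, m, a, b, e, c)
               for o, m, a, b, e, c in zip(f_obs, f_mod, alpha, beta, eps, centric_flags))
```

In this sketch the model amplitude enters only through `f_mod`, so the bulk-solvent and anisotropic corrections are introduced simply by passing |F^{model}| instead of |F^{calc}|.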
3. Algorithm for determination of k_{sol}, B_{sol} and B_{cart}
Fokine & Urzhumtsev (2002a) have shown that the bulk-solvent parameters k_{sol} and B_{sol} are distributed around 0.35 e Å^{−3} and 46 Å^{2} and the physically reasonable range for these parameters can be approximately defined as k_{sol} ∈ (0.1, 0.8) e Å^{−3} and B_{sol} ∈ (10, 80) Å^{2}. These observations make it possible to implement a systematic search procedure for the determination of k_{sol} and B_{sol}, therefore making the whole protocol very robust and insensitive to the potential minimization problems mentioned above.
Fig. 1 outlines the algorithm implemented in CCTBX using the likelihood function. Starting from zero values for k_{sol}, B_{sol} and B_{cart}, the values for α and β (Lunin & Skovoroda, 1995) are calculated using cross-validation data with smoothing over resolution shells using spline functions (Lunin & Skovoroda, 1997). The value of the ML function (7) is evaluated at this initial point. In the next step, a grid-search procedure is applied in order to find k_{sol} and B_{sol}: for each trial pair (k_{sol}, B_{sol}) the parameters α, β are updated and the value of ML is recalculated. The set of (α, β, k_{sol}, B_{sol}) with the minimum value of the function ML is then selected. The L-BFGS minimization algorithm is used to optimize ML with respect to the six components of the B_{cart} tensor with the parameters α, β, k_{sol} and B_{sol} found in the previous step held constant. Symmetry restrictions are applied to the elements of B_{cart} (Sheriff & Hendrickson, 1987); however, they can optionally be turned off. The value of the ML function is evaluated again in order to determine whether the procedure has converged; convergence has taken place when the difference of the target function between two steps is less than a certain tolerance value. This tolerance value is fixed as 1% of the relative drop in the target function value. Otherwise, the procedure is repeated starting with the set of parameters obtained in the previous step until convergence is reached.
For reasons of efficiency, the sampling step used in the grid-search procedure is quite coarse. For example, B_{sol} is by default varied within the range 10–80 Å^{2} with a sampling step of 5 Å^{2}. Finer sampling can be used, but increases the computational time. The parameters k_{sol} and B_{sol} obtained in such a way are then used as the start values for the next calculations, which are the same as above but with the grid search for k_{sol} and B_{sol} replaced with the L-BFGS minimization. This allows k_{sol} and B_{sol} to be determined more precisely. However, if the minimization fails the best parameters from the previous step are retained. The procedure using the LS function (1) as a criterion is implemented in a similar way. The default parameters for the mask calculation are r_{solv} = 1.0 Å and r_{shrink} = 1.0 Å and the grid step is the highest resolution of the data divided by 4 (for the definition of these parameters, see Jiang & Brünger, 1994).
It should be emphasized that all available data are used throughout the procedure without any partitioning by resolution.
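The grid-search step can be sketched in pure Python as follows. This is a simplified illustration rather than the CCTBX implementation: it uses a least-squares residual of the conventional form Σ(|F^{obs}| − k|F^{model}|)^{2} / Σ|F^{obs}|^{2} instead of the likelihood target, omits the anisotropic scale and the α, β update, and assumes a k_{sol} step of 0.05 e Å^{−3} (only the B_{sol} step of 5 Å^{2} is given in the text). All function names and the synthetic data generator are hypothetical.

```python
import cmath
import math


def ls_residual(f_obs, f_calc, f_mask, s_sq, k_sol, b_sol):
    """Normalized least-squares residual for one trial (k_sol, b_sol) pair."""
    a_mod = [abs(fc + k_sol * math.exp(-b_sol * s2 / 4.0) * fm)
             for fc, fm, s2 in zip(f_calc, f_mask, s_sq)]
    # Overall scale k that zeroes the derivative of the residual.
    k = sum(o * m for o, m in zip(f_obs, a_mod)) / sum(m * m for m in a_mod)
    return (sum((o - k * m) ** 2 for o, m in zip(f_obs, a_mod))
            / sum(o * o for o in f_obs))


def grid_search(f_obs, f_calc, f_mask, s_sq,
                k_min=0.1, k_max=0.8, k_step=0.05,  # k_step is an assumption
                b_min=10.0, b_max=80.0, b_step=5.0):
    """Return (residual, k_sol, b_sol) of the best grid point."""
    n_k = int(round((k_max - k_min) / k_step)) + 1
    n_b = int(round((b_max - b_min) / b_step)) + 1
    best = None
    for i in range(n_k):
        k_sol = k_min + i * k_step
        for j in range(n_b):
            b_sol = b_min + j * b_step
            r = ls_residual(f_obs, f_calc, f_mask, s_sq, k_sol, b_sol)
            if best is None or r < best[0]:
                best = (r, k_sol, b_sol)
    return best


def make_synthetic_data(k_sol, b_sol, n=40):
    """Deterministic toy reflections with a known bulk-solvent contribution."""
    s_sq = [0.001 * (i + 1) for i in range(n)]
    f_calc = [cmath.rect(50.0 + 3.0 * i, 0.7 * i) for i in range(n)]
    f_mask = [cmath.rect(400.0 / (1 + i), 0.7 * i + 2.5 + 0.3 * math.sin(i))
              for i in range(n)]
    f_obs = [abs(fc + k_sol * math.exp(-b_sol * s2 / 4.0) * fm)
             for fc, fm, s2 in zip(f_calc, f_mask, s_sq)]
    return f_obs, f_calc, f_mask, s_sq
```

Calling `grid_search(*make_synthetic_data(0.25, 55.0))` recovers the generating parameters because they lie on the default grid; with real data the best grid point would only serve as the starting value for the subsequent gradient-based minimization.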
4. Numerical tests
The goal of this test was to compare the performance of the two proposed algorithms, based on the least-squares (1) and maximum-likelihood (7) target functions, using simulated models of different quality with simulated experimental data.
We used the model of a Fab fragment of a monoclonal antibody (Fokine et al., 2000) which consists of 439 amino-acid residues and 213 water molecules. The crystals belong to space group P2_{1}2_{1}2_{1}, with unit-cell parameters a = 72.24, b = 72.01, c = 86.99 Å. The values of F^{obs} were simulated by the amplitudes of structure factors calculated from the complete exact model at 2.2 Å resolution. The contributions of bulk solvent with k_{sol} = 0.25 e Å^{−3} and B_{sol} = 55.0 Å^{2} and anisotropy with the diagonal elements (4, 8, −6) Å^{2} were added to the simulated values in accordance with (2) and (3). Random errors with mean values in the range 0.0–0.6 Å were then introduced into the atomic coordinates of the complete exact model. Incomplete models were obtained by random deletion of 5 and 10% of atoms from the ensemble of models with errors; this generated a total of 21 models.
Fig. 2 shows the distribution of bulk-solvent parameters obtained using (1) and (7) as the target functions. With the exception of two pairs, all pairs of k_{sol} and B_{sol} obtained with the likelihood target are within the physically reasonable range and, depending on the model quality, relatively close to the exact values of 0.25 e Å^{−3} and 55.0 Å^{2}. In contrast, most of the solvent parameters calculated using the least-squares function are outside the correct range, with some values for B_{sol} reaching 200 Å^{2}. This is not unexpected, as the least-squares target does not include any mechanism to correct for model incompleteness and hence all eight adjustable parameters, k_{sol}, B_{sol} and B_{cart}, model the contribution from bulk solvent and anisotropy along with the model errors and incompleteness. For the likelihood-based target function, the distribution parameters α and β compensate for model errors and incompleteness. It is the high correlation between all of the model parameters which makes it necessary to develop the thorough and robust algorithm described in the previous section.
5. Tests with experimental data
In order to evaluate this new procedure for bulk-solvent correction and anisotropic scaling, we selected all `problem' models from the PDB, i.e. those with physically unreasonable values for the flat bulk-solvent model parameters. The exact selection criteria were structures solved by X-ray diffraction with the flat bulk-solvent model used, k_{sol} < 0.1 or k_{sol} > 1.0 e Å^{−3} and B_{sol} < 10 or B_{sol} > 100 Å^{2}. This selected 95 models. The further demand for experimental data and cross-validation flags (`test' set of reflections) combined with an evaluation of the overall data correctness reduced the selected number of models to 35.
In most cases the new procedure yields physically reasonable parameters using both LS and ML target functions (Fig. 3). However, for some models (for example, PDB codes 1jh7, 1k33, 1kk7, 1lee, 1r30 and 2gwx) the parameters k_{sol} and B_{sol} were outside the reasonable range, which may indicate insufficient data or poor model quality. In such cases the procedure sets the parameters to the best found in the search grid in step I (Fig. 1).
In order to evaluate the model improvement arising from more reasonable bulk-solvent parameters, R factors versus resolution were calculated for all selected models and a typical example for one model (PDB code 1jj1) is presented in Fig. 4(a). The use of corrected parameters significantly improves the fit for the low-resolution data, while the R factor calculated with the unreasonable parameters, taken from the PDB file, is 6% higher in the lowest resolution shell and about 11% higher for the case where no correction was performed. Analogous calculations were performed using the maximum-likelihood target function (Fig. 4b). Again, the parameters determined with the new method improve the likelihood target function compared with calculations with incorrect parameters or without any scaling and solvent correction.
In addition, tests were performed in order to compare the calculation of flat bulk-solvent and anisotropic scaling parameters in selected programs that provide this option (Fig. 5). In many cases CNS1.1 performs significantly better than CNS1.0 (Fig. 5a). This is because the bulk-solvent correction procedure in CNS1.1 was improved by changing the initial values for k_{sol} and B_{sol} from zero to the observed mean values (Fokine & Urzhumtsev, 2002a), 0.35 e Å^{−3} and 46.0 Å^{2}, respectively. In some cases CNS1.1 gives similar or slightly worse results than CCTBX (Fig. 5a). However, there are cases where the new procedure gives noticeably better results than both CNS1.0 and CNS1.1 (Fig. 5b). Finally, analogous calculations of flat bulk-solvent correction and anisotropic scaling with REFMAC using the SCALE SIMPLE option gave similar results to those seen with CNS1.0.
6. Conclusions
A robust method for the determination of the anisotropic scale factor and flat bulk-solvent model parameters is required as structure determination becomes more automated. The new method we have described here, in combination with the likelihood function for optimization of the parameters, will minimize the occurrence of errors. The robustness of the algorithm has been proven on 35 structures selected from the PDB where unreasonable bulk-solvent parameters were reported. In most of these cases the new procedure found values close to those typically observed in refined structures. In our tests, the new procedure is as good as or better than CNS1.1 or REFMAC in determining optimum parameters for typical structures and works significantly better for `problem' structures.
These new algorithms are implemented in the CCTBX bulk-solvent correction and scaling module. CCTBX is available as open-source software at https://cctbx.sourceforge.net. All results presented are based on the CCTBX source code bundle with the version tag 2005_03_02_2358.
APPENDIX A
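The derivatives of the maximum-likelihood target function with respect to bulk-solvent parameters and components of the anisotropic scale matrix
Given the function Ψ defined by (6), its derivatives with respect to the six anisotropic scale-matrix elements (B_{cart})_{ij} can be obtained following the chain rule,
$$\frac{\partial\Psi}{\partial(B_{cart})_{ij}} = \frac{\partial\Psi}{\partial|F_{s}^{model}|}\,\frac{\partial|F_{s}^{model}|}{\partial(B_{cart})_{ij}},$$
where the function ∂Ψ/∂|F_{s}^{model}| is defined below.
The calculation of derivatives with respect to the bulk-solvent parameters k_{sol} and B_{sol} requires more attention. We can define a function |z| of complex variables with z = u + g(p)v, where u and v are complex variables and g(p) is a function with real arguments. Remembering that |z| = (z*z)^{1/2} and using the chain rule, one can obtain the derivative with respect to p as
$$\frac{\partial|z|}{\partial p} = \frac{g'(p)}{|z|}\left[\mathrm{Re}(uv^{*}) + g(p)|v|^{2}\right].$$
Replacing u, v and g(p) with F_{s}^{calc}, F_{s}^{mask} and k_{sol}exp(−B_{sol}s^{2}/4), and writing k_{aniso} for the overall anisotropic scale factor (4), the desired derivatives are
$$\frac{\partial\Psi}{\partial k_{sol}} = \frac{\partial\Psi}{\partial|F_{s}^{model}|}\,k_{aniso}\exp\left(-B_{sol}s^{2}/4\right)W_{s}, \qquad \frac{\partial\Psi}{\partial B_{sol}} = -\frac{\partial\Psi}{\partial|F_{s}^{model}|}\,k_{aniso}\,\frac{s^{2}}{4}\,k_{sol}\exp\left(-B_{sol}s^{2}/4\right)W_{s},$$
where
$$W_{s} = \frac{\mathrm{Re}\left(F_{s}^{calc}F_{s}^{mask*}\right) + k_{sol}\exp\left(-B_{sol}s^{2}/4\right)|F_{s}^{mask}|^{2}}{\left|F_{s}^{calc} + k_{sol}\exp\left(-B_{sol}s^{2}/4\right)F_{s}^{mask}\right|}$$
and
$$\frac{\partial\Psi}{\partial F} = \begin{cases} \dfrac{2\alpha_{s}^{2}F}{\epsilon_{s}\beta_{s}} - \dfrac{2\alpha_{s}|F_{s}^{obs}|}{\epsilon_{s}\beta_{s}}\,\dfrac{I_{1}(x_{s})}{I_{0}(x_{s})}, \quad x_{s} = \dfrac{2\alpha_{s}|F_{s}^{obs}|F}{\epsilon_{s}\beta_{s}}, & \text{acentric reflections,} \\[2ex] \dfrac{\alpha_{s}^{2}F}{\epsilon_{s}\beta_{s}} - \dfrac{\alpha_{s}|F_{s}^{obs}|}{\epsilon_{s}\beta_{s}}\tanh\left(\dfrac{\alpha_{s}|F_{s}^{obs}|F}{\epsilon_{s}\beta_{s}}\right), & \text{centric reflections.} \end{cases}$$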
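The derivative of the modulus |u + g(p)v| is easy to get wrong by a sign or a conjugate, so a finite-difference check is a useful safeguard. The sketch below is an illustration only, with hypothetical function names: it implements the analytic expression g'(p) Re(v z*)/|z| for z = u + g(p)v, applies it with g = k_{sol}exp(−B_{sol}s^{2}/4) to obtain the derivatives of the unscaled amplitude |F^{calc} + F^{bulk}| with respect to k_{sol} and B_{sol}, and leaves out the anisotropic scale and the likelihood factor ∂Ψ/∂F.

```python
import math


def amplitude(f_calc, f_mask, s_sq, k_sol, b_sol):
    """|F_calc + k_sol * exp(-B_sol * s^2 / 4) * F_mask| for one reflection."""
    return abs(f_calc + k_sol * math.exp(-b_sol * s_sq / 4.0) * f_mask)


def d_abs_z_dp(u, v, g, dg_dp):
    """Analytic derivative of |u + g(p) v| with respect to the real parameter p."""
    z = u + g * v
    # Re(v z*) equals Re(u v*) + g |v|^2.
    return dg_dp * (v * z.conjugate()).real / abs(z)


def solvent_gradients(f_calc, f_mask, s_sq, k_sol, b_sol):
    """Derivatives of the amplitude with respect to k_sol and b_sol."""
    e = math.exp(-b_sol * s_sq / 4.0)
    g = k_sol * e
    d_k = d_abs_z_dp(f_calc, f_mask, g, e)                   # dg/dk_sol = e
    d_b = d_abs_z_dp(f_calc, f_mask, g, -0.25 * s_sq * g)    # dg/dB_sol = -(s^2/4) g
    return d_k, d_b
```

Comparing `solvent_gradients` with central differences of `amplitude` at a few reflections is a quick way to validate the formulas before they are passed to a gradient-based minimizer.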
Acknowledgements
This work was supported in part by the US Department of Energy under Contract No. DE-AC03-76SF00098 and NIH/NIGMS grant 1P01GM063210. We thank Andrey Fokine (Purdue University) and Alexander Urzhumtsev (LCM^{3}B Lab, France) for useful discussions.
References
Badger, J. (1997). Methods Enzymol. 277, 344–352.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.
Blanc, E., Roversi, P., Vonrhein, C., Flensburg, C., Lea, S. M. & Bricogne, G. (2004). Acta Cryst. D60, 2210–2221.
Bricogne, G. (1991). Acta Cryst. A47, 803–829.
Bricogne, G. & Irwin, J. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 85–92. Warrington: Daresbury Laboratory.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Brünger, A. T., Karplus, M. & Petsko, G. A. (1989). Acta Cryst. A45, 50–61.
Fokine, A. V., Afonine, P. V., Mikhailova, I. Yu., Tsygannik, I. N., Mareeva, T. Yu., Nesmeyanov, V. A., Pangborn, W., Li, N., Duax, W., Siszak, E. & Pletnev, V. Z. (2000). Russ. J. Bioorg. Chem. 26, 512–519.
Fokine, A. & Urzhumtsev, A. (2002a). Acta Cryst. D58, 1387–1392.
Fokine, A. & Urzhumtsev, A. (2002b). Acta Cryst. A58, 72–74.
Grosse-Kunstleve, R. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 477–480.
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.
Jiang, J.-S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100–115.
Kostrewa, D. (1997). CCP4 Newsl. 34, 9–22.
Liu, D. C. & Nocedal, J. (1989). Math. Program. 45, 503–528.
Lunin, V. Y. & Skovoroda, T. P. (1995). Acta Cryst. A51, 880–887.
Lunin, V. & Skovoroda, T. (1997). In Validation and Refinement of Macromolecular Structures, Porto, Portugal, August 29–30, 1997, Collected Abstracts.
Lunin, V. Y. & Urzhumtsev, A. (1984). Acta Cryst. A40, 269–277.
Moews, P. C. & Kretsinger, R. H. (1975). J. Mol. Biol. 91, 201–228.
Murshudov, G. N., Davies, G. J., Isupov, M., Krzywda, S. & Dodson, E. J. (1998). CCP4 Newsl. Protein Crystallogr. 35, 37–43.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.
Pannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659–668.
Parkin, S., Moezzi, B. & Hope, H. (1995). J. Appl. Cryst. 28, 53–56.
Phillips, S. E. V. (1980). J. Mol. Biol. 142, 531–554.
Podjarny, A. D. & Urzhumtsev, A. G. (1997). Methods Enzymol. 276, 641–658.
Read, R. J. (1986). Acta Cryst. A42, 140–149.
Read, R. J. (1990). Acta Cryst. A46, 900–912.
Read, R. J. (2001). Acta Cryst. D57, 1373–1382.
Sheldrick, G. M. & Schneider, T. R. (1997). Methods Enzymol. 277, 319–343.
Sheriff, S. & Hendrickson, W. A. (1987). Acta Cryst. A43, 118–121.
Tronrud, D. E. (1997). Methods Enzymol. 277, 306–319.
Urzhumtsev, A. (1991). Acta Cryst. A47, 794–801.
Urzhumtsev, A. G. (2000). CCP4 Newsl. 38, 38–49.
Urzhumtsev, A. G. & Podjarny, A. D. (1995). Acta Cryst. D51, 888–895.
Urzhumtsev, A., Skovoroda, T. P. & Lunin, V. Y. (1996). J. Appl. Cryst. 29, 741–744.
Usón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H. J. & Sheldrick, G. M. (1999). Acta Cryst. D55, 1158–1167.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.