research papers

BIOLOGICAL
CRYSTALLOGRAPHY
ISSN: 1399-0047
Volume 61| Part 7| July 2005| Pages 850-855

A robust bulk-solvent correction and anisotropic scaling procedure


Lawrence Berkeley National Laboratory, One Cyclotron Road, Building 64R0121, Berkeley, CA 94720, USA
*Correspondence e-mail: pafonine@lbl.gov

(Received 16 December 2004; accepted 11 March 2005)

A reliable method for the determination of bulk-solvent model parameters and an overall anisotropic scale factor is of increasing importance as structure determination becomes more automated. Current protocols require the manual inspection of refinement results in order to detect errors in the calculation of these parameters. Here, a robust method for determining bulk-solvent and anisotropic scaling parameters in macromolecular refinement is described. The implementation of a maximum-likelihood target function for determining the same parameters is also discussed. The formulas and corresponding derivatives of the likelihood function with respect to the solvent parameters and the components of anisotropic scale matrix are presented. These algorithms are implemented in the CCTBX bulk-solvent correction and scaling module.

1. Introduction

Analysis of the Protein Data Bank (PDB; Bernstein et al., 1977[Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535-542.]; Berman et al., 2000[Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235-242.]) shows that macromolecular crystals contain a significant amount of disordered solvent. The total solvent content varies around a mean of 55%, with a lower bound of approximately 20% and an upper bound of approximately 95%. The contribution of this bulk solvent to the diffracted amplitudes becomes non-negligible at lower resolution (d > 8.0 Å). In the past, it has been common practice to truncate the low-resolution data and use only middle- and high-resolution shells for crystallographic calculations. More recently, it has been demonstrated that low-resolution data are very important for electron-density map analysis (Urzhumtsev, 1991[Urzhumtsev, A. (1991). Acta Cryst. A47, 794-801.]), crystallographic refinement (Kostrewa, 1997[Kostrewa, D. (1997). CCP4 Newsl. 34, 9-22.]) and the translation search in the molecular-replacement method (Urzhumtsev & Podjarny, 1995[Urzhumtsev, A. G. & Podjarny, A. D. (1995). Acta Cryst. D51, 888-895.]; Fokine & Urzhumtsev, 2002b[Fokine, A. & Urzhumtsev, A. (2002b). Acta Cryst. A58, 72-74.]). For a review and more complete set of references see, for example, Jiang & Brünger (1994[Jiang, J.-S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100-115.]), Badger (1997[Badger, J. (1997). Methods Enzymol. 277, 344-352.]) and Urzhumtsev (2000[Urzhumtsev, A. G. (2000). CCP4 Newsl. 38, 38-49.]).

Jiang & Brünger (1994[Jiang, J.-S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100-115.]) demonstrated that a flat bulk-solvent model (Phillips, 1980[Phillips, S. E. V. (1980). J. Mol. Biol. 142, 531-554.]) is the most reliable model and proposed an algorithm for calculation of the parameters. This involves the calculation of a solvent mask and the determination of two bulk-solvent parameters, ksol and Bsol. Fokine & Urzhumtsev (2002a[Fokine, A. & Urzhumtsev, A. (2002a). Acta Cryst. D58, 1387-1392.]) analyzed the distribution of bulk-solvent parameters and provided a more physical insight for this model. Alternatively, an exponential model for correcting for the effects of bulk solvent (Moews & Kretsinger, 1975[Moews, P. C. & Kretsinger, R. H. (1975). J. Mol. Biol. 91, 201-228.]; Tronrud, 1997[Tronrud, D. E. (1997). Methods Enzymol. 277, 306-319.]) can be used. This is available in some refinement programs: SHELX (Sheldrick & Schneider, 1997[Sheldrick, G. M. & Schneider, T. R. (1997). Methods Enzymol. 277, 319-343.]), REFMAC (Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.]; REFMAC also provides the option for the flat bulk solvent described above) and TNT (Tronrud, 1997[Tronrud, D. E. (1997). Methods Enzymol. 277, 306-319.]). However, it has been shown that this method is only correct at very low resolution (lower than 15 Å) and inappropriate at higher resolution (Podjarny & Urzhumtsev, 1997[Podjarny, A. D. & Urzhumtsev, A. G. (1997). Methods Enzymol. 276, 641-658.]). Therefore, in this work we only consider the flat bulk-solvent model.

The bulk-solvent parameters ksol and Bsol are usually determined along with an overall scale factor between observed and calculated structure factors. It was demonstrated that the use of an anisotropic overall scale factor is physically more appropriate and can significantly reduce both the R and Rfree factors (Sheriff & Hendrickson, 1987[Sheriff, S. & Hendrickson, W. A. (1987). Acta Cryst. A43, 118-121.]; Murshudov et al., 1998[Murshudov, G. N., Davies, G. J., Isupov, M., Krzywda, S. & Dodson, E. J. (1998). CCP4 Newsl. Protein Crystallogr. 35, 37-43.]). The criterion traditionally used to attain this goal is

[{\rm LS} = N\textstyle \sum \limits_{s}w_{s}(F_{s}^{\rm obs} - k|{\bf F}_{s}^{\rm model}|)^{2}, \eqno (1)]

where N = [1/\textstyle \sum_{s}(w_{s}F_{s}^{\rm obs})^{2}] is a normalization factor (Brünger et al., 1989[Brünger, A. T., Karplus, M. & Petsko, G. A. (1989). Acta Cryst. A45, 50-61.]; Jiang & Brünger, 1994[Jiang, J.-S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100-115.]), the model structure factors

[F_{s}^{\rm model} \equiv |{\bf F}_{s}^{\rm model}| = |{\bf F}_{s}^{\rm calc} + {\bf F}_{s}^{\rm solv}|f({\bf B}_{\rm cart}) \eqno (2)]

accumulate structure factors from the atomic model Fcalc (macromolecule plus ordered solvent), the contribution from the bulk solvent

[{\bf F}_{s}^{\rm solv} = k_{\rm sol}\exp\left (- {{B_{\rm sol}s^{2}}\over {4}}\right){\bf F}_{s}^{\rm mask} \eqno (3)]

and the overall anisotropic scale factor f(Bcart), which can be either in exponential form (Sheriff & Hendrickson, 1987[Sheriff, S. & Hendrickson, W. A. (1987). Acta Cryst. A43, 118-121.]) with six parameters to be determined, as implemented in CNS (Brünger et al., 1998[Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905-921.]) and REFMAC (Murshudov et al., 1998[Murshudov, G. N., Davies, G. J., Isupov, M., Krzywda, S. & Dodson, E. J. (1998). CCP4 Newsl. Protein Crystallogr. 35, 37-43.]),

[f({\bf B}_{\rm cart}) = \exp\left [- {{ {\bf h}^{t}{\bf A}^{-1}{\bf B}_{\rm cart}({\bf A}^{-1})^{t}{\bf h}}\over {4}}\right], \eqno (4)]

or the linear function of 12 parameters as implemented in SHELXL (Usón et al., 1999[Usón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H. J. & Sheldrick, G. M. (1999). Acta Cryst. D55, 1158-1167.]; Parkin et al., 1995[Parkin, S., Moezzi, B. & Hope, H. (1995). J. Appl. Cryst. 28, 53-56.]). In this work, we consider only the exponential form of the anisotropic scale factor (4).

The scale k is chosen such that the derivative of LS with respect to k is zero, [k = \textstyle \sum_{s}F_{s}^{\rm obs}F_{s}^{\rm model}/\sum_{s}(F_{s}^{\rm model})^{2}], which is a necessary condition to make LS minimal (Brünger et al., 1989[Brünger, A. T., Karplus, M. & Petsko, G. A. (1989). Acta Cryst. A45, 50-61.]). In these expressions, h is a column vector with the Miller indices of a reflection and [{\bf h}^{t}] is the transposed vector; Bcart, the overall anisotropic scale matrix, has the same units and conversion rules as Bcart defined in equations (2), (3b) and (7) of Grosse-Kunstleve & Adams (2002[Grosse-Kunstleve, R. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 477-480.]); A is an orthogonalization matrix; ksol and Bsol are the flat bulk-solvent model parameters; [s^{2} = {\bf h}^{t}{\bf G}^{*}{\bf h}], where G* is the reciprocal-space metric tensor; and Fmask are the structure factors calculated from a molecular mask (a binary function with zero values in the protein region and unit values in the solvent region). The use of Bcart makes it straightforward to apply the isotropic component of the tensor to both Bsol and the atomic isotropic B factors in order to compensate for the high correlation of these parameters with the overall anisotropic scale matrix.
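As an illustration of equations (1) to (4), the following Python sketch evaluates the anisotropic scale factor, the model amplitude and the least-squares scale and target for a list of reflections. It is a minimal sketch under stated assumptions, not the CCTBX implementation: the function names (f_aniso, f_model, ls_scale, ls_target) are hypothetical, unit weights w_s = 1 are assumed and the inverse orthogonalization matrix is passed in explicitly.

```python
import math


def f_aniso(h, a_inv, b_cart):
    """Exponential anisotropic scale factor, equation (4).

    h      : Miller indices (sequence of 3 numbers)
    a_inv  : inverse orthogonalization matrix A^-1 (3x3 nested list)
    b_cart : symmetric 3x3 matrix B_cart (nested list)
    """
    # v = (A^-1)^t h, so h^t A^-1 B_cart (A^-1)^t h equals v^t B_cart v.
    v = [sum(a_inv[j][i] * h[j] for j in range(3)) for i in range(3)]
    q = sum(v[i] * b_cart[i][j] * v[j] for i in range(3) for j in range(3))
    return math.exp(-q / 4.0)


def f_model(f_calc, f_mask, s2, k_sol, b_sol, aniso=1.0):
    """Model amplitude, equations (2) and (3); f_calc and f_mask are complex."""
    f_solv = k_sol * math.exp(-b_sol * s2 / 4.0) * f_mask  # equation (3)
    return abs(f_calc + f_solv) * aniso                    # equation (2)


def ls_scale(f_obs, f_mod):
    """Scale k that zeroes dLS/dk (assumption: unit weights)."""
    return sum(o * m for o, m in zip(f_obs, f_mod)) / sum(m * m for m in f_mod)


def ls_target(f_obs, f_mod):
    """Least-squares target, equation (1), with unit weights."""
    k = ls_scale(f_obs, f_mod)
    norm = 1.0 / sum(o * o for o in f_obs)
    return norm * sum((o - k * m) ** 2 for o, m in zip(f_obs, f_mod))
```

With observed amplitudes that are an exact multiple of the model amplitudes, ls_scale returns that multiple and ls_target is zero, which is a convenient sanity check before real data are used.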

The correction for bulk solvent and scaling is usually the first step in a crystallographic refinement protocol. If a least-squares-based refinement procedure is chosen, where a target function of form (1) is used in optimization of atomic model parameters, then the use of the same target function for the determination of the scaling and bulk-solvent parameters is well justified. However, if the maximum-likelihood-based refinement strategy is chosen (Bricogne, 1991[Bricogne, G. (1991). Acta Cryst. A47, 803-829.]; Pannu & Read, 1996[Pannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659-668.]; Bricogne & Irwin, 1996[Bricogne, G. & Irwin, J. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 85-92. Warrington: Daresbury Laboratory.]; Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.]), the use of function (1) for bulk-solvent and scale-parameter determination is less justified. In this case, it is more natural to also determine the bulk-solvent and anisotropic scale parameters from the likelihood function, allowing all the parameters to be optimized using the same criterion. The use of a likelihood function for the determination of bulk-solvent parameters has been discussed by Blanc et al. (2004[Blanc, E., Roversi, P., Vonrhein, C., Flensburg, C., Lea, S. M. & Bricogne, G. (2004). Acta Cryst. D60, 2210-2221.]).

It has been observed that the determination of bulk-solvent parameters is a numerically challenging problem (Jiang & Brünger, 1994[Jiang, J.-S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100-115.]; Fokine & Urzhumtsev, 2002a[Fokine, A. & Urzhumtsev, A. (2002a). Acta Cryst. D58, 1387-1392.]). Inclusion of the anisotropic overall scale factor makes the problem even more complicated. Some possible reasons for this are the following.

  • (i) The quality and/or completeness of the low-resolution diffraction data may be insufficient.

  • (ii) The starting values for ksol and Bsol may be far from the correct values.

  • (iii) The parameters ksol, Bsol, k and Bcart are highly correlated. This may result in instability of the minimization procedure.

  • (iv) Optimization of a function of two exponentials is generally a non-trivial problem.

Therefore, it is not surprising to find 95 models in the PDB (see selection criteria below; scoring performed August 2004) with bulk-solvent parameters beyond the physically meaningful range discussed in Fokine & Urzhumtsev (2002a[Fokine, A. & Urzhumtsev, A. (2002a). Acta Cryst. D58, 1387-1392.]).

In this paper, we describe a robust protocol for the determination of bulk-solvent and anisotropic scaling parameters using both maximum-likelihood and least-squares target functions and its implementation in the Computational Crystallographic Toolbox (CCTBX; Grosse-Kunstleve et al., 2002[Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126-136.]).

2. The maximum-likelihood target function and its derivatives with respect to bulk-solvent parameters and components of the anisotropic scale matrix

The negative logarithm of the maximum-likelihood function (Lunin & Skovoroda, 1995[Lunin, V. Y. & Skovoroda, T. P. (1995). Acta Cryst. A51, 880-887.]), which is implemented in CCTBX as one of the crystallographic target functions for structure refinement, can be presented as

[{\rm ML} = \textstyle \sum \limits_{{\bf s} \in S} \Psi (F_{\bf s}^{\rm calc}; F_{\bf s}^{\rm obs}, \alpha_{\bf s}, \beta_{\bf s}), \eqno (5)]

with

[\Psi = \cases {\displaystyle -\ln \left ({{2F_{\bf s}^{\rm obs}}\over {\varepsilon_{\bf s}\beta_{\bf s} }}\right) + {{(F_{\bf s}^{\rm obs})^{2}}\over {\varepsilon_{\bf s}\beta_{\bf s}}} + {{\alpha_{\bf s}^{2}(F_{\bf s}^{\rm calc})^{2}} \over {\varepsilon_{\bf s}\beta_{\bf s}}}\cr \quad - \ln I_{0} \left ({{2\alpha_{\bf s}F_{\bf s}^{\rm calc}F_{\bf s}^{\rm obs}}\over {\varepsilon_{\bf s}\beta_{\bf s}}}\right) &acentric reflections \cr \displaystyle -{1 \over 2}\ln \left ({2 \over {\pi \varepsilon_{\bf s}\beta_{\bf s}}}\right) + {{(F_{\bf s}^{\rm obs})^{2}}\over {2\varepsilon_{\bf s}\beta_{\bf s}}} + {{\alpha_{\bf s}^{2}(F_{\bf s}^{\rm calc})^{2}} \over {2\varepsilon_{\bf s}\beta_{\bf s}}}\cr \quad - \ln \cosh \left ({{\alpha_{\bf s}F_{\bf s}^{\rm calc}F_{\bf s}^{\rm obs}}\over {\varepsilon_{\bf s}\beta_{\bf s}}}\right)&centric reflections.} \eqno (6)]

Here, [F_{\bf s}^{\rm calc}] is the calculated structure-factor magnitude for the reflection s from the available atomic model. The coefficient [\varepsilon_{\bf s}] depends on the three-dimensional index s and on the space group and is equal to the number of symmetry operations that, when applied to the vector s, leave it unchanged. The parameters αs and βs accumulate the uncertainties in atomic coordinates and temperature factors (Lunin & Urzhumtsev, 1984[Lunin, V. Y. & Urzhumtsev, A. (1984). Acta Cryst. A40, 269-277.]; Read, 1986[Read, R. J. (1986). Acta Cryst. A42, 140-149.], 1990[Read, R. J. (1990). Acta Cryst. A46, 900-912.], 2001[Read, R. J. (2001). Acta Cryst. D57, 1373-1382.]; Lunin & Skovoroda, 1995[Lunin, V. Y. & Skovoroda, T. P. (1995). Acta Cryst. A51, 880-887.]; Pannu & Read, 1996[Pannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659-668.]; Urzhumtsev et al., 1996[Urzhumtsev, A., Skovoroda, T. P. & Lunin, V. Y. (1996). J. Appl. Cryst. 29, 741-744.]). It is worth noting that the scale coefficient between observed and calculated structure factors, if not introduced explicitly, is also accumulated in these two parameters.

The explicit introduction of the anisotropic scale factor and the contribution from the bulk solvent into (5) can be realised by replacing [F_{\bf s}^{\rm calc}] with [F_{\bf s}^{\rm model}] as defined in (2),

[{\rm ML} = \textstyle \sum \limits_{{\bf s} \in S}\Psi (F_{\bf s}^{\rm model}; F_{\bf s}^{\rm obs}, \alpha_{\bf s}, \beta_{\bf s}). \eqno (7)]

The derivatives of Ψ with respect to the six anisotropic scale-matrix elements Bcart and the solvent parameters ksol and Bsol required for first-derivative minimization methods such as LBFGS (Liu & Nocedal, 1989[Liu, D. C. & Nocedal, J. (1989). Math. Program. 45, 503-528.]) are provided in Appendix A.
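The per-reflection term Ψ of equation (6) and the sum (7) can be sketched in plain Python as follows. This is an illustrative sketch only, not the CCTBX code: the names log_i0, log_cosh, psi and ml_target are hypothetical, and ln I0 is evaluated from its power series, which is an assumption that is adequate for moderate arguments (a production code would use a scaled Bessel function to avoid overflow for very large arguments).

```python
import math


def log_i0(x):
    """ln I0(x) from the series I0(x) = sum_k (x/2)^(2k) / (k!)^2."""
    q = (x / 2.0) ** 2
    term = 1.0
    total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return math.log(total)


def log_cosh(x):
    """Numerically stable ln cosh(x)."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def psi(f_mod, f_obs, alpha, beta, eps=1.0, centric=False):
    """Negative log-likelihood term of equation (6) for one reflection."""
    eb = eps * beta
    if centric:
        return (-0.5 * math.log(2.0 / (math.pi * eb))
                + f_obs ** 2 / (2.0 * eb)
                + alpha ** 2 * f_mod ** 2 / (2.0 * eb)
                - log_cosh(alpha * f_mod * f_obs / eb))
    return (-math.log(2.0 * f_obs / eb)
            + f_obs ** 2 / eb
            + alpha ** 2 * f_mod ** 2 / eb
            - log_i0(2.0 * alpha * f_mod * f_obs / eb))


def ml_target(f_mod, f_obs, alpha, beta, eps, centric):
    """Equation (7): sum of psi over reflections (all arguments are lists)."""
    return sum(psi(m, o, a, b, e, c)
               for m, o, a, b, e, c in zip(f_mod, f_obs, alpha, beta, eps, centric))
```

For fixed α and β, psi is smaller when the model amplitude is close to the observed one than when it is far from it, which is the behaviour the grid search in the next section relies on.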

3. Algorithm for determination of ksol, Bsol and Bcart

Fokine & Urzhumtsev (2002a[Fokine, A. & Urzhumtsev, A. (2002a). Acta Cryst. D58, 1387-1392.]) have shown that the bulk-solvent parameters ksol and Bsol are distributed around 0.35 e Å⁻³ and 46 Å², respectively, and that the physically reasonable range for these parameters can be approximately defined as ksol ∈ (0.1, 0.8) e Å⁻³ and Bsol ∈ (10, 80) Å². These observations make it possible to implement a systematic search procedure for the determination of ksol and Bsol, which makes the whole protocol robust and insensitive to the potential minimization problems mentioned above.

Fig. 1 outlines the algorithm implemented in the CCTBX using the likelihood function. Starting from zero values for ksol, Bsol and Bcart, the values for α and β (Lunin & Skovoroda, 1995[Lunin, V. Y. & Skovoroda, T. P. (1995). Acta Cryst. A51, 880-887.]) are calculated using cross-validation data with smoothing over resolution shells using spline functions (Lunin & Skovoroda, 1997[Lunin, V. & Skovoroda, T. (1997). In Validation and Refinement of Macromolecular Structures, Porto, Portugal, August 29-30, 1997, Collected Abstracts.]). The value of the ML function (7) is evaluated at this initial point. In the next step, a grid-search procedure is applied in order to find ksol and Bsol: for each trial pair (ksol, Bsol) the parameters α and β are updated and the value of ML is recalculated. The set of (α, β, ksol, Bsol) with the minimum value of the function ML is then selected. The LBFGS minimization algorithm is used to optimize ML with respect to the six components of the Bcart tensor, with the parameters α, β, ksol and Bsol found in the previous step held constant. Symmetry restrictions are applied to the elements of Bcart (Sheriff & Hendrickson, 1987[Sheriff, S. & Hendrickson, W. A. (1987). Acta Cryst. A43, 118-121.]); however, they can optionally be turned off. The value of the ML function is evaluated again in order to determine whether the procedure has converged; convergence has taken place when the difference of the target function between two steps is less than a certain tolerance value. This tolerance value is fixed as 1% of the relative drop in the target function value. Otherwise, the procedure is repeated starting with the set of parameters obtained in the previous step until convergence is reached.

[Figure 1]
Figure 1
Algorithm for calculation of flat bulk-solvent model parameters ksol and Bsol and the anisotropic scale matrix Bcart as implemented in CCTBX.

For reasons of efficiency, the sampling step used in the grid-search procedure is quite coarse. For example, Bsol is by default varied within the range 10–80 Å² with a sampling step of 5 Å². Finer sampling can be used, but increases the computational time. The parameters ksol and Bsol obtained in such a way are then used as the start values for the next calculations, which are the same as above but with the grid search for ksol and Bsol replaced with the LBFGS minimization. This allows ksol and Bsol to be determined more precisely. However, if the minimization fails the best parameters from the previous step are retained. The procedure using the LS function (1) as a criterion is implemented in a similar way. The default parameters for the mask calculation are rsolv = 1.0 Å and rshrink = 1.0 Å and the grid step is the highest resolution of the data divided by 4 (for the definition of these parameters, see Jiang & Brünger, 1994[Jiang, J.-S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100-115.]).

It should be emphasized that all available data are used throughout the procedure without any partitioning by resolution.
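The grid-search step of this section can be sketched as follows for the least-squares criterion. This is a simplified illustration under several assumptions, not the CCTBX procedure: only the (ksol, Bsol) search is shown (no Bcart optimization, no α and β update, no subsequent LBFGS refinement), unit weights are used, the data are synthetic, and all names (model_amplitudes, ls_residual, grid_search, make_toy_data, K_GRID, B_GRID) are hypothetical. The Bsol grid follows the default range and step quoted above; the 0.05 e Å⁻³ step for ksol is an assumption.

```python
import cmath
import math
import random

# Trial values: B_sol from 10 to 80 in steps of 5 (as in the text);
# k_sol from 0.1 to 0.8 in steps of 0.05 (step size is an assumption).
K_GRID = [round(0.1 + 0.05 * i, 2) for i in range(15)]
B_GRID = [10.0 + 5.0 * i for i in range(15)]


def model_amplitudes(f_calc, f_mask, s2, k_sol, b_sol):
    """|F_calc + k_sol exp(-B_sol s^2 / 4) F_mask| for every reflection."""
    return [abs(fc + k_sol * math.exp(-b_sol * s * 0.25) * fm)
            for fc, fm, s in zip(f_calc, f_mask, s2)]


def ls_residual(f_obs, f_mod):
    """Normalized least-squares target with the optimal overall scale k."""
    k = sum(o * m for o, m in zip(f_obs, f_mod)) / sum(m * m for m in f_mod)
    return (sum((o - k * m) ** 2 for o, m in zip(f_obs, f_mod))
            / sum(o * o for o in f_obs))


def grid_search(f_obs, f_calc, f_mask, s2, k_grid, b_grid):
    """Return (target, k_sol, b_sol) for the best trial pair on the grid."""
    best = None
    for k_sol in k_grid:
        for b_sol in b_grid:
            f_mod = model_amplitudes(f_calc, f_mask, s2, k_sol, b_sol)
            target = ls_residual(f_obs, f_mod)
            if best is None or target < best[0]:
                best = (target, k_sol, b_sol)
    return best


def make_toy_data(n=200, seed=0, k_true=0.25, b_true=55.0):
    """Synthetic reflections; s^2 = 1/d^2 spans roughly 50 A to 2.2 A."""
    rng = random.Random(seed)
    f_calc = [cmath.rect(rng.uniform(10.0, 100.0), rng.uniform(0.0, 2.0 * math.pi))
              for _ in range(n)]
    f_mask = [cmath.rect(rng.uniform(10.0, 100.0), rng.uniform(0.0, 2.0 * math.pi))
              for _ in range(n)]
    s2 = [rng.uniform(0.0004, 0.2) for _ in range(n)]
    f_obs = model_amplitudes(f_calc, f_mask, s2, k_true, b_true)
    return f_obs, f_calc, f_mask, s2


if __name__ == "__main__":
    data = make_toy_data()
    print(grid_search(*data, K_GRID, B_GRID))
```

Because every trial pair is evaluated independently, the search cannot be trapped by a poor starting point; the price is the coarse sampling, which is why a gradient-based refinement from the best grid point follows in the procedure described above.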

4. Numerical tests

The goal of this test was to compare the performance of the two proposed algorithms, with least-squares (1) and maximum-likelihood (7) target functions, using simulated models of different quality with simulated experimental data.

We used the model of a Fab fragment of a monoclonal antibody (Fokine et al., 2000[Fokine, A. V., Afonine, P. V., Mikhailova, I. Yu., Tsygannik, I. N., Mareeva, T. Yu., Nesmeyanov, V. A., Pangborn, W., Li, N., Duax, W., Siszak, E. & Pletnev, V. Z. (2000). Russ. J. Bioorg. Chem. 26, 512-519.]) which consists of 439 amino acid residues and 213 water molecules. The crystals belong to space group P2₁2₁2₁, with unit-cell parameters a = 72.24, b = 72.01, c = 86.99 Å. The values of [F_{\bf s}^{\rm obs}] were simulated by the amplitudes of structure factors calculated from the complete exact model at 2.2 Å resolution. The contributions of bulk solvent with ksol = 0.25 e Å⁻³ and Bsol = 55.0 Å² and anisotropy with the diagonal elements (4, 8, −6) Å² were added to [F_{\bf s}^{\rm obs}] in accordance with (2) and (3). Random errors with mean values in the range 0.0–0.6 Å were then introduced into the atomic coordinates of the complete exact model. Incomplete models were obtained by random deletion of 5 and 10% of atoms from the ensemble of models with errors; this generated a total of 21 models.

Fig. 2 shows the distribution of bulk-solvent parameters obtained using (1) and (7) as the target functions. With the exception of two pairs, all pairs of ksol and Bsol obtained with the likelihood target are within the physically reasonable range and, depending on the model quality, relatively close to the exact values of 0.25 e Å⁻³ and 55.0 Å². In contrast, most of the solvent parameters calculated using the least-squares function are outside the correct range, with some values for Bsol reaching 200 Å². This is not unexpected as the least-squares target does not include any mechanism to correct for model incompleteness and hence all eight adjustable parameters, ksol, Bsol and Bcart, model the contribution from bulk solvent and anisotropy along with the model errors and incompleteness. For the likelihood-based refinement the distribution parameters α and β compensate for model errors and incompleteness. It is the high correlation between all of the model parameters which makes it necessary to develop the thorough and robust algorithm described in the previous section.

[Figure 2]
Figure 2
Flat bulk-solvent model parameters ksol and Bsol determined for 21 test models (see text for details of the models) using the least-squares (LS) or maximum-likelihood (ML) target functions.

5. Tests with experimental data

In order to evaluate this new procedure for bulk-solvent correction and anisotropic scaling, we selected all 'problem' models from the PDB, i.e. those with physically unreasonable values for the flat bulk-solvent model parameters. The exact selection criteria were structures solved by X-ray diffraction with the flat bulk-solvent model used, ksol < 0.1 or ksol > 1.0 e Å⁻³ and Bsol < 10 or Bsol > 100 Å². This selected 95 models. The further demand for experimental data and cross-validation flags ('test' set of reflections) combined with an evaluation of the overall data correctness reduced the selected number of models to 35.

In most cases the new procedure yields physically reasonable parameters using both LS and ML target functions (Fig. 3). However, for some models (for example, PDB codes 1jh7, 1k33, 1kk7, 1lee, 1r30 and 2gwx) the parameters ksol and Bsol were outside the reasonable range, which may indicate insufficient data or poor model quality. In such cases the procedure sets the parameters to the best found in the search grid in step I (Fig. 1).

[Figure 3]
Figure 3
Flat bulk-solvent model parameters ksol and Bsol for 35 structures selected from the PDB (PDB codes 1ci3, 1gzk, 1jh7, 1jj1, 1jvx, 1jzb, 1ev8, 1evf, 1k33, 1ijk, 1izr, 1kk7, 1kzn, 1lee, 1lfv, 1dzj, 1m5u, 1m8s, 1nfg, 1oz4, 3gwx, 1ev5, 1evg, 1f3u, 1g1b, 1p9h, 1r30, 1tve, 1hw3, 1hw4, 1ijb, 1izp, 1izq, 1ktk, 2gwx). Blue diamonds and red squares correspond to the bulk-solvent parameters calculated in CCTBX using least-squares (LS) and maximum-likelihood (ML) target functions, respectively. Black triangles represent the bulk-solvent parameters reported in the PDB file under keywords 'KSOL' and 'BSOL'.

In order to evaluate the model improvement arising from more reasonable bulk-solvent parameters, R factors versus resolution were calculated for all selected models and a typical example for one model (PDB code 1jj1) is presented in Fig. 4(a). The use of corrected parameters significantly improves the fit for the low-resolution data, while the R factor calculated with the unreasonable parameters, taken from the PDB file, is 6% higher in the lowest resolution shell and about 11% higher for the case where no correction was performed. Analogous calculations were performed using the maximum-likelihood target function (Fig. 4b). Again, the parameters determined with the new method improve the likelihood target function compared with calculations with incorrect parameters or without any scaling and solvent correction.

[Figure 4]
Figure 4
R factor (a) and ML function (b) (ML is normalized by the number of reflections in bins) calculated in resolution bins (for the structure with PDB code 1jj1): no scaling and bulk-solvent correction (black), parameters ksol and Bsol and scale matrix Bcart taken from the PDB file (blue), scaling and bulk-solvent correction parameters calculated using CCTBX with the least-squares (a) and maximum-likelihood target (b).
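A per-shell R factor of the kind plotted in Fig. 4(a) can be computed along the following lines. This is a hedged sketch rather than the code used for the figure: it assumes the conventional definition R = Σ|Fobs − Fmodel| / ΣFobs, model amplitudes that already include the overall scale, and resolution shells given as descending d-spacing boundaries; the function names r_factor and r_factor_by_shell are hypothetical.

```python
def r_factor(f_obs, f_mod):
    """Conventional R factor for already-scaled model amplitudes."""
    return sum(abs(o - m) for o, m in zip(f_obs, f_mod)) / sum(f_obs)


def r_factor_by_shell(f_obs, f_mod, d_spacing, edges):
    """R factor in resolution shells.

    d_spacing : resolution d (in A) of each reflection
    edges     : descending shell boundaries, e.g. [50.0, 8.0, 2.2];
                a reflection belongs to a shell if low < d <= high
    Returns a list of ((high, low), R) with R = None for empty shells.
    """
    shells = []
    for high, low in zip(edges[:-1], edges[1:]):
        idx = [i for i, d in enumerate(d_spacing) if low < d <= high]
        if idx:
            r = r_factor([f_obs[i] for i in idx], [f_mod[i] for i in idx])
        else:
            r = None
        shells.append(((high, low), r))
    return shells
```

Comparing the lowest-resolution shell for different sets of (ksol, Bsol, Bcart) isolates the effect of the bulk-solvent correction, since that is where its contribution is largest.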

In addition, tests were performed in order to compare the calculation of flat bulk-solvent and anisotropic scaling parameters in selected programs that provide this option (Fig. 5). In many cases CNS1.1 performs significantly better than CNS1.0 (Fig. 5a). This is because the bulk-solvent correction procedure in CNS1.1 was improved by changing the initial values for ksol and Bsol from zero to the observed mean values (Fokine & Urzhumtsev, 2002a[Fokine, A. & Urzhumtsev, A. (2002a). Acta Cryst. D58, 1387-1392.]), 0.35 e Å⁻³ and 46.0 Å², respectively. In some cases CNS1.1 gives similar or slightly worse results than CCTBX (Fig. 5a). However, there are cases where the new procedure gives noticeably better results than both CNS1.0 and CNS1.1 (Fig. 5b). Finally, analogous calculations of flat bulk-solvent correction and anisotropic scaling with REFMAC using the SCALE SIMPLE option gave similar results to those seen with CNS1.0.

[Figure 5]
Figure 5
R factor as a function of resolution (in Å) for the structures with PDB code 1jj1 (a) and 1lee (b). Bulk-solvent correction and anisotropic scaling performed with CNS1.0 (green), CNS1.1 (blue) and CCTBX (red).

6. Conclusions

A robust method for the determination of anisotropic scale factor and flat bulk-solvent model parameters is required as structure determination becomes more automated. The new method we have described here, in combination with the likelihood function for optimization of the parameters, will minimize the occurrence of errors. The robustness of the algorithm has been proven on 35 structures selected from the PDB where unreasonable bulk-solvent parameters were reported. In most of these cases the new procedure found values close to those typically observed in refined structures. In our tests, the new procedure is as good as or better than CNS1.1 or REFMAC in determining optimum parameters for typical structures and works significantly better for `problem' structures.

These new algorithms are implemented in the CCTBX bulk-solvent correction and scaling module. CCTBX is available as open-source software at https://cctbx.sourceforge.net. All results presented are based on the CCTBX source code bundle with the version tag 2005_03_02_2358.

APPENDIX A

The derivatives of maximum-likelihood target function with respect to bulk-solvent parameters and components of the anisotropic scale matrix

Given the function Ψ defined by (6), its derivatives with respect to the six anisotropic scale-matrix elements (Bcart)ij can be obtained following the chain rule,

[{{\partial \Psi}\over {\partial(B_{\rm cart})_{ij}}} = -{1 \over 4}F_{\bf s}^{\rm model}{{\partial[{\bf h}^{t}{\bf A}^{-1}{\bf B}_{\rm cart}({\bf A}^{-1})^{t}{\bf h}]}\over {\partial(B_{\rm cart})_{ij}}}\tilde {\Psi}, \eqno (8)]

where the function [\tilde {\Psi}] is defined below.

The calculation of derivatives with respect to the bulk-solvent parameters ksol and Bsol requires more attention. We can define a function z of complex variables as z = u + g(p)v, where u and v are complex variables and g(p) is a function with real arguments. Remembering that [|{\bf z}| = ({\bf z}^{*}{\bf z})^{1/2}] and using the chain rule, one can obtain the derivative with respect to p as

[\eqalign {{{ \partial |{\bf z}| } \over { \partial p }} & = {1 \over 2} {{\displaystyle {\bf z} {{ \partial {\bf z}^* } \over { \partial p }} + {\bf z}^* {{ \partial {\bf z}} \over {\partial p}} } \over { ({\bf z}^*{\bf z})^{1/2} }} \cr & = {{\displaystyle [{\bf u} + g(p){\bf v}]{{\partial g(p)} \over {\partial p}}{\bf v}^* + [{\bf u}^* + g(p){\bf v}^*] {{\partial g(p)} \over {\partial p}}{\bf v}} \over {2({\bf z}^*{\bf z})^{1/2}}}\cr & = {{\partial g(p)} \over {\partial p}} {{[{\bf uv}^* + g(p){\bf vv}^*] + [{\bf u}^*{\bf v} + g(p){\bf v}^*{\bf v}]} \over {2({\bf z}^*{\bf z})^{1/2}}} \cr & = {{ {\bf uv}^* + {\bf u}^*{\bf v} + 2g(p)|{\bf v}|^{2} } \over {2|{\bf z}|}} {{\partial g(p)} \over {\partial p}}.}]

Replacing u, v and g(p) with [{\bf F}_{s}^{\rm calc}], [{\bf F}_{s}^{\rm mask}] and [k_{\rm sol}\exp(-B_{\rm sol}s^{2}/4)], respectively, the desired derivatives are

[{{\partial \Psi} \over {\partial k_{\rm sol} }} = f({\bf B}_{\rm cart})\Theta \exp \left (- {{B_{\rm sol}s^{2}} \over 4}\right)\tilde {\Psi}, \eqno (9)]

[{{\partial \Psi} \over {\partial B_{\rm sol} }} = f({\bf B}_{\rm cart})\Theta k_{\rm sol} \exp \left (- {{B_{\rm sol}s^{2}} \over 4}\right)\left (- {{ s^{2} }\over 4}\right)\tilde {\Psi}, \eqno (10)]

where

[\Theta = {{ {\bf F}_{s}^{\rm calc}({\bf F}_{s}^{\rm mask})^* + {\bf F}_{s}^{\rm mask}({\bf F}_{s}^{\rm calc})^* + 2k_{\rm sol}\exp(-B_{\rm sol}s^{2}/4)|{\bf F}_{s}^{\rm mask}|^{2} }\over {2|{\bf F}_{s}^{\rm calc} + {\bf F}_{s}^{\rm solv}|}}]

and

[\tilde {\Psi} = \cases { \displaystyle {{ 2\alpha_{\bf s}^{2}F_{\bf s}^{\rm model} } \over {\varepsilon_{\bf s}\beta_{\bf s} }} - {{ 2\alpha_{\bf s}F_{\bf s}^{\rm obs} } \over {\varepsilon_{\bf s}\beta_{\bf s} }} {{ I_{1} \left (\displaystyle {{ 2\alpha_{\bf s} F_{\bf s}^{\rm model}F_{\bf s}^{\rm obs}} \over {\varepsilon_{\bf s}\beta_{\bf s} }} \right) } \over {I_{0}\left (\displaystyle {{ 2\alpha_{\bf s} F_{\bf s}^{\rm model}F_{\bf s}^{\rm obs}} \over {\varepsilon_{\bf s}\beta_{\bf s} }} \right)}} & acentric reflections \cr \displaystyle {{\alpha_{\bf s}^{2}F_{\bf s}^{\rm model} } \over {\varepsilon_{\bf s}\beta_{\bf s} }}- {{\alpha_{\bf s}F_{\bf s}^{\rm obs} } \over {\varepsilon_{\bf s}\beta_{\bf s} }}\tanh \left (\displaystyle {{ \alpha_{\bf s} F_{\bf s}^{\rm model}F_{\bf s}^{\rm obs}} \over {\varepsilon_{\bf s}\beta_{\bf s} }} \right) & centric reflections.}]
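Analytic gradients of this kind are easy to get wrong, so a finite-difference check is a useful safeguard. The sketch below applies the general result for ∂|z|/∂p derived above directly to the model amplitude of equation (2) and compares it with central differences. It is an illustration under stated assumptions: the function names (f_model, d_f_model, finite_difference) are hypothetical, the anisotropic factor is treated as a constant multiplier, and the factor Ψ̃ is omitted because only ∂Fmodel/∂ksol and ∂Fmodel/∂Bsol are tested.

```python
import math


def f_model(f_calc, f_mask, s2, k_sol, b_sol, aniso=1.0):
    """Model amplitude |F_calc + k_sol exp(-B_sol s^2 / 4) F_mask| * f(B_cart)."""
    g = k_sol * math.exp(-b_sol * s2 / 4.0)
    return abs(f_calc + g * f_mask) * aniso


def d_f_model(f_calc, f_mask, s2, k_sol, b_sol, aniso=1.0):
    """Analytic (dF/dk_sol, dF/dB_sol) using d|z|/dp with z = u + g(p) v."""
    e = math.exp(-b_sol * s2 / 4.0)
    g = k_sol * e
    z = f_calc + g * f_mask
    # (u v* + u* v + 2 g |v|^2) / (2 |z|), where u v* + u* v = 2 Re(u v*).
    common = ((2.0 * (f_calc * f_mask.conjugate()).real
               + 2.0 * g * abs(f_mask) ** 2) / (2.0 * abs(z)))
    d_k = aniso * common * e                        # dg/dk_sol = exp(-B s^2 / 4)
    d_b = aniso * common * k_sol * e * (-s2 / 4.0)  # dg/dB_sol
    return d_k, d_b


def finite_difference(func, x, step=1e-5):
    """Central-difference estimate of d func / d x."""
    return (func(x + step) - func(x - step)) / (2.0 * step)
```

A typical use is to evaluate both versions at a few arbitrary reflections, for example with f_calc = 30 + 40i, f_mask = −50 + 20i, s² = 0.01 Å⁻², ksol = 0.35 and Bsol = 46, and to require agreement well within the finite-difference error before passing the gradients to a minimizer such as LBFGS.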

Acknowledgements

This work was supported in part by the US Department of Energy under Contract No. DE-AC03-76SF00098 and NIH/NIGMS grant 1P01GM063210. We thank Andrey Fokine (Purdue University) and Alexander Urzhumtsev (LCM3B Lab, France) for useful discussions.

References

Badger, J. (1997). Methods Enzymol. 277, 344–352.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.
Blanc, E., Roversi, P., Vonrhein, C., Flensburg, C., Lea, S. M. & Bricogne, G. (2004). Acta Cryst. D60, 2210–2221.
Bricogne, G. (1991). Acta Cryst. A47, 803–829.
Bricogne, G. & Irwin, J. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 85–92. Warrington: Daresbury Laboratory.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Brünger, A. T., Karplus, M. & Petsko, G. A. (1989). Acta Cryst. A45, 50–61.
Fokine, A. V., Afonine, P. V., Mikhailova, I. Yu., Tsygannik, I. N., Mareeva, T. Yu., Nesmeyanov, V. A., Pangborn, W., Li, N., Duax, W., Siszak, E. & Pletnev, V. Z. (2000). Russ. J. Bioorg. Chem. 26, 512–519.
Fokine, A. & Urzhumtsev, A. (2002a). Acta Cryst. D58, 1387–1392.
Fokine, A. & Urzhumtsev, A. (2002b). Acta Cryst. A58, 72–74.
Grosse-Kunstleve, R. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 477–480.
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.
Jiang, J.-S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100–115.
Kostrewa, D. (1997). CCP4 Newsl. 34, 9–22.
Liu, D. C. & Nocedal, J. (1989). Math. Program. 45, 503–528.
Lunin, V. Y. & Skovoroda, T. P. (1995). Acta Cryst. A51, 880–887.
Lunin, V. & Skovoroda, T. (1997). In Validation and Refinement of Macromolecular Structures, Porto, Portugal, August 29–30, 1997, Collected Abstracts.
Lunin, V. Y. & Urzhumtsev, A. (1984). Acta Cryst. A40, 269–277.
Moews, P. C. & Kretsinger, R. H. (1975). J. Mol. Biol. 91, 201–228.
Murshudov, G. N., Davies, G. J., Isupov, M., Krzywda, S. & Dodson, E. J. (1998). CCP4 Newsl. Protein Crystallogr. 35, 37–43.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.
Pannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659–668.
Parkin, S., Moezzi, B. & Hope, H. (1995). J. Appl. Cryst. 28, 53–56.
Phillips, S. E. V. (1980). J. Mol. Biol. 142, 531–554.
Podjarny, A. D. & Urzhumtsev, A. G. (1997). Methods Enzymol. 276, 641–658.
First citationRead, R. J. (1986). Acta Cryst. A42, 140–149.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationRead, R. J. (1990). Acta Cryst. A46, 900–912.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationRead, R. J. (2001). Acta Cryst. D57, 1373–1382.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationSheldrick, G. M. & Schneider, T. R. (1997). Methods Enzymol. 277, 319–343.  CrossRef PubMed CAS Web of Science Google Scholar
First citationSheriff, S. & Hendrickson, W. A. (1987). Acta Cryst. A43, 118–121.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationTronrud, D. E. (1997). Methods Enzymol. 277, 306–319.  CrossRef CAS PubMed Web of Science Google Scholar
First citationUrzhumtsev, A. (1991). Acta Cryst. A47, 794–801.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationUrzhumtsev, A. G. (2000). CCP4 Newsl. 38, 38–49.  Google Scholar
First citationUrzhumtsev, A. G. & Podjarny, A. D. (1995). Acta Cryst. D51, 888–895.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationUrzhumtsev, A., Skovoroda, T. P. & Lunin, V. Y. (1996). J. Appl. Cryst. 29, 741–744.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationUsón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H. J. & Sheldrick, G. M. (1999). Acta Cryst. D55, 1158–1167.  Web of Science CrossRef IUCr Journals Google Scholar

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
