Volume 36 
Part 1 
Pages 158-159  
February 2003  

Received 12 July 2002
Accepted 26 November 2002

MLMF: least-squares approximation of likelihood-based refinement criteria

Pavel Afonine (a, b), Vladimir Y. Lunin (a, c) and Alexandre Urzhumtsev (a)*

(a) LCM3B, UMR 7036 CNRS, Université Henri Poincaré, Nancy 1, BP 239, Faculté des Sciences, Vandoeuvre-lès-Nancy, 54506, France, (b) Centre Charles Hermite, LORIA, Villers-lès-Nancy, 54602, France, and (c) Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, 142290, Moscow Region, Russia
Correspondence e-mail: sacha@lcm3b.uhp-nancy.fr

A quadratic approximation of the maximum-likelihood criterion is defined by a target value for every calculated structure-factor magnitude and the corresponding weight. These values can be estimated using the experimental structure-factor magnitudes and general information about the model imperfection. The program MLMF provides a user with these weights and target values. The obtained quadratic approximation allows one to carry out a kind of maximum-likelihood refinement by means of any least-squares refinement suite.

Keywords: maximum likelihood; computer program; least-squares refinement.

1. Introduction

In recent years, macromolecular crystallographers have taken advantage of maximum-likelihood-based refinement (ML in what follows; see Pannu & Read, 1996; Bricogne & Irwin, 1996; Murshudov et al., 1997; Adams et al., 1997). One of the main reasons for this is that ML allows the missing part of the model and other sources of error to be taken into account statistically (Lunin & Urzhumtsev, 1999; Lunin et al., 2002).

It has been shown (Lunin et al., 2002) that a quadratic approximation of the ML criterion allows one to understand the features of ML refinement and to choose a reasonable refinement strategy. The diagonal approximation to the likelihood function leads to a refinement criterion of the form

[Q_{\rm ML} = \textstyle\sum\limits_{{\bf s} \in S} \Psi \left( F_{\bf s}^{\,\rm calc} ; F_{\bf s}^{\,\rm obs}, \alpha_{\bf s}, \beta_{\bf s} \right) \Rightarrow \min . \eqno (1)]

Here [\Psi] is some non-quadratic function, [F_{\bf s}^{\,\rm calc}] and [F_{\bf s}^{\,\rm obs}] are the calculated and experimental structure-factor magnitudes, and [\alpha_{\bf s}] and [\beta_{\bf s}] are statistical parameters linked to the two parameters of the Gaussian distribution. This criterion can be approximated by a quadratic functional

[Q_{\rm ML}^* = \textstyle\sum\limits_{{\bf s} \in S} w_{\bf s}^* \left({ F_{\bf s}^{\,\rm calc} - F_{\bf s}^{\,*} } \right)^2 \Rightarrow \min , \eqno (2)]

in the vicinity of the minimum of the criterion. The parameters [F_{\bf s}^{\,*}] and [w_{\bf s}^*] of the quadratic approximation (2) can be determined from the minimum and the curvature of the function [\Psi]. The new target [F_{\bf s}^{\,*}] is close to the observed magnitude [F_{\bf s}^{\,\rm obs}] if the reflection is strong and the imperfection of the current model is relatively small. Otherwise, the observed magnitude may be substantially modified or even replaced by zero. Tests have shown that the major improvement over least-squares (LS) refinement is due to the introduction of the weights [w_{\bf s}^*] (Lunin et al., 2002).
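How the two parameters follow from the minimum and the curvature of [\Psi] can be illustrated numerically. The sketch below is only an illustration and not the MLMF code. It treats [\Psi] for a single reflection as an arbitrary Python callable `psi` (its explicit form is given by Lunin et al., 2002, and is not reproduced here), assumes that it has a single minimum between 0 and `f_max`, locates that minimum by golden-section search and takes half of a finite-difference second derivative as the weight, since the second derivative of [w (F - F^*)^2] is [2w]. The names `quadratic_approximation`, `f_max`, `tol` and `h` are hypothetical.

```python
import math


def quadratic_approximation(psi, f_max, tol=1e-8, h=1e-3):
    """Return (f_star, w_star) such that w_star * (F - f_star)**2 mimics psi(F).

    psi   : callable, one-dimensional criterion term for a single reflection
            (assumed to have a single minimum on [0, f_max])
    f_max : upper bound of the search interval for the calculated magnitude
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    lo, hi = 0.0, f_max
    # Golden-section search: shrink [lo, hi] around the minimum of psi.
    while hi - lo > tol:
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if psi(a) < psi(b):
            hi = b
        else:
            lo = a
    f_star = 0.5 * (lo + hi)
    # Central second difference; the stencil is shifted if the minimum sits
    # at the boundary F = 0, so that psi is never evaluated at negative F.
    c = max(f_star, h)
    curvature = (psi(c + h) - 2.0 * psi(c) + psi(c - h)) / (h * h)
    w_star = 0.5 * curvature
    return f_star, w_star
```

For a term that is already quadratic, for example `psi = lambda f: 3.0 * (f - 2.0) ** 2`, the function recovers the target 2 and the weight 3 to within numerical error. When the minimum lies at zero, the returned target is essentially zero, which corresponds to the case in which the observed magnitude is replaced by zero.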

Currently, most refinement programs use the LS criterion as the goal function. An exception is REFMAC (Murshudov et al., 1997), which uses the ML criterion explicitly. The program suite CNS (Brünger et al., 1998) also offers a mode called `ML-criterion' (Adams et al., 1997); in practice, however, this mode is closer to using the method of moments to choose the statistical hypotheses than to maximizing the likelihood (see Lunin et al., 2002, for the corresponding discussion). In our tests (Afonine et al., 2002), this mode was less efficient than optimization of the quadratic approximation (2). We conclude that the approximation (2) can be used as a refinement criterion instead of (1) in various refinement packages, both for small molecules and for macromolecules. An advantage of this approach is that (2) has the same form as the LS criterion, so that it can be minimized by conventional LS refinement packages, without any modification of the code, simply by replacing the values of [F_{\bf s}^{\,\rm obs}] with [F_{\bf s}^{\,*}] and supplying the weights [w_{\bf s}^*]. The program MLMF allows the user to calculate the [F_{\bf s}^{\,*}] and [w_{\bf s}^*] values starting from the experimental structure-factor magnitudes.

2. Program description

2.1. Main purpose

For a given set of structure factors, the program MLMF calculates the values of [F_{\bf s}^{\,*}] and [w_{\bf s}^*], which can be used in an LS refinement program as the parameters of the criterion (2).
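To make the role of these two quantities concrete, the following minimal sketch evaluates criterion (2) for lists of values, one entry per reflection. It is an illustration only; the function name `quadratic_ml_criterion` and its arguments are hypothetical and do not correspond to any routine of MLMF or of a particular refinement package.

```python
def quadratic_ml_criterion(f_calc, f_star, w_star):
    """Evaluate criterion (2): sum over reflections of w* (Fcalc - F*)^2.

    f_calc : calculated structure-factor magnitudes
    f_star : modified targets F* (used where an LS program expects Fobs)
    w_star : weights w*
    """
    if not (len(f_calc) == len(f_star) == len(w_star)):
        raise ValueError("all three lists must have one entry per reflection")
    return sum(w * (fc - ft) ** 2 for fc, ft, w in zip(f_calc, f_star, w_star))
```

An LS refinement program evaluates exactly this kind of sum when it is given [F_{\bf s}^{\,*}] in the column normally occupied by [F_{\bf s}^{\,\rm obs}] together with the weights [w_{\bf s}^*].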

2.2. The choice of parameters [\alpha_{\bf s}] and [\beta_{\bf s}]

The parameters [\alpha _{\bf s}] and [\beta _{\bf s}] reflect the level of inaccuracy of the starting model and the desired accuracy of the final model (Luzzati, 1952; Srinivasan & Parthasarathy, 1976; Read, 1990; Lunin & Skovoroda, 1995; Lunin et al., 2002). They play a key role in the calculation of [F_{\bf s}^{\,*}] and [w_{\bf s}^*], which significantly influences the quality of the refined model. There are several ways to estimate these parameters.

The basic option included in MLMF is to set all [\alpha _{\bf s}] = 1 and to calculate [\beta _{\bf s}] as

[\beta _{\bf s} = \textstyle\sum\limits_{k = M + 1}^N f_k^{\,2} (s ) , \eqno (3)]

where the sum is taken over all atoms absent from the model and [f_k(s)] are their scattering factors, including the temperature factors (Afonine et al., 2002). This corresponds to the search for a partial model with precisely defined atomic positions. The values of [\beta _{\bf s}] can be estimated only approximately because the temperature factors and, possibly, the precise number of absent atoms and their types are unknown. Tests show that reasonable variations in these parameters have little influence on the results (Afonine et al., 2002). In particular, all atoms in (3) can be treated as C atoms with the same temperature factor of about 25 Å², and the error in the estimated number of missing atoms can reach 25%. Note that in this mode every reflection receives its own values of [\alpha _{\bf s}] and [\beta _{\bf s}].
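Under the simplification described above (all absent atoms treated as C atoms with a common temperature factor), the sum (3) reduces to the number of absent atoms multiplied by the squared scattering factor of carbon. The sketch below illustrates this under several assumptions that are not taken from the MLMF source: [s] is taken as [1/d] in Å⁻¹, so that [\sin\theta/\lambda = s/2]; the scattering factor is approximated by a four-Gaussian expansion with the coefficients commonly tabulated for neutral carbon, which should be checked against the user's own tables; and the names `carbon_scattering_factor`, `beta_basic_mode` and `n_absent` are hypothetical. Whether the atoms are counted per asymmetric unit or per unit cell is not specified here and is left to the caller.

```python
import math

# Four-Gaussian coefficients for neutral carbon (assumed here; verify against
# the tables you normally use).
C_A = (2.3100, 1.0200, 1.5886, 0.8650)
C_B = (20.8439, 10.2075, 0.5687, 51.6512)
C_C = 0.2156


def carbon_scattering_factor(s, b_iso=25.0):
    """Scattering factor of carbon including the temperature factor.

    s     : reciprocal-space vector length, 1/d, in 1/Angstrom (assumption)
    b_iso : isotropic temperature factor in Angstrom^2
    """
    q = 0.25 * s * s  # (sin(theta)/lambda)^2
    f0 = C_C + sum(a * math.exp(-b * q) for a, b in zip(C_A, C_B))
    return f0 * math.exp(-b_iso * q)


def beta_basic_mode(s, n_absent, b_iso=25.0):
    """Equation (3) with every absent atom treated as the same carbon atom."""
    return n_absent * carbon_scattering_factor(s, b_iso) ** 2
```

Because the count enters (3) linearly, an error of 25% in `n_absent` changes every [\beta _{\bf s}] by the same 25%.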

The second option available in the program is to estimate the parameters [\alpha _{\bf s}] and [\beta _{\bf s}] from the maximum-likelihood principle (Lunin & Urzhumtsev, 1984; Read, 1986; Lunin & Skovoroda, 1995; Pannu & Read, 1996; Urzhumtsev et al., 1996). The values of [\alpha _{\bf s}] and [\beta _{\bf s}] are considered to be constant for all reflections in each of a number of thin spherical layers of reciprocal space. In this mode, the structure-factor magnitudes calculated from the starting atomic model must be present in the input file together with the experimental magnitudes and flags for the test set of reflections (Brünger, 1992). Every spherical layer should contain at least 50-100 test reflections in order to obtain a good estimate of the parameters [\alpha _{\bf s}] and [\beta _{\bf s}]. As usual, the test set should comprise 5-10% of the full data set.
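Before choosing the number of layers, it can be useful to check how many test reflections each layer would receive. The following sketch does this under the assumption that the layers have equal volume in reciprocal space (equal increments of [s^3], with [s = 1/d]); how MLMF actually defines its layers is not described here, and the names `shell_index`, `count_test_reflections` and `shells_are_adequate` are hypothetical.

```python
def shell_index(s, s_min, s_max, n_shells):
    """0-based index of the equal-volume spherical layer containing s = 1/d."""
    lo3, hi3 = s_min ** 3, s_max ** 3
    if hi3 <= lo3:
        return 0  # degenerate resolution range: everything in one layer
    frac = (s ** 3 - lo3) / (hi3 - lo3)
    # Clamp so that s == s_max falls into the last layer.
    return min(max(int(frac * n_shells), 0), n_shells - 1)


def count_test_reflections(s_values, test_flags, n_shells):
    """Number of test-set reflections in each of n_shells layers."""
    s_min, s_max = min(s_values), max(s_values)
    counts = [0] * n_shells
    for s, is_test in zip(s_values, test_flags):
        if is_test:
            counts[shell_index(s, s_min, s_max, n_shells)] += 1
    return counts


def shells_are_adequate(counts, min_test=50):
    """True if every layer holds at least min_test test reflections."""
    return all(c >= min_test for c in counts)
```

If `shells_are_adequate` returns `False`, the number of layers can be reduced until every layer reaches the 50-100 test reflections recommended above.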

Finally, the user may use his or her own estimates for [\alpha _{\bf s}] and [\beta _{\bf s}]. In this third mode, these values, calculated for each reflection, must be included in the input file of structure factors.

2.3. Program organization

The input information for MLMF is contained in two files: the file of control data and the file of structure factors.

The mandatory input parameters are the resolution limits and the space-group symmetries. If mode 1 ([\alpha _{\bf s}] = 1) has been chosen, the estimated number of absent atoms and their mean temperature factor are requested. Mode 2 instead requires the structure factors calculated from the starting model, the corresponding values for the flagged test set, and the number of resolution shells in which the parameters [\alpha _{\bf s}] and [\beta _{\bf s}] are considered to be constant.

The input file of structure factors is a formatted file in which all data are written in columns. In particular, a file in CNS format (Brünger et al., 1998) is suitable. Free-format records as well as records in any fixed format are also accepted. Every record corresponds to one reflection, but may extend over several lines, as is the case for the CNS format. Each record contains the Miller indices h, k, l and the experimental structure-factor magnitude [F_{\bf s}^{\,\rm obs}], which are mandatory. The calculated structure factors [F_{\bf s}^{\,\rm calc}], test-set flags and predefined values of the parameters [\alpha _{\bf s}] and [\beta _{\bf s}] are optional, depending on the mode chosen.
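As an illustration of the simplest case, the sketch below reads free-format records with one reflection per line. It is not the MLMF reader and does not handle the CNS format or records spread over several lines. The column order (h, k, l, [F_{\bf s}^{\,\rm obs}], then the optional values named by the caller) and the names `parse_reflections` and `optional_columns` are assumptions made for this example only.

```python
def parse_reflections(text, optional_columns=()):
    """Parse whitespace-separated records 'h k l Fobs [optional values...]'.

    text             : contents of the reflection file, one record per line
    optional_columns : names of the extra numeric columns that follow Fobs,
                       in file order, e.g. ("f_calc", "test_flag")
    """
    expected = 4 + len(optional_columns)
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != expected:
            raise ValueError(
                f"line {line_no}: expected {expected} fields, found {len(fields)}"
            )
        h, k, l = (int(x) for x in fields[:3])
        record = {"hkl": (h, k, l), "f_obs": float(fields[3])}
        # Optional columns are all stored as floats, including flags.
        for name, value in zip(optional_columns, fields[4:]):
            record[name] = float(value)
        records.append(record)
    return records
```

For mode 2, for instance, the call `parse_reflections(text, ("f_calc", "test_flag"))` would return one dictionary per reflection holding the indices, the observed and calculated magnitudes and the test-set flag.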

The main output file is a file of structure factors written in the same format as the corresponding input file. It contains one record per reflection, holding the h, k, l, [F_{\bf s}^{\,\rm obs}], [F_{\bf s}^{\,*}] and [w_{\bf s}^*] values.

The message file contains a copy of the input control parameters, calculated statistics and various data for different modes of calculation of [F_{\bf s}^{\,*}] and [w_{\bf s}^*] values, error and warning messages, etc.

The program MLMF is written in standard Fortran 77 and is system-independent.

2.4. Program distribution

The MLMF program is available from the authors upon request (sacha@lcm3b.uhp-nancy.fr or afonine@lcm3b.uhp-nancy.fr). The distribution package contains the source code and an example of the control data file.

Acknowledgements

The authors thank C. Lecomte for his interest in the project. The project was partially financed through the CNRS-RAS collaboration and supported by RFBR grant 00-04-48175. PA and AU are members of GdR 2417 CNRS.

References

Adams, P. D., Pannu, N. S., Read, R. J. & Brünger, A. T. (1997). Proc. Natl Acad. Sci. USA, 94, 5018-5023.
Afonine, P., Urzhumtsev, A. & Lunin, V. Y. (2002). CCP4 Newsl. Protein Crystallogr. 40 (http://www.ccp4.ac.uk/newsletters.html).
Bricogne, G. & Irwin, J. (1996). Proceedings of the CCP4 Study Weekend, pp. 85-92. Warrington: Daresbury Laboratory.
Brünger, A. T. (1992). Nature (London), 355, 472-474.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905-921.
Lunin, V. Y., Afonine, P. V. & Urzhumtsev, A. (2002). Acta Cryst. A58, 270-282.
Lunin, V. Y. & Skovoroda, T. P. (1995). Acta Cryst. A51, 880-887.
Lunin, V. Y. & Urzhumtsev, A. (1984). Acta Cryst. A40, 269-277.
Lunin, V. Y. & Urzhumtsev, A. (1999). CCP4 Newsl. Protein Crystallogr. 37, 14-28.
Luzzati, V. (1952). Acta Cryst. 5, 802-810.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.
Pannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659-668.
Read, R. J. (1986). Acta Cryst. A42, 140-149.
Read, R. J. (1990). Acta Cryst. A46, 900-912.
Srinivasan, R. & Parthasarathy, S. (1976). Some Statistical Applications in X-ray Crystallography. Oxford: Pergamon Press.
Urzhumtsev, A., Skovoroda, T. P. & Lunin, V. Y. (1996). J. Appl. Cryst. 29, 741-744.


J. Appl. Cryst. (2003). 36, 158-159 [doi:10.1107/S0021889802021738]