Volume 36. Received 12 July 2002

MLMF: least-squares approximation of likelihood-based refinement criteria

(a) LCM3B, UMR 7036 CNRS, Université Henri Poincaré, Nancy 1, BP 239, Faculté des Sciences, Vandoeuvre-lès-Nancy, 54506, France, (b) Centre Charles Hermite, LORIA, Villers-lès-Nancy, 54602, France, and (c) Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, 142290, Moscow Region, Russia

A quadratic approximation of the maximum-likelihood criterion is defined by a target value for every calculated structure-factor magnitude and the corresponding weight. These values can be estimated using the experimental structure-factor magnitudes and general information about the model imperfection. The program MLMF provides the user with these weights and target values. The obtained quadratic approximation allows one to carry out a kind of maximum-likelihood refinement by means of any least-squares refinement suite.

Keywords: maximum likelihood; computer program; least-squares refinement.
In recent years, macromolecular crystallographers have taken advantage of maximum-likelihood-based refinement (ML in what follows; see Pannu & Read, 1996; Bricogne & Irwin, 1996; Murshudov et al., 1997; Adams et al., 1997). One of the main reasons for this is that ML allows the missing part of the model and other sources of error to be taken into account statistically (Lunin & Urzhumtsev, 1999; Lunin et al., 2002).
It has been shown (Lunin et al., 2002) that a quadratic approximation of the ML criterion helps one to understand the features of ML refinement and to choose a reasonable refinement strategy. The diagonal approximation to the likelihood function leads to a refinement criterion of the form

$$L = \sum_s \Psi\left(F^{\mathrm{calc}}_s;\, F^{\mathrm{obs}}_s, \alpha_s, \beta_s\right). \qquad (1)$$

Here $\Psi$ is some non-quadratic function, $F^{\mathrm{calc}}_s$ and $F^{\mathrm{obs}}_s$ are the calculated and experimental structure-factor magnitudes, and $\alpha_s$ and $\beta_s$ are statistical parameters linked to the two parameters of the Gaussian distribution. In the vicinity of its minimum, this criterion can be approximated by a quadratic functional

$$Q = \sum_s W_s \left(F^{\mathrm{calc}}_s - F^{*}_s\right)^2. \qquad (2)$$

The parameters $F^*_s$ and $W_s$ of the quadratic approximation (2) can be determined from the position of the minimum and the curvature of the function $\Psi$: $F^*_s$ is the value of $F^{\mathrm{calc}}_s$ at which $\Psi$ reaches its minimum, and matching the second derivatives of (1) and (2) there gives $W_s = \frac{1}{2}\,\partial^2\Psi/\partial (F^{\mathrm{calc}}_s)^2$ evaluated at $F^{\mathrm{calc}}_s = F^*_s$. The new target $F^*_s$ is close to the observed magnitude $F^{\mathrm{obs}}_s$ if the reflection is strong and the imperfection of the current model is relatively small. Otherwise, the observed magnitude may be substantially modified or even replaced by zero. Tests have shown that the major improvement over least-squares (LS) refinement is due to the introduction of the weights $W_s$ (Lunin et al., 2002).
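A minimal Python sketch of how $F^*_s$ and $W_s$ follow from $\Psi$. The text above does not write $\Psi$ out, so the sketch assumes the usual forms with $F^{\mathrm{calc}}$-independent terms dropped and the symmetry factor $\varepsilon$ absorbed into $\beta_s$: $\Psi = \alpha^2F^2/\beta - \ln I_0(2\alpha F^{\mathrm{obs}}F/\beta)$ for acentric reflections and $\Psi = \alpha^2F^2/(2\beta) - \ln\cosh(\alpha F^{\mathrm{obs}}F/\beta)$ for centric ones. The minimum is located by golden-section search on $[0, F^{\mathrm{obs}}/\alpha]$ (both assumed forms are unimodal there), and $W_s$ is taken as half the second derivative, estimated by a central finite difference. The function names are hypothetical and are not MLMF's own routines.

```python
import math


def log_i0(x: float) -> float:
    """Natural log of the modified Bessel function I0(x); I0 is even, so |x| is used."""
    x = abs(x)
    if x < 30.0:
        # Power series I0(x) = sum_k (x/2)^(2k) / (k!)^2; `rest` holds the k >= 1 terms
        q = 0.25 * x * x
        term, rest, k = 1.0, 0.0, 0
        while True:
            k += 1
            term *= q / (k * k)
            rest += term
            if term <= 1e-17 * (1.0 + rest):
                break
        return math.log1p(rest)
    # Asymptotic expansion I0(x) ~ e^x / sqrt(2 pi x) * (1 + 1/z + 9/(2 z^2) + 225/(6 z^3)), z = 8x
    z = 8.0 * x
    series = 1.0 + 1.0 / z + 9.0 / (2.0 * z * z) + 225.0 / (6.0 * z ** 3)
    return x - 0.5 * math.log(2.0 * math.pi * x) + math.log(series)


def psi(f_calc: float, f_obs: float, alpha: float, beta: float, centric: bool = False) -> float:
    """ASSUMED form of Psi: F_calc-dependent part of -ln p(F_obs | F_calc), epsilon folded into beta."""
    if centric:
        x = abs(alpha * f_obs * f_calc / beta)
        # ln cosh x, written to stay accurate for both small and large x
        if x < 20.0:
            log_cosh = math.log1p(2.0 * math.sinh(0.5 * x) ** 2)
        else:
            log_cosh = x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)
        return alpha ** 2 * f_calc ** 2 / (2.0 * beta) - log_cosh
    return alpha ** 2 * f_calc ** 2 / beta - log_i0(2.0 * alpha * f_obs * f_calc / beta)


def target_and_weight(f_obs, alpha, beta, centric=False, iters=200):
    """Hypothetical helper: return (F*, W) = (argmin of psi, half its second derivative there)."""
    def g(f):
        return psi(f, f_obs, alpha, beta, centric)

    # psi is unimodal on F >= 0 and its minimiser cannot exceed F_obs / alpha
    lo, hi = 0.0, f_obs / alpha
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):  # golden-section search
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if g(c) < g(d):
            hi = d
        else:
            lo = c
    f_star = 0.5 * (lo + hi)
    if g(0.0) <= g(f_star):
        f_star = 0.0  # weak reflection: the target is replaced by zero
    # Central second difference; psi is even in F, so the step may cross F = 0
    h = 1e-3 * max(f_obs / alpha, 1.0)
    curvature = (g(f_star + h) - 2.0 * g(f_star) + g(f_star - h)) / (h * h)
    return f_star, 0.5 * curvature
```

Under these assumed forms, a short expansion of $\Psi$ around zero shows that the target is exactly zero whenever $(F^{\mathrm{obs}}_s)^2 \le \beta_s$, for any $\alpha_s$; for example, `target_and_weight(5.0, 1.0, 100.0)` should give $F^* = 0$ and $W \approx 0.0075$, while `target_and_weight(100.0, 1.0, 100.0)` gives a target of about 99.75, just below $F^{\mathrm{obs}}$.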
Currently, most refinement programs use the LS criterion as the goal function. An exception is REFMAC (Murshudov et al., 1997), which uses the ML criterion explicitly. The program suite CNS (Brünger et al., 1998) also uses a mode called `ML-criterion' (Adams et al., 1997); however, in practice this mode is closer to using the method of moments to choose the statistical hypotheses than to maximizing the likelihood (see Lunin et al., 2002, for a corresponding discussion). In our tests (Afonine et al., 2002), this mode was less efficient than optimization of the quadratic approximation (2). Thus, the approximation (2) can be used as a refinement criterion instead of (1) in various refinement packages, both for small molecules and for macromolecules. An advantage of this approach is that (2) has the same form as the LS criterion, so it can be minimized by conventional LS refinement packages, with the values $F^{\mathrm{obs}}_s$ replaced by $F^*_s$ and with the weights $W_s$, without any modification of the code. The program MLMF provides the user with the possibility of calculating the $F^*_s$ and $W_s$ values starting from experimental structure-factor magnitudes.
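The claim that (2) needs no code changes can be sketched with a generic weighted least-squares residual. The same function evaluates the conventional LS criterion when given $F^{\mathrm{obs}}_s$ and the program's usual weights, and evaluates (2) when given $F^*_s$ and $W_s$. The overall scale factor $k$ and its closed-form optimum are standard LS ingredients added here only for illustration; with $k = 1$ the residual is exactly $Q$. The function names are hypothetical.

```python
from typing import Sequence


def ls_residual(f_calc: Sequence[float], f_target: Sequence[float],
                weights: Sequence[float], scale: float = 1.0) -> float:
    """Weighted LS residual sum_s w_s (k F_calc_s - F_target_s)^2, as an LS program would evaluate it."""
    return sum(w * (scale * fc - ft) ** 2 for fc, ft, w in zip(f_calc, f_target, weights))


def best_scale(f_calc: Sequence[float], f_target: Sequence[float], weights: Sequence[float]) -> float:
    """Closed-form overall scale k minimising ls_residual for fixed F_calc."""
    num = sum(w * fc * ft for fc, ft, w in zip(f_calc, f_target, weights))
    den = sum(w * fc * fc for fc, w in zip(f_calc, weights))
    return num / den


# Conventional LS: ls_residual(f_calc, f_obs, usual_weights)
# Criterion (2):   ls_residual(f_calc, f_star, w_mlmf)   (same code, different input columns)
```

A reflection whose target has been replaced by zero still contributes to the sum: it pulls $F^{\mathrm{calc}}_s$ towards zero with weight $W_s$.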
For a given set of structure factors, the program MLMF calculates the values of $F^*_s$ and $W_s$, which can be used in an LS refinement program as the parameters of criterion (2).
$\alpha_s$ and $\beta_s$

The parameters $\alpha_s$ and $\beta_s$ reflect the level of inaccuracy of the starting model and the desired accuracy of the final model (Luzzati, 1952; Srinivasan & Parthasarathy, 1976; Read, 1990; Lunin & Skovoroda, 1995; Lunin et al., 2002). They play the key role in the calculation of $F^*_s$ and $W_s$ and therefore significantly influence the quality of the refined model. There are several ways to estimate these parameters.
The basic option included in MLMF is to set all $\alpha_s = 1$ and to calculate $\beta_s$ as

$$\beta_s = \sum_k f_k^2(s), \qquad (3)$$

where the sum is taken over all atoms absent from the model and $f_k(s)$ are their scattering factors, including the temperature factors (Afonine et al., 2002). This corresponds to the search for a partial model with precisely defined atomic positions. The values of $\beta_s$ can be estimated only approximately, because the temperature factors and, possibly, the precise number and types of the absent atoms are unknown. Tests show that reasonable variations in these parameters do not influence the results much (Afonine et al., 2002). In particular, all atoms in (3) can be treated as C atoms with the same temperature factor of about 25 Å², and the error in the estimated number of missing atoms can reach 25%. Note that in this mode every reflection receives its own values of $F^*_s$ and $W_s$.
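Equation (3) in this default setting (every absent atom treated as C with a common temperature factor of about 25 Å²) can be sketched as follows. Assumptions: the carbon scattering factor uses the four-Gaussian Cromer–Mann parameterization, with coefficients that should be checked against International Tables for Crystallography Vol. C before use; $s = \sin\theta/\lambda = 1/(2d)$; the isotropic temperature factor is $\exp(-Bs^2)$; `n_missing` counts the absent atoms in the whole unit cell, and the $\varepsilon$ factor is ignored. The function names are hypothetical.

```python
import math

# Four-Gaussian (Cromer-Mann) coefficients for neutral carbon, as tabulated in
# International Tables for Crystallography Vol. C; verify against the table before use.
C_A = (2.3100, 1.0200, 1.5886, 0.8650)
C_B = (20.8439, 10.2075, 0.5687, 51.6512)
C_C = 0.2156


def carbon_f(s: float, b_factor: float) -> float:
    """Carbon scattering factor times exp(-B s^2), with s = sin(theta)/lambda in 1/Angstrom."""
    s2 = s * s
    f0 = sum(a * math.exp(-b * s2) for a, b in zip(C_A, C_B)) + C_C
    return f0 * math.exp(-b_factor * s2)


def beta_missing(s: float, n_missing: float, b_factor: float = 25.0) -> float:
    """Equation (3) with every absent atom taken as C: beta_s = n_missing * f_C(s)^2.

    Hypothetical helper; n_missing is assumed to count absent atoms in the whole unit cell,
    and the epsilon (symmetry) factor is ignored.
    """
    return n_missing * carbon_f(s, b_factor) ** 2


def s_from_d(d: float) -> float:
    """s = sin(theta)/lambda = 1/(2d) for a reflection with spacing d (Angstrom)."""
    return 1.0 / (2.0 * d)
```

Because $\beta_s$ is proportional to the number of missing atoms in this setting, a 25% error in that number rescales every $\beta_s$ by the same 25%.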
The second option available in the program is to estimate the parameters $\alpha_s$ and $\beta_s$ from the maximum-likelihood principle (Lunin & Urzhumtsev, 1984; Read, 1986; Lunin & Skovoroda, 1995; Pannu & Read, 1996; Urzhumtsev et al., 1996). The values of $\alpha_s$ and $\beta_s$ are considered constant for all reflections within each of a number of thin spherical shells of reciprocal space. In this mode, the structure-factor magnitudes calculated from the starting atomic model must be present in the input file together with the experimental magnitudes and the flags for the test set of reflections (Brünger, 1992). Every spherical shell should contain at least 50-100 test reflections in order to obtain a good estimate of $\alpha_s$ and $\beta_s$. As usual, the test set should represent 5-10% of the full data set.
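Mode 2 can be illustrated with a minimal sketch that fits $(\alpha, \beta)$ per resolution shell to the test reflections by maximizing the acentric (Rice) likelihood. The optimizer here is an expectation-maximization iteration that treats the phase difference between $F^{\mathrm{obs}}$ and $\alpha F^{\mathrm{calc}}$ as missing data: its E-step uses the ratio $I_1/I_0$, and its M-step has closed-form updates. This is a stand-in chosen for brevity, not necessarily the procedure implemented in MLMF. Further assumptions: all reflections are treated as acentric, $\varepsilon$ is absorbed into $\beta$, shells hold equal numbers of test reflections ordered by $s^2$, and all helper names (including the synthetic-data generator) are hypothetical.

```python
import math
import random


def i1_over_i0(x: float) -> float:
    """Ratio I1(x)/I0(x) of modified Bessel functions for x >= 0 (mean cosine of a von Mises law)."""
    if x < 30.0:
        # Series: I0 = sum q^k/(k!)^2, I1 = (x/2) sum q^k/(k!(k+1)!), with q = x^2/4
        q = 0.25 * x * x
        t0 = t1 = s0 = s1 = 1.0
        k = 0
        while t0 > 1e-17 * s0:
            k += 1
            t0 *= q / (k * k)
            t1 *= q / (k * (k + 1))
            s0 += t0
            s1 += t1
        return 0.5 * x * s1 / s0
    # Large x: asymptotic series of I1 and I0 (the common factor e^x/sqrt(2 pi x) cancels)
    z = 8.0 * x
    a0 = 1.0 + 1.0 / z + 9.0 / (2.0 * z * z) + 225.0 / (6.0 * z ** 3)
    a1 = 1.0 - 3.0 / z - 15.0 / (2.0 * z * z) - 315.0 / (6.0 * z ** 3)
    return a1 / a0


def em_alpha_beta(f_obs, f_calc, n_iter=300, tol=1e-8):
    """EM estimate of (alpha, beta) for one shell, treating every reflection as acentric."""
    n = len(f_obs)
    sum_fc2 = sum(fc * fc for fc in f_calc)
    alpha = sum(fo * fc for fo, fc in zip(f_obs, f_calc)) / sum_fc2  # LS scale as a start
    beta = max(sum((fo - alpha * fc) ** 2 for fo, fc in zip(f_obs, f_calc)) / n, 1e-12)
    for _ in range(n_iter):
        # E-step: expected cosine of the phase error given the current (alpha, beta)
        m = [i1_over_i0(2.0 * alpha * fo * fc / beta) for fo, fc in zip(f_obs, f_calc)]
        # M-step: closed-form maximisers of the expected complete-data log-likelihood
        new_alpha = sum(fo * fc * mi for fo, fc, mi in zip(f_obs, f_calc, m)) / sum_fc2
        new_beta = sum(fo * fo + (new_alpha * fc) ** 2 - 2.0 * new_alpha * fo * fc * mi
                       for fo, fc, mi in zip(f_obs, f_calc, m)) / n
        converged = (abs(new_alpha - alpha) <= tol * abs(alpha)
                     and abs(new_beta - beta) <= tol * beta)
        alpha, beta = new_alpha, new_beta
        if converged:
            break
    return alpha, beta


def shell_estimates(s2, f_obs, f_calc, is_test, n_shells, min_per_shell=50):
    """Fit (alpha, beta) in n_shells equal-count shells of s^2 using test reflections only."""
    test = sorted((s, fo, fc) for s, fo, fc, t in zip(s2, f_obs, f_calc, is_test) if t)
    if len(test) < min_per_shell * n_shells:
        raise ValueError("too few test reflections per shell")
    size = len(test) / n_shells
    shells = []
    for j in range(n_shells):
        chunk = test[round(j * size):round((j + 1) * size)]
        alpha, beta = em_alpha_beta([c[1] for c in chunk], [c[2] for c in chunk])
        shells.append((chunk[0][0], chunk[-1][0], alpha, beta))  # (s2_min, s2_max, alpha, beta)
    return shells


def simulate(n, alpha, beta, seed=0):
    """Synthetic acentric data: F_obs = |alpha F_calc + complex Gaussian error of variance beta|."""
    rng = random.Random(seed)
    f_obs, f_calc = [], []
    for _ in range(n):
        fc = abs(complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5))))
        err = complex(rng.gauss(0.0, math.sqrt(beta / 2)), rng.gauss(0.0, math.sqrt(beta / 2)))
        f_obs.append(abs(alpha * fc + err))
        f_calc.append(fc)
    return f_obs, f_calc
```

To apply the result, every reflection (working and test) would receive the $\alpha_s$ and $\beta_s$ of the shell whose $s^2$ range contains it. On data from `simulate(1500, 0.8, 0.36)` the estimates should land close to the generating values; the guard on `min_per_shell` mirrors the 50-100 test reflections per shell recommended above.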
Finally, the user may supply his or her own estimates of $\alpha_s$ and $\beta_s$. In this third mode, these values, given for each reflection, must be included in the input file of structure factors.
The input information for MLMF is contained in two files: the file of control data and the file of structure factors.
The mandatory input parameters are the resolution limits and the space-group symmetry. If mode 1 ($\alpha_s = 1$) has been chosen, the estimated number of absent atoms and their mean temperature factor are requested. Mode 2 instead requires the structure factors calculated from the starting model, the flags marking the test set, and the number of resolution shells in which $\alpha_s$ and $\beta_s$ are considered to be constant.
The input file of structure factors is a formatted file in which all data are written in columns. In particular, a file in CNS format (Brünger et al., 1998) is suitable. Free-format records as well as records in any fixed format are also accepted. Every record corresponds to one reflection but can extend over several lines, as is the case for the CNS format. Each record contains the Miller indices h, k, l and the experimental structure-factor magnitude $F^{\mathrm{obs}}_s$, which are mandatory. The calculated magnitudes $F^{\mathrm{calc}}_s$, the test-set flags and predefined values of $\alpha_s$ and $\beta_s$ are optional, depending on the mode chosen.
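Because a record may extend over several lines, a reader for purely numeric free-format files can ignore line breaks and cut the token stream into records of fixed width. The sketch below assumes a hypothetical column layout supplied by the caller (for example h, k, l, $F^{\mathrm{obs}}$, then optional $F^{\mathrm{calc}}$ and a test-set flag); it does not handle CNS keyword records or fixed-format files with adjacent fields, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Union

INTEGER_COLUMNS = ("h", "k", "l")


def read_free_format(text: str, columns: Sequence[str]) -> List[Dict[str, Union[int, float]]]:
    """Cut a whitespace-separated token stream into fixed-width reflection records.

    Line breaks are ignored, so one record may span several lines. `columns` is a
    hypothetical caller-supplied layout, e.g. ("h", "k", "l", "fobs", "fcalc", "test").
    """
    tokens = text.split()
    width = len(columns)
    if width == 0 or len(tokens) % width:
        raise ValueError("token count is not a multiple of the record width")
    records = []
    for start in range(0, len(tokens), width):
        fields = tokens[start:start + width]
        # Miller indices are integers; every other column is read as a float
        records.append({name: int(tok) if name in INTEGER_COLUMNS else float(tok)
                        for name, tok in zip(columns, fields)})
    return records
```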
The main output file is a file of structure factors written in the same format as the corresponding input file. It contains one record per reflection with the h, k, l, $F^{\mathrm{obs}}_s$, $F^*_s$ and $W_s$ values.
The message file contains a copy of the input control parameters, calculated statistics and various data for the different modes of calculating the $\alpha_s$ and $\beta_s$ values, error and warning messages, etc.
The program MLMF is written in standard Fortran 77 and is system-independent.
The authors thank C. Lecomte for his interest in the project. The project was partially financed through the CNRS-RAS collaboration and supported by RFBR grant 00-04-48175. PA and AU are members of GdR 2417 CNRS.
Adams, P. D., Pannu, N. S., Read, R. J. & Brünger, A. T. (1997). Proc. Natl Acad. Sci. USA, 94, 5018-5023.
Afonine, P., Urzhumtsev, A. & Lunin, V. Y. (2002). CCP4 Newsl. Protein Crystallogr. 40 (http://www.ccp4.ac.uk/newsletters.html).
Bricogne, G. & Irwin, J. (1996). Proceedings of the CCP4 Study Weekend, pp. 85-92. Warrington: Daresbury Laboratory.
Brünger, A. T. (1992). Nature (London), 355, 472-474.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905-921.
Lunin, V. Y., Afonine, P. V. & Urzhumtsev, A. (2002). Acta Cryst. A58, 270-282.
Lunin, V. Y. & Skovoroda, T. P. (1995). Acta Cryst. A51, 880-887.
Lunin, V. Y. & Urzhumtsev, A. (1984). Acta Cryst. A40, 269-277.
Lunin, V. Y. & Urzhumtsev, A. (1999). CCP4 Newsl. Protein Crystallogr. 37, 14-28.
Luzzati, V. (1952). Acta Cryst. 5, 802-810.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.
Pannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659-668.
Read, R. J. (1986). Acta Cryst. A42, 140-149.
Read, R. J. (1990). Acta Cryst. A46, 900-912.
Srinivasan, R. & Parthasarathy, S. (1976). Some Statistical Applications in X-ray Crystallography. Oxford: Pergamon Press.
Urzhumtsev, A., Skovoroda, T. P. & Lunin, V. Y. (1996). J. Appl. Cryst. 29, 741-744.