Likelihood-enhanced fast translation functions

McCoy, A.J.; Grosse-Kunstleve, R.W.; Storoni, L.C.; Read, R.J.

doi:10.1107/S0907444905001617

research papers

BIOLOGICAL
CRYSTALLOGRAPHY

ISSN: 1399-0047

Volume 61| Part 4| April 2005| Pages 458-464

Likelihood-enhanced fast translation functions

Airlie J. McCoy,^a Ralf W. Grosse-Kunstleve,^b Laurent C. Storoni^a and Randy J. Read^a^*

^aDepartment of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 2XY, England, and ^bLawrence Berkeley National Laboratory, One Cyclotron Road, Building 64R0121, Berkeley, California 94720-8118, USA
^*Correspondence e-mail: rjr27@cam.ac.uk

(Received 21 December 2004; accepted 17 January 2005)

This paper is a companion to a recent paper on fast rotation functions [Storoni et al. (2004), Acta Cryst. D60, 432–438], which showed how a Taylor-series expansion of the maximum-likelihood rotation function leads to improved likelihood-enhanced fast rotation functions. In a similar manner, it is shown here how linear and quadratic Taylor-series expansions and least-squares approximations of the maximum-likelihood translation function lead to likelihood-enhanced translation functions, which can be calculated by FFT and which are more sensitive to the correct translation than the traditional correlation-coefficient fast translation function. These likelihood-enhanced translation targets for molecular-replacement searches have been implemented in the program Phaser using the Computational Crystallography Toolbox (cctbx).

Keywords: molecular replacement; translation functions; LETF; Phaser; cctbx.

1. Introduction

Macromolecular structure solution by molecular replacement is usually a two-step process. Firstly, a rotation function is used to find the orientation of the search model. Secondly, the position of the (oriented) search model is found using a form of translation function (Rossmann, 1972; Machin, 1985). Less commonly, full 6n-dimensional searches are carried out using either systematic (Sheriff et al., 1999) or stochastic (Kissinger et al., 1999; Glykos & Kokkinidis, 2000) algorithms.

Many translation-search functions have been described in the literature. They fall into two general categories: those that are evaluated at each sampled translation point in real space in a brute-force search and those that are calculated by FFT and therefore generate values for all points on the Fourier grid in real space simultaneously. The FFT methods have the advantage of being several orders of magnitude faster than the brute-force searches. In the brute-force category are R-factor searches (Dodson, 1988), correlation searches on amplitude or intensity (Fujinaga & Read, 1987) and full maximum-likelihood searches (Bricogne, 1992, 1997; Read, 2001). In the FFT category are the overlap function (Crowther & Blow, 1967) and variations (Tickle, 1985, 1992), which measure the overlap between the observed and calculated Patterson functions, and the fast correlation coefficient on intensity (Navaza & Vernoslova, 1995). When there is prior phase information, either from experimental phases or a partial model, FFT-based phased translation functions can be used (Colman et al., 1976; Read & Schierbeek, 1988; Navaza, 2001).

The correlation coefficient on intensity (CORR) is currently the most successful fast translation function and is widely used in molecular-replacement software [e.g. AMoRe (Navaza, 1994), MolRep (Vagin & Teplyakov, 1997) and CNS (Brünger et al., 1998)]. If x is the translation of the oriented search model, then CORR is given by

$[{\rm CORR}({\bf x}) = {{\textstyle \sum\limits_{\bf h} M_{\bf h} (I_{\bf h}^{\rm obs} - \overline {I_{\bf h}^{\rm obs}})[I_{\bf h}^{\Phi} ({\bf x}) - \overline {I_{\bf h}^{\Phi}({\bf x})}]} \over {\left[\sum\limits_{\bf h} M_{\bf h}(I_{\bf h}^{\rm obs} - \overline {I_{\bf h}^{\rm obs} })^2 \right]^{1/2} \left\{\sum\limits_{\bf h} M_{\bf h} [I_{\bf h}^{\Phi}({\bf x}) - \overline {I_{\bf h}^{\Phi}({\bf x})}]^2 \right\}^{1/2} }}, \eqno (1)]$

where h is the Miller index of a reflection, M_h is its multiplicity, $[I_{\bf h}^{\rm obs}]$ is the intensity of the observed data, $[\overline {I_{\bf h}^{\rm obs}}]$ is its mean value, $[I_{\bf h}^{\Phi}({\bf x})]$ is the square of the amplitude of the sum of the phased fixed (i.e. known) and moving (i.e. search) structure-factor contributions and $[\overline {I_{\bf h}^{\Phi}({\bf x})}]$ is its mean value.
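As an illustration of equation (1), the following minimal Python sketch evaluates CORR at a single translation from lists of observed intensities, calculated intensities and multiplicities. The function name `corr` is hypothetical, and the sketch assumes that the mean values are multiplicity-weighted averages over the reflections, which the text does not state explicitly.

```python
import math

def corr(i_obs, i_calc, mult):
    """Correlation coefficient on intensity, eq. (1), at one translation.

    i_obs  : observed intensities, one per reflection
    i_calc : calculated intensities I_h^Phi(x) for the trial translation
    mult   : multiplicities M_h
    """
    wsum = sum(mult)
    # Assumption: the means are weighted by multiplicity.
    mean_obs = sum(m * o for m, o in zip(mult, i_obs)) / wsum
    mean_calc = sum(m * c for m, c in zip(mult, i_calc)) / wsum
    num = sum(m * (o - mean_obs) * (c - mean_calc)
              for m, o, c in zip(mult, i_obs, i_calc))
    var_obs = sum(m * (o - mean_obs) ** 2 for m, o in zip(mult, i_obs))
    var_calc = sum(m * (c - mean_calc) ** 2 for m, c in zip(mult, i_calc))
    return num / math.sqrt(var_obs * var_calc)
```

A brute-force search would call such a function once per grid point, which is what the FFT formulation avoids.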

CORR is not as reliable in identifying the correct translation as the maximum-likelihood translation function, which is the same as the Rice function used for structure refinement (Read, 2001). The expression presented previously is rearranged here in order to make the approximations that will be developed more intuitive. To maximize numerical stability, we compute the log of the likelihood, which has its maximum for the same values of the parameters as the likelihood. If the reflections are assumed to be independent, the total log likelihood for a translation x in the Rice approximation is given by the sum of the reflection log likelihoods. The likelihood for a single reflection is given by

$[L_{\bf h} [I_{\bf h}^{\Phi}({\bf x})] = {{ 2(I_{\bf h}^{\rm obs})^{1/2} } \over {\varepsilon \Sigma _T }} \exp \left [- {{I_{\bf h}^{\rm obs} + I_{\bf h}^{\Phi} ({\bf x})} \over {\varepsilon \Sigma _T }}\right] I_0 \left \{{{2[I_{\bf h}^{\rm obs} I_{\bf h}^{\Phi}({\bf x})]^{1/2}} \over {\varepsilon \Sigma _T }} \right\}]$

for acentric reflections, where I₀ is the modified Bessel function of order zero, and by

$[\eqalign {L_{\bf h} [I_{\bf h}^{\Phi}({\bf x})] &= \left ({2 \over {\pi \varepsilon \Sigma _T }}\right)^{1/2} \exp \left [- {{ I_{\bf h}^{\rm obs} + I_{\bf h}^{\Phi}({\bf x})} \over {2\varepsilon \Sigma _T }} \right]\cr &\ \quad {\times}\ \cosh \left\{ {{ [I_{\bf h}^{\rm obs} I_{\bf h}^{\Phi}({\bf x})]^{1/2} } \over {\varepsilon \Sigma _T }} \right\}}]$

for centric reflections. These likelihoods are defined in terms of the probability of measuring an amplitude $[|{\bf F}_{\bf h}^{\rm obs}|]$ [= $[(I_{\bf h}^{\rm obs})^{1/2}]$ ].

The contribution of acentric reflections to the log-likelihood is therefore given by

$[\eqalignno {{\rm LL}_{\bf h} [I_{\bf h}^{\Phi}({\bf x})] &= \ln \left [{{2(I_{\bf h}^{\rm obs})^{1/2}} \over {\varepsilon \Sigma _T }} \right] - {{ I_{\bf h}^{\rm obs} + I_{\bf h}^{\Phi} ({\bf x})} \over {\varepsilon \Sigma _T }} \cr &\ \quad {+}\ \ln \left(I_0 \left\{ {{ 2 [I_{\bf h}^{\rm obs} I_{\bf h}^{\Phi}({\bf x})]^{1/2}} \over {\varepsilon \Sigma _T }} \right \} \right) & (2a)}]$

and for centric reflections by

$[\eqalignno {{\rm LL}_{\bf h} [I_{\bf h}^{\Phi}({\bf x})] &= {1 \over 2}\ln \left({2 \over {\pi \varepsilon \Sigma _T }} \right) - {{ I_{\bf h}^{\rm obs} + I_{\bf h}^{\Phi} ({\bf x})} \over {2\varepsilon \Sigma _T }} \cr &\ \quad +\ \ln \left(\cosh \left \{ {{[I_{\bf h}^{\rm obs} I_{\bf h}^{\Phi} ({\bf x})]^{1/2}} \over {\varepsilon \Sigma _T }} \right\} \right). & (2b)}]$
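Equations (2a) and (2b) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names are hypothetical, `eps_sigma_t` stands for the product εΣ_T, and ln I₀ is obtained here by numerical integration of the integral representation of the Bessel function rather than by whatever method Phaser uses. The acentric branch requires a positive observed intensity.

```python
import math

def ln_i0(z, n=400):
    """ln I0(z) from I0(z) = (1/pi) * integral_0^pi exp(z cos t) dt.

    Midpoint rule; exp(z) is factored out so that large z does not overflow.
    """
    s = sum(math.exp(z * (math.cos(math.pi * (k + 0.5) / n) - 1.0))
            for k in range(n))
    return z + math.log(s / n)

def ln_cosh(z):
    """Overflow-safe ln cosh(z)."""
    z = abs(z)
    return z + math.log1p(math.exp(-2.0 * z)) - math.log(2.0)

def rice_ll(i_obs, i_calc, eps_sigma_t, centric=False):
    """Log-likelihood of one reflection: eq. (2a) acentric, eq. (2b) centric."""
    root = math.sqrt(i_obs * i_calc)
    if centric:
        return (0.5 * math.log(2.0 / (math.pi * eps_sigma_t))
                - (i_obs + i_calc) / (2.0 * eps_sigma_t)
                + ln_cosh(root / eps_sigma_t))
    return (math.log(2.0 * math.sqrt(i_obs) / eps_sigma_t)
            - (i_obs + i_calc) / eps_sigma_t
            + ln_i0(2.0 * root / eps_sigma_t))
```

Working with ln I₀ and ln cosh directly, instead of taking the logarithm of the likelihood, is one way to keep the calculation stable for strong reflections.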

In these equations,

$[I_{\bf h}^{\Phi} ({\bf x}) = |D_{\rm move} F_{\bf h}^{\rm move}({\bf x}) + D_{\rm fix} F_{\bf h}^{\rm fix}|^2,]$

$[\Sigma_T ({\bf h}) = \Sigma _{N'} ({\bf h}) - D_{\rm fix}^2 \Sigma _P^{\rm fix} - D_{\rm move}^2 \Sigma _P^{\rm move},]$

$[\Sigma _{N'} ({\bf h}) = \Sigma _N ({\bf h}) - \textstyle \sum\limits_{j_f } D_{j_f}^2 F_{j_f}^2 ({\bf h}) - \langle D_{j_f }^2 F_{j_f }^2 \rangle,]$

$[\Sigma_P^{\rm fix} = \langle I^{\rm fix}/ \varepsilon \rangle, \Sigma _P^{\rm move} = \langle I^{\rm move}/\varepsilon\rangle.]$

The subscripts j_f refer to any fixed (i.e. non-translating) molecules that have an unknown origin relative to the moving molecule. Each F_{j_f} thus represents a structure-factor component with unknown relative phase compared with other components (for example, from fixing the orientation but not the position of a molecule) and may represent the sum of a number of molecular transforms with known relative phase. In contrast, $[F_{\bf h}^{\rm fix}]$ is a fixed contribution with known phase relative to the contributions of symmetry-related copies of the moving molecule. Σ_N is the bare variance of the Wilson (1949) distribution, in which nothing is known apart from the unit-cell content. Σ_T is a variance that takes into account the acquisition of extra information from the contributions of the fixed and moving molecules. Σ_N′ accounts for the part of the extra information that arises from the F_{j_f} contributions with unknown relative phase. The factor ∊ accounts for the statistical effect of symmetry on the expected intensity and is equal to the number of symmetry operations that, when applied to h, leave it unchanged. The D factors are the fractions of the calculated structure-factor components that are correlated with the true values (Luzzati, 1952). To account for the effect of errors in measuring the observed amplitudes, an observational variance contribution is added to Σ_N, as performed for experimental phasing (Green, 1979; de La Fortelle & Bricogne, 1997) and structure refinement (Murshudov et al., 1997).

The maximum-likelihood translation function is time-consuming to compute, and this can affect success in finding the correct solution. In difficult molecular-replacement problems, the correct orientation may be a long way down the sorted list of potential orientations in the results from the rotation function, and it may only be possible to identify the correct orientation by the high translation-function score that it generates. If the translation function is too time-consuming to compute then, in practice, the number of potential orientations that can be tested may be limited and the correct orientation may be missed by the search. Thus, developing an approximation to the full-likelihood translation function that retains its superior ability to discriminate correct from incorrect solutions, but that may be calculated by FFT, is important to the practical success of a maximum-likelihood molecular-replacement program. We follow the strategy used in AMoRe (Navaza, 1994), in which fast methods are used to generate a list of plausible solutions that is then rescored by a better but computationally more expensive target. In our case, we rescore potential solutions using the translation likelihood target (Read, 2001).

We showed recently (Storoni et al., 2004) that likelihood-enhanced fast rotation functions are an excellent compromise between the high quality but slow full-likelihood rotation-function target and the lower quality but much faster traditional Crowther FFT-based search methods, as they provide better discrimination between correct and incorrect orientations than the Crowther function but at the same speed. Here, we use series approximations to the full maximum-likelihood Rice translation function to derive several likelihood-enhanced FFT translation functions. These are of higher quality than CORR and are as fast or faster.

2. Series approximations of maximum-likelihood translation function

The fast correlation coefficient algorithm (Navaza & Vernoslova, 1995) provides an efficient method to compute translation targets expressed through linear and quadratic terms in $[I_{\bf h}^\Phi({\bf x})]$ . We have examined two methods to construct such series approximations of the maximum-likelihood translation function. Firstly, we have used Taylor-series expansions to the first and second order. Secondly, we have fitted least-squares linear and quadratic approximations to the likelihood function.

2.1. Taylor-series expansions

To compute Taylor-series expansions, we require the derivatives of the function with respect to the expansion variable. Starting from (2), the first derivative of $[{\rm LL}_{\bf h} [I_{\bf h}^{\Phi} ({\bf x})]]$ with respect to $[I_{\bf h}^\Phi({\bf x})]$ is given by

$[{\rm LL}'_{\bf h} [I_{\bf h}^{\Phi}({\bf x})] = {1 \over {w_{\bf h} \varepsilon \Sigma _T }} \left\{ {{ m_{\bf h}(I_{\bf h}^{\rm obs})^{1/2} } \over { [I_{\bf h}^{\Phi}({\bf x})]^{1/2} }} - 1 \right \}\eqno (3)]$

and the second derivative is given by

$[{\rm LL}''_{\bf h} [I_{\bf h}^{\Phi}({\bf x})] = {1 \over {w_{\bf h}^2 \varepsilon \Sigma _T I_{\bf h}^{\Phi} ({\bf x})}} \left \{ {{ (1 - m_{\bf h}^2) I_{\bf h}^{\rm obs}} \over {\varepsilon \Sigma _T }} - {{ m_{\bf h} (I_{\bf h}^{\rm obs})^{1/2} } \over {[I_{\bf h}^{\Phi} ({\bf x})]^{1/2}}} \right \}, \eqno (4)]$

where for acentric reflections

$[m_{\bf h} = {{I_1 \{ 2[I_{\bf h}^{\rm obs} I_{\bf h}^{\Phi}({\bf x})]^{1/2}/ \varepsilon \Sigma _T \} } \over { I_0 \{ 2[I_{\bf h}^{\rm obs} I_{\bf h}^{\Phi} ({\bf x})]^{1/2}/ \varepsilon \Sigma _T\} }}, w_{\bf h} = 1]$

and for centric reflections

$[m_{\bf h} = \tanh \{ [I_{\bf h}^{\rm obs} I_{\bf h}^{\Phi} ({\bf x})]^{1/2}/ \varepsilon \Sigma _T\}, w_{\bf h} = 2.]$
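The derivatives in equations (3) and (4) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, `eps_sigma_t` denotes εΣ_T, and the Bessel ratio I₁/I₀ is computed by numerical integration as a stand-in for a library routine. A central finite difference of the first derivative gives a simple consistency check on the second.

```python
import math

def bessel_ratio(z, n=400):
    """I1(z)/I0(z) from I_k(z) = (1/pi) * integral_0^pi exp(z cos t) cos(k t) dt."""
    s0 = s1 = 0.0
    for k in range(n):
        t = math.pi * (k + 0.5) / n
        w = math.exp(z * (math.cos(t) - 1.0))  # exp(z) cancels in the ratio
        s0 += w
        s1 += w * math.cos(t)
    return s1 / s0

def m_and_w(i_obs, i_calc, eps_sigma_t, centric):
    """Return (m_h, w_h) as defined below eq. (4)."""
    root = math.sqrt(i_obs * i_calc)
    if centric:
        return math.tanh(root / eps_sigma_t), 2.0
    return bessel_ratio(2.0 * root / eps_sigma_t), 1.0

def ll_first(i_obs, i_calc, eps_sigma_t, centric=False):
    """First derivative of LL_h with respect to I_h^Phi, eq. (3)."""
    m, w = m_and_w(i_obs, i_calc, eps_sigma_t, centric)
    return (m * math.sqrt(i_obs / i_calc) - 1.0) / (w * eps_sigma_t)

def ll_second(i_obs, i_calc, eps_sigma_t, centric=False):
    """Second derivative of LL_h with respect to I_h^Phi, eq. (4)."""
    m, w = m_and_w(i_obs, i_calc, eps_sigma_t, centric)
    inner = (1.0 - m * m) * i_obs / eps_sigma_t - m * math.sqrt(i_obs / i_calc)
    return inner / (w * w * eps_sigma_t * i_calc)
```

Both derivatives are singular at a calculated intensity of zero, so the expansion point must be strictly positive.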

The first-order Taylor series expansion of the Rice function, centred at $[I_{\bf h}^{\Phi}({\bf x})]$ = χ_h, is therefore given by

$[\eqalignno {{\rm LL}_{\bf h}^1 [I_{\bf h}^{\Phi}({\bf x})] &= {\rm LL}_{\bf h}(\chi _{\bf h}) + {\rm LL}'_{\bf h} (\chi _{\bf h}) [I_{\bf h}^{\Phi} ({\bf x}) - \chi_{\bf h}] \cr &= C_{\bf h}^1 + {\rm LL}'_{\bf h} (\chi _{\bf h})I_{\bf h}^{\Phi} ({\bf x}), & (5)}]$

where $[C_{\bf h}^1]$ is a constant not dependent on x.

Similarly, the second-order Taylor series expansion of the Rice function, centred at $[I_{\bf h}^{\Phi}({\bf x})]$ = χ_h, is given by

$[\eqalignno {{\rm LL}_{\bf h}^2 [I_{\bf h}^{\Phi} ({\bf x})] & = {\rm LL}_{\bf h} (\chi _{\bf h}) + {\rm LL}'_{\bf h} (\chi _{\bf h})[I_{\bf h}^{\Phi} ({\bf x}) - \chi_{\bf h}] \cr &\ \quad +\ {\textstyle{1 \over 2}}{\rm LL}''_{\bf h} (\chi _{\bf h})[I_{\bf h}^{\Phi} ({\bf x}) - \chi _{\bf h}]^2 \cr & = C_{\bf h}^2 + [{\rm LL}'_{\bf h}(\chi _{\bf h}) - \chi_{\bf h} {\rm LL}''_{\bf h} (\chi _{\bf h})]I_{\bf h}^{\Phi} ({\bf x}) \cr &\ \quad +\ {\textstyle{1 \over 2}}{\rm LL}''_{\bf h} (\chi _{\bf h})I_{\bf h}^{\Phi} ({\bf x})^2, & (6)}]$

where $[C_{\bf h}^2]$ is a constant not dependent on x.
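The regrouping in equations (5) and (6) can be made explicit with a short sketch. The function names are hypothetical, and the expressions for the constants are obtained simply by expanding the brackets in (5) and (6); the paper itself does not write them out.

```python
def taylor_linear(ll0, d1, chi):
    """Return (C1, B1) such that LL^1(I) = C1 + B1 * I, eq. (5).

    ll0 = LL(chi), d1 = LL'(chi).
    """
    return ll0 - d1 * chi, d1

def taylor_quadratic(ll0, d1, d2, chi):
    """Return (C2, B2, A2) such that LL^2(I) = C2 + B2 * I + A2 * I**2, eq. (6).

    ll0 = LL(chi), d1 = LL'(chi), d2 = LL''(chi).
    """
    c2 = ll0 - d1 * chi + 0.5 * d2 * chi * chi
    b2 = d1 - chi * d2
    a2 = 0.5 * d2
    return c2, b2, a2
```

Only the coefficients multiplying the calculated intensity and its square depend on the reflection data, so they can be computed once per orientation before any translation is tested.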

The expansions provide good estimates of the values of the likelihood function over only a restricted range of values of $[I_{\bf h}^ {\Phi}({\bf x})]$ close to the point of expansion. Optimal results thus require a good choice of the region to be approximated. We have chosen to centre the Taylor-series expansions on the expected value of $[I_{\bf h}^{\Phi} ({\bf x})]$ , so that they are most accurate over the range of values likely to be sampled during the translation search. The expected value takes account of the fixed contribution, if any, and the amplitudes of the molecular transforms of symmetry copies k of the moving molecule. This leads to

$[\chi _{\bf h} = \langle I_{\bf h}^{\Phi} ({\bf x})\rangle = D_{\rm fix}^2 | {\bf F}_{\bf h}^{\rm fix}|^2 + D_{\rm move}^2 \textstyle \sum\limits_k |{\bf F}_k ({\bf h})|^2. \eqno (7)]$
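Equation (7) is a sum of squared amplitudes with no cross terms, presumably because the relative phases of the contributions average out over the translations sampled. A minimal sketch, with a hypothetical function name and argument layout:

```python
def expected_intensity(d_fix, f_fix, d_move, f_copies):
    """Expected calculated intensity chi_h of eq. (7).

    f_fix    : complex fixed structure factor F_h^fix (0 if nothing is fixed)
    f_copies : molecular transforms F_k(h) of the symmetry copies of the
               moving molecule (complex values or plain amplitudes)
    """
    fixed = d_fix ** 2 * abs(f_fix) ** 2
    moving = d_move ** 2 * sum(abs(f) ** 2 for f in f_copies)
    return fixed + moving
```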

We have tested the effect of computing the expected value of $[I_{\bf h}^{\Phi}({\bf x})]$ using less of the available information, i.e. by taking account only of the scattering power of the moving molecule but ignoring the amplitudes of the molecular-transform contributions. As expected, this approximation works less well (results not shown). In addition, we have tested the use of Taylor expansions centred on zero, which degrades the results significantly (results not shown).

2.2. Least-squares approximations

Least-squares approximations are computed by fitting either a line or a parabola to values of the likelihood function sampled over the range likely to be spanned by $[I_{\bf h}^{\Phi}({\bf x})]$ , weighted by the probability of encountering each value of $[F_{\bf h}^{\Phi}]$ = $[[I_{\bf h}^{\Phi}({\bf x})]^{1/2}]$ . The probability distribution for $[F_{\bf h}^{\Phi}]$ is computed by analogy with the Sim-like rotation likelihood function (Read, 2001),

$[p(F_{\bf h}^{\Phi}) = {{ 2F_{\bf h}^{\Phi}} \over {\varepsilon \Sigma _S }} \exp \left(- {{ (F_{\bf h}^{\Phi})^2 + | {\bf F}_{\rm big}|^2 } \over {\varepsilon \Sigma _S }} \right) I_0 \left ({{ 2F_{\bf h}^{\Phi} | {\bf F}_{\rm big}| } \over {\varepsilon \Sigma _S }} \right)]$

for acentric reflections and

$[p(F_{\bf h}^{\Phi}) = \left ({2 \over {\pi \varepsilon \Sigma _S }}\right)^{1/2} \exp \left(- {{ (F_{\bf h}^{\Phi})^2 + | {\bf F}_{\rm big}|^2 } \over {2\varepsilon \Sigma _S }} \right) \cosh \left({{F_{\bf h}^{\Phi}| {\bf F}_{\rm big}|} \over {\varepsilon \Sigma _S }}\right)]$

for centric reflections, where

$[\Sigma _S = \langle I_{\bf h}^{\Phi }({\bf x})\rangle - |{\bf F}_{\rm big} |^2]$

and F_big is the largest term in the sum contributing to $[\langle I_{\bf h}^{\Phi}({\bf x})\rangle]$ .

The linear least-squares approximation is defined by determining the coefficients $[B_{\bf h}^L]$ and $[C_{\bf h}^L]$ that minimize the residual

$[\textstyle \int p(F_{\bf h}^{\Phi})\{{\rm LL}_{\bf h}[I_{\bf h}^{\Phi}({\bf x})] - [C_{\bf h}^L + B_{\bf h}^L I_{\bf h}^{\Phi} ({\bf x})]\}^2\,\,{\rm d}F_{\bf h}^{\Phi}. \eqno (8)]$

Similarly, the quadratic least-squares approximation is defined by determining the coefficients $[A_{\bf h}^Q]$ , $[B_{\bf h}^Q]$ and $[C_{\bf h}^Q]$ that minimize the residual

$[\textstyle \int p(F_{\bf h}^{\Phi})\{{\rm LL}_{\bf h} [I_{\bf h}^{\Phi}({\bf x})] - [C_{\bf h}^Q + B_{\bf h}^Q I_{\bf h}^{\Phi} ({\bf x}) +A_{\bf h}^Q I_{\bf h}^{\Phi}({\bf x})^2]\}^2\,\,{\rm d }F_{\bf h}^{\Phi}. \eqno (9)]$

In practice, we find that it is sufficient to compute the residual with a sum over as few as five points spanning the range of $[F_{\bf h}^{\Phi}]$ ; Phaser uses seven points for stability.
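When the integrals in (8) and (9) are replaced by sums over a handful of sample points, the fit reduces to a small weighted least-squares problem. The sketch below solves the normal equations directly; it is a generic illustration, not Phaser's code, and the function name and argument layout are hypothetical. Here `xs` would hold the sampled values of the calculated intensity, `ys` the corresponding log-likelihood values and `ws` the probability weights.

```python
def weighted_polyfit(xs, ys, ws, degree):
    """Coefficients [c0, c1, ...] minimising sum_i w_i * (y_i - sum_k c_k * x_i**k)**2."""
    n = degree + 1
    # Normal equations A c = b built from weighted power sums.
    a = [[sum(w * x ** (r + c) for x, w in zip(xs, ws)) for c in range(n)]
         for r in range(n)]
    b = [sum(w * y * x ** r for x, y, w in zip(xs, ys, ws)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for r in reversed(range(n)):
        s = sum(a[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (b[r] - s) / a[r][r]
    return coef
```

With `degree=1` the result corresponds to the pair (C^L, B^L) and with `degree=2` to the triple (C^Q, B^Q, A^Q), in that order.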

3. Likelihood-enhanced translation functions

For calculating the optimal position of a search model given a particular orientation, the translation-independent constants could be ignored as they only change the mean of the search-function scores. However, we have chosen to retain them so that the scores for different orientations can be compared. The first-order Taylor-series expansion of the Rice function, combining (5) and (7), then gives what we call the likelihood-enhanced translation function 1 (LETF1),

$[{\rm LETF}1 ({\bf x}) = \textstyle \sum\limits_{\bf h} C_{\bf h}^1 + {\rm LL}'_{\bf h} (\langle I_{\bf h}^{\Phi}\rangle)I_{\bf h}^{\Phi} ({\bf x}).\eqno (10)]$

The second-order Taylor-series expansion of the Rice likelihood target, combining (6) and (7), gives the likelihood-enhanced translation function 2, or LETF2,

$[\eqalignno {{\rm LETF}2({\bf x}) & = \textstyle \sum\limits_{\bf h} C_{\bf h}^2 + [{\rm LL}'_{\bf h} (\langle I_{\bf h}^{\Phi}\rangle) - \langle I_{\bf h}^{\Phi} \rangle {\rm LL}''_{\bf h} (\langle I_{\bf h}^{\Phi} \rangle)]I_{\bf h}^{\Phi}({\bf x}) \cr &\ \quad +\ {\textstyle{1 \over 2}}{\rm LL}''_{\bf h} (\langle I_{\bf h}^{\Phi}\rangle)I_{\bf h}^{\Phi} ({\bf x})^2. & (11)}]$

The linear least-squares approximation of the Rice likelihood target, using coefficients determined by minimizing (8), gives the linear likelihood-enhanced translation function, or LETFL,

$[{\rm LETFL}({\bf x}) = \textstyle \sum\limits_{\bf h}C_{\bf h}^L + B_{\bf h}^L I_{\bf h}^{\Phi} ({\bf x}).\eqno (12)]$

Finally, the quadratic least-squares approximation of the Rice likelihood target, using coefficients determined by minimizing (9), gives the quadratic likelihood-enhanced translation function, or LETFQ,

$[{\rm LETFQ}({\bf x}) = \textstyle \sum\limits_{\bf h} C_{\bf h}^Q + B_{\bf h}^Q I_{\bf h}^{\Phi} ({\bf x}) + A_{\bf h}^Q I_{\bf h}^{\Phi} ({\bf x})^2 \eqno (13)]$

Information from fixed parts of the model is introduced into the coefficients of the fast translation targets in two ways. Phased structure-factor contributions are incorporated directly through $[I_{\bf h}^{\Phi} ({\bf x})]$ and through the contribution to the variance term $[\Sigma _P^{\rm fix}]$ in Σ_T. Those parts of the structure for which the orientation but not the position are known also contribute to the variance through $[\Sigma _{N'}]$ in Σ_T.

4. Implementation

The target functions CORR, LETF1, LETF2, LETFL and LETFQ described above were implemented in the program Phaser using the Computational Crystallography Toolbox (Grosse-Kunstleve et al., 2002). For convenience, the calculations are performed in terms of E values, normalized by dividing the structure factors by (∊Σ_N)^1/2. At the same time the variances, such as ∊Σ_T in (2), are divided by ∊Σ_N.

The fast translation function of Navaza & Vernoslova (1995) was factored into functions that compute $[\textstyle \sum_{\bf h} A_{\bf h} I_{\bf h}^{\Phi}({\bf x})^2]$ and $[\textstyle \sum_{\bf h} B_{\bf h} I_{\bf h}^{\Phi}({\bf x})]$ given the Miller indices h, the coefficients A_h and B_h, the observed data $[(I_{\bf h}^{\rm obs})^{1/2}]$ , the fixed components $[{\bf F}_{\bf h}^{\rm fix}]$ of the calculated structure factor and the molecular transform (in P1) of the moving molecule before translation. With these functions, all of the LETF functions can be computed. For computing $[\textstyle \sum_{\bf h} B_{\bf h} I_{\bf h}^{\Phi}({\bf x})]$ , the run time scales with the second power of the number of symmetry operations. For computing $[\textstyle \sum_{\bf h} A_{\bf h} I_{\bf h}^{\Phi}({\bf x})^2]$ , the run time scales with the fourth power of the number of symmetry operations. For centred cells, the summations can be carried out using only the symmetry operations corresponding to the null centring (the 'primitive' subset) to minimize the run time (e.g. for F-centred cells, this decreases the run time by a factor of 4⁴ = 256 for the summations involving the square of the calculated intensities). The same computational saving can also be achieved by transforming the reflection data and coordinates to a primitive setting. This slightly more involved approach has the additional advantage of reducing the memory requirements (e.g. by a factor of 4 for F-centred cells).
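To indicate why the linear term can be evaluated on a whole grid at once, the following toy sketch treats a one-dimensional P1 case with a single moving copy and a fixed contribution. Expanding the squared modulus gives a constant plus a Fourier synthesis with coefficients B_h F_move conj(F_fix), which an FFT evaluates at every grid point simultaneously. Everything here is a simplifying assumption for illustration: the function names, the one-dimensional integer indices, the dictionaries keyed by h (with the D factors taken to be already applied to the complex structure factors) and the small radix-2 FFT, which requires a power-of-two grid. The general space-group case described above involves products over symmetry copies and is not reproduced.

```python
import cmath

def fft_pos(a):
    """Radix-2 transform X_j = sum_n a_n * exp(+2*pi*i*n*j/N); len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft_pos(a[0::2]), fft_pos(a[1::2])
    out = [0j] * n
    for j in range(n // 2):
        tw = cmath.exp(2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + tw
        out[j + n // 2] = even[j] - tw
    return out

def linear_term_brute(b, f_move, f_fix, n_grid):
    """sum_h B_h * |F_move(h) exp(2 pi i h x) + F_fix(h)|^2 at x = j / n_grid."""
    out = []
    for j in range(n_grid):
        x = j / n_grid
        out.append(sum(b[h] * abs(f_move[h] * cmath.exp(2j * cmath.pi * h * x)
                                  + f_fix[h]) ** 2 for h in b))
    return out

def linear_term_fft(b, f_move, f_fix, n_grid):
    """Same values obtained from one Fourier synthesis plus a constant."""
    const = sum(b[h] * (abs(f_move[h]) ** 2 + abs(f_fix[h]) ** 2) for h in b)
    coeffs = [0j] * n_grid
    for h in b:
        coeffs[h % n_grid] += b[h] * f_move[h] * f_fix[h].conjugate()
    return [const + 2.0 * s.real for s in fft_pos(coeffs)]
```

The quadratic term can be expanded in the same spirit, but its coefficients involve products of pairs of terms, which is consistent with the higher power of the number of symmetry operations quoted above.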

The coefficients of the FFT to compute $[\textstyle \sum_{\bf h} B_{\bf h} I_{\bf h}^{\Phi}({\bf x})]$ involve terms to twice the data resolution and those to compute $[\textstyle \sum_{\bf h} A_{\bf h} I_{\bf h}^{\Phi}({\bf x})^2]$ involve terms to four times the data resolution (Navaza & Vernoslova, 1995). It follows from Langs (2002) that the fast translation functions may be evaluated using a grid coarser than the Shannon sampling corresponding to the terms involved. In principle, to preserve all the details, the grid spacing should be at least as fine as the Shannon sampling: d_min/4 for doubled resolution and d_min/8 for quadrupled resolution. Numerical tests show that a grid spacing of d_min/4 is optimal for the first-order approximations (LETF1 and LETFL). The results for the second-order approximations (LETF2 and LETFQ) do not improve much when the grid is made finer than d_min/5 and they are usually acceptable with a grid of d_min/4.
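The grid spacings quoted above translate into grid dimensions as in the following sketch. The function name is hypothetical, and a real implementation would additionally round up to a size that suits the FFT library and the space-group symmetry, which is not attempted here.

```python
import math

def grid_points(cell_edge, d_min, divisor=4.0):
    """Smallest number of grid points along a cell edge (in Å) giving a
    spacing no coarser than d_min / divisor."""
    return math.ceil(cell_edge * divisor / d_min)
```

For a 40 Å cell edge and 3 Å data, a spacing of d_min/4 needs 54 points along that edge, whereas the full Shannon sampling of the quadratic term (d_min/8) would need 107.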

Note that in our implementation of CORR, the components of $[I_{\bf h}^{\Phi}({\bf x})]$ are weighted by Luzzati (1952) D values reflecting the expected coordinate errors of the models. This improves the results over those obtained without weights.

5. Test cases

Results from three tests are shown below. These examples were chosen to illustrate the performance of the fast translation function targets in a variety of circumstances, not because the use of the new targets is essential to solving these structures. Earlier work (Read, 2001) has already demonstrated that the likelihood targets are more sensitive to the correct solution than traditional targets such as CORR. In Phaser, the top translations from the fast translation search are rescored with the full translation likelihood target; the better the fast search predicts the top peaks, the shorter the list for rescoring can be. We use scatter plots and the correlation coefficient between the fast and slow (LLG) scores to evaluate how well the fast scores approximate the slow score and thus predict the order of the rescored peaks. The test cases below were chosen to assess the fast translation scores in cases where an accurate model accounts for either a small proportion or a large proportion of the total structure factor and also in cases where the model is less accurate.
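The evaluation strategy described above, a shortlist from the fast search that is rescored with the slow target and a correlation coefficient between the two sets of scores, can be sketched as follows. The function names and the use of plain callables for the scores are hypothetical; this is not Phaser's interface.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rescore_top(peaks, fast_score, slow_score, keep):
    """Keep the `keep` best peaks by the fast score, then order them by the slow score."""
    shortlist = sorted(peaks, key=fast_score, reverse=True)[:keep]
    return sorted(shortlist, key=slow_score, reverse=True)
```

The better the fast score correlates with the slow one, the smaller `keep` can be without losing the correct solution from the shortlist.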

No low-resolution cutoffs were applied to the available data in any of the tests.

5.1. β-Lactamase and β-lactamase inhibitor protein complex

The structure of the complex between β-lactamase (BETA) and β-lactamase inhibitor protein (BLIP) has served as a test structure for maximum-likelihood molecular replacement (Read, 2003; Storoni et al., 2004) because the original structure determination using traditional molecular-replacement techniques was difficult, even though good models for BETA and BLIP were available (Strynadka et al., 1996). The difficulty arose in the search for the BLIP component, especially in determining its orientation, as the BETA component is easily found by traditional (and maximum-likelihood) methods. BLIP was difficult to find by traditional methods for two main reasons. Firstly, the BLIP component of the structure comprises only 38% of the total scattering (the BETA component accounts for the other 62%). Secondly, the data are anisotropic and so there is systematic variation in the structure-factor amplitudes not accounted for by the molecular model, which increases the noise of the search. We have previously shown that full maximum-likelihood molecular replacement (Read, 2003) and the likelihood-enhanced fast rotation functions (Storoni et al., 2004) allow the BLIP component to be found easily. Maximum likelihood overcomes the problems of low scattering and anisotropic data (manuscript in preparation) through better modelling of the structure-factor probabilities and by allowing the information from BETA to be included in the search for BLIP.

5.1.1. Searching for BLIP alone with restricted resolution

The correct orientation for BLIP can be found with a likelihood-based fast rotation search, even when the information about the BETA component is not exploited (Storoni et al., 2004). Once its orientation is known, the translation can be determined easily with any of the fast translation-function scores. To make the translation search more challenging, we have reduced the signal by truncating the resolution of the data to 6 Å. This test illustrates the case where the model predicts only a relatively small component of the structure factor, even if the model is reasonably accurate. As Fig. 1(a) shows, only a small range of values of $[I_{\bf h}^{\Phi} ({\bf x})]$ is spanned for a typical reflection as the model is translated. Over this range the Rice likelihood function is reasonably close to linear. The results in Table 1 demonstrate that all the LETF scores provide a much better prediction of the LLG score than does the CORR score. As one might expect, the higher order approximations provide a better fit to LLG and the least-squares approximations are slightly better than the Taylor-series approximations. The scatter plots in Fig. 2 and the results in Table 2 show that the correct translation receives the top score in all LETF scores, but not with CORR. Nonetheless, the correct translation is near the top of the list even for CORR and would be recovered in this case if the peaks were rescored with the LLG score.

Table 1
Correlation coefficients between peaks of fast translation maps and LLG values from rescoring in three sets of test calculations

	Translate BLIP alone, 6 Å	Fix BETA, translate BLIP, 3 Å	Translate 1d0d model of TOXD
CORR	0.752	0.714	0.803
LETF1	0.931	0.920	0.949
LETFL	0.936	0.922	0.950
LETF2	0.969	0.946	0.971
LETFQ	0.981	0.987	0.984

Table 2
Peak-to-noise discrimination in translation searches

Results are expressed as Z scores, i.e. r.m.s. deviations above the mean score.

	Translate BLIP alone, 6 Å		Fix BETA, translate BLIP, 3 Å		Translate 1d0d model of TOXD
	Correct	Top incorrect	Correct	Top incorrect	Correct	Top incorrect
CORR	4.49	4.96	24.26	5.12	6.13	5.24
LETF1	5.31	4.85	31.00	7.86	5.98	5.20
LETFL	5.30	4.81	30.55	7.54	5.93	5.15
LETF2	5.45	5.00	27.43	5.98	5.41	4.73
LETFQ	5.45	4.90	30.03	7.18	5.57	4.82
LLG	5.96	4.94	31.25	7.33	5.86	4.97
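The Z scores in Table 2 express a peak height as a number of r.m.s. deviations above the mean score. A minimal sketch follows, with a hypothetical function name; it assumes that the mean and r.m.s. deviation are taken over all scores in the search, which the table note does not spell out.

```python
import math

def z_score(value, scores):
    """Number of r.m.s. deviations by which `value` exceeds the mean of `scores`."""
    mean = sum(scores) / len(scores)
    rms = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return (value - mean) / rms
```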

Figure 1
Plots of Rice log-likelihood function and its approximations (vertical axis) for a single acentric reflection from the BLIP test case, as a function of $[I_{\bf h}^{\Phi} ({\bf x})]$ . The data are normalized, so that $[I_{\bf h}^{\Phi}({\bf x})]$ has been divided by Σ_N. The log-likelihood function (LL_h) is shown in black, the linear (LETF1) and quadratic (LETF2) Taylor-series approximations are shown in blue and the linear (LETFL) and quadratic (LETFQ) least-squares approximations are shown in red. A dashed line shows the probability distribution of $[I_{\bf h}^{\Phi}({\bf x})]$ , superimposed using an arbitrary scale and origin. (a) BLIP alone. (b) BLIP with BETA fixed.

Figure 2
Scatter plots showing correlation between peaks in the fast translation function maps and the LLG values from rescoring. The BLIP component of the BETA–BLIP complex was translated using data restricted to 6 Å resolution and not taking into account the contribution of the BETA component. A triangle indicates the score for the best translation. (a) CORR score. (b) LETF1 score. (c) LETF2 score. (d) LETFQ score.

5.1.2. Searching for BLIP, fixing known BETA contribution

In the previous case, the contribution of BLIP accounts for only a small part of the uncertainty in the prediction of the observed structure-factor amplitude, so only a relatively small portion of the Rice-function curve is sampled as the molecule is translated. To test the case where the translated model accounts for a much greater part of the uncertainty, we carried out tests in which the known contribution of BETA was fixed during the translation search for BLIP using all data to 3 Å resolution. In this case, as shown in Fig. 1(b), a wider range of values of $[I_{\bf h}^{\Phi} ({\bf x})]$ will be sampled and the Rice likelihood function deviates more from a straight line. The results in Table 1 show that, as one might expect, the first-order approximations work somewhat more poorly than in the case with BLIP alone, but the correlation with the LLG score is still very high for all LETF scores. The results in Table 2 show that with the correct orientation this translation problem is trivial for all search targets.

5.2. TOXD

In a further test, we used the test data for α-dendrotoxin (TOXD) distributed with the CCP4 suite (Collaborative Computational Project, Number 4, 1994). This structure was originally solved by isomorphous replacement (Skarzynski, 1992), but it shares 36% sequence identity with bovine pancreatic trypsin inhibitor. As a model, we have used the structure of bovine pancreatic trypsin inhibitor from PDB entry 1d0d (St Charles et al., 2000). The results in Table 1 demonstrate that the LETF scores are equally good approximations of LLG, whether the model is closely or more distantly related to the target structure.

6. Conclusions

The results demonstrate that all four likelihood-based fast translation functions investigated here (LETF1, LETF2, LETFL and LETFQ) are superior to CORR in approximating the full-likelihood target, LLG, and thus in predicting the top solutions. The first-order approximations (LETF1 and LETFL) have the significant advantage that they only require one FFT, with a map sampled at d_min/4. The second-order approximations (LETF2 and LETFQ) require only two FFTs compared with the three needed for CORR.

In practice, we prefer the use of the LETF1 fast translation function, which is the program default in Phaser. We have not found a molecular-replacement problem in which the second-order targets succeed in finding the correct solution when LETF1 fails. This is probably because molecular-replacement problems become more difficult as the fragment to be found becomes smaller or the model becomes less accurate. In both situations, the proportion of the observed structure-factor amplitude explained by the model decreases and, as illustrated in Fig. 1, the relevant portion of the Rice likelihood-function curve becomes more linear. In addition, the second-order approximations require a second FFT, leading to greater memory requirements. Finally, the calculation of the first derivative needed for LETF1 is simpler and perhaps more reliable than the least-squares fitting required for LETFL.

The translation function is also used in dual-space substructure searches (Grosse-Kunstleve & Adams, 2003), in which peaks selected from the Patterson function are taken to represent heavy-atom pairs. Each pair is translated through the unit cell to find its position and then serves as the basis of a bootstrap procedure to find the rest of the heavy atoms in the substructure. The likelihood-enhanced translation functions described here could also be used for these substructure searches.

The program Phaser has been released as part of the PHENIX (Adams et al., 2002) software suite and will be released as part of the CCP4 (Collaborative Computational Project, Number 4, 1994) suite. It is also available from the authors (see https://www-structmed.cimr.cam.ac.uk/phaser for details).

Acknowledgements

We are grateful to Michael James and Natalie Strynadka for supplying the data for the β-lactamase complex test case. This work was funded by NIH/NIGMS under grant No. 1P01GM063210 and by a Principal Research Fellowship from the Wellcome Trust (RJR).

References

Adams, P. D., Grosse-Kunstleve, R. W., Hung, L.-W., Ioerger, T. R., McCoy, A. J., Moriarty, N. W., Read, R. J., Sacchettini, J. C., Sauter, N. K. & Terwilliger, T. C. (2002). Acta Cryst. D58, 1948–1954.
Bricogne, G. (1992). Proceedings of the CCP4 Study Weekend. Molecular Replacement, edited by W. Wolf, E. J. Dodson & S. Gover, pp. 62–75. Warrington: Daresbury Laboratory.
Bricogne, G. (1997). Methods Enzymol. 276, 361–423.
Brünger, A. T., Adams, P. D., Clore, G. M., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760–763.
Colman, P. M., Fehlhammer, H. & Bartels, K. (1976). Crystallographic Computing Techniques, edited by F. R. Ahmed, K. Huml & B. Sedlacek, pp. 248–258. Copenhagen: Munksgaard.
Crowther, R. A. & Blow, D. M. (1967). Acta Cryst. 23, 544–548.
Dodson, E. J. (1988). Crystallographic Computing 4: Techniques and New Technologies, edited by N. W. Isaacs & M. R. Taylor, pp. 80–96. Oxford University Press.
Fujinaga, M. & Read, R. J. (1987). J. Appl. Cryst. 20, 517–521.
Glykos, N. M. & Kokkinidis, M. (2000). Acta Cryst. D56, 169–174.
Green, E. A. (1979). Acta Cryst. A35, 351–359.
Grosse-Kunstleve, R. W. & Adams, P. D. (2003). Acta Cryst. D59, 1966–1973.
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.
Kissinger, C. R., Gelhaar, D. K. & Fogel, D. B. (1999). Acta Cryst. D55, 484–491.
La Fortelle, E. de & Bricogne, G. (1997). Methods Enzymol. 276, 472–494.
Langs, D. A. (2002). J. Appl. Cryst. 35, 505.
Luzzati, V. (1952). Acta Cryst. 5, 802–810.
Machin, P. A. (1985). Editor. Proceedings of the Daresbury Study Weekend. Molecular Replacement. Warrington: Daresbury Laboratory.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.
Navaza, J. (1994). Acta Cryst. A50, 157–163.
Navaza, J. (2001). Acta Cryst. D57, 1367–1372.
Navaza, J. & Vernoslova, E. (1995). Acta Cryst. A51, 445–449.
Read, R. J. (2001). Acta Cryst. D57, 1373–1382.
Read, R. J. (2003). Crystallogr. Rev. 9, 33–41.
Read, R. J. & Schierbeek, A. J. (1988). J. Appl. Cryst. 21, 490–495.
Rossmann, M. G. (1972). Editor. The Molecular Replacement Method. New York: Gordon & Breach.
Sheriff, S., Klei, H. E. & Davis, M. E. (1999). J. Appl. Cryst. 32, 98–101.
Skarzynski, T. (1992). J. Mol. Biol. 224, 671–683.
St Charles, R., Padmanabhan, K., Arni, R. V., Padmanabhan, K. P. & Tulinsky, A. (2000). Protein Sci. 9, 265–272.
Storoni, L. C., McCoy, A. J. & Read, R. J. (2004). Acta Cryst. D60, 432–438.
Strynadka, N. C., Jensen, S. E., Alzari, P. M. & James, M. N. (1996). Nature Struct. Biol. 3, 290–297.
Tickle, I. J. (1985). Proceedings of the Daresbury Study Weekend. Molecular Replacement, edited by P. A. Machin, pp. 22–26. Warrington: Daresbury Laboratory.
Tickle, I. J. (1992). Proceedings of the Daresbury Study Weekend. Molecular Replacement, edited by W. Wolf, E. J. Dodson & S. Gover, pp. 20–32. Warrington: Daresbury Laboratory.
Vagin, A. & Teplyakov, A. (1997). J. Appl. Cryst. 30, 1022–1025.
Wilson, A. J. C. (1949). Acta Cryst. 2, 318–321.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.
