Liking likelihood

McCoy, A.J.

doi:10.1107/S0907444904016038

research papers

BIOLOGICAL
CRYSTALLOGRAPHY

ISSN: 1399-0047

Volume 60| Part 12| December 2004| Pages 2169-2183

doi:10.1107/S0907444904016038

Liking likelihood

Airlie J. McCoy ^a ^*

^aUniversity of Cambridge, Department of Haematology, Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 2XY, England
^*Correspondence e-mail: ajm201@cam.ac.uk

(Received 26 January 2004; accepted 30 June 2004)

Maximum-likelihood methods have now been applied to most areas of macromolecular crystallography, including data reduction, molecular replacement, experimental phasing and refinement. However, students of macromolecular crystallography are predominantly taught only traditional crystallographic methods and therefore have little understanding of the methods underlying the modern software that they routinely use in structure determination. This situation arises, at least in part, because maximum likelihood is considered to be too difficult to be taught to students who lack substantial mathematical training within the limited time frame of undergraduate/graduate courses. A method of introducing maximum-likelihood concepts with the help of dice is described here and it is then shown how these concepts can form the core of understanding maximum-likelihood refinement, molecular replacement and experimental phasing. Within the framework described, the crystallographic maximum-likelihood techniques are all reduced to the same basic concepts and become easier and less time-consuming to teach than traditional methods, which rely on disparate concepts.

Keywords: maximum likelihood.

1. Introduction

Maximum likelihood is a branch of statistical inference that asserts that the best hypothesis (i.e. set of parameters, which includes estimates of the errors) on the evidence of the data is the one that explains what has in fact been observed with the highest probability. In the context of macromolecular crystallography, maximum likelihood has come to refer to the set of new statistical methods that improved upon the least-squares methods that preceded them. The least-squares methods were not contrary to the principle of maximum likelihood, since least squares is a special case of maximum likelihood where the errors in the parameters are simple Gaussians, rather than more complex functions. The slow acceptance of maximum likelihood was therefore not because maximum likelihood itself was considered inappropriate, but because least squares works acceptably when the data and model are good and because computers were not capable of performing the more complex calculations required for more sophisticated maximum-likelihood treatments in reasonable times. Maximum likelihood is not the only method for obtaining a set of parameters from experimental data. Indeed, in other fields of application maximum likelihood may not be the best method, as maximum-likelihood estimators can be severely biased. However, maximum likelihood gives little bias when applied in crystallography and it has been extremely successful in supplying better probability models, particularly when the data and/or model are poor, and has been instrumental in the solution of numerous macromolecular structures.

Dice have a long history in the explanation of problems in likelihood, maximum entropy and Bayesian theory (e.g. Jaynes, 1968 , 1979 ; Frieden, 1985 ; Mohammed-Djafari, 2003 ). In this tradition, I present here basic maximum-likelihood concepts using thought experiments with dice. These concepts are then used to explain maximum-likelihood refinement, molecular replacement and experimental phasing.

2. Experiments with dice

There are six important concepts that are needed in order to understand the statistical approach of maximum likelihood in crystallography: maximum likelihood, independence, log-likelihood, Bayes' theorem, integrating out nuisance variables and the central limit theorem. These concepts will be explored with the help of dice with different numbers of sides (Fig. 1).

Figure 1
Photographs of (a) four-sided die, (b) six-sided die, (c) ten-sided die and (d) eight-sided die

2.1. Dice and probability

A game of dice.

I put four unbiased dice in a box: one four-sided, one six-sided, one eight-sided and one ten-sided.
I select a die at random.
How often will you guess correctly which die I selected?

It is obvious that there is a one in four chance of getting the correct answer. If the experiment is performed a large number of times you will guess the answer a quarter of the time, or if a large number of people guess each time a quarter will guess correctly.

2.2. Dice and maximum likelihood

A game of dice with data.

I put four unbiased dice in a box: one four-sided, one six-sided, one eight-sided and one ten-sided.
I select a die at random.
I roll the die and tell you the result of the roll.
Which die was the most likely to be selected?

If I were to roll a 10, it is obvious that the die selected must have been the ten-sided die. Why is it obvious? Because the probability of rolling a 10 from the four-, six- or eight-sided die is zero, but the probability of rolling a 10 from the ten-sided die is non-zero. The probability is written as

$[P({10\semi {\bf 10}}) = \textstyle{1 \over {10}},]$

where the semi-colon means `given' (for a glossary of terms see Table 1) and I have denoted the type of die by its number of sides in bold. The probability of the observed data (the number rolled) given the model (the number of sides of the die) is called the likelihood.

Table 1
Glossary of terms

$[\textstyle \sum_{i = 1}^N {a_i }]$ = a₁ + … + a_N	Sum of all a_i for i between 1 and N
$[\textstyle \prod_{i = 1}^N {a_i }]$ = a₁ × … × a_N	Product of all a_i for i between 1 and N
{a_i} = a₁, …, a_N	Set of all a_i
$[\textstyle \int_a^b]$ f(x)dx	Definite integral of function f(x) between values a and b
I₀(x)	Modified Bessel function of order 0 with argument x
P(A; B)	Probability of A, given B
P(A, B; C)	Probability of A and B, given C
P(A; B, C)	Probability of A, given B and C
F	Structure factor (vector)
F	Structure-factor amplitude
σ² or Σ	Variance of a Gaussian
x ∝ y	x is proportional to y
x ↠ y	x implies y

What would be the case if I rolled a 7? If the same analysis is performed again, the likelihood of rolling a 7 from the four- or six-sided die is 0, but the likelihood of rolling a 7 from the eight-sided die is one in eight and the likelihood of rolling a 7 from the ten-sided die is one in ten. Therefore, it is most likely that the eight-sided die would have been selected. What if I roll a 1? It is most likely that the four-sided die would have been selected. The most likely die is the one with the highest likelihood of generating the data: this is the principle of maximum likelihood.

How confident are you that the die is an eight-sided die if the roll was a 7? Not very, because the difference between the likelihood of rolling a 7 from the eight-sided and ten-sided die is only small. The ratio between two likelihoods is a measure of confidence (known as the likelihood ratio). For example, when I roll a 10, the likelihood ratio agrees that you are supremely confident that I selected a ten-sided die, rather than, say, the eight-sided die,

$[{{P({10\semi {\bf 10}})} \over {P({10\semi {\bf 8}})}} = {{{\textstyle{1 \over {10}}}} \over 0} = \infty.]$

In the case where I roll a 7, the likelihood-ratio is close to 1 (the ratio for equal likelihoods),

$[{{P({7\semi {\bf 8}})} \over {P({7\semi {\bf 10}})}} = {{{\textstyle{1 \over 8}}} \over {{\textstyle{1 \over {10}}}}} = 1.25.]$

2.3. Dice, independence and log-likelihood

A game of dice with more data.

I put four unbiased dice in a box: one four-sided, one six-sided, one eight-sided and one ten-sided.
I select a die at random.
I roll that die three times and tell you the results of the rolls.
Which die did I most likely select?

If I roll a 7 three times, you would expect that I selected an eight-sided die, as the answer should be consistent with the game above when only one roll (of a 7) was made. How is the formal analysis performed? The chance of rolling a 7 three times from the four- or six-sided die is 0, but what is the chance of throwing a 7 three times from an eight-sided or ten-sided die? The chance of throwing a 7, or any other number, the second or third time is not influenced by the value of the first roll. This is the principle of independence. When probabilities are independent, they multiply. If the calculations are performed, the eight-sided die is indeed more likely,

$[\eqalign{ & P({7,7,7\semi {\bf 8}}) = {\textstyle{1 \over 8}} \times {\textstyle{1 \over 8}} \times {\textstyle{1 \over 8}} = {\textstyle{1 \over {512}}}, \cr & P({7,7,7\semi {\bf 10}}) = {\textstyle{1 \over {10}}} \times {\textstyle{1 \over {10}}} \times {\textstyle{1 \over {10}}} = {\textstyle{1 \over {1000}}}.}]$

After obtaining data from three rolls, your confidence that you have guessed the correct die has increased compared with when you only knew the result of one roll, so the likelihood ratio increases,

$[{{P({7,7,7\semi {\bf 8}})} \over {P({7,7,7\semi {\bf 10}})}} = {{{\textstyle{1 \over {512}}}} \over {{\textstyle{1 \over {1000}}}}} = 1.953.]$

What is the probability of rolling a 7 from an eight-sided die one hundred thousand times? (Of course, if you really were to roll 7 one hundred thousand times, you might have some difficulty believing that the die is unbiased. Please continue to assume that it is.) Although the formula for the probability can be written down,

$[P(7 \ldots {\rm one\,\,hundred\,\,thousand\,\,times}\semi {\bf 8}) = {1 \over {8^{100\,000} }},]$

and you could work out the answer and write it down on a (very long) piece of paper,

$[\eqalign {P(7 \ldots {\rm one\,\,hundred\,\,}&{\rm thousand\,\,times}\semi {\bf 8}) \cr & = 0.(\ldots 90308\,\,{\rm zeroes} \ldots)10029997 \ldots,}]$

the number is too small (has too many decimal places) to be stored by a computer. The solution to this computational problem is to calculate log-likelihood rather than the likelihood,

$[\log [P(7 \ldots {\rm one\,\,hundred\,\,thousand\,\,times}\semi {\bf 8})] = - 90\,309.]$

Calculation of the log-likelihood solves the small-number computation problem, but is the switch from using the likelihood allowed? Fortunately it is, because logarithmic functions are monotonic functions [i.e. if a < b then log(a) < log(b)]. This means that the parameter values obtained by optimizing log-likelihood are the same as the parameter values obtained by optimizing the likelihood. In fact, computer algorithms are designed to minimize, so parameters are optimized by minimizing the −log-likelihood. There are also other more theoretical justifications for using the log-likelihood, which come from the statistical field of information theory.

Is there a paradox in that the computer needs to store the likelihood before taking its logarithm? Fortunately not, because there is a shortcut to the log-likelihood when the total likelihood is a product of likelihoods, (i.e. when the likelihoods are independent),

$[\log \left(\textstyle\prod\limits_{i = 1}^N {P_i }\right) = \textstyle\sum\limits_{i = 1}^N {\log \left({P_i } \right)}.]$

In the case where I rolled 7 three times from an eight-sided die, there are thus two ways of calculating the log-likelihood. Using the product method,

$[\eqalign {\log [P(7,7,7\semi {\bf 8})] & = \log \left[\textstyle \prod\limits_{i = 1}^3 P(7\semi {\bf 8})_i \right]\cr & = \log (\textstyle{1 \over 8} \times \textstyle{1 \over 8} \times \textstyle{1 \over 8})\cr & = \log (0.001953) \cr & = - 2.70927. }]$

Using the sum method,

$[\eqalign {\log [P(7,7,7\semi {\bf 8})] & = \textstyle \sum\limits_{i = 1}^3 \log [P(7\semi {\bf 8})_i] \cr & = \log (\textstyle{1 \over 8}) + \log (\textstyle{1 \over 8}) + \log(\textstyle{1 \over 8}) \cr & = - 0.90309 - 0.90309 - 0.90309 \cr & = - 2.70927.}]$

However, the product method required the intermediate of calculating a number close to zero (0.001953), while the sum method did not require any numbers close to zero (the smallest numbers were the independent probabilities themselves, 0.125).

When the log-likelihood is used instead of the likelihood, the log-likelihood gain is calculated instead of the likelihood ratio. The log-likelihood gain is the difference between log-likelihoods [since log(a/b) = log(a) − log(b)]. Whereas for the likelihood ratio more favourable likelihoods are indicated by values greater than 1, for the log-likelihood gain they are indicated by any positive value. The log-likelihood gain for the die being the eight-sided rather than the ten-sided after rolling 7 three times is

$[\eqalign {\log [P(7,7,7\semi {\bf 8})] - \log [P(7,7,7\semi {\bf 10})] &= (- 2.70927) - (- 3) \cr &= 0.29073.}]$

What would happen if the result of previous rolls influenced the result of the subsequent rolls? In this case the data points are not independent, but correlated. Note that correlation is not the same as bias. A biased die would be one that, for example, always rolled a 7, but a correlated die would be one that, for example, always rolled one number higher than the previous roll. Highly correlated data points make the determination of the likelihood difficult, if not impossible, and so the assumption of independence is often applied even when it is not justified. In crystallography, reflections are assumed to be independent, even though to a certain extent they are not. Correlations are introduced by the presence of solvent, which means that the molecular transform is over-sampled, and by non-crystallographic symmetry (if present). However, the correlations are sufficiently weak that the approximation of assuming independence is very good. To calculate the total log-likelihood for all the reflections in a data set (of the order of one hundred thousand), the sum of the log-likelihoods for each reflection is used.

2.4. Dice and Bayes' theorem

A game of dice with multiple copies of a die.

I put one eight-sided die and eight ten-sided dice in a box.
I select a die at random.
I roll the die and tell you the result of the roll.
Which die did I most likely select?

I roll a 4. In this case the probability of selecting the ten-sided die in the first place overwhelms the slightly higher chance of rolling the 4 from the eight-sided die. The chance of selecting the ten-sided die in the first place is included in the probability calculation with Bayes' theorem,

$[P({\rm model \semi data}) = {{P({\rm model})} \over {P({\rm data})}} \times P({\rm data \semi model}).]$

In experimental situations P(data) is constant and when comparing probabilities can be ignored, so Bayes' theorem becomes

$[P({\rm model \semi data}) = P({\rm model}) \times P({\rm data\semi model}).]$

Bayes' theorem is also called the rule of inverse probability since it shows how to turn P(data; model) (e.g. the probability of rolling the 4 from the ten-sided die, which we can calculate) into P(model; data) (e.g. the probability of the ten-sided die given a roll of 4, which is what we want to know). P(model) is the probability of the model without having any data (e.g. the chance of selecting the ten-sided die in the first place). P(model; data) is called the posterior probability, P(data; model) is called the likelihood (as before) and P(model) is called the prior probability. If Bayes' theorem is used to calculate the probability rather that just the likelihood, then the method of optimizing the probability should properly be called the maximum-posterior method, rather than the maximum-likelihood method, but the term `maximum likelihood' is generally used for both. True maximum likelihood can be thought of as a special case of maximum posterior when the prior probability P(model) is constant for all the models. This was the case for the examples in §§2.2 and 2.3 above.

Using Bayes' theorem, the probability that the die was ten-sided given a roll of 4,

$[\eqalign {P({\bf 10} \semi 4) & = P({\bf 10}) \times P(4 \semi {\bf 10}) \cr & = {\textstyle {8 \over 9} } \times {\textstyle {1 \over {10}} } \cr & = 0.0{\bar 8},}]$

is higher than the probability that the die was eight-sided given a roll of 4,

$[\eqalign {P({\bf 8}\semi 4) & = P({\bf 8}) \times P(4\semi {\bf 8}) \cr & = {\textstyle{1 \over 9}} \times {\textstyle{1 \over 8}} \cr & = 0.013{\bar 8},}]$

so a ten-sided die is more likely, as expected.

Bayes' theorem is very useful in crystallography because it enables exploitation of the things that are known about protein structure even before the X-ray data are collected. For example, a carbon–oxygen double bond is known to be 1.23 Å long. So, if the electron density for a structure showed no density 1.23 Å from a particular peptide carbon, but a large piece of density 2 Å away from it, prior knowledge of the carbon–oxygen double-bond length means that it would be extremely unlikely that the density 2 Å away was due to an O atom bound to the carbonyl O atom. It would be more likely that the density 2 Å away from the carbon was due to noise or some other feature of the structure. However, if the O atom had been moved into this density during rebuilding (and the carbon–oxygen bond stretched), a refinement program would use Bayes' theorem to restrain the bond length to 1.23 Å and produce the more likely structure. Bayes' theorem is also used in density modification, where information about solvent content, non-crystallographic symmetry etc. is introduced (Terwilliger, 2000 ; McCoy, 2002 ).

2.5. Dice and integrating out nuisance variables

A game of dice with unknown dice.

I put a six-sided and an eight-sided die in a red box and a four-sided and ten-sided die in a blue box.
I select a die from each of the red and blue boxes at random and put them in a yellow box.
I select a die at random from the yellow box, roll the die and tell you the result.
Did the die most likely come from the red box or the blue box originally?

I roll a 3. The problem here is to calculate the likelihoods P(3; blue box) and P(3; red box) and find the maximum without knowing which die actually produced the roll of 3. Consider P(3; blue box). The blue box could have contained either the four-sided or the ten-sided die. To calculate P(3; blue box), the likelihood of the 3 being rolled from the two possibilities for the contents of the blue box (the four-sided and the ten-sided die) are added,

$[P(3\semi {\rm blue \,\, box}) = P(3,{\bf 4}\semi{\rm blue\,\, box}) + P(3,{\bf 10}\semi {\rm blue \,\, box})]$

The basic probability identity P(A, B) = P(B; A) × P(A) [which can also have any number of conditions added, so P(A, B; C) = P(B; A, C) × P(A; C), for example] can be used to write

$[\eqalign {P(3 \semi {\rm blue\,\, box}) = &\,\, P(3\semi {\bf 4}, {\rm blue \,\, box}) \times P({\bf 4} \semi {\rm blue \,\, box}) \cr &+ P(3\semi {\bf 10}, {\rm blue \,\, box}) \times P({\bf 10} \semi {\rm blue \,\, box}). }]$

Substituting in values for these probabilities,

$[P(3 \semi {\rm blue \,\, box}) = ({\textstyle {1 \over 4}} \times {\textstyle {1 \over 2}}) ({\textstyle{1 \over 10}} \times {\textstyle{1 \over 2}}) = 0.175.]$

Likewise, P(3; red box) is the likelihood of the 3 being rolled and the die being six-sided plus the likelihood of the 3 being rolled and the die being eight-sided,

$[\eqalign {P(3\semi {\rm red\,\, box}) & = P(3, {\bf 6}\semi {\rm red \,\, box}) + P(3,{\bf 8}\semi{\rm red \,\, box}) \cr & = P(3\semi {\bf 6},{\rm red \,\, box}) \times P({\bf 6}\semi {\rm red \,\, box}) \cr &\ \quad +\ P(3\semi {\bf 8},{\rm red \,\, box}) \times P({\bf 8}\semi {\rm red \,\, box}) \cr & = ({\textstyle{ 1 \over 6}} \times{\textstyle{1 \over 2}}) +({\textstyle{ 1 \over 8}}\times {\textstyle{ 1 \over 2}}) \cr & = 0.1458{\bar 3} }]$

So, it is slightly more likely that the die came from the blue box if I roll a 3.

The likelihood for the red and blue boxes has been calculated even though which die actually produced the roll of 3 was not known. Summing the probabilities for all the possibilities for the die solves the problem of not knowing which die actually produced the roll. In general, if the unknown variable (call it x) of the model can take n values between a and b, then

$[P({\rm data \semi model}) = \textstyle \sum\limits_{i = 1}^n P({\rm data}, x_i \semi {\rm model}),]$

where a < x_i ≤ b.

The probability distribution for the dice is for discrete variables, because it is only defined for certain values (the dice must have an integer number of sides). In crystallography, the probability distributions are for continuous variables, meaning that they are defined for all values (an infinite number) over an interval (for example, an atom can be anywhere and an occupancy can be anywhere between 0 and 1). When the probability distribution is continuous, the sum in the equation for the discrete probability distribution becomes an integral, because an integral can be thought of as a sum of an infinite number of infinitesimally small numbers. If the unknown variable x can take values between a and b, then

$[P({\rm data \semi model}) = \textstyle \int \limits_a^b P({\rm data}, x \semi {\rm model})\,\,{\rm d} x.]$

The unknown variable x is called a nuisance variable. The removal of a nuisance variable from a probability distribution by integration is called integrate out (or marginalization of) the nuisance variable. Although termed `nuisance', these variables can be very useful in probability distributions. It may be easier to describe a probability function using an extra variable (such as the phase of the observed structure factor) and then to integrating it out at the end of the analysis than to attempt to develop a probability function without ever referring to the extra variable.

2.6. Dice and the central limit theorem

A game of dice taking the average of many rolls of the dice.

I have a six-sided die.
I roll the die 40 times and add up the values of the rolls, then divide the sum by 40.
I do this 10 000 times, plotting the final average value from each game on a histogram.
What form does the histogram take?

The histogram is Gaussian (bell-shaped curve, see Fig. 2), with a maximum at 3.5 (see Fig. 3). Now I play the game again with a biased six-sided die: the die is biased so that it will roll its values with a probability linearly proportional to the value, i.e. a 2 is twice as likely as a 1, a 3 is three times as likely as a 1 etc. Again, the histogram looks like a Gaussian. The only difference is that the mean of the distribution is shifted to about 4.3 and the variance of the distribution is smaller (see Fig. 2 for an explanation of the mean and variance of a Gaussian). Now I play the game again with a six-sided die that is biased so that it will roll its values with a probability proportional to the square of the value i.e. a 2 is four times more likely than a 1, a 3 is nine times more likely than a 1 etc. Again, the histogram looks like a Gaussian (with an even higher mean and smaller variance). For most types of bias of the die, the histogram generated by the game of dice is Gaussian, even when the bias of the die (from which the average is computed) is decidedly non-Gaussian. This property is called the central limit theorem. The central limit theorem is possibly the most important theorem in probability. In crystallography, the central limit theorem allows us to describe the errors in the structure factors (in reciprocal space) that arise from errors in the atomic model (in real space). It says that even though the errors in an individual atom's contribution to the total structure factor may be very complicated, in the end the error for the total structure factor (the sum of the atomic structure-factor contributions) is a simple two-dimensional Gaussian in reciprocal space.

Figure 2
(a) One-dimensional Gaussian {1/[(2π)^1/2σ]}exp[−(x − μ)²/2σ²] and (b) radially symmetric two-dimensional Gaussian [1/(2πσ²)]exp[−(x − μ|²)/2σ²]. The mean is μ (one-dimensional)/μ (two-dimensional). The standard deviation is σ, which is half-width at ∼61% of the peak height. The variance is σ². The full-width half-maximum (FWHM) = 2.35σ. The area (one-dimensional)/volume (two-dimensional) under the curve is 1.

Figure 3
Central limit theorem. On the left are the probability distributions showing the bias of the six-sided die (shown with dots) and on the right the histogram of 10 000 trials of the average of 40 rolls of the die. If the probability distribution is continuous rather than discrete (shown on the right with a line connecting the dots), the distribution of the average is also continuous (shown with Gaussian function over the histogram). (a) Unbiased die: the distribution of the average is a Gaussian with μ = 3.5 and σ = 0.27. (b) Linearly biased die: the distribution of the average is a Gaussian with μ = 4.3 and σ = 0.24. (c) Quadratically biased die: the distribution of the average is a Gaussian with μ = 4.8 and σ = 0.19.

2.7. Dice summary

Maximum likelihood: the best model is the one that maximizes the probability of observing the experimental data.

Independence: probabilities multiply when the experimental data points are independent, i.e. all observations are independent of one another.

Log-likelihood: the log-likelihood is used instead of the likelihood as it has its maximum at the same parameter values as the likelihood but it is safer to calculate on a computer because the numerical range is smaller.

Bayes' theorem: P(model; data) = P(model) × P(data; model), where P(data; model) is called the likelihood and P(model) is called the prior probability.

Integrating out variables: nuisance variables in a joint probability distribution can be eliminated by integration.

Central limit theorem: the distribution of the average tends to be Gaussian, even when the distribution from which the average is computed is decidedly non-Gaussian.

3. Maximum likelihood in macromolecular crystallography

There are many ways of applying maximum likelihood to crystallography. Ideally, all the information from chemistry and the diffraction experiment should be included to create the `mother of all likelihood functions'. Although the chemical and diffraction information that should contribute to this likelihood function is known, there are too many correlations between the contributions to allow a practical precise formula to be written down. This is rather unfortunate, because there is enough information in the chemistry and the diffraction experiment to solve the phase problem ab initio (cf. direct methods; Bricogne, 1993 ). Instead, simplifications and approximations are made to allow maximum likelihood to be applied to specific areas of crystallography such as refinement, molecular replacement and experimental phasing.

4. Refinement

The Bayesian view of crystallographic refinement is that the prior probability comes from chemistry (a great deal is known about what molecules look like even before the experiment) and the likelihood comes from the X-ray diffraction experiment (Pannu & Read, 1996 ; Bricogne & Irwin, 1996 ; Murshudov et al., 1997 ). The probability function for refinement (here called P-refinement) is thus, by Bayes' theorem (see §2.4), the product of the prior probability (here called P-chemistry) and the likelihood (here called P-Xray),

$[\eqalign{ & P({\rm model\semi data}) = P({\rm model}) \times P({\rm data \semi model}) \cr & \Rightarrow P {\hbox {-}}{\rm refinement} = P{\hbox {-}}{\rm chemistry} \times P{\hbox {-}}{\rm Xray}.}]$

The chemical probabilities for all the different chemical interactions in the structure are taken to be independent (see §2.3), so that P-chemistry is the product of these individual chemical interaction probabilities P-chemistry_i. This is not a very good approximation, as the bond lengths and angles are correlated with each other; the problems that this approximation causes are discussed in §4.4. If the number of interactions is I,

$[P{\hbox {-}}{\rm chemistry} = \textstyle \prod\limits_{i = 1}^I P{\hbox {-}}{\rm chemistry}_i.]$

It is also assumed that reflections are independent, so that P-Xray is the product of the reflection likelihoods (P-Xray_r). This is a good approximation (see §2.3). If the number of reflections is R,

$[P{\hbox {-}}{\rm Xray} = \textstyle \prod\limits_{r = 1}^R P{\hbox {-}}{\rm Xray}_r.]$

Since there are hundreds of thousands of interactions and hundreds of thousands of reflections, the log-likelihood is calculated rather than the likelihood (see §2.3). To optimize the model parameters (atomic positions, occupancies and B factors), the −log-likelihood is minimized,

$[- \log P{\hbox {-}}{\rm refinement} = - \textstyle \sum\limits_{i = 1}^I \log P{\hbox {-}}{\rm chemistry}_i - \textstyle \sum\limits_{r = 1}^R \log P{\hbox {-}}{\rm Xray}_r.]$

4.1. Prior probability

Macromolecules obey the same chemical rules as small organic molecules and so ideal bond lengths and angles for macromolecules can be derived from the results of small-molecule crystallography (Engh & Huber, 1991 ). The bond lengths and angles in the crystal are restrained to these ideal values using a probability distribution. For example, the prior probability for having a bond of length b is given by a Gaussian about the ideal length b_ideal for the bond type (see Fig. 2 for the equation for a Gaussian),

$[P{\hbox {-}}{\rm chemistry}_{\rm bond} = {1 \over {(2\pi)^{1/2} \sigma _b }}\exp \left [-{{(b - b_{\rm ideal})^2 } \over {2\sigma _b^2 }}\right]. ]$

Here, σ_b reflects the distribution of a particular bond type about its mean; e.g. a C—C bond has an ideal length of 1.54 Å and a σ_b of about 0.02 Å. There are similar equations for the other types of chemical interaction restraints.

4.2. Refinement likelihood

The likelihood for a reflection (P-X-ray_r) is the probability of the data (i.e. the observed structure-factor amplitude F_o) given the current model. The model is in real space and the X-ray observed data are in reciprocal space, so in order to calculate the likelihood, the model (in real space) must be used to calculate structure factors (in reciprocal space). The structure factor for the whole unit cell (F_c) is calculated as follows: first the structure factor for the model in the asymmetric unit (F_m) is calculated from the sum of the structure factors of the atoms in that model (F_atom). Then, F_m and all its symmetry relatives are added to obtain the total structure factor F_c (see Fig. 4; the importance of the symmetry relatives will become apparent in the explanation of the rotation-function likelihood below). However, the data for a given reflection is the observed structure-factor amplitude F_o, so in order to compare like with like the model must be the calculated structure-factor amplitude F_c,

$[P{\hbox {-}}{\rm Xray}_r = P({\rm data \semi model}) = P(F_{\rm o}; F_{\rm c}).]$

Without considering errors, if F_c matches F_o the probability is 1 and if it does not match the probability is 0 (the model is either `correct' or `incorrect'). However, if errors in the model and the data are considered, then F_c and F_o are allowed to differ somewhat and the likelihood function should give a non-zero probability when F_c and F_o are close (the closer the better). It is easier to model the errors in terms of the phased structure factors F_c and F_o with the phase between them defined as α, rather than in terms of the structure-factor amplitudes alone. The introduced variable, the phase α, is a nuisance variable (a case where a nuisance variable is very useful) and must be integrated out of the probability distribution at the end of the analysis (see §2.5). The integration limits are 0–2π, i.e. all angles,

$[P{\hbox {-}} {\rm Xray}_r = \textstyle \int\limits_0^{2\pi } P(F_{\rm o}, \alpha \semi F_{\rm c})\, {\rm d}\alpha.]$

The errors in the model arise from Gaussian errors in the atomic positions and atomic scattering. Gaussian errors in the atomic positions and scattering give rise to Gaussian errors in the phases and amplitudes of the corresponding atomic structure-factor contributions, respectively (see Fig. 5). When these atomic structure-factor contributions and their errors are summed to give the total structure factor and its error for a given reflection, by the central limit theorem (see §2.6) the resulting distribution is a two-dimensional Gaussian (see Fig. 2) in reciprocal space centred on DF_c (where D is between 0 and 1) with variance termed $[\sigma_{\Delta}^{2}]$ (see Fig. 6),

$[P({\bf F}_{\rm o} \semi {\bf F}_{\rm c}) = {1 \over {\pi \sigma _\Delta ^2 }}\exp \left (- {{|{\bf F}_{\rm o}- D{\bf F}_{\rm c}|^2 } \over {\sigma _\Delta ^2 }}\right).]$

Using this probability and the integral above, it can be shown (Appendix A) that the likelihood function is a Rice distribution (Sim, 1959 ; Read, 1990 ),

$[P{\hbox {-}}{\rm Xray}_r = {{2F_{\rm o}} \over {\sigma _\Delta ^2 }}\exp\left (- {{F_{\rm o}^2 + D^2 F_{\rm c}^2 } \over {\sigma _\Delta ^2 }}\right) I_0 \left({{{2F_{\rm o} DF_{\rm c}} \over {\sigma _\Delta ^2 }}} \right),]$

where I₀ is the modified Bessel function of order 0. The Rice distribution is the key distribution for maximum likelihood in crystallography and it will appear over and over again in the equations below. It applies to acentric reflections (those for which the phase is not restricted) and, for simplicity, the discussions below will only concern acentric structure factors (and assume the expected intensity factor, generally denoted ∊, to be 1). For a full explanation of the derivation of the Rice function, see Appendix A. Centric structure factors (those where the phase is restricted to 0 or 180°) are treated similarly to give a different likelihood function (see Appendix B).

Figure 4
Total structure factor for all scattering in unit cell. (a) The unit cell contains six symmetry-related molecules. (b) The total structure factor (F_c) for a reflection is built up by adding the structure-factor contributions from the atoms in a molecule (F_atom) to give the structure factor for the molecules (F_m) and then adding all the symmetry-related structure-factor contributions for all the molecules in the unit cell.

Figure 5
Errors for an atomic structure factor. (a) An atom has variation in position (indicated by purple arrow) and in scattering (indicated in green concentric circles). (b) The variation in the atom's position and scattering are Gaussian. (c) The atom at its mean position with its mean scattering has a structure factor F_atom (shown with a black vector). Variation in the atom's position corresponds to variation in the phase of F_atom (shown with a purple arrow) and variation in the scattering corresponds to variation in the length of F_atom (shown with a green arrow). (d) The distributions of the structure factors owing to variation in the atom's position and scattering combine to give a boomerang-shaped structure-factor distribution (indicated with black contours). Since the distribution of structure factors is symmetric about F_atom, the average structure factor is shorter than F_atom (by a fraction d, where 0 < d < 1) but in the same direction as F_atom (dF_atom).

Figure 6
Errors in the total structure factor. (a) The unit cell contains six symmetry-related molecules. The atoms have errors in their positions and scattering, indicated by the arrows and concentric circles, respectively. (b) By the central limit, the probability distribution for the sum of the structure-factor contributions from all the atoms is a two-dimensional Gaussian in reciprocal space, centred at DF_c, where 0 < D < 1, shown with grey shading. The structure-factor contributions from atoms and molecules as in Fig. 4

are shown in pink.

There are also experimental errors (σ_F) in the measurements. Experimental error is accounted for by widening the probability distribution, a method that is termed inflating the variance (Green, 1979 ; Murshudov et al., 1997; de La Fortelle & Bricogne, 1997 ). The likelihood function used for refinement is therefore given by

$[P{\hbox {-}}{\rm Xray}_r = {{2F_{\rm o}} \over {\sigma _\Delta ^2 + \sigma _F^2 }}\exp\left (- {{F_O^2 + D^2 F_C^2 } \over {\sigma _\Delta ^2 + \sigma _F^2 }}\right)I_0 \left({{{2F_O DF_C } \over {\sigma _\Delta ^2 + \sigma _F^2 }}} \right). \eqno (1)]$

4.3. Sigma A

D and σ_Δ are anticorrelated: if the model is very bad and therefore if $[\sigma_{\Delta}^{2}]$ is large then D will be small and vice versa. If E values (normalized structure factors) are used rather than F values, D and σ_Δ can be replaced with a single parameter σ_A (Srinivasan & Ramachandran, 1965 ), with DF_c = σ_AE_c and $[\sigma_{\Delta}^{2}]$ = 1 − $[\sigma_{A}^{2}]$ ,

$[\eqalign{ P&{\hbox {-}}{\rm Xray}_r = P(E_{\rm o}\semi E_{\rm c})\cr & = {{2E_{\rm o} } \over {{1 - \sigma _{\rm A}^2 } + \sigma _E^2 }} \exp \left (- {{E_{\rm o}^2 + \sigma _{\rm A}^2 E_{\rm c}^2 } \over {{1 - \sigma _{\rm A}^2 } + \sigma _E^2 }}\right) I_0 \left ({{2E_{\rm o} \sigma _{\rm A} E_{\rm c} } \over {1 - \sigma _{\rm A}^2 + \sigma _E^2 }} \right), }]$

where σ_E is the normalized structure-factor experimental error. The probability distributions are very sensitive to the estimates of σ_A, and σ_A is refined along with the atomic parameters in structure refinement. Unfortunately, if the same data are used to refine σ_A and the atomic parameters, the data are severely overfitted and σ_A is overestimated. This problem is partially avoided by estimating σ_A from the data that are used to compute R_free (which are excluded from the refinement).

4.4. Weighting

In principle, if all errors are estimated properly there is no need to apply a weighting between the prior probability (P-chemistry) and likelihood (P-Xray) to calculate P-refinement using Bayes' theorem, but in practice it is necessary to overweight the likelihood (P-Xray) for refinement to converge. This is partly because the probabilities used are only approximate (particularly for the chemistry terms, where the correlations between bond lengths and angles are not taken into account) and partly because the refinement algorithm does not account for the fact that improvements in the model will sharpen the experimental likelihood function (because the model and the σ_A values are refined against different subsets of the data). As the resolution becomes higher and the model becomes better, the amount of over-weighting required is reduced.

4.5. Experimental phases

Experimental phasing information can be incorporated into the refinement likelihood function as a prior probability when integrating out the phase (Pannu et al., 1998 ). The prior probability can be modelled using Hendrickson–Lattman coefficients (Hendrickson & Lattman, 1970 ).

4.6. Probabilities and energies

Some refinement programs minimize energy rather than the −log-likelihood. In fact, the two targets of refinement are equivalent. If the experiment is considered as a physical system with energy, Boltzmann's law gives the probability P of observing a state in the physical system with energy E,

$[P \propto \exp(- E/kT),]$

where k is Boltzmann's constant and T is the temperature. Taking the logarithm of Boltzman's law, the energy is proportional to the logarithm of the probability,

$[E \propto - kT\ln P.]$

Boltzman's law in logarithm form leads to harmonic bond restraints,

$[\eqalign{ & P{\hbox {-}}{\rm chemistry}_{\rm bond} \propto \exp \left [- {{(b - b_{\rm ideal})^2 } \over {2\sigma _b^2 }}\right] \cr & \Rightarrow E{\hbox {-}}{\rm chemistry}_{\rm bond} \propto - kT{{(b - b_{\rm ideal})} \over {2\sigma _b^2 }}^2}.]$

Boltzman's law in this logarithm form also allows Bayes' theorem (in terms of probabilities) to be expressed in terms of energies

$[\eqalign{ & P{\hbox {-}}{\rm refinement} = P{\hbox {-}}{\rm chemistry} \times P{\hbox {-}}{\rm Xray} \cr & \Rightarrow E{\hbox {-}}{\rm refinement} = E{\hbox {-}}{\rm chemistry} + E{\hbox {-}}{\rm Xray}.}]$

5. Molecular replacement

Maximum-likelihood molecular replacement (Bricogne, 1992 ; Read, 2001 , 2003b ) can be divided into a rotation function (RF) followed by a translation function (TF) in the same way that traditional molecular-replacement methods are. Each type of search is a `brute-force' search procedure. The likelihood for the models is generated on a grid of angles (RF) or positions (TF) and the angle (RF) or position (TF) of the model that has the highest likelihood is selected as the `solution' to the molecular-replacement problem. Currently, prior information (such as packing constraints and non-crystallographic symmetry) is not included in maximum-likelihood molecular replacement and so Bayes' theorem (see §2.4) is not used. Reflections are assumed to be independent, so that the likelihood for the rotation function (here called P-RF) and the likelihood for the translation function (here called P-TF) is the product of the reflection likelihoods (see §2.3). If the number of reflections is R,

$[P{\hbox {-}}{\rm RF} = \textstyle \prod\limits_{r = 1}^R P{\hbox {-}}{\rm RF}_r]$

and

$[P{\hbox {-}}{\rm TF} = \textstyle \prod\limits_{r = 1}^R P{\hbox {-}}{\rm TF}_r.]$

In practice, the −log-likelihood is used as the target for the molecular-replacement searches,

$[-\log P{\hbox {-}}{\rm RF} = - \textstyle \sum\limits_{r = 1}^R \log P{\hbox {-}}{\rm RF}_r]$

and

$[-\log P{\hbox {-}}{\rm TF} = - \textstyle \sum\limits_{r = 1}^R \log P{\hbox {-}}{\rm TF}_r.]$

5.1. Translation-function likelihood

The data are the observed structure-factor amplitudes (F_o) and the model is the molecular-replacement structure oriented and positioned at the search point. This is exactly the same situation as for refinement: the approximate locations of all the atoms are known and a structure-factor amplitude F_c can be calculated from the scattering in the unit cell. The translation function target is therefore the same Rice function as the target for maximum-likelihood structure refinement. The only difference is that the errors will be much larger for the translation function than for refinement (D will be smaller and σ_Δ larger). The same function is also suitable for a brute-force six-dimensional (orientation and position) search,

$[\eqalign {P{\hbox {-}}{\rm TF}_r = &\ P{\hbox {-}}{\rm Xray}_r \cr =&\ {{2F_{\rm o} } \over {\sigma _\Delta ^2 + \sigma _F^2 }}\exp \left (- {{F_{\rm o}^2 + D^2 F_{\rm c}^2 } \over {\sigma _\Delta ^2 + \sigma _F^2 }}\right) I_0 \left({{{2F_{\rm o} DF_{\rm c} } \over {\sigma _\Delta ^2 + \sigma _F^2 }}} \right).}]$

5.2. Rotation-function likelihood

At each rotation-function search orientation, the model consists of the molecular-replacement model with defined orientation but undefined position. Undefined positions in real space correspond to undefined phases of the structure-factor contributions in reciprocal space. Thus, F_c cannot be calculated from the sum of the phased structure-factor contributions as it was for the case of refinement and the translation function. However, because the relative positions of the atoms in the model are known, the atomic structure-factor contributions (F_atom) for the model can be added up with relative phases to calculate F_m, i.e. the amplitude but not the phase of the model structure-factor contribution. This can also be performed for all the symmetry relatives of the model in order to obtain the set of amplitudes of the model structure-factor contributions, {F_m}_sym. The symmetry relatives have different amplitudes because as the model rotates its strength of scattering in any given direction changes. Since these model structure-factor contributions are unphased, they cannot be added together to obtain the structure factor for the scattering from the whole unit cell, F_c. The model in reciprocal space for the rotation function is therefore not F_c, but the set of amplitudes of the model structure-factor contributions, {F_m}_sym,

$[P{\hbox {-}}{\rm RF}_r = P({\rm data \semi model}) = P(F_{\rm o}\semi \{ F_{\rm m} \} _{\rm sym}).]$

It is easiest to generate a function for this probability by introducing a (useful) nuisance phase variable, the phase α between the observed structure factor F_o and one of the F_m. It is best to select the symmetry relative of F_m with the largest amplitude, F_big (the reason is given shortly). The symmetry operator that gives rise to the largest F_m will be different for each reflection, so F_big corresponds to a different symmetry operator for each reflection. The set of symmetry relatives of F_m is thus split into the set not including F_big, {F_m}_sym≠big, which are left unphased, and F_big, which is given the phase α relative to F_o. The introduced nuisance phase α must be integrated out of the probability distribution at the end of the analysis (see §2.6),

$[P{\hbox {-}}{\rm RF}_r = \textstyle \int\limits_0^{2\pi } P(F_{\rm o},\alpha \semi\{ F_{\rm m} \} _{\rm sym \ne big}, F_{\rm big})\,\,{\rm d}\alpha.]$

The probability distribution for {F_m}_sym≠big comes from a `random walk' (Fig. 7) in reciprocal space. Fixing the phase of the largest of the symmetry relatives of F_m results in the narrowest probability distribution for the `random walk' and this is why the largest F_m was chosen to have the phase α relative to F_o. Errors in the model must also be accounted for in the probability distribution, just as they were for refinement. Using the same reasoning that applied for developing the refinement target (see Fig. 6), errors in the model mean that all symmetry relatives of F_m (including F_big) are down-weighted by a D-factor (0 ≤ D ≤ 1). The probability distribution is thus a two-dimensional Gaussian centred on DF_big with variance Σ_S dependent on {DF_m}_sym≠big (see Fig. 8),

$[P({\bf F}_{\rm o}\semi \{ F_{\rm m} \} _{\rm sym \ne big}, {\bf F}_{\rm big}) = {1 \over {\pi \Sigma _S }}\exp\left (- {{|{\bf F}_{\rm o}- D{\bf F}_{\rm big} |^2 } \over {\Sigma _S }}\right).]$

Using this probability and the integral above, it can be shown (Appendix A) that the likelihood function is another Rice distribution,

$[P{\hbox {-}}{\rm RF}_r = {{2F_{\rm o}} \over {\Sigma _S }}\exp\left (- {{F_{\rm o}^2 + D^2 F_{\rm big}^2 } \over {\Sigma _S }}\right) I_0 \left({{{2F_{\rm o} DF_{\rm big} } \over {\Sigma _S }}} \right).]$

Experimental errors (σ_F) are incorporated by inflating the variance of the distribution, as was the case for the refinement likelihood function,

$[P{\hbox {-}}{\rm RF}_r = {{2F_{\rm o} } \over {\Sigma _S + \sigma _F^2 }}\exp\left (- {{F_{\rm o}^2 + D^2 F_{\rm big}^2 } \over {\Sigma _S + \sigma _F^2 }}\right) I_0 \left({{{2F_O DF_{\rm big} } \over {\Sigma _S + \sigma _F^2 }}} \right). \eqno (2)]$

Notice the similarities between this equation and the equation for P-Xray_r [and P-TF_r; (1)]. The only differences are that F_big replaces F_c and Σ_S replaces $[\sigma_{\Delta}^{2}]$ . The latter difference shows an unfortunate inconsistency in the notation for variances that has arisen in crystallography. Sometimes the variance is shown as the square of the standard deviation, with the standard deviation written with a lower case Greek sigma (e.g. $[\sigma_{\Delta}^{2}]$ ), and sometimes the variance is shown as a single parameter, written with an upper case Greek sigma (e.g. Σ_S). The differences in the equations can be traced back to differences in the position of the centre and difference in the width of the two-dimensional Gaussian in reciprocal space that gave rise to the Rice distribution.

Figure 7
Random walk. (a) Starting at the origin, a walker takes N (in this case, five) steps, each step in a random direction. Each step is the same length and independent of the previous one. Because of the many random choices, the final position of the walker varies each time. Four final positions are shown (marked ×). Some final positions are more likely than others. (b) The probability that the walker will be at a given final position after N steps is proportional to a two-dimensional Gaussian, shown with grey shading.

Figure 8
Rotation-function likelihood. (a) The unit cell contains six symmetry-related molecules. For a given orientation of the search, the orientation but not the position of the six molecules is defined. Therefore, the amplitudes but not the phases of the six corresponding structure-factor contributions are defined. The atoms in the molecules have errors in their positions and scattering, indicated by the arrows and concentric circles, respectively. The molecule in the orientation giving the largest scattering is shown in magenta. (b) The largest model structure-factor contribution, F_big, is given a phase α relative to the observed structure factor F_o, with the other model structure-factor contributions making a `random walk' around the end of this one phased structure-factor contribution. (c) The resulting probability distribution for F_o is a two-dimensional Gaussian centred on DF_big, shown with grey shading.

5.3. Partial structure

Maximum-likelihood molecular replacement allows incorporation of any information about the structure already determined, e.g. known orientation and position of partial structure, known orientation of partial structure only and any combination thereof. Any partial structure information makes the probability distribution more exacting (reduces the variance) and improves the signal of the search.

5.4. Fast searches

Maximum-likelihood brute-force rotation and translation searches are very slow to compute. However, there are approximations to the full search targets that can be calculated with fast Fourier transforms and are therefore much faster. The fast rotation search is calculated with a series of two-dimensional fast Fourier transforms, while the fast translation search is calculated with one three-dimensional fast Fourier transform. These likelihood-enhanced fast rotation and translation searches can be generated by a Taylor series expansion of the full likelihood targets (Storoni et al., 2004 ).

6. Experimental phasing

There are many forms of experimental phasing, including MIR (multiple-wavelength isomorphous replacement), MIRAS (multiple-wavelength isomorphous replacement with anomalous scattering), MAD (multiple-wavelength anomalous dispersion) and SAD (single-wavelength anomalous dispersion). They all have different types of data and types of models and so require different types of likelihood functions (Bricogne, 1991 ; Read, 1991 , 1994 ; de La Fortelle & Bricogne, 1997). Prior information is not included in maximum-likelihood experimental phasing and so Bayes' theorem is not used (see §2.4). Reflections are assumed to be independent, so that the total likelihood is the product of reflection likelihoods (see §2.3). If the number of reflections is R, then for example in the case of MIR the likelihood (here called P-MIR) is given by

$[P{\hbox {-}}{\rm MIR} = \textstyle \prod\limits_{r = 1}^R P{\hbox {-}}{\rm MIR}_r.]$

In practice, the −log-likelihood is used,

$[- \log P{\hbox {-}}{\rm MIR} = -\textstyle \sum\limits_{r = 1}^R \log P{\hbox {-}}{\rm MIR}_r.]$

Similarly, the MIRAS, MAD and SAD likelihoods are the products of their respective reflection likelihoods. The heavy-atom sites must have been found using a Patterson, direct-methods or dual-space method before invoking maximum-likelihood phasing. The heavy-atom sites (in real space) are then used to calculate the model for maximum likelihood, the heavy-atom structure factors F_H (in reciprocal space).

6.1. MIR likelihood

In MIR, the data are the observed native and derivative structure-factor amplitudes. Unfortunately, there are significant correlations in the data because all data sets share the scattering from the native protein component, i.e. if a reflection is strong/weak in the native then it is likely to be strong/weak in all the derivative data sets as well. To simplify the analysis a (useful) nuisance variable is introduced, the `true' structure factor F_T, which is the component of scattering shared by the native and derivatives (Read, 2003a ) and can be thought of as the scattering from a `true' crystal. With the introduction of F_T in maximum-likelihood MIR there is nothing `special' about the native data set. The native is treated in exactly the same way as the derivatives: the native is simply a derivative without heavy atoms. In the nomenclature used here, the observed native and derivative structure factors are all denoted F_o. Elsewhere, F_o is often written as F_P, denoting that it contains native protein only, or F_PH, denoting that it contains native protein and heavy atoms. The data, the set of all `native' and derivative observed structure-factor amplitudes, is denoted {F_oj}, and the model, the set of all calculated heavy-atom structure factors, is denoted {F_Hj}, with the derivative number denoted by the subscript j. The introduced (useful) nuisance variable F_T must be integrated out of the probability distribution at the end of the analysis (see §2.6). Since F_T is a vector, integrating out the parameter requires integrating over the whole complex plane (a double integral, with real and imaginary components integrated from +∞ to −∞). The MIR likelihood function for a reflection is therefore given by

$[\eqalignno {P{\hbox {-}} {\rm MIR}_r & = P({\rm data \semi model}) = P (\{ F_{{\rm o}j}\}\semi \{{\bf F}_{{\rm H}j}\}) \cr & = \textstyle \int\limits_{-\infty}^{+\infty} \int\limits_{-\infty}^{+\infty} P(\{F_{{\rm o}j}\},{\bf F}_{\rm T} \semi \{ {\bf F}_{{\rm H}j} \})\,\,{\rm d}{\bf F}_{\rm T}. & (3)}]$

The reason for introducing the nuisance variable F_T is that by explicitly including the correlated component of the scattering between all the data, the `leftover' parts of the scattering can be considered to be independent. Therefore, the probabilities for each derivative F_oj given its F_Hj and F_T are (approximately) independent and can be multiplied to give the joint conditional probability (see §2.3),

$[P(\{F_{{\rm o}j}\}\semi \{ {\bf F}_{{\rm H}j}\},{\bf F}_{\rm T}) = \textstyle \prod\limits_{j = 1}^N P(F_{{\rm o}j} \semi {\bf F}_{{\rm H}j}, {\bf F}_{\rm T}). \eqno (4)]$

However, this is an expression for the probability of {F_oj} given {F_Hj} and F_T, not for the probability of {F_oj} and F_T given {F_Hj}, which is what is required for integrating out F_T (3). The relationship between the two probabilities is given by P(B, A; C) = P(A; C) × P(B; C, A). Taking F_T ≡ A, {F_oj} ≡ B and {F_Hj} ≡ C,

$[P(\{F_{{\rm o}j}\},{\bf F}_{\rm T} \semi\{{\bf F}_{{\rm H}j}\}) = P({\bf F}_{\rm T} \semi\{ {\bf F}_{{\rm H}j}\}) \times P(\{F_{{\rm o}j}\}\semi \{ {\bf F}_{{\rm H}j}\},{\bf F}_{\rm T}).]$

If the `true crystal' lacks atoms at the heavy-atom positions of the derivative, then P(F_T; {F_Hj}) is the same as P(F_T), i.e. {F_Hj} is irrelevant. P(F_T) is then the probability of F_T when all that is known is the number and type of atoms in the `true crystal' e.g. the number of C, N, O and S atoms if the `true crystal' contains protein only (Wilson, 1949 ). The probability distribution given by this information is relatively flat and can be ignored (Read, 1991). [However, if F_T does have atoms coincident with the heavy-atom positions, it should be included (Read, 2003a).]

$[P(\{F_{{\rm o}j}\},{\bf F}_{\rm T} \semi\{{\bf F}_{{\rm H}j}\}) = P(\{F_{{\rm o}j}\}\semi \{ {\bf F}_{{\rm H}j}\},{\bf F}_{\rm T}). \eqno (5)]$

Combining (4) and (5),

$[P(\{ F_{{\rm o}j}\},{\bf F}_{\rm T} \semi \{{\bf F}_{{\rm H}j}\}) = \textstyle \prod\limits_{j = 1}^N P(F_{{\rm o}j} \semi {\bf F}_{{\rm H}j}, {\bf F}_{\rm T}). \eqno (6)]$

Substituting (6) into the integral (3),

$[P{\hbox {-}}{\rm MIR}_r = \textstyle \int\limits_{-\infty}^{+\infty} \textstyle \int\limits_{-\infty}^{+\infty } \textstyle\prod\limits_{j = 1}^N P(F_{{\rm o}j} \semi {\bf F}_{{\rm H}j}, {\bf F}_{\rm T})\,\,{\rm d} {\bf F}_{\rm T}. \eqno (7)]$

P(F_oj; F_Hj,F_T) is the probability of the observed structure-factor amplitude F_oj given F_H and F_T for derivative j; to calculate this probability, F_Hj and F_T must be used to calculate the structure-factor amplitude F_cj that can be compared with F_oj. The calculated structure factor (phased) F_cj is the sum of the heavy-atom and protein structure factors (phased) for the derivative. If the heavy-atom model is perfect (and thus F_Hj is perfect) and the protein component of the derivative is identical (isomorphous) to F_T, then the calculated structure factor F_cj is simply given by the sum of F_Hj and F_T. However, F_Hj will not be perfect because the heavy atoms will not have perfect positions and occupancies and some of the sites may be missing from the model and F_T will not be perfectly isomorphous with the native component of the derivative. Using the same reasoning that applied for developing the refinement target, F_T and F_Hj are thus down-weighted by D factors (0 ≤ D ≤ 1). Refining the D factor for F_Hj has the same effect as refining the occupancies and B factors of the heavy atoms and so can be absorbed by these parameters during refinement. Including errors, the calculated structure factor F_cj is given by

$[{\bf F}_{{\rm c}j} = D_j {\bf F}_{\rm T} + {\bf F}_{{\rm H}j}.]$

The calculated structure-factor amplitude F_cj (in terms of F_T and F_Hj) can now be compared with the observed structure-factor amplitude F_oj,

$[P(F_{{\rm o}j} \semi {\bf F}_{{\rm H}j}, {\bf F}_T) = P(F_{{\rm o}j} \semi F_{{\rm c}j}),]$

where F_cj = |D_jF_T + F_Hj|.

As was the case for deriving the equation for refinement likelihood and the rotation-function likelihood, the trick to deriving a maximum-likelihood MIR function is to introduce the phase difference α between the observed and calculated structure factors while developing the likelihood function and then to integrate out this (useful) nuisance phase at the end of the analysis (Bricogne, 1991; Read, 1991),

$[P(F_{{\rm o}j} \semi F_{{\rm c}j}) = \textstyle \int\limits_0^{2\pi } P(F_{{\rm o}j}, \alpha \semi F_{{\rm c}j})\,\,{\rm d}\alpha.]$

The probability of F_oj is a two-dimensional Gaussian in reciprocal space centred on F_cj with variance $[\sigma_{\Delta j}^{2}]$ (Fig. 9),

$[P({\bf F}_{{\rm o}j} \semi {\bf F}_{{\rm c}j}) = {1 \over {\pi \sigma _{\Delta j}^2 }}\exp \left (- {{|{\bf F}_{{\rm o}j} - {\bf F}_{{\rm c}j}|^2 } \over {\sigma _{\Delta j}^2 }}\right).]$

Using this probability and the integral above, it can be shown (Appendix A) that the likelihood function is yet another Rice distribution,

$[P(F_{{\rm o}j} \semi F_{{\rm c}j}) = {{2F_{{\rm o}j} } \over {\sigma _{\Delta j}^2 }}\exp \left (- {{F_{{\rm o}j} ^2 + F_{{\rm c}j}^2 } \over {\sigma _{\Delta j}^2 }}\right)I_0 \left({{{2F_{{\rm o}j} F_{{\rm c}j} } \over {\sigma _{\Delta j}^2 }}} \right).]$

Experimental errors are incorporated by inflating the variance of the distribution

$[P(F_{{\rm o}j} \semi F_{{\rm c}j}) = {{2F_{{\rm o}j}} \over {\sigma _{\Delta j}^2 + \sigma _{Fj}^2 }}\exp \left (- {{F_{{\rm o}j} ^2 + F_{{\rm c}j} ^2 } \over {\sigma _{\Delta j}^2 + \sigma _{Fj}^2 }} \right)I_0 \left({{{2F_{{\rm o}j} F_{{\rm c}j} } \over {\sigma _{\Delta j}^2 + \sigma _{Fj}^2 }}} \right). \eqno (8)]$

This is the likelihood function for a single reflection and a single derivative. Notice the similarities between this equation and the equations for P-Xray_r [and P-TF_r; (1)] and P-RF_r (2). The likelihood function is virtually identical to that for P-Xray_r except that F_c is not calculated directly from the model but from the sum of DF_T and F_H. To combine all the derivatives, the product over all the derivatives is taken before integrating out the nuisance parameter F_T. Substituting (8) into (7),

$[\eqalign {P{\hbox {-}}&{\rm MIR}_r =\cr &{\textstyle \int\limits_{-\infty}^{+\infty} \int\limits_{-\infty}^{+\infty}\prod\limits_{j = 1}^N} {{ 2F_{{\rm o}j} } \over {\sigma _{\Delta j}^2 + \sigma _{Fj}^2 }} \exp \left (- {{ F_{{\rm o}j} ^2 + F_{{\rm c}j} ^2 } \over {\sigma _{\Delta j}^2 + \sigma _{Fj}^2 }}\right) I_0 \left({{ 2F_{{\rm o}j} F_{{\rm c}j} } \over {\sigma_{\Delta j}^2 + \sigma _{Fj}^2 }} \right)\,{\rm d} {\bf F}_{\rm T},}]$

where F_cj = |D_jF_T + F_Hj|.

Figure 9
Derivative structure factors. The calculated derivative structure factor (F_c) in blue is the sum of DF_T and H_c (in green). The probability distribution for F_o is shown with grey shading.

Unfortunately, the integrating out of F_T cannot be performed analytically; it must be performed numerically (values calculated and summed). Double numerical integrations are generally slow to compute and so they have to be performed cleverly.

The MIR likelihood function assumes that errors in the models of heavy atoms are uncorrelated to one another. It also assumes that the non-isomorphism differences between the derivatives are uncorrelated to one another. This is not always the case, particularly when the heavy-atom compounds are chemically related.

6.2. MIRAS likelihood

The likelihood function for MIRAS is the probability of all the F_o⁺ and F_o^- given all the calculated heavy-atom structure factors $[{\bf F}_{\rm H}^{+}]$ and $[{\bf F}_{\rm H}^{-}]$ (rather than just the mean F_o and mean F_H as for MIR). However, this probability function is difficult to generate by maximum likelihood because F_o⁺ and F_o^- are highly correlated (if F_o⁺ is large/small then F_o^- will also be large/small). This problem is partially avoided if the mean F_o and anomalous difference ΔF are used instead of F_o⁺ and F_o^-, as these are less correlated with one another (if the mean F is large, the anomalous difference ΔF need not be large; North, 1965 ; Matthews, 1966 ; de La Fortelle & Bricogne, 1997). The probabilities for the normal and anomalous scattering components are then considered to be independent. The probability of the normal scattering component is the same as that derived for MIR. The probability for the anomalous difference is approximated by a least-squares term (rather than being given by a true likelihood term).

6.3. MAD likelihood

Currently, MAD phasing is treated as a case of MIRAS (de La Fortelle & Bricogne, 1997), where the derivatives correspond to the wavelengths. This is unsatisfactory because the assumption that the errors in the models of heavy atoms between derivatives (i.e. wavelengths) are uncorrelated with one another is necessarily violated in MAD. To be treated properly, the likelihood function would have to be computed by performing an integration for each unknown phase. For example, in two-wavelength MAD, four integrations would be required, one each for $[\alpha_{\lambda 1}^{+}]$ , $[\alpha_{\lambda 1}^{-}]$ , $[\alpha_{\lambda 2}^{+}]$ and $[\alpha_{\lambda 2}^{-}]$ . Only one such integral can be performed analytically (to give a Rice distribution) and all the others must be performed numerically. Numerical instability and limitations in computing power currently preclude this approach, although Bricogne (2000 ) has proposed an alternative solution to the problem of performing multiple integrations.

6.4. SAD likelihood

In the special case of SAD, there is a likelihood function that explicitly accounts for the correlations between F_o⁺ and F_o^- (Pannu & Read, 2004 ). The function includes the familiar Rice distribution, which primarily accounts for the anomalous difference, but also another term that accounts for the heavy atoms being part of the model of the normal scatterers (McCoy et al., 2004 ). Only a single numerical (phase) integration is required. The SAD likelihood for a reflection (P-SAD_r) is given by

$[P{\hbox {-}}{\rm SAD}_r = {{F_o^ - } \over {\pi \Sigma ^ - }}{\textstyle \int\limits_0^{2\pi }} \exp \left (- {{| {{\bf F}_o^ - - {\bf F}_H^ - } |^2 } \over {\Sigma ^ - }}\right)\Re ({F_o^ +, F_c^ +, \Sigma ^ + })\,\,{\rm d}\alpha ^ -,]$

where F_c⁺ = | $[{\bf F}_H^ +]$ + D_Φ( $[{\bf F}_o^-]$ − $[{\bf F}_H^ -]$ )| and

$[\Re (F_o^+, F_c^+, \Sigma^+) = {{2F_o^+} \over {\Sigma^+ }} \exp\left (- {{F_o^{+2} + F_c^{+2} } \over {\Sigma ^ + }}\right)I_0 \left({{2F_o^ + F_c^ + } \over {\Sigma ^ + }} \right).]$

7. Discussion

The Rice distribution is ubiquitous where maximum likelihood is applied in crystallography because it is the result of integrating out the phase of a two-dimensional Gaussian: the phase must be integrated out because only structure-factor amplitudes are measured and two-dimensional Gaussians are ubiquitous because of the central limit theorem or `random walks' of structure-factor components. In fact, the two-dimensional Gaussians arising from `random walks' are also fundamentally a result of the central limit theorem. Understanding how and why the Rice distribution arises are concepts that link maximum likelihood to all aspects of macromolecular crystallography.

I hope that this material will give students the confidence to look deeper into the maximum-likelihood literature and discover some of the subtleties lost in the simple explanations. For those who are inspired to know more, the crystallography course notes at https://www-structmed.cimr.cam.ac.uk (by Randy J. Read, Airlie J. McCoy, Andrew G. W. Leslie and Philip R. Evans) are recommended as an appropriate second step.

APPENDIX A

Probability distribution for acentric reflections

The probability distribution for F_o given F_c is a two-dimensional Gaussian with variance $[\sigma_{\Delta}^{2}]$ centred on F_c (Fig. 10),

$[P({\bf F}_{\rm o} \semi {\bf F}_{\rm c})= {1 \over {\pi\sigma_{\Delta}^{2}}} \exp \left (-{{|{\bf F}_{\rm o} - {\bf F}_{\rm c}|^{2}}\over {\sigma_{\Delta}^{2}}}\right).]$

The cosine rule with the phase α between F_o and F_c gives

$[|{\bf F}_{\rm o} - {\bf F}_{\rm c}| = (F_{\rm o}^{2} + F_{\rm c}^{2} -2 F_{\rm o}F_{\rm c}\cos \alpha)^{1/2}.]$

The likelihood function is given by integrating out the phase α from the probability P(F_o, α; F_c) (Fig. 10a),

$[P(F_{\rm o}\semi F_{\rm c}) = \textstyle \int \limits_{0}^{2 \pi} P(F_{\rm o}, \alpha\semi F_{\rm c}) \,\,{\rm d}\alpha. ]$

The relationship between P(F_o, α; F_c) and P(F_o; F_c) is given by

$[P(F_{\rm o}, \alpha\semi F_{\rm c}) = F_{\rm o} \times P({\bf F}_{\rm o} \semi {\bf F}_{\rm c}), ]$

where the factor F_o is introduced by changing the descriptions of the Fs from Cartesian coordinates (i.e. expressed in terms of real and imaginary components) to polar coordinates (i.e. expressed in terms of radial and angular components; this factor is called the Jacobian). Therefore,

$[P(F_{\rm o}\semi F_{\rm c}) = {{F_{\rm o}}\over {\pi\sigma_{\Delta}^{2} }} \exp \left ( - {{ F_{\rm o}^{2} + F_{\rm c}^{2}} \over {\sigma_{\Delta}^{2}}}\right ) {\textstyle \int \limits_{0}^{2\pi}} \exp \left ( {{2F_{\rm o}F_{\rm c}} \over {\sigma_{\Delta}^{2}}}\cos \alpha\right )\,\,{\rm d}\alpha.]$

This integral has an analytical solution of the form

$[\textstyle \int \limits_{0}^{2\pi} \exp(z\cos \alpha)\,\,{\rm d}\alpha = 2 \pi I_{0}(z),]$

where I₀ is the modified Bessel function of order 0. Therefore,

$[P(F_{\rm o}\semi F_{\rm c}) = {{2F_{\rm o}}\over {\sigma_{\Delta}^{2} }} \exp \left ( - {{ F_{\rm o}^{2} + F_{\rm c}^{2}} \over {\sigma_{\Delta}^{2}}}\right ) I_{0}\left ( {{2F_{\rm o}F_{\rm c}} \over {\sigma_{\Delta}^{2}}}\right ).]$

This is known as the Rice distribution (Fig. 10b). In the special case where F_c is zero,

$[P(F_{\rm o}\semi F_{\rm c} = 0) ={{2 F_{\rm o}} \over {\sigma_{\Delta}^{2}}} \exp\left ({{F_{\rm o}^{2}}\over {\sigma_{\Delta}^{2}}}\right).]$

This is known as the Wilson distribution.

Figure 10
Rice distribution. (a) The integral of the two-dimensional Gaussian is the sum of values on concentric circles around the origin (shown in grey). As F_c increases, the two-dimensional Gaussian becomes further offset from the origin (shown with red, yellow, green, blue and purple shading). (b) A series of five Rice functions are plotted in red, yellow, green, blue and purple for the different values of F_c indicated with the same colour shading in (a). As F_c increases, the Rice function becomes more like a one-dimensional Gaussian.

APPENDIX B

Probability distribution for centric reflections

The probability distribution for F_o given F_c is a one-dimensional Gaussian with variance $[\sigma_{\Delta}^{2}]$ centred on F_c. F_o is either in phase or out of phase with F_c (Fig. 11). Summing over the two possibilities for the unknown phase,

$[\eqalign {P(F_{\rm o}\semi F_{\rm c}) &= {1 \over {(2 \pi \sigma_{\Delta}^{2})^{1/2}}} \exp \left [ - {{(F_{\rm o} - F_{\rm c})^{2}} \over {2 \sigma_{\Delta}^{2}}}\right ] \cr &\ \quad +\ {1 \over {(2 \pi \sigma_{\Delta}^{2})^{1/2}}} \exp \left [ - {{(F_{\rm o} + F_{\rm c})^{2}} \over {2 \sigma_{\Delta}^{2}}}\right ]. }]$

Expanding the quadratics and using

$[2 \cosh x = \exp x + \exp (-x)]$

gives

$[P(F_{\rm o}\semi F_{\rm c}) = \left ( {2 \over {\pi\sigma_{\Delta}^{2} }}\right )^{1/2} \exp \left (-{{F_{\rm o}^{2} + F_{\rm c}^{2}} \over {2\sigma_{\Delta}^{2}}} \right ) \cosh \left ({{F_{\rm o}F_{\rm c}}\over {\sigma_{\Delta}^{2}}}\right ).]$

This is known as the Woolfson distribution (Woolfson, 1956 ). In the special case where F_c is zero,

$[P(F_{\rm o}\semi F_{\rm c}= 0) = \left ( {2 \over {\pi\sigma_{\Delta}^{2} }} \right )^{1/2} \exp \left ( - {{F_{\rm o}^{2}} \over {2 \sigma_{\Delta}^{2}}} \right ).]$

Figure 11
Woolfson distribution. The probability distribution for F_o given F_c is a one-dimensional Gaussian centred on F_c. Since the reflection is centric, F_o either has the same phase as F_c or is 180° out of phase with F_c.

Acknowledgements

I thank Randy Read, Garib Murshudov and Laurent Storoni for many useful discussions. I also thank Eleanor Dodson for comments on the manuscript.

References

Bricogne, G. (1991). Proceedings of the CCP4 Study Weekend. Isomorphous Replacement and Anomalous Scattering, edited by P. R. Evans & A. G. W. Leslie, pp. 60–68. Warrington: Daresbury Laboratory. Google Scholar
Bricogne, G. (1992). Proceedings of the CCP4 Study Weekend. Molecular Replacement, edited by W. Wolf, E. J. Dodson & S. Glover, pp. 62–75. Warrington: Daresbury Laboratory. Google Scholar
Bricogne, G. (1993). Acta Cryst. D49, 37–60. CrossRef CAS Web of Science IUCr Journals Google Scholar
Bricogne, G. (2000). Advanced Special Functions and Applications: Proceedings of the Melfi School on Advanced Topics in Mathematics and Physics, edited by D. Cocoliccio, G. Dattoli & H. M. Srivastava, pp. 315–232. Rome: Aracne Editrice. Google Scholar
Bricogne, G. & Irwin, J. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. Dodson. M. Moore, A. Ralph & S. Bailey, pp. 85–92. Warrington: Daresbury Laboratory. Google Scholar
Engh, R. A. & Huber, R. (1991). Acta Cryst. A47, 392–400. CrossRef CAS Web of Science IUCr Journals Google Scholar
Frieden, B. R. (1985). J. Opt. Soc. Am. 73, 1764–1770. Google Scholar
Green, E. A. (1979). Acta Cryst. A35, 351–359. CrossRef CAS IUCr Journals Web of Science Google Scholar
Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136–143. CrossRef CAS IUCr Journals Google Scholar
Jaynes, E. T. (1968). IEEE Trans. Syst. Sci. Cybern. SSC-4, 227–241. Google Scholar
Jaynes, E. T. (1979). The Maximum Entropy Formalism, edited by R. D. Levine & M. Tribus, pp 15–118. Cambridge: MIT Press. Google Scholar
La Fortelle, E. de & Bricogne, G. (1997). Methods Enzymol. 276, 472–494. Google Scholar
McCoy, A. J. (2002). Curr. Opin. Struct. Biol. 12, 670–673. Web of Science CrossRef PubMed CAS Google Scholar
McCoy, A. J., Storoni, L. C. & Read, R. J. (2004). Acta Cryst. D60, 1220–1228. Web of Science CrossRef CAS IUCr Journals Google Scholar
Matthews, B. W. (1966). Acta Cryst. 20, 82–86. CrossRef IUCr Journals Web of Science Google Scholar
Mohammed-Djafari, A. (2003). Am. Inst. Phys. Conf. Proc. 659, 281–306. Google Scholar
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst D53, 240–255. CrossRef CAS Web of Science IUCr Journals Google Scholar
North, A. C. T. (1965). Acta Cryst. 18, 212–216. CrossRef IUCr Journals Web of Science Google Scholar
Pannu, N. S., Murshudov, G. N., Dodson, E. J. & Read, R. J. (1998). Acta Cryst. D54, 1285–1294. Web of Science CrossRef CAS IUCr Journals Google Scholar
Pannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659–668. CrossRef CAS Web of Science IUCr Journals Google Scholar
Pannu, N. S. & Read, R. J. (2004). Acta Cryst. D60, 22–27. Web of Science CrossRef CAS IUCr Journals Google Scholar
Read, R. J. (1990). Acta Cryst. A46, 900–912. CrossRef CAS Web of Science IUCr Journals Google Scholar
Read, R. J. (1991). Proceedings of the CCP4 Study Weekend. Isomorphous Replacement and Anomalous Scattering, edited by P. R. Evans & A. G. W. Leslie, pp. 69–79. Warrington: Daresbury Laboratory. Google Scholar
Read, R. J. (1994). Lecture Notes from the Workshop on Isomorphous Replacement Methods in Macromolecular Crystallography. Am. Crystallogr. Assoc. Ann. Meet., Atlanta, GA, USA. Google Scholar
Read, R. J. (2001). Acta Cryst. D57, 1373–1382. Web of Science CrossRef CAS IUCr Journals Google Scholar
Read, R. J. (2003a). Acta Cryst. D59, 1891–1902. Web of Science CrossRef CAS IUCr Journals Google Scholar
Read, R. J. (2003b). Crystallogr. Rev. 9, 33–41. CrossRef CAS Google Scholar
Sim, G. A. (1959). Acta Cryst. 12, 813–815. CrossRef IUCr Journals Web of Science Google Scholar
Srinivasan, R. & Ramachandran, G. N. (1965). Acta Cryst. 19, 1003–1007. CrossRef CAS IUCr Journals Web of Science Google Scholar
Storoni, L. C., McCoy, A. J. & Read, R. J. (2004). Acta Cryst. D60, 432–438. Web of Science CrossRef CAS IUCr Journals Google Scholar
Terwilliger, T. C. (2000). Acta Cryst. D56, 965–972. Web of Science CrossRef CAS IUCr Journals Google Scholar
Wilson, A. J. C. (1949). Acta Cryst. 2, 318–321. CrossRef IUCr Journals Web of Science Google Scholar
Woolfson, M. M. (1956). Acta Cryst. 9, 804–810. CrossRef CAS IUCr Journals Web of Science Google Scholar

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited. For more information, click here.

BIOLOGICAL
CRYSTALLOGRAPHY

ISSN: 1399-0047

Volume 60| Part 12| December 2004| Pages 2169-2183

doi:10.1107/S0907444904016038

Format		BIBTeX
		EndNote
		RefMan
		Refer
		Medline
		CIF
		SGML
		Plain Text
		Text

Format		BIBTeX
		EndNote
		RefMan
		Refer
		Medline
		CIF
		SGML
		Plain Text
		Text

Search term		doi		Advanced search
Author		volume	page

research papers\(\def\hfill{\hskip 5em}\def\hfil{\hskip 3em}\def\eqno#1{\hfil {#1}}\)

Liking likelihood

1. Introduction

2. Experiments with dice

2.1. Dice and probability

2.2. Dice and maximum likelihood

2.3. Dice, independence and log-likelihood

2.4. Dice and Bayes' theorem

2.5. Dice and integrating out nuisance variables

2.6. Dice and the central limit theorem

2.7. Dice summary

3. Maximum likelihood in macromolecular crystallography

4. Refinement

4.1. Prior probability

4.2. Refinement likelihood

4.3. Sigma A

4.4. Weighting

4.5. Experimental phases

4.6. Probabilities and energies

5. Molecular replacement

5.1. Translation-function likelihood

5.2. Rotation-function likelihood

5.3. Partial structure

5.4. Fast searches

6. Experimental phasing

6.1. MIR likelihood

6.2. MIRAS likelihood

6.3. MAD likelihood

6.4. SAD likelihood

7. Discussion

APPENDIX A

Probability distribution for acentric reflections

APPENDIX B

Probability distribution for centric reflections

Acknowledgements

References

research papers