short communications

JOURNAL OF APPLIED CRYSTALLOGRAPHY
ISSN: 1600-5767

A note on the Hendrickson–Lattman phase probability distribution and its equivalence to the generalized von Mises distribution


School of Biological Sciences, University of Auckland, Auckland, New Zealand
*Correspondence e-mail: rl.kingston@auckland.ac.nz

Edited by J. Hajdu, Uppsala University, Sweden and The European Extreme Light Infrastructure, Czechia (Received 1 December 2023; accepted 9 January 2024; online 16 February 2024)

Hendrickson & Lattman [Acta Cryst. (1970), B26, 136–143] introduced a method for representing crystallographic phase probabilities defined on the unit circle. Their approach could model the bimodal phase probability distributions that can result from experimental phase determination procedures. It also provided simple and highly effective means to combine independent sources of phase information. The present work discusses the equivalence of the Hendrickson–Lattman distribution and the generalized von Mises distribution of order two, which has been studied in the statistical literature. Recognizing this connection allows the Hendrickson–Lattman distribution to be expressed in an alternative form which is easier to interpret, as it involves the location and concentration parameters of the component von Mises distributions. It also allows clarification of the conditions for bimodality and access to a simplified analytical method for evaluating the trigonometric moments of the distribution, the first of which is required for computing the best Fourier synthesis in the presence of phase, but not amplitude, uncertainty.

1. Introduction

To enable determination of protein structures using X-ray crystallography, a variety of methods for experimental phase determination were developed and refined over several decades [see Hendrickson (2023[Hendrickson, W. A. (2023). IUCrJ, 10, 521-543.]) for a review]. All of these methods involved the systematic perturbation of Bragg diffraction from the crystal, by manipulating either the chemical composition of the crystal, the physical properties of the irradiating X-rays or both. A well understood feature of some approaches to experimental phase determination, including single isomorphous replacement and single wavelength anomalous dispersion, is that a twofold geometric ambiguity in the phase results, even in the absence of error (Matthews, 1970[Matthews, B. W. (1970). Crystallographic Computing, pp. 146-159. Copenhagen: Munksgaard.]; Vijayan, 1980[Vijayan, M. (1980). Computing in Crystallography, pp. 1901-1925. Bangalore: The Indian Academy of Sciences.]; Dauter et al., 2002[Dauter, Z., Dauter, M. & Dodson, E. J. (2002). Acta Cryst. D58, 494-506.]; McCoy & Read, 2010[McCoy, A. J. & Read, R. J. (2010). Acta Cryst. D66, 458-469.]; Hendrickson, 2014[Hendrickson, W. A. (2014). Q. Rev. Biophys. 47, 49-93.]). Hence the practical application of these phase determination procedures naturally generates bimodal phase probability distributions. This complexity must be captured in any mathematical function used to represent these probabilities. In addition, resolving the crystallographic phase problem for biological molecules experimentally often requires the combination of phase information from independent experiments.

The probability density function introduced by Hendrickson & Lattman (1970[Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136-143.]) addressed these issues. It has the form

[\eqalignno { f\left( \theta \mid A, B, C, D \right) = & \ N \left (A,B,C,D \right ) \exp \big (A\cos\theta + B\sin\theta \cr & + C\cos2\theta + D\sin2\theta \big ). & (1) }]

Here (A, B, C, D) are the four coefficients of the distribution, which encode the phase information, and N is a normalization constant. Depending on the values of the coefficients, (1) may be either unimodal or bimodal. Most conveniently, when (1) is used to represent phase probabilities, independent sources of phase information can be combined through simple addition of the coefficients (A, B, C, D) because of the exponential form of the distribution.
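The additive combination rule can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names (`hl_exponent`, `hl_density_on_grid`, `combine_hl`) and the two coefficient sets are hypothetical and chosen only for illustration, and the normalization is done by simple summation on a uniform grid rather than analytically. It compares the density obtained from summed coefficients with the renormalized pointwise product of the two individual densities.

```python
import math

def hl_exponent(theta, a, b, c, d):
    """Exponent of equation (1): A cos(t) + B sin(t) + C cos(2t) + D sin(2t)."""
    return (a * math.cos(theta) + b * math.sin(theta)
            + c * math.cos(2 * theta) + d * math.sin(2 * theta))

def hl_density_on_grid(coeffs, n_grid=720):
    """Normalized density sampled on a uniform grid over [0, 2*pi)."""
    step = 2 * math.pi / n_grid
    raw = [math.exp(hl_exponent(i * step, *coeffs)) for i in range(n_grid)]
    norm = sum(raw) * step  # numerical stand-in for 1/N in equation (1)
    return [value / norm for value in raw]

def combine_hl(*sources):
    """Combine independent phase information by adding coefficients."""
    return tuple(sum(values) for values in zip(*sources))

# Hypothetical coefficient sets, chosen only for illustration.
source_1 = (0.0, 0.0, 1.5, 1.0)    # purely second-order (bimodal) information
source_2 = (1.2, -0.4, 0.0, 0.0)   # purely first-order (unimodal) information
combined = combine_hl(source_1, source_2)

# Reference: multiply the two densities pointwise and renormalize.
step = 2 * math.pi / 720
product = [p * q for p, q in zip(hl_density_on_grid(source_1),
                                 hl_density_on_grid(source_2))]
product_norm = sum(product) * step
product = [value / product_norm for value in product]

# Largest pointwise disagreement between the two routes.
max_difference = max(abs(u - v)
                     for u, v in zip(product, hl_density_on_grid(combined)))
```

The two routes agree to rounding error because the product of two exponentials of the form (1) is again an exponential of that form, with the exponents (and hence the coefficients) added.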

Hence, (1), being both sufficiently flexible and numerically very convenient, became widely used to represent phase probability distributions in protein crystallography. We note that the Hendrickson–Lattman distribution is useful for modeling the phase probability distributions of acentric data, where the phase can take any value in the range 0–2π. For centric data, where there are only two phase possibilities, always separated by π, a discrete circular probability mass function provides the most straightforward descriptor.

Although the treatment of error in experimental phase determination has become increasingly sophisticated and is now generally based on the principle of maximum likelihood, with joint consideration of uncertainty in both amplitudes and phases (Read, 2003[Read, R. J. (2003). Acta Cryst. D59, 1891-1902.]; Bricogne et al., 2003[Bricogne, G., Vonrhein, C., Flensburg, C., Schiltz, M. & Paciorek, W. (2003). Acta Cryst. D59, 2023-2030.]; McCoy & Read, 2010[McCoy, A. J. & Read, R. J. (2010). Acta Cryst. D66, 458-469.]), the Hendrickson–Lattman distribution is still used to represent phase probability distributions in modern crystallographic software. Hence some clarification of its basic characteristics seems worthwhile.

Hendrickson & Lattman (1970[Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136-143.]) briefly noted the similarities between their probability density distribution and the von Mises distribution, and the connection has been remarked on subsequently (Murshudov et al., 2011[Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355-367.]). However, to the best of our knowledge, these observations have not been systematically developed. Fully documenting the relation between the Hendrickson–Lattman and von Mises distributions and placing the procedures used by Hendrickson & Lattman (1970[Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136-143.]) within the framework of circular statistics is the purpose of this short review.

2. The von Mises distribution

The von Mises probability density function is central to circular statistics, being the circular analog of the Gaussian probability density function on a line, and its properties are consequently very well documented (Batschelet, 1981[Batschelet, E. (1981). Circular Statistics in Biology. London: Academic Press.]; Fisher, 1993[Fisher, N. I. (1993). Statistical Analysis of Circular Data. Cambridge University Press.]; Mardia & Jupp, 1999[Mardia, K. V. & Jupp, P. E. (1999). Directional Statistics. Chichester: John Wiley.]; Jammalamadaka & Sengupta, 2001[Jammalamadaka, S. R. & Sengupta, A. (2001). Topics in Circular Statistics. Singapore: World Scientific.]). Like the Gaussian, the von Mises distribution is a mirror symmetric mono-modal distribution, defined by two parameters (Fig. 1[link]). μ is a location parameter. The function takes on its maximum value at μ, which is both the modal and mean value of the distribution. κ is a concentration parameter, named because as κ increases the distribution becomes more concentrated around μ. The von Mises probability density function is given by

[f\left(\theta \mid \mu , \kappa \right) = { {1}\over {2 \pi I_0(\kappa)}} \exp\left[\kappa\cos(\theta - \mu)\right], \eqno (2a)]

where μ ∈ [0, 2π), κ ≥ 0, and I0 is the modified Bessel function of the first kind and order 0.
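Equation (2a) is straightforward to evaluate. The sketch below is illustrative only and assumes nothing beyond the Python standard library: `bessel_i` is a hypothetical helper that evaluates the modified Bessel function from its ascending power series (adequate for the moderate arguments used here), and `integrate_circle` is a simple uniform-grid sum used to confirm that the density integrates to one.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, integer order >= 0,
    from its ascending power series (intended for moderate |x| only)."""
    half = x / 2.0
    return sum(half ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def von_mises_pdf(theta, mu, kappa):
    """Equation (2a): von Mises density with location mu, concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2 * math.pi * bessel_i(0, kappa))

def integrate_circle(func, n_grid=720):
    """Uniform-grid sum over [0, 2*pi); very accurate for smooth periodic functions."""
    step = 2 * math.pi / n_grid
    return sum(func(i * step) for i in range(n_grid)) * step

# Illustrative parameter values.
total = integrate_circle(lambda t: von_mises_pdf(t, 1.0, 2.5))
uniform_value = von_mises_pdf(0.3, 1.0, 0.0)  # kappa = 0 gives the uniform density
```

With κ = 0 the function returns 1/(2π) for every angle, which is the uniform circular distribution mentioned in the caption of Fig. 1.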

[Figure 1]
Figure 1
The von Mises distribution. (a)–(d) Four different instantiations of the von Mises probability density function represented in (left) circular form and (right) linear form, where the functions have been unwrapped onto the line. In the circular representation, the radial distance from the unit circle at each angle indicates the probability density (solid shaded), and the vectors internal to the unit circle display the first trigonometric moment of the probability density distribution, calculated according to (13)[link], which identifies its center of mass. When κ = 0, the von Mises distribution reduces to the uniform circular distribution. In this case, the center of mass of the distribution corresponds to the center of the unit circle, and the first trigonometric moment is not fully defined.

A simple extension of the von Mises distribution allows for multimodality, subject to symmetry restrictions (Mardia & Spurr, 1973[Mardia, K. V. & Spurr, B. D. (1973). J. R. Stat. Soc. Ser. B (Methodological), 35, 422-436.]; Batschelet, 1981[Batschelet, E. (1981). Circular Statistics in Biology. London: Academic Press.]). The multimodal von Mises probability density function is given by

[f\left(\theta \mid \mu, \kappa\right) = {{1}\over {2\pi I_0(\kappa)}}\exp\left\{ \kappa\cos\left[n(\theta - \mu)\right] \right\}, \eqno (2b)]

where μ ∈ [0, 2π/n), κ ≥ 0 and n is a positive integer that specifies the number of modes. The modes of this highly symmetric distribution are separated by 2π/n, as depicted in Fig. 2[link] for the monomodal (n = 1), bimodal (n = 2) and trimodal (n = 3) cases.

[Figure 2]
Figure 2
Multimodal von Mises distribution. Single instantiations of the (a) monomodal (n = 1), (b) bimodal (n = 2) and (c) trimodal (n = 3) von Mises probability density functions are represented in (left) circular form and (right) linear form, as in Fig. 1[link]. When n = 1, the ordinary von Mises distribution results (see Fig. 1[link]).

3. The generalized von Mises distribution and its equivalence with the Hendrickson–Lattman distribution

The ordinary von Mises distribution [equation (2a)[link], Fig. 1[link]] is both unimodal and mirror symmetric, whereas its multimodal extension [equation (2b)[link], Fig. 2[link]] has both mirror and rotational symmetry. This limits applications. An important generalization of the von Mises distribution (Gatto & Jammalamadaka, 2007[Gatto, R. & Jammalamadaka, S. R. (2007). Stat. Methodol. 4, 341-353.]), which allows for both bimodality and asymmetry, is given by

[\eqalign { f \left(\theta \mid \mu_1, \mu_2, \kappa_1, \kappa_2 \right) = & \ {{1} \over {2\pi G_0\left(\mu_1,\mu_2,\kappa_1,\kappa_2\right)}} \cr & \times \! \exp \! \big[\kappa_1\cos\left(\theta \! - \! \mu_1\right) \! + \! \kappa_2\cos 2\left(\theta \! - \! \mu_2 \right)\big], } \eqno (3)]

where μ1 ∈ [0, 2π) and μ2 ∈ [0, π) are location parameters, and κ1 ≥ 0 and κ2 ≥ 0 are concentration parameters. The distribution (3)[link] can be considered to arise from the multiplication of a unimodal and a bimodal von Mises distribution (Fig. 3[link]), and is hence termed the generalized von Mises distribution of order 2 (the GvM2 distribution). Incorporating multimodal von Mises distributions of higher order into the product gives rise to an infinite series of probability distributions [see Gatto & Jammalamadaka (2007[Gatto, R. & Jammalamadaka, S. R. (2007). Stat. Methodol. 4, 341-353.]) and Gatto (2009[Gatto, R. (2009). Statistics, 43, 409-421.]) for context and commentary]; however, GvM distributions with order greater than two have found limited practical applications. The GvM2 distribution (3)[link] appears to have been first proposed by Maksimov (1967[Maksimov, V. M. (1967). Theory Probab. Appl. 12, 267-280.]) and its properties are now well studied (Yfantis & Borgman, 1982[Yfantis, E. A. & Borgman, L. E. (1982). Comm. Statist. Theory Methods, 11, 1695-1706.]; Gatto & Jammalamadaka, 2007[Gatto, R. & Jammalamadaka, S. R. (2007). Stat. Methodol. 4, 341-353.]; Gatto, 2008[Gatto, R. (2008). Stat. Comput. 18, 321-331.], 2009[Gatto, R. (2009). Statistics, 43, 409-421.]; Salvador & Gatto, 2022a,b[Salvador, S. & Gatto, R. (2022b). Comput. Stat. 37, 947-974.]).

[Figure 3]
Figure 3
Construction of the generalized von Mises distribution of order 2, from the monomodal and bimodal von Mises distributions. (a) Instantiation of the monomodal von Mises distribution with the parameters (μ1, κ1) as indicated. (b) Instantiation of the bimodal von Mises distribution with the parameters (μ2, κ2) as indicated. (c) Generalized von Mises distribution with the parameters (μ1, μ2, κ1, κ2). This is the product of the unimodal and bimodal distributions shown in (a) and (b), normalized by the constant [I0(κ1)I0(κ2)]/G0(μ1, μ2, κ1, κ2). The distributions are represented in (left) circular form and (right) linear form as in Fig. 1[link].

The normalizing constant G0 appearing in (3)[link] ensures that the distribution is a probability density function, and is obtained by definite integration of the function over the unit circle. This integral cannot be evaluated in closed form, but can be written in terms of an infinite series expansion (Yfantis & Borgman, 1982[Yfantis, E. A. & Borgman, L. E. (1982). Comm. Statist. Theory Methods, 11, 1695-1706.]; Gatto & Jammalamadaka, 2007[Gatto, R. & Jammalamadaka, S. R. (2007). Stat. Methodol. 4, 341-353.]) as

[\eqalignno { G_0 \left(\mu_1,\mu_2,\kappa_1,\kappa_2\right) = & \ {{1} \over {2\pi}} \int\limits_0^{2\pi} \exp \big[\kappa_1\cos\left(\theta - \mu_1\right) \cr & + \kappa_2\cos2 \left(\theta - \mu_2\right)\big] \, {\rm d}\theta \cr = & \ I_0 \left(\kappa_1\right) I_0\left(\kappa_2\right) + 2\sum\limits_{j= 1}^{\infty} I_{2j}\left(\kappa_1\right)I_j\left(\kappa_2\right) \cr & \times \cos 2j \left(\mu_1 - \mu_2 \right), & (4)} ]

where In are the modified Bessel functions of the first kind and integer order n. The derivation of this result relies on the Jacobi–Anger expansion (Olver et al., 2010[Olver, F. W., Lozier, D. W., Boisvert, R. F. & Clark, C. W. (2010). NIST Handbook of Mathematical Functions. New York: Cambridge University Press.]):

[\exp(z\cos\theta) = I_0(z) + 2\sum \limits_{n=1}^{\infty} I_n(z)\cos(n\theta). \eqno (5)]

As the modified Bessel functions decrease rapidly to zero with increasing order (Oldham et al., 2009[Oldham, K. B., Myland, J. C. & Spanier, J. (2009). An Atlas of Functions. Springer.]), accurate evaluation of the normalizing constant G0 using (4)[link] can be achieved with only the first few summands of the infinite series.
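The rapid convergence of the series in (4) can be illustrated numerically. This is a minimal sketch under stated assumptions: `bessel_i` is a hypothetical power-series helper (standard library only, suitable for moderate arguments), `g0_series` truncates the sum in (4) after `n_terms` summands, and `g0_numeric` evaluates the defining integral by a uniform-grid sum. The parameter values are arbitrary.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, integer order >= 0,
    from its ascending power series (intended for moderate |x| only)."""
    half = x / 2.0
    return sum(half ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def g0_series(mu1, mu2, kappa1, kappa2, n_terms=15):
    """Truncated series expansion of G0, second line of equation (4)."""
    total = bessel_i(0, kappa1) * bessel_i(0, kappa2)
    for j in range(1, n_terms + 1):
        total += (2.0 * bessel_i(2 * j, kappa1) * bessel_i(j, kappa2)
                  * math.cos(2 * j * (mu1 - mu2)))
    return total

def g0_numeric(mu1, mu2, kappa1, kappa2, n_grid=720):
    """Direct evaluation of the integral in equation (4) on a uniform grid."""
    step = 2 * math.pi / n_grid
    total = sum(math.exp(kappa1 * math.cos(i * step - mu1)
                         + kappa2 * math.cos(2 * (i * step - mu2)))
                for i in range(n_grid))
    return total * step / (2 * math.pi)

params = (0.5, 1.2, 1.5, 2.0)           # (mu1, mu2, kappa1, kappa2), illustrative
reference = g0_numeric(*params)
full_error = abs(g0_series(*params) - reference)
short_error = abs(g0_series(*params, n_terms=3) - reference)
```

For these modest concentration parameters, three summands already reproduce the integral closely; larger concentrations would need more terms and a more robust Bessel routine than the simple series used here.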

The GvM2 distribution (3)[link] can be symmetric or asymmetric, unimodal or bimodal, depending on its parameters (μ1, μ2, κ1, κ2) [see the literature (Yfantis & Borgman, 1982[Yfantis, E. A. & Borgman, L. E. (1982). Comm. Statist. Theory Methods, 11, 1695-1706.]; Gatto & Jammalamadaka, 2007[Gatto, R. & Jammalamadaka, S. R. (2007). Stat. Methodol. 4, 341-353.]; Kato & Jones, 2010[Kato, S. & Jones, M. C. (2010). J. Am. Stat. Assoc. 105, 249-262.]; Salvador & Gatto, 2022a[Salvador, S. & Gatto, R. (2022). Comm. Statist. Theory Methods, pp. 1-17.]) for demonstration and discussion]. If κ1 = 0, the GvM2 distribution [equation (3)[link]] reduces to a bimodal von Mises distribution [equation (2b)[link] with n = 2], whereas if κ2 = 0, it reduces to a monomodal von Mises distribution [equation (2a)[link]; equation (2b)[link] with n = 1]. If κ1 = 0 and κ2 = 0, the GvM2 distribution reduces to the uniform circular distribution (Yfantis & Borgman, 1982[Yfantis, E. A. & Borgman, L. E. (1982). Comm. Statist. Theory Methods, 11, 1695-1706.]). The general conditions for bimodality of the GvM2 distribution are elaborated below.

As written, the connections between the GvM2 (3)[link] and the Hendrickson–Lattman distribution (1)[link] are not immediately seen. However the GvM2 distribution can be reparameterized (Gatto & Jammalamadaka, 2007[Gatto, R. & Jammalamadaka, S. R. (2007). Stat. Methodol. 4, 341-353.]) as follows.

If

[\eqalign { \lambda_1 & = \kappa_1\cos\mu_1, \cr \lambda_2 & = \kappa_1\sin\mu_1 ,\cr \lambda_3 & = \kappa_2\cos2\mu_2,\cr \lambda_4 & = \kappa_2\sin2\mu_2,}\eqno (6)]

then the GvM2 probability density function can be expressed as

[\eqalignno { f\left(\theta \mid \lambda_1, \lambda_2, \lambda_3, \lambda_4 \right) = \ & \exp\big[\lambda_1\cos\theta + \lambda_2\sin\theta + \lambda_3\cos 2\theta \cr & + \lambda_4 \sin 2\theta - K\left(\lambda_1,\lambda_2,\lambda_3,\lambda_4\right)\big], & (7)}]

where the constant K(λ1, λ2, λ3, λ4) is an appropriate transformation of the normalizing factor appearing in the denominator of (3)[link],

[K\left(\lambda_1,\lambda_2,\lambda_3,\lambda_4\right) = \ln(2\pi) + \ln\left[G_0\left(\mu_1,\mu_2,\kappa_1,\kappa_2\right)\right]. \eqno (8)]

Practically, the normalizing constant K(λ1, λ2, λ3, λ4) [equation (8)[link]] can be evaluated using (4)[link]. Equations (6)[link] have the form of a polar-to-Cartesian coordinate transformation. Parameters (μ1, μ2, κ1, κ2) can therefore be recovered from parameters (λ1, λ2, λ3, λ4) using

[\eqalign { \tan\left(\mu_1\right) = & \left({{\lambda_2} \over {\lambda_1}}\right) ,\cr \tan\left(2\mu_2\right) = & \left({{\lambda_4}\over{\lambda_3}}\right), \cr \kappa_1 = & \left(\lambda_1^2 + \lambda_2^2 \right)^{1/2} ,\cr \kappa_2 = & \left(\lambda_3^2 + \lambda_4^2\right)^{1/2} .}\eqno (9)]

The reparameterized version of the GvM2 distribution (7)[link] is clearly equivalent to the Hendrickson–Lattman probability distribution (1)[link], with A = λ1, B = λ2, C = λ3, D = λ4 and N = exp(−K) = 1/(2πG0). The equivalence has been noted previously (Murshudov et al., 2011[Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355-367.]). The GvM2/Hendrickson–Lattman distributions belong to the exponential family of probability distributions, with (7)[link] being the canonical representation of that family. The relationships between the two parameterizations of the Hendrickson–Lattman/GvM2 distribution are illustrated in Fig. 4[link]. Some aspects of the distribution are easier to recognize when it is written in the order factorized form (3)[link] rather than the expanded form of (1)[link] or (7)[link]. For example, when μ1 approaches μ2, the GvM2 probability density function (3)[link] approaches mirror symmetric, and is either unimodal or bimodal with peaks at antipodal positions [Fig. 4[link](a)] (Gatto & Jammalamadaka, 2007[Gatto, R. & Jammalamadaka, S. R. (2007). Stat. Methodol. 4, 341-353.]; Salvador & Gatto, 2022b[Salvador, S. & Gatto, R. (2022b). Comput. Stat. 37, 947-974.]). When both concentration parameters (κ1, κ2) become small, the distribution approaches uniform circular.
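The two parameterizations can be interconverted in a few lines. The sketch below is illustrative: the function names `hl_to_gvm2` and `gvm2_to_hl` are hypothetical, and the two-argument arctangent is used in place of the tangent relations in (9) so that the quadrant of each angle is resolved, which is an implementation choice rather than something prescribed above. When a concentration parameter is zero the corresponding location parameter is undefined; Python's `atan2(0, 0)` then simply returns 0.

```python
import math

def hl_to_gvm2(a, b, c, d):
    """Equation (9): (A, B, C, D) -> (mu1, mu2, kappa1, kappa2)."""
    kappa1 = math.hypot(a, b)
    kappa2 = math.hypot(c, d)
    mu1 = math.atan2(b, a) % (2 * math.pi)      # mu1 in [0, 2*pi)
    mu2 = (0.5 * math.atan2(d, c)) % math.pi    # mu2 in [0, pi)
    return mu1, mu2, kappa1, kappa2

def gvm2_to_hl(mu1, mu2, kappa1, kappa2):
    """Equation (6) with A, B, C, D identified with lambda_1 ... lambda_4."""
    return (kappa1 * math.cos(mu1), kappa1 * math.sin(mu1),
            kappa2 * math.cos(2 * mu2), kappa2 * math.sin(2 * mu2))

# Hypothetical coefficients, chosen only to exercise all four quadrants.
hl = (1.0, -2.0, -0.5, 0.75)
mu1, mu2, kappa1, kappa2 = hl_to_gvm2(*hl)
round_trip = gvm2_to_hl(mu1, mu2, kappa1, kappa2)
```

The round trip recovers the original coefficients, which reflects the polar-to-Cartesian character of the transformation.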

[Figure 4]
Figure 4
Equivalent parameterizations of the generalized von Mises distribution of order 2. (a)–(d) Four different instantiations of the GvM2 distribution represented in (left) circular and (right) linear form, as in Fig. 1[link]. Each distribution can be specified using either the expanded expression (1)[link] and the parameters (A, B, C, D) or the order factorized expression (3)[link] and the parameters (μ1, μ2, κ1, κ2). The parameters are given in the table at the bottom of the figure. The derived parameters ρ = κ1/(4κ2) and δ = (μ1 − μ2) mod π are useful for diagnosing the bimodality of the distribution (Salvador & Gatto, 2022a[Salvador, S. & Gatto, R. (2022). Comm. Statist. Theory Methods, pp. 1-17.]). The vectors internal to the unit circle display the first trigonometric moment of each GvM2 distribution, calculated according to (18)[link].

The order factorized form of the GvM2 distribution (3)[link] also allows analysis of the conditions for bimodality of the distribution, which are of particular interest in crystallography. These conditions are most readily expressed in terms of two derived quantities: the scaled ratio of the two concentration parameters, ρ = κ1/(4κ2); and the difference between the location parameters, δ = (μ1 − μ2) mod π. When ρ ≤ 1/2, the GvM2 distribution is always bimodal [see e.g. Figs. 3[link](c) and 4[link](a)]. When ρ ≥ 1, the GvM2 distribution is always unimodal [see e.g. Fig. 4[link](b)]. When 1/2 < ρ < 1, the GvM2 distribution may be either unimodal or bimodal, depending on the value of δ [see e.g. Figs. 4[link](c) and 4[link](d)], the detail being somewhat complex as it involves the roots of a quartic equation. Full details are given by Salvador & Gatto (2022a[Salvador, S. & Gatto, R. (2022). Comm. Statist. Theory Methods, pp. 1-17.]).
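The two clear-cut regimes can be illustrated with a short numerical experiment. This sketch is not the quartic-root analysis of Salvador & Gatto; it is a hypothetical brute-force check in which `classify_by_rho` applies only the ρ thresholds quoted above and `count_modes` counts local maxima of the exponent of (3) on a fine uniform grid. The function names and parameter values are assumptions made for illustration.

```python
import math

def classify_by_rho(kappa1, kappa2):
    """Apply the rho = kappa1 / (4 * kappa2) thresholds quoted in the text."""
    if kappa2 == 0.0:
        return "unimodal" if kappa1 > 0.0 else "uniform"
    rho = kappa1 / (4.0 * kappa2)
    if rho <= 0.5:
        return "bimodal"
    if rho >= 1.0:
        return "unimodal"
    return "depends on delta"

def count_modes(mu1, mu2, kappa1, kappa2, n_grid=3600):
    """Count local maxima of the GvM2 exponent sampled on a uniform grid."""
    step = 2 * math.pi / n_grid
    values = [kappa1 * math.cos(i * step - mu1)
              + kappa2 * math.cos(2 * (i * step - mu2))
              for i in range(n_grid)]
    modes = 0
    for i in range(n_grid):
        # values[i - 1] wraps around to the last grid point when i == 0.
        if values[i] > values[i - 1] and values[i] >= values[(i + 1) % n_grid]:
            modes += 1
    return modes

# Illustrative cases: rho = 0.25 (bimodal regime) and rho = 1.5 (unimodal regime).
modes_low_rho = count_modes(0.3, 1.0, 1.0, 1.0)
modes_high_rho = count_modes(0.3, 1.0, 6.0, 1.0)
```

In the intermediate range 1/2 < ρ < 1 the grid count remains usable as an empirical check, although very shallow secondary maxima may require a finer grid to resolve.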

Hendrickson & Lattman (1970[Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136-143.]) actually used a functionally equivalent reparameterization of the probability distribution in their paper. To facilitate analytical integration of the distribution, and calculation of its normalizing constant N, they performed a change of variables, almost identical to (9)[link], which effectively switches from the expanded form of the distribution (1) or (7)[link] to the order factorized form (3)[link]. Allowing for the variations in definitions and notation, the result obtained for the normalization constant [equation (21a) of Hendrickson & Lattman (1970[Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136-143.])] is the same as (4)[link], up to a factor of 2π. Other integrations were performed that enable calculation of the best Fourier synthesis. Before considering these results, we first reframe the crystallographic problem being treated using the terminology of directional statistics.

4. The first trigonometric moment of a circular probability distribution and the best Fourier synthesis

As with probability distributions defined on the line, probability distributions defined on the circle can be characterized by a series of moments, which are obtained by integration of products of the distribution. However, these moments must be defined differently because of the circular periodicity. The trigonometric moments used to characterize circular distributions are named for the trigonometric functions that appear inside the integral. Unlike the regular moments, the trigonometric moments are complex-valued quantities. Though trigonometric moments of arbitrary order can be defined, we consider here only the first trigonometric moment which is defined as (Fisher, 1993[Fisher, N. I. (1993). Statistical Analysis of Circular Data. Cambridge University Press.]; Mardia & Jupp, 1999[Mardia, K. V. & Jupp, P. E. (1999). Directional Statistics. Chichester: John Wiley.]; Jammalamadaka & Sengupta, 2001[Jammalamadaka, S. R. & Sengupta, A. (2001). Topics in Circular Statistics. Singapore: World Scientific.])

[\eqalignno { {\bf m}_1 & = \! \int\limits_0^{2\pi} \! \cos(\theta)f(\theta) \, {\rm d}\theta + i \! \int\limits_0^{2\pi} \! \sin(\theta)f(\theta) \, {\rm d}\theta \cr & = a_1 + ib_1 = \rho_1\exp\left(i\theta_1\right), & (10)}]

where f(θ) is the probability density function.

The quantities

[\eqalign { a_1 = & \int\limits_0^{2\pi}\cos(\theta)f(\theta) \, {\rm d}\theta, \cr b_1 = & \int\limits_0^{2\pi} \sin(\theta)f(\theta)\, {\rm d}\theta } \eqno (11)]

are the components of the first trigonometric moment expressed in Cartesian form.

The quantities

[\eqalign { \rho_1 = & \left ( a_1^2 + b_1^2 \right )^{1/2} ,\cr \theta_1 = & \ {\rm atan} 2\left(b_1,a_1\right) } \eqno (12)]

are the components of the first trigonometric moment expressed in polar form.

In the field of circular statistics, the modulus (ρ1) of the first trigonometric moment is termed the mean length (sometimes the mean resultant length), while the argument (θ1) is termed the mean direction. For the ordinary von Mises distribution (2a)[link], the mean length and mean direction are given by (Fisher, 1993[Fisher, N. I. (1993). Statistical Analysis of Circular Data. Cambridge University Press.]; Mardia & Jupp, 1999[Mardia, K. V. & Jupp, P. E. (1999). Directional Statistics. Chichester: John Wiley.]; Jammalamadaka & Sengupta, 2001[Jammalamadaka, S. R. & Sengupta, A. (2001). Topics in Circular Statistics. Singapore: World Scientific.])

[\eqalign { \rho_1 &= {{I_1(\kappa)}\over{I_0(\kappa)}} ,\cr \theta_1 &= \mu. } \eqno (13)]

The first trigonometric moments for particular instantiations of the ordinary von Mises distribution are displayed in Fig. 1[link]. The first trigonometric moment identifies the center of mass of a circular probability density function. The mean length, which can vary between 0 and 1, provides a useful measure of the dispersion of a unimodal distribution, such as the von Mises, though the interpretation is less straightforward for a potentially multimodal distribution such as the generalized von Mises.

Irrespective of the form of a circular probability distribution, the first trigonometric moment is of particular importance in crystallography. This is because, ignoring errors in the Fourier amplitudes, and given probability density functions for the phases, the best Fourier synthesis (in a least-squares sense) is obtained using the product of the first trigonometric moment and the measured Fourier amplitudes as coefficients. Therefore, the required coefficients are

[{\bf F}_{\rm best} (hkl) = \left|F(hkl)\right| {\bf m}_1(hkl) = \left|F(hkl)\right| \rho_1(hkl)\exp\left[i\theta_1(hkl)\right], \eqno (14)]

where Fbest(hkl) represents the complex Fourier coefficients and |F(hkl)| represents the measured Fourier amplitudes. Hence the best Fourier synthesis is computed using the mean direction as the phase, while weighting the Fourier amplitudes by the mean length. This is the essential result given in the foundational paper by Blow & Crick (1959[Blow, D. M. & Crick, F. H. C. (1959). Acta Cryst. 12, 794-802.]) [see the literature (Matthews, 1970[Matthews, B. W. (1970). Crystallographic Computing, pp. 146-159. Copenhagen: Munksgaard.]; Vijayan, 1980[Vijayan, M. (1980). Computing in Crystallography, pp. 1901-1925. Bangalore: The Indian Academy of Sciences.]; McCoy & Read, 2010[McCoy, A. J. & Read, R. J. (2010). Acta Cryst. D66, 458-469.]) for discussion]. In crystallographic applications, the mean length has historically been termed the `figure of merit', and the mean direction the `best' or `centroid' phase (Matthews, 1970[Matthews, B. W. (1970). Crystallographic Computing, pp. 146-159. Copenhagen: Munksgaard.]; Vijayan, 1980[Vijayan, M. (1980). Computing in Crystallography, pp. 1901-1925. Bangalore: The Indian Academy of Sciences.]).
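Equations (10)–(14) translate directly into a numerical recipe. The following is a minimal sketch, assuming phase information is supplied as Hendrickson–Lattman coefficients and that the moment is evaluated by a uniform-grid sum rather than analytically; the function names and the example amplitude of 100 are hypothetical. The largest exponent is subtracted before exponentiation purely to avoid overflow for large coefficients; this does not change the normalized result.

```python
import cmath
import math

def first_trigonometric_moment(a, b, c, d, n_grid=720):
    """Equation (10) for the density (1), evaluated on a uniform grid."""
    step = 2 * math.pi / n_grid
    thetas = [i * step for i in range(n_grid)]
    exponents = [a * math.cos(t) + b * math.sin(t)
                 + c * math.cos(2 * t) + d * math.sin(2 * t) for t in thetas]
    shift = max(exponents)  # guards against overflow; cancels on normalization
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return sum(w * cmath.exp(1j * t) for w, t in zip(weights, thetas)) / total

def best_fourier_coefficient(amplitude, hl_coefficients):
    """Equation (14): returns (F_best, figure of merit, best phase in radians).
    The best phase is meaningless when the figure of merit is zero."""
    m1 = first_trigonometric_moment(*hl_coefficients)
    return amplitude * m1, abs(m1), cmath.phase(m1) % (2 * math.pi)

# Illustrative unimodal case: a von Mises distribution with mu = 1.0, kappa = 2.0.
kappa, mu = 2.0, 1.0
unimodal = (kappa * math.cos(mu), kappa * math.sin(mu), 0.0, 0.0)
f_best, fom, best_phase = best_fourier_coefficient(100.0, unimodal)

# Illustrative symmetric bimodal case: the two modes cancel in the first moment.
bimodal_fom = abs(first_trigonometric_moment(0.0, 0.0, 2.0, 0.0))
```

The second example shows why a perfectly symmetric bimodal phase distribution contributes nothing to the best Fourier synthesis: its center of mass lies at the center of the unit circle.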

5. The first trigonometric moment of the GvM2 distribution

We now consider the analytical evaluation of the first trigonometric moment of the GvM2 distribution, which involves the integrals in (11)[link]. For the GvM2 distribution, no closed form solution for these integrals exists. However, as for the normalizing constant of the distribution [equation (4)[link]], solutions can again be obtained that involve rapidly converging series expansions. For clarity, we restate the results obtained by Hendrickson & Lattman (1970[Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136-143.]), using the standard notation for the GvM2 distribution (3)[link]. The procedure described by Hendrickson & Lattman (1970[Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136-143.]), when applied to evaluate the integrals

[\eqalign { G_0 \left(\mu_1,\mu_2,\kappa_1,\kappa_2\right) a_1 = & \ {{1}\over{2\pi}} \int\limits_0^{2\pi} \cos(\theta) \exp\big[\kappa_1\cos\left(\theta - \mu_1\right) \cr & + \kappa_2\cos 2\left(\theta - \mu_2\right)\big]\,{\rm d}\theta ,\cr G_0 \left(\mu_1,\mu_2,\kappa_1,\kappa_2\right) b_1 = \ & {{1}\over{2\pi}} \int\limits_0^{2\pi} \sin(\theta)\exp\big[\kappa_1\cos\left(\theta - \mu_1\right) \cr & + \kappa_2\cos 2\left(\theta - \mu_2\right)\big] \,{\rm d}\theta ,} \eqno (15)]

yields

[\eqalign { G_0\left(\mu_1,\mu_2,\kappa_1,\kappa_2\right) a_1 = & \ I_0\left(\kappa_2\right) I_1\left(\kappa_1\right) \cos\left(\mu_1\right) + \sum\limits_{n=1}^{\infty} I_n\left(\kappa_2\right) \cr & \times \big\{ I_{2n-1} \left(\kappa_1\right) \cos\left[\mu_1 - 2n \left(\mu_1 - \mu_2 \right) \right] \cr & + I_{2n+1}\left(\kappa_1\right) \cos\left[\mu_1 + 2n \left(\mu_1 - \mu_2 \right) \right]\big\}, \cr \ G_0 \left(\mu_1,\mu_2,\kappa_1,\kappa_2\right) b_1 = & \ I_0 \left(\kappa_2\right)I_1\left(\kappa_1\right)\sin\left(\mu_1\right) + \sum\limits_{n=1}^{\infty}I_n\left(\kappa_2\right) \cr & \times \big\{ I_{2n-1} \left(\kappa_1\right) \sin\left[\mu_1 - 2n \left(\mu_1 - \mu_2\right)\right] \cr & + I_{2n+1}\left(\kappa_1\right)\sin\left[\mu_1 + 2n \left(\mu_1 - \mu_2\right)\right] \big\}. } \eqno (16)]

The proof again rests on the repeated use of the Jacobi–Anger expansion (5)[link] and standard trigonometric identities. By making substitutions that reflect the variant re-parameterization of the probability density function used by Hendrickson & Lattman (1970[Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136-143.]),

[\eqalign { \kappa_1 & = S ,\cr \kappa_2 & = T, \cr \mu_1 & = -\sigma, \cr \mu_2 & = -{{1}\over {2}}\tau ,} \eqno (17) ]

then expressions (16)[link] are seen to be equivalent to the equations appearing at the bottom of page 141 of the article by Hendrickson & Lattman (1970[Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136-143.]), up to a factor of 2π (noting the presence of a typographical error resulting in an erroneous change of sign when specifying the Bessel functions). The result (16)[link] can also be obtained from the expressions for the trigonometric moments of arbitrary order, reported by Yfantis & Borgman (1982[Yfantis, E. A. & Borgman, L. E. (1982). Comm. Statist. Theory Methods, 11, 1695-1706.]), who used an identical method of derivation.

Without loss of generality, we now consider the case where μ1 = 0. For any GvM2 distribution this can be achieved by an angular coordinate transformation. When setting μ1 = 0, expressions (16)[link] for the components of the first trigonometric moment simplify to

[\eqalign { G_0\left(\mu_1,\mu_2,\kappa_1,\kappa_2\right) a_1 = & \ I_0 \left(\kappa_2\right) I_1\left(\kappa_1\right) + \sum\limits_{n=1}^{\infty} \cos\left(2n\delta\right)I_n\left(\kappa_2\right) \cr & \times \big\{ I_{2n+1}\left(\kappa_1\right) + I_{2n-1} \left(\kappa_1\right)\big\}, \cr G_0 \left(\mu_1,\mu_2,\kappa_1,\kappa_2\right) b_1 = & \ \sum\limits_{n=1}^{\infty} \sin\left(2n\delta\right) I_n\left(\kappa_2\right)\cr & \times \big\{I_{2n+1}\left(\kappa_1\right) - I_{2n-1}\left(\kappa_1\right)\big\}, } \eqno (18)]

where δ = μ1 − μ2 is the difference between the location parameters of the distribution. This is a computationally more convenient way to analytically evaluate the integrals, and is also the result given by Gatto (2009[Gatto, R. (2009). Statistics, 43, 409-421.]), made specific for the first trigonometric moment.

The first trigonometric moments for particular instantiations of the GvM2 distribution, evaluated using (18)[link], are displayed in Fig. 4[link].
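The series (18) can be compared against direct numerical integration. The sketch below is illustrative and rests on stated assumptions: `bessel_i` is a hypothetical power-series helper suitable for moderate arguments, the series is truncated after `n_terms` summands, and the general case μ1 ≠ 0 is handled by evaluating (18) in the rotated frame and multiplying the resulting complex moment by exp(iμ1), which leaves δ = μ1 − μ2 unchanged.

```python
import cmath
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, integer order >= 0,
    from its ascending power series (intended for moderate |x| only)."""
    half = x / 2.0
    return sum(half ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def first_moment_series(mu1, mu2, kappa1, kappa2, n_terms=15):
    """First trigonometric moment of the GvM2 distribution from (4) and (18)."""
    delta = mu1 - mu2
    g0 = bessel_i(0, kappa1) * bessel_i(0, kappa2)
    a = bessel_i(0, kappa2) * bessel_i(1, kappa1)
    b = 0.0
    for n in range(1, n_terms + 1):
        i_n = bessel_i(n, kappa2)
        upper = bessel_i(2 * n + 1, kappa1)
        lower = bessel_i(2 * n - 1, kappa1)
        g0 += 2.0 * bessel_i(2 * n, kappa1) * i_n * math.cos(2 * n * delta)
        a += math.cos(2 * n * delta) * i_n * (upper + lower)
        b += math.sin(2 * n * delta) * i_n * (upper - lower)
    # (18) assumes mu1 = 0; rotate the moment back to the original frame.
    return complex(a, b) / g0 * cmath.exp(1j * mu1)

def first_moment_numeric(mu1, mu2, kappa1, kappa2, n_grid=720):
    """Reference value of equation (10) from a uniform-grid sum."""
    step = 2 * math.pi / n_grid
    numerator, denominator = 0j, 0.0
    for i in range(n_grid):
        theta = i * step
        weight = math.exp(kappa1 * math.cos(theta - mu1)
                          + kappa2 * math.cos(2 * (theta - mu2)))
        numerator += weight * cmath.exp(1j * theta)
        denominator += weight
    return numerator / denominator

params = (0.5, 1.2, 1.5, 2.0)  # (mu1, mu2, kappa1, kappa2), illustrative values
moment_error = abs(first_moment_series(*params) - first_moment_numeric(*params))
```

The modulus and argument of the returned complex number are the mean length and mean direction, respectively.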

6. Conclusions

Directional data are ubiquitous in the physical and biological sciences, so it is probably unsurprising that the circular probability distribution developed by Hendrickson & Lattman (1970[Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136-143.]) was independently discovered and characterized by others. The exponential form of the Hendrickson–Lattman probability distribution confers many desirable properties. However, the Hendrickson–Lattman coefficients A, B, C and D lack straightforward meaning. Recognizing the equivalence of the Hendrickson–Lattman and GvM2 distributions allows reparameterization of the distribution to a more intuitive form that reflects the relationship with the von Mises distribution. It also allows a fuller appreciation of the general mathematical and statistical properties of the distribution, including the conditions for bimodality, and access to analytical procedures for computing all its trigonometric moments. There may be applications in crystallography where the inferential properties of the Hendrickson–Lattman/GvM2 distribution become important (i.e. when the parameters of the distribution need to be inferred, on the basis of computational procedures that effectively sample phase probabilities), and these have been studied in the statistical literature.

Acknowledgements

Open access publishing facilitated by The University of Auckland, as part of the Wiley–The University of Auckland agreement via the Council of Australian University Librarians.

Funding information

The following funding is acknowledged: Royal Society of New Zealand (grant No. 20-UOA-138).

References

Batschelet, E. (1981). Circular Statistics in Biology. London: Academic Press.
Blow, D. M. & Crick, F. H. C. (1959). Acta Cryst. 12, 794–802.
Bricogne, G., Vonrhein, C., Flensburg, C., Schiltz, M. & Paciorek, W. (2003). Acta Cryst. D59, 2023–2030.
Dauter, Z., Dauter, M. & Dodson, E. J. (2002). Acta Cryst. D58, 494–506.
Fisher, N. I. (1993). Statistical Analysis of Circular Data. Cambridge University Press.
Gatto, R. (2008). Stat. Comput. 18, 321–331.
Gatto, R. (2009). Statistics, 43, 409–421.
Gatto, R. & Jammalamadaka, S. R. (2007). Stat. Methodol. 4, 341–353.
Hendrickson, W. A. (2014). Q. Rev. Biophys. 47, 49–93.
Hendrickson, W. A. (2023). IUCrJ, 10, 521–543.
Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136–143.
Jammalamadaka, S. R. & Sengupta, A. (2001). Topics in Circular Statistics. Singapore: World Scientific.
Kato, S. & Jones, M. C. (2010). J. Am. Stat. Assoc. 105, 249–262.
Maksimov, V. M. (1967). Theory Probab. Appl. 12, 267–280.
Mardia, K. V. & Jupp, P. E. (1999). Directional Statistics. Chichester: John Wiley.
Mardia, K. V. & Spurr, B. D. (1973). J. R. Stat. Soc. Ser. B (Methodological), 35, 422–436.
Matthews, B. W. (1970). Crystallographic Computing, pp. 146–159. Copenhagen: Munksgaard.
McCoy, A. J. & Read, R. J. (2010). Acta Cryst. D66, 458–469.
Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367.
Oldham, K. B., Myland, J. C. & Spanier, J. (2009). An Atlas of Functions. Springer.
Olver, F. W., Lozier, D. W., Boisvert, R. F. & Clark, C. W. (2010). NIST Handbook of Mathematical Functions. New York: Cambridge University Press.
Read, R. J. (2003). Acta Cryst. D59, 1891–1902.
Salvador, S. & Gatto, R. (2022). Comm. Statist. Theory Methods, pp. 1–17.
Salvador, S. & Gatto, R. (2022b). Comput. Stat. 37, 947–974.
Vijayan, M. (1980). Computing in Crystallography, pp. 1901–1925. Bangalore: The Indian Academy of Sciences.
Yfantis, E. A. & Borgman, L. E. (1982). Comm. Statist. Theory Methods, 11, 1695–1706.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
