Volume 67 Received 4 May 2010  Bayesian algorithms for recovering structure from singleparticle diffraction snapshots of unknown orientation: a comparison^{a}Department of Physics, University of WisconsinMilwaukee, 1900 E. Kenwood Boulevard, Milwaukee, WI 53211, USA The advent of Xray freeelectron lasers promises the possibility to determine the structure of individual particles such as microcrystallites, viruses and biomolecules from singleshot diffraction snapshots obtained before the particle is destroyed by the intense femtosecond pulse. This program requires the ability to determine the orientation of the particle giving rise to each snapshot at signal levels as low as ~10^{2} photons per pixel. Two apparently different approaches have recently demonstrated this capability. Here we show they represent different implementations of the same fundamental approach, and identify the primary factors limiting their performance. 
Xray freeelectron lasers promise to move crystallography beyond crystals. For example, moves are afoot to determine the structure of biological molecules and their assemblies by exposing a succession of individual single particles to intense femtosecond pulses of Xrays (Solem & Baldwin, 1982; Neutze et al., 2004; Gaffney & Chapman, 2007). In addition to experimental issues, two algorithmic challenges must be overcome in order to recover structure from such diffraction snapshots. First, the orientation of the object giving rise to each snapshot must be determined. Second, this must be performed at extremely low signal. A typical 500 kD biomolecule, for example, scatters only 100 of the 10^{12} incident photons, with the photon count per pixel being as low as 10^{2} at the detector (Shneerson et al., 2008). As the particle orientations giving rise to the snapshots are unknown, the signal cannot be boosted by averaging, and orientation recovery must be carried out at `raw signal level' in the presence of shot (Poisson) and background scattering noise (Shneerson et al., 2008; Fung et al., 2009). Orientation recovery is thus one of the most critical steps in singleparticle structure determination (Leschziner & Nogales, 2007). Once diffractionpattern orientations have been discovered, the threedimensional diffraction volume can be assembled and the particle structure recovered by standard phasing algorithms (Gerchberg & Saxton, 1972; Feinup, 1978; Miao et al., 2001; Shneerson et al., 2008; Fung et al., 2009; Loh & Elser, 2009).
Using an adaptation of generative topographic mapping (GTM) (Bishop et al., 1998; Svensén, 1998), Fung et al. (2009) published the first successful recovery of the structure of a molecule from simulated diffraction snapshots of unknown orientation at signal levels expected from a 500 kD molecule by utilizing the information content of the entire ensemble of diffraction snapshots. Subsequently, Loh & Elser (2009) demonstrated structure recovery from simulated diffraction snapshots by an apparently different approach, using a socalled expansionmaximizationcompression (EMC) algorithm (Loh & Elser, 2009). Both approaches have been validated with experimental data. Loh et al. (2010) have oriented snapshots from iron oxide nanoparticles obtained by singleshot diffraction. Using GTM, Fung et al. (2010) and Schwander et al. (2010) have determined the orientation of diffraction snapshots from gold nanofoam with 8 × 10^{2} scattered photons per Shannon pixel with an orientational accuracy of about one Shannon angle. Using a variety of manifold embedding approaches, Giannakis et al. (2010) have demonstrated orientation recovery from diffraction snapshots of superoxide dismutase crystals with 1° accuracy compared with the goniometer step size of 0.5° and the crystal mosaicity of 0.8°. Using recently discovered symmetries of image formation, Giannakis et al. (2010) have used manifold approaches for orientation recovery and threedimensional reconstruction of single chaperonin molecules with experimental cryoelectron microscopy snapshots as well as experimental snapshots processed to represent doses 10× lower than is possible with existing techniques.
Here we show the two Bayesian approaches of Loh & Elser (2009) and Fung et al. (2009) are fundamentally the same, and discuss their capabilities and limitations. Issues to do with the way each approach is implemented and performs under different conditions are beyond the scope of the present paper, if only because these aspects are under active development. In order to facilitate the discussion, the structurerecovery process is divided into two steps: (a) orienting the diffraction snapshots and assembling the threedimensional diffraction volume; and (b) recovering the structure by a phasing algorithm. Since we are concerned with orientation recovery, the discussion will be confined to the first step.
The differences in presentation and notation notwithstanding, the Fung et al. (2009) and the Loh & Elser (2009) approaches are the same in all essential features. Specifically, they both:
(a) exploit the information content of the entire data set;
(b) recognize that a nonlinear mapping function relates the space of object orientations to the space of scattered intensities;
(c) determine the mapping function by Bayesian inference;
(d) use the well established expectationmaximization (EM) iterative algorithm (Dempster et al., 1977) to maximize likelihood;
(e) apply a constraint to guide likelihood maximization; and
(f) implement noiserobust algorithms with essentially the same computational scaling behaviors.
At the conceptual level, the primary difference between the two approaches concerns the way the step (e) is introduced. This paper elucidates the essential similarity between these two approaches, thus clarifying the common basis of Bayesian approaches to orienting snapshots. Details of each approach can be found in the cited references (Svensén, 1998; Fung et al., 2009; Loh & Elser, 2009; Giannakis et al., 2010). To facilitate a comparison of the two papers, Table 1 provides a translation table for the symbols used in each.

In essence, diffraction from a given object is a process (`a machine'), which takes an orientation as input to generate a diffraction pattern as output. With a detector consisting of p pixels, one can represent a diffraction pattern as a vector in a pdimensional Euclidean space of intensities, with the nth component of the vector consisting of the intensity recorded at the nth detector pixel. The information content of each diffraction pattern can be captured by ensuring that the pixels represent ShannonNyquist samples. In this picture, diffraction maps an orientation to a point in a pdimensional space. Because an object has only three orientational degrees of freedom (`Euler angles'), in the absence of noise, the points in the pdimensional space of intensities define a threedimensional manifold, which is, in fact, a nonlinear map of the SO(3) manifold of orientations (Giannakis et al., 2010).^{1}
The representation of object orientations bears careful consideration. Despite their widespread use, Euler angles are not a good representation of orientational similarity, because an object can be rotated through large Euler angles () and end at an orientation very close to its starting point. As the Euclidean distance in quaternion space is a good measure of (dis)similarity between orientations, both Fung et al. (2009) and Loh & Elser (2009) use unit quaternions (Kuipers, 2002) to represent orientations. Diffraction to a point in reciprocal space, therefore, can be thought of as a functional y(x), with x representing a unit quaternion.
A diffraction snapshot consists of p intensity values. The mapping thus takes an orientation x to generate a model snapshot . These are to be compared with experimental snapshots , but will, in general, not be identical to any single snapshot owing to (experimental) noise.^{2}
Because a given object has only three orientational degrees of freedom, the points representing the diffraction snapshots in the socalled manifest intensity space trace out a threedimensional manifold, which is a nonlinear map of the SO(3) manifold of orientations. At a conceptual level, given the `input' and `output' manifolds, it is possible to discover the nonlinear map between them. This links (`maps') a diffraction snapshot to a given orientation, and thus assigns an orientation to each diffraction snapshot (Fung et al., 2009; Giannakis et al., 2010). Once this has been accomplished, snapshots of similar orientation can be averaged to boost the signal, and structure recovery can proceed by standard techniques. In fact, appropriately wielded, manifold embedding can improve the signal far more efficiently than simple averaging of similar snapshots (Schwander et al., 2010; Giannakis et al., 2010), but this is beyond the scope of the present paper.
We now discuss how this conceptual outline forms the basis of the two apparently different approaches by Fung et al. (2009) (hereafter Fung) and Loh & Elser (2009) (hereafter LE).
Both approaches use the conceptual framework that snapshot orientations can be determined by discovering the nonlinear map connecting the two manifolds. The power of this general approach stems from the fact that the intensity manifold is defined by the entire ensemble of snapshots. In essence, one is using the whole data set to assign an orientation to each snapshot. This is needed to overcome the paucity of information in any single snapshot. Key here is the recognition that the `mutual information' between the snapshots of a large ensemble is much larger than the information in any single snapshot (Fung et al., 2009; Elser, 2009).
To render the formalism tractable, the SO(3) space of orientations is represented by a discrete set of K orientations (`nodes') { x_{k}}, distributed nearly uniformly on the threesphere (Lovisolo & da Silva, 2001; Coxeter, 1973). The internode spacing is chosen to satisfy the ShannonNyquist sampling criterion, determined as follows. Consider recovering the structure of an object with largest diameter D (radius R) to resolution r (Fig. 1). The orientational accuracy needed is then
with the number of independent orientations in three dimensions given by
The prefactor of accounts for the fact that the threesphere is a doublecover of SO(3). The Shannon element in terms of quaternions q is
leading to
where S is the number of symmetry elements of the molecule being reconstructed.
 Figure 1 Schematic relationship between object diameter D (= 2R), spatial resolution r and required orientational accuracy. 
The information content of the data set is compromised by noise. Noise is handled by Fung via a Gaussian model for the departures of a vector representing a noisy snapshot from its ideal noisefree position in the pdimensional intensity space. The large number of pixels used as components of a vector representing a snapshot ensures, via the central limit theorem (CLT), that a Gaussian model is appropriate regardless of the specific noise spectrum present in each pixel (see Appendix A). This is important because: (a) no prior knowledge of the noise model is required; and (b) background scattering, which need not be Poisson in nature, can be dealt with (Schwander et al., 2010). The use of a Gaussian noise model imposes no restrictions or additional requirements on Fung. LE, at least in its present form, explicitly relies on a Poisson noise model. As pointed out by LE, it remains to be established whether this is sufficient to deal with situations where other types of noise also play a role (Loh & Elser, 2009).
To link the orientations { x_{k}} to intensity space, both approaches use Bayesian inference and iterative likelihood maximization. Given a pair of events A and B with marginal probabilities P(A) and P(B), Bayes' theorem links their conditional probabilities via the expression
This is used to link the space of orientations with the space of observed diffraction snapshots. Starting with an initial guess for the nonlinear map, the likelihood of the observed data, given the model snapshots , is
where and represent the actual and model snapshots, and the indices n and k run over the set of N diffraction patterns and K orientations, respectively. The probability is determined by the noise model, and p(x_{k}) is the prior probability of the orientation x_{k}, which is 1/K when all orientations are equally likely.
Both LE and Fung maximize the loglikelihood iteratively by the well known EM algorithm (Dempster et al., 1977). Each iteration modifies the model snapshots, effectively moving the manifold defined by them closer to the experimental data. There is no guarantee that the final solution is a global maximum.
Once the mapping corresponding to maximum likelihood has been determined, the orientation of each measured diffraction pattern is taken to correspond to that x_{k} which maximizes the probability of `belonging' to . Thus we choose the orientation x_{k} which maximizes the probability . The conditional probability is determined using equation (5).
Having assigned the N diffraction snapshots to the K orientational bins, the diffraction volume can be reconstructed. In standard `classification and averaging', diffraction patterns assigned to the same orientation x_{k} are averaged so that there is one representative diffraction pattern for each x_{k}. Socalled generative models such as that used by Fung allow one to construct (`generate') model snapshots for each orientation directly from the manifold. As the manifold represents the information content of the entire data set, the generative approach offers significantly greater noise reduction than classification and averaging, which relies on the information in the neighborhood of a given orientation only (Schwander et al., 2010; Giannakis et al., 2010).
Each averaged, or alternatively, each generated snapshot is placed in reciprocal space according to its orientation, resulting in a set of irregularly spaced points in reciprocal space. These are interpolated onto a Cartesian grid so as to allow fast Fourier transformation during iterative phasing (Gerchberg & Saxton, 1972; Schwander et al., 2010; Feinup, 1978).
The only substantive difference between the GTM and the EMC algorithms is the way in which the manifold embedding process is introduced, more specifically, the way the model diffraction patterns are evolved so as to maximize the likelihood. In principle, one would modify the model diffraction patterns along the steepest ascent in loglikelihood, until the derivative with respect to changes in the model diffraction patterns is zero. However, this approach is too simple to be of use in practice. Suppose we have found the map y such that the likelihood L is maximized, and suppose we now exchange a pair of model images assigned to x_{1} and x_{2}, viz. . This simply switches the order of the first two terms in the sum over k in equation (6), leaving the likelihood unchanged. By the same reasoning, we are able to permute the images assigned to the x_{k} arbitrarily without changing the likelihood L. This means that likelihood maximization alone is unable to find a unique solution, and is, for example, unable to distinguish between the two very different neighborhood assignments shown in Fig. 2.
 Figure 2 The two different neighborhood assignments indicated by the black lines have the same likelihood. Assignment A, which `connects' neighbors, is clearly preferred to assignment B. An additional `contiguity constraint' is required to distinguish between these two assignments. The circle perimeters represent the `true' data manifold, the red dots represent the model images and the black lines represent the neighborhood assignments. 
In order to eliminate this problem, both the GTM and EMC algorithms place a `contiguity constraint' on the map y. This constraint demands that two nodes which are close to each other in the space of orientations be mapped to points close to each other in data space. Fung and LE impose this contiguity constraint differently. In the GTM approach used by Fung, the map is expanded in terms of a set of basis functions:
where is one of M basis functions ( the number of independent orientations K) and represent the expansion coefficients (weights).
Likelihood maximization proceeds by adjusting the M sets of p coefficients. The basis functions are chosen so as to vary slowly with x. In the current implementation of GTM, they are Gaussians (Bishop et al., 1998). The map in equation (7) varies slowly, provided the weights are small. This is achieved by imposing a zerocentered Gaussian distribution on the sum of the squares of the weights. This strategy helps ensure that, topologically, the neighborhood assignments in manifest (intensity) space reflect the neighborhood assignments in latent (orientation) space, i.e. is close to when x_{k} is close to x_{k'}.
The EMC algorithm of LE, in contrast, uses the model diffraction patterns themselves (rather than the weights ) as fitting parameters. After each expectationmaximization step, a socalled `compression' step inserts the model diffraction patterns into reciprocal space according to their orientations, and the resulting irregularly spaced points are interpolated onto a uniform grid to determine a new diffraction volume by local averaging. Next, an `expansion' step uses the new diffraction volume as the source for a fresh set of model diffraction snapshots by interpolating back onto the irregularly spaced points corresponding to the pixels of each of the model diffraction patterns. In this approach, both the compression and expansion steps act as lowpass filters; replacing two diffraction patterns by their average and then deducing two diffraction patterns from the average removes sharp variations between diffraction patterns mapped to similar points in reciprocal space. In essence, the socalled compressionexpansion cycle is an alternative implementation of the contiguity constraint, whereby neighboring orientations in latent space give rise to neighboring points in manifest intensity space.
The apparently different introductions of the contiguity constraint described above belie the fundamental similarity of the two approaches even in this step. As shown in Appendix B, in the limit of zero weightregularization parameter in Fung and no compressionexpansion in LE, the two approaches reduce to the same algorithm.
The fundamental similarities between the two approaches result in similar scaling in computational behavior. In brief, the computational demands rise as E^{n}, where E = (D / r )^{s} is the number of resolution elements, D and r the object diameter and spatial resolution, respectively, and s the number of orientational degrees of freedom. Typically, , i.e. the computational cost scales as the sixth to ninth power of (D / r) (Fung et al., 2009), severely limiting the achievable resolution and/or amenable object size. Significant improvements in this behavior are essential, with the most obvious route involving more efficient implementation and parallelization (Fung et al., 2009; Loh & Elser, 2009). Fundamentally, however, the high computational cost of Bayesian approaches stems from their generality. It has long been known that the most general algorithms are the most inefficient and the way to improve this involves introducing problemspecific constraints (Le Cun et al., 1990; Schwander et al., 2010). This is the basis of a new generation of more efficient algorithms, which directly incorporate the physics of scattering (Giannakis et al., 2010).
Bayesian approaches are capable of orienting snapshots containing as few as 100 scattered photons (~10^{2} photons per pixel). The present paper establishes that two apparently different Bayesian approaches to orienting diffraction snapshots are the same in all essential features. The elucidation of these features can guide the development of computationally more efficient algorithms, which are needed if the large and more complex data sets anticipated from ongoing experiments are to be successfully analyzed. The remarkable capability of the Fung and LE approaches to operate at extremely low signals stems not from algorithmic details, but from the realization that much of the information about a given snapshot resides not in the snapshot itself, but in the other snapshots in the data set, and the entire information content is needed to orient each snapshot at low signal.
The fact that the orientations deduced by GTM agree closely with the correct values for a wide variety of applications, including the case when the noise is strongly Poisson distributed, indicates that the Gaussian model is adequate, at least for the instances we have so far considered. Below we offer a mathematical justification for this observation.
In Svensén's nomenclature (Svensén, 1998) GTM maximizes the likelihood function:
where the indices k and n represent the latent space nodes and the data vectors, respectively. Equation (8) can be written as follows:
The parameter is a mean, representing the probability of a data vector belonging to a node, averaged over the K nodes of the manifold. The key point is that the parameters determining the likelihood, and hence the outcome of GTM, are a set of N means. By the central limit theorem (CLT), for sufficiently large N, the distribution of is normal, irrespective of the distributions describing p_{nk}. As in our case, this is easily satisfied. The normal distribution describing has mean and variance independent of the functional form assumed for p_{nk}.
With and as fitting parameters, GTM uses the functional form
to fit the data vector cloud. In essence, this is an attempt to describe the data cloud as a sum of Gaussians. As the latter form a complete set, this is permissible, although it may not be the most efficient representation when the noise is Poisson. The CLT is, of course, valid regardless of the representation used to describe the data, and the parameters of the final normal distribution describing are independent of this choice.
For GTM, the equation obtained from setting to zero the derivatives of the likelihood with respect to the model parameters is
where the matrix G is a K × K diagonal matrix with entries given by .
In LE, the model parameters are the pixel intensities themselves, so is the identity matrix and W = Y. There is no weight regularization in the EMC algorithm, i.e. = 0. Therefore, equation (11) reduces to:
This is to be compared with equation (11) of LE, which, translated into the same notation as equation (12) above, becomes . From the definition of the matrix G, it is clear that the LE update rule is given by , which is equivalent to equation (12) above.
We acknowledge valuable discussions with Russell Fung, Dilano Saldin, Peter Schwander, Pierre Thibault and Chun Hong Yoon. This work was partially supported by award No. DESC0002164 from the Office of Basic Energy Sciences of the US Department of Energy.
Bishop, C. M., Svensen, M. & Williams, C. K. I. (1998). Neural Comput. 10, 215234.
Coxeter, H. S. M. (1973). Regular Polytopes. New York: Dover.
Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). J. R. Stat. Soc. B, 39, 138.
Elser, V. (2009). IEEE Trans. Inf. Theory, 55, 47154722.
Feinup, J. R. (1978). Opt. Lett. 3, 2729.
Fung, R., Harder, R. & Ourmazd, A. (2010). Unpublished.
Fung, R., Shneerson, V., Saldin, D. K. & Ourmazd, A. (2009). Nat. Phys. 5, 6467.
Gaffney, K. J. & Chapman, H. N. (2007). Science, 316, 14441448.
Gerchberg, R. W. & Saxton, W. O. (1972). Optik, 35, 237246.
Giannakis, D., Schwander, P., Yoon, C. H. & Ourmazd, A. (2010). http://arxiv.org/abs/1009.5035 .
Kuipers, J. B. (2002). Quaternions and Rotation Sequences, 5th ed. Princeton, NJ: Princeton University Press.
Le Cun, Y., Denker, J. S., Solla, S. A., Jackel, L. D. & Howard, R. E. (1990). Advances in Neural Information Processing Systems, Vol. 2, pp. 598605. Denver: Morgan Kaufmann Publishers Inc.
Leschziner, A. E. & Nogales, E. (2007). Annu. Rev. Biophys. Biomol. Struct. 36, 20.
Loh, N. D. et al. (2010). Phys. Rev. Lett. 104, 225501.
Loh, N.T. D. & Elser, V. (2009). Phys. Rev. E, 80, 026705.
Lovisolo, L. & da Silva, E. A. B. (2001). IEE Proc. Vis. Image Signal Process. 148, 187193.
Miao, J., Hodgson, K. O. & Sayre, D. (2001). Proc. Natl Acad. Sci. USA, 98, 66416645.
Neutze, R., Huldt, G., Hajdu, J. & van der Spoel, D. (2004). Radiat. Phys. Chem. 71, 905916.
Schwander, P., Fung, R., Phillips, G. N. & Ourmazd, A. (2010). New J. Phys. 12, 115.
Shneerson, V. L., Ourmazd, A. & Saldin, D. K. (2008). Acta Cryst. A64, 303315.
Solem, J. C. & Baldwin, G. C. (1982). Science, 218, 229235.
Svensén, J. F. M. (1998). PhD thesis, Aston University. http://eprints.aston.ac.uk/1245/1/NCRG_98_024.pdf .