Bayesian perspective for orientation determination in cryo-EM with application to structural heterogeneity analysis

Xu, S.; Balanov, A.; Singer, A.; Bendory, T.

doi:10.1107/S2059798326001415

research papers

STRUCTURAL
BIOLOGY

ISSN: 2059-7983

Volume 82| Part 4| April 2026| Pages 274-294

https://doi.org/10.1107/S2059798326001415

Open

access

Bayesian perspective for orientation determination in cryo-EM with application to structural heterogeneity analysis

Sheng Xu,^a ^*‡ Amnon Balanov,^b ‡ Amit Singer ^a,^c and Tamir Bendory ^b

^aProgram in Applied and Computational Mathematics, Princeton University, Fine Hall, Washington Road, Princeton, NJ 08544, USA, ^bSchool of Electrical and Computer Engineering, Tel Aviv University, 69978 Tel Aviv, Israel, and ^cDepartment of Mathematics, Princeton University, Fine Hall, Washington Road, Princeton, NJ 08544, USA
^*Correspondence e-mail: [email protected]

Edited by A. Bartesaghi, Duke University, USA (Received 13 October 2025; accepted 10 February 2026; online 16 March 2026)

This article is part of a focused issue on image-processing methods for electron microscopy of biological specimens.

Accurate orientation estimation is a crucial component of 3D molecular structure reconstruction, both in single-particle cryo-electron microscopy (cryo-EM) and in the increasingly popular field of cryo-electron tomography (cryo-ET). The dominant approach, which involves searching for the orientation that maximizes cross-correlation relative to given templates, is suboptimal, particularly under low signal-to-noise conditions. In this work, we propose a Bayesian framework for more accurate and flexible orientation estimation, with the minimum mean-square error (MMSE) estimator serving as a key example. Through simulations, we demonstrate that the MMSE estimator consistently outperforms the cross-correlation-based method, especially in challenging low signal-to-noise scenarios, and we provide a theoretical framework that supports these improvements. When incorporated into iterative refinement algorithms in the 3D reconstruction pipeline, the MMSE estimator markedly improves reconstruction accuracy, reduces model bias and enhances robustness to the `Einstein from Noise' artifact. Crucially, we demonstrate that orientation-estimation accuracy has a decisive effect on downstream structural heterogeneity analysis. In particular, integrating the MMSE-based pose estimator into frameworks for continuous heterogeneity recovery yields accuracy improvements approaching those obtained with ground-truth poses, establishing MMSE-based pose estimation as a key enabler of high-fidelity conformational landscape reconstruction. These findings indicate that the proposed Bayesian framework could substantially advance cryo-EM and cryo-ET by enhancing the accuracy, robustness and reliability of 3D molecular structure reconstruction, thereby facilitating deeper insights into complex biological systems.

Keywords: orientation estimation; cryo-electron microscopy; Bayesian inference; MMSE estimator; structural heterogeneity.

1. Introduction

Determining the precise three-dimensional (3D) orientation of biological molecules from their noisy two-dimensional (2D) projection images is a fundamental challenge in cryo-electron microscopy (cryo-EM; Bai et al., 2015 ; Lyumkis, 2019 ; Bendory et al., 2020 ). This process, known as orientation estimation, is crucial for various cryo-EM applications, including 3D reconstruction algorithms (Scheres, 2012b ; Punjani et al., 2017 ), heterogeneity analysis (Sorzano et al., 2019 ; Toader et al., 2023 ; Donnat et al., 2022 ; Zhong et al., 2021 ) and beyond (Mäeots & Enchev, 2022 ). For example, Fig. 1(a) illustrates the role of orientation estimation within the cryo-EM 3D reconstruction workflow.

Figure 1
Comparison between maximum cross-correlation ( $[{\hat g}_{\rm {MLE}}]$ ) and Bayesian ( $[{\hat g}_{\rm {MMSE}}]$ ) orientation estimators in cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET) applications. The figure illustrates the general workflow in cryo-EM and cryo-ET techniques, highlighting the role of orientation estimation in each technique. (a) illustrates the model with 2D projections (single-particle cryo-EM model; equation 1

), while (b) shows the model of subtomogram averaging in cryo-ET (equation 2

). (a) Cryo-EM involves imaging macromolecules embedded in a thin layer of vitreous ice using an electron beam in a transmission electron microscope (TEM). The process generates 2D projection images (micrographs) of particles in unknown 3D orientations. These 2D particles are then identified and extracted from the micrographs, forming the basis for subsequent steps of the macromolecule's 3D structure reconstruction. (b) Cryo-ET involves imaging a sample from multiple known tilt angles (typically from −60° to + 60°) to create 2D projections, which are then combined computationally to reconstruct 3D subtomograms. In this context, a subtomogram refers to a small volume containing an individual 3D particle. The subtomograms are extracted by a particle-picker algorithm for further analysis. In both (a) and (b) the rotation-estimation problem involves determining the relative orientation of a noisy 2D particle (in cryo-EM) or a noisy 3D subtomogram (in cryo-ET) relative to a reference volume V. The reference volume structure used in both setups is identical and corresponds to the 80S ribosome (Wong et al., 2014

). Under high SNR conditions, both rotation estimators closely approximate the true relative rotation. However, as the SNR decreases the estimation accuracy deteriorates. Importantly, across all SNR levels, the geodesic angular distance between the MMSE orientation estimator and the true rotation consistently remains lower than that of the MLE orientation estimator. For (a) the estimation was conducted using a grid size of L = 3000 samples of the rotation group $[{\sf{SO}}(3)]$ , while for (b) a grid size of L = 300 was used. Each point in the two curve plots represents the average error computed over 3000 trials.

In cryo-electron tomography (cryo-ET), orientation estimation presents an additional challenge in the form of subtomogram averaging. Notably, cryo-ET suffers from higher noise levels compared with single-particle cryo-EM due to the complex and heterogeneous nature of cellular samples and the challenges of capturing data from multiple angles within thicker specimens. Subtomogram averaging offers an effective approach to enhance the signal-to-noise ratio (SNR), ultimately resulting in the reconstruction of high-resolution structures. This technique often involves extracting multiple similar subtomograms containing the target protein complex or macromolecule from a large cryo-electron tomogram reconstructed from all available tilts (typically from −60° to +60°), followed by aligning and averaging them (Zhang, 2019 ; Watson & Bartesaghi, 2024 ). Unlike traditional cryo-EM, this process typically aligns 3D structures directly, without the direct use of 2D projections (for an illustration, see Fig. 1b).

Mathematically, the orientation-estimation tasks in cryo-EM and cryo-ET are slightly different. In the process of single-particle cryo-EM, which involves 2D tomographic projections, the mathematical model can be formulated as

$[y = \Pi(g\circ V)+\varepsilon,\eqno(1)]$

where $[y:{\bb R}^{2}\to{\bb R}]$ is the observed 2D projection image, $[V:{\bb R}^{3}\to{\bb R}]$ is the underlying 3D molecular structure, Π is the tomographic projection operator, g is the unknown 3D rotation operator of interest, ɛ represents measurement noise and g ○ V(x) ≡ V(g⁻¹x), representing a rotation acting on a volume V with 3D coordinate x. Analogously, the mathematical model in cryo-ET subtomogram averaging, which aligns directly with the 3D structure, can be represented by

$[y = g\circ V+\varepsilon,\eqno(2)]$

where $[y:{\bb R}^{3}\to{\bb R}]$ is the observed 3D subtomogram, $[V:{\bb R}^{3}\to{\bb R}]$ is the underlying 3D molecular structure, g is the unknown 3D rotation operator of interest and ɛ represents measurement noise. Then, the goal of orientation estimation is to find the `best' 3D rotation g based on the 2D projection image (in the cryo-EM case; equation 1) or the 3D subtomogram (in the cryo-ET case; equation 2) with respect to the 3D reference V. Namely, we aim to estimate the rotation g given the sample y and the 3D structure V.

The actual mathematical models used in cryo-EM and cryo-ET are more intricate than those presented in equations (1) and (2), incorporating additional factors such as the contrast transfer function (CTF) and in-plane translations, as elaborated in Appendix A. Although the framework introduced in this work is capable of addressing the full cryo-EM model, we adopt the simplified models in equations (1) and (2) to more effectively communicate the primary insights.

1.1. The gap

The common approach to estimating the rotation of an observation in the models above involves scanning through a pre-defined set of possible rotations and selecting the one that either maximizes the correlation or minimizes the distance to the given reference (2D projection image or 3D molecular structure, depending on the application). Typically, these metrics involve weighted correlations and distances, where the weights account for the noise characteristics (Scheres, 2012b).

From an estimation theory perspective, this process corresponds to the maximum-likelihood estimator (MLE), which does not incorporate prior information about the rotation distribution, i.e. the distribution of the 3D rotation g appearing in models (1) or (2). A natural extension is the maximum a posteriori (MAP) estimator, which combines the data likelihood with a prior distribution on rotations. Notably, when a uniform prior is assumed, indicating that all orientations are equally likely a priori, the MAP estimator reduces to the MLE estimator (Kay, 1993 ). A more rigorous treatment can be found in Section 2.3.

However, Bayesian theory provides a much deeper and richer statistical framework that leads to improvement: replacing the MLE or MAP estimator with the Bayes estimator. The full potential of this estimator, which provides optimal accuracy according to a user-defined loss function and allows the integratation of prior knowledge about the rotation distribution, has remained untapped so far.

The term rotation distribution, used interchangeably with orientation distribution in this paper, refers to the probabilistic law governing how orientations are distributed over all possible rotations in 3D space. In cryo-EM, this concept is critical because molecules are first suspended in a thin aqueous solution and subsequently vitrified into amorphous ice, at which point their orientations become fixed but remain uncontrolled. Ideally, all orientations would be equally likely, corresponding to a uniform distribution. In practice, however, molecules often adopt preferred orientations, typically due to interactions with the air–water interface or other sample-preparation factors (Tan et al., 2017 ; Lyumkis, 2019; Li et al., 2021 ). As a result, certain orientations are disproportionately represented, introducing systematic biases that can degrade reconstruction quality and complicate orientation estimation. Accounting for these deviations from uniformity is therefore essential for both methodological development and the interpretation of cryo-EM data.

Currently, the most practical way to obtain orientation estimates is through modules of reconstruction software such as RELION (Scheres, 2012b) and cryoSPARC (Punjani et al., 2017). These packages are primarily designed for 3D structure reconstruction and therefore treat pose estimation as a latent-variable subproblem. In particular, during refinement they compute posterior weights over a discrete set of candidate poses for each particle and use these weights for weighted averaging in the reconstruction step (a soft assignment over poses). At the same time, for downstream usage and reporting, the metadata typically provide a single orientation per particle, which is commonly the MLE estimator; that is, the maximizer of the pose posterior. Thus, while these pipelines do incorporate latent-variable pose weighting internally for reconstruction, they do not typically output a Bayesian rotation estimate for each particle.

In practice, researchers frequently use the reported per-particle orientation point estimates for downstream tasks such as heterogeneity analysis (Gilles & Singer, 2025 ; Zhong et al., 2021) and structural validation (Rosenthal & Henderson, 2003 ). However, relying solely on MLE-based estimates can be suboptimal, especially in low-SNR regimes where MLE estimators are known to perform poorly. This work therefore explores opportunities to improve per-particle orientation estimation by using Bayes estimators that explicitly average over pose uncertainty and incorporate prior knowledge of the rotation distribution.

1.2. The Bayesian framework

The Bayesian framework has become a powerful and widely adopted tool in cryo-EM, now recognized as the leading method for recovering 3D molecular structures (Scheres, 2012b; Punjani et al., 2017; Toader et al., 2023; Gilles & Singer, 2025). It effectively addresses challenges such as overfitting and parameter tuning while enhancing interpretability (Scheres, 2012a ). By explicitly modeling uncertainties, Bayesian methods enable more accurate and robust reconstructions of molecular structures, driving significant advances in both resolution and structural flexibility. While the majority of these methods focus on the task of structure reconstruction (for example, see Section 4) and aim to achieve the MAP estimator for the volume structure (Scheres, 2012b; Punjani et al., 2017), the Bayesian framework offers broader possibilities for addressing other problems.

This framework enables the use of any loss function over the rotational group tailored to users' requirements and accommodates a broad range of prior distributions beyond the uniform distribution. While the Bayesian framework can be adapted to different loss functions, this work focuses on the mean-squared error loss, which is equivalent to the chordal distance between 3D rotations. The primary reason for using this loss function is that its corresponding Bayes estimator has a closed-form analytical solution, making it computationally efficient and easy to interpret. We denote this estimator as $[{\hat g}_{\rm{ MMSE}}]$ , where MMSE stands for minimum mean-square error. It is worth noting that for any given loss function and prior distribution the Bayes estimator is optimal among all possible estimators, in the sense that it minimizes the posterior expected loss. Moreover, the computational cost of calculating our $[{\hat g}_{\rm {MMSE}]$ estimator is on the same scale as that of the commonly used MLE and MAP orientation estimators (see Section 3 for more details).

1.3. Applications

Orientation estimation is not only integral to 3D structure reconstruction but also underpins a range of downstream tasks. Below, we highlight several representative applications that demonstrate its broader methodological significance.

One prominent example is heterogeneity analysis, where the goal is to capture structural variability that may arise from differences in composition, discrete states or continuous conformational changes. A substantial portion of state-of-the-art methods (Gilles & Singer, 2025; Zhong et al., 2021; Levy et al., 2025 ; Luo et al., 2023 ; Punjani et al., 2017; Punjani & Fleet, 2021 ) make the simplifying assumption that particle poses are already known and fixed, typically obtained from a consensus refinement procedure; these pose estimates then form the basis for subsequent modeling of structural variability. In this work, we consider continuous conformational heterogeneity and demonstrate in Section 5 that the quality of orientation estimation has a direct impact on the fidelity of the recovered conformational landscapes.

Another important application is 3D volume alignment. Unlike the setting of equation (2), where an observation is aligned to a known reference volume, this task involves aligning two noisy volumes of the same structure with unknown relative rotations. This scenario arises, for instance, in the computation of the Fourier shell correlation (FSC) curve, a standard tool for estimating the spectral signal-to-noise ratio (SSNR) and resolution. As shown in Appendix B, this problem can be expressed within the same statistical framework as equation (2), differing only by an additional noise term, which highlights its close methodological connection to orientation estimation.

A further noteworthy application concerns validation. A classic example is the method of tilt pairs introduced by Rosenthal & Henderson (2003), in which a small subset of images (for example around ten) is collected at known relative tilt angles. While the absolute particle orientations remain unknown, the relative orientations between tilt pairs are specified by the experimental geometry. The degree of agreement between estimated and known relative rotations then provides a direct validation of orientation assignment accuracy.

These examples are by no means exhaustive, but they underscore that orientation estimation is not merely a technical nuisance in reconstruction workflows, but rather a key methodological component that enables diverse downstream analyses and validation strategies across cryo-EM and cryo-ET.

1.4. Overview of results and contributions

In this work, we introduce a versatile Bayesian framework for orientation estimation with strong statistical guarantees and high flexibility. Section 2 formulates the problem and develops the Bayesian MMSE estimator. We show theoretically that the MMSE estimator coincides with the MLE estimator in high-SNR regimes (Proposition 1), while consistently outperforming it under low-SNR conditions (Fig. 1), a typical scenario in cryo-EM and cryo-ET (Bendory et al., 2020).

Section 3 evaluates the MMSE orientation estimator through simulations. We investigate the effect of different prior distributions and the discretization resolution L of the rotation group $[{\sf {SO}}(3)]$ . Results indicate that non-uniform priors substantially improve estimation accuracy, underscoring the value of incorporating prior knowledge (Fig. 2). We further show that in high-SNR regimes the estimation error scales as L^1/3 (Fig. 3), highlighting discretization as the dominant error source, whereas in low-SNR regimes noise dominates.

In Section 4, we connect the MMSE estimator to 2D image recovery and 3D structure reconstruction, showing that it naturally leads to the expectation–maximization (EM) framework for models without projections (Proposition 3). Empirical results confirm that the MMSE estimator consistently outperforms the MLE and MAP estimators in both tasks, providing higher accuracy and greater robustness against the `Einstein from Noise' artifact (Balanov et al., 2024 ; Figs. 4 and 5).

Finally, Section 5 integrates MMSE-based pose estimation into structural heterogeneity analysis. Following the fixed-pose framework of RECOVAR, we show that the MMSE pose estimator consistently improves the recovery of conformational variability compared with the MLE counterpart. Section 6 concludes with a discussion of further applications of the Bayesian approach to orientation estimation and future research directions.

1.5. Main takeaways

Fig. 1 illustrates the critical role of orientation estimation in the cryo-EM and cryo-ET reconstruction processes, showcasing the superior performance of the MMSE orientation estimator, $[{\hat g}_{\rm {MMSE}}]$ , compared with the MLE orientation estimator, $[{\hat g}_{\rm {MLE}}]$ . Specifically, the curves show that the MMSE estimator consistently produces more accurate estimates than the MLE estimator, with the performance gap widening as the SNR decreases. Moreover, incorporating prior knowledge of rotation distribution into our MMSE estimator allows even better performance (see Fig. 2). This performance gap becomes even more pronounced when the orientation estimators are incorporated into a reconstruction algorithm (see Section 4).

Figure 2
Impact of incorporating the prior rotation distribution on estimation accuracy. Simulations are performed for the cryo-ET model (equation 2

), excluding the projection step. The true rotation distribution is modeled as an isotropic Gaussian $[{\cal IG}_{{\sf{SO}}(3)}(\eta = 0.1)]$ . Estimation performance is measured using the geodesic distance defined in equation (10)

. Here, g denotes the true rotation, $[\hat{g}_{\rm {MLE}}]$ is the maximum-likelihood estimator from equation (24)

, $[\hat{g}_{\rm {MAP}}]$ is the maximum a posteriori estimator from equation (13)

and $[{\hat g}_{\rm {MMSE}}]$ denotes the Bayesian minimum mean-square error estimator from equation (21)

. The MMSE estimators are computed assuming isotropic Gaussian priors on $[{\sf{SO}}(3)]$ with different concentration parameters η ∈ {0.5, 0.1} (see Appendix E

). As η decreases, the prior becomes more concentrated and closer to the true underlying distribution, leading to improved accuracy of both the MAP and MMSE estimators. Each data point in the plot is averaged over 3000 Monte Carlo trials using a rotation grid of size L = 2976. Bottom 2 × 2 panel: the four images compare denoised 3D volumes obtained using different rotation estimators at two representative noise regimes marked on the curve plot. Rows correspond to the estimator: the top row uses $[{\hat g}_{\rm {MLE}}]$ and the bottom row uses $[{\hat g}_{\rm {MMSE}}]$ with the true prior $[{\cal IG}_{{\sf{SO}}(3)}(\eta = 0.1)]$ . Columns correspond to SNR: the left column is a low-SNR example and the right column is a high-SNR example (where the true rotation is correctly classified).

Beyond reconstruction accuracy, we demonstrate that orientation-estimation quality has a direct and substantial impact on downstream structural heterogeneity analysis. Understanding structural variability is essential for characterizing the dynamic behavior of macromolecular complexes. Methods such as RECOVAR (Gilles & Singer, 2025), designed to recover structural heterogeneity, typically assume known particle poses (i.e. fixed-pose methods). In reality these poses must be inferred, and MLE-derived estimates are commonly used in practice. We show that replacing MLE-derived poses with MMSE estimates in RECOVAR substantially improves the recovery of the latent conformational manifold, bringing results closer to those obtained using ground-truth poses. These findings establish MMSE orientation estimation not only as a tool that enhances reconstruction fidelity, but also as a facilitator that advances state-of-the-art continuous heterogeneity analysis.

The main takeaway of this paper is the recommendation to adopt the Bayesian MMSE orientation estimator for determining the orientation of each observation, in place of the commonly used MLE estimator. The MMSE estimator demonstrates superior performance even under a uniform rotation distribution and offers further improvements when incorporating prior knowledge of the underlying rotation distribution. Importantly, while current software packages do not directly compute the MMSE estimator, they already calculate all of the necessary components required for its implementation. Therefore, adopting the MMSE approach can be achieved with minimal additional computational cost. The detailed implementation and code are available at https://github.com/AmnonBa/bayesian-orientation-estimation.

2. Problem formulation and the MMSE orientation estimator

In this section, we present a particular Bayes estimator for orientation determination under a squared-error loss, commonly known as the minimum mean-square error (MMSE) estimator. The name reflects that this estimator minimizes the expected posterior mean-squared error, and it is given by the posterior (conditional) mean rather than a pointwise maximization (Kay, 1993).

We begin by introducing a flexible mathematical model that encompasses various typical applications involving orientation estimation. Following this, we present a couple of metrics designed to assess the quality of our estimators, providing a robust framework for evaluating performance. Finally, we introduce the class of Bayes estimators, with the MMSE estimator serving as a primary example.

Throughout this paper, we use g to denote both the rotation operator and its corresponding matrix representation. For instance, g can be represented by a three by three rotation matrix in 3D. The intended meaning will be clear from the context, and this slight abuse of notation should not cause confusion.

2.1. Mathematical model for orientation estimation

We consider a unified framework for modeling measurement processes, encompassing problems such as 2D template matching, orientation estimation in cryo-EM and subtomogram averaging in cryo-ET. To better focus on the core aspects of our methodology, we omit certain physical effects, such as CTF and in-plane shifts, in this formulation. A complete mathematical model of the cryo-EM imaging process, incorporating these effects, is provided in Appendix A.

We begin with a continuous-domain formulation. Let $[V(x):{\bb R}^{n}\to{\bb R}]$ denote a reference structure, where n = 2 for 2D images and n = 3 for 3D volumes. For example, when n = 3, V corresponds to the continuous 3D electron density, as commonly used in cryo-EM or cryo-ET. Let $[{\sf G}\subset{\sf {SO}}(n)]$ be a compact group of rotations correspondingly, and let $[g\in{\sf G}]$ be an unknown transformation drawn from a distribution Λ over $[{\sf G}]$ .

We denote the measurement as a function $[y(x^{\prime}):{\bb R}^{m}\to{\bb R}]$ , where m ≤ n accounts for possible dimension reduction due to projection (e.g. m = 2 in cryo-EM). Then, the continuous measurement model is given by

$[y(x^{\prime}) = \Pi[(g\circ V)(x)]+\varepsilon(x^{\prime}),\qquad x \in{\bb R}^{n}\,\,{\rm and}\,\, x^{\prime}\in{\bb R}^{m},]$

where (g ○ V)(x) := V(g⁻¹x) is the rotated structure, Π is a known linear operator (such as a tomographic projection or identity) and ɛ denotes additive noise in the measurement domain.

In practice, measurements are only available in discretized form due to finite resolution. In the discrete setting, we assume that the operator Π also incorporates sampling onto a grid of size d, i.e. it includes both projection (if applicable) and discretization. The resulting discrete measurement model becomes

$[y_{i} = \Pi(g_{i}\circ V)+\varepsilon_{i},\eqno(3)]$

where $[y_{i}\in{\bb R}^{d}]$ and $[\varepsilon_{i}\sim{\cal N}(0,\Sigma)]$ is Gaussian noise with a d × d positive-definite covariance matrix Σ. To simplify the notation, we omit Π when it solely represents the discretization sampling operator and use the same symbol y to refer to the discretized measurement throughout the paper, unless stated otherwise.

2.1.1. Applications of the model

We present three typical examples of this model. In all cases, the goal is to estimate g given the sample y, the structure V and the covariance matrix Σ.

2D template matching. In this case, Π is solely the discretization sampling operator, d = N × N, where N is the grid size of the 2D image, $[g\in{\sf G} = {\sf {SO}}(2)]$ is a 2D in-plane rotation and V is a given 2D template image.
Rotation estimation in cryo-EM. Here, we consider a special case of equation (3) where Π comprises both the sampling and tomographic projection operators, d = N × N, where N is the grid size of 2D projection images, $[g\in{\sf G} = {\sf {SO}}(3)]$ is a 3D rotation and V is a given 3D volume representing a known reference 3D structure or a well grounded structure from prior data analyses.
3D structure alignment in cryo-ET. In this scenario, we consider a special case of equation (3) where Π is solely the discretization sampling operator, d corresponds to the total dimension of 3D subtomograms, $[g\in{\sf G} = {\sf {SO}}(3)]$ is a 3D rotation and V is a given 3D volume. Notably, the 3D alignment problem is also a critical step in the computational pipeline of cryo-EM; see, for example, Singer & Yang (2024 ) and Harpaz & Shkolnisky (2023 ).

2.2. Preliminaries

Before introducing the specific estimator for the model specified in equation (3), it is instructive to briefly revisit the general Bayesian framework and the concept of the Bayes estimator. This will provide the necessary foundation for understanding the development and analysis of the proposed MMSE estimator, as well as its statistical properties superior to those of the widely used MLE estimator.

2.2.1. Overview of the Bayesian framework and the Bayes estimators

Suppose that we aim to estimate a true rotation $[g\in{\sf G}]$ drawn from a known prior distribution Λ. Let $[{\hat g}]$ be an estimator of g based on a measurement y and let $[{\cal L}(g,{\hat g})]$ be a loss function.

The Bayes risk of $[{\hat g}]$ is defined as $[{\bb E}_{\Lambda}[{\cal L}(g,{\hat g})]]$ , where the expectation is taken over the data-generation process of y given g and the prior distribution Λ of g. The Bayes estimator with respect to the loss $[{\cal L}]$ (chapter 4, Theorem 1.1 in Lehmann & Casella, 2006 ) is defined as the estimator that minimizes the Bayes risk among all possible estimators, i.e.

$[{\hat g}_{\rm Bayes} = \mathop {\rm argmin} \limits_{\hat g} {\bb E}_{\Lambda}[{\cal L}(g,{\hat g})].\eqno(4)]$

Equivalently, it is the estimator that minimizes the posterior expected loss $[\mathbb{E}_{\Lambda}[{\cal L}\left(g,\hat{g}\right)|y]]$ , where the expectation is taken over the posterior distribution of g given the measurement y, and any other known parameter such as volume V, with the prior Λ. For the case of $[{\sf G} = {\sf{SO}}(3)]$ , it is given explicitly by

$[\eqalignno {\hat{g}_{\rm Bayes} &= \mathop {\rm argmin} \limits_{{\hat g}\in{\sf {SO}}(3)} {\bb E}_{\Lambda}[{\cal L}(g,{\hat g })|y,V] \cr & = \mathop {\rm argmin} \limits_{{\hat g}\in{\sf {SO}}(3)} \textstyle\int\limits_{{\sf {SO}}(3)}{\cal L}(g,{\hat g}){\bb P}_{\Lambda}(g|y,V)\,{\rm d}g,&(5)}]$

where, by Bayes' law, we have

$[{\bb P}_{\Lambda}(g|y,V)\,{\rm d}g = {{{\bb P}(y|g,V)\,{\rm d}\Lambda(g)} \over {\textstyle\int_{\sf G}{\bb P}(y|g,V)\,{\rm d} \Lambda(g)}}.\eqno(6)]$

2.2.2. The posterior distribution of the rotation g given an observation y under model (3)

To introduce the MMSE estimator corresponding to the model (3), we first compute the posterior distribution of g given y and all the additional parameters V, and Σ. We obtain the conditional likelihood density

$[\eqalignno {{\bb P}(y|g,V) & = {{1} \over {(2\pi)^{d/2}{\rm det}({\Sigma})^{1/2}}} \cr &\ \quad {\times}\ \exp\textstyle\{-{{1} \over {2}}[y-\Pi(g\circ V)]^{\sf T}\Sigma^{-1}[y-\Pi (g\circ V)]\}.\cr &&(7)}]$

Note that g follows the underlying prior distribution Λ. Applying Bayes' law, the posterior distribution of g given y is

$[\eqalignno {{\bb{P}}_{\Lambda}& (g|y,{V})\,{\rm d}g = \cr & {{\exp\{- {{1} \over {2}}[y-\Pi(g\circ{V})]^{\sf{T}}\Sigma^{-1}[y-\Pi(g\circ{V})]\}\, {\rm d}\Lambda(g)} \over {\int_{\sf{G}}\exp\{-{{1} \over {2}}[y-\Pi(g\circ{V})]^{\sf{T}}\Sigma^{-1}[y-\Pi(g\circ{V})]\}\,{\rm d}\Lambda(g)}}.&(8)}]$

2.2.3. Metrics over SO(3)

As the Bayes estimator is closely related to the given loss function $[{\cal L}]$ , we present two candidate metrics on the 3D special orthogonal group $[{\sf {SO}}(3)]$ . Here, we represent any rotation $[g\in{\sf {SO}}(3)]$ in its natural three by three matrix representation.

Chordal distance. For any two rotations $[g_{1},g_{2}\in{\sf {SO}}(3)]$ , the chordal distance is defined as
$[d_{F}(g_{1},g_{2}) \mathop {=}\limits^{\Delta} \|g_{1}-g_{2}\|_{\rm F} = \{ {\rm tr}[(g_{1}-g_{2})^{\sf T}(g_{1}-g_{2})]\}^{1/2},\eqno(9)]$
where || · ||_F represents the matrix Frobenius norm and tr(·) is the trace of a matrix. This metric is easy to compute and analyze; however, it does not take into account the group structure of rotations.
Geodesic distance. For any two rotations $[g_{1},g_{2}\in{\sf {SO}}(3)]$ , the geodesic distance is defined as
$[d_{G}(g_{1},g_{2})\mathop {=}\limits^{\Delta} \arccos \left({{{\rm tr}(g_{2} g_{1}^{-1})-1} \over {2}}\right),\eqno(10)]$
where tr(·) is the trace of a matrix. This metric reflects the shortest path between g₁ and g₂ in the 3D rotation manifold.

We note that the chordal and geodesic distances are closely related. In particular, if $[\theta\mathop {=}\limits^{\Delta} d_{G}(g_{1},g_{2})]$ denotes the geodesic rotation angle between g₁ and g₂ (equation 10), then

$[\eqalignno {d_{\F}(g_{1},g_{2}) &= \|g_{1}-g_{2}\|_{\rm F} = 2\left( {2} \right)^{1/2}\,\sin(\theta/2) \cr &= 2\left( {2} \right)^{1/2}\,\sin[d_{G}(g_{1},g_{2})/2].&(11)}]$

Consequently, d_F is a strictly increasing function of d_G and, for small angular errors, the two are locally equivalent in the sense that d_F(g₁, g₂) ≈ (2)^1/2d_G(g₁, g₂) (see, for example, Hartley et al., 2013 ). More generally, our framework is flexible and can accommodate other loss functions over any group $[{\sf G}]$ . For a comprehensive discussion of metrics on $[{\sf {SO}}(3)]$ and their relationships, we refer the reader to Hartley et al. (2013) and Huynh (2009 ). Similar considerations apply to $[{\sf {SO}}(2)]$ ; we omit a detailed discussion for brevity.

2.3. The MLE, MAP and MMSE orientation estimators

In the following presentation, we focus on independent and identically distributed Gaussian noise with Σ = σ²I_d×d. These noise statistics are commonly employed in modern software tools (Kimanius et al., 2021 ). While the proposed framework can incorporate more advanced noise models, we restrict attention to this case for clarity of presentation and to simplify the theoretical analysis.

2.3.1. The MLE estimator

Recalling the maximum cross-correlation method we mentioned earlier, we now connect it with the mathematical model in equation (3) and introduce the corresponding MLE estimator. The rotation g that minimizes the distance between the corresponding projected rotated volume Π(g ○ V) and the observation y is exactly the MLE estimator defined as

$[\eqalignno {{\hat g}_{\rm {MLE}} & := \mathop {\rm argmax} \limits_{g\in{\sf {SO}}(3)} {\bb P} (y|g, V) \cr & = \mathop {\rm argmax} \limits_{g\in{\sf {SO}}(3)}\exp[-\|y-\Pi (g\circ V)\|^{2}/2\sigma^{2}]\cr & = \mathop {\rm argmin} \limits_{g\in{\sf {SO}}(3)}\|y-\Pi(g\circ V) \|^{2}, &(12)}]$

where the first equality follows from equation (7) and the second equality follows from the assumption Σ = σ²I_d×d. In other words, $[{\hat g}_{\rm {MLE}}]$ maximizes the conditional density (equation 7) over all possible rotations in $[{\sf {SO}}(3)]$ . In the absence of the tomographic projection (for example in cryo-ET), the MLE estimator further simplifies to

$[{\hat g}_{\rm {MLE}} = \mathop {\rm argmax} \limits_{g\in{\sf {SO}}(3)}y^{\sf T}(g\circ V),]$

which corresponds to the rotation g such that the rotated structure maximizes the correlation with y. This estimator is also frequently used in single-particle cryo-EM, under the assumption that the norm of Π(g ○ V) is approximately constant for all g. This estimator can be approximated by performing a search over a pre-defined grid of 3D rotations, selecting the rotation that minimizes the distance between the measurement y and the projected rotated volume Π(g ○ V). This approach forms the basis of standard practices in single-particle cryo-EM and cryo-ET.

2.3.2. The MAP estimator

The MAP estimator extends the MLE estimator by incorporating prior knowledge on the distribution of the rotation in the special group $[{\sf {SO}}(3)]$ , dΛ(g). Formally, the MAP estimator is defined as

$[\eqalignno { {\hat g}_{\rm {MAP}} & = \mathop {\rm argmax} \limits_{g\in{\sf {SO}}(3)}{\bb P}_{\Lambda}(g|y, V)\cr & = \mathop {\rm argmax} \limits_{g\in{\sf {SO}}(3)}{{{\bb P}(y|g,{V})\,{\rm d}\Lambda(g)} \over {\textstyle\int_{\sf G}{\bb P}(y|g, V)\, {\rm d} \Lambda(g)}}\cr &= \mathop {\rm argmax} \limits_{g\in{\sf {SO}}(3)} {\bb P} (y|g,V) \,{\rm d}\Lambda(g) \cr &= \mathop {\rm argmax} \limits_{g\in{\sf {SO}}(3)}\exp[-\|y-\Pi (g\circ V)\|^{2}/2\sigma^{2}]\,{\rm d}\Lambda(g)\cr &= \mathop {\rm argmin} \limits_{g\in{\sf {SO}}(3)}\|y-\Pi(g \circ V) \|^{2}/2\sigma^{2}-\log[{\rm d}\Lambda(g)],&(13)}]$

where the second equality follows from Bayes' law (equation 6) and the third equality from independence of the denominator on g. In other words, $[{\hat g}_{\rm {MAP}}]$ maximizes the posterior density (equation 8) over all possible rotations in $[{\sf {SO}}(3)]$ . It can be seen that the MAP estimator $[{\hat g}_{\rm {MAP}}]$ (equation 13) coincides with the MLE estimator $[{\hat g}_{\rm {MLE}}]$ (equation 12) when the prior distribution dΛ(g) is uniform over $[{\sf {SO}}(3)]$ .

2.3.3. The MMSE estimator

For any rotation distribution Λ and loss function $[{\cal L}(\cdot,\cdot)]$ over $[{\sf {SO}}(3)]$ , the MLE and MAP estimators can be further improved by the corresponding Bayes estimator which minimizes the posterior expected loss $[{\bb E}_{\Lambda}[{\cal L}(g,\cdot)\mid y,V]]$ . In particular, following the definition (equation 5) of the Bayes estimator, and for the chordal distance d_F(·, ·) (equation 9), with its corresponding squared loss $[{\cal L}_{F}(\cdot,\cdot) = d_{F}^{2}(\cdot,\cdot)]$ , the Bayes estimator takes the form

$[{\hat g}_{\rm {MMSE}} := \mathop {\rm argmin} \limits_{\hat{g}\in{\sf{SO}}(3)} {\bb E}_{\Lambda}[d_{F}^{2}(g,{\hat g})| y,{V}],\eqno(14)]$

where the expectation is taken over the posterior density (equation 8). Here, we denote the estimator as $[{\hat g}_{\rm {MMSE}}]$ , since minimizing the squared chordal distance is equivalent to minimizing the entrywise mean-square error between the two 3 × 3 rotation matrices. Alternative distance measures, such as the geodesic distance on $[{\sf {SO}}(3)]$ discussed earlier, could certainly be considered. In this work, however, we focus on the chordal distance, primarily because it admits a particularly convenient derivation of the associated Bayes estimator, as shown below, thereby offering both computational efficiency and interpretability. Importantly, these Bayes estimators are generally distinct from both the MLE and MAP estimators, even under a uniform prior, since it minimizes the posterior expected loss rather than selecting the most probable rotation.

To compute $[{\hat g}_{\rm {MMSE}}]$ , we begin by noting that it is defined over the manifold $[{\sf {SO}}(3)]$ . In general, directly solving the optimization problem on a non-Euclidean manifold is challenging. To address this, we first relax the optimization domain to the ambient Euclidean space $[{\bb R}^{3\times 3}]$ , which leads to the following intermediate estimator:

$[\eqalignno { {\hat g}_{\rm {relax}} & := \mathop {\rm argmin} \limits_{{\hat g}\in{\bb R}^{3\times 3}} {\bb E}_{\Lambda}[d_{F}^{2}(g, {\hat g})|y, V] \cr & = {\bb E}_{\Lambda}[g | y,{V}] \cr & = {{\textstyle\int_{{\sf {SO}}(3)} g\exp \{-{1 \over 2}[y-\Pi(g \circ V)]^{\sf T}\Sigma^{-1}[y-\Pi(g\circ V)]\}\,{\rm d}\Lambda (g)} \over {\textstyle\int_{{\sf {SO}}(3)}\exp\{-{1 \over 2}[y-\Pi(g\circ V)]^{ \sf T}\Sigma^{-1}[y-\Pi(g\circ V)]\}\,{\rm d}\Lambda(g)}} \cr &= {{\textstyle \int_{{\sf {SO}}(3)} g\exp \{-{1 \over {2\sigma^{2}}} \| y - \Pi (g \circ V)\|_{2}^{2}\}\,{\rm d}\Lambda(g)} \over {\textstyle \int_{{\sf {SO}}(3)}\exp\{- {1 \over {2\sigma^{2}}}\| y-\Pi(g\circ V) \|_{2}^{2}\}\,{\rm d}\Lambda(g)}}.&(15)}]$

Here, since we naturally represent $[g\in{\sf {SO}}(3)]$ as a 3 × 3 rotation matrix, the expectation (or integral) can be interpreted directly in this matrix representation, yielding a 3 × 3 matrix as the posterior mean matrix.¹ However, $[{\hat g}_{\rm r{elax}}]$ does not, in general, lie in $[{\sf {SO}}(3)]$ and hence may not be a valid rotation operator. To obtain a feasible solution on the manifold, we subsequently apply the orthogonal Procrustes procedure to project $[{\hat g}_{\rm {relax}}]$ back onto $[{\sf {SO}}(3)]$ , as discussed next.

Given a matrix A, to obtain a valid rotation matrix we seek $[\Omega\in{\sf {SO}}(3)]$ that is closest to A in the Frobenius norm:

$[\mathop {\rm minimize} \limits_{\Omega} \|\Omega-A\|_{\rm F} \quad {\rm subject \,\,to\,\,}\Omega\in{\sf {SO}}(3).\eqno(16)]$

This optimization problem is known as the orthogonal Procrustes problem in the linear algebra literature. It can be solved efficiently using the singular value decomposition (SVD) of A (Gower & Dijksterhuis, 2004 ). Concretely, given a matrix $[A\in{\bb R}^{3\times 3}]$ (in our case $[A = {\hat g}_{\rm {relax}}]$ ), compute its singular value decomposition

$[A = U\Sigma V^{\top},\qquad U,V \in O(3),\quad \Sigma \succeq 0. \eqno(17)]$

The closest rotation matrix in Frobenius norm is then obtained by

$[\Omega = U\,{\rm diag}[1, 1, \det(UV^{\top})] \,V^{\top} \in {\sf {SO}}(3).\eqno(18)]$

Equivalently, if $[\det(UV^{\top}) = -1]$ , one flips the sign of the last column of U (or V) before forming UV^T, ensuring $[\det(\Omega) = 1]$ and yielding the Procrustes solution. Thus, applying the orthogonal Procrustes procedure to $[{\hat g}_{\rm {relax}}]$ projects it back onto $[{\sf {SO}}(3)]$ , yielding a valid rotation matrix. The following proposition establishes that this valid rotation matrix coincides with the MMSE estimator $[{\hat g}_{\rm {MMSE}}]$ , thereby completing the procedure for obtaining the MMSE estimate. A detailed proof is provided in Appendix C.

Proposition 1

The Bayes estimator of the loss function defined by the chordal distance $[{\hat g}_{\rm {MMSE}}]$ (equation 14) is equal to the orthogonal Procrustes solution (equation 16) applied on the intermediate estimator $[{\hat g}_{\rm {relax}}]$ , i.e.

$[{\hat g}_{\rm {MMSE}} = \mathop {\rm argmin}\limits_{\Omega\in {\sf {SO}}(3)}\, \|{\hat g}_{\rm {relax}}-\Omega\|_{\rm F}.\eqno(19)]$

2.3.4. The MMSE, MAP and MLE estimators in the high-SNR regime

We conclude this section by showing the next proposition. It shows that the MMSE and MAP estimators converge in the high-SNR regime (i.e. σ → 0), as demonstrated empirically in Figs. 3, 4 and 5. This implies that the MMSE estimator's superior statistical properties are most advantageous under low-SNR conditions, which are common in structural biology applications such as cryo-EM and cryo-ET (Bendory et al., 2020). In these low-SNR environments, the MMSE estimator consistently outperforms its MAP counterpart.

Figure 3
Impact of the sampling-grid size of $[{\sf{SO}}(3)]$ (L) and the signal-to-noise ratio (SNR) on rotation-estimation accuracy. This figure shows the accuracy of rotation estimation under varying sampling-grid sizes L of the rotation group $[{\sf{SO}}(3)]$ and different SNR levels of the observed data y in the model (3). Simulations are performed for the cryo-ET model (equation 2

), excluding the projection step. The metric used for comparison is the geodesic distance, as defined in equation (10)

. Here, g denotes the true rotation, $[{\hat g}_{\rm {MLE}}]$ represents the MLE estimator from equation (24)

and $[{\hat g}_{\rm {MMSE}}]$ denotes the Bayesian MMSE estimator from equation (21)

. In the high-SNR regime (σ → 0) the MLE and MMSE estimators converge, and the geodesic distance scales empirically as ∝ L^1/3. This scaling reflects the three-parameter nature of $[{\sf{SO}}(3)]$ rotations, where the resolution of the sampling grid improves as L increases. The results shown are based on Monte Carlo simulations with 3000 trials per data point.

Figure 4
Comparison of 2D image recovery using the MMSE and MLE rotation estimators. Iterative image-recovery procedures with the MMSE estimator ( $[{\hat g}_{\rm {MMSE}}]$ ) and MLE estimator ( $[{\hat g}_{\rm {MLE}}]$ ) are defined in equations (31)

and (32)

, respectively. The experiment employs a template image of Einstein and a ground-truth image of Newton, both rotated in 2D over a uniform polar grid with L = 30 samples. Each image is of size 100 × 100 pixels, and the radial direction is discretized using R = 300 points. The reconstructed images within the dark-orange rectangle (right panel) show superior performance with the MMSE rotation estimator, with Pearson cross-correlation (PCC) provided for each reconstructed image. The MLE and MMSE reconstructions are nearly identical at high SNR (σ → 0), as predicted by Proposition 2

. The SNR values used for the panels (from right to left) are 10⁻², 4 × 10⁻³, 2 × 10⁻³, 7 × 10⁻⁴ and 2 × 10⁻⁴. At very low SNR (σ → ∞) the `Einstein from Noise' effect appears, where the estimator resembles the template image of Einstein rather than the underlying truth of Newton. In intermediate SNR ranges, using the MMSE estimator in the iterative step clearly outperforms the MLE estimator.

Figure 5
Comparison of 3D structure reconstruction in cryo-ET subtomogram averaging using the MMSE and MLE rotation estimators. Iterative structure-reconstruction procedures with the MMSE estimator ( $[{\hat g}_{\rm {MMSE}}]$ ) and MLE estimator ( $[{\hat g}_{\rm {MLE}}]$ ) are defined in equations (31)

and (32)

, respectively. At high SNR levels, a low-resolution structure emerges due to finite grid sampling of the rotation group $[{\sf{SO}}(3)]$ , effectively acting as a low-pass filter. The 3D reconstruction using the MMSE estimator consistently outperforms the reconstruction using the MLE estimator. For high-SNR conditions (i.e. σ → 0), both estimators yield similar 3D structures, as expected from Proposition 2

. The SNR values used for the panels (from right to left) are 10⁻², 2 × 10⁻³, 7 × 10⁻⁴ and 2 × 10⁻⁴, with a volume size of 32 × 32 × 32. The boxes highlighted in purple resemble the true input structure (80S ribosome; Wong et al., 2014

), while the orange-highlighted boxes are more similar to the initial template (β-galactosidase; Bartesaghi et al., 2014

), illustrating the `Einstein from Noise' phenomenon. Notably, at very low SNRs, the `Einstein from Noise' effect is evident with the MLE estimator but not with the MMSE estimator.

Proposition 2

Let Λ be any distribution over $[{\sf {SO}}(3)]$ with a strictly positive density, bounded below by some constant c > 0. Let $[{\hat g}_{\rm {MLE}}]$ , $[{\hat g}_{\rm {MAP}}]$ be the MLE and MAP rotation estimators as defined in equations (12) and (13), respectively, and assume they are unique. Let $[{\hat g}_{\rm {MMSE}}]$ be the MMSE estimator as defined in equation (14). Then, we have

$[\lim_{\sigma\to 0}{\hat g}_{\rm {MMSE}} = \lim_{\sigma\to 0}{\hat g}_{\rm {MAP}} = {\hat g}_{\rm {MLE}}.\eqno(20)]$

The proof of the proposition is provided in Appendix D. The proof relies on the existing result (Theorem 5.10 in Robert, 1999 ) and is valid not only for rotations but also for other group operators, such as translations.

3. Numerical methods for MMSE orientation estimation

This section compares the numerical performance of the MMSE and MLE estimators. We also introduce various types of prior distributions for $[{\sf {SO}}(3)]$ beyond the uniform distribution and demonstrate how incorporating this prior knowledge can significantly improve the performance of the MMSE rotation estimator. Furthermore, we study the influence of the number of sampling points of the group $[{\sf {SO}}(3)]$ of 3D rotations (namely, the number of candidate rotations), denoted as L, on the quality of rotation estimation. For better illustration, we consider the simplified setting of model (3), where Σ = σ²I_d×d, although our method applies to the general covariance setting as well.

3.1. Numerical procedure and sampling of SO(3)

Since the posterior distribution of $[g\in{\sf {SO}}(3)]$ in equation (8) is continuous, a numerical discretization of the rotation group is required. In this work, the expectation is approximated using a quadrature rule over $[{\sf {SO}}(3)]$ . Let $[\{g^{(\ell)},w_{\ell})\}_{\ell = 0}^{L-1}]$ denote a numerical quadrature on $[{\sf {SO}}(3)]$ , constructed as a product of a spherical quadrature on $[{\sf S}^{2}]$ and a circular rule over the Euler angles (Gräf & Potts, 2009 ; Gräf, 2012 ). The implementation employed here follows the publicly available MATLAB code described in Hoskins et al. (2024 ).

We first consider the uniform case, where Λ is the Haar distribution on $[{\sf {SO}}(3)]$ . For each quadrature node, define $[x_{\ell} := \Pi(g^{(\ell)}\circ{V})]$ , for 0 ≤ ℓ ≤ L − 1, where $[\{g^{(\ell)}\}_{\ell = 0}^{L-1}]$ are the quadrature nodes and $[\{w_{\ell}\}_{\ell = 0}^{L-1}]$ are their associated weights. The MMSE estimator can then be expressed as the weighted average

$[\eqalignno{{\hat g}_{\rm {MMSE}}& = {\bb E}[g|y, V]\approx {{{\textstyle\sum \limits_{\ell = 0}^{L-1}}g^{(\ell)}w_{\ell}\displaystyle\exp\left ( - {{\|y-x_{\ell}\|^{2}}\over {2\sigma^{2}}} \right) } \over {{\textstyle\sum \limits_{\ell = 0}^{L-1}}w_{\ell} \displaystyle\exp\left ( - {{\|y-x_{\ell}\|^{2}}\over {2\sigma^{2}}}\right)}} \cr &= \textstyle \sum\limits_{\ell = 0}^{L -1}g^{(\ell)}\,p^{(\ell)},&(21)}]$

where the posterior quadrature weights are defined as

$[p^{(\ell)} \mathop {=}\limits^{\Delta} {{w_{\ell}\exp\left(\displaystyle -{{\|y-x _{\ell}\|^{2}}\over {2\sigma^{2}}}\right)} \over {{\textstyle\sum \limits_{r=0}^{L-1}}w_{r}\displaystyle\exp \left(- {{\|y-x_{r}\|^{2}}\over{2\sigma^{2}}}\right)}},\qquad 0\leq\ell\leq L-1. \eqno(22)]$

For a general non-uniform prior Λ(g), the same quadrature can be applied with modified posterior weights:

$[p^{(\ell)}\propto w_{\ell} \Lambda(g^{(\ell)}) \exp\left(-{{\|y-x_{\ell}\|^{2}}\over{2\sigma^{2}}}\right),\eqno(23)]$

followed by normalization to ensure $[\sum_{\ell = 0}^{L-1}p^{(\ell)} = 1]$ .

Similarly, the MLE estimator in equation (12) relies on the discretization of $[{\sf {SO}}(3)]$ , which can be approximated through grid search using the uniform rotation samples or pre-defined grid $[\{g^{(\ell)}\}_{\ell = 0}^{L-1}]$ as

$[{\hat g}_{\rm {MLE}} = \mathop{\rm argmin}\limits_{0\leq\ell\leq L-1}\|y-x_{\ell}\|^{2}.\eqno(24)]$

Both estimators require the evaluation of ||y − x_ℓ|| for all candidate rotations, which naively costs O(d) per candidate and thus O(Ld) overall. This estimate, however, overlooks the structure of $[{\sf {SO}}(3)]$ . Any 3D rotation can be decomposed into a viewing direction (two degrees of freedom, discretized with about L^2/3 samples) and an in-plane rotation about that direction (one degree of freedom, discretized with L^1/3 samples). By exploiting FFT-based methods, all L^1/3 in-plane rotations can be evaluated simultaneously for each viewing direction. Consequently, the effective complexity becomes $[O(L^{2/3}d\log d)]$ , which improves over the naive bound by a factor of L^1/3 (Kostelec & Rockmore, 2008 ; Kileel, Marshall et al., 2025 ). This reduction is substantial in practice, since L often reaches tens or even hundreds of thousands in high-resolution cryo-EM.

3.2. The impact of non-uniform distributions of rotations

One key advantage of the Bayesian framework is its flexibility in incorporating different prior distributions for the rotations. In practical cryo-EM applications, the distribution of particle orientations is often non-uniform due to preferred particle orientations or sample-preparation artifacts (Tan et al., 2017; Lyumkis, 2019). If the rotation distribution can be well estimated in an early stage, the information of the rotation distribution can then be integrated into the Bayes estimator to improve rotation estimation.

To illustrate rotation estimation under different prior distributions on $[{\sf {SO}}(3)]$ , we replace the uniform distribution on $[{\sf {SO}}(3)]$ with an isotropic Gaussian (IG) distribution on $[{\sf {SO}}(3)]$ , denoted by $[g\sim{\cal IG}_{{\sf {SO}}(3)}(\eta)]$ , parameterized by a scalar variance η². This distribution is frequently used in machine-learning probabilistic models on $[{\sf {SO}}(3)]$ (Corso et al., 2022 ; Leach et al., 2022 ; Jagvaral et al., 2024 ). It is worth noting that the IG distribution serves as a typical example, and similar phenomena as observed here extend to other non-uniform distributions on $[{\sf {SO}}(3)]$ as well.

The IG distribution $[{\cal I}{\cal G}_{{\sf {SO}}(3)}(\eta)]$ can be represented in an axis–angle form, with uniformly sampled axes of rotation and a rotation angle ω ∈ [0, π]. The scalar variance η² controls the distribution of the rotation angle ω: as η → ∞, the $[{\sf {SO}}(3)]$ distribution approaches uniformity, whereas as η → 0 ω becomes increasingly concentrated around 0, i.e. the rotation angle around the rotation axis is small. We apply the inverse sampling method to obtain independent and identically distributed (i.i.d.) samples from the IG distribution. Further details of this distribution are provided in Appendix E.

Fig. 2 illustrates how incorporating a prior distribution over $[{\sf {SO}}(3)]$ rotations, governed by the variance parameter η of an isotropic Gaussian distribution, affects the accuracy of the MMSE rotation estimator. In all cases, the true underlying rotation distribution is modeled as an isotropic Gaussian distribution $[{\cal IG}_{{\sf SO}(3)}(\eta = 0.1)]$ over $[{\sf {SO}}(3)]$ . The MLE estimator was computed according to equation (24). For the MMSE estimators, the estimation process used different prior distributions with variance parameters η = 0.7, 0.5 and 0.1, respectively. Specifically, the candidate rotations g^(ℓ) were generated according to these different priors (see equation 21), highlighting the impact of prior mismatch on estimation accuracy. As the variance decreases (indicating a more concentrated and less uniform distribution closer to the true underlying distribution), the performance of the MMSE estimator improves, particularly under lower SNR conditions. These findings highlight the value of incorporating prior knowledge in rotation estimation, demonstrating its potential to enhance accuracy substantially. In stark contrast, the MLE estimator remains entirely unchanged regardless of the true underlying rotation distribution. As a result, its performance remains suboptimal, particularly in scenarios where the true distribution deviates significantly from uniformity.

We note that in experimental cryo-EM/ET datasets the viewing-angle distribution is typically unknown. Estimating this distribution, and understanding how prior misspecification impacts downstream reconstruction and heterogeneity analysis, remains an important open problem. We discuss possible approaches for estimating the rotational prior in Appendix F. Recent work has begun to quantify misspecification effects under non-uniform group actions; see, for example, Xu et al. (2025 ). In this light, Fig. 2 can be viewed as a controlled prior-mismatch study: the data are generated from a concentrated isotropic Gaussian prior, whereas the MMSE (and MAP) estimators are evaluated using an assumed prior with varying concentration. We observe a graceful degradation under mismatch, and a clear improvement as the assumed prior approaches the true distribution. Importantly, even under the common assumption of a uniform prior, the MMSE estimator improves over MLE in the low-SNR regime, while more accurate prior information yields additional gains when strong preferred orientations are present.

3.3. The impact of sampling-grid size L and SNR on estimation accuracy

To study these effects, we restrict ourselves to the uniform rotation distribution setting. Fig. 3 illustrates the impact of the sampling-grid size of $[{\sf {SO}}(3)]$ , together with different levels of SNR, on the geodesic distance between the MLE and MMSE rotation estimators relative to the true rotations. Several observations can be made. Firstly, at high SNR, the MLE and MMSE estimators nearly coincide, consistent with Proposition 1. In this regime, the geodesic error decreases with the grid resolution, scaling empirically as L^1/3. This behavior arises because a discretization of $[{\sf {SO}}(3)]$ involves three angular parameters, so the resolution in each parameter direction grows with L^1/3. Secondly, as the SNR decreases, noise dominates and the advantage of refining the grid diminishes: both estimators approach similar performance, and the dependence on L becomes weaker. In the extreme limit σ → ∞, the mean geodesic distance of the MLE and MMSE estimators is indistinguishable and shows no dependence on the grid size L.

3.4. Sampling resolution and spectral information

Another factor that influences the discrepancy between the MAP and MMSE estimators is the effective spectral information available at a given sampling resolution. When comparing MAP and MMSE estimates across different image sizes, it is important to distinguish increasing the number of pixels from increasing usable spectral content. If the underlying signal is effectively bandlimited and the image is already sampled at (or above) the Nyquist rate for that bandlimit, then increasing the sampling resolution mainly amounts to oversampling and should not materially change the likelihood or the posterior over $[{\sf {SO}}(3)]$ ; consequently, the MAP–MMSE estimators are expected to remain essentially unchanged. In contrast, when higher sampling resolution is accompanied by additional usable high-frequency signal (i.e. non-negligible per-frequency SNR at higher bands), the posterior typically becomes more concentrated and we expect the MAP and MMSE estimates to approach each other in geodesic distance (at a fixed SNR level).

4. Bayesian orientation estimation as part of the volume-reconstruction problem

Thus far, we have introduced the MLE, MAP and MMSE orientation estimators for estimating the rotation between a single noisy observation y and a reference volume V, as defined in equation (3). An intriguing question that arises is how these estimators can be utilized and influence performance within a 3D volume-reconstruction process, which constitutes the main computational challenge in cryo-EM and cryo-ET.

Structure reconstruction typically follows two primary approaches, hard-assignment or soft-assignment methods, which are generally implemented through iterative refinement. In the hard-assignment approach, each observation is assigned a single orientation based on the highest correlation, and the 3D structure is then reconstructed given the rotations. In contrast, the soft-assignment method assigns probabilities across all possible orientations for each observation, enabling the 3D structure to be recovered as a weighted average of the observations, with weights determined by these probabilities. The iterative application of the soft-assignment procedure aligns with the expectation–maximization (EM) algorithm (Dempster et al., 1977 ; Sigworth et al., 2010 ), which serves as the core computational method in modern cryo-EM (Scheres, 2012b; Punjani et al., 2017).

In the following, we demonstrate that incorporating the MMSE orientation estimator into the volume-reconstruction process in the cryo-ET model (without projections) resembles the EM algorithm, as it accounts for the full distribution of possible outcomes. In contrast, substituting the MLE estimator into the algorithm operates more like a hard-assignment reconstruction method, focusing exclusively on the most likely outcome. This distinction underscores the broader applicability and flexibility of the Bayes estimator in capturing uncertainty and delivering more accurate estimates for structure reconstruction.

4.1. Connection to the EM algorithm in volume reconstruction without projection

Unlike the previous orientation-estimation model (equation 3), we consider the following simplified model for volume reconstruction (see also Singer, 2018 ; Bandeira et al., 2023 ; Fan et al., 2024 ). We observe M i.i.d. samples taking the form

$[y_{i} = g_{i}\circ V+\varepsilon_{i},\quad\quad i = 0,\ldots,M-1,\eqno(26)]$

where V is the 3D volume structure of interest, $[\{g_{i}\}_{i = 0}^{M-1}\in {\sf{SO}}(3)]$ are unknown latent variables following an i.i.d. uniform distribution, satisfying $[g_{i}\circ V(x)\equiv V(g_{i}^{-1}x)]$ , and $[\{\varepsilon_{i}\}_{i = 0}^{M-1}]$ is i.i.d isotropic Gaussian noise with variance σ². For the case where V represents a 2D image, $[\{g_{i}\}_{i = 0}^{M-1}\in {\sf{SO}}(2)]$ corresponds to in-plane rotations and the model is used for image recovery (for more, see Section 4.2; Ma et al., 2019 ). The goal is to recover V from the observations $[\{y_{i}\}_{i=0}^{M-1}]$ , treating the rotations $[\{g_{i}\}_{i=0}^{M-1}]$ as latent variables.

To distinguish the model (26) from (3), we highlight the key differences as follows.

(i) The parameter of interest is the unknown structure V here, whereas it was the single rotation g in model (3).
(ii) We observe M i.i.d. samples instead of a single observation, meaning that all observed M samples are used collectively to estimate the underlying volume structure V.
(iii) Although the rotations $[\{g_{i}\}_{i=0}^{M-1}]$ are also unknown, they are treated as nuisance parameters, and we are not directly concerned with their estimation (although, admittedly, more accurate estimation of $[\{g_{i}\}_{i=0}^{M-1}]$ could often contribute to better estimation of V).

The most common method for solving this reconstruction problem is the EM algorithm, which applies soft assignment iteratively, as outlined in Algorithm 2. In each iteration, the algorithm uses the volume estimate from the previous iteration, denoted as $[{\hat V}^{(t)}]$ , to update the volume estimate $[{\hat V}^{(t+1)}]$ , based on the observations $[\{y_{i}\}_{i=0}^{M-1}]$ . The following proposition illustrates the relationship between the volume structure update rule at iteration t + 1, and the MMSE orientation estimator introduced in equation (14).

To state the next result, we first introduce the MMSE back-rotation operator. Given an observation y_i and the current reference volume $[{\hat V}^{(t)}]$ , let p_i^(t)(g) denote the posterior weights over $[g\in{\sf{SO}}(3)]$ as defined in equation (27). We define $[\specialfonts\hat{\frak{g}}_{{\rm{MMSE}},i,t}]$ as the posterior average of the inverse action g⁻¹,

$[\specialfonts\hat{\frak{g}}_{{\rm{MMSE}},i,t} : = {\bb E}\left[g^{-1}|y_{i},{\hat V}^{(t)}\right],\eqno(29)]$

which should be interpreted as a linear operator acting on functions (or volumes). Namely, for any test function f on which rotations act,

$[\specialfonts (\hat{\frak{g}}_{{\rm {MMSE}},i,t}\circ f) := \textstyle \int\limits_{g\in{\sf{SO}}(3)}p_{i}^{(t)}(g)(g^{-1}\circ f\big)\,{\rm d}g.\eqno(30)]$

Proposition 3 (MMSE operator form of the EM M-step)

Let $[{\hat V}^{(t+1)}]$ be the (t + 1)th volume estimator in the EM algorithm as described in Algorithm 2, and let $[\specialfonts\hat{\frak{g}}_{{\rm{MMSE}},i,t}]$ be defined as in equations (29) and (30). The M-step update can then be written as

$[\specialfonts{\hat V}^{(t+1)} = {{1} \over {M}}\textstyle\sum\limits_{i=0}^{M-1} {\hat{\frak{g}}}_{{\rm MMSE},i,t}\circ y_{i}.\eqno(31)]$

The proof of this proposition is presented in Appendix G. In words, the proposition shows that the update rule for the volume structure estimation at iteration t + 1, given the volume $[{\hat V}^{(t)}]$ , is equivalent to aligning each observation $[\{y_{i}\}_{i=0}^{M-1}]$ using the associated back-rotation MMSE operator $[\specialfonts\hat{\frak{g}}_{{\rm{MMSE}},i,t}]$ , computed based on the reference volume $[{\hat V}^{(t)}]$ , and then averaging the aligned observations (equation 31). Thus, the MMSE back-rotation operator is a key ingredient in the EM algorithm for volume reconstruction.

Remark 1

The MMSE back-rotation operator $[\specialfonts\hat{\frak{g}}_{{\rm{MMSE}},i,t}]$ (equation 29) should be distinguished from the MMSE rotation matrix estimator $[{\hat g}_{\rm{MMSE}}]$ obtained in equation (14) by taking a posterior mean in the 3 × 3 matrix embedding and then projecting onto $[{\sf{SO}}(3)]$ (via orthogonal Procrustes). In the matrix formulation, one forms $[{\hat g}_{\rm{relax}}]$ as in equation (15) and then projects to obtain a single $[{\sf{SO}}(3)]$ element. In contrast, in equation (30) the inversion is handled inside the posterior average: the expectation is taken over the inverse actions g⁻¹ ○ (·), yielding a linear operator that acts directly on y_i [and is not, in general, itself a member of $[{\sf{SO}}(3)]$ ]. While these two objects differ in representation, they are both posterior-weighted aggregations over rotations; the operator form is the one that naturally appears in the EM M-step for reconstructing the volume.

Despite the subtle differences arising from rotation estimation versus structure reconstruction, the previously introduced MMSE rotation estimator (equation 14) can still serve as a practical tool for assessing the performance of soft assignments. Concretely, we take the posterior rotation matrix mean as in equation (29) and then perform an orthogonal Procrustes projection to replace $[\specialfonts\hat{\frak{g}}_{{\rm{MMSE}},i,t}]$ . In this way, the MMSE rotation estimator provides a concrete approximation to the soft assignment in structure reconstruction. In numerical experiments, it is compared with the hard assignment, demonstrating superior performance relative to the maximum-likelihood estimator, as discussed below.

4.1.1. The MLE estimator as part of volume reconstruction

In contrast to the soft-assignment procedure, if we replace the MMSE operator with the corresponding MLE operator, we obtain the structure-reconstruction algorithm by applying hard assignment iteratively. To be more specific, the MLE operator applied to the observations $[\{y_{i}\}_{i=0}^{M-1}]$ can be viewed as a hard assignment among all possible rotations. In practice, this procedure involves making a hard decision where a single rotation is selected from the rotation grid according to the closest alignment. In the (t + 1)th iteration, similarly to equation (31), the hard-assignment process can be expressed as follows:

$[\specialfonts{\hat V}^{(t+1)} = {{1} \over {M}}\textstyle\sum\limits_{i=0}^{M-1}\hat{\frak{g}}_{{\rm{MLE}},i,t}\circ y_{i},\eqno(32)]$

where $[\specialfonts\hat{\frak{g}}_{{{\rm{MLE}}},i,t}]$ is defined by

$[\specialfonts\eqalignno {\hat{\frak{g}}_{{\rm{MLE}},i,t} &= \mathop{\rm argmax} \limits_{g ^{-1}\in{\sf{SO}}(3)}{\bb P}\left(g|y_{i},{\hat V}^{(t)}\right) & (33) \cr &= \mathop{\rm argmax} \limits_{g^{-1}\in{\sf{SO}}(3)}p_{i}^{(t)}(g) = \mathop{\rm argmin} \limits_{g^{-1}\in{\sf{SO}}(3)}\left\|y_{i}-\left(g\circ{\hat V}^{(t)}\right)\right\|^{2}.&(34)}]$

In other words, $[\specialfonts\hat{\frak{g}}_{{\rm{MLE}},i,t} = ({\hat g}_{{\rm{MLE}},i,t})^{-1}]$ where

$[\hat{g}_{{\rm {MLE}},i,t} := \mathop{\rm argmin} \limits_{g\in{\sf {SO}}(3)}\left\|y_{i}-\left(g\circ{\hat V}^{(t)}\right)\right\|^{2}.]$

Hence, the hard assignment can be viewed as using the exact inverse of the MLE estimator introduced earlier.

4.2. Empirical results for volume reconstruction and the `Einstein from Noise' phenomenon

We demonstrate volume reconstruction empirically by applying the MLE estimator $[{\hat g}_{\rm{MLE}}]$ and the MMSE estimator $[{\hat g}_{\rm{MMSE}}]$ as part of the reconstruction problem, as specified in Section 4.1.

4.2.1. Description of the experiments

We demonstrate the reconstruction processes, which integrates MLE and MMSE rotation estimators as intermediate steps, using the two examples of 2D image recovery (Fig. 4) and 3D volume reconstruction without projection (Fig. 4). The iterative reconstruction process, as outlined in Algorithm 2, was performed until convergence (i.e. when the relative difference between consecutive iterations fell below a predefined threshold of 10⁻³ or until a maximum of 100 iterations was reached).

Figs. 4 and 5 were generated using slightly different methods. For the 2D experiment presented in Fig. 4 we used polar coordinates, while in the 3D experiment, shown in Fig. 5, we used a standard Cartesian basis. The primary difference lies in the interpolation required for producing Fig. 5, which utilizes cubic interpolation for each observation based on the estimated rotation. This introduces certain `quantization' errors. Additionally, as detailed in Section 2, the 3D reconstruction process presented in Fig. 5 requires a `rounding' step which amounts to solving the orthogonal Procrustes procedure. In Fig. 4, the true and template structures were generated in a polar representation with d = 300 radial points and L = 30 polar angle points. The reconstruction process was performed with M = 5 × 10⁴ observations. The additive noise was added in the polar representation. In Fig. 5, M = 3000 observations were used, with a rotation group $[{\sf {SO}}(3)]$ grid size of L = 300 and a volume size of 32 × 32 × 32.

4.2.2. Empirical observations

A few observations can be made from Figs. 4 and 5. Firstly, in the case of high SNR (i.e., as σ → 0), the volume reconstruction is similar whether using the MLE estimator or the MMSE estimator. This similarity is theoretically supported by Proposition 2. However, as the SNR decreases the reconstructions diverge, with the volumes reconstructed using the MMSE estimator showing a better correlation with the true volume. Secondly, in scenarios of extremely low SNR, where the structural signal is nearly nonexistent, the phenomenon known as `Einstein from Noise' manifests in both 2D and 3D contexts. This phenomenon pertains to the inherent model bias within the reconstruction procedure, specifically in relation to the initial templates. In such cases, the reconstructed volume exhibits structural similarities to the initial template, even though the observations do not substantiate this outcome. The generation of a structured image from entirely noisy data has attracted considerable attention, particularly during a significant scientific debate regarding the structure of an HIV molecule (Mao, Castillo-Menendez et al., 2013 ; Henderson, 2013 ; van Heel, 2013 ; Subramaniam, 2013 ; Mao, Wang et al., 2013 ); for a comprehensive description and statistical analysis, see Balanov et al. (2024), Balanov, Bendory et al. (2025 ), Balanov, Huleihel et al. (2025 ) and Balanov, Zabatani et al. (2025 ). Notably, our empirical evidence suggests that the `Einstein from Noise' phenomenon is more pronounced when adopting the MLE estimator compared with the MMSE estimator, implying that the MMSE approach is less vulnerable to the choice of the initial template. Furthermore, our experiment suggests that the advantage of using the Bayesian MMSE estimator over the MLE estimator is more significant in 3D structure-reconstruction tasks compared with 2D image recovery, where the 3D setting is a problem of greater interest to researchers in structural biology.

5. Structural heterogeneity analysis

Understanding heterogeneity is central to revealing the dynamic behavior of macromolecular complexes in cryo-EM. Unlike traditional 3D classification methods that assume a small number of discrete states, modern approaches aim to capture continuous structural variability by embedding projection images into a latent conformational space. Here, we adopt the RECOVAR framework (Gilles & Singer, 2025), which performs heterogeneity analysis using principal component analysis (PCA) based on a regularized estimate of the conformational covariance matrix. The key idea is to infer the covariance of 3D volumes directly from noisy 2D projection images, leveraging the fact that although the volumes themselves are unobserved, their projections contain sufficient statistical information.

5.1. Fixed-pose methods

A central computational assumption in RECOVAR is that the poses {ϕ_i} of all particle images are known and fixed, typically obtained from a consensus refinement procedure. This `fixed-pose' assumption underlies many modern heterogeneity analysis pipelines, including 3D classification in cryoSPARC (Punjani et al., 2017), 3D variability analysis (3DVA; Punjani & Fleet, 2021), 3DFlex (Punjani & Fleet, 2023 ), cryoDRGN (Zhong et al., 2021), CryoDRGN-AI-fixed (Levy et al., 2025), Opus-DSD (Luo et al., 2023) and RECOVAR (Gilles & Singer, 2025). These methods are collectively referred to as fixed-pose methods, since they treat poses as known inputs during downstream inference. Among them, RECOVAR has recently emerged as one of the most effective approaches. According to the CryoBench evaluation (Jeon et al., 2024 ), which benchmarks multiple methods across diverse synthetic and experimental datasets, RECOVAR consistently achieves state-of-the-art performance in both structural resolution and latent space recovery. Notably, CryoBench evaluations were performed by external users, indicating RECOVAR's strong out-of-the-box performance and robustness to hyperparameter choices. This motivates our choice to build upon the RECOVAR framework.

Despite its strong empirical performance, the fixed-pose assumption merits careful scrutiny. In practice, ground-truth poses are never directly accessible and cannot be recovered exactly, even if the underlying volume is known, due to noise and the intrinsic ill-posedness of the inverse problem. Most existing methods address this by using MLE pose estimates in real data. This gap between assumed and achievable pose accuracy motivates a central question: to what extent can more accurate pose estimation improve downstream heterogeneity recovery? In this work, we address this question by replacing MLE poses with our MMSE estimator within RECOVAR.

5.2. The mathematical model

The forward model (equation 1 in Gilles & Singer, 2025) used in RECOVAR is expressed in the Fourier domain as

$[\tilde{y}_{i} = C_{i}{\hat P}(\phi_{i})\tilde{V}_{i}+\tilde{\varepsilon}_{i},\eqno(35)]$

where $[\tilde{y}_{i}\in{\bb C}^{N^{2}}]$ is the observed Fourier-transformed image (with N denoting the grid size along each spatial axis), $[\tilde{V}_{i}\in{\bb C}^{N^{3}}]$ is the (unknown) 3D Fourier volume corresponding to a particular conformation, $[{\hat P}(\phi_{i})]$ is the tomographic projection operator from 3D to 2D after a rigid-body motion ϕ_i = (g_i, t_i) with $[g_{i}\in{\sf{SO}}(3)]$ a rotation and $[t_{i}\in{\bb R}^{2}]$ an in-plane shift, C_i is the contrast transfer function (CTF) and $[\tilde{\varepsilon}_{i}]$ is additive noise.

This formulation is a discretized and sampled version of our continuous forward model in (39). Specifically, the operator $[{\hat P}(\phi_{i})]$ acting on $[\tilde{V}_{i}]$ corresponds directly to $[\exp[-2\pi i\bm{\omega}\cdot t_{i}}\cdot(g_{i}\circ\tilde{V})(\omega_{1},\omega_ {2},0)]]$ , where each component reflects the same physical process, namely, in-plane shift, 3D rotation and evaluation on the central slice in Fourier space.

To model structural variability, we treat each underlying volume $[\tilde{V}_{i}]$ as a random sample from a distribution over 3D conformations. We assume this distribution has a well defined mean $[\mu\in{\bb C}^{N^{3}}]$ and covariance $[\Sigma\in{\bb C}^{N^{3}\times N^{3}}]$ given by

$[\mu = {\bb E}[\tilde{V}],\quad\Sigma = {\bb E}[(\tilde{V}-\mu)(\tilde{V}-\mu)^{*}],\eqno(36)]$

where the expectation is taken over the conformational distribution. The objective of RECOVAR is to estimate both μ and Σ from the observed projections $[{\tilde{y}_{i}}]$ with given poses ϕ_i, using regularized least-squares minimization (equations 2 and 3 in Gilles & Singer, 2025).

5.3. Numerical experiments

In our experiments, we consider a synthetic one-dimensional conformational transition, a standard benchmark for continuous heterogeneity analysis methods such as cryoDRGN (Zhong et al., 2021) and RECOVAR (Gilles & Singer, 2025). The corresponding ground-truth density maps along the conformational coordinate are shown in Fig. 6(a). To generate the projection images, the underlying conformational states were uniformly sampled along the one-dimensional transition, and molecular orientations were sampled from a uniform distribution over $[{\sf{SO}}(3)]$ . Fig. 6(b) further displays a representative noisy projection under high-noise conditions.

Figure 6
MMSE pose estimation improves heterogeneity reconstruction over MLE. (a) Ground-truth density maps from a synthetic dataset simulating a one-dimensional conformational transition, a standard benchmark for methods such as cryoDRGN (Zhong et al., 2021

) and RECOVAR (Gilles & Singer, 2025

). 50 equally spaced states approximate the continuous trajectory, with colors indicating positions along the pathway (only five states are shown in the figure for clarity). (b) Example of a noisy projection image from the high-noise regime used in this study, where even ground-truth poses recover only ∼40% of the variance for the top 30 components due to finite-sample and high-noise effects. (c) Comparison of reconstructed structures using different pose priors (ground truth, MMSE, MLE), showing that MMSE yields structures closer to the true conformation state. (d) First five principal components estimated using ground-truth poses, MMSE pose estimates and MLE pose estimates. MMSE results closely match the ground truth in structural detail, whereas MLE reconstructions degrade beyond the first modes, becoming blurrier and less representative of the underlying variability. (e) Subspace accuracy, measured as the percentage of total variance captured (as defined in Section 5.3

). Ground truth achieves ∼40%, MMSE ∼30% and MLE ∼25%. (f) Eigenvalue recovery: both methods estimate the largest true eigenvalues well, but MMSE remains accurate for smaller eigenvalues, while MLE substantially overestimates them. These results show that MMSE pose estimation consistently yields more accurate recovery of conformational variability than MLE.

To avoid storing and manipulating the full high-dimensional covariance matrix $[\Sigma\in{\bb C}^{N^{3}\times N^{3}}]$ , RECOVAR first estimates a low-dimensional subspace of rank $[r\ll N^{3}]$ in which the covariance is approximated. Following Appendix A.2 in Gilles & Singer (2025), we estimate $[{\hat{\Sigma}}]$ via regularized least squares, and then form a rank-d approximation $[{\hat{\Sigma}}_{\rm col}]$ by selecting a subset of columns using a greedy SNR-based criterion that ensures both high SNR and low inter-column correlation. An orthonormal basis $[\tilde{U}\in{\bb C}^{N^{3}\times r}]$ spanning $[{\hat \Sigma}_{\rm col}]$ is obtained via randomized SVD. The conformational mean $[{\hat{\mu}}]$ is also estimated at this stage. Each conformational volume is then represented as

$[\tilde{V}_{i} = \tilde{U}z_{i}+{\hat{\mu}},\eqno(37)]$

where $[{\hat{\mu}}]$ is the estimated conformational mean from the first stage and $[z_{i}\in{\bb C}^{r}]$ are low-dimensional coordinates. In RECOVAR, these coordinates are not explicitly recovered; their covariance matrix $[\hat{\Sigma}_{\tilde{U}}\in{\bb C}^{r\times r}]$ is estimated directly via a reduced least-squares problem (equation 14 in Gilles & Singer, 2025), yielding a low-rank covariance approximation that characterizes the variability of z_i. The eigen-decomposition of $[\hat{\Sigma}_{\tilde{U}}]$ then provides the estimated principal components and eigenvalues in the reduced space. The complete procedure is detailed in Algorithm 1 of Gilles & Singer (2025).

We perform our analyses on a controlled synthetic dataset of 30 000 projection images (N = 128), simulated from a one-dimensional conformational transition discretized into 50 equally spaced states to approximate continuous heterogeneity (Gilles & Singer, 2025). We first evaluate the effect of orientation estimation on covariance recovery. The study is conducted in the high-noise regime of RECOVAR (Fig. 6b), where even with ground-truth poses, the top 30 principal components capture only about 40% of the variance due to finite-sample and high-noise effects. We set r = 50 and compare three pose sources, ground truth, MMSE estimates and MLE estimates, keeping all other steps identical. For fixed-pose methods, poses are treated as given, so the computational cost remains unchanged.

The accuracy of the estimated subspace is quantified by the percentage of total variance captured,

$[\|U_{k}^{*}\Sigma U_{k}\|_{\rm S}/\|\Sigma\|_{\rm S},]$

where U_k contains the first k estimated principal components, Σ is the ground-truth covariance matrix and Σ^1/2 is its matrix square root. Here, $[\|A\|_{\rm S} = \textstyle\sum\sigma_{i}(A)]$ is the Schatten 1-norm of matrix A with σ_i denoting the singular values of A.

Figs. 6(e) and 6(f) present the variance-capture results as described and the corresponding eigenvalue recovery, respectively. Fig. 6(d) visualizes the first five recovered principal components. Overall, MMSE pose estimation yields principal components and eigenvalue spectra that are consistently closer to the ground truth than those obtained from MLE pose estimation.

We further perform a more granular evaluation of structure reconstruction at the level of individual conformational states. For this analysis, we select all projection images corresponding to a specific conformational state and embed them into the latent conformational space (z-space; see equation 37) using RECOVAR, conditioned on different pose priors: ground truth, MMSE estimates and MLE estimates. For each subset, the reconstruction corresponding to the mean embedding serves as an estimate of that specific conformational state. These reconstructions provide a direct measure of how different orientation estimates affect the accuracy of heterogeneity structure reconstruction. Fig. 6(c) demonstrates that MMSE orientation estimates produce reconstructed structures closer to the ground-truth conformation than those obtained from MLE, further highlighting the advantage of MMSE pose estimation in high-noise heterogeneity analysis. We further compared the reconstructions obtained using MMSE and MLE orientation estimates against the ground-truth conformational state. The reported local FSC resolution scores (in Å) show that the 50% quantile resolutions are 11.45 and 11.55 Å for MMSE and MLE, respectively, while the 90% quantiles are 18.83 and 21.43 Å (see also Supplementary S.J. in Gilles & Singer, 2025).

6. Discussion and conclusions

In this work, we have introduced the Bayesian framework for enhancing orientation estimation for various applications in structural biology. The proposed approach offers greater flexibility and improved accuracy compared with existing methods, with the MMSE estimator as a prime example. This technique handles diverse structural conformations and arbitrary rotation distributions across sample sets. Our empirical results establish that the proposed MMSE estimator consistently surpasses the performance of the current methods, particularly in challenging low-SNR environments, as well as when prior information on rotation distribution is available or approximately known. We provide a theoretical foundation to explain these performance gains. As rotation determination is crucial for both 2D and 3D reconstruction processes, we further illustrate how utilizing the MMSE estimator as a soft-assignment step in iterative refinement leads to significant improvements over hard-assignment methods. Moreover, the proposed Bayesian approach empirically offers enhanced resilience against the `Einstein from Noise' phenomenon, effectively reducing model bias and improving the overall reliability of structural reconstructions. Thus, our main recommendation is to adopt the Bayesian MMSE rotation estimator over the MLE estimator in related application scenarios. Already integrated into most software, the Bayesian rotation estimator can be easily implemented with minimal computational cost, offering improved accuracy and resilience for orientation determination and structure reconstruction.

6.1. Future work

The MMSE orientation estimator represents the most natural Bayes estimator in the context of cryo-EM, as it coincides with the M-step reconstruction of the EM algorithm. Nevertheless, alternative Bayes estimators associated with different loss functions could also be explored in future studies. A particularly promising direction is the direct estimation of the rotation distribution from observed images and the incorporation of this information into rotation estimation or EM-based 3D reconstruction algorithms (Janco & Bendory, 2022 ; Xu et al., 2025). Such approaches could substantially improve rotation accuracy, especially in low-SNR regimes, as indicated by our results in Fig. 2, and consequently enhance volume-reconstruction quality. Further extensions may involve modeling structural uncertainties, which is particularly relevant for flexible or heterogeneous proteins, and addressing general pose-estimation problems that jointly consider rotations and translations. Finally, leveraging the Bayesian framework to derive confidence regions for individual rotations presents an exciting opportunity to improve the interpretability and reliability of rotation estimation in cryo-EM.

APPENDIX A

Full cryo-EM model

The (homogeneous) cryo-EM imaging process can be described by a comprehensive generative model that incorporates the key physical components of image formation, including random orientations, in-plane shifts, tomographic projection and contrast transfer function (CTF).

Formally, each observed 2D image y_i is modeled as

$[\eqalignno {y_{i}(x_{1},x_{2}) & = h_{i}*T_{t_{i}}\{\Pi[(g_{i}\circ V)(x_{1},x_{2}, x_{3})]\}+\varepsilon_{i}(x_{1},x_{2}),\cr &\ \quad {\rm for}\ \,(x_{1},x _{2},x_{3})\in{\bb R }^{3}. & (38)}]$

where V denotes the underlying 3D volume, g_i denotes the random rotation, Π is the tomographic projection operator (i.e. line integration along the x₃ axis), T_{t_i} denotes the in-plane shift operator with shift $[t_{i}\in{\bb R}^{2}]$ , * denotes the 2D convolution in the image plane, h_i is the point-spread function (whose Fourier transform is commonly known as the CTF) and ɛ_i denotes the measurement noise. Equivalently, by applying the Fourier transform and using the Fourier slice theorem, we obtain the forward model in the Fourier domain,

$[\eqalignno {{\tilde y}_{i}(\omega_{1},\omega_{2}) & = {\tilde h}_{i}\cdot \exp(-2\pi{\bf i}{\boldomega}\cdot t_{i})\cdot[(g_{i}\circ\tilde{V})(\omega_{1},\omega_{2},0) ]\cr & +{\tilde \varepsilon}_{i}(\omega_{1},\omega_{2}) ,\quad {\rm for }\, {\boldomega} = (\omega_{1},\omega_{2})\in {\bb R}^{2},&(39)}]$

where $[{\tilde y}_{i},\tilde{h}_{i},\tilde{\varepsilon}_{i}]$ are the 2D Fourier transforms of y_i, h_i, ɛ_i, respectively, and $[{\tilde V}]$ is the 3D Fourier transform of V.

In practice, both the images and the underlying volume are discretized and sampled on finite grids. Depending on whether the model is implemented in the spatial or Fourier domain, different but equivalent discretized versions arise. Specifically:

In the spatial domain, each image y_i is represented as a vector in $[{\bb R}^{N^{2}}]$ and the 3D volume V is discretized as an array in $[{\bb R}^{N^{3}}]$ .
In the Fourier domain, we work with complex-valued data, where $[{\tilde y}_{i}\in{\bb C}^{N^{2}}]$ and $[{\tilde V}\in{\bb C}^{N^{3}}]$ are the discrete Fourier representations of the image and volume, respectively. Since the images and the volume are real-valued, it follows that $[{\tilde y}_{i}]$ and $[{\tilde V}]$ are conjugate symmetric, i.e. $[\overline{{\tilde y}_{i}(\omega_{1},\omega_{2})} = {\tilde y}_{i}(-\omega_{1},- \omega_{2})]$ and $[\overline{{\tilde V}(\omega_{1},\omega_{2},\omega_{3})} = {\tilde V}(-\omega_{1},- \omega_{2},-\omega_{3})]$ .

Here, N denotes the resolution of the reconstructed grid: each image consists of N × N pixels, and the volume is represented on an N × N × N voxel grid.

In the heterogeneous case, the underlying volume V may itself be a realization drawn from a distribution over conformational states $[V_{i}\sim{\cal P}_{V}]$ , where $[{\cal P}_{V}]$ represents a (possibly low-dimensional) distribution capturing the variability in molecular conformations, such as discrete states and continuous manifolds. This setting gives rise to the heterogeneous cryo-EM problem, where the goal is to infer not just a single structure, but the distribution $[{\cal P}_{V}]$ from noisy 2D projections.

APPENDIX B

Bayes pose estimation with rotations and shifts

To account for in-plane shifts, we model the latent pose as $[\phi = (g,t)\in{\sf{SO}}(3)\times{\bb R}^{2}]$ , where g denotes the 3D rotation and t denotes the in-plane translation. Given a volume V, the forward model can be written as

$[\displaystyle p(y\mid g,t)\propto\exp\left(-{{1}\over{2}}\|y- {\cal A}(g,t)V\|_{\Sigma^{-1}}^{2}\right),\eqno(40)]$

where $[{\cal A}(g,t)]$ is the cryo-EM/ET forward operator including both rotation and shift [for example in the Fourier domain the shift acts via the phase factor exp(−2πiω·t); see Appendix A]. We assume priors Λ(g) on rotations and π(t) on shifts. The joint posterior then takes the form

$[p(g,t\mid y)\propto p(y\mid g,t)\,\Lambda(g)\,\pi(t).\eqno(41)]$

B1. Additive loss function

To define a Bayes estimator, we specify a loss function on $[{\sf {SO}}(3)\times{\bb R}^{2}]$ . Throughout, we adopt an additive pose loss,

$[{\cal L}[(g,t),(\hat{g},\hat{t}\,)] = {\cal L}_{g}(g,\hat{g})+{\cal L}_{t}(t,\hat{t}\,),\eqno(42)]$

which penalizes rotation and translation errors separately (optionally with application-dependent weights to reconcile units). Under equation (37), the Bayes pose estimator is defined by

$[(\hat{g}_{\rm { Bayes}},\hat{t}_{\rm {Bayes}})\in \mathop{\rm argmin}\limits_{\hat{g}\in{\sf SO}(3),\hat{t}\in{\bb R}^{2}} {\bb E}\{{\cal L}[(g,t),(\hat{g},\hat{t})]|\,y\},\eqno(43)]$

where the expectation is taken with respect to the joint posterior (equation 36). Because the loss is additive, the minimization in equation (38) decomposes into two marginal Bayes estimators:

$[\hat{g}_{\rm {Bayes}} = \mathop{\rm argmin}\limits_{\hat{g }\in{\sf{SO}}(3)}{\bb E}[{\cal L}_{g}(g,\hat{g})\mid y], \eqno (44)]$

$[\hat{t}_{\rm {Bayes}} = \mathop{\rm argmin}\limits_{\hat{t }\in{\bb R}^{2}} {\bb E}[{\cal L}_{t}(t,\hat{t})\mid y].\eqno(45)]$

B2. Closed-form MMSE rules under squared loss

For translations, choosing Euclidean squared loss $[{\cal L}_{t}(t,\hat{t}) = \|t-\at{t}\|_{2}^{2}]$ yields the posterior mean (similarly to equation 15),

$[{\hat t}_{\rm {MMSE}} = {\bb E}[t\mid y] = \textstyle\int\limits_{{\sf{SO}} (3)}\int\limits_{{\bb R}^{2}}t p(g,t\mid y)\,{\rm d}t\,{\rm d}g,\eqno(46)]$

where p(g, t ∣ y) is defined through equation (41). For rotations, when g is represented by its 3 × 3 matrix and we take the squared chordal/Frobenius loss $[{\cal L}_{g}(g,\hat{g}) = \|g-\hat{g}\|_{\rm F}^{2}]$ , the corresponding Bayes estimator is obtained by first forming the relaxed posterior mean matrix (similarly to equation 15),

$[\hat{g}_{\rm {relax}} = {\bb E}[g\mid y] = \textstyle\int\limits_{{\sf{SO }}(3)}\int\limits_{{\bb R}^{2}}g p(g,t\mid y)\,{\rm d}t\,{\rm d}g,\eqno(47)]$

and then projecting it onto $[{\sf{SO}}(3)]$ via the orthogonal Procrustes map (similarly to equation 19),

$[\hat{g}_{\rm {MMSE}} = \mathop{\rm argmin} \limits_{\Omega\in {\sf{SO}}(3)}\|\hat{g}_{\rm {relax}}-\Omega\|_{\rm F}.\eqno(48)]$

which returns the closest valid rotation matrix (in Frobenius norm) to $[\hat{g}_{{\rm{ relax}}}]$ .

APPENDIX C

Alignment between two noisy volumes

The model described in equation (3) can be generalized to address the alignment problem between two noisy volumes. In equation (3), a single observation y was assumed to follow the model, with the volume V considered deterministic and known. For aligning two noisy volumes, we now consider two observations y_i for i = 1, 2, given by

$[y_{i} = g_{i}\circ V+\epsilon_{i},\eqno(49)]$

where $[\epsilon_{i}\sim{\cal N}(0,\Sigma_{d\times d})]$ for a positive-definite covariance matrix Σ_d×d and g_i are drawn from a distribution Λ over $[{\sf{SO}}(3)]$ . We assume that ε₁, ε₂, g₁, g₂ are independent and that the volume V is deterministic but unknown. Furthermore, we assume that the noise statistics are invariant under the group actions of $[{\sf{SO}}(3)]$ , meaning the distributions of ε and g ○ ε are identical for any g ∈ $[{\sf{SO}}(3)]$ . The objective is to determine the `best' alignment $[{\hat g}\in{\sf{SO}}(3)]$ between y₁ and y₂, according to criteria such as the MLE, MAP or MMSE estimators.

From equation (49), the unknown volume V can be expressed as

$[V = g_{1}^{-1}\circ(y_{1}-\epsilon_{1}).\eqno(50)]$

Substituting equation (50) into equation (49) yields

$[\eqalignno {y_{2} & = g_{2}\circ V+\epsilon_{2} & (51) \cr & = g_{2}\circ[g_{1}^{-1}\circ(y_{1}-\epsilon_{1})]+ \epsilon_{2} & (52) \cr & = (g_{2}\cdot g_{1}^{-1})\circ y_{1}-(g_{2}\cdot g_{1}^{-1})\circ \epsilon_{1}+\epsilon_{2} & (53) \cr & = (g_{2}\cdot g_{1}^{-1})\circ y_{1}+\epsilon, & (54)}]$

where equation (53) follows from the associativity of group actions, and in equation (53) we have used $[\epsilon \,{\mathop{=}\limits^{\Delta}}\,(g_{2}\cdot g_{1}^{-1})\circ\epsilon_{1}+\epsilon_{2}]$ . Since the noise statistics are invariant under group actions, it follows that $[\epsilon\sim{\cal N}(0,2\Sigma_{d\times d})]$ .

Thus, the problem of estimating the relative rotation $[g_{2}\cdot g_{1}^{-1}]$ between y₁ and y₂ is similar to estimating y₂ under the assumption that V is known. However, the covariance of the additive noise ε in this estimation is doubled compared with the case when V is known.

APPENDIX D

Proof of Proposition 1

By the definition of $[{\hat g}_{\rm{MMSE}}]$ (equation 14), we have

$[{\hat g}_{\rm{MMSE}} = \mathop {\rm argmin}\limits_{{\hat g}\in{\sf{SO}}(3)} \textstyle \int\limits_{{\sf{SO}}(3)} \|g-{\hat g}\|_{\rm F}^{2} {\bb P}_{\Lambda}(g|y,{V})\,{\rm d}g. \eqno(55)]$

Then, as $[t\|g\|_{\rm F},\|{\hat g}\|_{\rm F}]$ are fixed for any $[g,{\hat g}\in{\sf{SO}}(3)]$ , we have

$[\eqalignno {{\hat g}_{\rm{MMSE}} & = \mathop {\rm argmax}\limits_{{\hat g}\in{\sf{SO}}(3)} \textstyle \int \limits_{{\sf{SO}}(3)}\langle{\hat g},g\rangle{\bb P}_{\Lambda}(g|y,{V})\, {\rm d}g &(56) \cr & = \mathop {\rm argmax}\limits_{ {\hat g}\in{\sf{SO}}(3) } \langle{\hat g},{\hat g}_{\rm{relax}}\rangle. & (57)}]$

As a consequence, equation (57) is equal to the rotation $[{\hat g}\in{\sf{SO}}(3)]$ that has the highest correlation with the relaxed estimator $[{\hat g}_{\rm{relax}}]$ as in equation (15). It follows that

$[{\hat g}_{\rm{MMSE}} = \mathop {\rm argmax}\limits_{{\hat g}\in{\sf{SO}}(3)} \langle{\hat g},{\hat g}_{\rm{relax}}\rangle = \mathop {\rm argmin}\limits_{{\hat g}\in{\sf{SO}}(3)}\|\hat{g}- \hat{g}_{{\rm{relax}}}\|_{\rm F}^{2}.\eqno(57)]$

Thus, $[{\hat g}_{\rm{MMSE}}]$ is exactly the Procrustes procedure performed on the intermediate estimator $[{\hat g}_{\rm{relax}}]$ .

APPENDIX E

Proof of Proposition 2

We split the proof into two parts. Firstly, we show that

$[\lim_{\sigma\to 0}{\hat g}_{\rm{MAP}} = {\hat g}_{\rm{MLE}}. \eqno(58)]$

Secondly, we show that

$[\lim_{\sigma\to 0}{\hat g}_{\rm{MMSE}} = {\hat g}_{\rm{MLE}}. \eqno(59)]$

For equation (58), the argument is straightforward. Since the prior density Λ(g) is bounded below by some positive constant, the term $[\log[{\rm d}\Lambda(g)]]$ becomes negligible as σ → 0 compared with the other term. Consequently, the MAP estimator converges to the MLE in this limit.

For equation (59), we apply the following theorem (Corollary 5.11 in Robert, 1999).

Theorem 1

Consider h a real-valued function defined on a closed and bounded set Θ of $[{\bb R}^{p}]$ . Let π be a positive density on Θ. If there exists a unique solution θ* satisfying

$[\theta^{*} = \mathop {\rm argmax}\limits_{\theta\in\Theta} h(\theta), \eqno(60)]$

then

$[{\lim}_{\lambda\to\infty}{{\textstyle\int_{\Theta}\theta \exp[ \lambda h(\theta)]\pi(\theta)\,{\rm d}\theta} \over {\textstyle\int_{\Theta}\exp[\lambda h(\theta)]\pi(\theta)\,{\rm d}\theta}} = \theta^{*},\eqno(61)]$

provided h is continuous at θ*.

Let h(g) = −||y − Π(g ○ V)||² and π(g)dg = dΛ(g). We immediately obtain

$[\eqalignno {\lim_{\sigma\to 0}\hat{g}_{\rm {relax}} & = \lim_{\sigma\to 0} {{\int_{{\sf{SO}}(3)}g\exp[-\|y-\Pi(g\circ V)\|^{2}/2 \sigma^{2}]\,{\rm d}\Lambda(g)} \over {\int_{{\sf{SO}}(3)}\exp [-\|y-\Pi(g\circ V)\|^{2}/2\sigma^{2}]\,{\rm d} \Lambda(g)}} \cr & = \lim_{\lambda\to\infty}{{\textstyle \int_{{\sf{SO}}(3)}g\exp[- \lambda\|y-\Pi(g\circ V)\|^{2}]\,{\rm d}\Lambda(g)} \over {\textstyle\int_{ {\sf{SO}}(3)}\exp[-\lambda\|y-\Pi(g\circ V)\|^{2}]\, {\rm d}\Lambda(g)}} \cr & = \mathop {\rm argmax} \limits_{g\in{\sf{SO}}(3)} h(g) = {\hat g}_{\rm{MLE}}. &(62)}]$

Finally, we note that $[{\hat g}_{\rm{MMSE}}]$ is obtained by applying the orthogonal Procrustes procedure to project $[{\hat g}_{\rm{relax}}]$ back onto $[{\sf{SO}}(3)]$ . Since $[{\hat g}_{\rm{MLE}}]$ is itself an element of $[{\sf{SO}}(3)]$ , it follows naturally that

$[\lim_{\sigma\to 0}{\hat g}_{\rm{MMSE}} = {\hat g}_{\rm{MLE}}. \eqno(63)]$

APPENDIX F

Non-uniform distribution of rotations

F1. Isotropic Gaussian distribution of SO(3) rotations

The IG distribution $[{\cal I}{\cal G}_{{\sf{SO}}(3)}(\eta)]$ , parameterized by the scalar variance η², can be represented in an axis–angle form, with uniformly sampled axes and a rotation angle ω ∈ [0, π], having the following probability density function (Savjolova, 1985 ):

$[f(\omega) = {{1-\cos\omega} \over {\pi}}{\textstyle\sum\limits_{\ell = 0}^{\infty}}(2\ell+1)\exp[-\ell(\ell+1)\eta^{2}] {{\sin[(\ell+1/2)\omega]} \over {\sin(\omega /2)}}.\eqno(64)]$

Notably, the uniform distribution on $[{\sf{SO}}(3)]$ , denoted $[{\cal U}_{{\sf{SO}}(3)}]$ , corresponds to uniformly sampled axes and is characterized by the density function f(ω) = (1 − cosω)/π, which represents the limiting case of η → ∞ in equation (64).

To provide intuition for the spread induced by η, consider the concentrated regime $[\eta\ll 1]$ . In this regime, one may use the standard small-angle approximation in which a rotation is represented by a rotation vector $[r\in{\bb R}^{3}]$ with

$[r\sim{\cal N}(0,\eta^{2}I_{3}),\qquad\omega\approx\|r\|,\eqno(65)]$

so that ω is approximately Maxwell distributed with scale parameter η (Johnson et al., 1995 ). Consequently, its standard deviation is

$[{\rm Std}(\omega)\approx\eta\left( {3-{{8} \over {\pi}}} \right)^{1/2} = \eta\, \left( {{{3\pi-8} \over {\pi}}} \right)^{1/2}\approx 0.67\,\eta\quad{\rm (radians)}.\eqno(66)]$

For example, η = 0.1 corresponds to Std(ω) ≈ 0.067 rad ≈ 3.8°.

For η > 1, the series converges quickly, and ℓ_max = 5 is typically sufficient for sub-percent accuracy. However, as η decreases, convergence becomes slower, making it inefficient for modeling concentrated distributions. Fortunately, this series has been well studied, and an excellent approximation for η < 1 has been proposed in the literature (Matthies et al., 1988 ), given by the following closed-form approximation:

$[\eqalignno {& f(\omega) \simeq {{1-\cos\omega} \over {(\pi)^{1/2}}} \eta^{-{{3} \over {2}}} \exp\left({{\eta} \over {4}}-{{\left(\omega/2\right)^{2}} \over {\eta}}\right)\cr & \ {\times}\ \left({{\omega-\exp\left( {{-\pi^{2}}\over {\eta}}\right)\left[(\omega-2\pi)\exp\left({{\pi\omega}\over{\eta}}\right) + (\omega+2\pi)\exp\left({{-\pi\omega}\over {\eta}}\right)\right]} \over {2\sin(\omega/2)}}\right).\cr &&(67)}]$

The method for generating samples according to the isotropic Gaussian distribution on $[{\sf{SO}}(3)]$ relies on the inverse sampling theorem. This technique enables sampling from a specified probability distribution by utilizing its cumulative distribution function (CDF). Specifically, it involves generating a uniform random number between 0 and 1 and then applying the inverse CDF to obtain a sample from the target distribution (Savjolova, 1985).

F2. Estimating the rotational prior

A key practical ingredient in Bayesian pose estimation is the prior over rotations. From an algorithmic standpoint, two complementary approaches are currently being developed to handle unknown viewing-angle distributions. Firstly, building on EM-type refinement, recent work proposes the introduction of an explicit parameterization of the orientation distribution over $[{\sf{SO}}(3)]$ and joint optimization of the structure and the viewing-angle distribution in the M-step, effectively recovering the rotational prior from the data (Xu et al., 2025). This perspective generalizes Bayesian EM refinement pipelines such as RELION by treating the pose prior as an estimable object that can be learned jointly with the structure during refinement (Scheres, 2012b). Secondly, method-of-moments approaches relax the uniformity assumption and aim to recover both the structure and the unknown viewing-angle distribution from low-order statistics, typically via alternating optimization between the two components (Sharon et al., 2020 ; Kileel, Mickelin et al., 2025 ).

In practice, we recommend starting with a uniform (or weak) prior when little is known about the rotational prior, and then optionally estimating a smoothed viewing-angle distribution from an initial refinement and plugging it into the MMSE estimator; simple sensitivity checks (uniform versus learned priors, and varying regularization) can help diagnose the effect of preferred orientations. We emphasize, however, that principled estimation of the rotational prior, and a full theory of robustness to prior misspecification in cryo-EM/ET, remains an important open direction for future work.

APPENDIX G

Proof of Proposition 3

Following equation (28), the update law (M-step) of the (t + 1)th iteration of the expectation–maximization algorithm is given by

$[{\hat V}^{(t+1)} = \mathop {\rm argmin} \limits_{V}\textstyle\sum\limits_{i=0}^ {M-1} \int\limits_{g\in{\sf{SO}}(3)}p_{i}^{(t)}(g)\|y_{i}-(g\circ V)\|^{2}\,{\rm d}g,\eqno(68)]$

where

$[p_{i}^{(t)}(g) = {{\exp[-\|y_{i}-(g\circ{\hat V }^{(t)})\|^{2}/2\sigma^{2}]} \over {\textstyle\int_{g\in{\sf{SO}}(3)}\exp [-\|y_{i}-(g\circ{\hat V}^{(t)})\|^{2}/2\sigma^{2}]\,{\rm d}g}}.\eqno(69)]$

As the action by the group element g preserves the norm, we have

$[\eqalignno {\|y_{i}-(g\circ V)\|^{2} & = \|g^{-1}\circ {y_{i}}-V\|^{2} = \|g^{-1}\circ y_{i}\|^{2}+\|V\|^{2 }\cr &\ \quad -\ 2{\,\langle{g^{-1}\circ y_{i}},V\rangle}.&(70)}]$

Thus, substituting equation (70) into equation (68) leads to

$[\eqalignno {\hat{V}^{(t+1)} & = \mathop {\rm argmin} \limits_{V} \textstyle\sum\limits_{i=0}^ {M-1} \textstyle\int\limits_{g\in{\sf{SO}}(3)}p_{i}^{(t)}(g)\{\|g^{-1}\circ y_{i} \|^{2}+\|V\|^{2} \cr &\ \quad -\ 2{\langle{g^{-1}\circ y_{i}},V\rangle}\}\,{\rm d}g.&(71)}]$

As the term ||g ○ y_i||² is independent of V, equation (71) can be simplified to

$[{\hat V}^{(t+1)} = \mathop {\rm argmin} \limits_{V} \textstyle \sum\limits_{i=0}^ {M-1} \textstyle\int\limits_{g\in{\sf{SO}}(3)}p_{i}^{(t)}(g)\{\|V\|^{2}-2{\langle{g^{-1}\circ y_{i}},V\rangle}\}\, {\rm d}g.\eqno(72)]$

Let us denote by h(V) the right-hand side of equation (72):

$[h(V) \mathop {=}\limits^{\Delta} \textstyle\sum\limits_{i=0}^{M-1} \textstyle\int\limits_{g\in{\sf{SO}}(3)}p_{i}^{(t)}(g)\{\|V\|^{2}-2{\langle{g^{-1}\circ y_{i}},V\rangle}\}\,{\rm d}g.\eqno(73)]$

Taking the first-order condition with respect to V gives

$[{{\partial} \over {\partial V}}h(V) = 2\textstyle\sum\limits_{i=0}^{M-1}\textstyle\int\limits_{g\in {\sf{SO}}(3)}p_{i}^{(t)}(g)(V-g^{-1}\circ y_{i})\,{\rm d}g = 0.\eqno (74)]$

As $[\textstyle\int_{g\in{\sf{SO}}(3)}p_{i}^{(t)}(g) = 1]$ for every i, we can simplify equation (74) to obtain

$[\eqalignno {V &= {{1} \over {M}}\textstyle\sum\limits_{i=0}^{M-1} \textstyle\int\limits_{g\in{\sf{SO}}(3)}p_{i}^{(t) }(g)\cdot(g^{-1}\circ y_{i}) \cr & = {{1} \over {M}}\textstyle \sum\limits_{i=0}^{M-1} \left(\textstyle\int\limits_{g\in{\sf{SO}}(3)}p_{i} ^{(t)}(g)\cdot g^{-1}\,{\rm d}g\right)\circ y_{i}.&(75)}]$

Thus, using the definition (30) we obtain

$[\specialfonts{\hat V}^{(t+1)} = {{1} \over {M}}\textstyle \sum\limits_{i=0}^{M-1} \hat{\frak{g}}_{{\rm{MMSE}},i,t}\circ y_{i},\eqno(76)]$

which proves the proposition.

Footnotes

‡These authors contributed equally.

¹More generally, if one works with different representations of $[{\sf {SO}}(3)]$ , or interprets the expectation as acting on operators (as in Proposition 3).

Acknowledgements

We thank Marc Auréle Gilles for providing guidance with the RECOVAR software. We also thank Marc Auréle Gilles, Eric Verbeke and Ruiyi Yang for their helpful discussions.

Data availability

The detailed implementation and code are available at https://github.com/AmnonBa/bayesian-orientation-estimation.

Funding information

AS and SX are supported in part by AFOSR under grant FA9550-23-1-0249, the Simons Foundation Math+X Investigator Award, NSF under grant DMS 2510039 and NIH/NIGMS under grant R01GM136780-01. TB is supported in part by BSF under grant 2020159, in part by NSF–BSF under grant 2024791, in part by ISF under grant 1924/21 and in part by a grant from The Center for AI and Data Science at Tel Aviv University (TAD).

References

Bai, X.-C., McMullan, G. & Scheres, S. H. W. (2015). Trends Biochem. Sci. 40, 49–57. Web of Science CrossRef CAS PubMed Google Scholar
Balanov, A., Bendory, T. & Huleihel, W. (2025). IEEE Trans. Inf. Theory, 71, 8871–8898. Web of Science CrossRef Google Scholar
Balanov, A., Huleihel, W. & Bendory, T. (2024). arXiv:2407.05277. Google Scholar
Balanov, A., Huleihel, W. & Bendory, T. (2025). arXiv:2505.21435. Google Scholar
Balanov, A., Zabatani, A. & Bendory, T. (2025). arXiv:2507.03951. Google Scholar
Bandeira, A. S., Blum-Smith, B., Kileel, J., Niles-Weed, J., Perry, A. & Wein, A. S. (2023). Appl. Comput. Harmon. Anal. 66, 236–319. Web of Science CrossRef Google Scholar
Bartesaghi, A., Matthies, D., Banerjee, S., Merk, A. & Subramaniam, S. (2014). Proc. Natl Acad. Sci. USA, 111, 11709–11714. Web of Science CrossRef CAS PubMed Google Scholar
Bendory, T., Bartesaghi, A. & Singer, A. (2020). IEEE Signal Process. Mag. 37, 58–76. Web of Science CrossRef PubMed Google Scholar
Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. (2022). arXiv:2210.01776. Google Scholar
Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). J. R. Stat. Soc. Ser. B Stat. Methodol. 39, 1–22. CrossRef Web of Science Google Scholar
Donnat, C., Levy, A., Poitevin, F., Zhong, E. D. & Miolane, N. (2022). J. Struct. Biol. 214, 107920. Web of Science CrossRef PubMed Google Scholar
Fan, Z., Lederman, R. R., Sun, Y., Wang, T. & Xu, S. (2024). Ann. Stat. 52, 52–77. Web of Science PubMed Google Scholar
Gilles, M. A. & Singer, A. (2025). Proc. Natl Acad. Sci. USA, 122, e2419140122. Web of Science CrossRef PubMed Google Scholar
Gower, J. C. & Dijksterhuis, G. B. (2004). Procrustes Problems. Oxford University Press. Google Scholar
Gräf, M. (2012). Adv. Comput. Math. 37, 379–392. Google Scholar
Gräf, M. & Potts, D. (2009). Numer. Funct. Anal. Optim. 30, 665–688. Google Scholar
Harpaz, Y. & Shkolnisky, Y. (2023). Biol. Imaging, 3, e8. CrossRef PubMed Google Scholar
Hartley, R., Trumpf, J., Dai, Y. & Li, H. (2013). Int. J. Comput. Vis. 103, 267–305. Web of Science CrossRef Google Scholar
Henderson, R. (2013). Proc. Natl Acad. Sci. USA, 110, 18037–18041. Web of Science CrossRef CAS PubMed Google Scholar
Hoskins, J., Khoo, Y., Mickelin, O., Singer, A. & Wang, Y. (2024). arXiv:2410.06889. Google Scholar
Huynh, D. Q. (2009). J. Math. Imaging Vis. 35, 155–164. Web of Science CrossRef Google Scholar
Jagvaral, Y., Lanusse, F. & Mandelbaum, R. (2024). AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, pp. 12754–12762. New York: ACM. Google Scholar
Janco, N. & Bendory, T. (2022). IEEE Trans. Signal Process. 70, 3237–3248. Web of Science CrossRef Google Scholar
Jeon, M., Raghu, R., Astore, M., Woollard, G., Feathers, R., Kaz, A., Hanson, S. M, Cossio, P. & Zhong, E. D. (2024). Adv. Neural Inf. Process. Syst., 37, 89468–89512. Google Scholar
Johnson, N. L., Kotz, S. & Balakrishnan, N. (1995). Continuous Univariate Distributions, Vol. 2, 2nd ed. Hoboken: Wiley. Google Scholar
Kay, S. M. (1993). Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs: Prentice-Hall. Google Scholar
Kileel, J., Marshall, N. F., Mickelin, O. & Singer, A. (2025). SIAM J. Sci. Comput. 47, A1117–A1144. Google Scholar
Kileel, J., Mickelin, O., Singer, A. & Xu, S. (2025). arXiv:2511.07438. Google Scholar
Kimanius, D., Dong, L., Sharov, G., Nakane, T. & Scheres, S. H. W. (2021). Biochem. J. 478, 4169–4185. Web of Science CrossRef CAS PubMed Google Scholar
Kostelec, P. J. & Rockmore, D. N. (2008). J. Fourier Anal. Appl. 14, 145–179. Web of Science CrossRef Google Scholar
Leach, A., Schmon, S. M., Degiacomi, M. T. & Willcocks, C. G. (2022). Tenth International Conference on Learning Representations 2022. Google Scholar
Lehmann, E. L. & Casella, G. (2006). Theory of Point Estimation. New York: Springer. Google Scholar
Levy, A., Raghu, R., Feathers, J. R., Grzadkowski, M., Poitevin, F., Johnston, J. D., Vallese, F., Clarke, O. B., Wetzstein, G. & Zhong, E. D. (2025). Nat. Methods, 22, 1486–1494. Web of Science CrossRef CAS PubMed Google Scholar
Li, B., Zhu, D., Shi, H. & Zhang, X. (2021). J. Struct. Biol. 213, 107783. Web of Science CrossRef PubMed Google Scholar
Luo, Z., Ni, F., Wang, Q. & Ma, J. (2023). Nat. Methods, 20, 1729–1738. Web of Science CrossRef PubMed Google Scholar
Lyumkis, D. (2019). J. Biol. Chem. 294, 5181–5197. Web of Science CrossRef CAS PubMed Google Scholar
Ma, C., Bendory, T., Boumal, N., Sigworth, F. & Singer, A. (2019). IEEE Trans. Image Process. 29, 1699–1710. Web of Science CrossRef Google Scholar
Mäeots, M.-E. & Enchev, R. I. (2022). Acta Cryst. D78, 927–935. Web of Science CrossRef IUCr Journals Google Scholar
Mao, Y., Castillo-Menendez, L. R. & Sodroski, J. G. (2013). Proc. Natl Acad. Sci. USA, 110, E4178–E4182. Web of Science CrossRef CAS PubMed Google Scholar
Mao, Y., Wang, L., Gu, C., Herschhorn, A., Désormeaux, A., Finzi, A., Xiang, S. H. & Sodroski, J. G. (2013). Proc. Natl Acad. Sci. USA, 110, 12438–12443. Web of Science CrossRef CAS PubMed Google Scholar
Matthies, S., Muller, J. & Vinel, G. W. (1988). Texture Stress Microstruct. 10, 77–96. CrossRef Web of Science Google Scholar
Punjani, A. & Fleet, D. J. (2021). J. Struct. Biol. 213, 107702. Web of Science CrossRef PubMed Google Scholar
Punjani, A. & Fleet, D. J. (2023). Nat. Methods, 20, 860–870. Web of Science CrossRef CAS PubMed Google Scholar
Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. (2017). Nat. Methods, 14, 290–296. Web of Science CrossRef CAS PubMed Google Scholar
Robert, C. P. (1999). Monte Carlo Statistical Methods. New York: Springer. Google Scholar
Rosenthal, P. B. & Henderson, R. (2003). J. Mol. Biol. 333, 721–745. Web of Science CrossRef PubMed CAS Google Scholar
Savjolova, T. I. (1985). Metallurgija (Moscow), 4, 6. Google Scholar
Scheres, S. H. W. (2012a). J. Mol. Biol. 415, 406–418. Web of Science CrossRef CAS PubMed Google Scholar
Scheres, S. H. W. (2012b). J. Struct. Biol. 180, 519–530. Web of Science CrossRef CAS PubMed Google Scholar
Sharon, N., Kileel, J., Khoo, Y., Landa, B. & Singer, A. (2020). Inverse Probl. 36, 044003. Web of Science CrossRef PubMed Google Scholar
Sigworth, F. J., Doerschuk, P. C., Carazo, J.-M. & Scheres, S. H. W. (2010). Methods Enzymol. 482, 263–294. Web of Science CrossRef CAS PubMed Google Scholar
Singer, A. (2018). Proceedings of the International Congress of Mathematicians (ICM 2018), pp. 3995–4014. Singapore: World Scientific. Google Scholar
Singer, A. & Yang, R. (2024). Biol. Imaging, 4, e5. CrossRef PubMed Google Scholar
Sorzano, C. O. S., Jiménez, A., Mota, J., Vilas, J. L., Maluenda, D., Martínez, M., Ramírez-Aportela, E., Majtner, T., Segura, J., Sánchez-García, R., Rancel, Y., del Caño, L., Conesa, P., Melero, R., Jonic, S., Vargas, J., Cazals, F., Freyberg, Z., Krieger, J., Bahar, I., Marabini, R. & Carazo, J. M. (2019). Acta Cryst. F75, 19–32. Web of Science CrossRef IUCr Journals Google Scholar
Subramaniam, S. (2013). Proc. Natl Acad. Sci. USA, 110, E4172–E4174. Web of Science CrossRef CAS PubMed Google Scholar
Tan, Y. Z., Baldwin, P. R., Davis, J. H., Williamson, J. R., Potter, C. S., Carragher, B. & Lyumkis, D. (2017). Nat. Methods, 14, 793–796. Web of Science CrossRef CAS PubMed Google Scholar
Toader, B., Sigworth, F. J. & Lederman, R. R. (2023). J. Mol. Biol. 435, 168020. Web of Science CrossRef PubMed Google Scholar
van Heel, M. (2013). Proc. Natl Acad. Sci. USA, 110, e4175. Web of Science CrossRef PubMed Google Scholar
Watson, A. J. I. & Bartesaghi, A. (2024). Curr. Opin. Struct. Biol. 87, 102861. Web of Science CrossRef PubMed Google Scholar
Wong, W., Bai, X., Brown, A., Fernandez, I. S., Hanssen, E., Condron, M., Tan, Y. H., Baum, J. & Scheres, S. H. W. (2014). eLife, 3, e03080. Web of Science CrossRef PubMed Google Scholar
Xu, S., Zhang, A. Y. & Singer, A. (2025). arXiv:2509.22945. Google Scholar
Zhang, P. (2019). Curr. Opin. Struct. Biol. 58, 249–258. Web of Science CrossRef CAS PubMed Google Scholar
Zhong, E. D., Bepler, T., Berger, B. & Davis, J. H. (2021). Nat. Methods, 18, 176–185. Web of Science CrossRef CAS PubMed Google Scholar

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.

STRUCTURAL
BIOLOGY

ISSN: 2059-7983

Volume 82| Part 4| April 2026| Pages 274-294

https://doi.org/10.1107/S2059798326001415

Open

access

Format		BIBTeX
		EndNote
		RefMan
		Refer
		Medline
		CIF
		SGML
		Plain Text
		Text

Format		BIBTeX
		EndNote
		RefMan
		Refer
		Medline
		CIF
		SGML
		Plain Text
		Text

Search IUCr Journals		doi		Advanced search
Author		volume	page

research papers\(\def\hfill{\hskip 5em}\def\hfil{\hskip 3em}\def\eqno#1{\hfil {#1}}\)

Bayesian perspective for orientation determination in cryo-EM with application to structural heterogeneity analysis

1. Introduction

1.1. The gap

1.2. The Bayesian framework

1.3. Applications

1.4. Overview of results and contributions

1.5. Main takeaways

2. Problem formulation and the MMSE orientation estimator

2.1. Mathematical model for orientation estimation

2.1.1. Applications of the model

2.2. Preliminaries

2.2.1. Overview of the Bayesian framework and the Bayes estimators

2.2.2. The posterior distribution of the rotation g given an observation y under model (3)

2.2.3. Metrics over SO(3)

2.3. The MLE, MAP and MMSE orientation estimators

2.3.1. The MLE estimator

2.3.2. The MAP estimator

2.3.3. The MMSE estimator

Proposition 1

2.3.4. The MMSE, MAP and MLE estimators in the high-SNR regime

Proposition 2

3. Numerical methods for MMSE orientation estimation

3.1. Numerical procedure and sampling of SO(3)

3.2. The impact of non-uniform distributions of rotations

3.3. The impact of sampling-grid size L and SNR on estimation accuracy

3.4. Sampling resolution and spectral information

4. Bayesian orientation estimation as part of the volume-reconstruction problem

4.1. Connection to the EM algorithm in volume reconstruction without projection

Proposition 3 (MMSE operator form of the EM M-step)

Remark 1

4.1.1. The MLE estimator as part of volume reconstruction

4.2. Empirical results for volume reconstruction and the `Einstein from Noise' phenomenon

4.2.1. Description of the experiments

4.2.2. Empirical observations

5. Structural heterogeneity analysis

5.1. Fixed-pose methods

5.2. The mathematical model

5.3. Numerical experiments

6. Discussion and conclusions

6.1. Future work

APPENDIX A

Full cryo-EM model

APPENDIX B

Bayes pose estimation with rotations and shifts

B1. Additive loss function

B2. Closed-form MMSE rules under squared loss

APPENDIX C

Alignment between two noisy volumes

APPENDIX D

Proof of Proposition 1

APPENDIX E

Proof of Proposition 2

Theorem 1

APPENDIX F

Non-uniform distribution of rotations

F1. Isotropic Gaussian distribution of SO(3) rotations

F2. Estimating the rotational prior

APPENDIX G

Proof of Proposition 3

Footnotes

Acknowledgements

Data availability

Funding information

References

research papers