new algorithms workshop
Introducing robustness to maximum-likelihood refinement of electron-microscopy data
^{a}Centro Nacional de Biotecnología – CSIC, Darwin 3, Cantoblanco, 28049 Madrid, Spain
^{*}Correspondence email: scheres@cnb.csic.es
An expectation-maximization algorithm for maximum-likelihood refinement of electron-microscopy images is presented that is based on fitting mixtures of multivariate t-distributions. The novel algorithm has intrinsic characteristics for providing robustness against atypical observations in the data, which is illustrated using an experimental test set with artificially generated outliers. Tests on experimental data revealed only minor differences in two-dimensional classifications, while three-dimensional classification with the new algorithm gave stronger elongation factor G density in the corresponding class of a structurally heterogeneous ribosome data set than the conventional algorithm for Gaussian mixtures.
Keywords: electron microscopy; maximum-likelihood refinement; expectation-maximization algorithm; robustness.
1. Introduction
Whereas maximum-likelihood approaches have become a gold standard in many areas of macromolecular X-ray crystallography (e.g. Bricogne, 1997; de La Fortelle & Bricogne, 1997; Read, 2001; Blanc et al., 2004), in single-particle three-dimensional electron microscopy (3DEM) such statistical approaches have only recently begun to be explored. An important characteristic of the maximum-likelihood approach is the natural way in which one may model the experimental noise in the data. Because the noise levels in 3DEM data are typically extremely high, one would expect that 3DEM problems could greatly benefit from a proper error model. However, for many years image processing in 3DEM has been addressed using methods that do not take the noisy character of the experimental data into account in a statistical way (Frank, 2006). Provencher and Vogel performed early work on a statistical model for the noise in 3DEM data (Provencher & Vogel, 1988; Vogel & Provencher, 1988) and it was only in 1998 that Sigworth introduced a maximum-likelihood algorithm for the alignment of a set of two-dimensional images (Sigworth, 1998). Thereafter, Doerschuk and coworkers used the same principles for the three-dimensional reconstruction of icosahedral viruses (Doerschuk & Johnson, 2000; Yin et al., 2003; Lee et al., 2007), Pascual-Montano and coworkers introduced a maximum-likelihood algorithm for self-organizing maps (Pascual-Montano et al., 2001) and Zeng and coworkers applied this approach to two-dimensional alignment of crystal images (Zeng et al., 2007).
We were the first to address the problem of simultaneous two-dimensional image alignment and classification using maximum-likelihood principles (Scheres, Valle, Nuñez et al., 2005) and we then extended this methodology to the general case of three-dimensional reconstruction from structurally heterogeneous data (Scheres, Gao et al., 2007). The latter is of special relevance for 3DEM single-particle analysis, in which many samples constitute large and flexible macromolecular complexes. These complexes typically adopt multiple conformations that are often directly related to their function in living organisms. In principle, provided that one can sort the projections from distinct structures using a computer, multiple three-dimensional reconstructions of the particles in their distinct functional states may be obtained from a single 3DEM experiment. However, this sorting is strongly intertwined with the orientational assignment of the projections and at present still represents one of the major challenges in single-particle image processing (Leschziner & Nogales, 2007).
We model structurally heterogeneous data as a finite mixture and treat the unavailable information about the orientation and the structural class of each experimental projection as missing data. We then tackle the mixture problem using expectation maximization, which can be shown to converge to the maximum-likelihood estimate of the mixture parameters under relatively mild conditions (Dempster et al., 1977; McLachlan & Peel, 2000). The resulting algorithm is a multi-reference procedure which is similar to conventional approaches in the field (Radermacher, 1994; Penczek et al., 1994). However, the most important difference of the maximum-likelihood approach is that the underlying statistical data model allows one to marginalize over the missing variables. That is, whereas conventional approaches assign a single orientation and class membership to each projection, the maximum-likelihood approach calculates probability-weighted assignments for all possibilities. This provides an intrinsic stabilization of the possibly unstable reconstruction problem. Together with the typical use of relatively small images (also to reduce computational costs) and an early stopping criterion in the underlying algebraic reconstruction algorithm with smooth basis functions, or blobs (Marabini et al., 1998; Scheres, Gao et al., 2007), this yields a stable algorithm in practice which has been shown to be highly effective on multiple occasions (see, for example, Nickell et al., 2007; Cuellar et al., 2008; Julián et al., 2008; Rehmann et al., 2008).
Despite the importance of the underlying data model in statistical approaches, little work has been performed to explore alternative models for maximum-likelihood approaches in 3DEM. All the approaches mentioned above share the assumption of additive white Gaussian noise in real space. A large part of the noise may result from shot noise owing to the small number of imaging electrons (10–20 per squared angstrom). The latter would require a multiplicative noise model with a Poisson distribution. However, in practice the additive Gaussian model is a good approximation when each pixel represents many squared angstroms (Sigworth, 2004). The pixel areas for the classifications described in this paper, for example, range from 12 to 30 squared angstroms. Moreover, several additional sources of noise exist, such as structural noise arising from the surrounding ice and detector noise; the combination of these multiple independent sources of noise has been shown to follow a Gaussian distribution (Sorzano, de la Fraga et al., 2004). The additive character of the Gaussian noise model results in a computationally attractive algorithm, but the assumption of whiteness is known to be a poor one for electron-microscopy projections. Therefore, we recently introduced an alternative data model in reciprocal space that allows the modelling of non-white, or coloured, Gaussian noise (Scheres, Nunez-Ramirez et al., 2007). The Gaussian distribution still remains a common factor, while in other pattern-recognition fields a notable interest has developed in the use of alternative distributions. For many applied problems the tails of the Gaussian are shorter than required and mixtures of Gaussians may lack robustness in the presence of atypical observations. In particular, the use of multivariate t-distributions has repeatedly been proposed as a more robust alternative. The t-distribution has wider tails and its number of degrees of freedom ν essentially plays the role of rejecting atypical observations. As ν tends to infinity, the t-distribution approaches the Gaussian, so that ν may be viewed as a robustness tuning parameter. Several contributions defining frameworks of expectation-maximization algorithms for mixtures of t-distributions have appeared and mixtures of t-distributions have been successfully applied to a range of different types of data (Lange et al., 1989; McLachlan & Peel, 2000; Wang et al., 2004).
In this contribution, we explore the suitability of modelling structurally heterogeneous 3DEM data as a mixture of multivariate t-distributions. We derive the corresponding expectation-maximization algorithm in §2. In §3 we illustrate its intrinsic properties of providing robustness against outliers and compare the performance of the new algorithm with the conventional algorithm for Gaussian mixtures in two-dimensional and three-dimensional classification. We conclude this paper with a discussion on the potential usefulness of the proposed algorithm in §4.
2. Methods
2.1. The optimization problem
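We model two-dimensional images X_{1}, X_{2}, …, X_{N} as follows:
X_{i} = R_{Φ_{i}}V_{κ_{i}} + G_{i},  i = 1, …, N,  (1)
where X_{i} is the ith observed image of J pixels, V_{k} is the kth of the K underlying three-dimensional structures, κ_{i} is a random index selecting the structure that gave rise to X_{i}, R_{Φ_{i}} is the projection operation at position Φ_{i} and G_{i} is independent zero-mean additive noise.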
The reconstruction problem at hand is to estimate V_{1}, V_{2}, …, V_{K} from the observed data X_{1}, X_{2}, …, X_{N}. We view this estimation problem as a missing data problem, where the missing data associated with the observed data elements X_{i} are the position Φ_{i} and the random index κ_{i}. Thus, the complete data set is
{(X_{i}, Φ_{i}, κ_{i}), i = 1, …, N}.  (2)
We solve the reconstruction problem by way of maximum-likelihood estimation, where we aim to find those parameters Θ* that maximize the logarithm of the joint probability of observing the entire set of images X_{1}, X_{2}, …, X_{N}:
Θ* = arg max_{Θ} Σ_{i=1}^{N} ln P(X_{i}|Θ) = arg max_{Θ} Σ_{i=1}^{N} ln Σ_{k}Σ_{p}Σ_{q} P(X_{i}|k, p, q, Θ) P(k, p, q|Θ),  (3)
where the sums run over the K structures, the projection directions p and the in-plane transformations q.
As described previously (see, for example, the Supplementary Note in Scheres, Gao et al., 2007), we assume that particle picking has left a two-dimensional Gaussian distribution of residual translations q_{x} and q_{y} centred at the origin and with standard deviation ξ. Furthermore, we assume an even distribution of the Q_{rot} sampled in-plane rotations and a discretized distribution of estimated proportions α_{kp} of the data belonging to the pth projection of the kth underlying three-dimensional structure (with α_{kp} ≥ 0 and Σ_{k}Σ_{p} α_{kp} = 1). Thereby, P(k, p, q|Θ) is calculated as follows:
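Written out under the assumptions just stated (a Gaussian of width ξ on the residual translations, a uniform distribution over the Q_{rot} in-plane rotations and proportions α_{kp}), one consistent form of this prior is sketched below. The normalization constant is an assumption derived from those statements and is not quoted from the original display.

```latex
P(k, p, q \mid \Theta) =
  \frac{\alpha_{kp}}{2\pi\xi^{2}\, Q_{\mathrm{rot}}}
  \exp\!\left( -\frac{q_{x}^{2} + q_{y}^{2}}{2\xi^{2}} \right)
```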
In contrast to previous contributions where a Gaussian distribution was employed, we calculate P(X_{i}|k, p, q, Θ) as a multivariate t-distribution with a diagonal covariance matrix that has all diagonal elements equal to σ^{2}:
with
and ‖·‖ denoting Euclidean distance.
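For reference, the standard J-dimensional t-density with isotropic scale σ^{2} and ν degrees of freedom can be written as below, with J the number of pixels per image. The symbol δ_{ikpq} for the scaled squared residual is a name chosen here for illustration and may differ from the notation of the original display.

```latex
P(X_{i} \mid k, p, q, \Theta) =
  \frac{\Gamma\!\left(\frac{\nu + J}{2}\right)}
       {\Gamma\!\left(\frac{\nu}{2}\right) (\pi \nu \sigma^{2})^{J/2}}
  \left( 1 + \frac{\delta_{ikpq}}{\nu} \right)^{-(\nu + J)/2},
\qquad
\delta_{ikpq} = \frac{\lVert X_{i} - R_{pq} V_{k} \rVert^{2}}{\sigma^{2}}
```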
Following McLachlan & Peel (2000), we note that the multivariate t-distribution may be viewed as a weighted average of Gaussian distributions, with the weights given by a Gamma distribution:
P(X_{i}|k, p, q, Θ) = ∫_{0}^{∞} n_{J}(X_{i}; k, p, q, u_{i}, Θ) q(u_{i}) du_{i}.  (7)
Here, q(u_{i}) is the p.d.f. of a Gamma distribution with equal shape and rate parameters, G(ν/2, ν/2), and n_{J}(X_{i}; k, p, q, u_{i}, Θ) is a J-dimensional Gaussian distribution centred at R_{pq}V_{k}, again with a diagonal covariance matrix, which has all diagonal elements equal to σ^{2}/u_{i}:
n_{J}(X_{i}; k, p, q, u_{i}, Θ) = [u_{i}/(2πσ^{2})]^{J/2} exp[−u_{i}‖X_{i} − R_{pq}V_{k}‖^{2}/(2σ^{2})].  (8)
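The scale-mixture identity can be checked numerically in one dimension (J = 1). The sketch below integrates a Gaussian of variance σ^{2}/u against a Gamma(ν/2, ν/2) density with a simple midpoint rule and compares the result with the closed-form t-density. The function names, the integration range and the step count are illustrative choices and are not taken from XMIPP.

```python
import math

def t_pdf(x, mu, sigma, nu):
    """Univariate t-density with location mu, scale sigma and nu degrees of freedom."""
    z2 = ((x - mu) / sigma) ** 2
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(math.pi * nu * sigma ** 2))
    return math.exp(log_norm - 0.5 * (nu + 1) * math.log1p(z2 / nu))

def gamma_pdf(u, shape, rate):
    """Gamma density in the shape/rate parametrization."""
    log_p = (shape * math.log(rate) - math.lgamma(shape)
             + (shape - 1) * math.log(u) - rate * u)
    return math.exp(log_p)

def gauss_pdf(x, mu, var):
    """Univariate Gaussian density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def scale_mixture_pdf(x, mu, sigma, nu, n_steps=20000, u_max=40.0):
    """Midpoint-rule estimate of the integral over u of
    N(x; mu, sigma^2 / u) * Gamma(u; nu/2, nu/2)."""
    h = u_max / n_steps
    total = 0.0
    for j in range(n_steps):
        u = (j + 0.5) * h  # midpoints keep u strictly positive
        total += gauss_pdf(x, mu, sigma ** 2 / u) * gamma_pdf(u, nu / 2, nu / 2)
    return total * h

if __name__ == "__main__":
    print(t_pdf(1.5, 0.0, 2.0, 6.0), scale_mixture_pdf(1.5, 0.0, 2.0, 6.0))
```

Small values of u correspond to wide Gaussians, which is how the mixture acquires the heavier tails of the t-distribution.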
Therefore, it is convenient to introduce another set of 'missing' variables u_{1}, …, u_{N}, which are defined such that
X_{i} | u_{i}, k, p, q ~ N(R_{pq}V_{k}, (σ^{2}/u_{i})I)  (9)
independently for i = 1, …, N, and all u_{i} are independently distributed according to
u_{i} ~ G(ν/2, ν/2).  (10)
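Thus, the complete data set becomes
{(X_{i}, Φ_{i}, κ_{i}, u_{i}), i = 1, …, N}  (11)
and the function to be optimized becomes
Σ_{i=1}^{N} ln Σ_{k}Σ_{p}Σ_{q} P(k, p, q|Θ) ∫_{0}^{∞} n_{J}(X_{i}; k, p, q, u_{i}, Θ) q(u_{i}) du_{i}.  (12)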
In analogy with (3) and with the previously introduced algorithm for Gaussian distributions (Scheres, Gao et al., 2007), the reconstruction problem at hand is to find the parameter set Θ* that maximizes (12). However, Θ now includes an additional parameter ν and the missing data vector has been augmented to include not only positions Φ_{i} and random indices κ_{i} but also variables u_{i}. In this way, atypical observations in the data (i.e. observations with relatively large residuals) may be accommodated by relatively wide Gaussian distributions (i.e. with small values of u_{i}) and the additional parameter ν is used to describe the assumed distribution of all u_{i} according to (10).
2.2. The optimization algorithm
This optimization problem may be solved by expectation maximization (Dempster et al., 1977). This algorithm is used for finding maximum-likelihood estimates of parameters in probabilistic models that depend on unobserved or hidden variables. Expectation maximization is an iterative method that alternates between expectation (E) and maximization (M) steps. In the E-step one computes the expectation of the log-likelihood by including the hidden variables as if they were observed. In the M-step that follows, the estimate of the model parameters is computed by maximizing the expected log-likelihood found in the previous E-step. The parameters found in the M-step are then used to begin another E-step and the process is repeated. As stated above, the missing variables in this case are u_{i}, Φ_{i} and κ_{i} and the parameters to be estimated are contained in Θ.
In the E-step, again following McLachlan & Peel (2000), we calculate the expectation of the log-likelihood function using the current estimates of the model parameters (Θ^{old}):
Here, τ^{old}_{ikpq} is the conditional probability distribution of k, p and q given X_{i},
and for the conditional expectation of u_{i}, given X_{i}, k, p and q, we obtain
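For a t-mixture with fixed ν, these two E-step quantities take the standard forms given by McLachlan & Peel (2000). Adapted to the present notation they read as below, where J is the number of pixels per image. This is a reconstruction under that assumption. It agrees with the later remark that the right-hand terms of the numerator and the denominator of (15) dominate when ν is small.

```latex
\tau^{\mathrm{old}}_{ikpq} =
  \frac{P(X_{i} \mid k, p, q, \Theta^{\mathrm{old}})\, P(k, p, q \mid \Theta^{\mathrm{old}})}
       {\sum_{k'} \sum_{p'} \sum_{q'}
        P(X_{i} \mid k', p', q', \Theta^{\mathrm{old}})\, P(k', p', q' \mid \Theta^{\mathrm{old}})},
\qquad
u^{\mathrm{old}}_{ikpq} =
  \frac{\nu + J}
       {\nu + \lVert X_{i} - R_{pq} V^{\mathrm{old}}_{k} \rVert^{2} / (\sigma^{\mathrm{old}})^{2}}
```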
In the subsequent M-step of the algorithm, we maximize the lower bound (13) with respect to all model parameters in Θ. Since there exists no closed form for the update of ν^{new}, we will consider ν to be known (i.e. user-determined). The updates for ξ and the proportions α_{kp}^{new} may be calculated independently from the updates of the other model parameters as follows:
For the updates of V and σ, we note that they are a weighted version of the corresponding updates in the case of Gaussian distributions, with standard deviations σ/u_{1}^{1/2}, …, σ/u_{N}^{1/2} and with the weights being the additional missing variables u_{1}, …, u_{N}. Therefore, as for the Gaussian case, updating V may be performed by separately solving K least-squares problems, for which we use a modified algebraic reconstruction algorithm (wlsART; see Scheres, Gao et al., 2007). In this case, the least-squares problems are
and the updated σ is obtained as
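The M-step updates described in the two preceding paragraphs can be sketched as follows. These expressions are derived from the stated model (proportions α_{kp}, Gaussian translations of width ξ, isotropic noise scale σ and weights u) and are assumptions about the form of the original displays. In particular, the factor 1/(NJ) in the last line is the 'division by N' that is modified in §2.3.

```latex
\alpha^{\mathrm{new}}_{kp} = \frac{1}{N} \sum_{i=1}^{N} \sum_{q} \tau^{\mathrm{old}}_{ikpq},
\qquad
(\xi^{\mathrm{new}})^{2} = \frac{1}{2N} \sum_{i=1}^{N} \sum_{k} \sum_{p} \sum_{q}
  \tau^{\mathrm{old}}_{ikpq} \left( q_{x}^{2} + q_{y}^{2} \right)

V^{\mathrm{new}}_{k} = \operatorname*{arg\,min}_{V}
  \sum_{i=1}^{N} \sum_{p} \sum_{q}
  \tau^{\mathrm{old}}_{ikpq}\, u^{\mathrm{old}}_{ikpq}\,
  \lVert X_{i} - R_{pq} V \rVert^{2}

(\sigma^{\mathrm{new}})^{2} = \frac{1}{N J}
  \sum_{i=1}^{N} \sum_{k} \sum_{p} \sum_{q}
  \tau^{\mathrm{old}}_{ikpq}\, u^{\mathrm{old}}_{ikpq}\,
  \lVert X_{i} - R_{pq} V^{\mathrm{new}}_{k} \rVert^{2}
```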
2.3. Implementation
We implemented a total of four variants of the above-described algorithm in the open-source package XMIPP (Sorzano, Marabini et al., 2004; Scheres et al., 2008). The proposed algorithm for three-dimensional classification can be adapted with only minor changes to a two-dimensional classification algorithm. In this case, instead of optimizing (13) with respect to three-dimensional structures V_{1}, …, V_{K}, one optimizes this function with respect to two-dimensional images A_{1}, …, A_{K}. The algorithm remains basically the same, except for the fact that in this case R_{pq} represents an in-plane transformation (parametrized by a single rotation and two in-plane coordinates) and the least-squares problem in (18) is replaced by the following update formula:
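For an in-plane transformation that can be inverted, the weighted least-squares problem has a closed-form solution: each reference becomes a weighted average of the back-transformed images. A sketch of such an update, assuming that R_{pq}^{-1} denotes the inverse in-plane transformation and that interpolation effects are neglected, is given below. It is a reconstruction and is not quoted from the original display.

```latex
A^{\mathrm{new}}_{k} =
  \frac{\sum_{i=1}^{N} \sum_{p} \sum_{q}
        \tau^{\mathrm{old}}_{ikpq}\, u^{\mathrm{old}}_{ikpq}\, R_{pq}^{-1} X_{i}}
       {\sum_{i=1}^{N} \sum_{p} \sum_{q}
        \tau^{\mathrm{old}}_{ikpq}\, u^{\mathrm{old}}_{ikpq}}
```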
In addition, both the two-dimensional and the three-dimensional variants may also be expressed in reciprocal space. In this case, X_{1}, …, X_{N}, A_{1}, …, A_{K} and V_{1}, …, V_{K} represent the Fourier transforms of the observed data and the two-dimensional or three-dimensional models, respectively, G_{i} is independent zero-mean additive noise in reciprocal space and R_{pq} represents the reciprocal-space equivalent of either a projection operation or an in-plane transformation in real space. In the former model one describes the noise by independent distributions on the real-space pixels, while in the latter the noise is modelled as being spatially stationary, which allows one to describe non-white or coloured noise. For a more extensive elaboration on these characteristics and their implementation, the reader is referred to Scheres, Nunez-Ramirez et al. (2007).
Finally, we mention that the summations over k, p and q are extremely computing-intensive operations. Therefore, we have implemented three deviations from the strict expectation-maximization algorithm that result in a considerable speed-up of the calculations without hampering the classification performance in practice. The first two deviations were also implemented as such in the algorithms using Gaussian distributions, whereas the third deviation is specific for the t-distribution case: (i) instead of integrating over the entire search space of k and q, we employ a reduced-space approach (Scheres, Valle & Carazo, 2005), (ii) the update of σ is performed using V^{old} instead of V^{new} and (iii) following the proposal of McLachlan & Peel (2000), we replace the division by N in (19) by a division by the sum of the weights, Σ_{i}Σ_{k}Σ_{p}Σ_{q} τ^{old}_{ikpq}u^{old}_{ikpq}.
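To make the weighted updates concrete, the following toy sketch runs expectation maximization for a single reference without any alignment or classification, so that τ is trivially 1 and only the weights u_{i} remain. It is not the XMIPP implementation. The synthetic data, the function names and the fixed ν = 6 are assumptions made for illustration. The σ update uses residuals against the old average, as in deviation (ii), and optionally divides by the sum of the weights, as in deviation (iii).

```python
import random

def make_images(n_good=190, n_bad=10, n_pixels=16, offset=20.0, seed=0):
    """Synthetic 'images': a flat signal of 1.0 plus unit Gaussian noise.
    The last n_bad images carry a constant offset and act as outliers."""
    rng = random.Random(seed)
    images = [[1.0 + rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
              for _ in range(n_good + n_bad)]
    for img in images[n_good:]:
        for j in range(n_pixels):
            img[j] += offset
    return images

def plain_average(images):
    """Unweighted average, as obtained for a single Gaussian reference."""
    n = len(images)
    return [sum(img[j] for img in images) / n for j in range(len(images[0]))]

def t_average(images, nu=6.0, n_iter=50, divide_by_weight_sum=True):
    """EM for one reference with isotropic t-distributed noise and fixed nu.
    Returns (average, sigma2, weights)."""
    n, n_pixels = len(images), len(images[0])
    avg = plain_average(images)
    sigma2 = sum((img[j] - avg[j]) ** 2
                 for img in images for j in range(n_pixels)) / (n * n_pixels)
    weights = [1.0] * n
    for _ in range(n_iter):
        # E-step: squared residuals to the current average and weights
        # u_i = (nu + J) / (nu + ||X_i - A||^2 / sigma^2).
        res2 = [sum((img[j] - avg[j]) ** 2 for j in range(n_pixels))
                for img in images]
        weights = [(nu + n_pixels) / (nu + r / sigma2) for r in res2]
        wsum = sum(weights)
        # M-step: weighted average of the images.
        avg = [sum(w * img[j] for w, img in zip(weights, images)) / wsum
               for j in range(n_pixels)]
        # sigma^2 from residuals to the old average; the divisor is either
        # the sum of the weights or the number of images.
        divisor = wsum if divide_by_weight_sum else n
        sigma2 = sum(w * r for w, r in zip(weights, res2)) / (n_pixels * divisor)
    return avg, sigma2, weights

if __name__ == "__main__":
    imgs = make_images()
    avg, sigma2, weights = t_average(imgs)
    print("Gaussian mean pixel:", sum(plain_average(imgs)) / 16)
    print("t-mixture mean pixel:", sum(avg) / 16, "sigma^2:", sigma2)
```

In this toy setting the unweighted average is pulled towards the outliers, whereas the weighted average stays close to the true signal because the outlier images receive weights close to zero.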
3. Results
3.1. Robustness to outliers
We used a simplified two-dimensional test case to illustrate the potential of the t-distribution in providing robustness to outliers. The test data consisted of 1000 experimental cryo-EM projections of a 70S Escherichia coli ribosome particle in a single orientation. In 50 of the 1000 images, we positioned circles of constant density with radii varying uniformly between 10 and 15 pixels and with centres varying between −15 and 15 pixels from the image origin. The intensity of these circles was set to a constant value of 5, 10, 15 or 20 times the standard deviation of the original experimental images. We then performed two-dimensional real-space refinements with a single reference image for these data sets, comparing the performance of the Gaussian and t-mixtures (Fig. 1). The resulting averages clearly showed the effect of the improved robustness to outliers provided by the t-mixture. For the data sets with the strongest outliers in particular, the averages obtained with the Gaussian mixture showed clear artefacts that were not visible in the averages obtained using the t-mixture model. Analysis of the converged estimates for the standard deviation in the noise indicated that the algorithm for the Gaussian case tries to accommodate the outliers by increasing the widths of the Gaussians. This is much less so for the t-distribution, where low values for u^{old}_{ikφ} downweight the contribution of the outliers in the calculation of the averages and the standard deviation of the noise. The stronger the outliers, the larger this downweighting effect and the larger the differences between the two algorithms.
3.2. Performance in two-dimensional classification
To explore the potential of the new algorithm in two-dimensional image classification, we performed multi-reference refinements on a cryo-EM data set of MCM top views (Gómez-Llorente et al., 2005) and on a negative-stain data set of G40P top views (Nunez-Ramirez et al., 2006). For each data set we performed four runs, using mixtures of Gaussians or of t-distributions with six degrees of freedom and performing two-dimensional refinements in real or in reciprocal space. All four runs were started from identical seeds, which were obtained as average images over three random subsets of the data sets. Fig. 2 shows the resulting images of these runs, which show only minor differences between the two types of mixtures either in real or in reciprocal space. In all cases, the images refined with the t-mixture look very similar to those obtained with a Gaussian mixture. Not only do the densities for the averaged particles in the centre of the images look very similar, the two mixture types even result in common characteristics in the noise background. The optimization path and the optimal orientation and classification parameters of the individual images upon convergence also showed only small differences (not shown).
3.3. Performance in three-dimensional classification
For three-dimensional classification, we compared the performance of both types of mixtures using a data set of 20 000 ribosome particles. This data set was previously shown to be structurally heterogeneous as only part of the ribosomes are complexed with elongation factor G (EF-G; see Scheres, Nunez-Ramirez et al., 2007). Refinements with four references typically converge to a single class corresponding to ribosomes in complex with EF-G and three classes of ribosomes without EF-G. We again performed four runs using real-space or reciprocal-space refinement and using a Gaussian or a t-distribution mixture with six degrees of freedom. The intensity of segmented EF-G density in the class corresponding to ribosomes in complex with EF-G may serve as an indicator of classification quality, since remaining heterogeneity will generally yield lower density levels for EF-G. Fig. 3 shows segmented EF-G densities for the four runs performed. Starting from identical seeds, the t-mixture model in real space gave somewhat higher EF-G densities than the Gaussian mixture and the corresponding classes overlapped by 87%. Refinement in reciprocal space yielded stronger EF-G densities than in real space for both types of mixtures. The differences between the Gaussian and the t-mixture were smaller in this case, as no obvious difference in the intensity of EF-G density was observed and the EF-G-containing classes overlapped by 94%.
4. Discussion
The selection of individual particles from electron micrographs, called particle picking, is typically a difficult task. For cryo-EM data on relatively small particles (200–500 kDa) in particular, automated procedures may have relatively high error rates and the collection of good data is often strongly dependent on the specialized skills of the electron microscopist (Zhu et al., 2004). Therefore, it is relatively common for cryo-EM data sets to contain significant amounts of outliers. Atypical observations that were mistakenly assumed to be a particle of interest may deteriorate the quality of the three-dimensional reconstruction. In the best-case scenario they only affect the resolution obtained. In the worst-case scenario artefacts introduced by outliers may affect the interpretation of the structure itself. Conventionally, outliers have been dealt with by removing those particles with the lowest cross-correlation coefficients with the reference from the process (Frank, 2006). Although effective in practice, such discrete decisions are hard to accommodate in the statistical framework of maximum-likelihood refinement.
The algorithm proposed in this contribution provides an alternative statistical solution to outlier removal. The problem of structurally heterogeneous projection data is modelled as a finite mixture of multivariate t-distributions with a given number of degrees of freedom. In the resulting expectation-maximization algorithm, images with atypically large residuals contribute relatively little to the model estimates through lower values of u^{old}_{ikφ}; see (15), (18) and (19). Note that the residuals used to calculate the weights u^{old}_{ikφ} are closely related to the cross-correlation coefficient, but instead of taking discrete decisions the statistical approach applies a continuous downweighting of outliers as their residuals increase. We illustrated this effect for a small experimental test set with artificially generated outliers. In the Gaussian model all particles contribute equally to the model estimates. Consequently, especially in the presence of strong outliers, the average images obtained showed outlier-related artefacts and the variance of the noise was overestimated. In contrast, the t-distribution model resulted in clean average images and reliable noise estimates through an effective downweighting of the outliers.
In practice, however, there are limits to the downweighting of outliers in the proposed algorithm. From (15) and the example in Fig. 1, we can see that even for a few degrees of freedom (e.g. six) significant downweighting is only achieved for images with squared residuals that exceed the standard deviation of the noise in the images several times. This may restrict the usefulness of the proposed algorithm in identifying aberrant particles. In practice, particles with such large residuals may be easily recognizable at earlier stages of image processing, whereas one would ideally want to downweight any particle that does not correspond to a projection of one of the K reference structures. Analyses of the two-dimensional classifications presented in §3.2 indeed did not reveal an obvious relation between low values of u^{old}_{ikφ} and what one would consider atypical images in terms of the underlying signal (results not shown).
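The limited downweighting can be illustrated with the standard t-mixture weight, u = (ν + J)/(ν + ‖X − RV‖^{2}/σ^{2}), which is assumed here to correspond to (15). The sketch below tabulates this weight for a hypothetical image of 64 × 64 pixels; the image size, the function names and the chosen residual factors are illustrative only.

```python
def t_weight(scaled_residual2, n_pixels, nu):
    """Weight u for an image with squared residual / sigma^2 = scaled_residual2."""
    return (nu + n_pixels) / (nu + scaled_residual2)

def weight_table(n_pixels=64 * 64, nus=(3, 6, 9, 30), factors=(1.0, 1.5, 2.0, 5.0)):
    """Weights for images whose squared residual is `factor` times the value
    expected for pure noise (n_pixels * sigma^2)."""
    return {nu: [round(t_weight(f * n_pixels, n_pixels, nu), 3) for f in factors]
            for nu in nus}

if __name__ == "__main__":
    for nu, row in weight_table().items():
        print(nu, row)
```

With thousands of pixels the weight is close to the inverse of the residual factor for every ν in this range, so ν has little influence, and an image is only strongly downweighted when its squared residual is several times larger than expected for noise alone.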
The number of degrees of freedom is a free parameter of the proposed algorithm. Although in theory the optimal value of any free parameter should be tested, we performed all calculations presented in this work with a fixed value of six degrees of freedom. Because 3DEM images typically contain many pixels, the right-hand term of both the numerator and the denominator in (15) will dominate the calculation of the weights when using few degrees of freedom. Together with the observation made above that even for few degrees of freedom the effect of outlier downweighting may be relatively small, this suggests that in practice it may be sufficient to run this algorithm only with few degrees of freedom. This is confirmed by our calculations. When using three, nine or 30 degrees of freedom in the runs shown in Fig. 2, almost identical results were obtained (not shown) compared with using six degrees of freedom.
Improved image classification would be the ultimate aspiration of introducing a novel algorithm for maximum-likelihood refinement of 3DEM data. Despite the fact that both the MCM and the G40P data sets contained significant amounts of neighbouring particles and other artefacts that were not accounted for in the model, we did not observe any significant improvement in using the t-mixture model over the conventional Gaussian model in our two-dimensional classifications. One could attribute this to the observation that strong downweighting may only be achieved for outliers with very large residuals and that such strong artefacts were not present in these data. However, although the differences in the u^{old}_{ikφ} weights may appear to be relatively small in practice, more subtle effects may still play an important role in the complicated convergence process. This may perhaps explain why three-dimensional classification of a structurally heterogeneous ribosome data set with a real-space t-mixture model may have given better classification results than the Gaussian mixture, as hinted at by a stronger signal for the complexed EF-G density.
We have presented too few tests to allow the drawing of general conclusions on the relative suitability of the t-mixture model and the conventional Gaussian model. A continuing application of the proposed algorithms on multiple test cases may provide further insights, but this falls beyond the scope of this contribution. Most probably, the optimal choice of algorithm will depend on the data set at hand. Therefore, we have made all algorithms described in this work accessible to the community by implementing them in our open-source package XMIPP (Sorzano, Marabini et al., 2004; Scheres et al., 2008). Apart from modifications to the classification approach as presented here, we also foresee the exploration of alternative algorithms, such as maximum a posteriori (MAP) estimation, which may offer significant benefits in additional stabilization of the reconstruction problem through the incorporation of prior information.
Acknowledgements
We thank Dr Yacob Gómez-Llorente for providing the MCM data, Dr Rafael Núñez-Ramírez for providing the G40P data, Drs Haixiao Gao and Joachim Frank for providing the ribosome data and the latter for useful comments on an earlier version of this manuscript. We are grateful to the supercomputing centers of Barcelona (BSC-CNS) and Galicia (CESGA) for providing computer resources. Funding was provided by the European Union (FP6502828), the US National Institutes of Health (HL70472), the Spanish Ministry of Science (CSD200600023, BIO200767150C031 and 3) and the Spanish Comunidad de Madrid (SGEN01662006).
References
Blanc, E., Roversi, P., Vonrhein, C., Flensburg, C., Lea, S. M. & Bricogne, G. (2004). Acta Cryst. D60, 2210–2221.
Bricogne, G. (1997). Methods Enzymol. 276, 472–494.
Cuellar, J., Martin-Benito, J., Scheres, S. H. W., Sousa, R., Moro, F., López-Viñas, E., Gomez-Puertas, P., Muga, A., Carrascosa, J. & Valpuesta, J. (2008). Nature Struct. Mol. Biol. 15, 858–864.
Dempster, A., Laird, N. & Rubin, D. (1977). J. R. Stat. Soc. Ser. B, 39, 1–38.
Doerschuk, P. C. & Johnson, J. E. (2000). IEEE Trans. Inf. Theory, 46, 1714–1729.
Frank, J. (2006). Three-Dimensional Electron Microscopy of Macromolecular Assemblies. Oxford University Press.
Gómez-Llorente, Y., Fletcher, R. J., Chen, X. S., Carazo, J. M. & San Martín, C. (2005). J. Biol. Chem. 280, 40909–40915.
Julián, P., Konevega, A. L., Scheres, S. H. W., Lázaro, M., Gil, D., Wintermeyer, W., Rodnina, M. V. & Valle, M. (2008). Proc. Natl Acad. Sci. USA, 105, 16924–16927.
La Fortelle, E. de & Bricogne, G. (1997). Methods Enzymol. 276, 361–423.
Lange, K., Little, R. & Taylor, J. (1989). J. Am. Stat. Assoc. 84, 881–896.
Lee, J., Doerschuk, P. C. & Johnson, J. E. (2007). IEEE Trans. Image Process. 16, 2865–2878.
Leschziner, A. E. & Nogales, E. (2007). Annu. Rev. Biophys. Biomol. Struct. 36, 43–62.
Marabini, R., Herman, G. T. & Carazo, J. M. (1998). Ultramicroscopy, 72, 53–65.
McLachlan, G. & Peel, D. (2000). Finite Mixture Models. New York: John Wiley & Sons.
Nickell, S., Beck, F., Korinek, A., Mihalache, O., Baumeister, W. & Plitzko, J. M. (2007). FEBS Lett. 581, 2751–2756.
Nunez-Ramirez, R., Robledo, Y., Mesa, P., Ayora, S., Alonso, J. C., Carazo, J. M. & Donate, L. E. (2006). J. Mol. Biol. 357, 1063–1076.
Pascual-Montano, A., Donate, L. E., Valle, M., Bárcena, M., Pascual-Marqui, R. D. & Carazo, J. M. (2001). J. Struct. Biol. 133, 233–245.
Penczek, P. A., Grassucci, R. A. & Frank, J. (1994). Ultramicroscopy, 53, 251–270.
Provencher, S. W. & Vogel, R. H. (1988). Ultramicroscopy, 25, 209–221.
Radermacher, M. (1994). Ultramicroscopy, 53, 121–136.
Read, R. J. (2001). Acta Cryst. D57, 1373–1382.
Rehmann, H., Arias-Palomo, E., Hadders, M., Schwede, F., Llorca, O. & Bos, J. (2008). Nature (London), 455, 124–127.
Scheres, S. H. W., Gao, H., Valle, M., Herman, G. T., Eggermont, P. P. B., Frank, J. & Carazo, J. M. (2007). Nature Methods, 4, 27–29.
Scheres, S. H. W., Nunez-Ramirez, R., Gomez-Llorente, Y., San Martin, C., Eggermont, P. P. B. & Carazo, J. M. (2007). Structure, 15, 1167–1177.
Scheres, S. H. W., Nunez-Ramirez, R., Sorzano, C. O. S., Carazo, J. M. & Marabini, R. (2008). Nature Protoc. 3, 977–990.
Scheres, S. H. W., Valle, M. & Carazo, J. M. (2005). Bioinformatics, 21, Suppl. 2, ii243–ii244.
Scheres, S. H. W., Valle, M., Nuñez, R., Sorzano, C. O. S., Marabini, R., Herman, G. T. & Carazo, J. M. (2005). J. Mol. Biol. 348, 139–149.
Sigworth, F. J. (1998). J. Struct. Biol. 122, 328–339.
Sigworth, F. J. (2004). J. Struct. Biol. 145, 111–122.
Sorzano, C. O. S., de la Fraga, L. G., Clackdoyle, R. & Carazo, J. M. (2004). Ultramicroscopy, 101, 129–138.
Sorzano, C. O. S., Marabini, R., Velázquez-Muriel, J., Bilbao-Castro, J. R., Scheres, S. H. W., Carazo, J. M. & Pascual-Montano, A. (2004). J. Struct. Biol. 148, 194–204.
Vogel, R. H. & Provencher, S. W. (1988). Ultramicroscopy, 25, 223–239.
Wang, H., Zhang, Q. & Wei, S. (2004). Pattern Recognit. Lett. 25, 701–710.
Yin, Z., Zheng, Y., Doerschuk, P. C., Natarajan, P. & Johnson, J. E. (2003). J. Struct. Biol. 144, 24–50.
Zeng, X., Stahlberg, H. & Grigorieff, N. (2007). J. Struct. Biol. 160, 362–374.
Zhu, Y. et al. (2004). J. Struct. Biol. 145, 3–14.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.