research papers
Approximating deformation fields for the analysis of continuous heterogeneity of biological macromolecules by 3D Zernike polynomials
aCentro Nacional de Biotecnologia-CSIC, C/ Darwin 3, Cantoblanco, Madrid 28049, Spain, bDepartment of Statistics and Data Science, Yale University, New Haven, Connecticut, USA, cDepartment of Computational and Systems Biology, University of Pittsburgh, Pennsylvania, USA, dInstitute of Computer Science, Masaryk University, Botanická 68a, 60200 Brno, Czech Republic, and eFaculty of Informatics, Masaryk University, Botanická 68a, 60200 Brno, Czech Republic
*Correspondence e-mail: dherreros@cnb.csic.es
Structural biology has evolved greatly due to the advances introduced in fields like
This image-capturing technique, combined with improved algorithms and current data processing software, allows the recovery of different conformational states of a macromolecule, opening new possibilities for the study of its flexibility and dynamic events. However, the ensemble analysis of these different conformations, and in particular their placement into a common variable space in which the differences and similarities can be easily recognized, is not an easy matter. To simplify the analysis of continuous heterogeneity data, this work proposes a new automatic algorithm that relies on a mathematical basis defined over the sphere to estimate the deformation fields describing conformational transitions among different structures. Thanks to the approximation of these deformation fields, it is possible to describe the forces acting on the molecules due to the presence of different motions. It is also possible to represent and compare several structures in a low-dimensional mapping, which summarizes the structural characteristics of different states. All these analyses are integrated into a common framework, providing the user with the ability to combine them seamlessly. In addition, this new approach is a significant step forward compared with principal component analysis and normal mode analysis of cryo-electron microscopy maps, avoiding the need to select components or modes and producing localized analysis.Keywords: multi-dimensional scaling (MDS); 3D reconstruction and image processing; single-particle cryo-EM; spherical harmonics; Zernike polynomials; conformations.
1. Introduction
The application in ) or electron cryo-tomography (Schur, 2019) has proven to be a versatile tool to trace high-resolution structures. In particular, cryo-EM SPA has proven to be especially good at providing not only one structure, but a series of them, with most methods aiming to resolve stable states that are referred to as classes. In this way, we get a first approximation to the conformational landscape of the macromolecule, albeit restricted to these stable states.
of techniques such as cryo-electron microscopy (cryo-EM), single-particle analysis (SPA) (Carroni & Saibil, 2016However, the limited number of classes that can be extracted from a 3D classification is usually not enough to unveil fully the dynamics of a given macromolecule. The complete characterization of a conformational landscape can only be achieved through the analysis of multiple transient and stable states needed to describe the molecular flexibility in a more accurate manner. The knowledge of these transient and stable states leads to a better description of how structural changes might affect molecular function or interaction affinity, among other properties of interest.
The formulation we introduce here is oriented towards modelling continuous flexibility (Sorzano et al., 2019), which can be used to characterize the motions undergone by a molecule when exploring different states. We have already addressed this problem in our previous work on continuous heterogeneity using normal mode analysis (NMA) (Sanchez Sorzano et al., 2016). However, this process relied on manual selection of the modes describing the structural changes reflected by two cryo-EM maps, thus making the analysis of molecular flexibility more complex for the user. The new algorithm that we propose in this work tries to address this problem by simplifying the analysis for the user.
In our new methodology, there is no longer a normal modes space where some choices have to be made. Instead, a totally new approach is presented here, based on an expansion on a 3D basis that does not require user intervention at all. We have also improved the analysis of pairwise comparisons by introducing a multidimensional scaling algorithm that automatically combines the outputs from two different metrics. Finally, the new algorithm also allows the analysis of local strains and rotations, as done by us earlier (Sorzano et al., 2016), with the advantage of having all the analyses integrated into a single mathematical framework. We provide a more in-depth comparison with alternative methods in Section 2.1.
The paper makes the following major contributions.
(i) The development of an automatic algorithm to analyse continuous heterogeneity of macromolecules through cryo-EM maps.
(ii) Representation of the strain and rotation components defining a transition between two different conformational states.
(iii) Representation of a series of conformations in a structure mapping and consensus of different mappings defined by different comparison metrics.
(iv) A methodology to compare cryo-EM maps with simulated data.
(v) The application of deformation fields to atomic structures to predict different conformations given by a series of cryo-EM maps.
2. Methods
2.1. Determining structural deformations
In order to detect the movements defining a conformational transition between two states of the same macromolecule, we need to determine the displacements that each region of the molecule will undergo between the two states. The key development in this work is the successful expression of the maps in terms of a mathematical basis on which the displacements are calculated. Although full details are provided in Appendices A and B, here it suffices to say that we use a generalized form of Zernike polynomials to expand functions on a ball (as the macromolecule we are interested in is defined inside a spherical volume). This is not the only possible choice of basis functions [for example, it would have been possible to use the Laguerre polynomials described by Provencher & Voguel (2010) or the prolate spheroidal functions (Greengard & Serkh, 2018)], and we do not expect superiority of any of these possible bases as long as all of them are bases of functions defined within a sphere. Additionally, we find that Zernike polynomials have some appealing mathematical properties especially well suited to our problem. Indeed, these Zernike polynomials allow for the expansion of functions on a sphere which do not vanish at the boundaries (so that the more external parts of the macromolecule can move). Moreover, the basis is closed under rotations. In Appendix B we further explore its properties and its relationship to spherical harmonics.
Considering a pair of electron-density maps representing two conformational states of a macromolecule, it is possible to pose the displacement-finding problem as
where V1 and V2 represent two conformations of a given molecule. Here it is important to note that we are measuring the distance between the target and the distorted volumes in terms of the L1 norm. Although it would also have been possible to use the L2 norm, we have chosen this definition as it is more robust to outliers (i.e. it is more robust to those cases where the maps do not match completely or have missing regions). The displacement to be applied to the coordinates of V1 is defined by the deformation field g(r) parameterized through the expansion in Zernike polynomials Zl, n, m(r) (see Appendix A),
where N and L represent the maximum allowed degrees for the Zernike polynomials and the corresponding spherical harmonics, respectively.
The amount of displacement at every point is controlled by the deformation coefficients αl, n, m. Our objective is to find the deformation coefficients that minimize the goal function in equation (1). This is achieved through a Powell's conjugate direction method starting from an initial guess of αl, n, m = 0 for all indices l, n, m and directions x, y, z (that is, no deformation). This initialization of the minimization method assumes that the identity/equilibrium solution (αl, n, m = 0) is close enough to the real solution defining the represented by the cryo-EM maps. Since in most of the cases this assumption is fulfilled, this initial guess allows the minimization method to find the set of coefficients that appropriately describes the motion between the two maps. However, it is important to note that there are many local minima where the minimization process could be trapped. In this respect, and although in our experience the initialization conditions proposed in this work provide results close enough to the ideal solution, there could be cases in which other ways to initialize the algorithm could be more beneficial in terms of minima search.
The deformation field estimated above can be submitted to the local strain and rotation analysis described by Sorzano et al. (2016). This analysis reveals the nature (stretching, compression or rotation) of the local forces acting on V1 to transform it into V2 as well as their local intensity.
In our deformation model, it is possible to divide the movements that a molecule may undergo into `low'- and `high'-frequency movements, depending on how localized these movements are, e.g. a transition from an open to a closed state can be considered a low-frequency movement, while the rotation of a specific α-helix might be a high-frequency movement. Parameters L and N specify the maximum degree of the polynomials used in the description of molecular flexibility. In this way, we may control the maximum frequency of the movements that could be analysed by the basis. Obviously, analysing larger L and N will result in a longer computational time, because more αl, n, m coefficients will need to be determined and there is a higher risk of overfitting. However, the larger the values given to the parameters L and N, the higher the frequencies the algorithm will be able to analyse (although in general, global motions dominate the conformational change; Bahar et al., 2010).
Although in many cases analysing global motions is enough to describe in a precise manner the structural changes a macromolecule may undergo, it can be the case that the motions of interest are focused on a very localized area of the molecule. In that case, being able to go to higher degrees on the basis will allow the algorithm to study those motions specifically, without modifying the areas that should remain still. Another possibility is direct restriction of the structural analysis to any specific region in the macromolecule by centring a sphere on that area and selecting an appropriate radius. In this way, it will not be necessary to reach very high degrees in the basis (thus reducing the computational complexity). However, by imposing these kinds of restrictions the algorithm might include artefacts in the surface of the sphere as the molecular regions outside of it will remain untouched. Depending on the molecule and motions to be analysed, the researcher can decide which analysis will be more appropriate for a specific case.
It is important to mention that only in a very few cases did we need to increase the degree of the basis to analyse a localized motion that we were interested in, or have to play with the regularization parameter to get a better approximation of the deformation fields, since the default values were good enough for most of the experiments we have performed so far.
To reduce the possibility of overfitting as much as possible, we regularize the cost function by adding two penalty terms,
The first term of the regularization penalizes excessive deformation and the second penalizes changes in the mass of V1 due to the deformation. Regularization terms λ1 and λ2 are usually given low values to prevent large deviations from the ideal solution. Nevertheless, both can be set by the user to any value they consider appropriate for their specific analysis. The guideline for their selection should be that the three terms in the goal function should have values of the same order of magnitude. In our implementation, we report the three contributions helping the user to choose these multipliers.
2.2. Relationship to other continuous deformation models
Probably the two most widely continuous deformation models used by the structural biology community in mapping the conformational space of biomolecules (or in analysing cryo-EM images) are principal component analysis (PCA) (Tagare et al., 2015) and normal mode analysis (NMA) (Cui & Bahar, 2006). The three models (PCA, NMA and 3D Zernike) claim to be bases for continuous movements. However, as will be clarified below, they define bases of different mathematical entities.
PCA considers a volume of size N3 voxels as a vector in . Due to the continuous heterogeneity and the uncertainty in the 3D reconstruction process, the reconstructed map can be considered as the mean of a set of other vectors (maps) whose projections are acquired by the microscope. If we consider the covariance matrix associated with that set of maps (a matrix of size N3 × N3), then the principal components form a basis (if the covariance matrix is not degenerate) in which the set of maps can be linearly expressed. The PCA approach approximates the deformed volume by a linear combination of volumes (the principal directions),
where Vn are the eigenvolumes of the PCA decomposition. The undeformed model is then obtained by subtracting the appropriate amount of each of the eigenvolumes,
Due to the low-frequency nature of the PCA principal directions (Sorzano & Carazo, 2021), the undeformed volume is necessarily of low resolution.
In our model, we assume that any deformed volume V2 can be undeformed by applying gL,
Zernike polynomials provide a basis for gL(r), not the volumes. Our model revolves around the location of the voxel (which implies a nonlinear relationship between V1 and V2), providing an intrinsically better handling of the local characteristics of the map, while in PCA there is a linear model at the level of the volumes themselves (not their internal coordinates).
Our approach has another potential advantage over the PCA model: it can easily be applied to atomic structures fitted into V1. For any given atom in the atomic structure at a position r1, that is defined in the same coordinate system as V1, we simply have to move it to the location r1 + gL(r1).
In NMA, volumes are approximated by a set of P pseudoatoms with weights cp and basis function b(r) located at the locations rp (Jonić & Sorzano, 2016),
NMA is based on a second-order Taylor approximation of the energy landscape of the macromolecule, starting with a description of the interactions between the pseudoatoms. This is typically treated using an elastic network model where pseudoatoms within a distance criterion are connected by harmonic springs (Bahar et al., 2010). The associated Hessian is of size 3P × 3P and the normal modes are its eigenvectors (sorted by increasing eigenvalue) and a basis of the space. Let us call the kth normal mode, and the part of the normal mode corresponding to the pth pseudoatom. To deform V2 to make it similar to V1 we consider the first K normal modes with different weights αk,
Similar to our method, NMA acts by displacing the location of the pseudoatoms (our model acts by displacing the location at which we must interpolate V2). However, an advantage of our new method with respect to NMA is that the NMA deformation is only known at the location of the pseudoatoms, while our new method is fully defined within the sphere containing the macromolecule. In this way, the NMA would be a discretized version of the underlying continuous deformation field, while 3D Zernike polynomials would be an estimate of that continuous field.
Summarizing, each of the methods described so far (PCA, NMA and 3D Zernike polynomials) has a basis in different mathematical entities (vectors in , or the set of square integrable functions defined within the sphere of a given radius). 3D Zernike polynomials have the advantage that they are defined for every point in the macromolecule (as opposed to NMA) and the undeformed volumes do not lose resolution (as opposed to PCA).
Elastic deformations have also become popular for the alignment of frames within a movie (Abrishami et al., 2015; Tegunov & Cramer, 2019; Zheng et al., 2017). Although they have not been explicitly used to deform volumes, one could envision that they could be easily extended to three dimensions. This would certainly be a possible approach and we earlier used cubic splines for this purpose (Sorzano et al., 2016). However, the basis used in this paper, which is defined exclusively within a sphere, is more appropriate for the task at hand (describing a function whose support is fully contained within that sphere) than for a more generic set of functions that constitute a basis of functions defined within a cube. This `greater appropriateness' translates into requiring fewer coefficients to express the same deformation field to the same level of accuracy.
2.3. Distances between a set of maps
In most practical cases, the number of states that can be reconstructed by cryo-EM SPA is larger than two, which naturally implies the generalization of the case presented above to a number of pairwise operations capturing the different structural relationships among the set of maps under consideration. This information is summarized in a graph known as a structure map (Sanchez Sorzano et al., 2016) or conformational landscape (Zhang et al., 2021b), which represents each conformation as a point in conformational space. The closer two points are in the structure map, the more similar they are.
By estimating the Zernike polynomial deformation for all possible pair combinations in a set of N cryo-EM maps, a distance matrix can be computed in which we measure how far two cryo-EM maps are from each other. The deformation field between the two cryo-EM maps gL(r) provides a mechanism for calculating such a distance. For instance, we may define the distance between two cryo-EM maps V1 and V2 as the sum of the lengths of the deformations at each point,
Besides equation (9), there are additional sensible ways of defining the distance between two cryo-EM maps. One of them consists of measuring the correlation between V1 and V2 once V2 is undeformed to resemble V1,
where ρ is Pearson's correlation coefficient.
By comparing all cryo-EM maps, we would construct a matrix of the distances of all versus all maps.
It is worth mentioning here that, in order to get accurate comparison measurements, it is desirable to have a set of cryo-EM maps with similar characteristics. In particular, it is important to filter the maps in the set so that all their resolutions match the lowest value present in the data set. In this way, the structure mappings and distance matrices will not be affected by resolution changes, leading to a more meaningful projection of the different maps in the low-dimensional space resulting from the application of this method.
2.4. Embedding of conformations using multiple multidimensional scaling
Once we have the above-mentioned distance matrix, we may use multidimensional scaling (MDS) (Härdle & Simar, 2012) to find points in a low-dimensional space of dimension p (typically p = 2 or p = 3 for ease of representation) such that the distances between points in the low-dimensional space represent in some form the distances between the cryo-EM maps in the full dimensionality space [e.g. equation (9)]. For a detailed description of MDS, see Härdle & Simar (2012). If we have N cryo-EM maps to compare, let us refer to the matrix collecting all the points in the low-dimensional space as X1 [, that is, the set of cryo-EM maps of size N × p]. The subscript 1 indicates that we used d1 to perform the low-dimensional mapping.
If instead of equation (9) we use equation (10), then this would give us another MDS representation X2. While the distance d1 concentrates on the amount of deformation required to transform V1 into V2, d2 describes the distance between V1 and V2 after applying the inverse deformation to V2.
We could similarly conceive other strategies to measure the distance between any pair of cryo-EM maps V1 and V2. None of them should necessarily be better than the others, since each one addresses the problem from a different perspective. In this regard, it is impossible to favour any one of the different metrics without a specific task to accomplish. However, it is still sensible to combine the different mappings induced by each one of the distances as a way of producing a single summary of all their information. For the task of producing such a summary, we propose to construct a combination of the embeddings that minimizes the of the result, understanding that the is reduced when more order is found.
At this point and following the aforementioned idea, we may want to combine all those low-dimensional mappings into a single set of points to summarize the relative distances derived from each distance definition. For doing so we have found useful the following procedure that we call multiple multidimensional scaling:
(i) We take one of the mappings as reference, for instance, X1.
(ii) We look for the affine transformation Ti that minimizes the Frobenius norm [for an arbitrary matrix A, its Frobenius norm is defined as ] between each Xi transformed mapping and the reference mapping (since the MDS mappings of different distances, performed in an independent way, normally result in mappings of different scales, central locations, rotations and mirrors),
For convenience of notation, let us define T1(X1) = X1.
(iii) The consensus mapping is constructed as the convex combination of all transformed mappings (the determination of the specific αi coefficients for the combination will be addressed in the following step),
with the constraints αi ≥ 0 and (with these constraints Xα is said to be a convex combination of the input matrices). Note that the jth row of the matrix Xα (referred to as xj, α) indicates the position of the jth cryo-EM map in the low-dimensional space (whose dimension is p). For each one of the consensus candidates we associate the probability density function
where Gσ is a p-multivariate spherical Gaussian whose covariance matrix is σ2I {in our experiments, we chose , where range(Xi) is the difference between the maximum and minimum values of any of the components of the mapped vectors}.
(iv) Since the best combination of coefficients αi is not known beforehand, each possible convex combination has to be analysed. The criterion followed was to look for the convex combination that minimized the Shannon of the probability density function defined above,
The rationale is that we are looking for the convex combination that brings maximum order to the low-dimensional mapping.
We observe that the procedure described above normally finds a good balance between the properties of the different low-dimensional mappings, resulting in well structured summaries.
3. Results
This algorithm has been implemented in Xmipp (de la Rosa-Trevín et al., 2013) and it is available through Scipion (de la Rosa-Trevín et al., 2016) under the protocols named volume deform - Zernike3D and struct map - Zernike3D.
We performed some tests with a pair of maps to compare these two implementations to analyse the performance improvement. The maps used for the tests had dimensions of 250 in X, Y and Z, leading to averaged execution times of 1 h and 20 min (CPU) and 39.5 s (GPU). The tests were performed with an Intel i7-9750H and a Nvidia 2060 with Cuda 10.1, respectively.
3.1. Experiment 1: cryo-EM maps of the human mitochondrial ribosome
We first tested our approach using a small data set covering a range of conformational states of a human mitochondrial ribosome (Amunts et al., 2015), as previously described by Sanchez Sorzano et al. (2016). To check whether the structure map suggested two independent (pre-translocation and post-translocation) states following different conformational transitions as found in our previous study (Sanchez Sorzano et al., 2016), we applied the methodology described above with N = 3 and L = 2 (the maximum allowed degrees for the Zernike polynomials and the spherical harmonics, respectively). As expected, the structure map indeed suggests two independent arrangements following their own conformational transitions, grouped as red and blue dots in Fig. 1 (the black line segments joining the dots are just provided to enhance visualization). We thus conclude that the new approach is capable of reproducing the results of previous supervised methods that perform similar analyses and accurately groups the seven cryo-EM structures [indicated by their EMDB (Electron Microscopy Data Bank, https://www.ebi.ac.uk/emdb/) identification numbers] into two groups of conformers, each representative of a different functional state.
Additionally, the new approach also allows for the local decomposition of the deformation field into local strains and local rotations, as was done by Sorzano et al. (2016). The representation of these two components is shown in Fig. 2 for one of the pairs of ribosomes (EMDB entries 1720 and 1723). In addition, Video 1 in the supporting information shows the conformational changes described by these two maps. For this video, we coloured the ribosomes using the rotation component represented in Fig. 2 to simplify their comparison. According to this analysis, the rotations appear to be distributed through the whole structure of the ribosome, although the larger rotations (shown in red) are mostly found in the small subunit. Similarly, the strains are mainly localized in the small subunit and appear to be less distributed. This reveals that the basis is capable of deforming in a localized fashion, leading to a better description and identification of the different movements that define the transition between the two conformations. It is also possible to see that we are obtaining results comparable with those found by Sorzano et al. (2016), with the advantage of having all these analyses unified in the same framework, which implies an overall simplification leading to more complete studies.
3.2. Experiment 2: trajectory recovery of the CCT complex
Our next experiment is aimed at characterizing the ability of the method to recover the sequence of events present in a set of conformations defining a certain trajectory in conformational space. Such conformations can be created computationally by taking advantage of biophysical methods such as ) and normal mode analysis (NMA) (Bahar et al., 2010), simulating the movements defining a transition between two conformations. In this case, we used a trajectory from a recent study (Zhang et al., 2021b), which was generated using a purely NMA-based approach called the adaptive anisotropic network model (adaptive ANM; Yang et al., 2010) implemented in ProDy (Zhang et al., 2021a). This gave us 30 different models along an open–closed transition of the mammalian chaperonin CCT complex between two atomic models derived from a previous cryo-EM study (Cong et al., 2012), taken from the Protein Data Bank (PDB) (wwPDB Consortium, 2019), as described by Zhang et al. (2021b). The starting structure with one ring open and one ring closed (PDB entry 4a0w) (Cong et al., 2012) corresponded to an ATP-bound state and the target structure with both rings in an intermediate conformation (PDB entry 4a13) (Cong et al., 2012) corresponded to the ADP-bound state, allowing us to explore the conformational changes triggered by ATP hydrolysis. In the adaptive ANM method, all steps are based on coarse-grained normal modes calculated using the anisotropic network model (ANM) (Atilgan et al., 2001; Doruker et al., 2000; Eyal et al., 2006), providing coordinate changes for Cα atoms only. At each step, normal modes were selected that had the highest directional overlap (correlation cosine) with the deformation vector between the current conformation and the target structure up until the sum of the squared overlaps exceeded a threshold of 0.4. The contribution of each mode to the deformation was chosen so as to take 20% of the maximum provided by the unnormalized dot products (a scaling factor of 0.2) so as to avoid unphysical deformations while maintaining efficiency. The normal modes were recalculated until the root-mean-square deviation (r.m.s.d.) from the target structure fell below 1 Å, resulting in a total of 30 steps. Each step recruited a larger number of modes and had a smaller total size as the required deformation became less cooperative and more local (see Video 2). We focus our discussion on the ring that goes from open to intermediate–closed for simplicity.
simulations (MD) (Adcock & McCammon, 2006We then transformed these atomic structures into Coulomb potential maps using the electron atomic scattering factors (EASFs) as described in previous work (Sorzano et al., 2015). Fig. 3 shows the structure maps recovered after applying our methodology. We can see that the sequential order of the 30 intermediate conformers along the trajectory was successfully recovered by our approach. The direction, however, is arbitrary and in this case the start of the trajectory was numbered as conformer 30 and the end as conformer 1.
With this example, we additionally illustrate the distinct MDS mappings obtained when the distances d1 [amount of deformation, Fig. 3 (top)] and d2 [similarity after deformation, Fig. 3 (middle)] are used. Although the trajectory was successfully recovered by both distances, the correlation distance d2 was slightly more accurate in this case. The reason is that most of the changes between the structures at the end of the transition (labelled 1 to 13 by the algorithm) are high-frequency movements (i.e. movements of loops or small α-helices and β-sheets) that cannot be fully captured by the Zernike 3D basis with N = 3 and L = 2 (although larger N and L would allow one to express these small-detail movements, they would also increase the computational cost). Fortunately, the consensus mapping [Fig. 3 (bottom)] is able to identify the existence of high-frequency movements and gives more weight to the d2 mapping (correlation distance) automatically, resulting in an almost exact recovery of the volume sequence along the trajectory.
At least in this case we can conclude that d1 is very good for describing the low-frequency movements (e.g. C23–C30), while d2 is very good for characterizing the high-frequency differences (ca C1–C13), and both perform well in the intermediate-frequency regime. Depending on whether our set of input maps are related by large or small movements, one distance or the other will be better suited to capturing the overall set of relationships. The consensus mapping will thus analyse both mappings and automatically determine the optimal weight that results in a low-dimensional mapping that can be readily interpreted.
3.3. Experiment 3: comparison of atomic models and cryo-EM maps from the rabbit ryanodine receptor RyR1
In the following example, we explored the possibility of matching (pseudo/simulated) cryo-EM maps derived from atomic models with experimental 5tam, 5tau, 5taz, 5tb4 and 5t9n).
maps in the same low-dimensional space. For this purpose we selected five experimental cryo-EM maps deposited for the ryanodine receptor 1 (RyR1) from rabbit (EMDB entries 8379, 8385, 8390, 8395 and 8373) and their respective atomic models in the PDB (PDB entriesFirst, we converted the atomic models into density maps using EASFs, as described in the previous section. Then, to make the cryo-EM maps and atomic models comparable, we also filtered all volumes in the analysis to a common resolution (specifically, to the lowest of the reported resolutions of the cryo-EM maps). Note that without applying this low-pass filter the minimization process of equation (1) might not reach a meaningful minimum. Finally, we applied the method presented in this work to this combined data set.
Our results, shown in Fig. 4, report the main difficulties that appear when mixing simulated and experimental cryo-EM maps. While the structure map based on d1 (the distance based on the amount of deformation) illustrates that many pairs are correctly placed together, the structure map based on d2 (the distance based on the similarity after undeforming) discriminated between maps derived from atomic models and maps coming from cryo-EM experiments. However, the point of this example was to intermix maps from different origins, so discrimination by origin was to be minimized, requiring a further adjustment to our approach. To tackle this problem, we extended our methodology by analysing separately the sub-blocks of the distance matrix including only atomic or only cryo-EM maps [see Fig. 5 (top)]. We thus performed the MDS of each one of the sub-blocks independently, obtaining the low-dimensional mappings XAA and XCC (the subscript indicates whether it corresponds to atomic/computational or cryo-EM/experimental maps). These two low-dimensional mappings were the input into the consensus procedure described in Section 2.4. Focusing on the consensus, we can see that the information provided by the two mappings XAA and XCC is combined into two different trajectories corresponding to each dimension in the distance matrices (simulated and experimental cryo-EM maps) that show a similar distance relationship among their points, illustrating that both trajectories correspond to the same states of RyR1. Therefore, the counterpart of each other, and their relative distances/positions, are retained [Fig. 5 (bottom)].
3.4. Experiment 4: application of the deformation field to atomic models of the CCT complex
We described our deformation field gL as a function that deforms V1 to let it become similar to V2, that is, as we have done in previous cases, acting only on two cryo-EM maps. However, since the deformation field is defined in the coordinate system of V1, it can also be applied to atomic models defined in the same coordinate system and not only to maps. In this way, we can also deform an atomic model defined for V1 and use it as starting point for a model of V2. Obviously, since the new atomic model defined in the coordinate system of V2 has been constructed purely based on geometrical considerations, all the stereochemical constraints have to be further imposed.
An example of an atomic model deformed following the previous procedure is presented in Fig. 6. The example was taken from the same data set as used in Experiment 2, which shows an open–closed transition of the CCT complex. The figure illustrates how the deformation applied to the atomic model of the open conformation results in an approximation to the closed conformation. Naturally, we can now compare this deformed model representing the closed conformation with the one obtained directly from the experimental map of the closed conformation. The r.m.s.d. (computed with ChimeraX) between these two models was 5.29 Å, certainly high, but substantially reduced compared with that between the open–closed models without applying any deformation, which was 7.90 Å. This r.m.s.d. reduction suggests that the deformation applied is appropriately reproducing a conformational change in the right direction, from open towards the closed state.
However, the overall scores obtained for the two deformed structures still show a high value, as many stereochemical features are not taken into account when computing the deformations. In order to improve the geometry of the deformed structures, we applied a real-space Phenix software (Liebschnerm et al., 2019) with the default parameters] to the predicted structures using their respective electron-density maps. After this the r.m.s.d. value measured before decreased further to 4.52 Å. As a conclusion, the combination of deformation and of atomic structures enables us to achieve predictions of different structure conformations on the path between two end points, suitable for performing further studies, though there is clearly room for improvement. For example, refining in between smaller deformations could be of benefit, e.g. in hybrid simulations methods where local refinement/simulation complements global deformations (Krieger et al., 2020).
[executing4. Conclusions
The development of automatic algorithms to study continuous flexibility presented in this work results in simplified yet precise procedures, avoiding the need for user interference with the software and increasing the reproducibility of the results. It is also a significant step forward with respect to approaches like PCA and NMA of cryo-EM maps, avoiding the need to select components or modes and producing localized analysis.
The way this new approach works is by defining a new 3D basis where all deformation occurs. It is conceptually similar to the Fourier transform. The movements defining a transition between two different conformational states are decomposed into different components (that can be regarded as low-, medium- and high-frequency movements). Those components will depend on the degree of the basis used in the calculations. The displacements needed along each different component to minimize the distances between two electron-density maps are stored in a series of deformation coefficients αl, n, m, which can be further analysed to obtain the local strains and rotations undergone by the macromolecules during conformational transitions. The new approach thus unifies two of our previous developments (NMA and strain/rotation component extraction) for the analysis of continuous heterogeneity.
Apart from the information extracted from the deformation coefficients, our method allows for the definition of a distance measure based on the deformed electron-density maps, which is useful for building distance matrices. These distance matrices can be used afterwards to recover structure mappings that show the structural relationships existing among the diverse conformational states. Different definitions of the distance measures may focus on different aspects of the comparison. For this reason, we have devised a new procedure to combine several low-dimensional mappings into a single consensus mapping based on a minimum
criterion that tends to produce well ordered low-dimensional mappings and outperforms the results obtained by individual distance metrics.The possibility of converting atomic models back to electron densities opens the possibility of a combined analysis on maps and models in the same conformational space. An illustrative example has been provided in Experiment 4, where cryo-EM maps, together with their respective structural models between two end points, have been represented in the same space as a set of experimental cryo-EM maps.
In the future, it may be interesting to explore alternative bases for the deformation field and the distance between volumes (like the Wasserstein distance).
APPENDIX A
3D real-valued generalized Zernike polynomials
In this section we discuss the functions that we use as basis functions of the deformations in the unit ball B. We use the generalized Zernike polynomials defined on the 3D ball; in Appendix B1, we briefly review the relation of this basis to the better-known 2D form of Zernike polynomials.
In general, the expansion of any real valued function g(r) ∈ L2(B) in this basis is defined by the formula
where αl, n, m are real-valued coefficients, and Zl, n, m(r) are the 3D real-valued (normalized) generalized Zernike polynomials defined by the formula
where r is the radial component of the 3D coordinate r, θ and ϕ are its polar and azimuthal angles, respectively, in spherical coordinates, n and l are non-negative integers, and m is an integer such that −l ≤ m ≤ l. We refer to l as the spherical frequency. ylm is the real-valued spherical harmonic defined by the formula
where P ml are the associated Legendre polynomials defined by the formula
The real and imaginary parts of the complex-valued spherical harmonics are available in standard textbooks such as that by Abramowitz & Stegun (1966). For completeness, Table 1 shows these spherical harmonics in Cartesian coordinates. Before defining the normalized generalized radial Zernike polynomials denoted by above, we define the (unnormalized) generalized radial Zernike polynomials Rl,n1 as follows:
where are the Jacobi polynomials,
The definitions and properties of the standard associated Legendre polynomials and Jacobi polynomials are available, inter alia, in the book by Abramowitz & Stegun (1966).
Finally, while the radial polynomials are orthogonal (with the appropriate norm), they are not orthonormal. This is easily corrected by replacing the radial polynomials with the normalized generalized radial Zernike polynomials, denoted by ,
The parameter p is associated with the choice of inner product and the dimensionality of the balls; in the case of a 3D ball, the natural choice of p is p = 1, which yields the basis in (16) which is orthonormal in the natural inner product on L2(B). We give in Table 2 the explicit list of radial functions that we use.
We recognize that our choice of basis functions for the expansion is certainly not the only possible choice. We used the Zernike polynomials (in the generalized form presented here) to obtain an expansion of functions in a ball which do not vanish at the boundaries. Our use of Zernike polynomials also yields a basis that is closed under rotations. The graphical representation of some components of the basis is shown in Fig. 7.
APPENDIX B
Properties and relationships of this basis
B1. Properties of the polynomials involved – Zernike polynomials
We note that slightly different definitions and normalization are used in different sources; the most commonly used form of Zernike polynomials is associated with 2D functions on the unit disc, whereas we are interested in 3D functions on the unit ball. The better-known traditional radial Zernike polynomials, denoted here by , are a special case of the generalized radial Zernike polynomials Rn,l p (x),
with if l − m is odd or if m > l. The definition of the 3D real-valued Zernike polynomial [equation (16)] is analogous to the definition of the traditional 2D Zernike polynomials Zlm on the unit disc,
Unfortunately, the common notation for the 2D and 3D cases can be misleading: the parameter m here plays the role of the parameter l in the analogous 3D case; the parameter m in the 3D case is related to the existence of both sine and cosine for each m here, but does not otherwise have an immediate counterpart here.
Some properties of these generalized Zernike polynomials, including the higher-dimensional cases, are discussed in further detail by Slepian (1964), Serkh (2015), Greengard & Serkh (2018) and Lederman (2017).
The radial Zernike polynomials in equation (19) are orthogonal with respect to the inner product,
so that if n1 ≠ n2. Note that they are not necessarily orthogonal for different l, that is, is not 0, in general. It follows that the Zernike polynomials Zl, n, m (16) are orthogonal (across all different combinations of n, l and m) on the natural inner product on the unit ball.
B2. A remark on numerical evaluation
As is the case with many orthogonal polynomials, the direct computation using the explicit sum of monomials is generally unstable and not recommended in numerical computation. However, in this work, since we truncate the polynomials at low n, the explicit form has been found experimentally to be sufficiently stable. For additional details on computation see Lederman (2017).
B3. Closure under rotations
We recall that we use basis functions defined in equation (15), which are composed of a radial component and an angular component. Furthermore, we truncate the expansion in equation (2) such that if Zl, n, m(r) is in the expansion, then is also in the expansion for any −l ≤ m, m′ ≤ m. As is well known, the rotation of the frame of reference of spherical harmonics of a given spatial frequency l is a unitary operation and closed rotations. If follows that the linear combination of is closed under rotations. In other words, regardless of the frame of axis we choose for our spherical harmonics, we can represent the same functions using our choice of basis.
We recall that the deformation field gL(r) is a 3D vector defined in equation (2) at any point r. If we rotate the axes x, y and z we must rotate the vector gL(r) to obtain a vector in the new coordinate system. As is well known,
where A is the appropriate unitary rotation matrix.
It follows that the basis we have chosen is closed under rotations; regardless of the orientation of the frame of reference we choose (but depending on the position of the centre), we can represent the same fields. Furthermore, the transformation between frames of reference is unitary. We note that, since the volume is defined on the grid, the problem definition is not entirely closed under rotations, although the definition of the field is closed under rotations.
APPENDIX C
Complex-valued Zernike 3D basis
We may extend our basis function to a complex-valued Zernike 3D basis. This second basis uses spherical harmonics, which are well known basis functions for functions defined over the surface of a unit sphere. For doing so, we should define [see equation (16)],
where are the standard spherical harmonics,
It is well known that the spherical harmonics are a complete orthonormal basis on the surface of the unit sphere such that
where Ω is the surface of the unit sphere.
In this new basis the expansion would be expressed as
The relationship between the expansion that uses the real-valued basis functions and the expansion that uses the complex-valued basis functions is
and for the basis functions
Using the fact that the radial Zernike polynomials are a complete radial orthonormal basis, and the fact that spherical harmonics are a complete orthonomal basis on the the surface of a sphere, one can show that the generalized Zernike polynomials are a complete orthonormal basis of function on the unit ball with respect to the natural L2 norm on the unit ball.
Suppose that r is a 3D coordinate of a point in real space and R is a 3D frequency. Then the Fourier transform of the complex-valued basis function would be
where 〈r, R〉 is the usual inner product in , R is the modulus of R and Jα(x) is the Bessel function of the first kind and order α.
Supporting information
Video 1. DOI: https://doi.org/10.1107/S2052252521008903/eh5012sup1.mp4
Video 2. DOI: https://doi.org/10.1107/S2052252521008903/eh5012sup2.avi
Funding information
The project that gave rise to these results received the support of a fellowship from `la Caixa' Foundation (No. LCF/BQ/DI18/11660021). This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement (No. 713673) and the European Regional Development Fund Project `CERIT Scientific Cloud' (No. CZ.02.1.01/0.0/0.0/16_013/0001802). This work was also partially supported by the National Institutes of Health (P41GM103712 to IB), by a MolSSI Seed Software Fellowship (to JK) and by the NIH/NIGMS (No. 1R01GM136780-01 to RRL). Funding is also acknowledged from the Ministry of Science and Innovation through grants: SEV 2017-0712, PID2019-104757RB-I00/AEI48/10.13039/501100011033; the `Comunidad Autonoma de Madrid' through grant: S2017/BMD-3817; and the European Union (EU) and Horizon 2020 through grant: HighResCells (ERC - 2018 - SyG, Proposal: 810057) and iNEXT-Discovery (Proposal: 871037).
References
Abramowitz, M. & Stegun, I. A. (1966). Handbook of Mathematical Functions. New York: National Bureau of Standards. Google Scholar
Abrishami, V., Vargas, J., Li, X., Cheng, Y., Marabini, R., Sorzano, C. S. & Carazo, J. M. (2015). J. Struct. Biol. 189, 163–176. Web of Science CrossRef PubMed Google Scholar
Adcock, S. A. & McCammon, J. A. (2006). Chem. Rev. 106, 1589–1615. Web of Science CrossRef PubMed CAS Google Scholar
Amunts, A., Brown, A., Toots, J., Scheres, S. H. W. & Ramakrishnan, V. (2015). Science, 348, 95–98. Web of Science CrossRef CAS PubMed Google Scholar
Atilgan, A. R., Durell, S. R., Jernigan, R. L., Demirel, M. C., Keskin, O. & Bahar, I. (2001). Biophys. J. 80, 505–515. Web of Science CrossRef PubMed CAS Google Scholar
Bahar, I., Lezon, T. R., Yang, L. W. & Eyal, E. (2010). Annu. Rev. Biophys. 39, 23–42. Web of Science CrossRef CAS PubMed Google Scholar
Carroni, M. & Saibil, H. R. (2016). Methods, 95, 78–85. Web of Science CrossRef CAS PubMed Google Scholar
Cong, Y., Schröder, G. F., Meyer, A. S., Jakana, J., Ma, B., Dougherty, M. T., Schmid, M. F., Reissmann, S., Levitt, M., Ludtke, S. L., Frydman, J. & Chiu, W. (2012). EMBO J. 31, 720–730. Web of Science CrossRef CAS PubMed Google Scholar
Cui, Q. & Bahar, I. (2006). Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems. Boca Raton: CRC press. Google Scholar
Doruker, P., Atilgan, A. R. & Bahar, I. (2000). Proteins, 40, 512–524. Web of Science CrossRef PubMed CAS Google Scholar
Eyal, E., Yang, L. W. & Bahar, I. (2006). Bioinformatics, 22, 2619–2627. Web of Science CrossRef PubMed CAS Google Scholar
Greengard, P. & Serkh, K. (2018). arXiv:1811.02733. Google Scholar
Härdle, W. K. & Simar, L. (2012). Multidimensional Scaling, pp. 397–412. Heidelberg: Springer. Google Scholar
Jonić, S. & Sanchez Sorzano, C. O. (2016). IEEE J. Sel. Top. Signal. Process. 10, 161–173. Google Scholar
Krieger, J. M., Doruker, P., Scott, A. L., Perahia, D. & Bahar, I. (2020). Curr. Opin. Struct. Biol. 64, 34–41. Web of Science CrossRef CAS PubMed Google Scholar
Lederman, R. R. (2017). arXiv:1710.02874. Google Scholar
Liebschnerm, D., Afonine, P. V., Baker, M. L., Bunkóczi, G., Chen, V. B., Croll, T. I., Hintze, B., Hung, L. W., Jain, S., McCoy, A. J., Moriarty, N. W., Oeffner, R. D., Poon, B. K., Prisant, M. G., Read, R. J., Richardson, J. S., Sammito, M. D., Sobolev, O. V., Stockwell, D. H., Terwilliger, T. C., Urzhumtsev, A. G., Videau, L. L., Williams, C. J. & Adams, P. D. (2019). Acta Cryst. 65, 861–877. Google Scholar
Provencher, S. W. & Vogel, R. H. (1988). Ultramicroscopy, 25, 209–221. CrossRef CAS PubMed Web of Science Google Scholar
Rosa-Trevín, J. M. de la, Otón, J., Marabini, R., Zaldívar, A., Vargas, J., Carazo, J. M. & Sorzano, C. O. S. (2013). J. Struct. Biol. 184, 321–328. Web of Science PubMed Google Scholar
Rosa-Trevín, J. M. de la, Quintana, A., del Cano, L., Zaldívar, A., Foche, I., Gutiérrez, J., Gómez-Blanco, J., Burguet-Castell, J., Cuenca-Alba, J., Abrishami, V., Vargas, J., Otón, J., Sharov, G., Vilas, J. L., Navas, J., Conesa, P., Kazemi, M., Marabini, R., Sorzano, C. O. S. & Carazo, J. M. (2016). J. Struct. Biol. 195, 93–99. Web of Science PubMed Google Scholar
Sanchez Sorzano, C. O., Alvarez-Cabrera, A. L., Kazemi, M., Carazo, J. M. & Jonić, S. (2016). Biophys. J. 110, 1753–1765. Web of Science CAS PubMed Google Scholar
Schur, F. K. M. (2019). Curr. Opin. Struct. Biol. 58, 1–9. Web of Science CrossRef CAS PubMed Google Scholar
Serkh, K. (2015). arXiv:1811.02733. Google Scholar
Slepian, D. (1964). Bell Labs Technical J., 43, 3009–3057. CrossRef Web of Science Google Scholar
Sorzano, C. O. S. & Carazo, J. M. (2021). Acta Cryst. D77, 835–839. Web of Science CrossRef IUCr Journals Google Scholar
Sorzano, C. O. S., Jiménez, A., Mota, J., Vilas, J. L., Maluenda, D., Martínez, M., Ramírez-Aportela, E., Majtner, T., Segura, J., Sánchez-García, R., Rancel, Y., del Caño, L., Conesa, P., Melero, R., Jonic, S., Vargas, J., Cazals, F., Freyberg, Z., Krieger, J., Bahar, I., Marabini, R. & Carazo, J. M. (2019). Acta Cryst. F75, 19–32. Web of Science CrossRef IUCr Journals Google Scholar
Sorzano, C. O. S., Martín-Ramos, A., Prieto, F., Melero, R., Martín-Benito, J., Jonic, S., Navas-Calvente, J., Vargas, J., Otón, J., Abrishami, V., de la Rosa-Trevín, J. M., Gómez-Blanco, J., Vilas, J. L., Marabini, R. & Carazo, J. M. (2016). J. Struct. Biol. 195, 123–128. Web of Science CrossRef CAS PubMed Google Scholar
Sorzano, C. O. S., Vargas, J., Otón, J., Abrishami, V., de la Rosa-Trevín, J. M., Riego, S., Fernández-Alderete, A., Martínez-Rey, C., Marabini, R. & Carazo, J. M. (2015). AIMS Biophys. 2, 8–20. CAS Google Scholar
Tagare, H. D., Kucukelbir, A., Sigworth, F. J., Wang, H. & Rao, M. (2015). J. Struct. Biol. 191, 245–262. Web of Science CrossRef CAS PubMed Google Scholar
Tegunov, D. & Cramer, P. (2019). Nat. Methods, 16, 1146–1152. Web of Science CrossRef CAS PubMed Google Scholar
wwPDB Consortium (2019). Nucleic Acids Res. 47, D520–D528. Web of Science CrossRef PubMed Google Scholar
Yang, Z., Májek, P. & Bahar, I. (2010). PLoS Comput. Biol. 5, e1000360. Web of Science CrossRef Google Scholar
Zhang, S., Krieger, J. M., Zhang, Y., Kaya, C., Kaynak, B., Mikulska-Ruminska, K., Doruker, P., Li, H. & Bahar, I. (2021a). Bioinformatics, doi: 10.1093/bioinformatics/btab187. Google Scholar
Zhang, Y., Krieger, J., Mikulska-Ruminska, K., Kaynak, B., Sorzano, C. O. S., Carazo, J. M., Xing, J. & Bahar, I. (2021b). Prog. Biophys. Mol. Biol. 160, 104–120. Web of Science CrossRef CAS PubMed Google Scholar
Zheng, S. Q., Palovcak, E., Armache, J. P., Verba, K. A., Cheng, Y. & Agard, D. A. (2017). Nat. Methods, 14, 331–332. Web of Science CrossRef CAS PubMed Google Scholar
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.