 1. Introduction
 2. A normalized correlation coefficient as a proximity measure between threedimensional objects
 3. The superposition algorithm for the alignment of two heterogeneous objects
 4. Optimal resolution parameters for the NCC calculation
 5. Comparison of the performance of SUPCOMB and SUPALM
 6. Additional test cases for SUPALM alignments
 7. Discussion and conclusions
 References
 1. Introduction
 2. A normalized correlation coefficient as a proximity measure between threedimensional objects
 3. The superposition algorithm for the alignment of two heterogeneous objects
 4. Optimal resolution parameters for the NCC calculation
 5. Comparison of the performance of SUPCOMB and SUPALM
 6. Additional test cases for SUPALM alignments
 7. Discussion and conclusions
 References
research papers
Rapid automated superposition of shapes and macromolecular models using spherical harmonics
^{a}Laboratory of Reflectometry and SmallAngle Scattering, A.V. Shubnikov Institute of Crystallography of Federal Scientific Research Centre `Crystallography and Photonics', Russian Academy of Sciences, Leninsky prospekt 59, Moscow, 119333, Russian Federation, and ^{b}Hamburg Outstation, European Molecular Biology Laboratory, Notkestrasse 85, Hamburg, 22607, Germany
^{*}Correspondence email: svergun@emblhamburg.de
A rapid algorithm to superimpose macromolecular models in Fourier space is proposed and implemented (SUPALM). The method uses a normalized integrated crossterm of the scattering amplitudes as a proximity measure between two threedimensional objects. The reciprocalspace algorithm allows for direct matching of heterogeneous objects including high and lowresolution models represented by atomic coordinates, beads or dummy residue chains as well as density maps and inhomogeneous multiphase models (e.g. of protein–nucleic acid complexes). Using spherical harmonics for the computation of the amplitudes, the method is up to an order of magnitude faster than the realspace algorithm implemented in SUPCOMB by Kozin & Svergun [J. Appl. Cryst. (2001), 34, 33–41]. The utility of the new method is demonstrated in a number of test cases and compared with the results of SUPCOMB. The spherical harmonics algorithm is best suited for lowresolution shape models, e.g. those provided by solution scattering experiments, but also facilitates a rapid crossvalidation against structural models obtained by other methods.
Keywords: smallangle scattering; protein structure; model superposition; scattering amplitudes; SUPALM.
1. Introduction
Smallangle scattering of Xrays and neutrons (SAXS and SANS) is more and more actively employed for structural studies of biomacromolecular solutions, including hybrid modelling in combination with other structural methods (Svergun et al., 2013). The SAXS/SANS analysis methods provide threedimensional models of different nature and resolution, and comparisons between such heterogeneous models are often required for crossvalidation of structural results obtained by different techniques. The comparisons usually require automated bestmatching superposition of threedimensional structures. Superposition of heterogeneous models, which may not only be different in resolution and representations (atomic models, bead models, dummy residue chains, density maps, surfaces etc.) but also contain parts with significantly different density (e.g. nucleoprotein complexes), is not a trivial task.
We previously developed a program SUPCOMB (Kozin & Svergun, 2001) for matching high and lowresolution threedimensional structures, which uses a normalized spatial discrepancy (NSD) as a proximity measure between the objects. For every point (bead or atom) in the first model, the closest neighbouring point in the second model is found, and the same is done for all points in the second model. The squared proximal distances are added and normalized against the squared average distances between neighbouring points for the two models, yielding the NSD value for the given relative positions of the models. Starting from an inertiaaxes alignment, the algorithm minimizes the NSD and finds the bestmatching alignment of the structures. However, the CPU time used by SUPCOMB is proportional to the product of the number of points (beads or atoms) representing the two objects and for large macromolecular complexes the program becomes computationally expensive.
There are a number of fitting algorithms and programs in SITUS (Chacón & Wriggers, 2002), FRM (Kovacs et al., 2003), FOLDHUNTER (Jiang et al., 2001), COAN (Volkmann & Hanein, 2003), NORMA (Suhre et al., 2006) and ADP_EM (Garzón et al., 2007). All these methods use a realspace crosscorrelation coefficient as a proximity measure between the models, some of them (e.g. ADP_EM) utilizing a spherical harmonics expansion of the density function. Most algorithms employ an exhaustive search of the sixdimensional parameter space (three rotations and three translations) using different mathematical transformations to speed up the process (fast Fourier transforms and/or spherical harmonics), while some of them perform rapid local optimization algorithms making use of the score gradient (e.g. steepest ascent). The above methods are similar in performance and the accuracy of the results largely depends on the software implementation.
(EM) to fit highresolution models into lowerresolution electrondensity maps, such asHere, we propose an algorithm for a fast matching of macromolecular models based on the spherical harmonics representation of the scattering amplitudes in Fourier space. The method uses a normalized integrated crossterm of the scattering amplitudes as a proximity measure between two lowresolution threedimensional objects. The method is implemented in a program which is significantly faster than SUPCOMB and is directly applicable for comparisons of lowresolution shapes and heterogeneous models of different nature (atoms, beads, EM maps etc.).
2. A normalized as a proximity measure between threedimensional objects
For an arbitrary threedimensional object with scattering density ρ_{A}(r) in real space, the scattering amplitude A(s) can be represented using spherical harmonics as (Stuhrmann, 1970)
where the momentum transfer s = 4π sin (θ/2)/λ with θ the scattering angle and λ the wavelength, Ω is the solid angle in s = (s, Ω), and the truncation value L defines the resolution. For an object represented by N points (atoms, beads or density values) the partial amplitudes A_{lm}(s) are calculated as
where the sum runs over all elements with coordinates r_{j} = (r_{j}, ω_{j}) = (r_{j}, θ_{j}, φ_{j}) and f_{j}(s) are the corresponding form factors.
Owing to the orthogonal properties of the spherical harmonics, a spherically averaged scattering intensity I_{A}(s) (e.g. as measured in a smallangle experiment from an ensemble of randomly oriented particles) is expressed as the sum of individual multipole contributions:
Let us now consider two threedimensional objects with densities ρ_{A}(r) and ρ_{B}(r). The best superposition of these objects should maximize the integral correlation expressed as ∫[ρ_{A}(r) + ρ_{B}(r)]^{2} dr. Following Parseval's theorem (Parseval des Chênes, 1799) this integral is equal to the total intensity in
where I_{A}(s) and I_{B}(s) are the averaged scattering intensities of the objects A and B, respectively, and I_{AB}(s) denotes the crossterm. Note that the individual integrated intensities do not depend on the position and only the integral from the crossterm changes when the particles are moved/rotated. Using the spherical harmonics expansion [equations (1)–(3)], the three terms are readily calculated in terms of the partial amplitudes and a quantitative measure of the agreement between the two realspace threedimensional objects can therefore be expressed in as a normalized (NCC):
where A_{lm}(s) and B_{lm}(s) are the scattering amplitudes of the objects A and B, respectively. Here, all integrals are evaluated in a restricted angular range up to s = s_{m} defining the resolution of the objects. Theoretically (for an infinite upper limit of integration), NCC varies between 0 and 1, the latter value corresponding to an ideal overlap of two identical structures. Thus, NCC can serve as a convenient proximity measure and its maximization should allow one to obtain the best overlap of two objects.
3. The superposition algorithm for the alignment of two heterogeneous objects
Numerical minimization of a proximity measure between threedimensional heterogeneous objects with respect to the positional and rotational parameters is a nontrivial task. The target function may display multiple local minima, where the search algorithms may be trapped. Global minimization algorithms (e.g. simulated annealing) can overcome local minima but they are not computationally efficient. A practical approach to solving this problem is the use of a local minimization starting from prealigned positions. These can be obtained from the inertiaaxes matching procedure as implemented in SUPCOMB (Kozin & Svergun, 2001). Here, the principal axes of inertia are found for both objects as the eigenvectors of the inertia tensor matrix representing linear combinations of the second central moments of distribution around the mass centres. The two objects are set in canonical positions where they are origin centred and rotated so that their principal inertia axes, taken in ascending order of eigenvalues, are aligned along the X, Y and Z axes, respectively. Depending on whether structures are allowed or not, there are four or eight sign combinations of the eigenvalues.
Here, we employ the following algorithm for the superposition of two threedimensional objects (model A and model B):
(i) Inertia tensors and their eigenvectors are computed for both model A and model B.
(ii) Model B is rotated by M_{A}M_{B}^{T} and shifted by T_{B}–T_{A} to align its principal axes with those of model A (here, M_{A} and M_{B} are the rotation matrices of model A and model B, respectively, and T_{A} and T_{B} are the corresponding shifts of the centres of mass from the origin of the coordinate system). The signs of the columns of the rotation matrix M_{B} are selected out of four sign combinations (or eight, if enantiomorphs are allowed).
(iii) The scattering amplitudes of the two objects A_{lm}(s) and B_{lm}(s) are evaluated. Here, one can use CRYSOL (Svergun et al., 1995) for the highresolution atomic models or DAM2ALM [available in the ATSAS package (Petoukhov et al., 2012)] for lowresolution ab initio shapes. Options for the direct reading of electrondensity maps (EMD files, EMDataBank; https://emdatabank.org/index.html) or multiphase ab initio bead models [obtained by the program MONSA (Svergun & Nierhaus, 2000)] and calculation of their scattering amplitudes are also provided.
(iv) The position and orientation of model B are refined by minimizing the value of 1/NCC using a local minimization by the nonlinear leastsquares package NL2SOL (Dennis et al., 1981). The obtained best NCC value is reported as an estimate of proximity measure between the objects.
Steps (i) and (ii) are similar to the ones used in SUPCOMB; however, steps (iii) and (iv) are different and offer several advantages. First, the use of the precomputed amplitudes allows one to easily match heterogeneous objects and models of different nature. Second, the changes in B_{lm}(s) upon rotations and displacements of object B are rapidly calculated using the finite rotation elements matrix (Edmonds, 1957; Svergun, Volkov et al., 1997). Importantly, the scattering amplitudes are calculated only once and the computational cost of the algorithm does not depend on the complexity of the structures to be superimposed (in contrast to SUPCOMB, where the calculation time is proportional to the product of the numbers of elements in the two objects, N_{A}N_{B}).
4. Optimal resolution parameters for the NCC calculation
The method was implemented in a computer program called SUPALM, and its performance was tested in a number of test cases on various high and lowresolution models of biological macromolecules and compared with SUPCOMB. Most of the lowresolution models in Tables 1–3 are experimentally determined ab initio protein shapes reconstructed from SAXS patterns measured at the beamlines X33 and P12 (EMBL c/o DESY) using DAMMIN (Svergun, 1999), DAMMIF (Franke & Svergun, 2009) and GASBOR (Svergun et al., 2001). All these shapes are superimposed with the available crystal structures of these proteins taken from the appropriate Protein Data Bank (PDB) entries. To illustrate the performance of SUPALM on EM maps, a bead model generated directly from the experimental EMD map is superposed with the highresolution cryoEMderived model of the human gammasecretase (Sun et al., 2015). As an example of a heterogeneous assembly the model of the protein–RNA distribution in the 70S ribosome of Escherichia coli derived from Xray and neutron scattering data (Svergun, Burkhardt et al., 1997) is compared with the recent cryoEM structure of the ribosome (Mitra et al., 2006). As seen from Tables 1–3, for all these different objects the NSD values yielded by SUPCOMB agree well with the NSDs computed from the models provided by SUPALM. In Figs. 1–3 the template models and those superimposed by SUPCOMB/SUPALM are presented to illustrate the similarity of the results of the two programs. Note that, for the case of human gammasecretase where the largest NSD values are observed, the atomic resolution fragments of the cryoEM model were manually fitted inside the EMD map using Coot (Emsley & Cowtan, 2004) in the original publication (Sun et al., 2015).



The results of Table 2 indicate that, in all test cases, a maximum number of spherical harmonics of L_{max} = 5 was sufficient to ensure a goodquality superposition of the two objects. If a lower number of harmonics are used, the matching quality becomes significantly worse (as can be judged from the higher NSD values), whereas higher L_{max} values increase the computation times but do not significantly improve the overlap.
The angular data range employed to compute the integrals in equation (5) can also be optimally selected. As illustrated in Table 3, s_{m} values corresponding to seven Shannon channels N_{sh} (where N_{sh} = D_{max}s_{m}/π and D_{max} is the maximum size of the particle) are sufficient for a reliable positioning. The use of wider angular intervals does not significantly improve the NSD values but, again, slows down the calculations. On the basis of these results the default values for SUPALM are selected to be L = 5 and s_{m} = 7π/D_{max} (if needed, these values can be changed by the user from the commandline arguments).
5. Comparison of the performance of SUPCOMB and SUPALM
SUPALM with the default parameters is about 1.5–2 times faster than SUPCOMB for small macromolecules with molecular weights lower than 100 kDa (represented by about 10^{3} atoms). For large macromolecular complexes (e.g. 1 MDa, about 10^{5} atoms) SUPALM performs more than ten times faster compared to SUPCOMB. For superpositions of crystal structures with EM density maps the speed of SUPALM is of the same order of magnitude as that of the highperformance EM docking programs like SITUS or ADP_EM.
The superimposed models obtained by SUPCOMB and SUPALM are, as expected, not identical as the quantities to be minimized (NSD and 1/NCC, respectively) are different. Still, the overlaps in Figs. 1–3 and comparison of the NSD values in Table 1 indicate that, although the solutions by SUPALM reveal slightly higher NSD values, their positions essentially coincide with those given by SUPCOMB. Fig. 4 displays an example of NSD and 1/NCC contour profiles in the vicinity of a SUPALM solution. The plots of both functions display well defined minima, and both behave as analytical functions smoothly approaching the minimum values. Overall, in all the test examples, SUPALM provides practically the same results as SUPCOMB but runs much faster, especially for large macromolecular complexes.
We have also implemented an alternative procedure for speeding up the model superposition of SUPCOMB, using a different realspace metric for the proximity measure of threedimensional objects, the normalized overlapped volume (NOV). Here, both objects are voxelized on a cubic grid in real space and the maximization of NOV provides the best matching of two objects. We found that the performance of the accelerated SUPCOMB using this metric is comparable to that of SUPALM; however, the NOVbased superpositions are not sufficiently sensitive to the finer details of the objects. For complicated shapes, the use of NOV may yield worse NSD values compared to SUPALM and to the standard (NSDdriven) mode of SUPCOMB. The rapid NOV mode may still be useful for highthroughput superpositions of simple shapes and will therefore be offered to SUPCOMB users.
An important feature of SUPALM (not available in SUPCOMB) is the ability to work directly with electrondensity maps (EMD files) and with multiphase ab initio MONSA models of inhomogeneous particles (e.g. protein–RNA and/or protein–DNA complexes). In the case of EMD models, the threedimensional grid of voxels and the corresponding electron densities are directly used to calculate the scattering amplitudes; in the case of inhomogeneous bead models additional input files with the scattering density values for each bead phase (e.g. protein or DNA/RNA part of the complex) must be provided by the user. After the appropriate computation of the scattering amplitudes, the superposition runs automatically as in the plain bead or atomic model case.
6. Additional test cases for SUPALM alignments
The examples considered in Table 1 utilized ab initio shapes restored from the experimental data to illustrate similarity between SUPALM and SUPCOMB results; therefore no `true' positions of the structures were available, with SUPCOMB solutions used as reference. To test the ability of SUPALM to reconstruct `true' alignments, an additional series of model calculations were performed. First, lowresolution bead envelopes were generated from the highresolution structures in their reference positions. Then, the bead models were arbitrarily moved/rotated as rigid bodies and SUPALM was employed to reconstruct their initial position. In all cases presented in Table 1, the initial position was restored, with a r.m.s.d. of about 1 Å, demonstrating the ability of SUPALM to adequately find the best overlap. The small differences between the `true' and matched structures are within the resolution limit used in the calculations.
Another test was conducted to check the stability of SUPALM when superposing variable conformations. For this, an NMR ensemble of protein structures (PDB code 2ma0 containing ten conformers of the small protein UVI31+; Chary, Rout & Rao, unpublished work) was selected. The first model in the PDB file was taken as a reference and all conformers were superimposed onto it by SUPALM. Fig. 5 displays the original NMR ensemble (where the rigid parts overlap with each other but the flexible portions display great variability) and the SUPALM superpositions, where the differences are, as expected, more uniformly distributed between the structures, but still the common parts of the ensemble structures are well superposed.
As a rule of thumb, for SUPCOMB alignments, NSD values around unity indicate good similarity between the two objects, whereas significantly larger values point to poor similarity. For NCC, the cutoff value for a good overlap depends on the molecular size, shape complexity and model resolution used in the calculations. From our test examples, the lower limit for a good similarity for NCC is around 0.7–0.8 at the resolution level used (L = 5 and seven Shannon channels).
7. Discussion and conclusions
An algorithm for rapid automated superposition of high and lowresolution models was proposed and tested on a number of macromolecular structures to demonstrate the reliability of the method. Contrary to SUPCOMB (Kozin & Svergun, 2001) where the CPU time is proportional to the product of the number of points in the two objects, the performance of the reciprocalspace superposition depends solely on the maximum number of spherical harmonics L and on the angular data range [0, s_{m}] used to calculate the The default values of these parameters (L = 5 and seven Shannon channels in the range of integration) ensure a good quality of the superposition and at the same time provide a significant gain in the computing time against SUPCOMB. The gain is especially noticeable for large macromolecular structures where the Fourier space positioning becomes over ten times faster than in SUPCOMB.
The use of the normalized ), but contrary to the rotation function, we utilize only loworder harmonics at small angles to find the best overlap. Further, the number of harmonics in SUPALM can be defined by the user and it is not bound to a power of two as in the packages utilizing the fast Fourier transform technique (like SITUS, FRM or ADP_EM). We stress that the proposed method is not meant as an alternative to the mentioned EM packages, rather that the main aim of SUPALM is a rapid and convenient object matching at a level of the overall shape (e.g. using lowresolution models obtained from SAXS). Still, SUPALM permits the direct input and overlap of electrondensity maps (EMD files) and multiphase ab initio bead models (MONSA) and this significantly extends the applicability of the compared to SUPCOMB. SUPALM eliminates the need for intermediate steps (e.g. the transformation of the EMD file into a PDB model) and increases the accuracy of the superposition of heterogeneous structures (as the electron densities of each coordinate of the object are taken into account during the calculation of the scattering amplitudes, instead of using the simple approximation of a homogeneous particle). This option should facilitate a faster and more accurate comparison of SAXS models with those from complementary methods (such as EM, SANS, NMR and Xray crystallography).
based on the spherical harmonics representation employs a similar principle to the crystallographic fast rotation function method (Crowther, 1972The proposed method reliably operates with default parameters, does not require user input and is therefore applicable in automated pipelines. The procedure, implemented in a program module SUPALM included in the ATSAS package (https://www.emblhamburg.de/biosaxs/software.html), is freely available to academic users together with other programs from the ATSAS 2.7 release.
Acknowledgements
The authors acknowledge the support of the Bundesministerium für Bildung und Forschung (BMBF) project BIOSCAT, grant No. 05K20912, and of the European Commission, FP7 Infrastructures program grant BioStructX, project No. 283570.
References
Chacón, P. & Wriggers, W. (2002). J. Mol. Biol. 317, 375–384. Web of Science CrossRef PubMed CAS Google Scholar
Crowther, R. A. (1972). The Molecular Replacement Method, edited by M. G. Rossmann, pp. 173–178. New York: Gordon and Breach. Google Scholar
Dennis, J., Gay, D. & Walsh, R. E. (1981). ACM Trans. Math. Softw. 7, 348–368. CrossRef Google Scholar
Edmonds, A. R. (1957). Angular Momentum in Quantum Mechanics. Princeton University Press. Google Scholar
Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126–2132. Web of Science CrossRef CAS IUCr Journals Google Scholar
Franke, D. & Svergun, D. I. (2009). J. Appl. Cryst. 42, 342–346. Web of Science CrossRef CAS IUCr Journals Google Scholar
Garzón, J. I., Kovacs, J., Abagyan, R. & Chacón, P. (2007). Bioinformatics, 23, 427–433. Web of Science PubMed Google Scholar
Jiang, W., Baker, M. L., Ludtke, S. J. & Chiu, W. (2001). J. Mol. Biol. 308, 1033–1044. Web of Science CrossRef PubMed CAS Google Scholar
Kovacs, J. A., Chacón, P., Cong, Y., Metwally, E. & Wriggers, W. (2003). Acta Cryst. D59, 1371–1376. Web of Science CrossRef CAS IUCr Journals Google Scholar
Kozin, M. B. & Svergun, D. I. (2001). J. Appl. Cryst. 34, 33–41. Web of Science CrossRef CAS IUCr Journals Google Scholar
Lambright, D. G., Sondek, J., Bohm, A., Skiba, N. P., Hamm, H. E. & Sigler, P. B. (1996). Nature, 379, 311–319. CrossRef CAS PubMed Google Scholar
Marino, M., Zou, P., Svergun, D., Garcia, P., Edlich, C., Simon, B., Wilmanns, M., MuhleGoll, C. & Mayans, O. (2006). Structure, 14, 1437–1447. CrossRef PubMed CAS Google Scholar
Mitra, K., Schaffitzel, C., Fabiola, F., Chapman, M. S., Ban, N. & Frank, J. (2006). Mol. Cell, 22, 533–543. Web of Science CrossRef PubMed CAS Google Scholar
Parseval des Chênes, M.A. (1799). Mémoire sur les séries et sur l'intégration complète d'une équation aux différences partielles linéaire du second ordre, à coefficients constants. Paris. Google Scholar
Petoukhov, M. V., Franke, D., Shkumatov, A. V., Tria, G., Kikhney, A. G., Gajda, M., Gorba, C., Mertens, H. D. T., Konarev, P. V. & Svergun, D. I. (2012). J. Appl. Cryst. 45, 342–350. Web of Science CrossRef CAS IUCr Journals Google Scholar
Schubert, W.D., Urbanke, C., Ziehm, T., Beier, V., Machner, M. P., Domann, E., Wehland, J., Chakraborty, T. & Heinz, D. W. (2002). Cell, 111, 825. CrossRef PubMed Google Scholar
Stuhrmann, H. B. (1970). Acta Cryst. A26, 297–306. CrossRef IUCr Journals Web of Science Google Scholar
Suhre, K., Navaza, J. & Sanejouand, Y.H. (2006). Acta Cryst. D62, 1098–1100. Web of Science CrossRef CAS IUCr Journals Google Scholar
Sun, L., Zhao, L., Yang, G., Yan, C., Zhou, R., Zhou, X., Xie, T., Zhao, Y., Wu, S., Li, X. & Shi, Y. (2015). Proc. Natl Acad. Sci. USA, 112, 6003–6008. CrossRef CAS PubMed Google Scholar
Svergun, D. I. (1999). Biophys. J. 76, 2879–2886. Web of Science CrossRef PubMed CAS Google Scholar
Svergun, D., Barberato, C. & Koch, M. H. J. (1995). J. Appl. Cryst. 28, 768–773. CrossRef CAS Web of Science IUCr Journals Google Scholar
Svergun, D. I., Burkhardt, N., Pedersen, J. S., Koch, M. H. J., Volkov, V. V., Kozin, M. B., Meerwink, W., Stuhrmann, H. B., Diedrich, G. & Nierhaus, K. H. (1997). J. Mol. Biol. 271, 602–618. CrossRef CAS PubMed Google Scholar
Svergun, D. I., Koch, M. H. J., Timmins, P. A. & May, R. P. (2013). Small Angle Xray and Neutron Scattering From Solutions of Biological Macromolecules. Oxford University Press. Google Scholar
Svergun, D. I. & Nierhaus, K. H. (2000). J. Biol. Chem. 275, 14432–14439. Web of Science CrossRef PubMed CAS Google Scholar
Svergun, D. I., Petoukhov, M. V. & Koch, M. H. J. (2001). Biophys. J. 80, 2946–2953. Web of Science CrossRef PubMed CAS Google Scholar
Svergun, D. I., Volkov, V. V., Kozin, M. B., Stuhrmann, H. B., Barberato, C. & Koch, M. H. J. (1997). J. Appl. Cryst. 30, 798–802. Web of Science CrossRef IUCr Journals Google Scholar
Volkmann, N. & Hanein, D. (2003). Methods Enzymol. 374, 204–225. Web of Science CrossRef PubMed CAS Google Scholar
Zhang, X., Meining, W., Fischer, M., Bacher, A. & Ladenstein, R. (2001). J. Mol. Biol. 306, 1099–1114. Web of Science CrossRef PubMed CAS Google Scholar
This is an openaccess article distributed under the terms of the Creative Commons Attribution (CCBY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.