Theoretical and practical approaches to improve the performance of local correlation algorithms for volume data analysis and shape recognition

Several approaches are presented to improve the performance of local correlation algorithms based on prior information about 3D search and target maps.


Introduction
The goal of many cryogenic electron microscopy (cryo-EM) studies is to obtain an atomic model or models corresponding to the data and the molecular complexes imaged. Independent of the techniques used to set up data collection (for example single-particle EM or tilt-series tomography), we obtain sets of projections of an unknown 3D structure onto a digital 2D sensor. After much processing of the recorded images, a reconstruction algorithm provides the user with a 3D density map. In cryo-EM we image with electrons, so the map obtained is strictly an electron potential map. However, it is very similar to an electron-density map that would be obtained by X-ray crystallography, so we will call it a density map by analogy and for convenience. Many software packages have been developed by large teams worldwide to assist with the processing and reconstruction steps, in which all possible information about the experiment is used to determine the best quality density map; see the Collaborative Computational Project for Electron cryo-Microscopy (CCP-EM; Burnley et al., 2017).
In favourable cases cryo-EM has produced maps at genuine atomic resolution (1.2 Å; Nakane et al., 2020; Yip et al., 2020); however, 3-4 Å is more typical of a high-resolution result, and more maps in the lower resolution range between 4 and 8 Å are being published than high-resolution maps.
For higher resolution maps (<3.5 Å) it is possible to derive or build atomic models directly from the map, as individual atoms are resolved or the expected features of protein side chains can be seen and modelled (Emsley & Cowtan, 2004; Liebschner et al., 2019). The other case is when the 3D map is at a lower resolution and the features are not rich enough for a structure to be built directly into the map. Of course, there are intermediate situations in which some parts of the map are better than others. In some cases a reliable protein backbone trace can be defined, but most of the side chains are not identifiable.
When an atomic model cannot be built directly into the experimental map, whether manually or automatically, it is possible to build a model from larger known fragments (Roseman, 2000). In a typical case, domains or parts of the molecules might have been solved by X-ray crystallography or nuclear magnetic resonance, or they might be related by homology to parts of other structures that have already been solved. The problem to be solved is the docking of these molecular shapes into the larger map of the entire complex to assemble a model using 3D molecular-density docking. Ideally, objective and quantitative computational methods are needed, with provision for user interaction. At low resolution it may often be necessary to include some additional information to create an unambiguous model, such as constraints from chemical crosslinking or the known effects of mutations.
A complete search of all parameters to locate or dock a particular search density into a larger target map is computationally expensive and time-consuming. It is a large six-dimensional search covering three spatial dimensions and three orientation parameters.
Docking problems in electron microscopy (EM), i.e. locating a known density object optimally within a larger 3D density map, are similar to but different from molecular-replacement (MR) problems in X-ray crystallography. The maps produced by cryo-EM techniques are the final results, and no further atomic model-building steps are used to improve the maps. EM images contain phase information and therefore the resultant 3D maps have good-quality phases and amplitudes in Fourier space. However, in some recent programs EM maps are improved by sharpening based on local B-factor estimates computed from comparisons with a fitted model.
MR is a method to solve the phase problem in X-ray crystallography, which lacks experimental phases in standard diffraction data. In MR, a model or a partial model located in the crystallographic cell is used to obtain some partial phases that can then be used to calculate an initial map. Since the initial map will not have good, if any, phase information, the problem is more difficult than EM molecular docking, and many inventive procedures have been used to solve such problems (see, for example, Colman & Fehlhammer, 1976). MR is routine if a model similar to the crystallized structure is available and is more challenging when no available model is known to be similar to the structure. A correct fit with even a partial fragment can provide sufficient phase information to enable, with further bootstrapping using more fitting or direct methods, the correct high-resolution electron-density map to be obtained. For a more complete review of MR, see Evans & McCoy (2008) and references therein. A similarity between our EM method and MR is the use of correlation and convolution functions to obtain vectors to locate molecular fragments in a density map (Read & Schierbeek, 1988).
In this paper, we present a series of algorithm improvements for this intensive computational task of docking two densities. Our aim is to generate useful solutions fast enough that docking can be performed interactively by a user on a high-performance desktop workstation. This will allow them to explore and guide model building using additional information and allow them to test their hypotheses in real time.
Our second aim is to facilitate low-resolution model building for another case: where no known candidate homologous structures can be identified for a target map or some region of the map. We propose that with the speedup in computation that we are achieving, a full search of a set of nonredundant molecular fragments from the Protein Data Bank (https://www.wwpdb.org) could be performed. A similar strategy is used for molecular replacement in X-ray crystallography, for example in MoRDa (an automatic molecular-replacement pipeline; Vagin & Lebedev, 2015), which is part of the CCP4 online web services (Winn et al., 2011). Depending on the resolution and character of the structure in the map, the fragments will range in size from domains, or subdomain features, to short peptide-chain fragments of a few amino acids.
The set of candidate domains from the PDB could contain hundreds of thousands of potential structures; however, they may be reduced by removing those with similar features to generate a nonredundant set. Furthermore, we may select those with a given taxonomy, from a given organism, with a given polymer type, with a given enzyme classification etc. Some experimental maps have clearly defined domain boundaries, allowing us to split the whole map into smaller blocks, and will give constraints on the fragment size. Therefore, the user could select atomic models to fit based on an appropriate size range. This approach may allow us to limit the number of candidates to thousands. Fitting these models requires the rotation and translation of a large set of search maps in 3D, and is still a computationally intensive task. However, several approaches discussed in this paper allow us to reduce processing times.
In Section 2 we discuss how to define the similarity of two maps. For this purpose, we calculate maps of correlation coefficients and discrepancies. Without any restrictions these similarity scores have a simple relation. However, in the case of noisy data or low signal, extra restrictions can be introduced and the scores change. In Section 3 the problem of fitting a smaller 3D map onto a larger map is reviewed. Technical details of this problem, with a discussion of methods for optimization, are presented in Section 4. Good estimates of correlation maps can be computed more quickly by reducing the sampling level at the inverse-transform stage: see Section 5. Performance numbers for the DockEM program modified with the use of the proposed ideas, and a comparison with the PowerFit program, are presented in Section 6.

Similarity of vectors for noisy data
Suppose we have two 1D vectors of size n: $f_i$, $g_i$, $i = 0, \ldots, n-1$. The similarity of these vectors can be found in several ways. We use the notions of cross-correlation and discrepancy (see Appendix A). For a normalized signal g, i.e. $\sum_{i=0}^{n-1} g_i = 0$ and $\sum_{i=0}^{n-1} g_i^2 = 1$, and without any restrictions on the level of signal g, the normalized correlation and discrepancy coefficients are defined as

$$R_{fg} = \frac{1}{\sigma_f}\sum_{i=0}^{n-1} f_i g_i, \qquad D_{fg} = 1 - R_{fg}^2, \qquad \sigma_f = \Big(\sum_{i=0}^{n-1} f_i^2 - n\bar f^2\Big)^{1/2}.$$

Real data contain noise and errors; therefore, instead of the exact signal $f_i$ we know only some approximation $\tilde f_i$ such that $\tilde f_i \simeq f_i$. In some cases we know the error values, i.e. $\varepsilon_i > 0$ such that $|f_i - \tilde f_i| < \varepsilon_i$. In other cases, we may assume that $\tilde f_i$ has some distribution with given parameters.
Similarity scores are independent of the amplitude of a signal but are sensitive to the signal-to-noise ratio. In the case of a strong signal, small errors $\varepsilon_i$ should not cause any issues when similarity scores are calculated, since the ratios $\varepsilon_i/|f_i|$ are small. This means that the scores found for the erroneous data $\tilde f_i$ have similar values compared with the (unknown) exact values $f_i$. In the case of a low signal, for example $f_i \simeq 0$, $\varepsilon_i/|f_i|$ becomes large, so more solutions within the given error are possible and a higher variability of similarity scores is observed. As a result, we may find many false matches between two signals. Therefore, it is reasonable to put some additional restrictions on the signal $f_i$.
Suppose a signal is given on a subvolume. To avoid false peaks in the similarity values, we require the signal to be different from noise. However, as similarity scores do not change if we vary the mean values, we may obtain false peaks even when a signal is close to a nonzero constant. Thus, we need to force the signal to be different from its mean value over the given subvolume. Therefore, we may threshold the variance, i.e. set a number $\sigma_0 > 0$ and redefine the correlation coefficient as

$$R_{fg} = \begin{cases} \dfrac{1}{\sigma_f}\sum_{i=0}^{n-1} f_i g_i, & \sigma_f \ge \sigma_0,\\[4pt] 0, & \sigma_f < \sigma_0. \end{cases} \quad (1)$$

This means that for low-variance regions of the target map we set the correlation coefficient to zero. For the discrepancy we may require the scaling factor $\lambda$ to be greater than a given number $\lambda_0$; when the unconstrained optimum violates this bound, the minimum of (6) is attained at the boundary point $\lambda = \lambda_0$, $\mu = \bar f$, which defines the restricted discrepancy (2). The times taken to calculate the D and R similarity scores are almost the same.
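As an illustrative sketch (NumPy, with names of our own choosing rather than DockEM's), the variance-thresholded coefficient for a 1D signal can be written in a few lines; the template g is assumed already normalized:

```python
import numpy as np

def thresholded_correlation(f, g, sigma0):
    """Correlation of a noisy signal f with a normalized template g.

    g is assumed normalized (zero mean, unit norm).  When the spread of f
    about its mean falls below sigma0 the coefficient is set to zero,
    suppressing spurious matches in near-constant (low-variance) regions.
    Illustrative sketch, not the DockEM implementation.
    """
    n = len(f)
    # sigma_f = (sum f_i^2 - n * fbar^2)^(1/2), the spread of f about its mean
    sigma = np.sqrt(np.sum(f * f) - n * np.mean(f) ** 2)
    if sigma < sigma0:
        return 0.0
    return float(np.dot(f, g) / sigma)
```

A flat (constant) signal then scores exactly zero instead of producing an unstable 0/0 ratio.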

Correlation map in 3D
Suppose we have a target map T (a density distribution after cryo-EM pre-processing and reconstruction). There is also a smaller search map S (this map is generated from known atomic structures). Our goal is to position the search map S onto the target map T in such a way that their common parts are similar. We also introduce a mask to confine the comparison to a fixed region around the search object, reducing the noise contributed by unmatched parts of the target map (see Roseman, 2003). Let us consider a 1D case first.
There are vectors $T_i$, $i = 0, \ldots, N-1$ and $S_j$, $j = 0, \ldots, K-1$. In general N and K are different. We choose an index q and consider a subvector $T^q$ of length K of vector T: $T_{q+j}$, $j = 0, \ldots, K-1$. The subvector $T^q$ and vector S are of the same length K, and we may find similarity scores for these two vectors as described in the previous section; the corresponding correlation coefficient is denoted $R^q_{TS}$. The length of vector S is K and the length of vector T is N; therefore, there are only $P = N - K + 1$ positions at which all elements of vector S and subvector $T^q$ are known. In principle, the search signal may be defined on multiple segments. Therefore, it is better to extend the search vector and define it for all integer indices. At the same time, we may introduce the mask vector $M_j$ such that $M_j = 1$ when $S_j$ is known and $M_j = 0$ otherwise. Note that $S_j = S_j M_j$. We define the number of known elements as $w \equiv \sum_{i=0}^{N-1} M_i$. If there are two discrete (real-valued) functions $f_i$, $g_i$, $i = -\infty, \ldots, \infty$, then their cross-correlation is defined as

$$(f \star g)[j] \equiv \sum_{i=-\infty}^{\infty} f_i\, g_{i+j}.$$

We may set the T, S, M vectors to zero outside $i = 0, \ldots, N-1$; for a search signal S normalized under the mask (zero mean, unit norm within M), the correlation coefficient at point q can then be written as

$$R^q_{TS} = \frac{(S \star T)[q]}{\left\{(M \star T^2)[q] - (M \star T)[q]^2/w\right\}^{1/2}}. \quad (3)$$

In a similar way, we may find $R^q_{TS}$ and $D^q_{TS}$ in the case of erroneous data by using (1) and (2); for instance, if $(S \star T)[q] < \lambda_0$, then the scale factor is clamped at $\lambda_0$ when evaluating the discrepancy. These expressions for similarity values written with the use of cross-correlation functions are left unchanged if we consider 3D maps. The main difference from the fast local correlation algorithm introduced for 2D maps (images) in Roseman (2000) is the use of cross-correlation operations instead of convolutions. While the results are the same in the case of symmetric maps, they are different for arbitrarily shaped search maps.
Let the target vector T be of length N and the search vector S be of length K; we then pad them with zeros and make them periodic with period L. If we use (7) to find the cross-correlation vector $(S \star T)[q]$ then it will have the correct values only for $q = 0, \ldots, L - K$.
Suppose we are given 3D maps and want to find similarity scores over some region of interest (ROI). We should expand the ROI by the size of the search map. We can then use a 3D DFT (discrete Fourier transform; see Appendix A) for the target/search maps on the extended ROI and find the cross-correlation with (7).
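The whole masked local-correlation recipe can be sketched as follows (illustrative NumPy, shown in 1D for brevity; the 3D version only swaps `fft` for `fftn`). It assumes the search signal S has zero mean and unit norm under the mask, following Roseman (2000):

```python
import numpy as np

def local_correlation(T, S, M):
    """Masked local correlation of search S (defined where mask M == 1)
    against target T, all given on a common periodic grid.

    S must be normalized under the mask: sum(S*M) == 0, sum((S*M)**2) == 1.
    Returns R[q], the correlation of S with the segment of T starting at q.
    Sketch after Roseman (2000); 1D for clarity, identical in 3D with fftn.
    """
    w = M.sum()  # number of samples inside the mask
    # circular cross-correlation via the DFT: F(a ? b) = conj(F(a)) * F(b)
    xcorr = lambda a, b: np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))
    st = xcorr(S, T)        # (S ? T)[q]
    mt = xcorr(M, T)        # (M ? T)[q]
    mt2 = xcorr(M, T * T)   # (M ? T^2)[q]
    var = mt2 - mt * mt / w     # local variance of T under the mask
    return st / np.sqrt(np.maximum(var, 1e-12))  # guard near-zero variance
```

In practice the guard constant would be replaced by the explicit variance threshold of equation (1).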

Numerical implementation
Here, we discuss technical details that help us to improve the performance of the algorithm defined by (3). A key feature of the procedure is that the correlation coefficient is computed in a possibly irregular, local area defined by a mask enclosing the search object.

Size of maps
Suppose we are given a target map T of size $n^T_x \times n^T_y \times n^T_z$. We also know an atomic model that we aim to fit at some points of the target map. The atomic model can be given as the coordinates of atoms in a PDB file. We then simulate the density distribution corresponding to the given atomic model, so that we have two equivalent-sized maps, which will allow the correlation calculation to be performed using FFTs.
An electron potential map is generated using a five-Gaussian approximation for each atom (see Appendix A). In this way, we can generate an S map for an arbitrarily small voxel size. For now, we assume that the sampling of the T and S maps is the same. Suppose also that there is a procedure to generate the mask map M from our search map S.
For given T, S and M maps we apply an algorithm to find the score values for each point of map T. However, by default we do not know the orientation of the atomic structure. Therefore, the algorithm to find score values should sample all orientations. Of course, the final results should not depend on the position of the centre of rotation of the atomic structure. In principle, we may choose any point as a centre of rotation, as long as the ROI is large enough to accommodate the rotations. A convenient and intuitive choice is the centre of mass of the molecule. However, to keep the rotated maps in a smaller volume, it is more efficient to use a point close to the centre of the bounding sphere; see Gärtner (1999) and Larsson (2008). In this way, the size and the shape of the corresponding S and M maps can be kept the same independent of the initially chosen orientation of the atomic structure.
In principle, the centre of rotation can be anywhere (not always at a corner of a voxel for S and M maps). However, keeping the centre at an integer point (an integer number of pixels for each direction) may help us to implement some extra procedures to improve performance.
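A cheap approximation to the bounding-sphere centre can be obtained with a Ritter-style two-pass heuristic; the exact expected-linear-time algorithms are those of Gärtner (1999) and Larsson (2008). An illustrative sketch:

```python
import numpy as np

def approx_bounding_sphere(points):
    """Approximate smallest enclosing sphere of a set of 3D coordinates
    (Ritter-style heuristic: pick two far-apart points, then grow the
    sphere to cover any stragglers).  Returns (centre, radius).
    Illustrative only; exact algorithms give a tighter sphere.
    """
    pts = np.asarray(points, float)
    p0 = pts[0]
    p1 = pts[np.argmax(((pts - p0) ** 2).sum(1))]   # farthest from p0
    p2 = pts[np.argmax(((pts - p1) ** 2).sum(1))]   # farthest from p1
    c = (p1 + p2) / 2                               # initial guess
    r = np.linalg.norm(p2 - p1) / 2
    for p in pts:                                   # grow to cover outliers
        d = np.linalg.norm(p - c)
        if d > r:
            r = (r + d) / 2                         # new radius
            c += (1 - r / d) * (p - c)              # shift centre toward p
    return c, r
```

Each growth step encloses the previous sphere, so every point seen so far remains covered.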
Let maps S, M have size $n^S_x \times n^S_y \times n^S_z$ ($n^S_x$, $n^S_y$, $n^S_z$ are even numbers) and the centre of rotation be at $(n^S_x/2,\; n^S_y/2,\; n^S_z/2)$. Suppose the maps are padded with zeros in such a way that no nonzero values fall outside these maps when they are arbitrarily rotated. We may want to find score values within the box $[x_s, x_e] \times [y_s, y_e] \times [z_s, z_e]$, where $x_s, x_e, y_s, y_e, z_s, z_e$ are integer numbers. To find the score values for this ROI, we need to use the following ROI for T: $[x_s - n^S_x/2,\; x_e + n^S_x/2 - 1] \times [y_s - n^S_y/2,\; y_e + n^S_y/2 - 1] \times [z_s - n^S_z/2,\; z_e + n^S_z/2 - 1]$, which should lie inside $[0, n^T_x - 1] \times [0, n^T_y - 1] \times [0, n^T_z - 1]$. As the sizes of the Fourier images of T, S and M should be the same, we need to pad them with zeros. The least size of the padded maps is $(x_e - x_s + n^S_x) \times (y_e - y_s + n^S_y) \times (z_e - z_s + n^S_z)$. There are many libraries providing users with DFT implementations. The general rule for the time required to perform a 1D DFT is $O(N \log N)$, where N is the size of the vector. However, each implementation may give different numbers depending on the computational architecture (CPU/GPU, memory bandwidth). In Gambron & Thorne (2020) a performance comparison of the most popular FFT libraries is made. The most critical factor is the domain size: execution times vary significantly depending on N. When the prime decomposition of N contains prime numbers larger than 5 the time taken is disproportionately longer, but it is otherwise approximately linear in $N \log N$. Therefore, in some cases it is better to increase the domain (with zero padding) to achieve better run times. In any case, keeping the centre of rotation of the S, M maps as close as possible to the centre of the bounding sphere helps to reduce the overall size of the Fourier maps for T, S and M.
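The padding rule above (grow each dimension to the next integer whose prime factors are all at most 5) can be sketched as an illustrative helper:

```python
def next_smooth(n):
    """Smallest integer >= n whose prime factors are all <= 5.

    FFT libraries are typically fastest for such '5-smooth' sizes, so it
    often pays to zero-pad each dimension of the correlation volume up to
    one.  Illustrative helper, not part of DockEM.
    """
    def is_smooth(m):
        for p in (2, 3, 5):
            while m % p == 0:
                m //= p
        return m == 1
    while not is_smooth(n):
        n += 1
    return n
```

For example, a 97-sample axis would be padded to 100 rather than transformed at the prime length 97.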

Rotation and Fourier space
To compute the correlation map we need to perform three cross-correlations: $S \star T$, $M \star T$ and $M \star T^2$. Therefore, we need to perform four forward DFTs, for M, S, T and $T^2$, and three backward DFTs, for $S \star T$, $M \star T$ and $M \star T^2$. As the maps T and $T^2$ are fixed, $F(T)$ and $F(T^2)$ need be computed only once (before any rotations). Thus, for each rotation we need to perform five DFTs.
The uncertainty principle is valid for the Fourier transform: the smaller the support of a function g (i.e. where it has nonzero values), the more spread out its Fourier transform $F(g)$ is. The support of S and M is relatively small with respect to the extended ROI of T, so we should expect the corresponding Fourier images of S and M to be dense maps. While the 3D DFT of a rotated object can be found with lower dimension DFTs of the original (nonrotated) Fourier map (Paeth, 1990; Larkin et al., 1997), this is a quite time-consuming operation which at the same time introduces extra errors due to interpolation procedures. Therefore, it is faster to find S and M maps with finer sampling and then perform remapping with nearest-neighbour or linear interpolation for given Euler angles onto a map of the same voxel size as the T map.
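A minimal sketch of the remapping step, using nearest-neighbour lookup on a pull-back of the voxel grid (illustrative NumPy; real code would resample from a more finely sampled S, as described above, and could use linear interpolation instead):

```python
import numpy as np

def rotate_map_nn(vol, Rmat, centre):
    """Resample a 3D map under rotation Rmat about `centre` using
    nearest-neighbour lookup.  Illustrative sketch only.

    Each output voxel x takes the value of the source voxel at
    Rmat^-1 (x - centre) + centre; for an orthogonal Rmat the row-vector
    product idx @ Rmat applies Rmat.T, i.e. the inverse rotation.
    """
    idx = np.indices(vol.shape).reshape(3, -1).T - centre
    src = idx @ Rmat + centre            # pull-back: output voxel -> source voxel
    src = np.rint(src).astype(int)       # nearest-neighbour
    ok = np.all((src >= 0) & (src < vol.shape), axis=1)
    out = np.zeros(vol.size)
    out[ok] = vol[tuple(src[ok].T)]      # out-of-range voxels stay zero
    return out.reshape(vol.shape)
```

For exact 90-degree rotations about an integer centre the pull-back lands on lattice points, so no interpolation error is introduced at all, which is the observation exploited in the next subsection.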

Cubic ROIs and an integer centre of rotation
When we try to fit an atomic model (or some density search object) to an area of a target map, the general rule is to rotate the S and M maps while keeping the T map fixed. Search and mask maps can be found with high accuracy in a relatively short time; therefore, we may assume that there are almost no interpolation errors for rotated S and M maps.
Rotating target maps usually introduces interpolation errors; therefore, the maps of correlation or discrepancy scores may also be susceptible to these errors. However, we may avoid any interpolation errors if the centre of rotation is at an integer point (i.e. the centre of rotation of S, M is at a corner of one of the voxels of the T map). If we rotate by 90° around a coordinate axis, then the indices of the new map are integer numbers, i.e. are also indices of the original map. Therefore, we may find all possible perfect rotations of T and use multiple T maps for the same rotated S, M maps. If the ROIs are cubic, then all rotated maps have the same shape.
In the 2D case there are four such rotations for a square. The rotations and corresponding matrices are shown in Fig. 1. If the original map has a pixel with indices (i, j), then after applying the rotation matrix the new index is $(i_{\rm new}, j_{\rm new}) = (-j, i)$. All indices should run from 0 to $n - 1$. Owing to the periodicity of all maps (required to perform circular discrete correlation with the DFT), $i_{\rm new} = n - j$ if $j > 0$ and $i_{\rm new} = 0$ if $j = 0$; $j_{\rm new} = i$ for any i. In 3D we may use the properties of P432 symmetry for a cube. There are 24 symmetry operations (see Aroyo, 2016). The corresponding rotation matrices are shown in Fig. 2.
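The 24 rotation matrices of Fig. 2 can be generated programmatically as the signed permutation matrices with determinant +1 (an illustrative sketch):

```python
import numpy as np
from itertools import permutations, product

def cube_rotations():
    """The 24 proper rotations of a cube (point group 432).

    These are exactly the signed 3x3 permutation matrices with
    determinant +1: 6 permutations x 8 sign patterns = 48 candidates,
    of which half are proper rotations.
    """
    mats = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            R = np.zeros((3, 3))
            for row, (col, s) in enumerate(zip(perm, signs)):
                R[row, col] = s
            if round(np.linalg.det(R)) == 1:   # keep rotations, drop reflections
                mats.append(R)
    return mats
```

Composing each sampled orientation with this group yields the 24 related orientations that share one set of forward transforms.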
Suppose that for a given set of Euler angles defining rotation matrices we combine them into groups of 24 matrices (each matrix being the product of one matrix with one of the matrices defined in Fig. 2). If we do not combine these rotation matrices into groups of 24, then to find the correlation maps for 24 rotations we need to perform 5 × 24 = 120 DFTs. However, if the matrices are combined then we just need to perform two forward DFTs, for S and M, and 3 × 24 = 72 backward DFTs, for $S \star T$, $M \star T$ and $M \star T^2$, i.e. 74 DFTs in total, or 61.7% of 120.

Figure 2
Matrices for 3D orthogonal rotations to relate the different octants to each other.

The values of $\sigma$ in the denominator in (3) are small numbers that are sensitive to small errors in the T, S and M maps. Therefore, in practical applications we first estimate the maximum value of $\sigma$ for all possible rotations and translations of the S, M maps. We then set $\sigma_0$ in (1) as a percentage of the maximum value of $\sigma$. We demonstrate our procedure by fitting an atomic structure (the intermediate domain, which at ~9 kDa is the smallest of three defined domains in the GroEL subunit) into an EM map of the GroEL chaperonin complex (EMDB entry EMD-1997, resolution 7 Å, C7 symmetry; Clare et al., 2012). The domain was extracted from PDB entry 2c7e (residues 136-191 and 374-409; Ranson et al., 2001). We find the maximum value $\sigma_{\max}$ of $\sigma$ and then set $\sigma_0$ as 50%, 30%, 10% and 1% of $\sigma_{\max}$. For a slice perpendicular to the axis of symmetry we calculate $R^q_{TS}$ (see Fig. 3). One may see that for $\sigma_0 \ge 0.1\sigma_{\max}$ the local maximum values of the correlation maps are the same. On decreasing $\sigma_0$ more peaks start to appear in the central and outer parts of the map, where the target map T is almost zero. When $\sigma_0$ is about 1% of $\sigma_{\max}$ the central part of the map has higher correlation-coefficient values than the real peaks. By setting $\sigma_0$ to a correct level we avoid extra checking of these false peaks as candidate centres of the chosen atomic structure.
A similar idea can be used in the case of a discrepancy map, where we need to set the minimum level $\lambda_0$ of the scale factor $\lambda$ and apply formula (2).

Smooth functions
Maps T and S are both smooth functions, i.e. their neighbouring voxels should have similar values. Therefore, the $S \star T$ map is also a smooth function. In many practical cases M is a union of a small number of convex or almost convex shapes, so we may also expect $(M \star T^2)$ and $(M \star T)$ to be smooth functions.
To avoid false peaks in the $R^q_{TS}$ map we require $\sigma \ge \sigma_0$. As a result, $R^q_{TS}$ is also a smooth function of q in these areas. Thus, to identify areas where $R^q_{TS}$ has peaks we may not need to find $R^q_{TS}$ at each voxel, especially when we use a brute-force approach and have to check many rotation angles. For instance, we may find the map only at even voxels, i.e. when all three indices of a voxel are even numbers. In this case the size of the output map is reduced eight times.
Of course, we may try to reduce the number of voxels of all maps (T, S and M). However, as some original voxels are then skipped, this leads to maps that differ from the original ones. Our goal is to retain the same accuracy in the output maps while finding values at only a reduced number of voxels.

Reduced maps
Suppose we have an N-vector $G_k$ in Fourier space with $N = mK$, where m and K are integer numbers. We want to find the values of the corresponding vector g only at the indices $q = mn$, $n = 0, \ldots, K-1$. By the definition of the backward DFT,

$$g_{mn} = \frac{1}{N}\sum_{k=0}^{N-1} G_k \exp\!\left(2\pi i\,\frac{kmn}{N}\right) = \frac{1}{mK}\sum_{k=0}^{K-1}\sum_{s=0}^{m-1} G_{sK+k} \exp\!\left[2\pi i\,\frac{(sK+k)\,mn}{mK}\right].$$

Taking into account that $\exp(2\pi i\,sn) = 1$, we obtain

$$g_{mn} = \frac{1}{K}\sum_{k=0}^{K-1} \tilde G_k \exp\!\left(2\pi i\,\frac{kn}{K}\right), \qquad \tilde G_k \equiv \frac{1}{m}\sum_{s=0}^{m-1} G_{sK+k},$$

i.e. every mth element of vector g is the backward DFT of the vector $\tilde G$. Similar formulae are valid for 2D and 3D DFTs: we just need to add up the values for corresponding slices of a 3D map, thus reducing the size m times along one axis. The procedure is then continued for rows and columns, so the final volume is reduced $m^3$ times. This procedure is straightforward in the case of complex input/output maps.

Figure 3
Correlation maps obtained for different levels of the standard deviation of the signal (50%, 30%, 10%, 1%).
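The folding trick can be sketched and checked in a few lines (illustrative NumPy, 1D; in 3D the same fold is applied along each axis in turn):

```python
import numpy as np

def reduced_ifft(G, m):
    """Inverse DFT evaluated only at every mth sample.

    Folding the length-N spectrum G into K = N/m bins and taking a
    K-point inverse transform reproduces g[0], g[m], g[2m], ... of the
    full inverse transform, at roughly 1/m of the cost per axis.
    """
    K = len(G) // m
    # Gtilde_k = (1/m) * sum_s G[s*K + k]
    Gfold = G.reshape(m, K).sum(axis=0) / m
    return np.fft.ifft(Gfold)
```

The reduced transform agrees exactly (to rounding) with the decimated full transform, so no accuracy is lost at the retained voxels.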

Real and complex DFTs
We usually have real-valued T, S and M maps, so in many software libraries more compact data storage can be used to store the corresponding complex maps $F(T)$, $F(S)$ and $F(M)$ in Fourier space. If $g_n$, $n = 0, \ldots, N-1$ is a real-valued vector, then the corresponding vector $G_k$ in Fourier space satisfies $G_{N-k} = \bar G_k$, where $\bar G_k$ is the complex conjugate of $G_k$. Therefore, we need to store only about half of vector G. This leads to more processing if we want to reduce the output maps. However, we may instead work with complex maps (both for input and output maps). There are several ways to combine real maps into complex maps.
(iii) If several maps T are given, i.e. 12 pairs from the 24 prerotated copies of T, then $(T_1, T_2)$ and $(T_1^2, T_2^2)$ are cross-correlated with (S, 0) and (M, 0). Finding $F(S, 0)$ and $F(M, 0)$ with the use of a DFT for a complex map takes twice the time needed for the real-valued version of the DFT for $F(S)$ and $F(M)$. However, we may just use the $F(S)$ and $F(M)$ maps created in compact form and double their sizes with the horizontal flipping of corresponding rows; therefore (depending on the given CPU/GPU architecture and the libraries available) this operation should not take much time. The processing time for backward DFTs should be roughly proportional to the number of backward DFTs multiplied by the factor $1/m^3$.
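The standard trick of transforming two real maps with one complex DFT, and separating the results via Hermitian symmetry, can be sketched as follows (illustrative NumPy):

```python
import numpy as np

def fft_two_real(a, b):
    """Forward DFTs of two real vectors from a single complex FFT.

    With h = a + i*b, Hermitian symmetry lets F(a) and F(b) be separated:
        F(a)[k] = (H[k] + conj(H[-k])) / 2
        F(b)[k] = (H[k] - conj(H[-k])) / (2i)
    The same separation works axis-by-axis for 3D maps.
    """
    H = np.fft.fft(a + 1j * b)
    Hs = np.conj(np.roll(H[::-1], 1))   # element k holds conj(H[-k mod N])
    return (H + Hs) / 2, (H - Hs) / 2j
```

This is how pairs of prerotated T maps (or T and T²) can share one transform at the cost of a single complex FFT.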
DFT algorithms are used to find convolution or crosscorrelation maps in a faster way when the corresponding elements of maps in Fourier space are processed in some way, for example multiplied. There is usually no need to access neighbouring elements in Fourier space. As a result, storage schemes for data in Fourier space may be organized in such a way as to achieve the highest throughput. This may lead to the neighbouring elements in memory not being the neighbouring elements for the data in Fourier space. Therefore, the datareduction process in Fourier space may not be straightforward as is described above and may depend on a particular implementation of a DFT algorithm.
Assuming (i) that finding a DFT for a complex map (A, B) takes twice the time and space compared with a DFT for a real-valued map A or B, (ii) that there is no time wasted in reducing the output map in Fourier space and (iii) that the time to process the reduced map is $1/m^3$ of the original time, we obtain theoretical times and storage requirements to process $N_G$ ($N_G$ an even number) maps T (see Table 1). Therefore, we can estimate the cost of finding the correlation maps for a group of $N_G$ rotations as approximately $2 + 3N_G/m^3$ DFT-equivalents. For example, for a DockEM implementation (Windows machine, six-core Intel Core i7-3930K CPU, Intel's performance IPP and MKL libraries) the cost to process $N_G = 24$ rotations and provide a reduced output map (m = 2) should be approximately $2 + (3 \times 24)/2^3 = 11$ compared with 74 for the m = 1 case, i.e. 15%. However, in practice it was only possible to achieve 50% owing to the extra time needed to process the Fourier images of the original maps in order to obtain the Fourier images of the reduced maps. Nevertheless, the combined performance improvement when 24 rotations are combined into one group was about three times compared with the general case of a single rotation with a full-size output map.
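This back-of-envelope cost model reduces to a one-line estimate (`dft_cost` is our name for it, not a DockEM routine):

```python
def dft_cost(n_groups, m=1):
    """DFT-equivalents to process n_groups groups of 24 rotations.

    Per group: 2 forward transforms (S and M) plus 3*24 backward
    transforms, each backward transform on an m-fold reduced grid
    costing 1/m**3 of a full one.  A rough model, as in the text;
    real timings also pay for folding the Fourier images.
    """
    return n_groups * (2 + 3 * 24 / m ** 3)
```

With m = 2 a single group costs 11 DFT-equivalents instead of 74, matching the worked example above.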

Run times
The DockEM code considers all possible 3D rotations without any restrictions. There are several approaches (to be discussed in other papers) to generate an optimal number of rotations for a given angular resolution. Suppose that there is a set of rotations, and for any other possible rotation we find the nearest rotation from the given set. The maximum possible angular distance over all possible rotations is the angular resolution of the given set of rotations. As the best performance is achieved for rotations that are combined into groups, for each such nearest rotation we also have the other 23 rotations of its group. Table 2 shows the number of rotation groups for a given angular resolution.
To illustrate a couple of use cases, we used the EMD-1997 map of GroEL (7 Å resolution, C7 symmetry, sampled at 2.02 Å per pixel) and the atomic structure from PDB entry 2c7e mentioned before (the intermediate domain, which is the smallest and the most difficult to fit), and ran the code on a ten-core processor (Intel Core i9-7900X, 3.30 GHz). Searches were run at 7 Å resolution. In the first example we used a target-map region of 70 pixels in each dimension, corresponding to a full search of the whole molecule. The size of the mask defining the search object was 28³. The angular resolution was set to 3°, i.e. we need 19 382 groups of 24 rotations. The search took 1707 s (28.5 min). We can estimate this time using Table 3: for a 24³ mask and a 64³ target we obtain 1.44 min for 1000 groups, so 19.382 × 1.44 = 27.9 min.
DockEM is one of many algorithms for atomic fitting. A list of the most popular methods can be found in Villa & Lasker (2014). A full comparison with other methods will be performed in future papers, as many factors related to performance, quality of fitting, resolution range and choice of orientations need to be taken into account. PowerFit (van Zundert & Bonvin, 2015) is one of several programs based on similar ideas to DockEM. Here, we provide performance numbers to compare these programs. In van Zundert & Bonvin (2015), for a 128³ problem (the combined size of the target and search maps) a fine (4.71°) rotational search was performed on an Intel Core i7-3632QM CPU (four cores, 2.20 GHz). The authors used 70 728 pre-calculated orientations. It took them 6 h 23 min, i.e. 383 min/70.728 = 5.41 min per 1000 orientations. According to Table 3, for a 128³ problem (e.g. a 96³ target map and a 32³ search map) we obtain 3.84 min for 1000 groups of 24 rotations (or 0.16 min per 1000 rotations) on an Intel Core i9-7900X (ten cores, 3.30 GHz). As different CPUs were used, we assume that PowerFit run times are inversely proportional to the number of cores and the CPU frequency, so on an Intel Core i9-7900X CPU we would expect PowerFit to process 1000 rotations in 5.41 × 2.2 × 4/(3.3 × 10) = 1.44 min. This means that DockEM performs 1.44/0.16 = 9 times faster than PowerFit.
The current version of the code can be used for interactive fitting. Suppose that we have roughly positioned the rotation centre of the search object within a 16-pixel cube, and that the search object itself also fits within a 16-pixel cube. We then use a 32³ region of the target map to obtain correct score values within the 16³ volume. If the angular resolution is 4°, then according to Table 2 we need to use 7568 groups of rotations. This search took 9.3 s. For a coarser angular fitting the processing time is within the range 3-4 s.

Conclusions
Fitting an atomic structure into a medium-or low-resolution map may require several processing steps.
The approach used depends on the research question and the data available. For example, there may be models or known atomic structures of fragments or domains for your structure in the PDB, or you may have a homology model. The problem is essentially then a 3D puzzle to assemble the known fragments into the 3D density map.
In another case, there may be no identifiable sequence homology with any solved structure. A search of selected candidates from the PDB could then be performed. Filters on criteria such as mass or secondary-structure composition could be applied to reduce the number of domain structures to search with.
EM density maps have some finite resolution which allows us to estimate the smoothness of maps for similarity scores. To avoid false peaks of similarity values we may put some restrictions on the level of signal. This helps to save time in checking some positions and orientations of the atomic structure and ill-defined, or solvent, regions of the target map.
Smoothness of the given and simulated maps allows us to skip some neighbouring orientations as well as to avoid finding score values at all points of the original map.
Positioning the centre of rotation of a given atomic structure at specific points enables us to combine several (up to 24) rotations in one group, which increases performance. Combining several real maps into complex maps may also be an efficient approach but depends on particular data-storage schemes and software libraries, which will dictate how efficiently a map-reduction algorithm can be implemented.
We have shown that we can obtain significant improvement in the efficiency and speed of the correlation searches without losing accuracy. These improvements can be used to implement real-time interactive programs or tools that will be useful for researchers. In addition, we are optimistic that we can apply these techniques to interpret lower resolution cryo-EM maps (which are not suitable for other model-building methods) by identifying and positioning matching fragments from the PDB.

Table 2
Angular resolution as a function of the number of rotation groups.

Angle (°)  16   12   10   9    8    6     4     3      2
Groups     109  257  439  598  871  2125  7568  19382  87718

Table 3
Time (in minutes) to process 1000 groups of rotations.
APPENDIX A

A1. Correlation coefficient and discrepancy
Let there be two vectors of size n: $f_i$, $g_i$, $i = 0, \ldots, n-1$. The correlation coefficient is defined as

$$R_{fg} = \frac{\sum_{i=0}^{n-1}(f_i - \bar f)(g_i - \bar g)}{\left[\sum_{i=0}^{n-1}(f_i - \bar f)^2 \sum_{i=0}^{n-1}(g_i - \bar g)^2\right]^{1/2}}, \quad (5)$$

where $\bar f$ and $\bar g$ are the mean values of the vectors, i.e. $\bar f \equiv \frac{1}{n}\sum_{i=0}^{n-1} f_i$. For a given arbitrary vector $\tilde g_i$, we may define the vector $g_i$ such that

$$g_i \equiv \frac{\tilde g_i - \frac{1}{n}\sum_{k=0}^{n-1}\tilde g_k}{\left[\sum_{p=0}^{n-1}\left(\tilde g_p - \frac{1}{n}\sum_{k=0}^{n-1}\tilde g_k\right)^2\right]^{1/2}}.$$

In this case $\bar g = 0$ and $\sum_{i=0}^{n-1} g_i^2 = 1$, and the original formula (5) can be rewritten as

$$R_{fg} = \frac{\sum_{i=0}^{n-1} f_i g_i}{\left(\sum_{i=0}^{n-1} f_i^2 - n\bar f^2\right)^{1/2}}.$$

The correlation coefficient has the range [-1, 1]. If $f_i = g_i$ then $R_{fg} = 1$, and in the case of $f_i = -g_i$, $R_{fg} = -1$. We allow a signal to be scaled and translated by constant values: for constants $\lambda$ and $\mu$ a discrepancy is defined as

$$d_{fg} \equiv \sum_{i=0}^{n-1}\left|f_i - \lambda g_i - \mu\right|^2. \quad (6)$$

We vary the parameters $\lambda$, $\mu$ so that the discrepancy attains its minimal value; the smaller the value of the discrepancy, the more similar the vectors f and g. If, as above, we assume $\bar g = 0$, $\sum_{i=0}^{n-1} g_i^2 = 1$, then $d_{fg}$ is a quadratic function of $\lambda$ and $\mu$. To find its minimum, we need to solve $\partial d_{fg}/\partial\lambda = 0$, $\partial d_{fg}/\partial\mu = 0$. The optimal values are $\lambda = \sum_{i=0}^{n-1} f_i g_i$ and $\mu = \bar f$, and the minimum value of $d_{fg}$ is

$$d_{fg}^{\min} = \sum_{i=0}^{n-1} f_i^2 - n\bar f^2 - \left(\sum_{i=0}^{n-1} f_i g_i\right)^2.$$

The discrepancy value is a non-negative number. It is better to normalize it so that the range is [0, 1]:

$$D_{fg} \equiv \frac{d_{fg}}{\sum_{i=0}^{n-1} f_i^2 - n\bar f^2}.$$
If no restrictions are put on the values of $\lambda$ and $\mu$, then the correlation coefficient and the normalized discrepancy have the simple relation

$$D_{fg} = 1 - R_{fg}^2.$$
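The relation $D_{fg} = 1 - R_{fg}^2$ can be verified numerically (illustrative NumPy; g is assumed normalized as above):

```python
import numpy as np

def scores(f, g):
    """Normalized correlation R and discrepancy D of f against a
    template g with zero mean and unit norm, using the optimal
    scale lambda = sum(f*g) and offset mu = mean(f)."""
    n = len(f)
    var = np.sum(f * f) - n * np.mean(f) ** 2   # spread of f about its mean
    lam = np.dot(f, g)                          # optimal scale factor
    R = lam / np.sqrt(var)
    d = np.sum((f - lam * g - np.mean(f)) ** 2)  # minimized discrepancy
    return R, d / var
```

For any f the two scores satisfy the identity to machine precision, since the minimized discrepancy equals the variance minus the squared optimal scale.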
A2. Cross-correlation and the DFT

In the case of periodic signals, i.e. when $g_{n+N} = g_n$ for any n, we may use the important property of the DFT

$$\mathcal F(f \star g) = \overline{\mathcal F(f)}\,\mathcal F(g), \quad (7)$$

where $\overline{\mathcal F(f)}$ is the complex conjugate of $\mathcal F(f)$.
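With NumPy's DFT conventions, the cross-correlation property $F(f \star g) = \overline{F(f)}\,F(g)$ can be checked directly against the definition (an illustrative sketch):

```python
import numpy as np

def xcorr_dft(f, g):
    """Circular cross-correlation (f ? g)[j] = sum_i f[i] g[(i+j) mod N],
    computed through the DFT property F(f ? g) = conj(F(f)) * F(g)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(f)) * np.fft.fft(g)))
```

This is the identity used throughout the main text to evaluate $S \star T$, $M \star T$ and $M \star T^2$ with FFTs.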

A3. Atomic scattering factors
According to Bai (2003), for a spherical radial density model the density distribution can be written as

$$\rho(r) = \frac{1}{2\pi^2}\int_0^\infty f(q)\,\frac{\sin qr}{qr}\,q^2\,{\rm d}q,$$

where f(q) is an atomic scattering factor. In the range of scattering vectors $q < 25$ Å$^{-1}$ the atomic scattering factors can be approximated by a sum of Gaussian functions,

$$f(q) = \sum_{i=1}^{4} a_i \exp\!\left[-b_i (q/4\pi)^2\right] + c,$$

where the coefficients $a_i$, $b_i$, $i = 1, \ldots, 4$ and c can be found in Ibers & Hamilton (1974). Using the above formula and with either tabulated B factors from a PDB file or a known resolution for the density distribution, the process of finding the density distribution for the whole molecule is fast and straightforward.
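Because the transform of a Gaussian is analytic, no numerical integration is needed to sample the density of one atom. An illustrative sketch (the coefficient values in the test are placeholders, not tabulated values; the conventional $s = \sin\theta/\lambda = q/4\pi$ parameterization is assumed, and the constant term c is omitted):

```python
import numpy as np

def gaussian_density(r, a, b, B=0.0):
    """Radial density of one atom whose scattering factor is a Gaussian
    sum f(s) = sum_i a_i exp(-b_i s^2), s = sin(theta)/lambda.

    The analytic transform of each Gaussian term is
        a_i * (4*pi/b_i)**1.5 * exp(-4*pi**2 * r**2 / b_i),
    and a B factor simply adds to each b_i (resolution broadening).
    Coefficients a, b must come from standard tables; this sketch does
    not embed any real values.
    """
    a = np.asarray(a, float)
    b = np.asarray(b, float) + B
    return sum(ai * (4 * np.pi / bi) ** 1.5 * np.exp(-4 * np.pi ** 2 * r ** 2 / bi)
               for ai, bi in zip(a, b))
```

A useful sanity check is that the density integrates to $f(0) = \sum_i a_i$, the total scattering power of the atom.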