crystal lattices
Approximating lattice similarity
^{a}Ronin Institute, 9515 NE 137th Street, Kirkland, WA 98034-1820, USA, ^{b}Ronin Institute, c/o NSLS-II, Brookhaven National Laboratory, Upton, NY 11973-5000, USA, and ^{c}Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
^{*}Correspondence email: larry6640995@gmail.com
A method is proposed for choosing unit cells for a group of crystals so that they all appear as nearly similar as possible to a selected cell. Related unit cells with varying cell parameters or indexed with different lattice centering can be accommodated.
Keywords: lattice matching; Delaunay; Delone; Niggli; Selling.
1. Introduction
A common problem in crystallography is to provide a list of the unit cells of several (or many) crystals so that they can be visually compared, making it easier to identify meaningful clusters of crystals of related morphology. Collections of experimental unit-cell parameters have been created based on similarity of morphology [for example, see Donnay et al. (1963)] and, in recent years, the clustering of unit cells from the myriad of images in serial crystallography has become increasingly important (Keable et al., 2021). We have created a method to group unit cells to serve these needs and have addressed this problem in the space S^{6} (Andrews et al., 2019b).
2. Background and notation
2.1. The space S^{6}
Andrews et al. (2019b) introduced the space S^{6} as an alternative representation of crystallographic lattices. The space is defined in terms of the `Selling scalars' used in Selling reduction (Selling, 1874) and by Delaunay (1932; note that in his later publications, Boris Delaunay used the more accurately transliterated version of his surname, Delone) for the classification of lattices. A point s in S^{6} is defined by
$$\mathbf{s} = (\mathbf{b}\cdot\mathbf{c},\ \mathbf{a}\cdot\mathbf{c},\ \mathbf{a}\cdot\mathbf{b},\ \mathbf{a}\cdot\mathbf{d},\ \mathbf{b}\cdot\mathbf{d},\ \mathbf{c}\cdot\mathbf{d}),$$
where d = −a − b − c. As a mnemonic to remember the order, the terms involve, in order, α, β, γ, a, b, c.
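As a minimal sketch (not the authors' LatticeRepLib code), the six scalars can be computed directly from the edge lengths and angles of a primitive cell, using d = −a − b − c to expand the last three dot products. The function name `cell_to_s6` is an assumption made for illustration.

```python
import math

def cell_to_s6(a, b, c, alpha, beta, gamma):
    """Return the six Selling scalars (b.c, a.c, a.b, a.d, b.d, c.d).

    Edge lengths are in angstroms, angles in degrees, and d = -a - b - c.
    The cell is assumed to be primitive.
    """
    bc = b * c * math.cos(math.radians(alpha))
    ac = a * c * math.cos(math.radians(beta))
    ab = a * b * math.cos(math.radians(gamma))
    # a.d = a.(-a - b - c) = -(a.a + a.b + a.c), and similarly for b.d and c.d
    ad = -(a * a + ab + ac)
    bd = -(b * b + ab + bc)
    cd = -(c * c + ac + bc)
    return (bc, ac, ab, ad, bd, cd)

# A primitive cubic cell with a = 10: the first three scalars are
# zero to rounding error and the last three are close to -100.
print(cell_to_s6(10, 10, 10, 90, 90, 90))
```

The order of the returned tuple follows the mnemonic: the first three entries depend on α, β and γ, and the last three are associated with a, b and c.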
2.2. Similarity
In Euclidean geometry, two objects are described as `similar' if they are identical except for a scale factor; see Euclid's work as translated by Heath (1956) and a longer description in Wikipedia (https://en.wikipedia.org/w/index.php?title=Similarity_(geometry)&oldid=1097100366). In crystallography, we can say that all face-centered cubic unit cells are similar (assuming that they are in the same presentation). On the other hand, not all primitive orthorhombic unit cells are similar. In a metric space, we refer to two objects as `approximately similar' if the distance between them after scaling to the same size is, in some sense, small, e.g. commensurate with the experimental errors in determination of the unit cells. The algorithm below attempts to find the representation of one cell that is nearest to similar to some other cell. For a given reference cell, the probe cell will be transformed to other choices of unit cell that would generate the probe's lattice and the closest match to the reference will be chosen for the result. Finally, the lattice centering of the reference cell will be restored (if necessary).
3. Algorithm
We start with a collection of experimental unit cells. From among them, we select or create the `reference' cell; that is, the one to which all the rest will be matched as closely as possible.
We transform the reference cell by many operations in the course of exploring alternative lattice representations. For each newly generated lattice representation, we accumulate the transformations needed to convert back to the original reference cell. All of these operations are performed in S^{6}. (The alternative space, G^{6}, is less convenient because the G^{6} fundamental unit is nonconvex.) To avoid duplication, for each step we only accumulate transformations that have not already been found.
To begin, each input cell is transformed to the S^{6} representation and then Selling-reduced [see Delaunay (1932) and Andrews et al. (2019a), the latter of which discusses the lesser complexity of Selling reduction and includes pseudocode]. As there is a need to be able to reverse the reduction, the reduction transformation is saved for use in later stages.
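The reduction step can be sketched on the four vectors a, b, c, d themselves rather than on the S^{6} matrices. This is an illustrative version under stated assumptions, not the pseudocode of Andrews et al. (2019a): the names `selling_reduce`, `tol` and `max_steps` are hypothetical, and the bookkeeping needed to save and later invert the reduction transformation is omitted.

```python
from itertools import combinations

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def selling_reduce(vectors, tol=1e-9, max_steps=1000):
    """Reduce four vectors (a, b, c, d with a + b + c + d = 0) until no
    pairwise dot product is positive. Returns (reduced vectors, step count)."""
    v = [tuple(float(x) for x in vec) for vec in vectors]
    for step in range(max_steps):
        positive = [(i, j) for i, j in combinations(range(4), 2)
                    if dot(v[i], v[j]) > tol]
        if not positive:
            return v, step
        i, j = positive[0]
        k, l = (m for m in range(4) if m not in (i, j))
        vi = v[i]
        # Negate v[i] and add the old v[i] to the two vectors outside the
        # pair; v[j] is unchanged and the four vectors still sum to zero.
        v[k] = tuple(x + y for x, y in zip(v[k], vi))
        v[l] = tuple(x + y for x, y in zip(v[l], vi))
        v[i] = tuple(-x for x in vi)
    raise RuntimeError("Selling reduction did not converge")

# A skewed basis of a primitive cubic lattice; d closes the sum to zero.
start = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (-2, -1, -1)]
reduced, steps = selling_reduce(start)
print(steps, reduced)
```

Each step lowers the sum of the squared lengths of the four vectors by twice the positive dot product, which is why the loop terminates for a lattice basis.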
The following transformations of the reference will be done in three stages.
First, the 24 S^{6} reflections are applied (Andrews et al., 2019b) and the results stored. The store of S^{6} vectors and their generating matrices holds 24 entries each at that point.
Because the 24 reflection operations are unitary, and in S^{6} are simply permutations of the six scalars, they retain the magnitudes and signs of the six values and merely rearrange them.
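One way to enumerate such rearrangements is sketched below, assuming that the 24 reflections correspond to the 24 relabelings of the four vectors a, b, c, d; the matrices listed by Andrews et al. (2019b) remain the authoritative definition. The names `PAIRS`, `s6_reflections` and `apply_reflection` are hypothetical.

```python
from itertools import permutations

# Vector pairs behind the six scalars, with a, b, c, d numbered 0..3:
# s = (b.c, a.c, a.b, a.d, b.d, c.d)
PAIRS = [(1, 2), (0, 2), (0, 1), (0, 3), (1, 3), (2, 3)]

def s6_reflections():
    """Return 24 index permutations of the six scalars, one for each
    relabeling of the four vectors a, b, c, d."""
    index_of = {frozenset(p): n for n, p in enumerate(PAIRS)}
    result = []
    for perm in permutations(range(4)):
        # Relabel both vectors of each pair and look up which scalar results
        result.append(tuple(index_of[frozenset((perm[i], perm[j]))]
                            for i, j in PAIRS))
    return result

def apply_reflection(refl, s):
    """Rearrange the six scalars of s according to one reflection."""
    return tuple(s[n] for n in refl)

s = (-1.0, -2.0, -3.0, -40.0, -50.0, -60.0)
images = {apply_reflection(r, s) for r in s6_reflections()}
print(len(images))  # 24 distinct rearrangements of the same six values
```

Because each operation only reorders the entries, the length of the S^{6} vector and the signs of its components are unchanged.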
Next, the boundary (reduction) transformations (Andrews et al., 2019b) are applied to the results of the previous step. The 24 reflections are then applied again. In each step, only newly found results are stored. These last two steps are repeated at least once in order to gain better coverage of possibly useful transformations. The counts of entries for each iteration are 24, 1566, 45 876 and finally 1 016 726. Three iterations, i.e. 45 876 entries, have been sufficient in test cases to date.
Although the six scalars are all negative for Selling-reduced unit cells, the boundary transformations are not unitary and so do not retain the six negative values.
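To illustrate, the sketch below applies one transformation at the first scalar, s1 = b·c. Its form is derived here from the basis change a → a + b, b → −b, c → c, d → d + b, and is assumed to correspond to one of the boundary matrices of Andrews et al. (2019b); the function names and the sample vector are made up for the example.

```python
import math

def boundary_step_s1(s):
    """Transform the six scalars under a -> a + b, b -> -b, d -> d + b."""
    s1, s2, s3, s4, s5, s6 = s
    return (-s1, s2 + s1, s5 + s1, s4 - s1, s3 + s1, s6 + s1)

def norm(s):
    return math.sqrt(sum(x * x for x in s))

reduced = (-1.0, -2.0, -3.0, -40.0, -50.0, -60.0)  # all six scalars negative
moved = boundary_step_s1(reduced)
print(moved)                       # the first scalar is now positive
print(norm(reduced), norm(moved))  # the two lengths differ
```

Unlike a reflection, this operation changes both the signs and the length of the vector, which is why the results must be rescaled before they are compared.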
Finally, all the accumulated transformed representations of the reference cell must be rescaled and the saved transformations inverted. The S^{6} vectors are all scaled to the same length (see Section 4) and the transformation matrix attached to each vector is inverted, thereby yielding the operation to return a lattice to the vicinity of the original reference cell. For more efficient searching in this final step, it is helpful to use a nearest-neighbor search function such as NearTree (Andrews & Bernstein, 2016).
4. Why must the S^{6} vectors be scaled?
All similar lattices lie on lines that go through the origin of S^{6}. Fig. 1 shows the distinction between the case where the transformed points are all scaled to be at the same distance from the origin as the reference point [Fig. 1(a)] and the case where they are not [Fig. 1(b)]. Fig. 2 illustrates the way in which scaling all the reference points to the same 6-spherical surface defines the zones of approximate similarity. Any nonzero scale factor will produce the same correct result. In S^{6} the reflections maintain the distance from the origin but the boundary transformations may not. To repeat: the only way to guarantee that the separation line for two regions goes through the origin is to have all the points at the same radius.
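The effect of scaling can be seen in a small numerical sketch. The two reference vectors below are made-up values chosen only to make the geometry obvious, and the helper names `scale_to_radius` and `nearest_index` are hypothetical.

```python
import math

def scale_to_radius(s, radius=1.0):
    """Rescale an S6 vector so that its Euclidean length equals `radius`."""
    length = math.hypot(*s)
    if length == 0.0:
        raise ValueError("cannot scale the zero vector")
    return tuple(radius * x / length for x in s)

def nearest_index(probe, candidates):
    """Index of the candidate with the smallest Euclidean distance to probe."""
    return min(range(len(candidates)),
               key=lambda n: math.dist(probe, candidates[n]))

# Two made-up reference representations at very different radii
ref_far = (-30.0, -1.0, -1.0, -300.0, -300.0, -300.0)
ref_near = (-1.0, -30.0, -1.0, -100.0, -100.0, -100.0)
refs = [ref_far, ref_near]

# A probe exactly similar to ref_far: same direction, one third the size
probe = tuple(x / 3.0 for x in ref_far)

print(nearest_index(probe, refs))  # 1: unscaled, the wrong point wins
print(nearest_index(scale_to_radius(probe),
                    [scale_to_radius(r) for r in refs]))  # 0: the similar point
```

Without scaling, the probe is assigned to the representation that merely happens to lie at a comparable radius; after all points are placed on the same sphere, the exactly similar representation is found.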
5. Angular measure of fit
Because the measure of similarity is independent of scale, projecting points onto a spherical surface does not modify the similarity. The angle between a probe point and the reference point is a meaningful measure of how similar the two points are.
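A minimal sketch of such an angular measure follows; the function name `s6_angle_degrees` is an assumption, and the clamp guards against rounding pushing the cosine slightly outside [−1, 1].

```python
import math

def s6_angle_degrees(p, q):
    """Angle in degrees between two S6 vectors, independent of their lengths."""
    cosine = sum(x * y for x, y in zip(p, q)) / (math.hypot(*p) * math.hypot(*q))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

p = (-1.0, -2.0, -3.0, -40.0, -50.0, -60.0)
print(s6_angle_degrees(p, tuple(3.0 * x for x in p)))  # close to 0: similar cells
```

An angle near zero indicates that the two points lie on nearly the same line through the origin, that is, that the cells are nearly similar.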
6. Generating the approximation
The following operations are performed for each of the probe lattices in the original list. For a given probe lattice, the closest approximation among all of the transformed reference points is found. If there are multiple representations of the reference point that are equally close, then all should be examined. For the case of multiples, a method must be used to find the preferred one. For our purposes, we have found it convenient to choose the one for which the unreduced G^{6} distances to the transformed reference are the smallest. Other choices might be useful for other purposes.
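The search for the closest representation, including the case of ties, might be sketched as a brute-force scan. This is an illustration only: the authors suggest a nearest-neighbor structure such as NearTree for efficiency, the tie-breaking by unreduced G^{6} distance is not implemented here, and the name `closest_matches` and the tolerance `tol` are hypothetical.

```python
import math

def closest_matches(probe, candidates, tol=1e-9):
    """Return the indices of all candidates tied (within tol) for the
    smallest distance to the probe, after scaling everything to unit length."""
    def unit(s):
        length = math.hypot(*s)
        return tuple(x / length for x in s)

    p = unit(probe)
    distances = [math.dist(p, unit(c)) for c in candidates]
    best = min(distances)
    return [n for n, d in enumerate(distances) if d <= best + tol]

# Made-up transformed reference points; the first two are exactly similar
candidates = [
    (-1.0, -1.0, -1.0, -5.0, -5.0, -5.0),
    (-2.0, -2.0, -2.0, -10.0, -10.0, -10.0),
    (-5.0, -1.0, -1.0, -1.0, -5.0, -5.0),
]
probe = (-1.01, -1.0, -1.0, -5.0, -5.0, -5.0)
print(closest_matches(probe, candidates))  # [0, 1]
```

When more than one index is returned, a further criterion is needed to select the preferred representation.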
Once the preferred result has been found, the corresponding inverted transformation is used to place the vector in the region of the original reduced reference cell. Finally, the inverse of the reduction operation that was performed on the reference cell is used to create the best match to the original reference. If it is desirable to restore lattice centering, then that operation must also be performed; the search returns a primitive representation of the unit cell.
7. Examples
7.1. A rhombohedral example
Le Trong & Stenkamp (2007) cite several structures for phospholipase A2 (krait neurotoxin) that were reported as different structures but were actually all the same structure (Bernstein et al., 2020). Expanding their search using the program SAUC (McGill et al., 2014), we find a total of six structures, four of which are identical in two pairs. Table 1 lists the unit cells as reported in the Protein Data Bank (PDB; Bernstein et al., 1977; Berman et al., 2000). In Tables 2, 3 and 4, the first entry in each table is used as the reference, and the following five entries are matched as closely as possible to the presentation of the reference cell. In Table 2, a rhombohedral presentation with PDB ID 1dpy was chosen as the reference. In Table 3, a C-centered cell with PDB ID 1g2x was chosen as the reference. In Table 4, the hexagonal cell 1u4j was chosen as the reference. In each case, the probe cells were returned in the same presentation, including lattice centering, as the reference cell. So the resulting centerings were hR, mC and hP, respectively, for each matched cell, regardless of the input centering, which had been determined by crystallographic analysis.




7.2. Adenosine receptor A2A
Unit cells were determined automatically from frames from serial-crystallography data collection for adenosine receptor A2A, PDB ID 5nlx (Weinert et al., 2017).
Three example unit cells were chosen from several hundred indexed data frames. Two are C-centered and one is primitive. Table 5 gives the reported data, and Tables 6, 7 and 8 are the approximate similarity matches.




7.3. Points along a line in S^{6}
Tables 9 and 10 present two views of artificial data. A line of points in S^{6} was created from the C-centered [80.95, 80.95, 57.10, 90, 90.35, 90] to the A-centered [57.10, 80.95, 80.95, 90, 90, 90.35] representation of the same cell of phospholipase A2. The series of intervening points interpolated in S^{6} are shown in Table 9 (each as the reduced cell, except for the endpoints) and the lattice-matched results are shown in Table 10.


In Table 10, the first line is the reference cell, which is also the C-centered cell in the first row of Table 9. The final cell is the same cell but in the A-centered presentation. The points between are equally spaced in S^{6} between those two centered points. Table 9 presents the list of points as generated and Table 10 lists the same cells in the lattice-matching presentation. Because the initial cell was C-centered, the following cells are also in that presentation, although the intermediate cells are not C-centered.
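Equally spaced points along a line in S^{6} can be generated by simple linear interpolation, as in the following sketch. The function name `interpolate_s6` and the endpoint values are assumptions for illustration; they are not the data of Tables 9 and 10.

```python
def interpolate_s6(start, end, n_points):
    """Return n_points equally spaced S6 vectors from start to end inclusive."""
    if n_points < 2:
        raise ValueError("need at least the two endpoints")
    return [tuple(x + (y - x) * k / (n_points - 1) for x, y in zip(start, end))
            for k in range(n_points)]

# Made-up endpoints: two orderings of the same six scalars
start = (-1.0, -2.0, -3.0, -40.0, -50.0, -60.0)
end = (-3.0, -2.0, -1.0, -60.0, -50.0, -40.0)
for point in interpolate_s6(start, end, 5):
    print(point)
```

Each interpolated vector would then be converted back to a cell and matched to the reference in the same way as an experimental probe cell.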
7.4. Examples from the PDB
The program SAUC (McGill et al., 2014) was used to query the PDB. The search started from the C-centered cell of PDB entry 1rgx (resistin), requesting the nearest 50 cells; 26 unique cells resulted. Because there was no limit on how far the points could be from the probe, some cells differ significantly from the search cell. The results are listed in Table 11 in their published representation. Table 12 lists the same cells in the same order as in Table 11, but with the same lattice centering as 1rgx, which is the first, reference, entry.


8. Summary
A method is proposed for transforming unit cells for a group of crystals so that they all appear as similar as possible to a selected cell. The search for cells similar to the reference cell is done using the space S^{6} and comparing with other possible unit cells nearby in the space. At the end, the lattice centering of the reference cell is restored.
9. Availability of code
The C++ code for lattice matching in S^{6} is available on github.com at https://github.com/duck10/LatticeRepLib.git.
Acknowledgements
Careful copyediting and corrections by Frances C. Bernstein are gratefully acknowledged. Our thanks to Jean Jakoncic and Alexei Soares for helpful conversations and access to data and facilities at Brookhaven National Laboratory. We thank Ronald Stenkamp for pointing us to the paper by Le Trong & Stenkamp (2007). We gratefully acknowledge Jörg Standfuss for permission to use the adenosine receptor A2A data. Richard Gildea helped in securing data for examples.
Funding information
Funding for this research was provided in part by: US Department of Energy, Office of Biological and Environmental Research and Office of Basic Energy Sciences (grant Nos. KP1607011 and DE-SC0012704, and in earlier years grant No. DE-AC02-98CH10886); US National Institutes of Health (grant No. P30GM133893, and in earlier years grant Nos. P41RR012408, P41GM103473, P41GM111244, R01GM117126 and 1R21GM129570); and in earlier years Dectris Ltd.
References
Andrews, L. C. & Bernstein, H. J. (2016). J. Appl. Cryst. 49, 756–761.
Andrews, L. C., Bernstein, H. J. & Sauter, N. K. (2019a). Acta Cryst. A75, 115–120.
Andrews, L. C., Bernstein, H. J. & Sauter, N. K. (2019b). Acta Cryst. A75, 593–599.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.
Bernstein, H. J., Andrews, L. C., Diaz, J. A. Jr, Jakoncic, J., Nguyen, T., Sauter, N. K., Soares, A. S., Wei, J. Y., Wlodek, M. R. & Xerri, M. A. (2020). Struct. Dyn. 7, 014302.
Delaunay, B. N. (1932). Z. Kristallogr. 84, 109–149.
Donnay, J. D. H., Donnay, G., Cox, E. X., Kennard, O. & King, M. V. (1963). Editors. Crystal Data: Determinative Tables, 2nd ed. Buffalo: American Crystallographic Association.
Heath, T. L. (1956). Translator. The Thirteen Books of Euclid's Elements, 2nd ed., unabridged. North Chelmsford, Massachusetts, USA: Courier Corporation.
Keable, S. M., Kölsch, A., Simon, P. S., Dasgupta, M., Chatterjee, R., Subramanian, S. K., Hussein, R., Ibrahim, M., Kim, I.-S., Bogacz, I., Makita, H., Pham, C. C., Fuller, F. D., Gul, S., Paley, D., Lassalle, L., Sutherlin, K. D., Bhowmick, A., Moriarty, N. W., Young, I. D., Blaschke, J. P., de Lichtenberg, C., Chernev, P., Cheah, M. H., Park, S., Park, G., Kim, J., Lee, S. J., Park, J., Tono, K., Owada, S., Hunter, M. S., Batyuk, A., Oggenfuss, R., Sander, M., Zerdane, S., Ozerov, D., Nass, K., Lemke, H., Mankowsky, R., Brewster, A. S., Messinger, J., Sauter, N. K., Yachandra, V. K., Yano, J., Zouni, A. & Kern, J. (2021). Sci. Rep. 11, 21787.
Le Trong, I. & Stenkamp, R. E. (2007). Acta Cryst. D63, 548–549.
McGill, K. J., Asadi, M., Karakasheva, M. T., Andrews, L. C. & Bernstein, H. J. (2014). J. Appl. Cryst. 47, 360–364.
Selling, E. (1874). J. Reine Angew. Math. 1874(77), 143–229.
Weinert, T., Olieric, N., Cheng, R., Brünle, S., James, D., Ozerov, D., Gashi, D., Vera, L., Marsh, M., Jaeger, K., Dworkowski, F., Panepucci, E., Basu, S., Skopintsev, P., Doré, A. S., Geng, T., Cooke, R. M., Liang, M., Prota, A. E., Panneels, V., Nogly, P., Ermler, U., Schertler, G., Hennig, M., Steinmetz, M. O., Wang, M. & Standfuss, J. (2017). Nat. Commun. 8, 542.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.