new algorithms workshop
Unitcell determination from randomly oriented electrondiffraction patterns
^{a}Department of Biophysical Structural Chemistry, Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands, and ^{b}Kavli Institute, TU Delft, Lorentzweg 1, 2628 CJ Delft, The Netherlands
^{*}Correspondence email: l.jiang@chem.leidenuniv.nl
Unitcell determination is the first step towards the structure solution of an unknown crystal form. Standard procedures for unitcell determination cannot cope with data collections that consist of single diffraction patterns of multiple crystals, each with an unknown orientation. However, for beamsensitive nanocrystals these are often the only data that can be obtained. An algorithm for unitcell determination that uses randomly oriented electrondiffraction patterns with unknown angular relationships is presented here. The algorithm determined the unit cells of mineral, pharmaceutical and protein nanocrystals in orthorhombic high and lowsymmetry space groups, allowing (well oriented) patterns to be indexed.
1. Introduction
Elastic diffraction provides the information for atomic ). However, practical problems in data collection and data processing prevent the use of electrons for the threedimensional crystallographic of organic molecules such as proteins and pharmaceuticals. Here, we address one of these practical problems: the determination of an unknown from random diffraction patterns.
However, the majority of electrons or Xrays impinging on a sample scatter inelastically and these inelastically scattered quanta induce radiation damage. Relative to the total elastic diffraction, highenergy (300 keV) electrons deposit approximately 1000 times less energy in thin biological samples than Xrays and hence induce less radiation damage after normalizing for the elastically diffracted quanta. In theory, electrons should therefore be more suited for if radiation damage is the limiting factor (Henderson, 1995In electron crystallography, the ) proposed a simple twodimensional latticereconstruction method based on tilt series, in which the d* values for the nontilt axis were plotted against the tilt angle.
is determined from electron diffraction tilt series. For this purpose, threedimensional diffraction data are collected by tilting a crystal about a selected crystallographic axis and recording a set of oriented diffraction patterns (a tilt series) at various (preferably main) crystallographic zones. Vainshtein (1964Recently, a method of unitcell parameter determination based on a tomography tilt series of diffraction patterns has been presented (Kolb et al., 2008).
A different algorithm is implemented in the program TRICE (Zou et al., 2004), which determines the in two steps. Firstly, the position and the intensities of each diffraction reflection in the individual electron diffraction patterns from the tilt series are determined and refined. For this purpose, any three reflections that do not lie in the same line are selected and are assigned a twodimensional index, assuming a The positions of diffraction spots and the angles between the diffraction patterns are then used to identify the shortest threedimensional vectors defining the unitcell parameters and the crystal orientation. The angle θ between two electron diffraction patterns of a single crystal, oriented with a doubletilt holder at the angles (α_{1}, β_{1}) and (α_{2}, β_{2}), is given by
The concept of the Niggli cell and the cellreduction technique are well established algorithms in electron crystallography. A ) to minimize the errors implies the generation of a list of lattice vectors sorted on length, together with angles between pairs of them. In addition to the conventional algorithms, GrosseKunstleve et al. (2004) implemented two numerically stable algorithms to generate the reduced cell.
can be characterized by the choice of `reduced' cell. There are 44 primitive reduced (Niggli) cells corresponding to 14 Bravais lattices. The determination of the is performed by first determining the reduced direct and then transforming it to a The recognition and interpretation of the reduced form are often difficult and are aggravated by errors in the cell parameters or rounding errors in calculations. Thus, procedures aimed at reducing these errors need to be performed. An approach suggested by Clegg (1981However, all these methods require the collection of at least two diffraction patterns from one single crystal, each collected at precisely known angles. This is not always possible. For instance, in the case of threedimensional organic crystals of proteins and pharmaceuticals, the high beamsensitivity of the materials often does not allow the collection of a tilt series from a single nanocrystal. So far, this has limited the application of electron diffraction to the study of beamsensitive molecules.
Here, we present an algorithm for unitcell determination from randomly oriented electron diffraction patterns of different but similar crystals. These diffraction patterns may be noisy, their centre may be poorly defined and their lowresolution reflections (which are of prime importance for unitcell determination) may be obscured by a beam stop or be outshone by the central beam. To deal with these problems, we first calculate the autocorrelation pattern of the diffractograms. Because of the low curvature of the vice versa; see also Fig. 1). Furthermore, autocorrelation patterns have an inversion centre, whereas the beam centre of a diffractogram may be unknown. Identifying the peak positions in the autocorrelation pattern is similar to the approach taken by the indexing program REFIX (Kabsch, 1993), which calculates the lowresolution spacings between observed spots.
the spots of the diffractogram overlap with all spots of the autocorrelation pattern (but notThe lowresolution peaks in the autocorrelation pattern form a twodimensional lattice (Fig. 1) which is defined by a pair of independent vectors. From this vector pair we construct a facet which is characterized by three numbers: the lengths of the two basis vectors and the angle between them. A facet is a rotationinvariant feature of a twodimensional lattice. Each planar intersection of a threedimensional lattice along a principal zone also generates a twodimensional lattice and hence defines a corresponding facet. Our algorithm is based on matching the observed crystal facets to model facets extracted from a simulated threedimensional lattice. Briefly, our procedure involves the following steps (see also Fig. 2).

2. Methods
2.1. Data collection
Potassium penicillin G and sodium oxacillin were available as white crystalline powders. To obtain thin crystals suitable for electronmicroscopy studies, the powder was crushed in a mortar. A small amount of the sample was placed on a 300 mesh holey carbon electronmicroscopy grid. Crystals suitable for electron diffraction studies (in terms of size, thickness and crystallinity) were selected. Diffraction experiments were performed under cryogenic conditions to increase the stability of the sample in the beam. Diffraction patterns were collected from randomly oriented crystals with a CM30T LaB6 microscope operating at 300 keV in microdiffraction mode. A condenser aperture (C2) of 30 µm and spot size 8 were used (the diameter of the beam on the crystal was approximately 1 µm). The data were recorded at a camera length of 420 mm on DITABIS image plates and digitalized at a resolution of 0.025 mm per pixel with the DITABIS Micron imagingplate readout system.
2.2. Data preprocessing and determining the crystal facets
Firstly, the digitized diffraction patterns were processed. The approximate centre of the diffraction patterns was found, the central beam or backstop shadow was removed, the resolutiondependent background was subtracted, the autocorrelation patterns were determined and the beam centre was refined. Peak positions were automatically extracted from the autocorrelation patterns using the automated particlepicking tool of the Cyclops software suite (Plaisier et al., 2007). At low resolution, the peak positions of the diffractogram coincide with those of the autocorrelation pattern (see Fig. 1). From these peak positions, we calculated a lowresolution facet for each diffraction pattern and stored these in list 1.
In the absence of a beam stop, the centre of a diffraction pattern was found by a search for the most intense connected spot using an adaptation of a standard peak search. When a beam stop occluded the direct beam, the centre was located by crosscorrelating the autocorrelation pattern of the diffractogram with the diffractogram itself and by making use of the
of the lowresolution reflections (the is caused by the low curvature of the Ewald sphere).The crystal facet describing the lattice of the autocorrelation pattern was determined by locating the two peaks close to the centre, ensuring that the angle they defined together with the centre was between π/2 and π/3. These two peaks can be located interactively or automatically in our algorithm. Visual inspection ensured that the facet indeed correlated to the twodimensional lattice of the autocorrelation pattern and that it did not correspond to lowresolution noise peaks.
2.3. Simulating a threedimensional reflection lattice and extracting lowresolution model facets
Six cell parameters (axes a, b and c, and angles α, β and γ) define a Using these six parameters, a systematic set of possible unit cells can be simulated in a grid search of axes and angles. Good guesses for the dimensions of the parameters and the step size can be made on the basis of the observed spacings and angles in the experimentally determined crystal facets, but we also allow the user to select the search range and step size.
From a set of cell parameters, a reciprocal cell matrix C can be constructed. The crystal orientation can be defined by a rotation matrix R and from these matrices R and C a matrix M = CR is constructed. The position of any reflection point p of the threedimensional reflection lattice in Fourier space for a given and crystal orientation can be calculated using the equation
Here h = (h, k, l) is an index vector containing the integral indices of p. M is defined by the unitcell parameters and the crystal orientation. The indices that satisfy p for a chosen resolution range can be found by imposing the boundary conditions
where d_{min} is the lower boundary of the resolution range and d_{max} is the upper boundary resolution. Given these equations and boundary conditions, we implemented an algorithm to quickly generate all possible positions of reflection spots in threedimensional Fourier space. From this collection of simulated threedimensional spot positions, we generated a list of all unique model facets, i.e. model facets differing from all others by less than a specified tolerance.
2.4. Calculating residuals
In the ideal case, all facets from the experimental data exactly match the facets of one specific model
In practice, however, the limited accuracy of determining the centroids of autocorrelation peaks, small variations in unitcell parameters of different crystals and the uncertainty of the crystal orientation prevent such ideal fits. Therefore, function approximation needs to be performed, in which a function is selected that matches a target function as closely as possible.The `squared difference function' is used to calculate the leastsquares error of fitting two facets. If we assume that p_{0} and p_{1} define the twodimensional vector pair of the observed facet and q_{0} and q_{1} define the simulated facet, then the square error is defined as
where R is the rotation matrix that minimizes r. This function can be solved analytically for R, thus speeding up its computation. In order to improve accuracy, but at the expense of computational speed, multiple vectors of the autocorrelation image can also be matched.
However, it is not sufficient to accumulate the residual defined in (3) for all observed facets. We need to take into account that by choosing an arbitrarily large (resulting in a very dense modelled reciprocal lattice) this residual can be decreased at will. We tested several weighting schemes and found that the one which most consistently produces good unit cells was
where the weighting factors h_{q0} and h_{q1} are the integral indices of the vectors (h, k, l) of q_{0} and q_{1} of the simulated facet. For instance, the indices q_{0} and q_{1} of a facet might be {[0, 1, 1], [1, 0, 0]}, in which case h_{q0}^{2} would be 2 and h_{q1}^{2} would be 1. If a simulated dense lattice oversamples the observed lattice N times, the r value in (3) is statistically 1/N^{2} smaller, whereas the length of the indices vector of the fitted facet is N times larger, so the weighting factor of the square of indices length in (4) corrects the overfitting problem of oversampling.
3. Results
3.1. Unitcell determination of mayenite from electron diffraction data
The algorithm was tested on randomly oriented electron diffraction data from mayenite (Ca_{12}Al_{14}O_{33}), a cubic inorganic mineral (Fig. 3). Our algorithm suggested a unitcell parameter of 11.9 Å, which is in line with a reported value from the literature of 11.98 Å (Boysen et al., 2007). We could index the diffraction patterns of certain zones satisfactory (Fig. 3), with r.m.s.d.s between observed and predicted spot positions of about 0.5%. We considered data from eight diffractograms in this analysis. In order to test the accuracy of our method and the potential for false minima, we performed a finegrid search (Fig. 3b). Here, we found a broader second minimum around 17 Å. This is, within a few percent, a factor of 2^{1/2} times larger than the known of about 12 Å and hence represents an oversampling of exactly the same lattice.
3.2. Unitcell determination of potassium penicillin G and sodium oxacillin from electron diffraction data
Electron diffraction data of potassium penicillin G (C_{16}H_{17}KN_{2}O_{4}S) and sodium oxacillin (C_{19}H_{18}N_{3}NaO_{5}S.H_{2}O) were analysed using our new algorithm. The unitcell parameters that our algorithm suggested are given in Table 1, together with Xray diffraction data taken from the literature (Dexter & van der Veen, 1978; Gibon et al., 1988). On the basis of these unit cells, we could index two main zones (001) and (011) in the case of potassium penicillin G using the program PhIDO (Calidris, Solentuna, Sweden; https://www.calidrisem.com ; see Fig. 4). We considered data from 13 diffractograms for potassium penicillin G and 11 for sodium oxacillin in the analysis.

3.3. Unitcell determination of orthogonal lysozyme
In the case of orthogonal nanocrystals of hen eggwhite lysozyme, our algorithm did not produce a et al., 2005; Biswal et al., 2000; see Fig. 1 for an example of a diffractogram and corresponding autocorrelation pattern and Table 2 for reported unit cells and the determined by our algorithm). For this calculation we used 19 different crystals. The crystals adopt preferred orientations on the EM grid and hence we also collected diffractograms at various random tilt angles to obtain more samplings of spacings that preferred to point in the direction normal to the EM grid. Overweighting crystals with such rare orientations made the cell determination more robust, but in general very similar results were obtained if we did not include this weighting. We do not exclude the possibility that the nanocrystals of lysozyme correspond to a new polymorph, but it may also be that the algorithm for some reason produces a large error of up to around 4% for large unit cells. Table 2 gives an overview of the unit cells of some known polymorphs of lysozyme, together with the produced by our new algorithm.
that was known from the literature (Saijo4. Discussion and conclusions
Our new algorithm for unitcell determination is independent of knowledge about the angular relationship between experimentally determined diffraction patterns. It does assume that all diffraction patterns share a similar threedimensional lattice. Because it can deal with a limited number of outliers, it is fairly robust. Because our algorithm uses autocorrelation patterns rather than the original data, precise knowledge of the position of the beam centre is not required, as autocorrelation patterns are always centred by definition. Using autocorrelation patterns for unitcell determination would fail at higher diffraction angles, but since the wavelength of the electrons used (approximately 0.013 Å) was two to three orders of magnitude smaller than the highest resolutions we used for our analyses (between 1 and 4 Å), this did not impose any serious problems in practice.
For the smallmolecule crystals, which belonged to orthorhombic or cubic space groups and hence had three or less
in their unitcell parameters, the algorithm performed well, reproducing literature values within a few percent. We do not expect a higher level of accuracy, as the method is based on the lowresolution spacings. In a subsequent indexing and unitcell step, which will use the original diffraction pattern, we assume that these small errors can be reduced.Somewhat surprising was the b axis and a significantly longer c axis than the unit cells reported in the literature. The unitcell volume of the largest known orthorhombic polymorph of hen eggwhite lysozyme was about 13% smaller than that of our nanocrystals (Table 2). Unfortunately, our nanocrystals could not be grown to a larger size. Hence, we could not corroborate the new by Xray analysis and in the absence of independent proof we cannot exclude the possibility that our algorithm failed to identify the correct of nanocrystalline lysozyme. It may be that the combination of randomly oriented diffraction patterns, a relatively large and a potentially anisotropic rocking curve frustrates our algorithm and we are further investigating potential improvements. However, using the large we were able to index well aligned diffraction patterns using the program ELD (Zou et al., 1993), yet we failed to index these patterns if we used the unit cells of known orthorhombic polymorphs of hen eggwhite lysozyme (Fig. 5). Furthermore, all the known unit cells of lysozyme gave considerably worse residuals as defined by (4) and therefore were not supported by our experimental data. In this light, we propose that the nanocrystals are a new polymorph of lysozyme that was induced by the on human hair as described in Georgieva et al. (2007).
we found for orthorhombic lysozyme, which had a significantly shorterHow many diffractograms are needed to estimate the unit cell? There is not a straightforward answer to this question, but in general it is better to include as many data in the analysis as possible. If the crystals have a favoured orientation on the grid (as the lysozyme crystals did), then it is important to collect tilted data, as otherwise the possibility exists that one of the spacings is not observed. However, there are also other issues that influence the robustness of our algorithm, for instance the symmetry of the
(higher symmetry gives better results) or peculiarities of a specific combination of unitcell parameters: if, for instance, in an orthorhombic the (100) and (021) directions have similar lengths, indexing may become confused.With our new algorithm we have made progress in enabling
by electron diffraction of beamsensitive threedimensional nanocrystals. Subsequent steps involve testing our algorithm on lower symmetry space groups (monoclinic and triclinic), refining the unitcell parameters, indexing the electron diffraction patterns, integrating the diffraction intensities, merging the data and phasing. However, these subsequent steps crucially depend on knowledge of the and in many cases we can use algorithms and programs developed for Xray crystallography.Footnotes
‡These authors contributed equally to this work.
Acknowledgements
The authors would like to thank Dr Ulrike Zeise and Tom de Kruijff for technical support, Dr S. Nikoloupulos and Professor C. Giacovazzo for providing the pharmaceutical powders for the electron diffraction experiments, M. W. A. Kok for making Fig. 3(b), Professor X. Zou and Eleni Sarakinou for help with the program PhIDO and Dr Rag de Graaff, Vikas Kumar and Qiang Xu for fruitful discussions.
References
Biswal, B. K., Sukumar, N. & Vijayan, M. (2000). Acta Cryst. D56, 1110–1119. Web of Science CrossRef CAS IUCr Journals Google Scholar
Boysen, H., Lerch, M., Stys, A. & Senyshyn, A. (2007). Acta Cryst. B63, 675–682. Web of Science CrossRef CAS IUCr Journals Google Scholar
Clegg, W. (1981). Acta Cryst. A37, 913–915. CrossRef CAS IUCr Journals Web of Science Google Scholar
Dexter, D. D. & van der Veen, J. M. (1978). J. Chem. Soc. Perkin Trans. 1, pp. 185–190. CSD CrossRef Web of Science Google Scholar
Georgieva, D. G., Kuil, M. E., Oosterkamp, T. H., Zandbergen, H. W. & Abrahams, J. P. (2007). Acta Cryst. D63, 564–570. Web of Science CrossRef CAS IUCr Journals Google Scholar
Gibon, V., Norberg, B., Evrard, G. & Durant, F. (1988). Acta Cryst. C44, 652–654. CSD CrossRef CAS Web of Science IUCr Journals Google Scholar
GrosseKunstleve, R. W., Sauter, N. K. & Adams, P. D. (2004). Acta Cryst. A60, 1–6. Web of Science CrossRef CAS IUCr Journals Google Scholar
Henderson, R. (1995). Q. Rev. Biophys. 28, 171–193. CrossRef CAS PubMed Web of Science Google Scholar
Kabsch, W. (1993). J. Appl. Cryst. 26, 795–800. CrossRef CAS Web of Science IUCr Journals Google Scholar
Kolb, U., Gorelik, T. & Otten, M. T. (2008). Ultramicroscopy, 108, 763–772. Web of Science CrossRef PubMed CAS Google Scholar
Plaisier, J. R., Jiang, L. & Abrahams, J. P. (2007). J. Struct. Biol. 157, 19–27. Web of Science CrossRef PubMed CAS Google Scholar
Saijo, S., Yamada, Y., Sato, T., Tanaka, N., Matsui, T., Sazaki, G., Nakajima, K. & Matsuura, Y. (2005). Acta Cryst. D61, 207–217. Web of Science CrossRef CAS IUCr Journals Google Scholar
Vainshtein, B. K. (1964). Structure Analysis by Electron Diffraction. Oxford: Pergamon. Google Scholar
Zou, X. D., Hovmöller, A. & Hovmöller, S. (2004). Ultramicroscopy, 98, 187–193. Web of Science CrossRef PubMed CAS Google Scholar
Zou, X. D., Sukharev, Y. & Hovmöller, S. (1993). Ultramicroscopy, 49, 147–158. CrossRef CAS Web of Science Google Scholar
This is an openaccess article distributed under the terms of the Creative Commons Attribution (CCBY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.