research papers
New methods for indexing multilattice diffraction data
^{a}Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot OX11 0DE, England, ^{b}STFC Rutherford Appleton Laboratory, Didcot OX11 0QX, England, ^{c}CCP4, Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot OX11 0FA, England, ^{d}Division of Structural Biology, The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, England, and ^{e}Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
^{*}Correspondence email: graeme.winter@diamond.ac.uk
A new indexing method is presented which is capable of indexing multiple crystal lattices from narrow wedges of diffraction data. The method takes advantage of a simplification of Fourier transformbased methods that is applicable when the unitcell dimensions are known a priori. The efficacy of this method is demonstrated with both semisynthetic multilattice data and real multilattice data recorded from crystals of ∼1 µm in size, where it is shown that up to six lattices can be successfully indexed and subsequently integrated from a 1° wedge of data. Analysis is presented which shows that improvements in dataquality indicators can be obtained through accurate identification and rejection of overlapping reflections prior to scaling.
Keywords: indexing; multilattice data.
1. Introduction
A fundamental limitation of conventional macromolecular crystallography is the necessity of obtaining one or more crystals of sufficient size and quality to record a reasonably complete data set. The development of microfocus beamlines has allowed data to be collected from smaller crystals than ever before [see the recent reviews of the history and capabilities of microfocus beamlines by Evans et al. (2011) and Smith et al. (2012)]. Frequently, particularly in the cases of viruses and membrane proteins, only small, poorquality crystals may be available and it may only be possible to collect a highly incomplete data set over a small oscillation range for each individual crystal before the diffraction quality is affected by radiation damage.
While an individual crystal may only give an incomplete partial data set, a complete data set may be obtained by merging data from many tens or hundreds of crystals (Grimes et al., 1998; Wang et al., 2012; Hanson et al., 2012) (although in certain circumstances very incomplete data sets may suffice; see, for example, Hadfield et al., 1995). The advent of serial femtosecond crystallography using Xray freeelectron lasers (XFELs) has recently encouraged further interest in the development of serial crystallography using synchrotrons (Gati et al., 2014; Rossmann, 2014; Stellato et al., 2014).
For crystals as small as a few micrometres in size it may not be possible to resolve individual crystals using the beamline onaxis viewing system, in which case gridscan analysis (Song et al., 2007; Cherezov et al., 2009; Bowler et al., 2010; Aishima et al., 2010; Axford et al., 2012) may be necessary to identify sample locations prior to data collection. Such grid scans usually score the diffraction quality by the number and intensity of diffraction spots, suggesting positions where these are maximized. While this is generally a reliable procedure, in cases where the samples are substantially smaller than the beam it is likely that positions will be selected where multiple samples are illuminated, such that multiple independent diffraction patterns will be visible in the resulting data. Unfortunately, indexing methods in the more commonly used integration packages can become unreliable when multiple similarly strong lattices are visible in narrow wedges of data.
In XDS (Kabsch, 2010b) indexing generally works well when there is a single dominant lattice (Kabsch, 1993); however, it may work less well or not at all when two or more equally strong lattices are present. Within MOSFLM (Leslie & Powell, 2007) it is now possible to index as many as four independent lattices (Powell et al., 2013); however, this requires the use of at least two images well spaced in rotation and may also require careful adjustment of parameters. For the multilattice indexing in LABELIT (Sauter & Poon, 2010) it is assumed that one main lattice may be assigned which identifies (by default) at least 40% of the reflections. Paithankar et al. (2011) described the application of the GrainSpotter program (Sørensen et al., 2012; Schmidt, 2014) to multilattice macromolecular crystallography, but GrainSpotter does not currently appear to be in widespread use within the macromolecular crystallography community.
Here, we present a new indexing method to address this challenge of indexing multiple similarly strong lattices within narrow wedges of data. The algorithms for these methods have been developed within the DIALS framework (Waterman et al., 2013), which builds on cctbx (GrosseKunstleve et al., 2002) and dxtbx (Parkhurst et al., 2014) to offer tools for the analysis of Xray diffraction data. In the context of indexing diffraction patterns, this offers spot finding, handling of constraints (GrosseKunstleve et al., 2004) and tools for exporting the results to, for example, XDS. The methods presented here are therefore implemented in the program dials.index.
2. Notation
For clarity, the following notation will be used in this manuscript for the mathematical operations. A more complete description, including a discussion of the various coordinate frames used, may be found in Appendix A and also in the description by Parkhurst et al. (2014) of the experimental models used by dxtbx. Throughout this manuscript we use the term `sweep' to refer to a contiguous sequence of rotation images measured with a constant wavelength, distance and dose per image. A `wedge' typically refers to a small sweep, i.e. one that samples a small part of reciprocal space.
λ: Xray wavelength.
h: h, k, l; h′ is its realvalued approximation.
U, B and A: crystal orientation matrix, reciprocalspace orthogonalization matrix and setting matrix, respectively, where A = UB = (a* b* c*) and a*, b*, c* are the reciprocalspace unitcell vectors.
A^{−1}: indexing matrix, where A^{−1} = (a b c)^{T} and a, b, c are the realspace unitcell vectors.
φ: rotation angle around the goniostat rotation axis.
R: goniostat rotation matrix.
d_{x}, d_{y}: basis vectors for coordinates in the detector plane.
d_{0}: vector from the origin of the laboratory frame to the origin of coordinates for the detector plane.
D: detector projection matrix, where D = d^{−1} = (d_{x} d_{y} d_{0})^{−1} (Bricogne, 1987).
x_{px}, y_{px} and x_{mm}, y_{mm}: detector pixel coordinates and coordinates in millimetres in a virtual detector plane, respectively.
v: virtual detector coordinates, where v = (x_{mm}, y_{mm}, 1).
s_{0}, s_{1}: incident and scattered beam vector.
r_{φ}: reciprocallattice vector on the surface of the at rotation angle φ, where r_{φ} = RAh.
r: reciprocallattice vector in Cartesian (i.e. fixed with respect to the laboratory frame), where r = Ah.
3. Methods
Indexing methods conventionally take a list of spot centroid positions (whether threedimensional centroids or twodimensional image centroids and frame numbers) and some description of the experimental geometry to (i) convert the spot positions to the laboratory frame, (ii) convert these positions to the corresponding set of reciprocalspace vectors {r_{j}} and (iii) analyse the set of vectors {r_{j}} for periodicity and hence find the reciprocallattice basis vectors and the corresponding set of integer {h_{j}}. In mathematical terms the first two steps are common to all indexing methods as follows (Pflugrath, 1997).
A sequence of diffraction images are analysed to find a list of candidate Bragg reflections using a spotfinding routine. This returns a list of spot centroids in the form of x_{px}, y_{px} pixel coordinates in the detector plane and image numbers (which may be nonintegral for threedimensional spotfinding), which are then mapped to x_{mm}, y_{mm} positions in the detector coordinate system (1) and a rotation angle φ. Consequently, these are mapped onto the surface of the to give the scattered beam wavevector, s_{1}, normalized to length 1/λ (2–4), where λ is the wavelength, such that the end point of the vector is on the surface of the with radius 1/λ. The reciprocallattice vector in diffracting condition, r_{φ}, is obtained as the difference between the diffracted wavevector s_{1} and the incident beam vector s_{0} (5). The reciprocallattice vector in Cartesian r, is obtained by rotating the vector r_{φ} by the angle −φ about the vector defined by the rotation axis of the goniometer (6).
Analysis of the set of reciprocallattice vectors {r_{j}} to determine the basis vectors may use a variety of algorithms. In XDS (Kabsch, 1988) the set of short difference vectors {r_{j} − r_{k}} are calculated to build up loworder multiples of lattice vectors on a histogram, which is subsequently analysed to determine a unique basis. Other methods rely on the Fourier transform relationship between real and to provide a route for simultaneously determining both the unitcell and crystalorientation parameters from a set of observed spot centroids (Bricogne, 1986; Otwinowski & Minor, 1997; Steller et al., 1997; Campbell, 1998). Methods have been developed utilizing both onedimensional (Steller et al., 1997; Powell, 1999; Sauter et al., 2004) and threedimensional (Campbell, 1998; Otwinowski et al., 2012) fast Fourier transforms (FFT) to identify the likely directions and magnitudes of the reciprocallattice vectors.
These published Fourier methods utilize the knowledge that the maxima of the function
where x represents a point in are the solutions giving integer triples, h_{j}, to the set of equations
where
and a, b and c are the initially unknown unitcell basis vectors. The vectors x that give the maximum values of F(x) correspond, therefore, to these realspace unitcell basis vectors or some linear combination thereof. A threedimensional fast Fourier transform may be used to calculate this function on a relatively coarse threedimensional uniform grid, which is then searched to find the approximate maxima of (7) (Campbell, 1998; Otwinowski et al., 2012). Alternatively, the maxima may be found by carrying out a series of onedimensional FFTs after projecting the reciprocal basis vectors onto various directions covering a hemisphere of (Steller et al., 1997).
The methods described above simultaneously determine both the direction and magnitude of the basis vectors. However, if the unitcell dimensions are known then the magnitudes of the basis vectors are also known, leaving only the directions of the basis vectors to be determined. From the knowledge of the magnitude of the basis vectors, we know that each local maximum of (7) must lie on the surface of a sphere whose radius is determined by the magnitude of the basis vectors. Therefore, we propose to perform a twodimensional, rather than a threedimensional, search for maxima of (7) by varying the direction of x only, i.e.
where
x is set equal to the length of one of the realspace unitcell vectors and defines a unit vector with spherical coordinates ψ, θ. The search directions ψ, θ are chosen to be evenly spaced within a hemisphere, using a method similar to that described by Steller et al. (1997). The resulting set of vectors are sorted by decreasing value of F(x), and vectors that are approximately collinear with a vector higher in the list are eliminated. The top 30 vectors in this reduced list are analysed to find suitable combinations of basis vectors which are consistent (within userdefined relative length and absolute angular tolerances) with the known (Hattne et al., 2014). This gives a set of candidate crystal setting matrices which are further analysed to choose the one which is most consistent with the set of observed centroids, i.e. the one which indexes as many observed centroids as possible. Although this relatively simple metric appears to work well in this study, work is ongoing to devise a more robust metric that takes into account the quality of the fit between the calculated and predicted centroids, such as that described by Sauter et al. (2004). The unitcell dimensions of the primitive setting of the are used in the search for the initial set of candidate basis vectors, although the algorithm should be equally applicable in the case of the reduced basis, reference setting or some other nonstandard setting.
Each reciprocallattice vector is then expressed in terms of the reciprocal basis vectors according to
The nonintegral h′ are rounded to give the integer h. Only those reflections are used where the norm of the difference between the integer and realvalued i.e. h′ − h, is less than some tolerance (in this work a tolerance of 0.3 was used).
The unitcell and crystalorientation parameters are then refined using the positions of the indexed reflections (§3.3). Once has converged, any remaining unindexed reflections may be analysed for further lattices. In subsequent iterations, joint of the crystal lattices is performed. This process may be repeated until either an insignificant number of unindexed reflections remain or no further lattices can be identified. If at any stage does not converge, the most recently identified lattice is discarded and only those lattices which were refined successfully are reported.
3.1. Assigning indices to reflections in the presence of multiple lattices
Initially, each reflection is assigned a potential h′ − h. A further check is made to ensure that two reflections are not assigned to the same lattice with the same If this is the case, then the one that gives the smallest value of the norm h′ − h is used and the remaining reflections are rejected as outliers.
as described above for the case of a single lattice. The reflection is assigned to the lattice that gives the with the smallest norm 3.2. Spotfinding
Spotfinding was performed using dials.find_spots (unpublished work), which is based on the algorithms described by Kabsch (2010a). This determines spot centroids in three dimensions, as well as estimates of the centroid variances, which are a valuable input to the step (§3.2).
dials.find_spots provides options to filter the initial list of strong spots based on the minimum number of contiguous pixels within the spot, the minimum and maximum resolution limits, the maximum peaktocentroid separation (for example to reject split peaks) and the rejection of spots that are close to an ice ring or within an untrusted region of the detector (for example behind the beamstop shadow).
3.3. Refinement
dials.refine (unpublished work) which includes a completely general approach to the of the experimental geometry. This minimizes via weighted least squares the discrepancy between the observed spot centroids and the central impacts calculated from the current model of the crystal orientation, beam direction and detector position and orientation. Parameters that affect the shape of the spot, such as the mosaic spread, are not refined at this stage.
was performed with3.4. Outlier rejection
Even for the case of a single lattice, outlier rejection can be important for accurate ). While Sauter & Poon (2010) propose a more elaborate statistical treatment of outliers, in this work we simply provide userconfigurable parameters to control the maximum acceptable deviations between the observed and calculated spot position in x and y in the detector frame and in the rotation angle φ. This is similar to the behaviour of the equivalent XDS parameters (MAXIMUM_ERROR_OF_SPOT_POSITION= and MAXIMUM_ERROR_OF_SPINDLE_POSITION=), which have default values of three pixels and 2°, respectively (https://xds.mpimfheidelberg.mpg.de/html_doc/xds_parameters.html).
of the crystal and experimental parameters. In the presence of multiple lattices, outlier rejection becomes critical for correctly assigning reflections to the separate lattices (Sauter & Poon, 20103.5. The importance of accurate experimental geometry for indexing
The mapping of the positions of diffraction maxima from image to
is necessarily sensitive to the accuracy of the experimental description. For many singlelattice data sets, particularly spanning many degrees of rotation, assumptions may be made about the initial experimental geometry, for example assuming that the beam is perpendicular to the rotation axis and that this axis is coincident with the fast or slow direction on the detector. In most cases the deviation from these assumptions will be small and well within the radius of convergence of the indexing algorithms.Hattne et al. (2014) demonstrated using XFEL still shots that poorly determined detector geometry can adversely affect both the indexing success rate and the quality of the integrated data, particularly at high resolution. Similarly, in the case of narrow wedges of synchrotron rotation data there is much less unique information to use in the of the geometry. Combined with the presence of multiple lattices, which make outlier rejection more challenging, indexing methods become much less tolerant of errors in the recorded geometry. This is ideally addressed by (i) storing an accurate model of the experimental geometry in the image headers or (ii) having a goodquality and complete rotation data set recorded from a test crystal using the same experimental geometry. In many cases the latter of these is more easily achieved as a user, so dials.index allows the input of this refined geometric information from a previous processing run.
In some cases it may be found after the experiment is complete that the geometry recorded in the image headers is insufficiently accurate. In this situation it may be necessary to make some assumptions about the initial experimental geometry as above and attempt to discover, for example, a more accurate estimate of the beam centre (Sauter et al., 2004). Of course, once the initial indexing is successful full may proceed as described above.
3.6. Integration with XDS
The resulting crystal and experimental geometry parameters were exported in XDS format and the separate lattices were integrated individually using XDS. Standard XDS practices were followed including, for example, running the INTEGRATE step a second time using the GXPARM.XDS output by the CORRECT step and the refined values for beam divergence and mosaicity as input (https://strucbio.biologie.unikonstanz.de/xdswiki/index.php/Optimisation).
3.7. Identification of overlapping reflections
Ideally, overlapping reflections would be identified prior to integration in order that they can be excluded during determination of the spot profile model and taken into account during calculation of the background around each reflection. As XDS does not currently support integration of multiple lattices, analysis of overlapping reflections is performed after integration of the individual lattices with XDS, allowing overlapping reflections to be excluded from subsequent scaling.
In order to identify overlapping reflections, the extent of the reflections in detector/rotation space is first calculated as a bounding box that fully encloses each reflection's peak region. In DIALS, the bounding box is created using a profile model as used in XDS and described by Kabsch (2010a): the XDS σ_{b} and σ_{m} parameters are used to specify the size of each reflection on the detector and in rotation, respectively. The profile model assumes a Gaussian spot profile in a reciprocalspace coordinate system local to each reflection. The extent of the spot in this coordinate system is taken as N_{σ} standard deviations from the origin. The bounding box is then calculated by mapping the spot profile back into detector/rotation space and finding a threedimensional box that fully encloses it. A mask is created for each reflection that specifies, for each pixel in the boundingbox region, which pixels are part of the peak and which are background. Overlapping reflections are then found in a twostage procedure: the bounding boxes are first processed to extract a list of pairs of potentially overlapping reflections and these pairs are then checked to determine whether the peak regions overlap.
A spacepartitioning algorithm is used to extract a list of overlapping bounding boxes. The algorithm uses a kd tree to recursively partition the space along each dimension and query the number of objects intersecting a given range. Since the list of bounding boxes is used both to construct the tree and to provide the list of query ranges, an optimization is performed to allow these steps to be performed in a single pass. The algorithm has a time complexity of O(NlogN), where N is the number of reflections. Each pair of potential overlaps is then analysed to determine whether their peak regions overlap. This is performed by iterating over the pixels in the intersection between the bounding boxes of two reflections; a pair of reflections which contains one or more pixels that are labelled as peak in both reflections are marked as overlapping.
3.8. Resolution of indexing ambiguities
For several space groups, the ). For these space groups it is necessary to ensure that indexing is consistent across all crystals when merging data from multiple crystals to form a single data set. Programs such as POINTLESS (Evans, 2006) can typically resolve indexing ambiguities by comparing data from each crystal against a reference data set (which may be one of the data sets being scaled together). However, as the wedges of data being scaled together become narrower (e.g. 1°) this approach may no longer work reliably. If available, a more complete but lowresolution data set may be used as a reference, or calculated structure factors may be used if the structure is already known.
contains two or four symmetry elements that are not in the resulting in alternative indexing possibilities (Dauter, 1999Brehm & Diederichs (2014) introduced an elegant way of breaking the indexing ambiguity in XFEL data sets, which may comprise many thousands to hundreds of thousands of very incomplete partial data sets taken from still shots. The approach rests upon a comparison of pairs of images. Regardless of the fact that still shots from two randomly oriented crystals will share only a few common the between those shared structurefactor intensities will be highest if the two images have been indexed with the same sense. An implementation of algorithm 2 of Brehm & Diederichs (2014) was developed in the context of cctbx.xfel (Sauter et al., 2013), and in §4.2 we demonstrate the application of the algorithm to synchrotron data in I23.
4. Results and discussion
4.1. Semisynthetic multilattice data sets
Assessing the accuracy of the indexing method with multiple lattices present is straightforward if the correct orientation matrices are known a priori. To meet this requirement, semisynthetic multilattice data sets were created by the pixelwise addition of small wedges of data recorded from a crystal of bovine pancreatic trypsin (∼0.1 × 0.1 × 0.2 mm in size) on beamline I04 at Diamond Light Source at arbitrary φ and κ offsets of a minikappa goniometer. Each wedge of data was recorded over a total range of 10° with 0.1° and 0.1 s per image, with a relatively low transmission (5%) to minimize the effects of radiation damage. The data were recorded at a wavelength of 0.97949 Å at a distance of 214 mm on a PILATUS2 6M detector. A total of 12 data sets were recorded (a–l) and combined to create data sets with two, three, four and six lattices visible by adding the intensity at pixel x, y on image z from, for example, data sets a, e and i for the pixel at x, y on image z for data set aei. For a given number of lattices each of the original image sets was used only once. For each and every permutation of data sets, the orientations of all crystals were successfully identified and refined using the methods described in this paper. Each lattice was integrated individually using XDS (Kabsch, 2010b) before the 12 integrated partial data sets were scaled together using AIMLESS (Evans & Murshudov, 2013). It is important to note that by default AIMLESS adjusts the intensity standard deviations automatically as σ′(I) = SdFac[σ(I)^{2} + SdB × I + (SdAdd × I)^{2}]^{1/2}: this was applied here.
Data were integrated to the corners of the detectors (∼1.0 Å), but the resolution was truncated at 1.3 Å (the resolution of the inscribed circle on the detector) for subsequent analysis. Scaling was performed both with and without the identification of overlapping reflections; datareduction statistics are presented in Tables 1 and 2. Figs. 1 and 2 show datareduction statistics as a function of resolution, whilst Fig. 3 shows the fraction of overlapping spots as a function of resolution. Datareduction statistics were calculated using phenix.merging_statistics (Adams et al., 2010).

It is important to note that as a result of combining images from different orientations to create semisynthetic multilattice data sets, the background will be between two and six times higher for the multilattice data sets than for the original singlelattice data sets. This is reflected in the standard deviation correction parameters applied to the measurements by AIMLESS, which had a relatively wide range of values for the 12 individual data sets a–l: SdFac from 0.49 to 0.79, SdB from −2.55 to 8.95 and SdAdd from 0.0277 to 0.0798. These were generally increased when determined for the sixlattice data, although the interpretation of the values is complicated by the interdependence of the parameters. The change in the mean values for these parameters from single lattice to six lattice processing was SdFac increasing from around 0.57 to 0.75, SdB decreasing slightly from 3.04 to 2.84 and SdAdd remaining constant at around 0.047. The increase in SdFac explains, at least in part, the reduction in mean I/σ(I) as the number of lattices increases in Tables 1 and 2. In reality, for a true multilattice data set the background will be similar, or potentially even lower if the extra crystals displace solvent in the beam.
4.1.1. Overlapping reflections
One potential concern when faced with the presence of multiple lattices is the effect of overlapping reflections on the quality of the reduced data. In order to address this concern, we examined the fraction of overlapping reflections as a function of resolution (Fig. 3) using a value of N_{σ} = 3, where N_{σ} is the number of standard deviations used to calculate the reflection mask (§3.7). In contrast to Paithankar et al. (2011), we observe that the overlap fraction increases with resolution, which we attribute to the more sophisticated identification of integrated pixels based on the XDS profile model in comparison to the fixed reflection spot size assumed by Paithankar and coworkers.
Similarly to Buts et al. (2004), we observe that excluding overlapping reflections from scaling dramatically reduces the number of observations rejected as outliers (Tables 1 and 2). We note that there is an improvement in several dataquality indicators [in particular R_{meas}, R_{p.i.m.} and 〈I/σ(I)〉] when excluding overlapping reflections from scaling, at the cost of a reduction in completeness and multiplicity (Tables 1 and 2 and Figs. 1 and 2).
Inspection of the diffraction images suggests qualitatively that many of the overlaps identified through the procedure described above in fact only involve the overlap of a few pixels from each reflection (see, for example, Fig. 4). This was confirmed by a histogram of the fraction of overlapping pixels (Fig. 5), indicating that the majority of overlapping reflections overlap only in the tails of the peak region. This suggests that reducing the number of standard deviations, N_{σ}, used to calculate the reflection mask profiles would ensure that only pairs of reflections that are overlapping in the central peak region are rejected, with minimal impact on data quality. Scaling was repeated excluding overlapping reflections calculated using N_{σ} = 2. The resulting merging statistics were similar to those obtained using N_{σ} = 3, but with higher values of completeness and multiplicity, particularly for the sixlattice data set (Table 3 and Fig. 6). The choice of an optimal value of N_{σ} to be used in the identification of overlapping reflections is likely to involve a compromise between data quality and completeness. A more advanced approach would require the modification of integration software such that it is aware of the presence of multiple lattices, enabling the exclusion from background determination and profile fitting of pixels belonging to overlapping reflections (Fry et al., 1993). Alternatively, peak deconvolution procedures during integration such as those described by Bourgeois et al. (1998) or Schreurs et al. (2010) may work well in such cases.

4.1.2. Very narrow wedges
It is widely recognized that the robustness of current indexing algorithms can be increased by using data from images that are widely separated in et al., 1997; Sauter et al., 2004; Powell et al., 2013; Winter et al., 2013). Therefore, this can make the indexing of multiple lattices from narrow wedges of data (e.g. 1° rotation images or XFEL still shots) especially challenging.
(for example, separated by a rotation of 90°), particularly for more problematic cases (StellerIn order to further test our multilattice indexing algorithm, we ran dials.index using just the first 1° of images for the semisynthetic multilattice data sets described above. In all cases, from the twolattice data sets to the sixlattice data sets, all 12 lattices were successfully identified and the crystal orientation refined to within less than 0.05° of the orientation obtained from the full 10° of singlelattice data.
We then tested the performance of the algorithm using just the first image from each sweep and found that all six lattices were successfully identified from a single image of each sixlattice data set. This result demonstrates the applicability of the algorithm to very narrow wedges of data and potentially also to XFEL data, where the nature of the current sampledelivery systems can result in multiple lattices being present in the beam simultaneously (Hattne et al., 2014; Sawaya et al., 2014).
4.1.3. Comparison to existing methods
In order to assess the performance of our methods in comparison to existing methods, indexing was attempted with the recent implementation of multilattice indexing in iMosflm (Powell et al., 2013). When provided with the first 1° of images from the sixlattice data sets, iMosflm identified five lattices (only two of which had the correct unit cell) for one of the data sets and five lattices (only one of which had the correct unit cell) for the second data set. When indexing was attempted using only the first image of each data set, iMosflm identified four lattices (of which three had the correct unit cell) for the first data set and only one lattice (with an incorrect unit cell) for the second data set.
Whilst XDS itself does not implement multilattice indexing, it is possible to extract the list of unindexed spots from the output of XDS indexing and use these as input to a subsequent run of the IDXREF step in order to attempt indexing of a further lattice (https://strucbio.biologie.unikonstanz.de/xdswiki/index.php/Indexing). Using this approach, we attempted indexing with XDS using just the first image of each of the sixlattice data sets. For one of the data sets (acegik) XDS was able to successfully identify all six lattices. However, for the other data set (bdfhjl) XDS apparently identified five lattices, but on further inspection an incorrect was chosen by XDS in spite of the known and symmetry being provided as input.
Whilst this is not intended to be a rigorous comparison of the robustness of different indexing algorithms and implementations, it serves to demonstrate that the availability of a variety of algorithms can be beneficial to the user community as each algorithm has a its own set of strengths and weaknesses.
4.2. Polyhedra microcrystal data
Polyhedra are naturally formed protein microcrystals produced by cypoviruses and baculoviruses, in which virus particles are embedded as part of an infectious cycle targeting insects (Chiu et al., 2012). The polyhedra protect the virus particles against hostile conditions and allow them to survive for long periods prior to ingestion and particle release within the insect gut. These crystals typically only grow within the insect cells to a few micrometres in size (with the maximum size depending on the virus species). Early synchrotron studies used powder diffraction to show that although their biological structure varies substantially, with little similarity in their aminoacid sequences, they form virtually identical crystal lattices in I23 with very similar unitcell dimensions (a ≃ 100 Å; Anduleit et al., 2005). Recent studies have successfully used microfocus beamlines at thirdgeneration synchrotron sources to obtain molecular structures from single crystals (Coulibaly et al., 2007, 2009; Ji et al., 2010). The crystals studied to date have typically been on the order of 5–12 µm, but Axford et al. (2014) have recently reported highquality data obtained from crystals of only 4–5 µm in size.
Data for a previously unstudied polyhedrin were collected on the I24 beamline at Diamond Light Source from crystals on the order of 1 µm in size (Fig. 7) using an Xray beam with a of ∼4 × 4 µm at the sample. Individual crystals could not be resolved with the beamline onaxis viewing system; therefore, data were collected at locations identified using grid scans (Aishima et al., 2010) with the help of the DISTL software (Zhang et al., 2006). Diffraction was extremely weak (Fig. 8) and therefore required very long exposures per frame, and as a result only 1° of data could be collected per crystal. 420 data sets (20 × 0.05° images) were collected, but automated data processing using the XDS pipeline within xia2 proved problematic, with few data sets processing successfully (160 out of 420).
Analysis of the number of spots per data set found using dials.find_spots compared with the number expected based on the unitcell dimensions gave a clear indication of the presence of multiple lattices (Fig. 9). dials.index was used in multilattice mode on the output of dials.find_spots, identifying a total of 997 lattices, of which 768 were integrated successfully with XDS, representing a significant improvement compared with that obtained using XDS via xia2. The majority of sweeps were found to have more than one lattice present, with up to five lattices successfully integrated in some cases (Fig. 10). The I23 can be indexed in one of two ways; hence, it was necessary to ensure that all lattices were indexed in a consistent manner. This was achieved using the algorithm of Brehm & Diederichs (2014) as described in §3.8, which showed a clear separation of the two indexing modes (Fig. 11).
4.3. Applications
4.3.1. xia2
While the algorithms described above are useful in isolation, when faced with data sets consisting of many tens or even hundreds of sweeps some level of automation becomes critical. The dials.index tool and associated spotfinding and commands have been incorporated into xia2 (Winter, 2010) and used with XDS for integration. While this works well for data sets with single lattices visible on the images, the design of xia2 is such that processing multiple lattices is currently not possible: for sweeps with multiple lattices only the first lattice identified will be processed. Work is ongoing to redesign this aspect of xia2 to offer the user an automated tool for processing multilattice data.
4.3.2. Diamond Light Source
As these algorithms are not yet fully integrated into xia2, they have been added to the automatic processing scripts that are running following data collection on Diamond MX beamlines (Winter & McAuley, 2011) to provide the user with feedback about (i) whether the data can be indexed and (ii) the number of lattices present. While the former of these may be used to provide data via XDS to BLEND (Foadi et al., 2013), the latter provides useful feedback on the sample density and can guide subsequent sample preparation.
5. Availability
DIALS is available for download from https://sourceforge.net/projects/dials and the source code is available under a nonrestrictive opensource BSD license. The program dials.index also includes an implementation of the onedimensional FFT indexing methods of Steller et al. (1997) derived from the opensource components of LABELIT (Sauter et al., 2004) and an implementation of threedimensional FFT indexing methods (Bricogne, 1986; Campbell, 1998), both of which do not require prior knowledge of the unit cell.
The original trypsin images and the semisynthetic multilattice images are publicly available at https://zenodo.org/record/10820 (Gildea & Winter, 2014).
6. Conclusions
New indexing algorithms have been presented which aid the analysis of microcrystal Xray diffraction data by overcoming some of the key indexing challenges, namely handling narrow sweeps of data containing spots from multiple crystal lattices. These algorithms have been developed within the DIALS framework but may be applied with other integration software such as XDS. In dealing with experimental data where multiple lattices are present it was demonstrated that the treatment of overlapping peaks was necessary to obtain goodquality data; however, doing so required the development of additional tools within the DIALS framework. Given the similarities between the serial crystallography discussed here and XFEL data collection, it is only fitting that the algorithms may be shared: we anticipate that the indexing algorithms presented here may be equally applicable to XFEL data.
APPENDIX A
A1. Coordinate frames
A1.1. The diffractometer equation
We use the vector h to describe a position in fractional in terms of the reciprocallattice basis vectors a*, b* and c*,
The special positions at which h, k and l are integer define the reciprocallattice vectors for which (hkl) are the Miller indices.
The basic diffractometer equation relates a position h to a position r_{φ} in Cartesian This space is defined so that its axes coincide with the axes of the laboratory frame. The distinction is necessary because distances in are measured in units of Å^{−1}. However, for convenience it is often acceptable to refer to either Cartesian or the realspace laboratory frame as the `laboratory frame', when the correct choice is clear by context. The diffractometer equation is
where R is the goniostat rotation matrix and A is the crystal setting matrix, while its inverse A^{−1} is referred to as the indexing matrix. The product Ah may be written as r, which is a position in the φaxis frame, a Cartesian frame that coincides with the laboratory frame at a rotation angle of φ = 0. This makes clear that the setting matrix does not change during the course of a rotation experiment (notwithstanding small `misset' rotations).
For an experiment performed using the rotation method, we use φ to refer to the angle about the actual axis of rotation, even when this is effected by a differently labelled axis on the samplepositioning equipment (such as an ω axis of a multiaxis goniometer).
A2. Orthogonalization convention
Following Busing & Levy (1967), we may decompose the setting matrix A into the product of two matrices, conventionally labelled U and B. We name U the orientation matrix and B the reciprocalspace orthogonalization matrix. These names are in common, but not universal, use. In particular, some texts (for example, Paciorek et al., 1999) refer to the product (i.e. our setting matrix) as the `orientation matrix'.
Of these two matrices, U is a pure rotation matrix and is dependent on the definition of the laboratory frame, whilst B is not dependent on this definition. B does depend, however, on a choice of orthogonalization convention, which relates h to a position in the crystalfixed Cartesian system. The basis vectors of this orthogonal Cartesian frame are fixed to the via this convention.
Although there is no single unique way that A may be decomposed into a pair UB, it is always possible to extract the unitcell dimensions irrespective of the orthogonalization conventions, since A^{T}A = B^{T}B , which is the reciprocal metric matrix. The symbolic expression of B is simplest when the crystalfixed Cartesian system is chosen to be aligned with the crystal realspace or reciprocalspace axes. For example, Busing & Levy (1967) use a frame in which the basis vector i is parallel to reciprocallattice vector a*, while j is chosen to lie in the plane of a* and b*. Unfortunately, this convention is then disconnected from the standard realspace orthogonalization convention, usually called the PDB convention (Protein Data Bank, 1992). This standard is essentially universal in crystallographic software for the transformation of fractional crystallographic coordinates to positions in orthogonal space, with units of Å. In particular, it is the convention used in cctbx (GrosseKunstleve et al., 2002). The convention states that the orthogonal coordinate x is determined from a fractional coordinate u by
where the matrix O is the realspace orthogonalization matrix. This matrix transforms to a crystalfixed Cartesian frame that is defined such that its basis vector i is parallel to the realspace lattice vector a, while j lies in the (a, b) plane. The elements of this matrix made explicit in a compact form are
It is desirable to specify our reciprocalspace orthogonalization convention in terms of this realspace orthogonalization convention. Giacovazzo (2002) derives relationships between real and Of particular interest from that text, we have
By analogy, equate x*′ with h and B with M^{−1}. Also equate M^{T} with O and x′ with u. We then see that
where F is designated the realspace fractionalization matrix.
A3. Orientation matrix
The matrix U `corrects' for the orthogonalization convention implicit in the choice of B. As the crystalfixed Cartesian system and the φaxis frame are both orthonormal Cartesian frames with the same scale, it is clear that U must be a pure rotation matrix. Its elements are clearly dependent on the mutual orientation of these frames.
Acknowledgements
The authors would like to thank David Hall for beam time on Diamond beamline I04 for collecting the test data used in this project and Carina Lobley for the preparation of trypsin samples. This development effort is supported by Diamond Light Source, CCP4 and BiostructX project No. 283570 of the EU FP7. NKS acknowledges support from US National Institutes of Health grant GM095887. DIS and GS are supported by the MRC (grant No. G1000099).
References
Adams, P. D. et al. (2010). Acta Cryst. D66, 213–221. Web of Science CrossRef CAS IUCr Journals Google Scholar
Aishima, J., Owen, R. L., Axford, D., Shepherd, E., Winter, G., Levik, K., Gibbons, P., Ashton, A. & Evans, G. (2010). Acta Cryst. D66, 1032–1035. Web of Science CrossRef CAS IUCr Journals Google Scholar
Anduleit, K., Sutton, G., Diprose, J. M., Mertens, P. P., Grimes, J. M. & Stuart, D. I. (2005). Protein Sci. 14, 2741–2743. Web of Science CrossRef PubMed CAS Google Scholar
Axford, D., Ji, X., Stuart, D. I. & Sutton, G. (2014). Acta Cryst. D70, 1435–1441. Web of Science CrossRef CAS IUCr Journals Google Scholar
Axford, D. et al. (2012). Acta Cryst. D68, 592–600. Web of Science CrossRef CAS IUCr Journals Google Scholar
Bourgeois, D., Nurizzo, D., Kahn, R. & Cambillau, C. (1998). J. Appl. Cryst. 31, 22–35. Web of Science CrossRef CAS IUCr Journals Google Scholar
Bowler, M. W., Guijarro, M., Petitdemange, S., Baker, I., Svensson, O., Burghammer, M., MuellerDieckmann, C., Gordon, E. J., Flot, D., McSweeney, S. M. & Leonard, G. A. (2010). Acta Cryst. D66, 855–864. Web of Science CrossRef CAS IUCr Journals Google Scholar
Brehm, W. & Diederichs, K. (2014). Acta Cryst. D70, 101–109. Web of Science CrossRef CAS IUCr Journals Google Scholar
Bricogne, G. (1986). Proceedings of the EEC Cooperative Workshop on PositionSensitive Detector Software (Phase III), p. 28. Paris: LURE. Google Scholar
Bricogne, G. (1987). Proceedings of the CCP4 Daresbury Study Weekend. Computational Aspects of Protein Crystal Data Analysis, edited by J. R. Helliwell, P. A. Machin & M. Z. Papiz, pp. 120–145. Warrington: Daresbury Laboratory. Google Scholar
Busing, W. R. & Levy, H. A. (1967). Acta Cryst. 22, 457–464. CrossRef IUCr Journals Web of Science Google Scholar
Buts, L., DaoThi, M.H., Wyns, L. & Loris, R. (2004). Acta Cryst. D60, 983–984. Web of Science CrossRef CAS IUCr Journals Google Scholar
Campbell, J. W. (1998). J. Appl. Cryst. 31, 407–413. Web of Science CrossRef CAS IUCr Journals Google Scholar
Cherezov, V., Hanson, M. A., Griffith, M. T., Hilgart, M. C., Sanishvili, R., Nagarajan, V., Stepanov, S., Fischetti, R. F., Kuhn, P. & Stevens, R. C. (2009). J. R. Soc. Interface, 6, S587–S597. Web of Science CrossRef PubMed CAS Google Scholar
Chiu, E., Coulibaly, F. & Metcalf, P. (2012). Curr. Opin. Struct. Biol. 22, 234–240. Web of Science CrossRef CAS PubMed Google Scholar
Coulibaly, F., Chiu, E., Gutmann, S., Rajendran, C., Haebel, P. W., Ikeda, K., Mori, H., Ward, V. K., SchulzeBriese, C. & Metcalf, P. (2009). Proc. Natl Acad. Sci. USA, 106, 22205–22210. Web of Science CrossRef CAS PubMed Google Scholar
Coulibaly, F., Chiu, E., Ikeda, K., Gutmann, S., Haebel, P. W., SchulzeBriese, C., Mori, H. & Metcalf, P. (2007). Nature (London), 446, 97–101. Web of Science CrossRef PubMed CAS Google Scholar
Dauter, Z. (1999). Acta Cryst. D55, 1703–1717. Web of Science CrossRef CAS IUCr Journals Google Scholar
Diederichs, K. & Karplus, P. A. (1997). Nature Struct. Biol. 4, 269–275. CrossRef CAS PubMed Web of Science Google Scholar
Echols, N., Hattne, J., Gildea, R. J., Adams, P. D. & Sauter, N. K. (2012). Comput. Crystallogr. Newsl. 3, 14–17. Google Scholar
Evans, G., Axford, D., Waterman, D. & Owen, R. L. (2011). Crystallogr. Rev. 17, 105–142. Web of Science CrossRef Google Scholar
Evans, P. (2006). Acta Cryst. D62, 72–82. Web of Science CrossRef CAS IUCr Journals Google Scholar
Evans, P. R. & Murshudov, G. N. (2013). Acta Cryst. D69, 1204–1214. Web of Science CrossRef CAS IUCr Journals Google Scholar
Foadi, J., Aller, P., Alguel, Y., Cameron, A., Axford, D., Owen, R. L., Armour, W., Waterman, D. G., Iwata, S. & Evans, G. (2013). Acta Cryst. D69, 1617–1632. Web of Science CrossRef CAS IUCr Journals Google Scholar
Fry, E., Acharya, R. & Stuart, D. (1993). Acta Cryst. A49, 45–55. CrossRef Web of Science IUCr Journals Google Scholar
Gati, C., Bourenkov, G., Klinge, M., Rehders, D., Stellato, F., Oberthür, D., Yefanov, O., Sommer, B. P., Mogk, S., Duszenko, M., Betzel, C., Schneider, T. R., Chapman, H. N. & Redecke, L. (2014). IUCrJ, 1, 87–94. Web of Science CrossRef CAS PubMed IUCr Journals Google Scholar
Giacovazzo, C. (2002). Fundamentals of Crystallography. Oxford University Press. Google Scholar
Gildea, R. & Winter, G. (2014). Semisynthetic MultiLattice Diffraction Data. doi:10.5281/zenodo.10820. Google Scholar
Grimes, J. M., Burroughs, J. N., Gouet, P., Diprose, J. M., Malby, R., Ziéntara, S., Mertens, P. P. & Stuart, D. I. (1998). Nature (London), 395, 470–478. Web of Science CAS PubMed Google Scholar
GrosseKunstleve, R. W., Sauter, N. K. & Adams, P. D. (2004). Acta Cryst. A60, 1–6. Web of Science CrossRef CAS IUCr Journals Google Scholar
GrosseKunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136. Web of Science CrossRef CAS IUCr Journals Google Scholar
Hadfield, A. T., Oliveira, M. A., Kim, K. H., Minor, I., Kremer, M. J., Heinz, B. A., Shepard, D., Pevear, D. C., Rueckert, R. R. & Rossmann, M. G. (1995). J. Mol. Biol. 253, 61–73. CrossRef CAS PubMed Web of Science Google Scholar
Hanson, M. A., Roth, C. B., Jo, E., Griffith, M. T., Scott, F. L., Reinhart, G., Desale, H., Clemons, B., Cahalan, S. M., Schuerer, S. C., Sanna, M. G., Han, G. W., Kuhn, P., Rosen, H. & Stevens, R. C. (2012). Science, 335, 851–855. Web of Science CrossRef CAS PubMed Google Scholar
Hattne, J. et al. (2014). Nature Methods, 11, 545–548. Web of Science CrossRef CAS PubMed Google Scholar
Ji, X., Sutton, G., Evans, G., Axford, D., Owen, R. & Stuart, D. I. (2010). EMBO J. 29, 505–514. Web of Science CrossRef PubMed CAS Google Scholar
Kabsch, W. (1988). J. Appl. Cryst. 21, 67–72. CrossRef CAS Web of Science IUCr Journals Google Scholar
Kabsch, W. (1993). J. Appl. Cryst. 26, 795–800. CrossRef CAS Web of Science IUCr Journals Google Scholar
Kabsch, W. (2010a). Acta Cryst. D66, 133–144. Web of Science CrossRef CAS IUCr Journals Google Scholar
Kabsch, W. (2010b). Acta Cryst. D66, 125–132. Web of Science CrossRef CAS IUCr Journals Google Scholar
Karplus, P. A. & Diederichs, K. (2012). Science, 336, 1030–1033. Web of Science CrossRef CAS PubMed Google Scholar
Leslie, A. G. & Powell, H. R. (2007). Evolving Methods for Macromolecular Crystallography, edited by R. J. Read & J. L. Sussman, pp. 41–51. Dordrecht: Springer. Google Scholar
Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307–326. CrossRef CAS PubMed Web of Science Google Scholar
Otwinowski, Z., Minor, W., Borek, D. & Cymborowski, M. (2012). International Tables for Crystallography, Vol. F, edited by E. Arnold, D. M. Himmel & M. G. Rossmann, pp. 282–295. Dordrecht: Kluwer Academic Publishers. Google Scholar
Paciorek, W. A., Meyer, M. & Chapuis, G. (1999). Acta Cryst. A55, 543–557. Web of Science CrossRef CAS IUCr Journals Google Scholar
Paithankar, K. S., Sørensen, H. O., Wright, J. P., Schmidt, S., Poulsen, H. F. & Garman, E. F. (2011). Acta Cryst. D67, 608–618. Web of Science CrossRef CAS IUCr Journals Google Scholar
Parkhurst, J. M., Brewster, A. S., FuentesMontero, L., Waterman, D. G., Hattne, J., Ashton, A. W., Echols, N., Evans, G., Sauter, N. K. & Winter, G. (2014). J. Appl. Cryst. 47, 1459–1465. Web of Science CrossRef CAS IUCr Journals Google Scholar
Pflugrath, J. W. (1997). Methods Enzymol. 276, 286–306. CrossRef CAS Web of Science Google Scholar
Powell, H. R. (1999). Acta Cryst. D55, 1690–1695. Web of Science CrossRef CAS IUCr Journals Google Scholar
Powell, H. R., Johnson, O. & Leslie, A. G. W. (2013). Acta Cryst. D69, 1195–1203. Web of Science CrossRef CAS IUCr Journals Google Scholar
Protein Data Bank (1992). Atomic Coordinate and Bibliographic Entry Format Description. https://www.wwpdb.org/documentation/PDB_format_1992.pdf. Google Scholar
Rossmann, M. G. (2014). IUCrJ, 1, 84–86. Web of Science CrossRef CAS PubMed IUCr Journals Google Scholar
Sauter, N. K., GrosseKunstleve, R. W. & Adams, P. D. (2004). J. Appl. Cryst. 37, 399–409. Web of Science CrossRef CAS IUCr Journals Google Scholar
Sauter, N. K., Hattne, J., GrosseKunstleve, R. W. & Echols, N. (2013). Acta Cryst. D69, 1274–1282. Web of Science CrossRef CAS IUCr Journals Google Scholar
Sauter, N. K. & Poon, B. K. (2010). J. Appl. Cryst. 43, 611–616. Web of Science CrossRef CAS IUCr Journals Google Scholar
Sawaya, M. R. et al. (2014). Proc. Natl Acad. Sci. 111, 12769–12774. Web of Science CrossRef CAS PubMed Google Scholar
Schmidt, S. (2014). J. Appl. Cryst. 47, 276–284. Web of Science CrossRef CAS IUCr Journals Google Scholar
Schreurs, A. M. M., Xian, X. & KroonBatenburg, L. M. J. (2010). J. Appl. Cryst. 43, 70–82. Web of Science CrossRef CAS IUCr Journals Google Scholar
Smith, J. L., Fischetti, R. F. & Yamamoto, M. (2012). Curr. Opinion Struct. Biol. 22, 602–612. Web of Science CrossRef CAS Google Scholar
Song, J., Mathew, D., Jacob, S. A., Corbett, L., Moorhead, P. & Soltis, S. M. (2007). J. Synchrotron Rad. 14, 191–195. Web of Science CrossRef CAS IUCr Journals Google Scholar
Sørensen, H. O., Schmidt, S., Wright, J. P., Vaughan, G., Techert, S., Garman, E. F., Oddershede, J., Davaasambuu, J., Paithankar, K. S., Gundlach, C. & Poulsen, H. F. (2012). Z. Kristallogr., 227, 63–78. Google Scholar
Stellato, F. et al. (2014). IUCrJ, 1, 204–212. Web of Science CrossRef CAS PubMed IUCr Journals Google Scholar
Steller, I., Bolotovsky, R. & Rossmann, M. G. (1997). J. Appl. Cryst. 30, 1036–1040. Web of Science CrossRef CAS IUCr Journals Google Scholar
Wang, X. et al. (2012). Nature Struct. Mol. Biol. 19, 424–429. Web of Science CrossRef CAS Google Scholar
Waterman, D. G., Winter, G., Parkhurst, J. M., FuentesMontero, L., Hattne, J., Brewster, A. & Sauter, N. K. (2013). CCP4 Newsl. Protein Crystallogr. 49, 16–19. Google Scholar
Weiss, M. S. (2001). J. Appl. Cryst. 34, 130–135. Web of Science CrossRef CAS IUCr Journals Google Scholar
Winter, G. (2010). J. Appl. Cryst. 43, 186–190. Web of Science CrossRef CAS IUCr Journals Google Scholar
Winter, G., Lobley, C. M. C. & Prince, S. M. (2013). Acta Cryst. D69, 1260–1273. Web of Science CrossRef CAS IUCr Journals Google Scholar
Winter, G. & McAuley, K. E. (2011). Methods, 55, 81–93. Web of Science CrossRef CAS PubMed Google Scholar
Zhang, Z., Sauter, N. K., van den Bedem, H., Snell, G. & Deacon, A. M. (2006). J. Appl. Cryst. 39, 112–119. Web of Science CrossRef CAS IUCr Journals Google Scholar
This is an openaccess article distributed under the terms of the Creative Commons Attribution (CCBY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.