## research papers

## The indexing ambiguity in serial femtosecond crystallography (SFX) resolved using an expectation maximization algorithm

^{a}Physics Department, Arizona State University, Tempe, AZ 85282, United States, and ^{b}Beijng Computational Science Research Center, 3 Heqing Rd, Haidian, Beijing 100084, People's Republic of China^{*}Correspondence e-mail: spence@asu.edu

Crystallographic auto-indexing algorithms provide crystal orientations and unit-cell parameters and assign *P*6_{3} (which supports a indexing ambiguity), the method is validated.

Keywords: indexing ambiguity; serial femtosecond crystallography (SFX); XFELs; protein crystallography; expectation maximization algorithm.

### 1. Introduction

An X-ray *et al.*, 2014; Boutet *et al.*, 2012; Chapman *et al.*, 2011; Liu *et al.*, 2013; Redecke *et al.*, 2013). At the same time, SFX has created many challenges for data analysis, since patterns from thousands of nanocrystals of different sizes and orientations must be merged after indexing, and also because of the large shot-to-shot variations in pulse intensity. Here, we discuss one of the most important issues in SFX, namely the indexing ambiguity which occurs when the Bravais symmetry is higher than the space-group symmetry. In its simplest form, the indexing ambiguity arises when a correctly indexed crystal might also be indexed as its twin. We do not suggest here that the nanocrystals themselves are twinned, only that data from different single crystals may be accidently merged in twin-related orientations. Thus, merged data from patterns subject to a index ambiguity are similar to those from physically twinned crystals. The process of resolving this ambiguity has thus been referred to as detwinning.

Auto-indexing software, including the widely used *MOSFLM* (Leslie, 1999), *DIRAX* (Duisenberg, 1992) and *LABELIT* (Sauter *et al.*, 2004) programs, are capable of determining both the unit-cell parameters and the orientation of a crystal, based on Bragg spot locations in a diffraction pattern. Indexing ambiguities are not a severe problem in conventional crystallography, because (i) a series (or even a full set) of diffraction patterns with known relative rotational relations are measured from each crystal; (ii) the reflections measured at each Bragg spot are narrowly distributed and represent the full diffraction intensities, especially if the oscillation or rotation technique is applied during each exposure. In SFX, a crystal is completely destroyed immediately after interacting with the X-ray laser pulse. The rotation of a nanocrystal can be ignored during the femtosecond exposure, so that the recorded intensity covers only a small fraction of the angular width of a reflection generated by mosaicity. This fraction is set by the range of wavelengths within each pulse (typically 0.1% monochromaticity), so that full structure factors cannot be extracted from a single diffraction pattern. The orientational relationship between diffraction patterns is unknown because the crystals are oriented randomly from shot to shot, and the auto-indexing methods cannot distinguish between, for example, twin-related indexing schemes based on Bragg spot locations alone. In such a case, if data were merged based on indexing results, one would expect half the data to be merged incorrectly by chance for the cases where two indexing modes are equivalent, resulting in a merged data set that appears to be from physically twinned crystals. On the other hand, the intensities of full reflections at Bragg spots can provide the extra information needed to resolve this ambiguity.

In SFX indexing a second challenge arises which is that, for mosaic crystals, the measured snapshot XFEL Bragg spots are only partial reflections. This is a result of the collimation and monochromaticity of the X-ray laser, which samples only a fraction of the full angular width of each reflection, which is broadened by mosaicity (Hattne *et al.*, 2014). Mosaicity models differ according to the type of crystal, and a model based, for example, on continuous elastic deformation may apply in some cases [see Snell *et al.* (2003) for a review].

Partiality has been defined in several ways, often very loosely as the angle by which a given diffracted beam misses the exact Bragg condition. More precisely, we will define the partiality of a reflection as the fraction of a full reflection that is recorded in one experimental pattern, with a maximum value of unity corresponding to a full reflection. A different situation arises for the second case of the smallest nanocrystals which consist of a single mosaic block (with possible elastic deformation), smaller than the width of the coherent beam, where the full Fourier transform of the external shape of the nanocrystal (the so-called shape transform) is laid down around every *et al.*, 2011; Spence *et al.*, 2011) and the angular distribution of scattering around the Bragg condition can no longer be modelled as a smooth narrow peak. We may also have a third case, when using the smallest sub-micron diameter XFEL beams, where the beam size may be smaller than one `mosaic block' or isolated nanocrystal. This case of coherent convergent-beam diffraction is treated by Spence *et al.* (2014) and will not be considered here. The fourth case of a large perfect crystal (such as a semiconductor silicon wafer) in which dynamic scattering is important will also not be considered here, since it does not occur in protein crystallography.

Since the intensity variation due to partiality variation may be much greater than that due to changes in the choice of indexing scheme, one has a chicken-and-egg problem, where full reflections are needed to resolve the indexing ambiguity, but these cannot be obtained until indexing is correctly assigned to allow data merging. The method of partial reflection analysis was developed by Rossmann & Erickson (1983) [see also Rossmann *et al.* (1979)] and has been used as a form of post-refinement ever since, in conjunction with modern goniometer-based data collection at synchrotrons. Recently, Brehm & Diederichs (2014) developed algorithms (BD algorithms) to resolve the indexing ambiguity by clustering the patterns based on pairwise similarities, measured using Pearson's Here, we describe an expectation maximization (EM) algorithm to establish the relation between each diffraction pattern and iteratively construct the full correctly indexed three-dimensional diffraction volume. The important difference between the BD and EM algorithms lies in their evaluation functions. EM does not use the pairwise relationship between two sets of partial reflections (corresponding to the information recorded in two diffraction patterns). Rather, it utilizes the relationship between reflections from any pattern and a merged full reflection model, which is built up iteratively using the EM algorithm. This repeated comparison with an iteratively improved model greatly reduces the time to convergence and increases the accuracy of the method. We have implemented this algorithm within the framework of *CrystFEL* (White *et al.*, 2012) and tested it using both simulated patterns and experimental patterns from photosystem I crystals.

### 2. Method

#### 2.1. Existing methods and the Pearson correlation

For two diffraction patterns, Pearson's )

is defined as (Brehm & Diederichs, 2014where {*I _{i}*} and {

*I*} are the intensities measured from the two patterns

_{j}*i*and

*j*, and {

*h*} are the indices for the common reflections. and are the mean intensities calculated using common reflections from the two patterns. Clustering algorithms were then devised to group the patterns into the correct classes to assign indices consistently. Specifically, Brehm and Diederichs demonstrated success in classifying diffraction patterns to indexing modes by mapping the patterns into hyperspace, using the BD algorithm. For any pair of diffraction patterns, two quantities are defined to describe their relationship. One is the Pearson distance, which is defined as (1.0 −

*r*

_{ij}), where

*r*

_{ij}is the Pearson correlation efficient [see equation (1)]. The second relation is the Euclidean distance (or the angle) between these two patterns in the hyperspace where the diffraction patterns are embedded and represented as points (or vectors from the origin, when considering the angles between patterns). In the BD algorithm, the difference between the Pearson distance and the Euclidean distance (or angles) is minimized. This procedure eventually maps closely related patterns to closer locations in hyperspace, thereby separating the indexing modes.

Pairwise correlations between diffraction patterns that record only partial intensities may not be accurate enough to discriminate one indexing mode from the alternatives (especially for the smallest nanocrystals), yet the success of the BD algorithm indicates the feasibility of clustering by exploiting pairwise correlations between patterns. It is also found that the average Pearson correlation between one pattern and all other patterns can be used to resolve the indexing ambiguity, and eventually leads to indexing in a consistent manner. The method described in the following also uses Pearson's *i* and a model consisting of full reflections, {*I*_{full}}, defined as follows:

where *r*_{i} is the and all other symbols are the same as in equation (1). We now describe how to construct the full reflection data set iteratively from the partial reflection data set, using an expectation maximization approach.

#### 2.2. Expectation maximization algorithm and full reflection construction

Full intensities in three-dimensional *I*_{full}} can then be used as a reference to recover orientation information for each diffraction pattern. In practice, {*I*_{full}} is often not known for a macromolecule, and it can only be determined by merging diffraction patterns after recovering the correct orientations or indexing modes. This comes back to our `chicken-and-egg' problem, *i.e.* the `correct indexing and full reflection data set', which is solved using the expectation maximization (EM) method. The EM approach has been applied to a related problem: determining the orientation of scattering patterns from single-particle experiments, and merging the scattered intensity into the three-dimensional scattering intensity volume. We have adapted the implementation of Tegze & Bortel (2012), which utilizes the Pearson correlation coefficients to assign orientations. Details of the EM method and its implementation can be found elsewhere (Loh & Elser, 2009; Tegze & Bortel, 2012). In brief, the algorithm works as follows.

At the *n*th iteration, each two-dimensional experimental diffraction pattern (consisting of partial reflections) is compared with a three-dimensional model of the full reflection intensities on the Correlation coefficients {*r*_{i}^{t}} are computed between each pattern *i* and the model for each indexing mode *t*, where *t* enumerates all possible indexing possibilities (two, for the case of a twin-like ambiguity). The full set of two-dimensional patterns are then merged with each other into a three-dimensional diffraction volume, based on the indexing mode which shows the highest correlation with the current model. This newly merged model is then used as the reference for the next iteration, in which this process is repeated, and this is continued until the merged model converges to a stable solution. The model used for the first iteration consists of random real numbers representing Bragg spot intensities. In this way the number of correlations is greatly reduced from *N*^{n} (where each pattern is compared with all others) to 〈*m*〉*Nn* (through our use of a cumulative model), for *N* patterns and *n* indexing possibilities, where 〈*m*〉 is the expected number of iterations. This is the real power of the EM algorithm and it results from the use of a model where the output from one iteration is fed back to the following iteration to improve the outcome. For the case of *P*6_{3}, there are two indexing modes ({*h*, *k*, *l*} and {}) that cannot be distinguished without considering the full reflection intensity information. The algorithm is outlined in the form of pseudocode in the scheme below and as a flow chart in Fig. 1. Using this algorithm, we aim to solve the following optimization problem

As for the Monte Carlo integration method (Kirian *et al.*, 2010), the intensities from the diffraction patterns are merged into the diffraction volume by averaging multiple measurements of the same Bragg spots, and in this way partiality is dealt with. This means a larger number of measurements (or diffraction patterns) should lead to a more accurate approximation to the real model of full reflections, with the error reducing as the inverse square root of the number of patterns.

#### 2.3. Error metrics

For simulated data, the full reflection model is known and can be used as a reference to check whether the detwinning is successful. It is straightforward to compute the correlation between the detwinned model and the reference model. For the purposes of clarity, we label the indexing mode that is consistent with the reference model as the consistent indexing mode (CIM); the other indexing modes are labelled alternative indexing modes (AIM). For the case where there are multiple AIMs, a subscript can be used, such as AIM_{1}, AIM_{2} *etc*. For the *P*6_{3} there is only one AIM, so we label the two indexing modes as CIM and AIM when comparing merged models with the reference model. Because the model indexed in one mode can be mapped to the other modes, the CIM and AIMs are interchangeable. To facilitate the following discussion, the indexing mode for the target model that yields the best agreement with the reference model is labelled as the CIM. These reference intensities from the known structure and its indexing mode are used only for comparison purposes to monitor errors in the analysis of experimental data, and they are never used as inputs to guide the detwinning at any stage.

For experimental data, we used Photosystem I (PSI) data collected during SFX experiments at the Linac Coherent Light Source (LCLS; Stanford, California, USA). The atomic coordinates of PSI [PDB (Berman *et al.*, 2000) code 1jb0 ; Jordan *et al.*, 2001] were used to compute the theoretical model reflection intensities, which were then used as a reference model to evaluate detwinning accuracy.

The model building process was monitored using a target score, defined as

where *N* is the number of patterns. This score will increase as the merged model is improved on, by incorporating more diffraction patterns into the merged model in a consistent manner. This target score will reach a maximum when all the patterns are correctly indexed.

### 3. Results

#### 3.1. Performance evaluation using experimental data

Photosystem I (PSI) belongs to *P*6_{3}, which has an indexing ambiguity of {*h*, *k*, *l*} and {}, so that there are two ways to merge any two diffraction patterns. A data set for PSI consisting of 17 106 indexed patterns has been collected at LCLS. After auto-indexing analysis, the unit-cell parameters were identified and found to be consistent with the model solved using synchrotron data from macrocrystals (PDB code 1jb0 ). Our EM algorithm was applied to this data set and the output model compared with the theoretical form factors calculated from the known atomic model (PDB code 1jb0 ). The detwinning process is summarized in Fig. 2. The merged model at each iteration was compared with the reference model (theoretical form factors of the PDB structure) at two indexing modes (the consistent and alternative indexing modes, CIM and AIM). After six iterations, the merged model converged to a stable solution. Fig. 2(*a*) shows that the initial correlations are nearly zero because the starting model consists of random positive numbers (hence no correlations are expected). The correlations between the output model from the first iteration and the reference model (1jb0 protein intensity) reached about 0.62 for both indexing modes. The orientations of the 17 106 patterns used in this test are classified into two groups of about the same size after the first iteration. After the second iteration, the correlation coefficients between the merged model and the reference model started to diverge and stabilized at 0.71 and 0.41 for the two indexing modes. The monitoring score [equation (4)] also increased to about 0.42, as shown by the red curve in Fig. 2(*a*). Because the patterns were obtained experimentally, the actual orientations were unknown beforehand, so it is difficult to evaluate the number of patterns that have their orientations recovered correctly. Nevertheless, using the theoretical intensity as a reference, a most likely indexing mode for each pattern can be assigned based on the correlations described in the *Methods*, §2.1, equation (2). Our EM algorithm recovered 94.6% of the diffraction patterns (16 188 out of 17 106) in a manner consistent with the indexing modes assigned using the theoretical model as the reference. In Fig. 2(*b*), an intensity slice through the plane *l* = 0 in is used to demonstrate the differences between the merged intensities before and after detwinning. For clarity, only the first quadrant with *h* < 20, *k* < 20 is shown. As mentioned, for the *P*6_{3} the indexing ambiguity occurs for {*h*, *k*, *l*} and {}. In the plane *l* = 0, a symmetric distribution is expected for the twinned intensity, because the {*h*, *k*, 0} and {*k*, *h*, 0} are equivalent before resolving the indexing ambiguity. This symmetry is clearly observed in the merged data from the original experimental data set (top of Fig. 2*b*), while the bottom of Fig. 2(*b*) shows the result of the detwinning, showing that the artificial symmetry due to effects is not present after applying our EM detwinning algorithm.

#### 3.2. Performance evaluation using simulated data

To quantify the performance of the algorithm further, two types of simulation were carried out. The first type of simulation was conducted following the same approach as that of Brehm & Diederichs (2014). The theoretical intensities were computed using the *sfall* program in the *CCP4 package* (Winn *et al.*, 2011), based on PSI in *P*6_{3} with unit-cell parameters from the PDB model (PDB code 1jb0 ). The simulated patterns were based on experimental data collected at the LCLS, as described in the previous section. For each pattern, the Bragg peaks were re-indexed following either indexing mode with equal probability. The experimental intensities were replaced with the calculated values, and the partial reflections were modelled using partialities drawn randomly from a uniform distribution (from 0 to 1). Shot noises, following the were added to the simulated intensities. For the simulated data set, where information about the indexing mode for each diffraction pattern is known, the performance of the algorithm can be evaluated directly by counting the number of correctly recovered indexing modes. The results from a typical test run are summarized in Fig. 3, showing that the vast majority of the patterns have been indexed consistently with respect to the reference model.

The detwinning process is similar for the case of the experimental data set discussed in §3.1. The starting model was not correlated with the reference model in either indexing mode, as indicated by a zero Then, after the first iteration, the patterns fall into two classes based on the likelihood of being in each indexing mode and half of the patterns are correctly assigned in this iteration, as expected. As the iteration continues, the indexing ambiguity breaks down, reflected by the growing percentage of correctly recovered indexing modes. The target score increases and reaches a stable value of about 0.85 after a few iterations. The correlation between the merged model and the reference model (*i.e.* the theoretical intensity calculated from 1jb0 ) is also plotted in Fig. 3. The merged model is compared with the reference model at the two indexing modes for *P*6_{3}. As the detwinning progresses, the information embedded in the diffraction patterns is incorporated into the merged model, the correlation of which with the reference model increases to 0.71 for the CIM and 0.69 for the AIM after the first iteration (Fig. 3). This initial small propensity towards one indexing mode over the other leads to an eventual breakdown of the indexing ambiguity. We note that the between the reference and merged models increases to 0.88 after six iterations in the CIM, while the coefficient for the model in the AIM gradually reduces to about 0.40.

The partiality of intensity is not randomly distributed but depends on many factors, including crystal size, orientation and mosaicity and the monochromaticity of the radiation. Here, we have simplified the problem by assuming that the crystal sizes are about the same, which is achievable using experimental crystal size selection, or data screening. For the second type of simulated dataset, the partiality was modelled using the program *partial_sim* in the *CrystFEL* suite (White *et al.*, 2012). Fig. 4(*a*) shows the relationship between partiality and resolution. Because the Ewald shell is thicker at higher resolution, the larger overlaps between the Ewald shell and the Bragg spots result in better intensity measurement, reflected in larger partialities at higher scattering angles. The performance of the algorithm for a data set of 10 000 patterns compiled using this approach is summarized in Fig. 4(*b*). In that case, the model intensity reconstruction converges faster, and the indexing ambiguity is resolved after just two iterations. The final compared with the theoretical intensity is about 0.99, with only two patterns (out of 10 000) indexed incorrectly.

#### 3.3. Consistency and robustness of the algorithm

The detwinning results are consistent with each other. For the same set of experimental data described in §3.1, the EM algorithm was applied ten times with different randomly generated starting models. The ten detwinning processes are shown in Fig. 5(*a*), indicating the convergence of detwinning within ten iterations. Although the convergence speeds vary, the final results are consistent: (i) the correlations with the reference model computed from the PDB structure are very similar in both indexing modes; (ii) the target scores converge to almost the same value. The consistency between the final models from the ten runs was also evaluated by computing pairwise correlations. It is also worth comparing the EM detwinning results with those from the method implemented in the *CrystFEL* package, the program *ambigator*, which utilizes the average values of the correlations between one pattern and an ensemble composed of other patterns. We also included the merged intensities without detwinning operations in the pairwise comparisons. For these comparisons, the correlations were computed for the subset of reflections within resolution shells between 3 and 5 Å. This subset of reflections contains about 55% of the total reflections (111 452 out of 204 092), well representing the reflections in and reducing the influence of extremely large intensities at low resolutions. The results are summarized in Fig. 5(*b*), where the correlation between each model pair is computed twice, considering the two indexing modes: one for the CIM and the other for the AIM. The lower triangular matrix shows the correlations between the models when they are indexed in the same modes (CIM), and the upper triangular matrix contains the correlations between models that are in alternative indexing modes (AIM). The heat map shows that the ten models, when indexed in the same modes, have very large correlations (>0.90), indicating good self-consistency. On the other hand, the correlations between models are smaller than 0.55 when they are indexed in different modes. The results from the *ambigator* program are also strongly correlated with all ten models obtained from the EM algorithm, with correlations from 0.90 to 0.95. By contrast, the twinned intensities are not strongly correlated with the detwinned results. Based on the correlations between the twinned model and all other detwinned results, the two indexing modes are barely distinguishable for the twinned model (with correlations between 0.70 and 0.80), as shown in the heat map (Fig. 5*b*), where the last row and the last column correspond to correlations between detwinned results and the twinned data for the two indexing modes.

The EM algorithm is fast and has the advantage of linear scaling with respect to the number of patterns. It took about 18 min to process the experimental data set composed of over 17 000 patterns on a MacBook Pro with a 2.8 GHz Intel Core i7 two-core processor. The computation time is plotted as a function of the number of patterns in Fig. 6(*a*), showing a linear dependence relation. The current implementation of the algorithm utilizes a single computing core, and detwinning can be speeded up by parallelizing the correlation calculations that can be distributed to many computing cores when a larger dataset needs to be processed.

The EM algorithm has also been evaluated using smaller data sets and the results suggest that the algorithm is very robust. From an experimental data set composed of over 17 000 diffraction patterns of PSI crystals, subsets of the patterns were randomly selected to form data sets of 500, 1000, 2000, 3000 and 10 000 patterns. The percentage of consistently indexed patterns is summarized in Fig. 6(*b*). When the data set was composed of more than 5000 diffraction patterns, we found that more than about 90% of the patterns were indexed consistently. When the data set was composed of a smaller number of patterns, the algorithm became less reliable, indicated by the smaller percentage and larger fluctuations. This may be because the merged intensities could not accurately approximate the full reflections if the data set were not large enough. This can be understood from the fundamental properties of Monte Carlo integration, which requires sufficient sampling of each Bragg spot. Another extreme case is that the consistently indexed patterns are close to 50% (failed in detwinning) when the number of patterns in the test data set is too small (500 in this test case), as shown in Fig. 6(*b*).

### 4. Discussion and conclusions

We have demonstrated that the expectation maximization (EM) algorithm can be applied to solve the indexing ambiguity problem which arises in serial femtosecond crystallography (SFX) and other X-ray diffraction experiments where `still' reflections are collected without a goniometer, such as the use of a lipid cubic phase nanocrystal injector at a synchrotron. In SFX, the experimental diffraction pattern from a nanocrystal depends on several factors, including crystal orientation, size, mosaicity and shape, the X-ray beam profile, and the interaction region between the crystal and the X-ray pulse (for example, crystals may be only partially illuminated by a highly focused beam). Submicrometre-sized crystals can result in shape-transform effects if the coherent beam is wider than the crystal. The effect of these will be investigated in a subsequent paper devoted to the smallest crystals. The X-ray beam profile varies from shot to shot, but seeding methods can improve the beam quality. In the original Monte Carlo sampling approach for nanocrystals at an XFEL (Kirian *et al.*, 2010), it was expected that this approach would average over all stochastic experimental variables, and this behaviour has been confirmed by the *N*^{−1/2} fall-off in the error of measurement with the number *N* of diffraction patterns. It has been shown that the pairwise correlation between diffraction patterns can be used to break the indexing ambiguity. Although the correlation function calculated from partial intensities may not be highly accurate, the information is sufficient to identify the correct orientation from incorrectly indexed twins. In such operations, qualitative results are critical. As long as the correct indexing mode yields slightly higher correlations compared with incorrect indexing modes, then the correlation-based algorithm should work. This would explain the success of the method for distinguishing the correct indexing mode from its twin in an experimental data set where *r*_{score} is less than 1.0 (*r*_{score} = 1.0 only in the case where full reflections are recorded in each pattern).

The partiality of reflection intensity does not prevent recovery of the correct indexing mode. If the patterns are randomly indexed in two possible twin-related modes for photosystem I crystals, the merged (twinned) full intensity will have a *a*).

Using patterns simulated using the *partial_sim* program with the same parameters that were used to generate the data set in Fig. 4, we conducted a comparison between two measures: (i) using correlations between all patterns and a randomly selected pattern (*i.e.* each is taken between two sets of partial reflections), and (ii) using correlations between patterns and full intensities (the correlation between partial and full reflections). The results shown in Fig. 7 indicate that our correlation method based on the EM approach is more reliable for distinguishing indexing modes. As the iteration progresses, the merged intensities move closer to the full intensities and the separation of two indexing modes becomes more pronounced, eventually moving to the state of Fig. 7(*b*).

Real experimental data often suffer from imperfect measurement, such as saturated pixels at very bright spots at low resolution, or low intensity at very high resolutions. Such imperfections introduce errors in the correlations. One strategy to reduce the influence of these saturated pixels and low-signal pixels is to exclude these data points from the correlation calculations. In the current implementation of the EM detwinning program, the very low-resolution data were ignored for correlation calculations, because extremely large values at low resolution will make the correlations less sensitive to the differences due to different indexing modes.

In summary, we have found that an algorithm based on the expectation maximization approach can effectively resolve the indexing ambiguity problem, which is unavoidable in serial femtosecond crystallography for many space groups. The method is implemented within the *CrystFEL* framework, and the source code is available upon request from the authors.

### Acknowledgements

This research was supported by funding from the Human Frontier Science Program (HFSP) (award No. 024940) and the NSF STC (award No. 1231306). The authors acknowledge discussions with Dr T. A. White and Dr N. A. Zatsepin.

### References

Barends, T. R., Foucar, L., Botha, S., Doak, R. B., Shoeman, R. L., Nass, K., Koglin, J. E., Williams, G. J., Boutet, S., Messerschmidt, M. & Schlichting, I. (2014). *Nature*, **505**, 244–247. Web of Science CrossRef CAS PubMed Google Scholar

Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalow, I. N. & Bourne, P. E. (2000). *Nucleic Acids Res.* **28**, 235–242. Web of Science CrossRef PubMed CAS Google Scholar

Boutet, S. *et al.* (2012). *Science*, **337**, 362–364. CrossRef CAS PubMed Google Scholar

Brehm, W. & Diederichs, K. (2014). *Acta Cryst.* D**70**, 101–109. Web of Science CrossRef CAS IUCr Journals Google Scholar

Chapman, H. N. *et al.* (2011). *Nature*, **470**, 73–77. Web of Science CrossRef CAS PubMed Google Scholar

Duisenberg, A. J. M. (1992). *J. Appl. Cryst.* **25**, 92–96. CrossRef CAS Web of Science IUCr Journals Google Scholar

Hattne, J. *et al.* (2014). *Nat. Methods*, **11**, 545–548. Web of Science CrossRef CAS PubMed Google Scholar

Jordan, P., Fromme, P., Witt, H. T., Klukas, O., Saenger, W. & Krauss, N. (2001). *Nature*, **411**, 909–917. Web of Science CrossRef PubMed CAS Google Scholar

Kirian, R. A., Wang, X., Weierstall, U., Schmidt, K. E., Spence, J. C. H., Hunter, M., Fromme, P., White, T., Chapman, H. N. & Holton, J. (2010). *Opt. Express*, **18**, 5713–5723. Web of Science CrossRef PubMed Google Scholar

Leslie, A. G. W. (1999). *Acta Cryst.* D**55**, 1696–1702. Web of Science CrossRef CAS IUCr Journals Google Scholar

Liu, W. *et al.* (2013). *Science*, **342**, 1521–1524. Web of Science CrossRef CAS PubMed Google Scholar

Loh, N.-T. D. & Elser, V. (2009). *Phys. Rev. E*, **80**, 026705. Web of Science CrossRef Google Scholar

Redecke, L. *et al.* (2013). *Science*, **339**, 227–230. Web of Science CrossRef CAS PubMed Google Scholar

Rossmann, M. G. & Erickson, J. W. (1983). *J. Appl. Cryst.* **16**, 629–636. CrossRef CAS Web of Science IUCr Journals Google Scholar

Rossmann, M. G., Leslie, A. G. W., Abdel-Meguid, S. S. & Tsukihara, T. (1979). *J. Appl. Cryst.* **12**, 570–581. CrossRef CAS IUCr Journals Web of Science Google Scholar

Sauter, N. K., Grosse-Kunstleve, R. W. & Adams, P. D. (2004). *J. Appl. Cryst.* **37**, 399–409. Web of Science CrossRef CAS IUCr Journals Google Scholar

Snell, E. H., Bellamy, H. D. & Borgstahl, G. E. (2003). *Methods Enzymol.* **368**, 268–288. Web of Science CrossRef PubMed CAS Google Scholar

Spence, J. C. H., Kirian, R. A., Wang, X., Weierstall, U., Schmidt, K. E., White, T., Barty, A., Chapman, H. N., Marchesini, S. & Holton, J. (2011). *Opt. Express*, **19**, 2866–2873. Web of Science CrossRef CAS PubMed Google Scholar

Spence, J. C. H., Zatsepin, N. A. & Li, C. (2014). *Philos. Trans. R. Soc. B*, **369**, 20130325. CrossRef Google Scholar

Tegze, M. & Bortel, G. (2012). *J. Struct. Biol.* **179**, 41–45. Web of Science CrossRef CAS PubMed Google Scholar

White, T. A., Kirian, R. A., Martin, A. V., Aquila, A., Nass, K., Barty, A. & Chapman, H. N. (2012). *J. Appl. Cryst.* **45**, 335–341. Web of Science CrossRef CAS IUCr Journals Google Scholar

Winn, M. D., Ballard, C. C., Cowtan, K. D., Dodson, E. J., Emsley, P., Evans, P. R., Keegan, R. M., Krissinel, E. B., Leslie, A. G. W., McCoy, A., McNicholas, S. J., Murshudov, G. N., Pannu, N. S., Potterton, E. A., Powell, H. R., Read, R. J., Vagin, A. & Wilson, K. S. (2011). *Acta Cryst.* D**67**, 235–242. Web of Science CrossRef CAS IUCr Journals Google Scholar

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.