research papers\(\def\hfill{\hskip 5em}\def\hfil{\hskip 3em}\def\eqno#1{\hfil {#1}}\)

IUCrJ
Volume 7| Part 6| November 2020| Pages 1151-1167
ISSN: 2052-2525

Beyond integration: modeling every pixel to obtain better structure factors from stills

aMolecular Biophysics and Integrated Bioimaging Division (MBIB), Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, bStanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA, and cDepartment of Biochemistry and Biophysics, UC San Francisco, San Francisco, CA 94158, USA
*Correspondence e-mail: dermen@lbl.gov, nksauter@lbl.gov

Edited by F. Maia, Uppsala University, Sweden (Received 3 June 2020; accepted 23 September 2020; online 24 October 2020)

Most crystallographic data processing methods use pixel integration. In serial femtosecond crystallography (SFX), the intricate interaction between the reciprocal lattice point and the Ewald sphere is integrated out by averaging symmetrically equivalent observations recorded across a large number (104−106) of exposures. Although sufficient for generating biological insights, this approach converges slowly, and using it to accurately measure anomalous differences has proved difficult. This report presents a novel approach for increasing the accuracy of structure factors obtained from SFX data. A physical model describing all observed pixels is defined to a degree of complexity such that it can decouple the various contributions to the pixel intensities. Model dependencies include lattice orientation, unit-cell dimensions, mosaic structure, incident photon spectra and structure factor amplitudes. Maximum likelihood estimation is used to optimize all model parameters. The application of prior knowledge that structure factor amplitudes are positive quantities is included in the form of a reparameterization. The method is tested using a synthesized SFX dataset of ytterbium(III) lysozyme, where each X-ray laser pulse energy is centered at 9034 eV. This energy is 100 eV above the Yb3+ L-III absorption edge, so the anomalous difference signal is stable at 10 electrons despite the inherent energy jitter of each femtosecond X-ray laser pulse. This work demonstrates that this approach allows the determination of anomalous structure factors with very high accuracy while requiring an order-of-magnitude fewer shots than conventional integration-based methods would require to achieve similar results.

1. Introduction

The accuracy of structure factor estimation remains a central experimental focus as we approach the tenth anniversary of serial femtosecond X-ray crystallography (SFX) for biological structure determination at X-ray free-electron lasers (XFELs, Chapman et al., 2011[Chapman, H. N., Fromme, P., Barty, A., White, T. A., Kirian, R. A., Aquila, A., Hunter, M. S., Schulz, J., DePonte, D. P., Weierstall, U., Doak, R. B., Maia, F. R. N. C., Martin, A. V., Schlichting, I., Lomb, L., Coppola, N., Shoeman, R. L., Epp, S. W., Hartmann, R., Rolles, D., Rudenko, A., Foucar, L., Kimmel, N., Weidenspointner, G., Holl, P., Liang, M., Barthelmess, M., Caleman, C., Boutet, S., Bogan, M. J., Krzywinski, J., Bostedt, C., Bajt, S., Gumprecht, L., Rudek, B., Erk, B., Schmidt, C., Hömke, A., Reich, C., Pietschner, D., Strüder, L., Hauser, G., Gorke, H., Ullrich, J., Herrmann, S., Schaller, G., Schopper, F., Soltau, H., Kühnel, K., Messerschmidt, M., Bozek, J. D., Hau-Riege, S. P., Frank, M., Hampton, C. Y., Sierra, R. G., Starodub, D., Williams, G. J., Hajdu, J., Timneanu, N., Seibert, M. M., Andreasson, J., Rocker, A., Jönsson, O., Svenda, M., Stern, S., Nass, K., Andritschke, R., Schröter, C., Krasniqi, F., Bott, M., Schmidt, K. E., Wang, X., Grotjohann, I., Holton, J. M., Barends, T. R. M., Neutze, R., Marchesini, S., Fromme, R., Schorb, S., Rupp, D., Adolph, M., Gorkhover, T., Andersson, I., Hirsemann, H., Potdevin, G., Graafsma, H., Nilsson, B. & Spence, J. C. H. (2011). Nature, 470, 73-77.]). Ultrafast X-ray pulses from such facilities provide unique opportunities to investigate functional, room-temperature protein states, while probing enzyme dynamics on time scales from femtoseconds to milliseconds (Alonso-Mori et al., 2016[Alonso-Mori, R., Asa, K., Bergmann, U., Brewster, A. S., Chatterjee, R., Cooper, J. K., Frei, H. M., Fuller, F. D., Goggins, E., Gul, S., Fukuzawa, H., Iablonskyi, D., Ibrahim, M., Katayama, T., Kroll, T., Kumagai, Y., McClure, B. A., Messinger, J., Motomura, K., Nagaya, K., Nishiyama, T., Saracini, C., Sato, Y., Sauter, N. K., Sokaras, D., Takanashi, T., Togashi, T., Ueda, K., Weare, W. W., Weng, T. C., Yabashi, M., Yachandra, V. K., Young, I. D., Zouni, A., Kern, J. F. & Yano, J. (2016). Faraday Discuss. 194, 621-638.]; Pande et al., 2016[Pande, K., Hutchison, C. D. M., Groenhof, G., Aquila, A., Robinson, J. S., Tenboer, J., Basu, S., Boutet, S., DePonte, D. P., Liang, M., White, T. A., Zatsepin, N. A., Yefanov, O., Morozov, D., Oberthuer, D., Gati, C., Subramanian, G., James, D., Zhao, Y., Koralek, J., Brayshaw, J., Kupitz, C., Conrad, C., Roy-Chowdhury, S., Coe, J. D., Metz, M., Xavier, P. L., Grant, T. D., Koglin, J. E., Ketawala, G., Fromme, R., rajer, V., Henning, R., Spence, J. C. H., Ourmazd, A., Schwander, P., Weierstall, U., Frank, M., Fromme, P., Barty, A., Chapman, H. N., Moffat, K., van Thor, J. J. & Schmidt, M. (2016). Science, 352, 725-729.]; Thomaston et al., 2017[Thomaston, J. L., Woldeyes, R. A., Nakane, T., Yamashita, A., Tanaka, T., Koiwai, K., Brewster, A. S., Barad, B. A., Chen, Y., Lemmin, T., Uervirojnangkoorn, M., Arima, T., Kobayashi, J., Masuda, T., Suzuki, M., Sugahara, M., Sauter, N. K., Tanaka, R., Nureki, O., Tono, K., Joti, Y., Nango, E., Iwata, S., Yumoto, F., Fraser, J. S. & DeGrado, W. F. (2017). Proc. Natl Acad. Sci. USA, 114, 13357-13362.]; Stagno et al., 2017[Stagno, J. R., Liu, Y., Bhandari, Y. R., Conrad, C. E., Panja, S., Swain, M., Fan, L., Nelson, G., Li, C., Wendel, D. R., White, T. A., Coe, J. D., Wiedorn, M. O., Knoska, J., Oberthuer, D., Tuckey, R. A., Yu, P., Dyba, M., Tarasov, S. G., Weierstall, U., Grant, T. D., Schwieters, C. D., Zhang, J., Ferré-D'Amaré, A. R., Fromme, P., Draper, D. E., Liang, M., Hunter, M. S., Boutet, S., Tan, K., Zuo, X., Ji, X., Barty, A., Zatsepin, N. A., Chapman, H. N., Spence, J. C. H., Woodson, S. A. & Wang, Y. (2017). Nature, 541, 242-246.]; Tosha et al., 2017[Tosha, T., Nomura, T., Nishida, T., Saeki, N., Okubayashi, K., Yamagiwa, R., Sugahara, M., Nakane, T., Yamashita, K., Hirata, K. et al. (2017). Nat. Commun. 8, 1-9.]; Nogly et al., 2018[Nogly, P., Weinert, T., James, D., Carbajo, S., Ozerov, D., Furrer, A., Gashi, D., Borin, V., Skopintsev, P., Jaeger, K., Nass, K., Bath, P., Bosman, R., Koglin, J., Seaberg, M., Lane, T., Kekilli, D., Brunle, S., Tanaka, T., Wu, W., Milne, C., White, T., Barty, A., Weierstall, U., Panneels, V., Nango, E., Iwata, S., Hunter, M., Schapiro, I., Schertler, G., Neutze, R. & Standfuss, J. (2018). Science, 361, eaat0094.]; Kern et al., 2018[Kern, J., Chatterjee, R., Young, I. D., Fuller, F. D., Lassalle, L., Ibrahim, M., Gul, S., Fransson, T., Brewster, A. S., Alonso-Mori, R., Hussein, R., Zhang, M., Douthit, L., de Lichtenberg, C., Cheah, M. H., Shevela, D., Wersig, J., Seuffert, I., Sokaras, D., Pastor, E., Weninger, C., Kroll, T., Sierra, R. G., Aller, P., Butryn, A., Orville, A. M., Liang, M., Batyuk, A., Koglin, J. E., Carbajo, S., Boutet, S., Moriarty, N. W., Holton, J. M., Dobbek, H., Adams, P. D., Bergmann, U., Sauter, N. K., Zouni, A., Messinger, J., Yano, J. & Yachandra, V. K. (2018). Nature, 563, 421-425.]; Nango et al., 2019[Nango, E., Kubo, M., Tono, K. & Iwata, S. (2019). Appl. Sci. 9, 5505.]; Dasgupta et al., 2019[Dasgupta, M., Budday, D., de Oliveira, S. H. P., Madzelan, P., Marchany-Rivera, D., Seravalli, J., Hayes, B., Sierra, R. G., Boutet, S., Hunter, M. S., Alonso-Mori, R., Batyuk, A., Wierman, J., Lyubimov, A., Brewster, A. S., Sauter, N. K., Applegate, G. A., Tiwari, V. K., Berkowitz, D. B., Thompson, M. C., Cohen, A. E., Fraser, J. S., Wall, M. E., van den Bedem, H. & Wilson, M. A. (2019). Proc. Natl Acad. Sci. USA, 116, 25634-25640.]; Ibrahim et al., 2020[Ibrahim, M., Fransson, T., Chatterjee, R., Cheah, M. H., Hussein, R., Lassalle, L., Sutherlin, K. D., Young, I. D., Fuller, F. D., Gul, S., Kim, I. S., Simon, P. S., de Lichtenberg, C., Chernev, P., Bogacz, I., Pham, C. C., Orville, A. M., Saichek, N., Northen, T., Batyuk, A., Carbajo, S., Alonso-Mori, R., Tono, K., Owada, S., Bhowmick, A., Bolotovsky, R., Mendez, D., Moriarty, N. W., Holton, J. M., Dobbek, H., Brewster, A. S., Adams, P. D., Sauter, N. K., Bergmann, U., Zouni, A., Messinger, J., Kern, J., Yachandra, V. K. & Yano, J. (2020). Proc. Natl Acad. Sci. USA, 117, 12624-12635.]), yet largely avoiding radiation damage (Chapman et al., 2014[Chapman, H. N., Caleman, C. & Timneanu, N. (2014). Philos. Trans. R. Soc. B, 369, 20130313.]; Spence, 2017[Spence, J. C. H. (2017). IUCrJ, 4, 322-339.]; Fransson et al., 2018[Fransson, T., Chatterjee, R., Fuller, F. D., Gul, S., Weninger, C., Sokaras, D., Kroll, T., Alonso-Mori, R., Bergmann, U., Kern, J., Yachandra, V. K. & Yano, J. (2018). Biochemistry, 57, 4629-4637.]). New protein science discoveries commonly arise at the extreme limit of what the signal-to-noise of the diffraction data can support, as illustrated by our recent experiences with photosystem II (PSII, Kern et al., 2018[Kern, J., Chatterjee, R., Young, I. D., Fuller, F. D., Lassalle, L., Ibrahim, M., Gul, S., Fransson, T., Brewster, A. S., Alonso-Mori, R., Hussein, R., Zhang, M., Douthit, L., de Lichtenberg, C., Cheah, M. H., Shevela, D., Wersig, J., Seuffert, I., Sokaras, D., Pastor, E., Weninger, C., Kroll, T., Sierra, R. G., Aller, P., Butryn, A., Orville, A. M., Liang, M., Batyuk, A., Koglin, J. E., Carbajo, S., Boutet, S., Moriarty, N. W., Holton, J. M., Dobbek, H., Adams, P. D., Bergmann, U., Sauter, N. K., Zouni, A., Messinger, J., Yano, J. & Yachandra, V. K. (2018). Nature, 563, 421-425.]; Ibrahim et al., 2020[Ibrahim, M., Fransson, T., Chatterjee, R., Cheah, M. H., Hussein, R., Lassalle, L., Sutherlin, K. D., Young, I. D., Fuller, F. D., Gul, S., Kim, I. S., Simon, P. S., de Lichtenberg, C., Chernev, P., Bogacz, I., Pham, C. C., Orville, A. M., Saichek, N., Northen, T., Batyuk, A., Carbajo, S., Alonso-Mori, R., Tono, K., Owada, S., Bhowmick, A., Bolotovsky, R., Mendez, D., Moriarty, N. W., Holton, J. M., Dobbek, H., Brewster, A. S., Adams, P. D., Sauter, N. K., Bergmann, U., Zouni, A., Messinger, J., Kern, J., Yachandra, V. K. & Yano, J. (2020). Proc. Natl Acad. Sci. USA, 117, 12624-12635.]). There, the electron density revealed small time-dependent changes, including the appearance of a single substrate oxygen atom at the catalytic site against the backdrop of a 23-polypeptide protein complex. Efforts to assign rigorous uncertainties in atomic positions (Ibrahim et al., 2020[Ibrahim, M., Fransson, T., Chatterjee, R., Cheah, M. H., Hussein, R., Lassalle, L., Sutherlin, K. D., Young, I. D., Fuller, F. D., Gul, S., Kim, I. S., Simon, P. S., de Lichtenberg, C., Chernev, P., Bogacz, I., Pham, C. C., Orville, A. M., Saichek, N., Northen, T., Batyuk, A., Carbajo, S., Alonso-Mori, R., Tono, K., Owada, S., Bhowmick, A., Bolotovsky, R., Mendez, D., Moriarty, N. W., Holton, J. M., Dobbek, H., Brewster, A. S., Adams, P. D., Sauter, N. K., Bergmann, U., Zouni, A., Messinger, J., Kern, J., Yachandra, V. K. & Yano, J. (2020). Proc. Natl Acad. Sci. USA, 117, 12624-12635.]) showed significant structural changes, yet a clear desire remained to utilize the weaker data at the limiting resolution in order to gain further atomic insights. Ultimately, our interest lies in single-electron transfers at the four Mn positions of the PSII catalytic cofactor. Spatially resolving the K absorption edge individually for each Mn center has the potential to elucidate the electronic environment of each Mn atom (Sauter et al., 2020[Sauter, N. K., Kern, J., Yano, J. & Holton, J. M. (2020). Acta Cryst. D76, 176-192.]). Such challenging measurements would require quantifying structure factor intensities between Friedel pairs [(|F_h|^2\; versus\; |F_{\bar h}|^2]) and among different states at the 1% level of uncertainty.

Despite this experimental need for accurate interpretation of weak signals, a close look at data analysis pipelines has revealed stubborn and inherent difficulties specific to XFEL diffraction. In general terms, the structure factor amplitude is proportional to the square root of the Bragg spot intensity recorded for Miller index h = (h, k, l),

[|F_{\bf h}| \propto \left(I_{\bf h} \right)^{1/2}, \eqno (1)]

with the proportionality modulated by factors such as incident X-ray intensity, crystal size, polarization and absorbance (Darwin, 1922[Darwin, C. G. (1922). London Edinb. Dubl. Philos. Mag. J. Sci. 43, 800-829.]; Holton & Frankel, 2010[Holton, J. M. & Frankel, K. A. (2010). Acta Cryst. D66, 393-408.]). In particular, the interplay of incident beam divergence, X-ray spectrum and crystal mosaicity produces a distribution of diffracted intensities around regions in reciprocal space which satisfy Bragg's condition. The success of the rotation method of data acquisition (Arndt & Wonacott, 1977[Arndt, U. W. & Wonacott, A. J. (1977). The Rotation Method in Crystallography. Amsterdam: North-Holland.]), long practiced at synchrotron sources, rests on the ability to fully rotate the crystal through the angular range of this distribution, termed the `rocking curve', while summing the diffracted intensity. In this way, the spectral shape and the crystal's mosaic disorder do not contribute to the measurement error, as they are integrated out. Other factors, such as the intensity profile of the incident beam and the size and shape of the illuminated volume vary smoothly with the crystal rotation, hence scaling virtually eliminates errors due to these effects (Evans & Murshudov, 2013[Evans, P. R. & Murshudov, G. N. (2013). Acta Cryst. D69, 1204-1214.]). In contrast, the lack of finite rotation during femtosecond exposure gives rise to partial observations that sample each Bragg spot's rocking curve at a single position, meaning the sources of error described above all contribute to the uncertainty in SFX data. As expressed in the work by Kirian et al. (2010[Kirian, R. A., Wang, X., Weierstall, U., Schmidt, K. E., Spence, J. C. H., Hunter, M., Fromme, P., White, T., Chapman, H. N. & Holton, J. (2010). Opt. Express, 18, 5713-5723.]), for SFX, averaging repeated measurements of the same Miller indices across different diffraction patterns can minimize these uncertainties, assuming each measurement samples a rocking curve at random. However, when considering an SFX dataset in its entirety, duplicate measurements of a given Miller index form a skewed distribution peaking near zero, resulting from the many partial observations of the rocking curve tails that are low-photon count and contribute mostly noise, making it difficult to derive an `average' intensity value that accurately represents |Fh|2; see especially Figs. 1 and 5 in the work by Sharma et al. (2017[Sharma, A., Johansson, L., Dunevall, E., Wahlgren, W. Y., Neutze, R. & Katona, G. (2017). Acta Cryst. A73, 93-101.]) and Fig. 8 in the work by Kroon-Batenburg et al. (2015[Kroon-Batenburg, L. M. J., Schreurs, A. M. M., Ravelli, R. B. G. & Gros, P. (2015). Acta Cryst. D71, 1799-1811.]). Further contributing to noise, the rocking curves sampled in each shot are slight variations of one another, owing to the stochastic nature of XFEL pulses and the morphological variations amongst measured crystals.

Indeed, Kirian et al. (2010[Kirian, R. A., Wang, X., Weierstall, U., Schmidt, K. E., Spence, J. C. H., Hunter, M., Fromme, P., White, T., Chapman, H. N. & Holton, J. (2010). Opt. Express, 18, 5713-5723.]) acknowledges that the baseline technique of simply averaging duplicate measurements, the so-called `Monte Carlo' approach, requires 104–105 individual diffraction patterns for the crystallographic R factor to converge; the 2010 paper thus ushered in ten years of technology development to identify improved algorithms. Most of these algorithms derive a physical model of the data, scored by one of three metrics: the ability to predict Bragg spots in the correct position, the self-consistency of equivalent Bragg spot intensities and, ultimately, the focus in our paper, the ability to predict not only the position of spots, but their size, shape and intensity profile.

Regarding the first metric, the ability to predict spots close to their observed positions requires knowledge of the unit-cell parameters and the crystal orientation in order to select those reciprocal lattice points in the diffracting condition (`on the Ewald sphere'), and also to know the mosaic parameters (mosaic domain size and mosaic rotational spread) to determine which lattice points offset from the Ewald sphere can still generate reflections (Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]). Parameter refinement can optimize the positional match of data and model, which results in more accurate structure factors (Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]; Yefanov et al., 2015[Yefanov, O., Mariani, V., Gati, C., White, T. A., Chapman, H. N. & Barty, A. (2015). Opt. Express, 23, 28459-28470.]; Waterman et al., 2016[Waterman, D. G., Winter, G., Gildea, R. J., Parkhurst, J. M., Brewster, A. S., Sauter, N. K. & Evans, G. (2016). Acta Cryst. 72, 558-575.]; White et al., 2016[White, T. A., Mariani, V., Brehm, W., Yefanov, O., Barty, A., Beyerlein, K. R., Chervinskii, F., Galli, L., Gati, C., Nakane, T., Tolstikova, A., Yamashita, K., Yoon, C. H., Diederichs, K. & Chapman, H. N. (2016). J. Appl. Cryst. 49, 680-689.]; Brewster et al., 2018[Brewster, A. S., Waterman, D. G., Parkhurst, J. M., Gildea, R. J., Young, I. D., O'Riordan, L. J., Yano, J., Winter, G., Evans, G. & Sauter, N. K. (2018). Acta Cryst. D74, 877-894.]).

To examine the self-consistency of symmetrically equivalent Bragg spot intensities, efforts have been made to estimate the `partiality' of each spot, i.e. a per-reflection scale factor that properly accounts for Ewald offset, where lattice points farther from ideal diffraction (and thus weaker) get multiplied by a larger factor to put duplicate observations on the `full spot' scale. Several programs have emerged to treat the problem of partiality, with cxi.merge (Sauter, 2015[Sauter, N. K. (2015). J. Synchrotron Rad. 22, 239-248.]), prime (Uervirojnangkoorn et al., 2015[Uervirojnangkoorn, M., Zeldin, O. B., Lyubimov, A. Y., Hattne, J., Brewster, A. S., Sauter, N. K., Brunger, A. T. & Weis, W. I. (2015). eLife, 4, e05421.]) and nXDS (Kabsch, 2014[Kabsch, W. (2014). Acta Cryst. D70, 2204-2216.]) using the cell and mosaicity parameters already mentioned, paired with a monochromatic beam; whereas ccpxfel (Ginn et al., 2016[Ginn, H. M., Evans, G., Sauter, N. K. & Stuart, D. I. (2016). J. Appl. Cryst. 49, 1065-1072.]) and partialator (White, 2014[White, T. A. (2014). Philos. Trans. R. Soc. B, 369, 20130330.]) offer a polychromatic model intended to model the spectral width produced by the self-amplified spontaneous emission (SASE) obtained at XFEL sources (Emma et al., 2004[Emma, P., Bane, K., Cornacchia, M., Huang, Z., Schlarb, H., Stupakov, G. & Walz, D. (2004). Phys. Rev. Lett. 92, 074801.]; Margaritondo & Ribic, 2011[Margaritondo, G. & Rebernik Ribic, P. (2011). J. Synchrotron Rad. 18, 101-108.]). All of these programs employ post-refinement, iteratively optimizing the parameters such that multiple partiality-corrected measurements of the same Bragg reflection yield the most consistent integrated intensity values after scaling.

Finally, the program EVAL exemplifies profile fitting (Duisenberg et al., 2003[Duisenberg, A. J. M., Kroon-Batenburg, L. M. J. & Schreurs, A. M. M. (2003). J. Appl. Cryst. 36, 220-229.]; Schreurs et al., 2010[Schreurs, A. M. M., Xian, X. & Kroon-Batenburg, L. M. J. (2010). J. Appl. Cryst. 43, 70-82.]; Kroon-Batenburg et al., 2015[Kroon-Batenburg, L. M. J., Schreurs, A. M. M., Ravelli, R. B. G. & Gros, P. (2015). Acta Cryst. D71, 1799-1811.]), utilizing a detailed physical description (source divergence, source dispersion, crystal size, mosaic block size and mosaic rotational spread) to faithfully model the size, shape and intensity profile of each Bragg spot. Along with other diffraction modeling programs such as SIM_MX (Diederichs, 2009[Diederichs, K. (2009). Acta Cryst. D65, 535-542.]), pattern_sim (White et al., 2012[White, T. A., Kirian, R. A., Martin, A. V., Aquila, A., Nass, K., Barty, A. & Chapman, H. N. (2012). J. Appl. Cryst. 45, 335-341.]) and nanoBragg (Holton et al., 2014[Holton, J. M., Classen, S., Frankel, K. A. & Tainer, J. A. (2014). FEBS J. 281, 4046-4060.]; Lyubimov et al., 2016[Lyubimov, A. Y., Uervirojnangkoorn, M., Zeldin, O. B., Zhou, Q., Zhao, M., Brewster, A. S., Michels-Clark, T., Holton, J. M., Sauter, N. K., Weis, W. I. & Brunger, A T. (2016). eLife, 5, e18740.]), EVAL allows us to explore how each of these physical parameters influences the appearance of Bragg spots (see also Nave, 1998[Nave, C. (1998). Acta Cryst. D54, 848-853.], 2014[Nave, C. (2014). J. Synchrotron Rad. 21, 537-546.]), whereas the EVAL paper goes further and uses the profile-derived corrections in a simplex minimization to simultaneously optimize unit cells, crystal orientations and per-shot scale factors, achieving an optimal set of corrections for already-integrated spots.

Conventional data processing involves a two-step approach, first measuring Bragg spots on individual images, and then scaling and merging them together. For the work presented in this paper, determination of optimal structure factors |Fh|, given the data, occurred in a single step, guided by a likelihood target function dependent on crystal orientations, unit cells, scale factors, mosaic parameters, photon spectra and, importantly, a starting list of structure factors provided by conventional data processing. The likelihood target received contributions from pixels across all images,1 with computation of per-pixel probabilities facilitated by the forward modeling program nanoBragg. A new program, diffBragg, provided the first derivatives of the forward model with respect to the various parameters, allowing effective navigation of the likelihood gradient in order to deduce the multi-parameter maximum likelihood. In using a likelihood formalism at the pixel level, we utilized an explicit error model for each pixel derived from first principles, hence optimization of the errors, which depend on the model, occurred at each refinement step. Also, the joint refinement of all parameters correctly accounted for covariance. For example, structure factor amplitudes and per-shot scale factors both directly increase the intensities of spots; however, in the global treatment, other shots measuring the same reflections constrain the structure factors. The integration methods described above all reduce the complicated spot profiles to single numbers; however, in the diffBragg approach, each pixel in the spot profile is tied directly to the model, significantly increasing the number of observations used during model optimization, and leading to higher parameter accuracy from fewer overall shots.

Pixel-level refinement can resolve extremely sensitive details, such as the dispersion corrections arising from two differently oxidized metal atoms, as shown with synthetic data in the work by Sauter et al. (2020[Sauter, N. K., Kern, J., Yano, J. & Holton, J. M. (2020). Acta Cryst. D76, 176-192.]). Having potential access to this level of detail from Bragg scattering opens up new and exciting experimental avenues to consider as the next decade of SFX science begins. Here, using the newly developed tool diffBragg on synthetic data, we further advanced the pixel refinement approach to extract more information from fewer recorded shots: we optimized a set of 13 704 structure factor amplitudes using 2023 XFEL diffraction patterns to a comparable accuracy to that achieved from conventional integration of 19 953 diffraction patterns, and we assumed no prior knowledge other than a positivity restraint applied to the structure factors. Furthermore, with the new protocol we combined positional refinement and post-refinement in a single framework. Sections 2.1[link] and 2.2[link] below describe the creation of synthetic data with nanoBragg; Section 2.3[link] introduces the new framework, diffBragg, for iterative parameter fitting. Section 3 details the improvement afforded by the new method, beyond the initial inputs from the program dials.stills_process [Brewster et al. (2016[Brewster, A. S., Waterman, D. G., Parkhurst, J. M., Gildea, R. J., Michels-Clark, T. M., Young, I. D., Bernstein, H. J., Winter, G., Evans, G. & Sauter, N. K. (2016). Comput. Crystallogr. Newslett. 7, 32-53.]), see also Brewster, Young et al. (2019[Brewster, A. S., Young, I. D., Lyubimov, A., Bhowmick, A. & Sauter, N. K. (2019). Comput. Crystallogr. Newslett. 10, 22-39.]) for a recent description of the graphical user interface], which represent conventional data analysis. The results presented here required significant computational resources to achieve; however, GPU-acceleration, a near-term goal, will help in this regard (see Section 3.2.3[link]). In addition to setting the ground work for using first-derivatives to perform more complicated refinements, this work reveals the potential to extract sensitive information from fewer recorded shots.

2. Methods

2.1. Components of synthetic data

To test the approach, we synthesized realistic lysozyme Yb3+ XFEL diffraction images with a mean XFEL pulse energy of 9034 eV, just above the Yb3+ L-III absorption edge (Fig. 1[link]). We chose an anomalous dataset as a stringent test, as anomalous differences are highly sensitive to errors in structure factor estimation.

[Figure 1]
Figure 1
Anomalous dispersion curve for ytterbium and representative XFEL pulse spectra. Shown by the solid gray line is the measured ytterbium 3+ anomalous correction term [imaginary component; Shapiro et al. (1995[Shapiro, L., Fannon, A. M., Kwong, P. D., Thompson, A., Lehmann, M. S., Grübel, G., Legrand, J.-F., Als-Nielsen, J., Colman, D. R. & Hendrickson, W. A. (1995). Nature, 374, 327-337.])]. This curve was used in the data synthesis to compute the anomalous protein structure factors. For each synthesized diffraction pattern, we used a unique XFEL pulse, with the mean energy spectrum given by the dashed orange line. The spectrum of each XFEL pulse was randomly selected from a set of 100 000 real SASE spectra that were measured at the Linac Coherence Light Source (LCLS) and shifted to a high-energy remote for ytterbium L-III absorption (9034 eV). A single SASE spectrum is also shown for reference, illustrating the energy variation and intensity variation observed on a per-shot basis at the XFEL. At this energy the anomalous scattering is essentially constant within the range of the SASE energy variation. If the experiment had been done with the XFEL tuned to the absorption edge (8944 eV), then each diffraction pattern would have a varying level of anomalous scattering as per the f′′ curve.
2.1.1. Protein

We used the program CCTBX [Computational Crystallography Toolbox, Grosse-Kunstleve et al. (2002[Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126-136.])] to generate structure factors [see equation 1 of Sauter et al. (2013[Sauter, N. K., Hattne, J., Grosse-Kunstleve, R. W. & Echols, N. (2013). Acta Cryst. D69, 1274-1282.]) for details] from PDB entry 4bs7, a room temperature lysozyme derivative structure with two Yb3+ sites (Pinker et al., 2013[Pinker, F., Brun, M., Morin, P., Deman, A.-L., Chateaux, J.-F., Oliéric, V., Stirnimann, C., Lorber, B., Terrier, N., Ferrigno, R. & Sauter, C. (2013). Cryst. Growth Des. 13, 3333-3340.]). Wavelength-independent scattering factors for all atoms were calculated using Cromer–Mann coefficients [as tabulated in the work by Brown et al. (2006[Brown, P. J., Fox, A. G., Maslen, E. N., O'Keefe, M. A. & Willis, B. T. M. (2006). International Tables for Crystallography, Vol. C, 1st online ed., Section 6.1.1, pp. 554-590. Chester: International Union of Crystallography. ])], and anomalous scattering factor corrections f′ and f′′ for all protein atoms were computed using the Henke tables (Henke et al., 1993[Henke, B. L., Gullikson, E. M. & Davis, J. C. (1993). At. Data Nucl. Data Tables, 54, 181-342.]). However, for ytterbium we specifically used f′ and f′′ from measured data (Shapiro et al., 1995[Shapiro, L., Fannon, A. M., Kwong, P. D., Thompson, A., Lehmann, M. S., Grübel, G., Legrand, J.-F., Als-Nielsen, J., Colman, D. R. & Hendrickson, W. A. (1995). Nature, 374, 327-337.]; Hendrickson & Ogata, 1997[Hendrickson, W. A. & Ogata, C. M. (1997). Methods Enzymol. 276, 494-523. ]). Fig. 1[link] shows the experimentally determined Yb profile of f′′, with a magnitude at the high-energy remote (9034 eV) of 10 electrons (Fig. 1[link]).

2.1.2. Crystal

We synthesized data from tetragonal lysozyme, with the unit cell a = b = 79.1, c = 38.4 Å. Each synthetic crystal had mosaic domains consisting of 1000 unit cells (10 along each crystal lattice axis). The crystal mosaic spread was computed by averaging scattering contributions from 100 equivalent mosaic domains. The misorientation of each mosaic domain with respect to the nominal orientation was taken from a normal distribution with a mean of 0° and standard deviation of 0.01° degrees, forming a mosaic texture. The mosaic texture was the same for all synthetic crystals, and the value 0.01° was assumed to be a typically observed value in real crystals (Bellamy et al., 2000[Bellamy, H. D., Snell, E. H., Lovelace, J., Pokross, M. & Borgstahl, G. E. O. (2000). Acta Cryst. D56, 986-995.]). For each shot, the total scattering from this mosaic average was multiplied by a random scale factor Z drawn from a normal distribution with mean μZ = 1150 and standard deviation σZ = 115 about the mean. This scale factor Z is the total number of mosaic domains in the crystal. The reason for using only 100 mosaic orientations for spread (instead of Z) was computational expediency. Random variation in the scale factor for each crystal was used to mimic variation in exposed crystal volume expected during each shot at the XFEL. The value 1150 was chosen by hand such that the contrast between low- and high-resolution spots was typical. We represented the crystal using a standard matrix convention (Busing & Levy, 1967[Busing, W. R. & Levy, H. A. (1967). Acta Cryst. 22, 457-464.]), where each crystal orientation is represented by two matrices: an upper-triangular matrix B, whose columns specify the lattice basis vectors in an aligned orientation defined according to the convention of Arndt & Wonacott (1977[Arndt, U. W. & Wonacott, A. J. (1977). The Rotation Method in Crystallography. Amsterdam: North-Holland.]), and a three-parameter rotation matrix Us which moves the crystal from its aligned position into its observed position in shot s. For the tetragonal system

[{\bf{B}} = \left [{\matrix{ a & 0 & 0 \cr 0 & a & 0 \cr 0 & 0 & c \cr } } \right], \eqno (2)]

where a and c are the real-space unit-cell edge lengths.

2.1.3. Background scattering

We synthesized the diffraction akin to that measured under vacuum at the Coherent X-ray Imaging (CXI, Liang et al., 2015[Liang, M., Williams, G. J., Messerschmidt, M., Seibert, M. M., Montanez, P. A., Hayes, M., Milathianaki, D., Aquila, A., Hunter, M. S., Koglin, J. E., Schafer, D. W., Guillet, S., Busse, A., Bergan, R., Olson, W., Fox, K., Stewart, N., Curtis, R., Miahnahri, A. A. & Boutet, S. (2015). J. Synchrotron Rad. 22, 514-519.]) instrument at the Linac Coherent Light Source (LCLS), employing a gas dynamic virtual nozzle (GDVN) for sample delivery (DePonte et al., 2008[DePonte, D. P., Weierstall, U., Schmidt, K., Warner, J., Starodub, D., Spence, J. C. H. & Doak, R. B. (2008). J. Phys. D Appl. Phys. 41, 195505.]). The GDVN uses water pressure on a piston to force sample through a capillary and out of a nozzle to produce a liquid jet of sample in the interaction region. We assumed the jet was focused to a 5 µm diameter by a helium gas sheath. For background scattering we modeled the irradiated water volume as that of a cylinder: 5 µm in length multiplied by the beam spot size, a circle with a diameter of 1 µm. We neglected scattering from the helium sheath, and otherwise the path between the interaction region and detector was assumed to be a vacuum.

2.1.4. Beam and detector

We synthesized measurements from a 32-panel Cornell–Stanford Pixel Array Detector (CSPAD) as set up at CXI (Hart et al., 2012[Hart, P., Boutet, S., Carini, G., Dubrovin, M., Duda, B., Fritz, D., Haller, G., Herbst, R., Herrmann, S., Kenney, C., Kurita, N., Lemke, H., Messerschmidt, M., Nordby, M., Pines, J., Schafer, D., Swift, M., Weaver, M., Williams, G., Zhu, D., Van Bakel, N. & Morse, J. (2012). Proc. SPIE, 8504, 85040C.]), where each panel consists of 185 × 388 pixels, each 109.92 µm × 109.92 µm in size. We used a 2 × 2 pixel-oversampling rate to minimize aliasing errors from Bragg peaks whose signal might change significantly across the physical pixel dimension. We used a realistic three-dimensional CSPAD geometry obtained from CXI, such that the individual panels had relative rotations [see Brewster et al. (2018[Brewster, A. S., Waterman, D. G., Parkhurst, J. M., Gildea, R. J., Young, I. D., O'Riordan, L. J., Yano, J., Winter, G., Evans, G. & Sauter, N. K. (2018). Acta Cryst. D74, 877-894.]) for a detailed description]. We defined a `pixel measurement' by both its position on the X-ray detector and the diffraction event it represents. Therefore, we let each pixel measurement be Xi,s where s refers to an X-ray event, or shot, and i is an index specifying the pixel position in the detector pixel array. The index i is equivalent to a triple index (panel, fast, slow) where panel is the CSPAD panel ID (0–31), fast is the panel fast-scan pixel coordinate (0–184) and slow is the panel slow-scan pixel coordinate (0–387). The CSPAD was placed 124 mm from the interaction region, giving a corner resolution of 1.7 Å; however, we only analyzed scattering out to 2.1 Å. For each synthesized diffraction event we used a unique SASE input spectrum that was a scaled and shifted version of real spectra recorded during an LCLS experiment (run 16, proposal number LS49) using an upstream spectrometer (Zhu et al., 2012[Zhu, D., Cammarata, M., Feldkamp, J. M., Fritz, D. M., Hastings, J. B., Lee, S., Lemke, H. T., Robert, A., Turner, J. L. & Feng, Y. (2012). Appl. Phys. Lett. 101, 034103.]). Fig. 1[link] shows a representative XFEL pulse spectrum used to generate synthetic data, as well as the average spectrum. We assumed a uniform flux of photons with an average of 8 × 1010 photons per pulse, though the total fluence was different for each synthesized shot. We ignored beam divergence effects, as these are typically small at XFELs.

2.2. nanoBragg: computing synthetic data

We used the program nanoBragg to compute, for each pixel during each shot, the expected Bragg scattering Ii,s,data and background scattering Ti,s,data, and also to apply a realistic noise model.

2.2.1. nanoBragg: generating Bragg scattering

To compute the Bragg scattering measured by each pixel, nanoBragg applies the kinematic theory where the expected number of Bragg-scattered photons in each pixel is the product of the incident X-ray fluence with the scattering cross section of the crystal and the solid angle ΔΩi of the pixel. In what follows, re is the classical electron radius, κi is the Kahn polarization factor for scattered light from a pre-polarized incident source (Azároff, 1955[Azároff, L. V. (1955). Acta Cryst. 8, 701-704.]; Kahn et al., 1982[Kahn, R., Fourme, R., Gadet, A., Janin, J., Dumas, C. & André, D. (1982). J. Appl. Cryst. 15, 330-337.]), Js(λ) is the fluence in photons per area at wavelength λ, Zs is the crystal scale factor randomly sampled for each shot from a normal distribution [{\cal N}\left({\mu = 1150,\,\sigma = 115} \right)], |Fh| is the structure factor amplitude of the protein in each unit cell and I0(Δhi,j,s, m) is the interference factor arising from the periodicity in the crystal lattice. The term Δhi,j,s is the Ewald offset at the pixel determined from the orientation of mosaic block j, and m3 is the total number of unit cells in the mosaic domain. Nj is the number of orientations used to form a mosaic texture, the scattering power of which is then normalized by Nj and scaled up by Zs for total scattering equivalent to a crystal that is made up of Zs mosaic blocks. For the synthetic data used here, the expected Bragg scattering was computed according to

[I_{i,s,{\rm data}} = \mathop \sum \limits_\lambda \,J_s\left(\lambda \right)\left[{{{Z_s} \over {{N_j}}}\mathop \sum \limits_{j = 1}^{{N_j}} \,r_e^2{{\left| {F_{\bf h}} \right|}^2}{I_0}\left({\Delta {\bf h}_{i, j,s},m} \right)\,{\kappa _i}} \right]{{\Delta}}{{{\Omega}}_i}, \eqno (3)]

where the expression inside the square brackets is the scattering cross section of the whole crystal for photons of wavelength λ. The interference term I0(Δhi,j,s, m) should be maximal when the pixel is exactly probing the diffraction condition (Δhi,j,s = 0), and it should fall off rapidly as Δhi,j,s increases. We let I0(Δhi,j,s, m) take the Gaussian form

[I_0 \left({\Delta {\bf h}_{i, j,s},m} \right) = {m^6}\exp \left({ - C\,{m^2}{\Delta {\bf h}}_{i, j,s}\cdot\Delta {\bf h}_{i, j,s}} \right), \eqno (4)]

where the constant C was chosen such that the full width at half-maximum of the principal peaks in I0 would be equal to that arising from a parallelepiped mosaic block [see Appendix A[link] for a derivation of equation (4)[link]; in the work here, we used C = 3.175]. The dimensionless Ewald offset Δhi,j,s is simply the residual between the fractional Miller index at each pixel with the nearest integer Miller index

[\Delta {\bf h}_{i, j, s} = {\left({{{\bf U}_{j,s}} {\bf B}} \right)^{\bi T}} {{\bf q}_{i}}(\lambda) - \left [{\matrix{ h \cr k \cr l \cr } } \right]. \eqno (5)]

Here qi(λ) is the momentum transfer from incident beam unit vector [{{\hat{\bf k}}}] to the scattered beam unit vector [{{\hat{\bf k}}_i}] ([{{\hat{\bf k}}_i}] points from the interaction region to the pixel):

[{\bf q}_i(\lambda) = {{\hat{\bf k}_i} - {\hat{\bf k}} \over {\lambda} }. \eqno (6)]

The transpose of UB in equation (5)[link] is necessary as the columns of B are what define the lattice translation vectors. Note the dependence of I0(Δhi,j,s, m) on m; a larger mosaic domain parameter yields a brighter maximum with a more rapid falloff. For the synthetic data, we let m = 10 and Nj = 100. For real data, we expect the true shape of the Bragg peak profiles to be well approximated by Gaussians, or a sum of Gaussians. The parameters describing the synthetic Bragg scattering data are summarized in Table 1[link].

Table 1
Parameters describing the expected number of photons scattered into each pixel

κi Kahn polarization factor
re Classical electron radius
Js(λ) Fluence (photons/area) at photon wavelength λ during shot s
Δhi,j,s Ewald offset of pixel i for scattering from mosaic domain j during shot s, i.e. the residual between the fractional Miller index and the integer Miller index h = h, k, l
I0(Δhi,j,s, m) The interference function arising from the diffraction of the crystal lattice
ΔΩi The solid angle subtended by the pixel
Zs Dimensionless per-shot scale factor accounting for variations in exposed crystal volume for each shot. This represents the total number of mosaic domains in the crystal
m Dimensionless quantity such that m3 specifies the number of unit cells within each mosaic domain. For the synthetic data, we let m = 10, hence we used mosaic blocks consisting of 1000 unit cells
|Fh| Structure factor amplitude of the protein at Miller index h = h, k, l
λ Photon wavelength
Uj,s Misseting matrix of mosaic domain j for the crystal from shot s
B Direct space unit cell matrix for each crystal (same unit cell for each crystal)
Nj Number of mosaic domains modeling texture. For the synthetic data, we let Nj = 100
qi(λ) Momentum transfer vector at pixel i for photons of wavelength λ
2.2.2. nanoBragg: generating background scattering

Background scattering was synthesized by estimating the total number of background molecules in the bulk background-scattering volume. Here, nanoBragg computed the expected scattering for liquid water:

[T_{i,s,{\rm data}} = {V_{{{\rm H}_2}{\rm O}}}{{\rho}_{{{\rm H}_2}{\rm O}}}\left({N_{\rm A}/{u_{{{\rm H}_2}{\rm O}}}} \right)\mathop \sum \limits_\lambda {J_s}\left(\lambda \right)r\,^2_{e}{\left| {{F_{{{\rm H}_2}{\rm O}}}\left({{\bf q}_i} \right)} \right|^2}{\kappa _i}{\rm{\Delta }}{{{\Omega}}_i}, \eqno (7)]

where VH2O, ρH2O and uH2O are the volume, density and molecular weight of water, respectively, NA is Avogadro's constant, and |FH2O(qi)| is the isotropic structure factor amplitude per molecule of liquid water which depends on the scattering angle of the pixel, 2θi = sin−1(qiλ/2). |FH2O(qi)| was measured independently at the Advanced Light Source beamline 8.3.1 (MacDowell et al., 2004[MacDowell, A. A., Celestre, R. S., Howells, M., McKinney, W., Krupnick, J., Cambie, D., Domning, E. E., Duarte, R. M., Kelez, N., Plate, D. W., Cork, C. W., Earnest, T. N., Dickert, J., Meigs, G., Ralston, C., Holton, J. M., Alber, T., Berger, J. M., Agard, D. A. & Padmore, H. A. (2004). J. Synchrotron Rad. 11, 447-455.]) and adjusted to match the absolute calibration in the work by Clark et al. (2010[Clark, G. N. I., Hura, G. L., Teixeira, J., Soper, A. K. & Head-Gordon, T. (2010). Proc. Natl Acad. Sci. USA, 107, 14003-14007.]).

2.2.3. nanoBragg: generating realistic measurement noise

With nanoBragg, we applied three types of measurement noise arising from the inherent randomness of photon counting, signal amplification and detector readout. These noise operations are computed sequentially, modeling the various stages of photon measurement. In the first measurement stage, shot noise produces counting error; the number of photons arriving in the pixel is a random number sampled from a Poisson distribution with mean Ii,s,data + Ti,s,data. In the second stage, the charge produced by detected photons is amplified by each pixel slightly differently. The amplifiers are fixed in the circuitry, however there is always error in their calibration (gain). Here, the photon gain for each pixel was assigned by randomly selecting numbers from a normal distribution with a mean of 28 ADUs per photon and standard deviation equivalent to 3% of the mean, or 0.84 ADUs. These per-pixel gain values are typical for the CSPAD, and once selected they are fixed for all shots. Note that this type of calibration error assumes all pixels are the same physical size; pixel-to-pixel non-uniformity could be computed by adjusting the expected number of photons before computing the Poisson deviates above. Lastly, there is noise associated with detector readout due to electronic switching during the readout event itself and dark-charge accumulation during the exposure. CSPADs are dominated by dark current noise, but here we make no distinction and lump all errors associated with readout into a single Gaussian process. We included readout noise of the detector by adding a random number to the final computed values, where that random number was drawn from a normal distribution with a mean of 0 and a standard deviation of 3 ADUs. We modeled dark-subtracted data, where an average dark signal had already been subtracted from each shot, hence why the readout error fluctuates about 0. Per-pixel readout and counting noise terms fluctuate across every pixel and every shot; however, pixel gain calibration errors, though different for each pixel, are constant for all shots. It is typical to divide the observed pixel values by a nominal gain value for the entire camera (28 in this case) so that a unit pixel increment is the same size as the signal from a single photon, hence we have for the pixel value

[X_{i,s} = \left({N_{i,s}/28} \right){g_i} + r_{i,s}\, ,\eqno (8)]

where the photon count Ni,s is randomly sampled from a Poisson distribution with the expectation value Ii,s,data + Ti,s,data, the pixel gains gi form a normal distribution [{\cal N}({\mu = 1,\sigma = 0.84/28})], and the readout noise ri,s is randomly sampled from a normal distribution [{\cal N}({\mu = 0,\sigma = 3/28})]. For a full description of the noise options available in nanoBragg, see the supplemental material from the work by Holton et al. (2014[Holton, J. M., Classen, S., Frankel, K. A. & Tainer, J. A. (2014). FEBS J. 281, 4046-4060.]).

2.3. Maximum-likelihood structure factor estimation using pixel data

Fig. 2[link] shows several examples of the Bragg reflections from the synthetic diffraction patterns, illustrating the variability of repeated Bragg reflection measurements performed at XFELs (see Fig. S1 of the supporting information for the same images in the absence of noise). This per-shot variation in observations is what makes the analysis of XFEL still shots with conventional protocols subject to large uncertainties. By summing together neighboring Bragg spot pixels, one loses the intricate per-pixel intensity variations that encode the incident photon spectra and crystal morphology. This information could otherwise be used to constrain complicated physical models. Rather than integrate Bragg spots we seek to use their pixelated profiles to optimize a model, thereby disentangling the various contributions to the scattering, and obtaining a more accurate measure of the structure factor amplitudes |Fh|.

[Figure 2]
Figure 2
Bragg spot profile variation observed in synthetic XFEL images. Each row contains five shoeboxes centered on the same Miller index, yet measured on separate shots (i.e. from different crystals). The resolution of each Miller index is shown in the far left of each row. In standard data analysis, integrated signals from shoeboxes with equivalent Miller indices are averaged together, leading to large uncertainties in the resulting structure factor estimates. Signal-to-noise ratio (SNR) estimates for the Bragg reflections inside each shoebox are provided for reference. We computed SNR following the work by Leslie (1999[Leslie, A. G. W. (1999). Acta Cryst. D55, 1696-1702.]) (using a weighted least-squares treatment to propagate the tilt-plane error). Pixels with negative readings are marked with a star. The background scattering was very low in the synthetic data, as we attempted to accurately replicate in-vacuum diffraction from crystals in a 5 µm liquid jet produced by a GDVN (DePonte et al., 2008[DePonte, D. P., Weierstall, U., Schmidt, K., Warner, J., Starodub, D., Spence, J. C. H. & Doak, R. B. (2008). J. Phys. D Appl. Phys. 41, 195505.]). Such low background in combination with per-shot readout errors gave rise to the negative pixels shown here. The gray scale represents Xi,s (observed pixel intensity in units of photons) as defined in equation (8)[link]. All sub-images in a row are on the same scale (indicated by the scale bars on the right in units of photons). Note, the middle row appears to have fewer negative readings, which we expect because it is on the water scattering ring and, as a result, those pixels receive more background scattering. Fig. S1 shows the same set of images before random noise was applied.
2.3.1. Maximum-likelihood target

The goal was to treat our synthetic images as an experimental dataset, and then use the pixel values Xi,s within all the Bragg-spot shoeboxes (Fig. 2[link]) to optimize a global model to obtain increased structure factor accuracy. We accomplished this through iterative parameter estimation, using a maximum-likelihood target. Let p(Xi,s|Θ) represent the probability of observing Xi,s given the full set of model parameters Θ needed to describe the observed pixel values. We will explicitly define Θ in Section 2.3.7[link]. The likelihood of the entire dataset is given by the product of the individual pixel probabilities

[f(\Theta) = \mathop \prod \limits_s \,\mathop \prod \limits_i p{\rm{(}}X_{i,s}|{{\Theta}}), \eqno (9)]

provided we neglect inter-pixel effects such as crosstalk: a fair approximation for the sake of defining an optimization target. Other inter-pixel effects such as point-spread are discussed in Section 4[link], but are not necessarily applicable. The set of parameters ΘML that maximizes the likelihood of the data

[\Theta_{\rm ML} = \mathop {\rm argmax}\limits_{{\Theta}} f(\Theta) \eqno (10)]

is called the maximum-likelihood estimate, or the most probable model, given the data.

2.3.2. Probability of observing pixel values

In order to express the likelihood of the data given by equation (9)[link], we must define p(Xi,s|Θ), the probability of observing Xi,s given a set of model parameters Θ describing the scattering. In this case we know the precise model that was used to generate Xi,s, but the arguments below are general and applicable to real data. In what follows, we let RX represent a random variable for a pixel measurement, where Xi,s is a sample from the distribution governing RX. We assume RX describes an observation after division by the nominal gain, and after subtraction of an average dark pedestal, as shown in equation (8)[link]. Using the algebra of random variables, we can define RX as an expression involving three independent random variables:

[{R_X} = {R_n}{R_g} + {R_r}. \eqno (11)]

Here Rn represents randomness in photon counting, Rg represents randomness in signal amplification and Rr represents randomness in signal readout. The random variable Rn is governed by a Poisson distribution [f_{R_n} = \,{\cal P}(\mu_n)] whose mean μn represents the expectation value for the number of photons captured by the pixel during the diffraction event. We can simplify the statistics by approximating [{\cal P}({\mu_n})] as a normal distribution with equivalent mean and variance: [f_{R_n} \simeq {\cal N}\left[{\mu_n,\sigma_n = (\mu _n)^{1/2} } \right]]. This approximation breaks down when the number of scattered photons approaches 0; however, in this regime we expect the readout term to dominate RX. The random variable Rg arises due to error in detector gain calibration; even though we divide through by the nominal gain value, we do not know the precise gain of each pixel. Therefore, we let [{f_{{R_g}}} = \,{\cal N}\left({{\mu _g} = 1,{\sigma _g}} \right)] be the distribution governing the gain calibration error. One can estimate σg by recording flat illumination on the detector, e.g. scattering from a copper foil far upstream of the detector, but the result is never perfect. The product of the two normally distributed random variables RnRg is in general not a normal random variable, however in certain limits we can make that approximation. Specifically, if we assume that Rn and Rg are independent random variables, then for large μn/σn and μg/σg we can approximate (Seijas-Macías & Oliveira, 2012[Seijas-Macías, A. & Oliveira, A. (2012). Discuss. Math. Probab. Stat. 32, 87-99.])

[\eqalign { {f_{{R_n}{R_g}}} \simeq & \, {\cal N}\left[{{\mu _n}{\mu _g},\left({\mu_g^2\sigma_n^2 + \mu _n^2\sigma _g^2} \right)^{1/2} } \right] \cr = & \, {\cal N}\,\left[{{\mu _n},\left({\mu_n + \mu_n^2\sigma_g^2}\right)^{1/2} } \right] \cr \simeq & \, {\cal N}\left[{{\mu_n},\left({{\mu_n}} \right)^{1/2} } \right] }. \eqno (12)]

Note, we used the fact that μg = 1 and [\sigma _n^2 = {\mu _n}] as shown above. The second approximation in equation (12)[link], stating that [{\mu _n} + {\left({{\mu _n}{\sigma _g}} \right)^2} \simeq {\mu _n}] is valid for small σg, yet becomes worse as μn increases. If [{\mu _n} = 1/\sigma _g^2], the approximation error is equivalent to μn. For σg = 0.03 (0.84/28, which we used for the synthetic data) this occurs when μn = 1.1 × 10 3 photons, and for the work presented here most pixels received far fewer photons. The random variable Rr describes a random offset applied to each pixel measurement which is governed by the underlying electronics of the detector modules, and it follows a normal distribution [{f_{R_r}} = {\cal N}\left({{\mu _r} = 0,{\sigma _r}} \right)]. Usually during an experiment with a CSPAD, a dark measurement is recorded and subtracted from subsequent measurements. For a given exposure time this dark offset is generally stable, but there is always a random component that fluctuates on a shot-to-shot basis. This readout noise will result in a positive or negative offset applied to each pixel, and it is these offsets that are represented by Rr. The value σr is easy to estimate by closing the X-ray shutter and observing the pixel values fluctuating in the absence of X-rays. We used a value of σr = 3/28 throughout the analysis in this paper, in line with equation (8)[link]. With the above definitions we now define the distribution fRX governing RX. This distribution is the convolution of fRnRg and fRr (true for the sum of any two random variables). In the case that both fRnRg and fRr are normal, then fRX will also be normal: [{f_{{R_X}_\,}} = {f_{{R_n}{R_g}}}*{f_{{R_r}}} = \,{\cal N}\left({\,{\mu _n},\,\,{v^{1/2}}\,} \right)], where [v = {\mu _n} + \sigma _r^2] and * is a convolution operator. With this we can express the probability of observing Xi,s photons as

[\eqalign { p\left({{R_X} = X_{i,s}|{{\Theta}}} \right) \equiv &\ p\left({X_{i,s}|{{\Theta}}} \right) \cr = & {1 \over {{{\left[{2\pi {v_{i,s}}(\Theta)} \right]}^{1/2}}}}{\exp}{{ - {{\left[{X_{i,s} - n_{i,s}(\Theta)} \right]}^2}} \over {2{v_{i,s}}(\Theta)}}, } \eqno (13)]

where we used the definition [n_{i,s}(\Theta) \equiv \mu_n] to be index-explicit, and where the model-dependent variance is given by

[v_{i,s}(\Theta) = n_{i,s}(\Theta) + \sigma_r^2. \eqno (14)]

For the remainder of this report, we use ni,s(Θ) to represent the model for the expected number of photons in pixel i during shot s. Note this is different from Ni,s used in equation (8)[link], which is a randomly drawn sample, given an expectation value ni,s(Θ). It is noteworthy that the probability model in equation (13)[link] allows the observed data Xi,s to be negative, something mathematically forbidden when modeling the observations using Poisson statistics alone. In other words, a photon count by itself can never be negative, only when coupled with additional terms such as the readout noise can a pixel report a negative value. Negative pixel values occur in regions of weak scattering where readout noise dominates. This can easily happen and is indeed expected after dark current subtraction for in-vacuum low-background measurements at facilities like CXI (see Fig. 2[link]).

2.3.3. Selecting pixels for maximum-likelihood estimation

In principle one can evaluate the likelihood shown in equation (9)[link] for every pixel in the camera, but for the work shown here we only included a selection of pixels that were expected to be in the vicinity of Bragg scattering, referred to throughout this text as shoeboxes. Fig. 2[link] shows several such shoeboxes. Each synthetic CSPAD diffraction pattern is made up of 32 × 185 × 388 = 2 296 960 pixels, but by limiting the analysis to shoeboxes, we only used an average of 1.35 × 105 pixels per image. This made maximum-likelihood estimation approximately 17 times faster than it would be if including all pixels. Interestingly, we found that over-predicting shoeboxes (in order to guarantee inclusion of all observed spots) did not hurt the refinement. This is a key distinction from integration methods, where over-prediction can be problematic.

2.3.4. Modeling expected photons in each pixel

We modeled ni,s(Θ) as a sum of Bragg scattering Ii,s and background scattering Ti,s:

[n_{i,s}(\Theta) = I_{i,s,{\rm model}} + T_{i,s,{\rm model}}. \eqno (15)]

We will proceed to define the background and Bragg scattering models, after which we will define the full list of refinement parameters (summarized in Table 2[link]). Note the subscript `model' is used to distinguish these expressions from those in equations (3)[link] and (7)[link] describing the synthetic data.

Table 2
Summary of all model parameters that are refined in order to maximize the likelihood of the data

Gs Scale factor for the Bragg scattering observed during shot s
Us Missetting matrix of the crystal measured during shot s
ms Cube root of the number of unit cells in each mosaic domain
Bs Crystal unit-cell matrix for shot s
{|Fh|} Full set of protein structure factor amplitudes
2.3.5. Bragg scattering model

We modeled the Bragg scattering similarly to that shown in equation (3):[link]

[I_{i,s,{\rm model}} ={J_{s,{\rm all}}}{G_s}r\,_e^2{\left| {F_{\bf h}} \right|^2}m_s^6\exp \left( - C\,m_s^2 \Delta{\bf h}_{i,s} \cdot {\Delta {\bf h}}_{i,s} \right)\kappa_i \Delta \Omega_i . \eqno (16)]

Here [J_{s,{\rm all}} = {\textstyle \sum_\lambda} J_s(\lambda)] is the total fluence across all photon energies in the XFEL pulse, and [\Delta{\bf h}_{i,s}] = [{({{\bf U}_s} {{\bf B}_s})^T} {{\bf q}_i}\left({{{\bar \lambda} _s}} \right) - {\bf h}] is computed using the central wavelength [{{\bar \lambda}_s}] of each XFEL pulse and a single mosaic domain at orientation Us. The scale factor Gs relates primarily to the crystal size variation, but other factors can also affect the scale during real measurement. One can use equation (4)[link] directly to model the full energy spectra; however, here we use equation (16)[link] purely for computational efficiency and accept that it is slightly inaccurate (see Section S1 and Fig. S2 of the supporting information). Also, given the relatively small mosaic domain size used for the synthetic data, we assumed mosaicity would dominate the spot profile shapes, as opposed to wavelength dispersion. Even though all crystals have the same mosaic parameter m and unit-cell matrix B, we modeled each crystal as having a unique mosaic parameter ms and unit-cell matrix Bs.

2.3.6. Background scattering model: tilt planes

The measured background scattering arose from the solvent, and we did not model it using first principles. Instead, we fit a linear expression, or tilt plane, to the pixel measurements at the periphery of each Bragg spot shoebox (Rossmann, 1979[Rossmann, M. G. (1979). J. Appl. Cryst. 12, 225-238.]), and the resulting tilt plane was used to evaluate the background intensity under the Bragg peaks. For fitting, we used weighted linear least squares. To obtain each tilt plane we solved the linear system

[\left [{\matrix{ {{x_1}} & {{y_1}} & 1 \cr {{x_2}\,} & {{y_2}} & 1 \cr \, & \vdots & \, \cr {{x_{{N_T}}}} & {{y_{\,{N_T}}}} & 1 \cr } } \right] \left [{\matrix{ {{t_1}} \cr {{t_2}} \cr {{t_3}} \cr } } \right] = \,\left [{\matrix{ {{X_1}} \cr {{X_2}} \cr \vdots \cr {{X_{{N_T}}}} \cr } } \right] \eqno (17)]

for the tilt plane coefficients t = [t1, t2, t3], where xi, yi are the fast-scan, slow-scan coordinates of the NT shoebox pixels selected for the tilt plane fit, and where Xi are the observed values of those pixels. Rewriting equation (17)[link] as At = b, we can then write the solution as [{\bf t} = {\left({{{\bf A}^T}{\bf WA}} \right)^{ - 1}}{{\bf A}^T}{\bf W}{\bf b}] where W is a diagonal matrix whose diagonal entries are the reciprocals of crude variance estimates for each pixel value in b. Given a pixel value Xi, we approximated its variance (for tilt plane fitting purposes only) as [{v_{{\rm i},{\rm crude}}} = {X_i} + \sigma _r^2], where σr is the readout noise standard deviation (3/28 = 0.11 in the synthetic data), and we used this information to compute a signal-to-noise estimate for each Bragg reflection (see Leslie, 1999[Leslie, A. G. W. (1999). Acta Cryst. D55, 1696-1702.]). Fig. 2[link] shows these signal-to-noise estimates for several simulated reflections. Recalling equation (14)[link], it becomes obvious that the estimate vi,crude uses the approximation niXi. In other words, vi,crude approximates an expectation value with a single measurement, providing a suitable guess of the spot variance in the absence of a model. We emphasize that a unique background vector t was computed for each shoebox, that is, for each Bragg spot prediction on each shot. Ideally the pixels used in equation (17)[link] do not include contributions from Bragg scattering. After solving for t, we modeled the background for pixels in the corresponding shoebox as

[T_{i,s,{\rm model}} = \left [{\matrix{ {{t_1}} & {{t_2}} & {{t_3}} \cr } } \right] \left [{\matrix{ {{x_i}} \cr {{y_i}} \cr 1 \cr } } \right]. \eqno (18)]

This linear fit exhibits no curvature and is therefore best applicable to local regions on the detector where the background signal is slowly varying. During the maximum-likelihood parameter optimization, the tilt plane coefficients [t1, t2, t3] were initialized as the solutions to equation (17)[link] and were then fixed. Though the least squares solution is analytical, it is dependent on the proper distinction between background pixels and Bragg pixels. For weakly scattering data, tilt planes can be evaluated at levels close to zero intensity, sometimes giving rise to negative ni,s(Θ), especially for pixels at the shoebox periphery. This is a result of using a non-physical background model, and it poses a risk to violate equation (13)[link] which requires that vi,s(Θ) remain positive. To guard against this occurrence, we filtered all shoeboxes whose tilt planes dipped below 0 for any pixel in the shoebox. This resulted in the removal of an average of 2.3 out of 580 shoeboxes per shot.

2.3.7. Unknown model parameters

We now explicitly define Θ, the list of all unknown model parameters that we determined via maximum likelihood. A parameter can be placed into one of two categories: local or global. A local parameter belongs to a particular XFEL diffraction shot. We let Γs represent the local parameters of shot s, namely the crystal orientation matrix Us, the unit cell matrix Bs, the scale factor Gs and the mosaic domain parameter ms:

[\Gamma_s = \left\{ {{\bf{U}}_s,{\bf{B}}_s,\,G_s,{m_s}} \right\}. \eqno (19)]

In total each Γs represents seven parameters: three Euler angles describing crystal orientation, two unknown unit-cell constants, a single scale factor and a single mosaic parameter. On the other hand, a global parameter is shared across all shots in the experiment. Global parameters in the model were the set of all structure factors, which we refer to here as {|Fh|}. The full list of parameters for Ns diffraction patterns was

[\Theta = \Gamma_1,\Gamma_2,\Gamma_3, \cdots \Gamma_{N_s},\left\{ |F_{\bf h}| \right\}.\eqno (20)]

For the structure factor refinement, we imposed tetragonal symmetry, leading to 13 704 unique structure factor amplitudes out to 2.1 Å.

2.3.8. Optimization using diffBragg

Solving for ΘML given by equation (10)[link] was carried out using the quasi-Newton optimization algorithm L-BFGS (Liu & Nocedal, 1989[Liu, D. C. & Nocedal, J. (1989). Math. Program. 45, 503-528.]) as implemented in CCTBX. L-BFGS requires the first derivative of the likelihood expression f(Θ) in equation (9)[link] with respect to each parameter of interest defined in equation (20)[link]. In practice it is typical to minimize the negative logarithm of the likelihood, both to maintain numerical accuracy when multiplying the large number of probabilities, and to make use of standard minimization algorithms. Therefore, we solved the equation

[\Theta_{\rm ML} = \mathop {{\rm argmin}}\limits_\Theta \left[{ - {\cal L}(\Theta)} \right],\eqno (21)]

where

[\eqalign { {\cal L}(\Theta) & = {\rm{log}}\, f(\Theta) \cr &= \mathop \sum \limits_s \mathop \sum \limits_i \log \left \{ {{1 \over {{{\left[{2\pi {v_{i,s}}(\Theta)} \right]}^{1/2}}}}{\rm{exp}}{{ - {{\left[{X_{i,s} - n_{i,s}(\Theta)} \right]}^2}} \over {2{v_{i,s}}\left({{\Theta}} \right)}}} \right \} \cr & = - {1 \over 2}\mathop \sum \limits_s \mathop \sum \limits_i \left \{ {\log \left[{2\pi {v_{i,s}}(\Theta)} \right] + {{{{\left[{X_{i,s} - n_{i,s}(\Theta)} \right]}^2}} \over {{v_{i,s}}(\Theta)}}} \right \}. } \eqno (22)]

The first derivative of the log-likelihood for a parameter [\theta \in \Theta] (needed for L-BFGS) is then given by [recalling equation (14)[link]]

[\eqalign { {{\partial {\cal L}(\Theta)} \over {\partial \theta }} = & - {1 \over 2}\,\mathop \sum \limits_s \mathop \sum \limits_i {{\partial n_{i,s}(\Theta)} \over {\partial \theta }}\left[{{1 \over {{v_{i,s}}(\Theta)}}} \right] \cr & \times \left \{ {1 - 2\left[{X_{i,s} - n_{i,s}(\Theta)} \right] - {{{{\left[{X_{i,s} - n_{i,s}(\Theta)} \right]}^2}} \over {{v_{i,s}}(\Theta)}}} \right \}. } \eqno(23)]

We developed a new program alongside nanoBragg (dubbed diffBragg) for computing the derivatives of the expected scattering ni,s with respect to each parameter. For example, when computing the derivative of the likelihood expression with respect to the structure factors, we used diffBragg to evaluate

[{{\partial n_{i,s}(\Theta)} \over {\partial \left| {{F_{\bf{h}}}} \right|}} = {{\partial (I_{i,s,{\rm{model}}} + {T_{i,s,{\rm model}}})} \over {\partial \left| {{F_{\bf h}}} \right|}} = {{2I_{i,s,{\rm model}}} \over {\left| {{F_{\bf h}}} \right|}}, \eqno(24)]

which follows from equation (16)[link]. The results were then substituted into equation (23)[link] to compute the gradients of the likelihood expression needed for optimization.

It is important during refinement that the target function be equally sensitive to all parameters. To this end we applied reparameterizations of the form

[x =(1/{\sigma_\theta })(\theta - {\theta _{\rm o}}) + 1, \eqno (25)]

where θo is the initial value of the parameter and σθ represents the expected variation of the parameter during refinement (Hammersley, 2009[Hammersley, A. (2009). FIT2D. ESRF, Grenoble, France.]). With this change of variables, all parameters started with an initial value of 1. If the target equation appeared exceptionally sensitive to certain parameters, the corresponding σθ values were incremented by factors of 10 until we observed the first several L-BFGS iterations updating parameters by sensible amounts. In this specific problem we also applied bound restraints to certain parameters. This was accomplished by reparameterizations of the form

[x = \left({{1 / {{\sigma _\theta }}}} \right)\left [{\ln \left({\theta - {\theta _{{\rm{min}}}}} \right) - \ln ({\theta_{\rm o}} - {\theta _{{\rm min}}}} \right)] + 1 \eqno (26)]

such that the parameter θ will always be greater than θmin ≥ 0. For example, to ensure the structure factor amplitudes remained positive during refinement, we made the substitution [{x_{{F_{\bf{h}}}}} = \left({1/{\sigma _{{F_{\bf{h}}}}}} \right)(\ln | {{F_{\bf h}}} | - \ln | {{F_{{\bf h},{\rm o}}}}} |) + 1]. The parameters [x_{F_{\bf h}}] refined without restraints; however, the resulting [|{F_{\bf h}}| = \exp [({x_{{F_{\bf h}}}} - 1){\sigma _{{F_{\bf h}}}}]| {{F_{{\bf h},{\rm o}}}}|] remained positive quantities. A similar restraint was used on the scales Gs and mosaic parameters ms. With these reparameterizations the updated derivatives of the target equation can be written as

[\eqalign { {{\partial {\cal L}(\Theta)} \over {\partial x}} = & - {1 \over 2}\,\mathop \sum \limits_s \mathop \sum \limits_i {{\partial n_{i,s}(\Theta)} \over {\partial \theta }}{{\partial \theta } \over {\partial x}}\left[{{1 \over {{v_{i,s}}(\Theta)}}} \right] \cr & \times \left \{{1 - 2\left[{X_{i,s} - \,n_{i,s}(\Theta)} \right] - {{{{\left[{X_{i,s} - n_{i,s}(\Theta)} \right]}^2}} \over {v_{i,s}(\Theta)}}} \right\} } , \eqno (27)]

where ∂θ/∂x is computed for each parameter according to the corresponding reparameterization scheme. See Appendix B[link] for derivative expressions of the remaining model parameters. Optimization was carried out using the National Energy Research Scientific Computing Center (NERSC).

3. Results

3.1. Initialization of model parameters

Initial estimates of the orientation matrices Us and the crystal unit-cell matrices Bs were provided by the program dials.stills_process after successfully indexing each diffraction pattern (see Fig. 3[link]) using the algorithm of Steller et al. (1997[Steller, I., Bolotovsky, R. & Rossmann, M. G. (1997). J. Appl. Cryst. 30, 1036-1040.]). The output from dials.stills_process also provided an estimate of the mosaic domain size for the measured crystals (Sauter et al., 2014[Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299-3309.]). We used these estimates to construct an initial guess of 13.7 for the mosaic parameters ms. The quantity 13.7 is the median mosaic domain size from dials.stills_process divided by the cubed root of the unit-cell volume, (a2c)1/3. The initial estimates for |Fh| came from running the standard integration-based XFEL merging application in CCTBX (see Table 3[link]), without post-refinement (see Table S1 of the supporting information for a comparison with post-refinement). The scale factors Gs were each initially set to a very high number, in this case 106. This number was chosen to ensure that the model initially predicted finite Bragg scattering in most of the shoeboxes. The background tilt-planes for all shoeboxes were initialized by solving equation (17)[link] for each reflection shoebox.

Table 3
The integration-based merging statistics from CCTBX from 2023 synthetic shots

Resolution bin range (Å) Completeness (%) Multiplicity Measurements I/σI CC1/2 (%) R-split (%)
30.00–4.573 100 104.12 137235 16.65 99.6 6.0
4.573–3.632 100 64.06 83988 13.01 97.7 8.7
3.632–3.174 100 47.26 61766 10.40 97.5 9.9
3.174–2.884 100 44.69 58404 9.29 96.7 11.4
2.884–2.677 100 40.34 53131 8.00 95.5 12.8
2.677–2.519 100 36.01 47250 7.04 93.8 14.8
2.519–2.393 100 37.40 49036 6.87 90.7 15.6
2.393–2.289 100 35.77 47147 5.97 88.6 18.6
2.289–2.200 100 27.90 36324 4.50 79.7 25.2
2.200–2.125 100 23.17 30307 3.23 71.3 34.4
Overall 100 46.11 604588 8.50 99.6 9.1
[Figure 3]
Figure 3
Stage 1 refinement results for 2023 exposures. During stage 1 refinement images are refined separately, resulting in a unique scale factor Gs, mosaic domain parameter ms, unit cell matrix Bs and rotation matrix Us for each crystal. Starting values are shown in red, and optimized values after stage 1 refinement are shown in blue. (a) and (b) Unit cell edges a and c. Initial values come from the DIALS indexing program (Winter et al., 2018[Winter, G., Waterman, D. G., Parkhurst, J. M., Brewster, A. S., Gildea, R. J., Gerstel, M., Fuentes-Montero, L., Vollmar, M., Michels-Clark, T., Young, I. D., Sauter, N. K. & Evans, G. (2018). Acta Cryst. D74, 85-97.]). Despite all crystals having the same unit cell (a = 79.1, c = 38.4 Å), the indexing results returned a distribution of cells. The unit cell a-edge refined to a median value of 79.097 Å, which is 0.003 Å different from the ground truth value. By modeling each shot with its known photon energy spectra, we indeed recovered a value of 79.100 Å for the median a-edge; however, the added accuracy was not worth the computational cost of simulating each energy in the spectrum separately. This is why we elected to use the reduced equation (16)[link] to model the Bragg scattering. (c) Dimensionless mosaic parameters ms. The initial value of 13.7 resulted from analyzing the mosaic domain size distribution deduced by DIALS during indexing. The median value of m after the stage 1 refinement (9.92) differs from the ground truth by 0.08. We hypothesize that this is because the synthetic data included a mosaic texture distribution that increased the size of the Bragg spots in reciprocal space. Smaller values of m correspond to larger Bragg spots, hence the model is likely trying to compensate for the absence of any mosaic texture in equation (16)[link]. (d) Dimensionless scale factors Gs. To compare with the ground truth, all scale factors Gs were divided by a factor k2 where k = 3.68 is the scale factor which minimizes RGT shown in equation (28)[link]. Initially we let all Gs be 106, which when divided by k2 gives a value of 7.4 × 104 as indicated by the red bar. (e) Misorientation of each crystal from its ground truth. During actual XFEL data collection one can never know this quantity, but synthetic data provides a unique opportunity to use this quantity as a proxy for model accuracy. We approach the optimal model when all [\Delta U_s \to 0]. Note, the vertical axes in (c) and (d) and the horizontal axes in (d) and (e) are on logarithmic scales.

3.2. Parameter estimation carried out in stages

Once the parameters were initialized, maximum-likelihood parameter optimization was carried out in two main stages.

3.2.1. Stage 1 refinement

Here we refined shots one at a time in a series of two steps: first, for each shot, we refined the scale Gs and the mosaic parameter ms while keeping all other parameters fixed. We did this using only the low-resolution shoeboxes (less than 5 Å) with signal-to-noise ratio greater than 3. Second, using the optimized values for Gs and ms, the unit cell matrix Bs and the crystal rotation matrix Us were refined while keeping all other parameters fixed. This was done for all spots with signal-to-noise ratio greater than 0.2 and resolution less than 2.1 Å. After this stage we identified images which refined poorly and removed them prior to the global stage 2 refinement. For the 2023 shot example, we removed 25 out of 2048 shots after stage 1 refinement by examining the resulting distributions for all as, cs, ms and Gs (as and cs are the unit-cell constants that fall out of the optimization of each Bs). The results of stage 1 refinement are shown in Fig. 3[link] for 2023 exposures. For reparameterization in this stage of refinement we let σUs = 0.001°, σBs = 0.1 Å, σGs = 1 and σms = 0.1 for all s, where σUs corresponds to the expected variation in the three angles defining the crystal rotation matrices Us and σBs corresponds to the expected variation in the unit-cell edge parameters. Timing statistics for this stage of refinement are shown in Table 4[link].

Table 4
Stage 1 refinement CPU usage at NERSC

  Average processing time per image (core-seconds)
Refinement level Average iterations per image 2.4 GHz Intel Xeon Gold 6148 2.3 GHz Intel Xeon E5-2698 v3
Stage 1, part 1: Gs and ms 17.02 2.12 2.53
Stage 1, part 2: Us and Bs 6.54 7.04 8.42
3.2.2. Stage 2 refinement

Here we refined the structure factor amplitudes |Fh| and the scale factors Gs as part of a global refinement over all images. All other parameters were kept fixed. At the start of this refinement stage, we set all ms equal to the median of the values obtained during stage 1 [see Fig. 3[link](c)]. This was done because the per-shot scales Gs and the per-shot mosaic domain parameters ms are correlated (both directly increase the number of expected photons in shot s). Also, at the start of this stage, all scale factors Gs were set equal to the median of the results obtained for the scale factors during stage 1 refinement [see Fig. 3[link](d)], but then they optimized to different values for each shot. Fig. 4[link] illustrates stage 2 refinement results for the 2023 image set. This stage of refinement utilized all shoeboxes with a signal-to-noise ratio greater than 0.2 and resolution less than 2.1 Å. For reparameterization during this stage of refinement, we let σFh = σGs = 1 for all Miller indices h and shots s.

[Figure 4]
Figure 4
The R factor between refined structure factors and the ground truth (RGT) during stage 2 refinement.
3.2.3. Processing

Standard message passing interface (MPI) was used to accelerate the analysis, parallelizing over images; however, beyond that no attempt was made to optimize the runtime. Timing tests for stage 1 and stage 2 refinement were conducted on a single compute node at NERSC comprising two 2.4 GHz Intel Xeon Gold processors for a total of 40 hardware threads. The complete set of data for all 2023 shots comprises 1.17 × 106 Bragg spot shoeboxes for a total of 2.7 × 108 pixels. For 2023 shots, stage 1 refinement (including input/output overhead) was completed in 12 min, utilizing all 40 hardware threads. Stage 2 refinement ran at a rate of 23.5 s per L-BFGS iteration using all 40 hardware threads (40 MPI ranks), though this number can be decreased by utilizing multiple compute nodes simultaneously. Future work will also utilize GPU-acceleration. The nanoBragg program used to compute the pixel values in equation (3)[link] includes a GPU kernel which for our specific usen case currently offers a 528-fold speed-up over the CPU code, so we expect similar speed-ups for the minimization. Thus far, however, the methods used by diffBragg to compute the log-likelihood and its derivatives have only been written for CPUs.

3.3. Comparison of optimized parameters with ground truth values

We judged the success of the diffBragg maximum-likelihood estimation by comparing refined parameters with the ground truth parameters used to synthesize the data. In this section we discuss three metrics important for judging the success of the refinement, namely the per-shot lattice misorientation with respect to the ground truth, the ground truth R factor and the anomalous difference correlation with the ground truth.

3.3.1. Misorientation ΔUs

A useful metric to observe during optimization is the misorientation ΔUs of the optimized crystal rotation matrices Us from the known crystal rotation matrices used to synthesize each shot. Fig. 3[link](e) shows ΔUs before and after stage 1 refinement. For 2023 exposures, starting with a median ΔUs of 0.038° (as given by the DIALS indexing results), we were able to refine the crystal orientations such that the median ΔUs was 0.0028° [Fig. 3[link](e)].

3.3.2. R factor with ground truth

During optimization we monitored the R factor between the refined structure factor amplitudes and the ground truth (GT) structure factor amplitudes:

[R_{{\rm{GT}}} = {{{\mathop \sum \nolimits_{\bf{h}} \big| {| {{F_{{\bf h,}{\rm GT}}}}| { - k} |F_{\bf{h}}} |} \big|} \over {\mathop \sum \nolimits_{\bf{h}} |{F_{{\bf h},{\rm GT}}}|}}. \eqno (28)]

Here, k is a scale factor chosen to minimize RGT. Fig. 4[link] shows the evolution of RGT throughout stage 2 refinement for different resolution bins. At each point in refinement we computed a new k using the Adaptive Nelder–Mead Simplex method (Gao & Han, 2012[Gao, F. & Han, L. (2012). Comput. Optim. Appl. 51, 259-277.]) as implemented in the SciPy software for Python.

3.3.3. Anomalous difference correlation with ground truth

Because we targeted Yb3+ bound to lysozyme, we expected a strong anomalous component to be present. To this end we observed the correlation of anomalous difference amplitudes [\Delta_{\bf h} = | {{F_{\bf{h}}}} | - | {{F_{ - {\bf{h}}}}} |] with those from the ground truth [{{\Delta}}_{{\bf{h}},{\rm{GT}}} = | {F_{{\bf{h}},{\rm{GT}}}}| - | {F_{ - {\bf{h}},{\rm{GT}}}}|]. Specifically, we computed the Pearson correlation coefficient CCano* between [{{{\Delta}}_{\bf{h}}}] and [{{\Delta}}_{{\bf{h}},{\rm{GT}}}]. The correlation CCano* is discussed in great detail in the work by Terwilliger et al. (2016[Terwilliger, T. C., Bunkóczi, G., Hung, L.-W., Zwart, P. H., Smith, J. L., Akey, D. L. & Adams, P. D. (2016). Acta Cryst. D72, 346-358.]) where it was shown to be directly proportional to the peak height at sites of the absorptive heavy atoms in an anomalous difference density map, making it a good indicator of one's ability to solve a SAD dataset. Note, it is common practice to report CCano when discussing real data where one cannot know the ground truth model. This is accomplished using an empirical relationship outlined in the work by Terwilliger et al. (2016[Terwilliger, T. C., Bunkóczi, G., Hung, L.-W., Zwart, P. H., Smith, J. L., Akey, D. L. & Adams, P. D. (2016). Acta Cryst. D72, 346-358.]). Here, however, we are explicitly computing CCano* according to a ground truth model, hence the use of an asterisk in the defining symbol. Fig. 5[link] shows CCano* versus resolution for both the integration method and the maximum-likelihood method, where it is obvious that 2023 shots using maximum likelihood is comparable to 19 953 shots using integration. Overall values of RGT and CCano* for various trials are shown in Table 5[link], for both the integration method and the maximum-likelihood method. With only 2023 exposures we achieved overall values of RGT and CCano* equal to 4.9% and 79%, respectively. Using the integration approach on the same images we got values of 11.0% and 48.4%, respectively. It is noteworthy that additional cycles of stage 1 and stage 2 provided little improvement beyond the initial cycle (Fig. S3).

Table 5
Overall structure factor quality for various trials (completeness for each merge is 100% out to a maximum resolution of 2.125 Å)

  Integration Maximum likelihood
No. of exposures RGT CCano* RGT CCano* No. of L-BFGS iterations (stage 2)
505 0.140 0.230 0.059 0.479 576
2023 0.110 0.484 0.049 0.790 528
6144 0.105 0.696 0.049 0.904 359
19953 0.104 0.856 N/A N/A N/A
[Figure 5]
Figure 5
Comparison of CCano* obtained during integration (left) with that obtained using the maximum-likelihood approach (right). CCano* is directly proportional to the average peak height in an anomalous difference map at the positions of the heavy atoms which underwent significant X-ray absorption (Terwilliger et al., 2016[Terwilliger, T. C., Bunkóczi, G., Hung, L.-W., Zwart, P. H., Smith, J. L., Akey, D. L. & Adams, P. D. (2016). Acta Cryst. D72, 346-358.]). It is simply the Pearson correlation coefficient between the observed difference structure factor amplitudes [| {F^ + | - |F^ - }|] and the ground truth [| {F_{\rm GT}^ + | - |F_{\rm GT}^ - }|], with the special requirement that FGT only includes anomalous contributions from the absorptive heavy atoms (all other sources of anomalous scattering are therefore considered noise). In the right panel we show the integration result from 19 953 shots (faded gray diamonds) as a comparison with the maximum-likelihood results.

4. Discussion

The ability to accurately determine protein structure factor amplitudes |Fh| in the presence of large experimental uncertainties largely governs the success of an SFX experiment. The maximum-likelihood program presented here, diffBragg, provides a direct protocol for decoupling the various contributions to the scattering which would otherwise obscure the structure factor amplitudes. These noisy contributions include variable per-shot incident photon spectra, crystal morphology and Ewald offset (partiality), all of which can be detrimental in situations involving weak signals, such as anomalous difference amplitudes used for the spatial resolution of heavy atoms (Sauter et al., 2020[Sauter, N. K., Kern, J., Yano, J. & Holton, J. M. (2020). Acta Cryst. D76, 176-192.]), or for experimental phasing (Schlichting, 2017[Schlichting, I. (2017). IUCrJ, 4, 516.]). With 2023 shots we achieved similar quality anomalous differences (revealed by CCano*) to those obtained by the conventional processing of 19 953 shots. Our approach eliminates the two-step process of measuring Bragg spot intensities on individual images, followed by merging. Rather, we refine the structure factors themselves against the raw data, in a restrained manner along with all other model parameters, to arrive at a stable solution. Also, the pixel-based approach provides more terms (one for each pixel) with which to restrain the optimization of the structure factor amplitudes, with the maximum-likelihood based optimization implicitly incorporating an error-based weighting scheme derived from the physical interpretation of signal measurement.

On a single compute node, refinement of 2023 shots took approximately 208 min, including 12 min for per-shot (stage 1) refinement and 196 min for 500 iterations of global (stage 2) refinement running at 23.5 s per iteration (Fig. 4[link] shows how structure factor quality improves with each iteration). On the same compute node, it took 157 s to index and integrate 2023 images with conventional methods, which were used to initialize the maximum-likelihood optimization. Therefore, the strategy going forward is to reduce this 75× wall-clock disparity by applying GPU acceleration to the maximum-likelihood step. We note that several beamline facilities, including LCLS, offer GPU-accelerated servers, with which it may be possible to run conventional integration and diffBragg in parallel with data collection, to better gauge experimental progress. We also plan to incorporate diffBragg as part of a broader effort centered on enabling leadership-class computing for rapid analysis of XFEL data. New compute facilities such as the pre-exascale system at NERSC (Perlmutter) and exascale systems at Argonne National Laboratory and Oak Ridge National Laboratory (Aurora and Frontier, respectively) will provide data processing at speeds to match the increased data collection rates expected for next-generation XFEL facilities (LCLS-II). The need for this level of computing (which includes GPU acceleration) becomes apparent by recalling equation (16)[link], where we replaced the polychromatic spectrum of each shot with a simplified version. By instead including, for example, 100 energy channels for polychromatic model refinement, the 2023 shot refinement would take approximately 100 times longer, or 2350 s per iteration on a single 40-core CPU compute node.

We clarify that both stage 1 and stage 2 modeling are necessary to maximize the information extracted from the data. To illustrate this, we used the results from stage 1 refinement to compute model-derived partiality corrections for each integrated spot. By correcting each integrated spot intensity using these new terms, we were able to increase CCano* beyond that obtained with uncorrected integrated intensities; however, stage 2 refinement consistently yielded the best results (see Section S2 and Table S2). For example, using 2023 shots we were able to boost CCano* to 0.57: an improvement over uncorrected integration, yet still worse than the value of 0.79 obtained with diffBragg.

It is common to model the aggregate effect of energy-bandwidth and mosaicity in a single Gaussian equation (Parkhurst, 2020[Parkhurst, J. M. (2020). Statistically Robust Methods for the Integration and Analysis of X-ray Diffraction Data from Pixel Array Detectors. Doctoral thesis, University of Cambridge.], Chapter 5), yet diffBragg is a general framework. Notably, the work shown is directly applicable to two-color serial diffraction (Hara et al., 2013[Hara, T., Inubushi, Y., Katayama, T., Sato, T., Tanaka, H., Tanaka, T., Togashi, T., Togawa, K., Tono, K., Yabashi, M. & Ishikawa, T. (2013). Nat. Commun. 4, 1-5.]; Lu et al., 2018[Lu, W., Friedrich, B., Noll, T., Zhou, K., Hallmann, J., Ansaldi, G., Roth, T., Serkez, S., Geloni, G., Madsen, A. & Eisebitt, S. (2018). Rev. Sci. Instrum. 89, 063121.]; Lutman et al., 2014[Lutman, A. A., Decker, F.-J., Arthur, J., Chollet, M., Feng, Y., Hastings, J., Huang, Z., Lemke, H., Nuhn, H.-D., Marinelli, A., Turner, J. L., Wakatsuki, S., Welch, J. & Zhu, D. (2014). Phys. Rev. Lett. 113, 254801.]) and pink-beam serial diffraction (Dejoie et al., 2013[Dejoie, C., McCusker, L. B., Baerlocher, C., Abela, R., Patterson, B. D., Kunz, M. & Tamura, N. (2013). J. Appl. Cryst. 46, 791-794.]; Milne et al., 2017[Milne, C. J., Schietinger, T., Aiba, M., Alarcon, A., Alex, J., Anghel, A., Arsov, V., Beard, C., Beaud, P., Bettoni, S., Bopp, M., Brands, H., Brönnimann, M., Brunnenkant, I., Calvi, M., Citterio, A., Craievich, P., Csatari Divall, M., Dällenbach, M., D'Amico, M., Dax, A., Deng, Y., Dietrich, A., Dinapoli, R., Divall, E., Dordevic, S., Ebner, S., Erny, C., Fitze, H., Flechsig, U., Follath, R., Frei, F., Gärtner, F., Ganter, R., Garvey, T., Geng, Z., Gorgisyan, I., Gough, C., Hauff, A., Hauri, C. P., Hiller, N., Humar, T., Hunziker, S., Ingold, G., Ischebeck, R., Janousch, M., Juranić, P., Jurcevic, M., Kaiser, M., Kalantari, B., Kalt, R., Keil, B., Kittel, C., Knopp, G., Koprek, W., Lemke, H. T., Lippuner, T., Llorente Sancho, D., Löhl, F., Lopez-Cuenca, C., Märki, F., Marcellini, F., Marinkovic, G., Martiel, I., Menzel, R., Mozzanica, A., Nass, K., Orlandi, G. L., Ozkan Loch, C., Panepucci, E., Paraliev, M., Patterson, B., Pedrini, B., Pedrozzi, M., Pollet, P., Pradervand, C., Prat, E., Radi, P., Raguin, J.-Y., Redford, S., Rehanek, J., Réhault, J., Reiche, S., Ringele, M., Rittmann, J., Rivkin, L., Romann, A., Ruat, M., Ruder, C., Sala, L., Schebacher, L., Schilcher, T., Schlott, V., Schmidt, T., Schmitt, B., Shi, X., Stadler, M., Stingelin, L., Sturzenegger, W., Szlachetko, J., Thattil, D., Treyer, D. M., Trisorio, A., Tron, W., Vetter, S., Vicario, C., Voulot, D., Wang, M., Zamofing, T., Zellweger, C. & Zennaro, R. (2017). Appl. Sci. 7, 720.]; Martin-Garcia et al., 2019[Martin-Garcia, J. M., Zhu, L., Mendez, D., Lee, M.-Y., Chun, E., Li, C., Hu, H., Subramanian, G., Kissick, D., Ogata, C., Henning, R., Ishchenko, A., Dobson, Z., Zhang, S., Weierstall, U., Spence, J. C. H., Fromme, P., Zatsepin, N. A., Fischetti, R. F., Cherezov, V. & Liu, W. (2019). IUCrJ, 6, 412-425.]) where indexing, refinement and data reduction protocols are in early development (Gevorkov et al., 2020[Gevorkov, Y., Barty, A., Brehm, W., White, T. A., Tolstikova, A., Wiedorn, M. O., Meents, A., Grigat, R.-R., Chapman, H. N. & Yefanov, O. (2020). Acta Cryst. A76, 121-131.]). The work presented here assumed a perfect detector geometry; however, we demonstrated the robustness of diffBragg refinement to typical levels of detector panel displacement (see Section S3, Figs. S4 and S5, and Table S3). We also assumed a detector with minimal pixel crosstalk and a well calibrated linear response, attributes that are realized in current-generation detectors like the ePix (Sikorski et al., 2016[Sikorski, M., Feng, Y., Song, S., Zhu, D., Carini, G., Herrmann, S., Nishimura, K., Hart, P. & Robert, A. (2016). J. Synchrotron Rad. 23, 1171-1179.]), JUNGFRAU (Leonarski et al., 2018[Leonarski, F., Redford, S., Mozzanica, A., Lopez-Cuenca, C., Panepucci, E., Nass, K., Ozerov, D., Vera, L., Olieric, V., Buntschu, D., Schneider, R., Tinti, G., Froejdh, E., Diederichs, K., Bunk, O., Schmitt, B. & Wang, M. (2018). Nat. Methods, 15, 799-804.]) and AGIPD (Allahgholi et al., 2015[Allahgholi, A., Becker, J., Bianco, L., Delfs, A., Dinapoli, R., Goettlicher, P., Graafsma, H., Greiffenberg, D., Hirsemann, H., Jack, S., Klanner, R., Klyuev, A., Krueger, H., Lange, S., Marras, A., Mezza, D., Mozzanica, A., Rah, S., Xia, Q., Schmitt, B., Schwandt, J., Sheviakov, I., Shi, X., Smoljanin, S., Trunk, U., Zhang, J. & Zimmer, M. (2015). J. Instrum. 10, C01023.]). On the other hand, significant point-spread occurs in detectors such as the widely used Rayonix (Holton et al., 2012[Holton, J. M., Nielsen, C. & Frankel, K. A. (2012). J. Synchrotron Rad. 19, 1006-1011.]; Ke et al., 2018[Ke, T.-W., Brewster, A. S., Yu, S. X., Ushizima, D., Yang, C. & Sauter, N. K. (2018). J. Synchrotron Rad. 25, 655-670.]). Using results derived in the work by Holton et al. (2012[Holton, J. M., Nielsen, C. & Frankel, K. A. (2012). J. Synchrotron Rad. 19, 1006-1011.]), we applied a point-spread function to the synthesized data and observed that structure factor optimization could still proceed, provided the point-spread kernel was also applied to the model ni,s(Θ) (see Section S4, and Figs. S6 and S7). Sources of error we have not described include measurement parallax, per-image Debye–Waller factors, intra-crystal unit-cell variation and multiple lattice scattering, all of which contribute to error in the determined structure factors. Thus far we have neglected any mention of structure factor errors, but we know proper error estimation can aid in solving real systems (Brewster, Bhowmick et al., 2019[Brewster, A. S., Bhowmick, A., Bolotovsky, R., Mendez, D., Zwart, P. H. & Sauter, N. K. (2019). Acta Cryst. D75, 959-968.]). In order to obtain error estimates for the structure factor amplitudes in the current context, one can consider the second derivative of the log-likelihood expression evaluated at the maximum-likelihood estimate:

[{\cal F}_{u,v} = - {{\partial^2 {\cal L}(\Theta)} \over {\partial \theta_{\rm u}\partial \theta_{\rm v}}} \bigg|_{\Theta = \Theta_{\rm ML}}. \eqno (29)]

Here, [{\cal F}_{u,v}] is called the Fisher information matrix, [{\cal L}(\Theta)] is the log-likelihood defined in equation (22)[link], and u and v are row and column indices. The variances of the parameters ΘML, including the structure factors {|Fh|}, are given by the diagonal elements of [{\cal F}_{u,v}^{ - 1}] (Pawitan, 2001[Pawitan, Y. (2001). In All Likelihood: Statistical Modelling and Inference using Likelihood. Oxford University Press.]). The matrix [{\cal F}_{u,v}] is large, and inversion should be performed using sparse matrix algebra. Our future efforts will involve implementing this computation.

In summary, we used the program nanoBragg to generate realistic XFEL diffraction images, which we analyzed using both the standard integration protocol and the new program, diffBragg. With diffBragg, we utilized all measured pixels simultaneously to estimate high-accuracy structure factors while requiring an order of magnitude fewer diffraction events compared with the conventional method. Reducing the number of required shots for a dataset can greatly benefit the general SFX experiment, with scarce beam time routinely plagued by unforeseen interruptions, limiting the amount of data one can realistically collect. Future work will aim to apply this method to real measurements and develop a user-friendly application for the general SFX researcher.

6. Software availability

The tools used for diffBragg refinement are included in CCTBX. Scripts for computing and processing the synthetic data are available at https://github.com/dermen/cxid9114.

APPENDIX A

The interference term I0 for a parallelepiped mosaic block consisting of Na × Nb × Nc unit cells stacked along the crystal axes a, b, c is given in the work by James (1962[James, R. W. (1962). The Optical Principles of the Diffraction of X-rays. London: Bell.]),

[I_0\left({{h_f},{k_f},{l_f}} \right) = {{{{\sin }^2}\pi {N_a}{h_f}} \over {{{\sin }^2}\pi {h_f}}}{{{{\sin }^2}\pi {N_b}{k_f}} \over {{{\sin }^2}\pi {k_f}}}{{{{\sin }^2}\pi {N_c}{l_f}} \over {{{\sin }^2}\pi {l_f}}}, \eqno (30)]

where the subscript f is used to indicate a fractional Miller index evaluated at a pixel. I0 exhibits principal maxima at integers h, k, l and subsidiary maxima along lines between h, k, l grid points as observed in the work by Chapman et al. (2011[Chapman, H. N., Fromme, P., Barty, A., White, T. A., Kirian, R. A., Aquila, A., Hunter, M. S., Schulz, J., DePonte, D. P., Weierstall, U., Doak, R. B., Maia, F. R. N. C., Martin, A. V., Schlichting, I., Lomb, L., Coppola, N., Shoeman, R. L., Epp, S. W., Hartmann, R., Rolles, D., Rudenko, A., Foucar, L., Kimmel, N., Weidenspointner, G., Holl, P., Liang, M., Barthelmess, M., Caleman, C., Boutet, S., Bogan, M. J., Krzywinski, J., Bostedt, C., Bajt, S., Gumprecht, L., Rudek, B., Erk, B., Schmidt, C., Hömke, A., Reich, C., Pietschner, D., Strüder, L., Hauser, G., Gorke, H., Ullrich, J., Herrmann, S., Schaller, G., Schopper, F., Soltau, H., Kühnel, K., Messerschmidt, M., Bozek, J. D., Hau-Riege, S. P., Frank, M., Hampton, C. Y., Sierra, R. G., Starodub, D., Williams, G. J., Hajdu, J., Timneanu, N., Seibert, M. M., Andreasson, J., Rocker, A., Jönsson, O., Svenda, M., Stern, S., Nass, K., Andritschke, R., Schröter, C., Krasniqi, F., Bott, M., Schmidt, K. E., Wang, X., Grotjohann, I., Holton, J. M., Barends, T. R. M., Neutze, R., Marchesini, S., Fromme, R., Schorb, S., Rupp, D., Adolph, M., Gorkhover, T., Andersson, I., Hirsemann, H., Potdevin, G., Graafsma, H., Nilsson, B. & Spence, J. C. H. (2011). Nature, 470, 73-77.]). As I0 is the modulus squared of the Fourier transform of all unit-cell origins in the mosaic block, each principal maximum is equal to the squared number of unit cells (NaNbNc)2. One can verify this claim by applying L'Hôpital's rule to find the limit of equation (30)[link] as (hf, kf, lf) tends to (0, 0, 0). We wish to model the principal peaks in I0 using Gaussians, but for that we also have to analyze the full width at half-maximum, W, of the principal peaks. W can be found numerically by solving the transcendental equation

[{{N^2} \over 2} = {{{\sin^2}\pi Nx} \over {{\sin^2}\pi x}} \eqno (31)]

for x = xfwhm along the interval −0.5 < x < 0.5, such that W = 2|xfwhm|. Numerically solving equation (31)[link] for a range of N (5 ≤ N ≤ 100) reveals an inverse N dependence on W, as expected given the physical interpretation of I0 (larger crystals produce smaller reciprocal space peaks):

[W = 0.89052734375 / N. \eqno (32)]

With this we can write the standard deviation of an appropriate Gaussian form as

[{\sigma _W} = {{0.89052734375} \over {2N(2\ln 2)^{1/2} }}, \eqno (33)]

hence, we can approximate

[I_0 \simeq {({{N_a}{N_b}{N_c}})^2}\exp \left({{{ - {{\Delta}}{h^2}} \over {2\sigma _W^2}}} \right)\exp \left({{{ - {{\Delta}}{k^2}} \over {2\sigma _W^2}}} \right)\exp \left({{{ - {{\Delta}}{l^2}} \over {2\sigma _W^2}}} \right), \eqno (34)]

where Δh, Δk, Δl represent the distance from each value hf, kf, lf to its nearest integer value h, k, l (e.g. Δh = hfh). Substituting equation (33)[link] into (34)[link] we arrive at

[I_0 \simeq {\left({{N_a}{N_b}{N_c}} \right)^2}\exp \left({{{ - N_a^2{{\Delta}}{h^2} - N_b^2{{\Delta}}{k^2} - N_c^2{{\Delta}}{l^2}\,} \over {0.286}}} \right). \eqno (35)]

Letting Na = Nb = Nc = m brings us to the form shown in equation (4)[link], with C = 1/0.286 (though we actually used C = 2/0.63 during all of this work as that was the default parameter in nanoBragg). The precise value of the constant in equation (35)[link] can change for different lattices, and one can potentially refine it along with the mosaic parameter m for each lattice in order to obtain a better fit to the data. The Gaussian approximation is useful computationally as it lends itself to simpler derivatives, while also retaining the main properties of the analytical expression in equation (30)[link].

APPENDIX B

Scale factor derivative. We define its derivative as

[{{\partial n_{i,s}(\Theta)} \over {\partial {G_s}}} = {{I_{i,s,{\rm{model}}}} \over {{G_s}}}. \eqno (36)]

The scale factor Gs should always be a positive quantity, therefore to each Gs we apply a reparameterization of the form [x_{G} = \left({1/{{{\sigma}}_{{{{G}}}}}} \right)(\ln G - \ln {G_{\rm o}}) + 1], such that [{{\partial n_{i,s}(\Theta)} / {\partial {x_G}}} = \sigma_G I_{i,s,{\rm model}}] (after applying the chain rule). This ensures the scale never becomes negative during refinement.

Us-matrix derivative. In practice, rather than minimizing the absolute misorientation angles in the matrix Us(ϕx, ϕy, ϕz), we instead define three refinement parameters representing angles about the standard laboratory-frame basis vectors, e.g. for the laboratory (lab) x axis we define the rotation operator

[{{\bf R}_x} = \left [{\matrix{ 1 & 0 & 0 \cr 0 & {{\rm{cos\Delta }}{\phi _{{{x}},{\rm{lab}}}}} & {{\rm{sin\Delta }}{\phi _{{{x}},{\rm{lab}}}}} \cr 0 & { - {\rm{sin\Delta }}{\phi _{{{x}},{\rm{lab}}}}} & {{\rm{cos\Delta }}{\phi _{{{x}},{\rm{lab}}}}} \cr } } \right]. \eqno (37)]

We define similar matrices for the laboratory y and z axes. The matrix Us is then redefined as Us = RxRyRzUs,o where Us,o is the initial unitary matrix determined by the indexing protocol and then held fixed during refinement. We refine the parameters [{{\Delta}}{\phi _{{{x}},{\rm{lab}}}},{{\Delta}}{\phi _{{{y}},{\rm{lab}}}},{{\Delta}}{\phi _{{{z}},{\rm{lab}}}}], after initializing them all to 0. The Ewald offset [{{\Delta {\bf h}}}_{i,s}] is now redefined as [{{\Delta {\bf h}}}_{i,s} = {({{\bf{R}}_x}{{\bf{R}}_y}{{\bf{R}}_z}{\bf{U}}_{s,{\rm{o}}}{{\bf{B}}_s})^T}{{\bf{q}}_i}\left({\bar \lambda} \right) - {\bf{h}}], where h = (h, k, l) is the integer Miller index, and the derivative of the expected scattered photons with respect to the angular offset parameter is given by

[{{\partial n_{i,s}(\Theta)} \over {\partial {{\Delta}}{\phi _{{{x}},{\rm{lab}}}}}} = - 2Cm_s^2\left \{{{{\Delta {\bf h}}}_{i,s} \left[{{{\left({{\bf{R}}_x'{{\bf{R}}_y}{{\bf{R}}_z}{\bf{U}}_{s,{\rm{o}}}{{\bf{B}}_s}} \right)}^T}{{\bf{q}}_i}} \right]} \right\} I_{i,s,{\rm{model}}}, \eqno (38)]

where

[{\bf{R}}_x' = \left [{\matrix{ 0 & 0 & 0 \cr 0 & { - {{\sin\Delta }}{\phi _{{{x}},{\rm{lab}}}}} & {{{\cos\Delta }}{\phi _{{{x}},{\rm{lab}}}}} \cr 0 & { - {{\cos\Delta }}{\phi _{{{x}},{\rm{lab}}}}} & { - {{\sin\Delta }}{\phi _{{{x}},{\rm{lab}}}}} \cr } } \right], \eqno (39)]

and C is the parameter describing the lattice transform (see Appendix A[link]). Similar expressions follow for [{{\Delta}}{\phi _{{{y}},{\rm{lab}}}}] and [{{\Delta}}{\phi _{{{z}},{\rm{lab}}}}]. No restraints were applied to the rotation angles during refinement, hence the reparameterizations used were of the form (1/σΔϕ)Δϕ + 1.

B-matrix derivative. The derivative of the mean scattered photons with respect to the unit-cell matrix is similar to that derived for Us. We show here the case for the unit cell a edge:

[{{\partial n_{i,s}(\Theta)} \over {\partial {{\rm{a}}_{\rm{s}}}}} = - 2Cm_s^2\left \{ {{{\Delta {\bf h}}}_{i,s} \left[{{{\left({{\bf{R}}_x{{\bf{R}}_y}{{\bf{R}}_z}{\bf{U}}_{s,o}{\bf{B'}}} \right)}^T}{{\bf{q}}_i}} \right]} \right \} I_{i,s,{\rm{model}}}, \eqno (40)]

where

[{\bf B'} = \left [{\matrix{ 1 & 0 & 0 \cr 0 & 1 & 0 \cr 0 & 0 & 0 \cr } } \right]. \eqno (41)]

A similar result follows for the c edge. Reparameterization could be applied in order to keep a and c within a valid range; however, we never encountered a situation where a or c diverged. We therefore applied reparameterizations of the form xa = (1/σa)(aao) + 1.

Derivative of mosaic parameter m. The derivative of the mean scattered photons with respect to m is given by

[{{\partial n_{i,s}(\Theta)} \over {\partial {m_s}}} = \left({6 \over {m_s}} - 2C{m_s}{\Delta{\bf h}}_{i,s}\cdot {\Delta {\bf h}}_{i,s}\right)I_{i,s,{\rm model}}. \eqno (42)]

For this parameter we performed a reparameterization to ensure that m was always greater than 3, hence we use the reparameterization xm = (1/σm)[ln(m − 3) − ln(mo − 3)] + 1.

Supporting information


Footnotes

1Recently, `whole pattern' methods have demonstrated the fitting of entire serial diffraction datasets to a global three-dimensional intensity model (Dilanian et al., 2016[Dilanian, R. A., Williams, S. R., Martin, A. V., Stretsov, V. A. & Quiney, H. M. (2016). IUCrJ, 3, 127-138.]; Lan et al., 2018[Lan, T.-Y., Wierman, J. L., Tate, M. W., Philipp, H. T., Martin-Garcia, J. M., Zhu, L., Kissick, D., Fromme, P., Fischetti, R. F., Liu, W., Elser, V. & Gruner, S. M. (2018). IUCrJ, 5, 548-558.]), avoiding the integration step completely. Lan et al. (2018[Lan, T.-Y., Wierman, J. L., Tate, M. W., Philipp, H. T., Martin-Garcia, J. M., Zhu, L., Kissick, D., Fromme, P., Fischetti, R. F., Liu, W., Elser, V. & Gruner, S. M. (2018). IUCrJ, 5, 548-558.]) employed the expand–maximize–compress (EMC) algorithm (Loh & Elser, 2009[Loh, N. D. & Elser, V. (2009). Phys. Rev. E, 80, 026705.]), derived for single-particle imaging, to process serial millisecond crystallography data. By employing the maximum-likelihood approach, they extracted structure factor intensities from weak signals in the context of a stable synchrotron source.

Acknowledgements

We thank Peter Zwart, Jeff Donatelli, Billy Poon, Dorothee Liebschner and Pavel Afonine for useful discussions. Giles Mullen wrote the GPU version of nanoBragg used for simulating data. The synthetic experiment in this paper was modeled after LCLS proposal LD91 conducted in 2014 in collaboration with Stanford faculty Soichi Wakatsuki, Axel Brunger and William Weis, and future work will apply this technique to that data.

Funding information

Research was supported by the National Institutes of Health (grants GM117126 to NKS; GM124149, GM124169, GM103393 and AI150476 to JH; GM110501 to JY; and GM126289 to JK) and the Exascale Computing Project (grant 17-SC-20-SC to NKS), a collaborative effort of the Department of Energy (DOE) Office of Science and the National Nuclear Security Administration. JK and JY were supported by the Director, Office of Science, Office of Basic Energy Sciences (OBES), Division of Chemical Sciences, Geosciences, and Biosciences (CSGB) of the DOE; JH was supported by the Integrated Diffraction Analysis Technologies program of the DOE Office of Biological and Environmental Research (OBER); and data processing was performed at the National Energy Research Scientific Computing Center, supported by the DOE Office of Science; all three under DOE contract DE-AC02-05CH11231. This work made use of the GPU nodes allocated for the NERSC Exascale Science Applications Program (NESAP).

References

First citationAllahgholi, A., Becker, J., Bianco, L., Delfs, A., Dinapoli, R., Goettlicher, P., Graafsma, H., Greiffenberg, D., Hirsemann, H., Jack, S., Klanner, R., Klyuev, A., Krueger, H., Lange, S., Marras, A., Mezza, D., Mozzanica, A., Rah, S., Xia, Q., Schmitt, B., Schwandt, J., Sheviakov, I., Shi, X., Smoljanin, S., Trunk, U., Zhang, J. & Zimmer, M. (2015). J. Instrum. 10, C01023.  Web of Science CrossRef Google Scholar
First citationAlonso-Mori, R., Asa, K., Bergmann, U., Brewster, A. S., Chatterjee, R., Cooper, J. K., Frei, H. M., Fuller, F. D., Goggins, E., Gul, S., Fukuzawa, H., Iablonskyi, D., Ibrahim, M., Katayama, T., Kroll, T., Kumagai, Y., McClure, B. A., Messinger, J., Motomura, K., Nagaya, K., Nishiyama, T., Saracini, C., Sato, Y., Sauter, N. K., Sokaras, D., Takanashi, T., Togashi, T., Ueda, K., Weare, W. W., Weng, T. C., Yabashi, M., Yachandra, V. K., Young, I. D., Zouni, A., Kern, J. F. & Yano, J. (2016). Faraday Discuss. 194, 621–638.  Web of Science CAS PubMed Google Scholar
First citationArndt, U. W. & Wonacott, A. J. (1977). The Rotation Method in Crystallography. Amsterdam: North-Holland.  Google Scholar
First citationAzároff, L. V. (1955). Acta Cryst. 8, 701–704.  CrossRef IUCr Journals Web of Science Google Scholar
First citationBellamy, H. D., Snell, E. H., Lovelace, J., Pokross, M. & Borgstahl, G. E. O. (2000). Acta Cryst. D56, 986–995.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationBrewster, A. S., Bhowmick, A., Bolotovsky, R., Mendez, D., Zwart, P. H. & Sauter, N. K. (2019). Acta Cryst. D75, 959–968.  Web of Science CrossRef IUCr Journals Google Scholar
First citationBrewster, A. S., Waterman, D. G., Parkhurst, J. M., Gildea, R. J., Michels-Clark, T. M., Young, I. D., Bernstein, H. J., Winter, G., Evans, G. & Sauter, N. K. (2016). Comput. Crystallogr. Newslett. 7, 32–53.  Google Scholar
First citationBrewster, A. S., Waterman, D. G., Parkhurst, J. M., Gildea, R. J., Young, I. D., O'Riordan, L. J., Yano, J., Winter, G., Evans, G. & Sauter, N. K. (2018). Acta Cryst. D74, 877–894.  Web of Science CrossRef IUCr Journals Google Scholar
First citationBrewster, A. S., Young, I. D., Lyubimov, A., Bhowmick, A. & Sauter, N. K. (2019). Comput. Crystallogr. Newslett. 10, 22–39.  Google Scholar
First citationBrown, P. J., Fox, A. G., Maslen, E. N., O'Keefe, M. A. & Willis, B. T. M. (2006). International Tables for Crystallography, Vol. C, 1st online ed., Section 6.1.1, pp. 554–590. Chester: International Union of Crystallography.   Google Scholar
First citationBusing, W. R. & Levy, H. A. (1967). Acta Cryst. 22, 457–464.  CrossRef IUCr Journals Web of Science Google Scholar
First citationChapman, H. N., Caleman, C. & Timneanu, N. (2014). Philos. Trans. R. Soc. B, 369, 20130313.  Web of Science CrossRef Google Scholar
First citationChapman, H. N., Fromme, P., Barty, A., White, T. A., Kirian, R. A., Aquila, A., Hunter, M. S., Schulz, J., DePonte, D. P., Weierstall, U., Doak, R. B., Maia, F. R. N. C., Martin, A. V., Schlichting, I., Lomb, L., Coppola, N., Shoeman, R. L., Epp, S. W., Hartmann, R., Rolles, D., Rudenko, A., Foucar, L., Kimmel, N., Weidenspointner, G., Holl, P., Liang, M., Barthelmess, M., Caleman, C., Boutet, S., Bogan, M. J., Krzywinski, J., Bostedt, C., Bajt, S., Gumprecht, L., Rudek, B., Erk, B., Schmidt, C., Hömke, A., Reich, C., Pietschner, D., Strüder, L., Hauser, G., Gorke, H., Ullrich, J., Herrmann, S., Schaller, G., Schopper, F., Soltau, H., Kühnel, K., Messerschmidt, M., Bozek, J. D., Hau-Riege, S. P., Frank, M., Hampton, C. Y., Sierra, R. G., Starodub, D., Williams, G. J., Hajdu, J., Timneanu, N., Seibert, M. M., Andreasson, J., Rocker, A., Jönsson, O., Svenda, M., Stern, S., Nass, K., Andritschke, R., Schröter, C., Krasniqi, F., Bott, M., Schmidt, K. E., Wang, X., Grotjohann, I., Holton, J. M., Barends, T. R. M., Neutze, R., Marchesini, S., Fromme, R., Schorb, S., Rupp, D., Adolph, M., Gorkhover, T., Andersson, I., Hirsemann, H., Potdevin, G., Graafsma, H., Nilsson, B. & Spence, J. C. H. (2011). Nature, 470, 73–77.  Web of Science CrossRef CAS PubMed Google Scholar
First citationClark, G. N. I., Hura, G. L., Teixeira, J., Soper, A. K. & Head-Gordon, T. (2010). Proc. Natl Acad. Sci. USA, 107, 14003–14007.  Web of Science CrossRef CAS PubMed Google Scholar
First citationDarwin, C. G. (1922). London Edinb. Dubl. Philos. Mag. J. Sci. 43, 800–829.  CrossRef CAS Google Scholar
First citationDasgupta, M., Budday, D., de Oliveira, S. H. P., Madzelan, P., Marchany-Rivera, D., Seravalli, J., Hayes, B., Sierra, R. G., Boutet, S., Hunter, M. S., Alonso-Mori, R., Batyuk, A., Wierman, J., Lyubimov, A., Brewster, A. S., Sauter, N. K., Applegate, G. A., Tiwari, V. K., Berkowitz, D. B., Thompson, M. C., Cohen, A. E., Fraser, J. S., Wall, M. E., van den Bedem, H. & Wilson, M. A. (2019). Proc. Natl Acad. Sci. USA, 116, 25634–25640.  Web of Science CrossRef CAS PubMed Google Scholar
First citationDejoie, C., McCusker, L. B., Baerlocher, C., Abela, R., Patterson, B. D., Kunz, M. & Tamura, N. (2013). J. Appl. Cryst. 46, 791–794.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationDePonte, D. P., Weierstall, U., Schmidt, K., Warner, J., Starodub, D., Spence, J. C. H. & Doak, R. B. (2008). J. Phys. D Appl. Phys. 41, 195505.  Web of Science CrossRef Google Scholar
First citationDiederichs, K. (2009). Acta Cryst. D65, 535–542.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationDilanian, R. A., Williams, S. R., Martin, A. V., Stretsov, V. A. & Quiney, H. M. (2016). IUCrJ, 3, 127–138.  CrossRef CAS PubMed IUCr Journals Google Scholar
First citationDuisenberg, A. J. M., Kroon-Batenburg, L. M. J. & Schreurs, A. M. M. (2003). J. Appl. Cryst. 36, 220–229.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationEmma, P., Bane, K., Cornacchia, M., Huang, Z., Schlarb, H., Stupakov, G. & Walz, D. (2004). Phys. Rev. Lett. 92, 074801.  Web of Science CrossRef PubMed Google Scholar
First citationEvans, P. R. & Murshudov, G. N. (2013). Acta Cryst. D69, 1204–1214.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationFransson, T., Chatterjee, R., Fuller, F. D., Gul, S., Weninger, C., Sokaras, D., Kroll, T., Alonso-Mori, R., Bergmann, U., Kern, J., Yachandra, V. K. & Yano, J. (2018). Biochemistry, 57, 4629–4637.  Web of Science CrossRef CAS PubMed Google Scholar
First citationGao, F. & Han, L. (2012). Comput. Optim. Appl. 51, 259–277.  Web of Science CrossRef Google Scholar
First citationGevorkov, Y., Barty, A., Brehm, W., White, T. A., Tolstikova, A., Wiedorn, M. O., Meents, A., Grigat, R.-R., Chapman, H. N. & Yefanov, O. (2020). Acta Cryst. A76, 121–131.  Web of Science CrossRef IUCr Journals Google Scholar
First citationGinn, H. M., Evans, G., Sauter, N. K. & Stuart, D. I. (2016). J. Appl. Cryst. 49, 1065–1072.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationGrosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationHammersley, A. (2009). FIT2D. ESRF, Grenoble, France.  Google Scholar
First citationHara, T., Inubushi, Y., Katayama, T., Sato, T., Tanaka, H., Tanaka, T., Togashi, T., Togawa, K., Tono, K., Yabashi, M. & Ishikawa, T. (2013). Nat. Commun. 4, 1–5.  CrossRef Google Scholar
First citationHart, P., Boutet, S., Carini, G., Dubrovin, M., Duda, B., Fritz, D., Haller, G., Herbst, R., Herrmann, S., Kenney, C., Kurita, N., Lemke, H., Messerschmidt, M., Nordby, M., Pines, J., Schafer, D., Swift, M., Weaver, M., Williams, G., Zhu, D., Van Bakel, N. & Morse, J. (2012). Proc. SPIE, 8504, 85040C.  CrossRef Google Scholar
First citationHendrickson, W. A. & Ogata, C. M. (1997). Methods Enzymol. 276, 494–523.   CrossRef CAS PubMed Web of Science Google Scholar
First citationHenke, B. L., Gullikson, E. M. & Davis, J. C. (1993). At. Data Nucl. Data Tables, 54, 181–342.  CrossRef CAS Web of Science Google Scholar
First citationHolton, J. M., Classen, S., Frankel, K. A. & Tainer, J. A. (2014). FEBS J. 281, 4046–4060.  Web of Science CrossRef CAS PubMed Google Scholar
First citationHolton, J. M. & Frankel, K. A. (2010). Acta Cryst. D66, 393–408.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationHolton, J. M., Nielsen, C. & Frankel, K. A. (2012). J. Synchrotron Rad. 19, 1006–1011.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationIbrahim, M., Fransson, T., Chatterjee, R., Cheah, M. H., Hussein, R., Lassalle, L., Sutherlin, K. D., Young, I. D., Fuller, F. D., Gul, S., Kim, I. S., Simon, P. S., de Lichtenberg, C., Chernev, P., Bogacz, I., Pham, C. C., Orville, A. M., Saichek, N., Northen, T., Batyuk, A., Carbajo, S., Alonso-Mori, R., Tono, K., Owada, S., Bhowmick, A., Bolotovsky, R., Mendez, D., Moriarty, N. W., Holton, J. M., Dobbek, H., Brewster, A. S., Adams, P. D., Sauter, N. K., Bergmann, U., Zouni, A., Messinger, J., Kern, J., Yachandra, V. K. & Yano, J. (2020). Proc. Natl Acad. Sci. USA, 117, 12624–12635.  CrossRef CAS PubMed Google Scholar
First citationJames, R. W. (1962). The Optical Principles of the Diffraction of X-rays. London: Bell.  Google Scholar
First citationKabsch, W. (2014). Acta Cryst. D70, 2204–2216.  Web of Science CrossRef IUCr Journals Google Scholar
First citationKahn, R., Fourme, R., Gadet, A., Janin, J., Dumas, C. & André, D. (1982). J. Appl. Cryst. 15, 330–337.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationKe, T.-W., Brewster, A. S., Yu, S. X., Ushizima, D., Yang, C. & Sauter, N. K. (2018). J. Synchrotron Rad. 25, 655–670.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationKern, J., Chatterjee, R., Young, I. D., Fuller, F. D., Lassalle, L., Ibrahim, M., Gul, S., Fransson, T., Brewster, A. S., Alonso-Mori, R., Hussein, R., Zhang, M., Douthit, L., de Lichtenberg, C., Cheah, M. H., Shevela, D., Wersig, J., Seuffert, I., Sokaras, D., Pastor, E., Weninger, C., Kroll, T., Sierra, R. G., Aller, P., Butryn, A., Orville, A. M., Liang, M., Batyuk, A., Koglin, J. E., Carbajo, S., Boutet, S., Moriarty, N. W., Holton, J. M., Dobbek, H., Adams, P. D., Bergmann, U., Sauter, N. K., Zouni, A., Messinger, J., Yano, J. & Yachandra, V. K. (2018). Nature, 563, 421–425.  Web of Science CrossRef CAS PubMed Google Scholar
First citationKirian, R. A., Wang, X., Weierstall, U., Schmidt, K. E., Spence, J. C. H., Hunter, M., Fromme, P., White, T., Chapman, H. N. & Holton, J. (2010). Opt. Express, 18, 5713–5723.  Web of Science CrossRef PubMed Google Scholar
First citationKroon-Batenburg, L. M. J., Schreurs, A. M. M., Ravelli, R. B. G. & Gros, P. (2015). Acta Cryst. D71, 1799–1811.  Web of Science CrossRef IUCr Journals Google Scholar
First citationLan, T.-Y., Wierman, J. L., Tate, M. W., Philipp, H. T., Martin-Garcia, J. M., Zhu, L., Kissick, D., Fromme, P., Fischetti, R. F., Liu, W., Elser, V. & Gruner, S. M. (2018). IUCrJ, 5, 548–558.  Web of Science CrossRef CAS PubMed IUCr Journals Google Scholar
First citationLeonarski, F., Redford, S., Mozzanica, A., Lopez-Cuenca, C., Panepucci, E., Nass, K., Ozerov, D., Vera, L., Olieric, V., Buntschu, D., Schneider, R., Tinti, G., Froejdh, E., Diederichs, K., Bunk, O., Schmitt, B. & Wang, M. (2018). Nat. Methods, 15, 799–804.  Web of Science CrossRef CAS PubMed Google Scholar
First citationLeslie, A. G. W. (1999). Acta Cryst. D55, 1696–1702.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationLiang, M., Williams, G. J., Messerschmidt, M., Seibert, M. M., Montanez, P. A., Hayes, M., Milathianaki, D., Aquila, A., Hunter, M. S., Koglin, J. E., Schafer, D. W., Guillet, S., Busse, A., Bergan, R., Olson, W., Fox, K., Stewart, N., Curtis, R., Miahnahri, A. A. & Boutet, S. (2015). J. Synchrotron Rad. 22, 514–519.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationLiu, D. C. & Nocedal, J. (1989). Math. Program. 45, 503–528.  CrossRef Web of Science Google Scholar
First citationLoh, N. D. & Elser, V. (2009). Phys. Rev. E, 80, 026705.  Web of Science CrossRef Google Scholar
First citationLu, W., Friedrich, B., Noll, T., Zhou, K., Hallmann, J., Ansaldi, G., Roth, T., Serkez, S., Geloni, G., Madsen, A. & Eisebitt, S. (2018). Rev. Sci. Instrum. 89, 063121.  Web of Science CrossRef PubMed Google Scholar
First citationLutman, A. A., Decker, F.-J., Arthur, J., Chollet, M., Feng, Y., Hastings, J., Huang, Z., Lemke, H., Nuhn, H.-D., Marinelli, A., Turner, J. L., Wakatsuki, S., Welch, J. & Zhu, D. (2014). Phys. Rev. Lett. 113, 254801.  Web of Science CrossRef PubMed Google Scholar
First citationLyubimov, A. Y., Uervirojnangkoorn, M., Zeldin, O. B., Zhou, Q., Zhao, M., Brewster, A. S., Michels-Clark, T., Holton, J. M., Sauter, N. K., Weis, W. I. & Brunger, A T. (2016). eLife, 5, e18740.  CrossRef PubMed Google Scholar
First citationMacDowell, A. A., Celestre, R. S., Howells, M., McKinney, W., Krupnick, J., Cambie, D., Domning, E. E., Duarte, R. M., Kelez, N., Plate, D. W., Cork, C. W., Earnest, T. N., Dickert, J., Meigs, G., Ralston, C., Holton, J. M., Alber, T., Berger, J. M., Agard, D. A. & Padmore, H. A. (2004). J. Synchrotron Rad. 11, 447–455.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationMargaritondo, G. & Rebernik Ribic, P. (2011). J. Synchrotron Rad. 18, 101–108.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationMartin-Garcia, J. M., Zhu, L., Mendez, D., Lee, M.-Y., Chun, E., Li, C., Hu, H., Subramanian, G., Kissick, D., Ogata, C., Henning, R., Ishchenko, A., Dobson, Z., Zhang, S., Weierstall, U., Spence, J. C. H., Fromme, P., Zatsepin, N. A., Fischetti, R. F., Cherezov, V. & Liu, W. (2019). IUCrJ, 6, 412–425.  Web of Science CrossRef CAS PubMed IUCr Journals Google Scholar
First citationMilne, C. J., Schietinger, T., Aiba, M., Alarcon, A., Alex, J., Anghel, A., Arsov, V., Beard, C., Beaud, P., Bettoni, S., Bopp, M., Brands, H., Brönnimann, M., Brunnenkant, I., Calvi, M., Citterio, A., Craievich, P., Csatari Divall, M., Dällenbach, M., D'Amico, M., Dax, A., Deng, Y., Dietrich, A., Dinapoli, R., Divall, E., Dordevic, S., Ebner, S., Erny, C., Fitze, H., Flechsig, U., Follath, R., Frei, F., Gärtner, F., Ganter, R., Garvey, T., Geng, Z., Gorgisyan, I., Gough, C., Hauff, A., Hauri, C. P., Hiller, N., Humar, T., Hunziker, S., Ingold, G., Ischebeck, R., Janousch, M., Juranić, P., Jurcevic, M., Kaiser, M., Kalantari, B., Kalt, R., Keil, B., Kittel, C., Knopp, G., Koprek, W., Lemke, H. T., Lippuner, T., Llorente Sancho, D., Löhl, F., Lopez-Cuenca, C., Märki, F., Marcellini, F., Marinkovic, G., Martiel, I., Menzel, R., Mozzanica, A., Nass, K., Orlandi, G. L., Ozkan Loch, C., Panepucci, E., Paraliev, M., Patterson, B., Pedrini, B., Pedrozzi, M., Pollet, P., Pradervand, C., Prat, E., Radi, P., Raguin, J.-Y., Redford, S., Rehanek, J., Réhault, J., Reiche, S., Ringele, M., Rittmann, J., Rivkin, L., Romann, A., Ruat, M., Ruder, C., Sala, L., Schebacher, L., Schilcher, T., Schlott, V., Schmidt, T., Schmitt, B., Shi, X., Stadler, M., Stingelin, L., Sturzenegger, W., Szlachetko, J., Thattil, D., Treyer, D. M., Trisorio, A., Tron, W., Vetter, S., Vicario, C., Voulot, D., Wang, M., Zamofing, T., Zellweger, C. & Zennaro, R. (2017). Appl. Sci. 7, 720.  CrossRef Google Scholar
First citationNango, E., Kubo, M., Tono, K. & Iwata, S. (2019). Appl. Sci. 9, 5505.  CrossRef Google Scholar
First citationNave, C. (1998). Acta Cryst. D54, 848–853.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationNave, C. (2014). J. Synchrotron Rad. 21, 537–546.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationNogly, P., Weinert, T., James, D., Carbajo, S., Ozerov, D., Furrer, A., Gashi, D., Borin, V., Skopintsev, P., Jaeger, K., Nass, K., Bath, P., Bosman, R., Koglin, J., Seaberg, M., Lane, T., Kekilli, D., Brunle, S., Tanaka, T., Wu, W., Milne, C., White, T., Barty, A., Weierstall, U., Panneels, V., Nango, E., Iwata, S., Hunter, M., Schapiro, I., Schertler, G., Neutze, R. & Standfuss, J. (2018). Science, 361, eaat0094.  Web of Science CrossRef PubMed Google Scholar
First citationPande, K., Hutchison, C. D. M., Groenhof, G., Aquila, A., Robinson, J. S., Tenboer, J., Basu, S., Boutet, S., DePonte, D. P., Liang, M., White, T. A., Zatsepin, N. A., Yefanov, O., Morozov, D., Oberthuer, D., Gati, C., Subramanian, G., James, D., Zhao, Y., Koralek, J., Brayshaw, J., Kupitz, C., Conrad, C., Roy-Chowdhury, S., Coe, J. D., Metz, M., Xavier, P. L., Grant, T. D., Koglin, J. E., Ketawala, G., Fromme, R., rajer, V., Henning, R., Spence, J. C. H., Ourmazd, A., Schwander, P., Weierstall, U., Frank, M., Fromme, P., Barty, A., Chapman, H. N., Moffat, K., van Thor, J. J. & Schmidt, M. (2016). Science, 352, 725–729.  Web of Science CrossRef CAS PubMed Google Scholar
First citationParkhurst, J. M. (2020). Statistically Robust Methods for the Integration and Analysis of X-ray Diffraction Data from Pixel Array Detectors. Doctoral thesis, University of Cambridge.  Google Scholar
First citationPawitan, Y. (2001). In All Likelihood: Statistical Modelling and Inference using Likelihood. Oxford University Press.  Google Scholar
First citationPinker, F., Brun, M., Morin, P., Deman, A.-L., Chateaux, J.-F., Oliéric, V., Stirnimann, C., Lorber, B., Terrier, N., Ferrigno, R. & Sauter, C. (2013). Cryst. Growth Des. 13, 3333–3340.  Web of Science CrossRef CAS Google Scholar
First citationRossmann, M. G. (1979). J. Appl. Cryst. 12, 225–238.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationSauter, N. K. (2015). J. Synchrotron Rad. 22, 239–248.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationSauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299–3309.  Web of Science CrossRef IUCr Journals Google Scholar
First citationSauter, N. K., Hattne, J., Grosse-Kunstleve, R. W. & Echols, N. (2013). Acta Cryst. D69, 1274–1282.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationSauter, N. K., Kern, J., Yano, J. & Holton, J. M. (2020). Acta Cryst. D76, 176–192.  CrossRef IUCr Journals Google Scholar
First citationSchlichting, I. (2017). IUCrJ, 4, 516.  Google Scholar
First citationSchreurs, A. M. M., Xian, X. & Kroon-Batenburg, L. M. J. (2010). J. Appl. Cryst. 43, 70–82.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationSeijas-Macías, A. & Oliveira, A. (2012). Discuss. Math. Probab. Stat. 32, 87–99.  Google Scholar
First citationShapiro, L., Fannon, A. M., Kwong, P. D., Thompson, A., Lehmann, M. S., Grübel, G., Legrand, J.-F., Als-Nielsen, J., Colman, D. R. & Hendrickson, W. A. (1995). Nature, 374, 327–337.  CrossRef CAS PubMed Google Scholar
First citationSharma, A., Johansson, L., Dunevall, E., Wahlgren, W. Y., Neutze, R. & Katona, G. (2017). Acta Cryst. A73, 93–101.  Web of Science CrossRef IUCr Journals Google Scholar
First citationSikorski, M., Feng, Y., Song, S., Zhu, D., Carini, G., Herrmann, S., Nishimura, K., Hart, P. & Robert, A. (2016). J. Synchrotron Rad. 23, 1171–1179.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationSpence, J. C. H. (2017). IUCrJ, 4, 322–339.  Web of Science CrossRef CAS PubMed IUCr Journals Google Scholar
First citationStagno, J. R., Liu, Y., Bhandari, Y. R., Conrad, C. E., Panja, S., Swain, M., Fan, L., Nelson, G., Li, C., Wendel, D. R., White, T. A., Coe, J. D., Wiedorn, M. O., Knoska, J., Oberthuer, D., Tuckey, R. A., Yu, P., Dyba, M., Tarasov, S. G., Weierstall, U., Grant, T. D., Schwieters, C. D., Zhang, J., Ferré-D'Amaré, A. R., Fromme, P., Draper, D. E., Liang, M., Hunter, M. S., Boutet, S., Tan, K., Zuo, X., Ji, X., Barty, A., Zatsepin, N. A., Chapman, H. N., Spence, J. C. H., Woodson, S. A. & Wang, Y. (2017). Nature, 541, 242–246.  Web of Science CrossRef CAS PubMed Google Scholar
First citationSteller, I., Bolotovsky, R. & Rossmann, M. G. (1997). J. Appl. Cryst. 30, 1036–1040.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTerwilliger, T. C., Bunkóczi, G., Hung, L.-W., Zwart, P. H., Smith, J. L., Akey, D. L. & Adams, P. D. (2016). Acta Cryst. D72, 346–358.  Web of Science CrossRef IUCr Journals Google Scholar
First citationThomaston, J. L., Woldeyes, R. A., Nakane, T., Yamashita, A., Tanaka, T., Koiwai, K., Brewster, A. S., Barad, B. A., Chen, Y., Lemmin, T., Uervirojnangkoorn, M., Arima, T., Kobayashi, J., Masuda, T., Suzuki, M., Sugahara, M., Sauter, N. K., Tanaka, R., Nureki, O., Tono, K., Joti, Y., Nango, E., Iwata, S., Yumoto, F., Fraser, J. S. & DeGrado, W. F. (2017). Proc. Natl Acad. Sci. USA, 114, 13357–13362.  Web of Science CrossRef CAS PubMed Google Scholar
First citationTosha, T., Nomura, T., Nishida, T., Saeki, N., Okubayashi, K., Yamagiwa, R., Sugahara, M., Nakane, T., Yamashita, K., Hirata, K. et al. (2017). Nat. Commun. 8, 1–9.  CrossRef CAS PubMed Google Scholar
First citationUervirojnangkoorn, M., Zeldin, O. B., Lyubimov, A. Y., Hattne, J., Brewster, A. S., Sauter, N. K., Brunger, A. T. & Weis, W. I. (2015). eLife, 4, e05421.  Web of Science CrossRef Google Scholar
First citationWaterman, D. G., Winter, G., Gildea, R. J., Parkhurst, J. M., Brewster, A. S., Sauter, N. K. & Evans, G. (2016). Acta Cryst. 72, 558–575.  Google Scholar
First citationWhite, T. A. (2014). Philos. Trans. R. Soc. B, 369, 20130330.  Web of Science CrossRef Google Scholar
First citationWhite, T. A., Kirian, R. A., Martin, A. V., Aquila, A., Nass, K., Barty, A. & Chapman, H. N. (2012). J. Appl. Cryst. 45, 335–341.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationWhite, T. A., Mariani, V., Brehm, W., Yefanov, O., Barty, A., Beyerlein, K. R., Chervinskii, F., Galli, L., Gati, C., Nakane, T., Tolstikova, A., Yamashita, K., Yoon, C. H., Diederichs, K. & Chapman, H. N. (2016). J. Appl. Cryst. 49, 680–689.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationWinter, G., Waterman, D. G., Parkhurst, J. M., Brewster, A. S., Gildea, R. J., Gerstel, M., Fuentes-Montero, L., Vollmar, M., Michels-Clark, T., Young, I. D., Sauter, N. K. & Evans, G. (2018). Acta Cryst. D74, 85–97.  Web of Science CrossRef IUCr Journals Google Scholar
First citationYefanov, O., Mariani, V., Gati, C., White, T. A., Chapman, H. N. & Barty, A. (2015). Opt. Express, 23, 28459–28470.  Web of Science CrossRef PubMed Google Scholar
First citationZhu, D., Cammarata, M., Feldkamp, J. M., Fritz, D. M., Hastings, J. B., Lee, S., Lemke, H. T., Robert, A., Turner, J. L. & Feng, Y. (2012). Appl. Phys. Lett. 101, 034103.  Web of Science CrossRef Google Scholar

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.

IUCrJ
Volume 7| Part 6| November 2020| Pages 1151-1167
ISSN: 2052-2525