
## Accounting for partiality in serial crystallography using ray-tracing principles

**Loes M. J. Kroon-Batenburg,^{a}^{*} Antoine M. M. Schreurs,^{a} Raimond B. G. Ravelli^{b} and Piet Gros^{a}**

^{a}Crystal and Structural Chemistry, Bijvoet Center for Biomolecular Research, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands, and ^{b}M4I Division of Nanoscopy, Maastricht University, PO Box 616, 6200 MD Maastricht, The Netherlands

^{*}Correspondence e-mail: l.m.j.kroon-batenburg@uu.nl

Serial crystallography generates `still' diffraction data sets that are composed of single diffraction images obtained from a large number of crystals arbitrarily oriented in the X-ray beam. Estimation of the reflection partialities, which accounts for the expected observed fractions of diffraction intensities, has so far been problematic. In this paper, a method is derived for modelling the partialities by making use of the ray-tracing diffraction-integration method *EVAL*. The method estimates partialities based on crystal mosaicity, beam divergence, crystal size and the interference function, accounting for crystallite size. It is shown that modelling of each reflection by a distribution of interference-function weighted rays yields a `still' Lorentz factor. Still data are compared with a conventional rotation data set collected from a single lysozyme crystal. Overall, the presented still integration method improves the data quality markedly. The *R* factor of the still data compared with the rotation data decreases from 26% using a Monte Carlo approach to 12% after applying the Lorentz correction, to 5.3% when estimating partialities by *EVAL* and finally to 4.7% after post-refinement. The merging *R*_{int} factor of the still data improves from 105 to 56% but remains high. This suggests that the accuracy of the model parameters could be further improved. However, with a multiplicity of around 40 and an *R*_{int} of ∼50% the merged still data approximate the quality of the rotation data. The presented integration method suitably accounts for the partiality of the observed intensities in still diffraction data, which is a critical step to improve data quality in serial crystallography.

Keywords: serial crystallography; *EVAL*; partiality of still data.

### 1. Introduction

X-ray free-electron lasers and high-brilliance undulator beamlines at synchrotrons have been used to perform serial (femtosecond) crystallography, collecting diffraction data from a large number (thousands up to millions) of micrometre-sized or nanometre-sized crystals (Chapman *et al.*, 2011; Boutet *et al.*, 2012; Redecke *et al.*, 2013; Gati *et al.*, 2014; Demirci *et al.*, 2013). Individual crystals may be hit by an X-ray pulse, thereby producing a diffraction pattern within the 10–50 fs before being vaporized by the transferred energy. This principle of `diffraction before destruction' has been demonstrated by experiments on the Linac Coherent Light Source (LCLS) hard X-ray free-electron laser (Chapman *et al.*, 2011). Since the X-ray pulses are shorter than it takes for radiation-induced structural changes to occur, this approach of serial crystallography overcomes radiation damage, which has become a major problem with highly brilliant synchrotron sources (Weik *et al.*, 2000; Ravelli & McSweeney, 2000; Burmeister, 2000) using conventional rotation methods of collecting data from one or very few larger crystals. The diffraction images in serial crystallography are single snapshots of nonrotating crystals: so-called still images. As opposed to the conventional rotation data, the reflections are not fully integrated but are partials, except possibly when using future pink XFEL beams (Dejoie *et al.*, 2015). The particular orientation of the crystal determines the extent of this partiality, which is a great unknown in the data-reduction process.

The specific challenges in data processing are the indexing of the stills, the reconstruction of full intensities and the merging of data obtained from different crystals, in addition to the handling of huge amounts of data. Three software packages are available to process serial X-ray diffraction patterns: *CrystFEL* (White *et al.*, 2012, 2013; White, 2014), *cctbx.xfel* from the *Computational Crystallographic Toolbox* (Sauter *et al.*, 2013; Hattne *et al.*, 2014) and *nXDS* (Kabsch, 2014). For indexing, rotation-method indexing packages such as *MOSFLM* (Leslie & Powell, 2007), *DirAx* (Duisenberg, 1992) and *LABELIT* (Sauter *et al.*, 2004) are being used. In 2010, Kirian and coworkers proposed a Monte Carlo integration method that, by averaging large numbers of diffraction spots, averages out the unknown partialities as well as differences in crystal size, beam and the incident spectrum (Kirian *et al.*, 2010). Thousands of diffraction images are needed for this method to converge (Boutet *et al.*, 2012). It is generally believed (White, 2014) that estimation of partialities could reduce the number of images needed for the Monte Carlo integration method and could improve the data quality. Three approaches have been proposed to estimate partialities. All three use post-refinement to improve the partiality correction factors and scale factors for each image. Kabsch (2014) derived an analytical expression for partiality from a Gaussian mosaic spread function. Comparison of ultrafine-sliced rotation images treated as stills or as normal rotation images gave satisfactory results. Kabsch includes a Lorentz factor for still data explicitly. The still data processing is not as good as one would expect, according to Kabsch. He concludes that this may be caused by two-dimensional rather than three-dimensional profile fits and the lack of other unimplemented corrections. 
White (2014) considers the overlap of reciprocal reflection volumes with a nest of Ewald spheres and calculates partialities from the distance of reciprocal-lattice points to the two limiting Ewald spheres. Using modelled data, White shows that the partiality estimation improves the data, with significant improvement of the statistics upon post-refinement. Most recently, Sauter (2015) and Uervirojnangkoorn *et al.* (2015) presented a partiality model that is implemented in *cctbx.xfel*. They calculated the intersection with the Ewald sphere of a spherical reciprocal-lattice point, where the radii of the points are determined by mosaic spread and (asymmetric) beam divergence. Sauter (2015) also includes a parameter for the coherently scattering volume of mosaic blocks. Using this approach on XFEL data with post-refinement of crystal orientations, scale factors and beam parameters, the data are improved in quality as judged from molecular-replacement scores, structural and anomalous difference maps (Uervirojnangkoorn *et al.*, 2015). Moreover, they show that reliable structures can be obtained with a lower number of images. Unfortunately, these authors do not mention merging *R* factors. Another correction that potentially improves Monte Carlo integration convergence for nanocrystals is explored by estimation of the crystal sizes and their corresponding diffraction power, as described by Qu *et al.* (2014). They show that the geometric correction factor, solely based on the maximum of the Laue interference function for each crystal with size *N*_{x} × *N*_{y} × *N*_{z}, is superior to Monte Carlo integration for simulated data. Although the above efforts were made to improve processed serial crystallography still data, many questions still need to be addressed. Why do the data not improve rigorously with the current partiality-correction models? What factors exactly determine the partiality? Which errors dominate the partiality-estimation schemes?

Here, we describe an extension of the *EVAL* profile-prediction algorithm to process still images. *EVAL* is a data-reduction method designed for integrating reflection intensities through profile fitting using ray-tracing simulations (Duisenberg *et al.*, 2003; Schreurs *et al.*, 2010). We derived a general interference function that is valid for crystals of any size and effectively includes the shape transform. The diffraction process is simulated by typically 10 000 rays, which are diffracted by an equal number of reciprocal-lattice vectors. In the rotation method, we bring reciprocal-lattice vectors onto the Ewald sphere by rotation around the spindle axis. However, in the still diffraction method we calculate the deviation from the exact Bragg condition for each ray and estimate its contribution to the total diffracted intensity using the interference function. By summation, the partiality of a reflection is obtained and, as we will show, also the still Lorentz factor. To test the approach, we used two still data sets collected on our in-house diffractometer using a single lysozyme crystal: one consisting of consecutive, stepwise stills and one consisting of stills from arbitrary orientations. Both were compared with conventional rotation data collected under the same conditions. We show that for these data sets the reflection partialities can be estimated by the ray-tracing simulation method and that the presented approach significantly improves the mean intensities of the observed reflections.

### 2. Diffraction theory

Reflection profiles from a crystal in *EVAL* are simulated by generating ray traces. We consider a crystal to be built up from small crystallites, obtained by dividing the crystal on a three-dimensional grid (sampled from a distribution **K**), that can have random orientations taken from a mosaic distribution (**M**). Incident X-rays are emitted from a virtual focus (*e.g.* a square area **F**) in direction **k**_{0} with respect to the crystallite and with wavelength λ (sampled from a distribution **L**). A crystallite with a given orientation of the reciprocal-lattice vector **d**^{*} gives rise to a diffracted ray in direction **k**_{1} as determined by the Ewald construction. For **F**, **L**, **K** and **M** several statistical distributions are available (Schreurs *et al.*, 2010). In the simulation of rotation data the **d**^{*} vectors are rotated around the spindle axis so as to match the Bragg condition and then touch the Ewald sphere (Fig. 1).

In case of still diffraction experiments with one particular orientation of the crystal, none of the crystallite vectors are exactly on the Ewald sphere. However, some vectors are within a certain tolerance in angular deviation (∊) of the Bragg angle θ and may give rise to diffracted intensity that is a function of ∊ (see below). The integrated intensity of all vectors from the various crystallites depends on the mosaic spread, the beam size and divergence, the crystal size and the crystallite size itself. The latter corresponds to the coherently diffracting volume of the mosaic blocks, and the total reflected intensity of the crystal is the incoherent sum of all diffracted rays.

#### 2.1. Still diffraction images

The scattered intensity of a crystal at Bragg angle θ can be thought of as the coherent sum of scattering by *s* layers of thickness *d*, according to what we call the James–Buerger theory (James, 1958; Buerger, 1960). The scattered intensity of a single layer is

where *F*^{2} is the squared structure factor, *I*_{0} is the incident intensity, *p* is the polarization owing to the reflection, *e*^{2}/*mc*^{2} is the Thomson scattering length of one electron and *n* is the number of unit cells per unit volume. A mosaic crystal is made up of tiny crystallites with associated reciprocal-lattice vectors that are spread over an angular range μ, and each of them may not be perfectly oriented to be in Bragg condition. The *s* layers within such a crystallite then scatter slightly out of phase and their scattered intensity is given by the interference function

where ∊ is the deviation from the Bragg angle θ, *B* = 2π*d*cosθ/λ and *V*_{crystallite} is the reflecting volume of the crystallite.
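The behaviour of the interference term can be sketched in a few lines. The sketch assumes the squared-sinc form sin²(*sB*∊)/(*B*∊)², which is the form consistent with the integral value *s*π/*B* quoted in the text that follows; all prefactors are omitted, and the function names and numerical values are hypothetical.

```python
import math

def bragg_B(d, theta, wavelength):
    """B = 2*pi*d*cos(theta)/lambda, with d and wavelength in the same units."""
    return 2.0 * math.pi * d * math.cos(theta) / wavelength

def interference(eps, s, B):
    """Assumed squared-sinc interference term sin^2(s*B*eps) / (B*eps)^2.

    eps is the angular deviation from the Bragg angle (radians) and s is the
    number of reflecting layers. The limit for eps -> 0 is s**2.
    """
    x = B * eps
    if abs(x) < 1e-12:
        return float(s * s)
    return (math.sin(s * x) / x) ** 2

# Hypothetical example values: Cu K-alpha radiation, d = 4 A, s = 100 layers.
wavelength = 1.5418
d = 4.0
theta = math.asin(wavelength / (2.0 * d))   # Bragg angle from Bragg's law
B = bragg_B(d, theta, wavelength)
s = 100
first_zero = math.pi / (s * B)   # deviation at which the term first vanishes
peak = interference(0.0, s, B)   # value exactly at the Bragg angle
```

The peak height scales with *s*² while the angular width scales with 1/(*sB*), so thicker crystallites give sharper and stronger interference maxima.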

The James–Buerger theory can be extended by writing the total diffracted intensity as an integral over all possible orientations of the crystallite vectors that make angles of 90° − η with the incident X-ray beam (90° − θ in the Bragg condition) and replacing *V*_{crystallite} by the volume of the crystal *V* and ∊ by θ − η (see Fig. 2). This results in

where *P*(η|η_{0}) is the probability distribution of η angles given the angle η_{0} of the central reciprocal-lattice vector (**d**^{*}_{0}) of the crystal as obtained from the unit-cell matrix.

The mosaic spread, the divergence of incident rays, the wavelength variations and the crystallite positions being slightly off-centre in a larger crystal are the cause of deviations ∊ for the individual rays. For the discussion here, we will concentrate on the mosaic spread, but the other parameters are accounted for as well in our ray-tracing simulations.

The distribution function can take several forms. Suppose that *P*(η|η_{0}) is uniform over a range that is wide compared with the width of the interference function; then the integral over dη in (3) reduces to *s*π/*B* and

where *C* = *I*_{0}*F*^{2}*p*(*e*^{2}/*mc*^{2})^{2}λ^{3}*n*^{2}. The last term in (4) is familiar: it is the Lorentz factor for rotation in the equatorial plane and is equal to the powder Lorentz factor. Thus, for still images the powder Lorentz factor applies. When *P*(η|η_{0}) is a normal or an otherwise monotonous distribution, we should explicitly include it in the calculation of (3). However, it cannot be reduced to a simple trigonometric function nor to an erf (see Kabsch, 2014) because of the presence of the sinc function in the integral. Instead, it can be evaluated numerically.
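Both limiting cases can be checked numerically with a short sketch. It assumes the squared-sinc interference term sin²(*sB*∊)/(*B*∊)² with all prefactors omitted; the midpoint integration, the function names and the numerical values are illustrative choices and are not taken from *EVAL*.

```python
import math

def interference(eps, s, B):
    """Assumed squared-sinc term sin^2(s*B*eps) / (B*eps)^2 (limit s**2 at eps = 0)."""
    x = B * eps
    if abs(x) < 1e-12:
        return float(s * s)
    return (math.sin(s * x) / x) ** 2

def integrate(f, lo, hi, n):
    """Midpoint-rule integral of f over [lo, hi] using n intervals."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def gaussian(x, mean, sigma):
    """Normalized Gaussian probability density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

s, B = 200, 10.0    # hypothetical number of layers and B value
sigma = 0.004       # hypothetical Gaussian mosaic spread (radians)

# Broad uniform case: the integral of the interference term approaches s*pi/B.
uniform_integral = integrate(lambda e: interference(e, s, B), -1.0, 1.0, 100000)
expected = s * math.pi / B

def still_intensity(eps0):
    """Integral of P(eps | eps0) times the interference term for a Gaussian P."""
    f = lambda e: gaussian(e, eps0, sigma) * interference(e, s, B)
    return integrate(f, eps0 - 6.0 * sigma, eps0 + 6.0 * sigma, 20000)

on_bragg = still_intensity(0.0)            # central vector exactly in Bragg condition
off_bragg = still_intensity(2.0 * sigma)   # central vector two sigma off Bragg
```

With a Gaussian distribution the integral depends on the offset ∊_{0} of the central vector, which is the quantity that a partiality model has to capture.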

#### 2.2. Rotation diffraction images

Rotation images can be regarded as a superposition of many stills separated by an infinitesimal rotation angle ω. The integrated intensity for these is

In a sufficiently large ω scan each vector makes a complete pass through the Ewald sphere so that we can write

where *L*′ is the duration component of the Lorentz factor *L* for the rotation experiment (equal to the reflection range in Kabsch, 2014). The rotation Lorentz factor may alternatively be written as 1/[**d**^{*}·(**k**_{0} × **ω**)] (Milch & Minor, 1974). In case of rotation in the equatorial plane, (6) reduces to (4). In rotation data, therefore, the specific distribution function *P*(η) is irrelevant to the integrated intensity and complete reflections are obtained.
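The triple-product form of the rotation Lorentz factor is easy to verify for an equatorial reflection. The sketch below assumes a particular normalization (unit vectors for **k**_{0} and **ω**, and **d**^{*} expressed in units of 1/λ), under which the equatorial value becomes 1/sin 2θ; the function names are hypothetical.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def rotation_lorentz(d_star, k0, omega):
    """1 / |d* . (k0 x omega)| with the normalization assumed above."""
    denom = abs(dot(d_star, cross(k0, omega)))
    return math.inf if denom == 0.0 else 1.0 / denom

theta = math.radians(15.0)
k0 = (1.0, 0.0, 0.0)       # incident beam along x
omega = (0.0, 0.0, 1.0)    # spindle axis along z, perpendicular to the beam
# d* in the equatorial plane and exactly in Bragg condition: k1 = k0 + d*
d_star = (-2.0 * math.sin(theta) ** 2,
          2.0 * math.sin(theta) * math.cos(theta),
          0.0)
lorentz = rotation_lorentz(d_star, k0, omega)
```

For θ = 15° the result equals 1/sin 30°, the equatorial Lorentz factor, and the factor diverges for a reciprocal-lattice vector along the spindle axis.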

#### 2.3. Implementation of the interference function in *EVAL*

In *EVAL* a large number of vectors **d**^{*}(*i*) are generated from a Gaussian or Lorentzian two-dimensional mosaic distribution (**M**) and combined with vectors **k**_{0} from the **F**, **L** and **K** distributions. The contribution of each of these to the scattered intensity is calculated with

Summing all contributions gives the total scattered normalized intensity (*i.e.* *C* = 1.0; see text below equation 4), which is effectively an integral over d*η*, d**k**_{0} and, to a minor extent, d*λ*, because our beam is almost monochromatic. After correction for the still Lorentz factor, this normalized intensity is stored in the parameter `partiality'. The only new parameter introduced is the number of unit cells in the crystallite *N*_{cell}, where *s* = *N*_{cell}(|*h*| + |*k*| + |*l*|), the number of reflecting planes, while the crystallite size equals *sd*_{hkl}.
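The summation over rays can be illustrated with a strongly simplified, one-dimensional sketch. It is not the *EVAL* implementation: only a Gaussian mosaic spread is sampled (no **F**, **L** or **K** distributions), the squared-sinc interference term is assumed, no Lorentz or normalization factors are applied, and all names and numbers are hypothetical.

```python
import math
import random

def layers(n_cell, h, k, l):
    """s = N_cell * (|h| + |k| + |l|), the number of reflecting planes."""
    return n_cell * (abs(h) + abs(k) + abs(l))

def interference(eps, s, B):
    """Assumed squared-sinc term sin^2(s*B*eps) / (B*eps)^2 (limit s**2 at eps = 0)."""
    x = B * eps
    if abs(x) < 1e-12:
        return float(s * s)
    return (math.sin(s * x) / x) ** 2

def ray_sum(eps0, mosaic_sigma, s, B, n_rays=10000, seed=0):
    """Mean interference weight over n_rays vectors scattered around eps0.

    Each ray gets a deviation eps_i drawn from a Gaussian mosaic distribution
    centred on eps0, the deviation of the central reciprocal-lattice vector.
    The result is a relative, unnormalized partiality estimate.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rays):
        eps_i = eps0 + rng.gauss(0.0, mosaic_sigma)
        total += interference(eps_i, s, B)
    return total / n_rays

# Hypothetical reflection: hkl = (1, -2, 3), d = 10 A, Cu K-alpha, N_cell = 25.
wavelength, d = 1.5418, 10.0
theta = math.asin(wavelength / (2.0 * d))
B = 2.0 * math.pi * d * math.cos(theta) / wavelength
s = layers(25, 1, -2, 3)
sigma = math.radians(0.2)

on_bragg = ray_sum(0.0, sigma, s, B)
off_bragg = ray_sum(math.radians(0.6), sigma, s, B)
```

Because the interference peak narrows as *s* grows, fewer of the sampled rays fall inside it, so larger values of *N*_{cell} need more rays for a smooth estimate.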

Every **d**^{*}(*i*) produces its own impact on detector pixel coordinates (*x*, *y*) and is weighted by contribution *I*_{i}. All impacts together build the two-dimensional reflection profile that is used as the model profile in the *EVAL* least-squares fit to obtain the observed integrated intensity for each diffraction spot on an image. Both the observed intensity and the summed interference function (7) contain the still Lorentz factor, and by dividing one by the other we extract *F*^{2}. We also correct for the polarization and apply possible incidence corrections.

#### 2.4. Laue interference function

In this paper, we follow the James and Buerger approach, as explained in §2.1, for deriving diffracted intensities by crystals. The resulting interference function only depends on the deviation ∊ from the Bragg angle θ and the number of unit cells contained in the crystal. An alternative is to use the three Laue conditions, in which case the squared sinc function in (3) is replaced by

Here, Δ**k** = **k**_{1} − **k**_{0} and *N*_{1}, *N*_{2} and *N*_{3} are the numbers of unit cells in the three periodic axis directions. In Appendix *A*, we show that the two approaches are exactly the same.

Equation (8) is often referred to as the shape transform of the crystal (Kirian *et al.*, 2010; Spence *et al.*, 2011).
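For orientation, the textbook form of the Laue interference function can be sketched as follows. This is the standard product of three sin²(π*N*x)/sin²(π*x*) terms and may differ in notation from the expression used in the paper; the arguments are taken to be Δ**k**·**a**_{j} in units in which the Bragg condition corresponds to integers (|**k**| = 1/λ), and the function names are hypothetical.

```python
import math

def laue_1d(x, n):
    """sin^2(pi*n*x) / sin^2(pi*x) for n unit cells along one axis.

    x is the fractional Miller index along that axis; the value at integer x
    (Bragg condition) is n**2.
    """
    den = math.sin(math.pi * x) ** 2
    if den < 1e-24:
        return float(n * n)
    return math.sin(math.pi * n * x) ** 2 / den

def laue(hkl_frac, n_cells):
    """Product of the three one-dimensional Laue terms."""
    result = 1.0
    for x, n in zip(hkl_frac, n_cells):
        result *= laue_1d(x, n)
    return result

n_cells = (5, 6, 7)                              # hypothetical N1, N2, N3
at_bragg = laue((1.0, 2.0, 3.0), n_cells)        # exactly on a lattice point
at_zero = laue((1.2, 2.0, 3.0), n_cells)         # first zero along a*: 1 + 1/N1
between = laue((1.1, 2.0, 3.0), n_cells)         # halfway to the first zero
```

The maximum at a reciprocal-lattice point is (*N*_{1}*N*_{2}*N*_{3})², which is the quantity on which a purely geometric size correction is based.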

#### 2.5. Impact positions and refinement

Peak-position refinement in *PEAKREF* (Schreurs, 1999) minimizes the peak-position residuals and the deviation ∊_{0} of the central reciprocal-lattice vector either using peak maxima from the peak search or using optimized profile centroids from the *EVAL* profile fit. Inclusion of ∊_{0} in the unit-cell matrix refinement avoids divergence of unit-cell orientations through rotations perpendicular to the incident beam, as discussed by Sauter *et al.* (2014). Similarly, Kabsch (2014) uses the angular deviation τ divided by the mosaic spread σ_{M} in the target function for peak refinement. All three approaches use the β axis, defined for each reflection as the axis perpendicular to the incident and diffracted beams, to calculate the deviation from the Bragg angle θ (Schutt & Winkler, 1977). For each still image, the following target function was minimized to refine the unit-cell matrix,

where Δ*x*_{i} and Δ*y*_{i} are the differences of observed and calculated peak positions. *PEAKREF* can optimize many instrumental parameters such as detector-offset positions, primary beam direction and crystal position, which in the current analysis were fixed in the still data and based on the rotation data (see below).

We found that the peak-position residuals from the post-refinement were much smaller than from the peak maxima found on a single still image, despite the much larger number of peaks. This was caused by a shift in the observed θ value for the partial reflections with large ∊_{0}. For large mosaic crystals such as our lysozyme crystal measured with a divergent beam, these shifts in θ occur because only distinct directions of the primary rays or distinct points on the crystal are active dependent on the deviation ∊_{0} (Fig. 3). A negative value for ∊_{0} results in an apparent larger θ and a positive ∊_{0} in an apparent smaller θ. This θ-divergence effect has to be taken into account when the cell matrix is determined and refined from peak positions. We introduced a parameter `flex' in *PEAKREF* that is jointly refined and takes account of this shift. The `flex' parameter turned out to have a constant value for all still images and it appears to be a property typical for the crystal and the beam divergence of the particular experiment.

#### 2.6. Post-refinement

We implemented a post-refinement procedure in which both the peak positions from the *EVAL* integration and the partialities could be refined. For this purpose, we calculate the mean intensity of all equivalent reflections *h* as

Weights are obtained from the standard deviations from the *EVAL* profile fit (Schreurs *et al.*, 2010) and are given by *w*_{e} = 1/σ_{e}^{2}. The partialities *p*_{e} arise from the *EVAL* ray-tracing simulation, and the image scale factors *s*_{f} are determined in *ANY* (Schreurs, 2007), assuming a constant sum of Bragg intensities in each frame. The summation in (10) runs over all equivalents of reflection *h* (*N*^{h}) in the data set. In *PEAKREF* image scale factors *s*′_{f} and unit-cell parameters and crystal orientation angles are refined using the target function

We specifically include peak positions in this refinement. The *EVAL* ray-tracing procedure is not repeated to obtain partialities; instead, we use a fitted partiality *versus* ∊ curve with a single Gaussian. The parameters in the Gaussian were kept fixed in the refinement; ∊ changes with the unit-cell parameters, from which we recalculate the partiality (*p*′_{i}).
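The two ingredients of this post-refinement step, a single-Gaussian partiality curve and a weighted mean over equivalents, can be sketched as below. The functional form of the weighted mean is an assumption (each observation is divided by its scale factor and partiality and weighted by 1/σ²) and may differ from equation (10); the Gaussian amplitude, width and all names are hypothetical.

```python
import math

def gaussian_partiality(eps, amplitude, sigma):
    """Single-Gaussian model of partiality versus deviation eps (same units as sigma)."""
    return amplitude * math.exp(-0.5 * (eps / sigma) ** 2)

def weighted_mean_intensity(observations):
    """Weighted mean of equivalent reflections.

    observations: iterable of (I_e, sigma_e, p_e, s_f) tuples.
    Assumed form: sum(w_e * I_e / (s_f * p_e)) / sum(w_e), with w_e = 1 / sigma_e**2.
    """
    num = den = 0.0
    for intensity, sigma_e, p_e, s_f in observations:
        w_e = 1.0 / sigma_e ** 2
        num += w_e * intensity / (s_f * p_e)
        den += w_e
    return num / den

# Hypothetical curve parameters: amplitude 1.2, width 0.2 degrees.
p_centre = gaussian_partiality(0.0, 1.2, 0.2)
p_edge = gaussian_partiality(0.3, 1.2, 0.2)

# Two hypothetical observations of a reflection with full intensity 100.
obs = [(50.0, 5.0, 0.5, 1.0),     # half-recorded on an image with scale 1
       (160.0, 8.0, 0.8, 2.0)]    # 80% recorded on an image with scale 2
mean_intensity = weighted_mean_intensity(obs)
```

When the unit-cell parameters change during refinement, only ∊ has to be recomputed and passed through the fitted curve, which avoids repeating the ray tracing.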

### 3. Materials and methods

#### 3.1. Crystal preparation

Hen egg-white lysozyme (Sigma–Aldrich, Schnelldorf, Germany) was crystallized using the hanging-drop vapour-diffusion method with a protein concentration of 75 mg ml^{−1} in 0.1 *M* sodium acetate buffer pH 4.8. The precipitant consisted of 0.1 *M* sodium acetate buffer pH 4.8, 10–15%(*w*/*v*) sodium chloride, 30%(*v*/*v*) ethylene glycol (Sutton *et al.*, 2013). Drops of 4 µl were set up with a 1:1 protein:precipitant ratio.

#### 3.2. Data collection

A crystal of dimensions 250 × 250 × 150 µm was vitrified in a cold N_{2}-gas stream from an Oxford Instruments 700 series jet operated at 100 K. Data were collected on a Bruker–AXS X8 Proteum in-house source with Cu *K*α radiation. The rotating anode was operated at 45 kV and 60 mA. The reference rotation data set was collected by rotating over 190° in φ in 0.5° steps per frame. Data were recorded on a PLATINUM^{135} CCD detector with a crystal-to-detector distance of 52 mm. 380 still images were collected with identical angular settings as the starting angles for each of the rotation frames; thus, 380 still images were collected at 0.5° intervals. An additional 394 stills were recorded by random selections from ω scans 0–7° in ω apart at 15 different ω, κ and φ goniometer settings. The exposure time for all images was 5 s.

#### 3.3. Data processing and analysis

*VIEW* was used for image display and peak search (Schreurs, 1998). Both the rotation images and stills were indexed using *DirAx* (Duisenberg, 1992). Almost all stills could be indexed without manual intervention. Constraints were applied and the unit cells were made congruent (using the goniometer positions), ensuring a consistent choice of unit-cell axes. The unit-cell matrix and detector positions were refined from 649 peak positions in the rotation data. For the still peak positions we used different options. In the first approach we made use of our knowledge of the relative positions of the goniometer axes, so that a global single unit-cell matrix could be refined against 10 728 peak positions. In the second approach, we determined and refined a unit-cell matrix for each image from 300 peak positions, as would be the normal procedure in serial crystallography. The detector-offset positions were taken from the peak-position refinement of the rotation data. The unit-cell matrix was refined against the observed peak positions, using the `flex' parameter to account for apparent shifts in θ, simultaneously minimizing the off-Bragg angle ∊_{0} (9). Using the unit-cell matrix, we extracted three-dimensional and two-dimensional reflection boxes for rotation and still images, respectively, and processed these with *EVAL*. For every reflection, 10 000 rays were simulated and the impacts were collected in pixels contained in the box. In case of still data every individual ray is associated with a reciprocal-lattice vector **d**^{*}(*i*) with a small angular deviation ∊_{i} from the Bragg angle and is weighted by the interference function (7). The impact position on the detector is given by the direction of the shortest distance of the reciprocal-lattice vector to the Ewald sphere. The divergence effects are accounted for in the ray tracing and thus the profiles are generated correctly at deviating positions in θ (*i.e.* without the need for a `flex' parameter as used at the peak-refinement stage).

The parameters for crystal size, mosaic spread and beam divergence were optimized automatically in the reflection profile fitted to ∼50 reflections with *I*/σ(*I*) > 20 using a simplex method (see Schreurs *et al.*, 2010). For comparison reasons, identical values of parameters in the ray-tracing simulations were used for both types of data sets, although a similar optimization can be performed for still images. In addition, for the still images we used *N*_{cell} = 25 in the interference function (the number of unit cells in a crystallite as described in §2.3). Sampling of the interference function converges much faster with low values of *N*_{cell}, typical for nano-sized crystals. The current data imply a larger value of *N*_{cell}, which in the current implementation would require many more rays (up to 10^{6} instead of 10 000) to sample reflection profiles smoothly. The integrated intensities are obtained by a least-squares fit of the three-dimensional and two-dimensional model profiles to the observed pixel intensities for the rotation and still reflections, respectively. *EVAL* then delivers the profile-fitted, Lorentz- and polarization-corrected intensity values in an XML-type datafile that is further processed in *ANY* (Schreurs, 2007). In this program, we determine image scale factors, correct for the partiality factor and output the intensities and standard deviations to an hkl- or mtz-type file. Many of the graphical plots and statistical analyses are made using *ANY*.

All still images were also processed with the *CrystFEL* software suite v.0.5.1 (White *et al.*, 2012). Structural refinements were carried out with *REFMAC*5 (Murshudov *et al.*, 2011) and scaling between data sets with *SCALEIT* (Howell & Smith, 1992), both from the *CCP*4 suite (Winn *et al.*, 2011). *ANODE* (Thorn & Sheldrick, 2011) was used to calculate anomalous difference densities.

### 4. Results

We collected rotation and still diffraction data from one lysozyme crystal and formed three data sets for analysis: a 190° rotation data set collected in 380 images in ranges of 0.5° for reference, a consecutive still data set of 380 images collected in steps of 0.5°, and this consecutive data set combined with 15 wedges of separate arbitrary orientations totalling 774 images (Table 1). Indexing and peak refinement by *PEAKREF*/*EVAL* and *CrystFEL* yielded unit-cell dimensions that varied by ∼0.3–0.7% between the separate stills. Initially, the average residuals in peak positions for the still data were significantly larger than for the rotation data. Introduction of the `flex' parameter, which takes into account the apparent shift in θ owing to divergence effects (see §2.5), reduced the positional residuals of peak maxima on still images significantly: from 0.13 to 0.07–0.08 mm on average. This deviation is only slightly larger than that observed for the rotation data, which was 0.06 mm. The residual in rotation angle for the rotation data was 0.039° (for 0.5° scan width). For the still data the deviation ∊_{0} from the Bragg angle was 0.18°, which is consistent with a mosaic spread of 0.5°. Relaxing the unit cells by a separate matrix for each image lowered the ∊_{0} residuals to 0.14° (compare `single' *versus* `unit cell per image' in Table 1). In our setup the orientation of each matrix was known because the crystal orientation was set using a goniometer. The r.m.s. deviations between the set and refined orientations of the reciprocal-lattice vectors **a***, **b*** and **c*** were 0.03° for the consecutive still data and increased to ∼0.10° using all still data (consecutive and random orientations). Overall, the number of observations taken into account by *EVAL* was 106 × 10^{3} for the rotation data set, 325 × 10^{3} for the consecutive still data and 657 × 10^{3} for all still data, whereas *CrystFEL* took 733 × 10^{3} into account for all still data (Table 2).
All processed sets resulted in ∼8300 unique reflections. The multiplicity of the consecutive still data was roughly three times that of the rotation data, indicating that reflections were, on average, sliced through three times in our still data-collection experiment.

Table 1 footnote: ‡Average angular deviation of the central reciprocal-lattice vector **d**^{*} from the Ewald sphere (see text for explanation).

Table 2 footnotes: ‡Still data are not scaled by *SADABS* like the rotation data and no error model is determined for σ. In the merging step σ is determined from the internal standard deviation. §For *R*_{int} the summations run over all *N* unique reflections *h* and their equivalents.

The statistics for the integration and merging of data for the rotation and still data are shown in Table 2. Processing of the reference rotation data yielded an internal merging *R*_{int} of 3.8% with an 〈*I*/σ(*I*)〉 after merging of 47.7. Processing of the still diffraction data without correction, referred to as Monte Carlo averaging, produced *R*_{int} values exceeding 100% and 〈*I*/σ(*I*)〉 values that were about fourfold lower than that for the rotation data using the same number of images. Application of the still Lorentz correction (4) slightly increased the *R*_{int} (Table 2).
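As a reminder of what the merging statistic measures, a minimal sketch of *R*_{int} is given below. It assumes the commonly used unweighted definition, the summed absolute deviations from the mean of each unique reflection divided by the summed intensities; the exact expression used in *ANY* may differ, and the function name and data are hypothetical.

```python
from collections import defaultdict

def r_int(observations):
    """R_int = sum_h sum_e |I_e - <I_h>| / sum_h sum_e I_e.

    observations: iterable of (hkl, intensity) pairs, where hkl is the unique
    (symmetry-reduced) index. Reflections measured only once are skipped.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    num = den = 0.0
    for values in groups.values():
        if len(values) < 2:
            continue
        mean = sum(values) / len(values)
        num += sum(abs(v - mean) for v in values)
        den += sum(values)
    return num / den if den else float("nan")

# Hypothetical observations of two unique reflections.
obs = [((1, 0, 0), 90.0), ((1, 0, 0), 110.0),
       ((0, 2, 0), 40.0), ((0, 2, 0), 60.0),
       ((0, 0, 4), 75.0)]                      # single observation, ignored
value = r_int(obs)
```

Uncorrected partial intensities of one reflection scatter between zero and the full intensity, which is why values above 100% can occur for still data.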

To estimate partialities, we determined the parameters for mosaic spread, divergence of the incident beam, crystal size and *N*_{cell} by optimizing two-dimensional profile fits using figures of merit (Schreurs *et al.*, 2010) on a subset of reflections in *EVAL*. Mosaic spread was set to 0.5°, beam divergence to 8.6 mrad, crystal size to 130 × 130 × 130 µm (although we estimated a slightly larger size when selecting the crystal under the microscope) and *N*_{cell} to 25. The ray-tracing procedure yielded partialities which showed a Gaussian-like distribution with ∊_{0} (Fig. 4*a*). Notably, the computed still partialities are not normalized and can exceed a value of 1, and hence are used as relative scale factors. In rotation data the partiality is defined up to 1 for a fully observed reflection (Rossmann & Beek, 1999); in contrast, the partiality in still diffraction is determined by the angular width of the intersection with the Ewald sphere, which depends on various instrumental and crystal parameters such as those given by (3). Lorentz-corrected still and (Lorentz-corrected) rotation reflections on average give the same absolute intensities. Fig. 5 shows that some still partialities are larger than 1.0 and the still intensities scatter around the rotation intensity. Further, to illustrate that the partialities depend strongly on the precise ray-tracing model parameters, Fig. 4(*b*) shows the partialities as a function of ∊_{0} in the case of a long focus for the incident beam, which results in two Gaussian-like curves superimposed. This implies that a simple Gaussian model for the partiality is not always correct. When divided into ∊_{0} bins, the observed average intensities correlate well with the estimated partialities (Fig. 6). Application of the partiality model resulted in average *I*/〈*I*〉 values that varied around the ideal value of 1.0.
Subsequent merging of these data, *i.e.* with both Lorentz and partiality corrections applied, reduced the *R*_{int} values to 57 and 63% for the data sets with consecutive and all stills, respectively.

Next, the effects of Lorentz and partiality correction were evaluated by comparing the data with the reference rotation data set. The uncorrected and the Lorentz-corrected intensities have high internal *R*_{int} values of 104.9 and 106.5%, respectively, consistent with the scattering in Fig. 5. The Lorentz- and partiality-corrected intensities have an *R*_{int} of 63.8%. Upon merging the data to unique reflections the agreement with the rotation data improved dramatically; the scatter diagrams in Fig. 7(*a*) and 7(*b*) reflect the improvement corresponding to the uncorrected (Monte Carlo) and corrected (Lorentz and partiality) data. The effects from the still data corrections are more clearly demonstrated by the *R* factors with respect to the reference rotation data, which we refer to as *R*_{comp} (Table 3). *R*_{comp} (on intensities) was 26% using Monte Carlo averaging. Application of the Lorentz correction alone decreased the *R*_{comp} to 12%. Application of both Lorentz and partiality corrections yielded an *R*_{comp} of 5.3%.
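A comparison *R* factor of this kind can be sketched as follows. The definition is an assumption: the still intensities are brought onto the scale of the reference with one overall scale factor and the summed absolute differences are divided by the summed reference intensities; *SCALEIT* offers more elaborate scaling, and the function name and data are hypothetical.

```python
def r_comp(reference, test):
    """R factor on intensities between two merged data sets.

    reference, test: dicts mapping hkl -> merged intensity. Only reflections
    present in both sets are compared. One overall scale factor
    k = sum(I_ref) / sum(I_test) is applied to the test set.
    """
    common = [hkl for hkl in reference if hkl in test]
    sum_ref = sum(reference[hkl] for hkl in common)
    sum_test = sum(test[hkl] for hkl in common)
    k = sum_ref / sum_test
    diff = sum(abs(reference[hkl] - k * test[hkl]) for hkl in common)
    return diff / sum_ref

# Hypothetical merged intensities.
rotation = {(1, 0, 0): 100.0, (0, 2, 0): 200.0, (0, 0, 4): 50.0}
stills = {(1, 0, 0): 60.0, (0, 2, 0): 90.0}
value = r_comp(rotation, stills)
```

Because the comparison is made after merging, random errors in the individual partialities average out, whereas systematic errors such as a missing Lorentz correction remain visible.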

Table 3 footnote: ‡From *SCALEIT*: overall scale.

Although the Lorentz and partiality corrections significantly improved the quality of the merged data, the merging *R*_{int} value remained high (*i.e.* 63.8% for all still data). To improve the partialities, we performed post-refinement of the image scale factor, unit-cell parameters and orientations, minimizing the target function of (11). Post-refinement of the `all stills' data gave scale factors of 0.84–1.35 (additional to the scale factor *s*_{f} used in equation 10) and sharpened the distribution of unit-cell dimensions, with virtually no effect on the variation of crystal orientations (Table 1). These adjustments resulted in a significant, but modest, reduction of *R*_{int} from 63.8 to 55.7% (Table 2). The progress in the precision of processing the data is reflected by the distributions *I*(*hkl*)/〈*I*(*hkl*)〉 shown in Fig. 8. Ideally, *I*(*hkl*)/〈*I*(*hkl*)〉 values form a sharp distribution around 1 (as a reference, we depict the distribution resulting from the rotation data in Fig. 8*e*). Figs. 8(*b*) and 8(*c*) reflect the striking improvement obtained by modelling the partiality in *EVAL* and subsequent post-refinement. Fig. 8(*d*) shows that mainly the weak data do not profit from the post-refinement. Comparison of the merged data sets shows that the improvement in precision is matched by an improvement in accuracy. Post-refinement reduced the *R*_{comp} from 5.3 to 4.7% (Table 3).

To illustrate the data quality, we refined the lysozyme structure (PDB entry 193l; Vaney *et al.*, 1996) against the reflection data using *REFMAC*, and we observed similar *R*_{work} and *R*_{free} values for the differently processed data (Table 4). Significant differences between the methods were observed for the resulting average isotropic *B* factors. Monte Carlo averaging of the data in *CrystFEL* and *EVAL* yielded increased *B* factors (21–25 Å^{2}) compared with the reference defined by the rotation data set (〈*B*〉 = 13.8 Å^{2}). The Lorentz correction had a large effect on the *B* factors and produced an average *B* factor of 11.8 Å^{2}; this large effect on the *B* factors is explained by a comparable fall-off in θ of the Lorentz factor and the temperature factor. When the Lorentz and partiality corrections were both applied, the *B* factors became more similar to those obtained when using the rotation data (13.2 *versus* 13.8 Å^{2}). Anomalous differences are much more sensitive to the accuracy of the data than structure refinement. We generated anomalous difference densities based on the processed data sets using phases from the refined structure by *ANODE*. For the methionine sulfur positions the anomalous density from the rotation data gave a peak height of 13.3σ. The uncorrected, Monte Carlo averaged still data yielded a weak anomalous signal: a 4.2σ peak for methionine S, corresponding to 32% of the peak height using the rotation data. Lorentz correction improved the methionine S signal to 35%, whereas including partiality corrections resulted in 47% of the signal. Finally, this signal improved to 54% after post-refinement. This shows that both Lorentz and partiality correction improved the intensities deduced from the still data.
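
The link between a resolution-dependent correction and the refined *B* factor can be illustrated with a relative Wilson-type fit. Assuming that two intensity sets differ only by an overall scale *k* and a temperature-factor difference Δ*B*, so that *I*_{test}/*I*_{ref} = *k* exp(−2Δ*B* sin^{2}θ/λ^{2}), a straight-line fit of the log ratio against sin^{2}θ/λ^{2} recovers Δ*B*. This is a sketch for illustration only and not the analysis performed in this paper; `relative_b_factor` and its arguments are hypothetical names.

```python
import math


def relative_b_factor(s_squared, i_test, i_ref):
    """Least-squares fit of ln(i_test / i_ref) = ln(k) - 2 * dB * s^2.

    s_squared holds sin^2(theta)/lambda^2 per reflection (in A^-2).
    Returns (k, dB); a positive dB means the test data fall off faster
    with resolution than the reference data.
    """
    xs = list(s_squared)
    ys = [math.log(t / r) for t, r in zip(i_test, i_ref)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope / 2.0
```

Only reflections with positive intensities in both sets can enter such a fit, because the logarithm of the ratio is taken.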

B factors, input structure PDB entry 193l . ‡ ANODE with data merged in 422. Averaged densities over similar atom types (two for Met SD, eight for Cys SG, 14 for Cl^{−} and one for Na^{+}). §Selected by SHELXC. |

We tested the effect of data-set size by limiting the still data to 60 images (Table 5). For the reduced `consecutive still' data we used images 250–310. For the `random still' data 60 images from three different wedges were used. For these limited data sets (91.7 and 97.3% completeness, respectively), the *R*_{free} factors show that the structure quality deteriorated. Furthermore, the anomalous signal is largely lost. For both structure and anomalous density analyses the Lorentz and partiality-corrected data outperform the noncorrected Monte Carlo processed data.


### 5. Discussion and conclusions

We used our ray-tracing profile-prediction methods to model partialities of the observed reflections in still diffraction data and adapted the programs *PEAKREF* and *EVAL* to process still diffraction images. By taking experimental conditions into account, we compute 10 000 rays generated from focus, crystal grid points, wavelength spectrum and mosaic distributions, and calculate the interference-function weighted contribution to an observed reflection and hence derive its partiality. Our formalism implicitly models the Lorentz factor, mimicking the contribution of the Lorentz factor to the observed intensities. Our approach differs fundamentally from other still data-processing methods. Kabsch (2014) defined an analytical erf function for the partiality, which is the integral over a Gaussian mosaic function. It is equivalent to our integral in (3) for an infinitely sharp sinc function (implying that integration over this function is complete within a solid angle smaller than the pixel size of the detector), while ignoring broadening effects other than the mosaic spread. Kabsch explicitly corrects for the still Lorentz factor. White (2014) and Sauter (2015) use reciprocal-lattice point volumes for calculating the partiality. White (2014) accounts for spectral width and beam divergence by calculating the overlap of a reciprocal-lattice volume with a nest of Ewald spheres. Sauter (2015) and Uervirojnangkoorn *et al.* (2015) use a single Ewald sphere and calculate the intersection with a spherical reciprocal-lattice volume, the size of which is determined by beam divergence, mosaic spread and spectral dispersion. Both approaches account for the increase of the reciprocal diffracting volume with resolution, and in this way for the wider range of acceptable off-Bragg angles (d∊; see Appendix *A*). However, both approaches lack the reflectivity part of the Lorentz factor (dΩ; see Appendix *A*).
If the spectral width of the beam becomes large, an additional Lorentz factor needs to be accounted for, as used in the Laue method (Zachariasen, 1945). Uervirojnangkoorn *et al.* (2015) very recently presented their results on XFEL data. They showed that the *R*_{work} and *R*_{free} of refined structures improved and part of the anomalous signal was retrieved. Unfortunately, they do not provide merging *R*_{int} or a comparison to a rotation data set, *i.e.* *R*_{comp}, to evaluate the resulting quality of the data more directly. In our approach, the integration of (3) is achieved by simulation of the rays that contribute to an observed reflection spot. Because of the simulation, the derivation of analytical functions for the various effects is not needed and the Lorentz effect is implicitly taken into account. Moreover, the interference function can be taken into account in our approach.
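
The idea of summing interference-function weighted rays can be sketched in one dimension. The Python fragment below is a deliberately simplified illustration and not the *EVAL* implementation: it assumes Gaussian mosaic and divergence distributions, ignores the focus, crystal grid and wavelength spectrum, and normalizes the result to a reflection lying exactly in the Bragg condition. The weight sin^{2}(*sB*∊)/(*sB*∊)^{2} follows the form discussed in Appendix *A*; all function and parameter names are hypothetical.

```python
import math
import random


def interference_weight(eps, n_layers, b):
    """sinc^2-type weight sin^2(s*B*eps)/(s*B*eps)^2, equal to 1 at eps = 0.

    eps is the off-Bragg angle in radians, n_layers the number of layers s
    and b = 2*pi*d*cos(theta)/lambda.
    """
    x = n_layers * b * eps
    if abs(x) < 1e-9:
        return 1.0
    return (math.sin(x) / x) ** 2


def ray_traced_partiality(eps0, mosaicity, divergence, n_layers, b,
                          n_rays=10000, seed=0):
    """Relative weighted contribution of n_rays rays for a reflection whose
    nominal off-Bragg angle is eps0 (radians).

    Each ray receives a random angular deviation drawn from the assumed
    Gaussian mosaicity and divergence (standard deviations in radians).
    The same rays are evaluated at eps0 = 0 to normalize the result.
    """
    rng = random.Random(seed)
    on_bragg = 0.0
    observed = 0.0
    for _ in range(n_rays):
        delta = rng.gauss(0.0, mosaicity) + rng.gauss(0.0, divergence)
        on_bragg += interference_weight(delta, n_layers, b)
        observed += interference_weight(eps0 + delta, n_layers, b)
    return observed / on_bragg
```

Because the weights are obtained by simulation, a non-Gaussian or asymmetric distribution could be substituted for `rng.gauss` without deriving a new analytical expression, which is the design point made in the text.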

For the initial development of the method, we used an experimental setup that allowed a direct comparison to the conventional rotation method. Our analysis showed a dramatic improvement in data quality after partiality and Lorentz correction. Both data processing and structure refinement benefited: the *R*_{comp} factor between the intensities derived from rotation and still data decreased from 26% to a final value of 4.7% after Lorentz and partiality correction and post-refinement.

Concurrent with the improvement in the final data quality upon Lorentz and partiality correction in *EVAL*, the internal merging *R*_{int} decreased from 105 to 64%. When we were developing the method, we hoped that post-refinement of the parameters would improve the final unique intensity data as well as further reduce the internal merging *R*_{int} factor. Post-refinement improved the precision of the modelled unit-cell dimensions and scale factor per image, although the error in modelled crystal orientations remained ∼0.1°. These more precise parameters indeed improved the resulting intensities (*R*_{comp} decreased from 5.3 to 4.7%). The internal statistics improved as well (*R*_{int} decreased from 64 to 56%); however, the final *R*_{int} factor remained high. This high *R*_{int} could be owing to features that were not included in our ray-tracing model, such as possible asymmetry in the focus, (anisotropic) mosaic spread or crystal form, or absorption by the crystal. Notably, crystal absorption may have a significant effect on the presented data because a relatively large crystal was used in this experiment. Crystal absorption is likely to be negligible when data are collected from microcrystals or nanocrystals, as is the case in serial crystallography. Obviously, further development of our approach is needed to account for the experimental conditions of serial (femtosecond) crystallography using XFEL or synchrotron sources. Automated schemes will be needed to model, for example, the large number of single-crystal diffraction images and fluctuations in beam spectra. In general, comprehensive modelling of the relevant experimental conditions should improve both the internal merging statistics and the resulting intensities. Not modelling significant effects that are present in the data can only be overcome by collecting more data to allow the averaging out of these effects by the Monte Carlo approach. 
In a real-case scenario the rotation data will not be available to evaluate the data quality, and an *R*_{int} of ∼50% may possibly be a practical metric to judge the resulting data quality.

Overall, we have shown that ray tracing can produce reliable partialities that improve the resulting data quality originating from still diffraction images. Moreover, our method is versatile and allows the modelling of a wide variety of effects, including those that yield non-Gaussian, asymmetric effects on the diffraction spot. In particular, the approach can take the interference function into account, which will be critical for processing data obtained from nanocrystals. Thus, in this paper we have presented the theoretical framework and demonstrated the potential of the ray-tracing methodology for processing still diffraction data.

The rotation and still diffraction images are available at http://rawdata.chem.uu.nl/c003 .

### APPENDIX A

### Comparison with Laue interference function

The diffracted intensity reflected by a small crystal bathed in an incident monochromatic beam is proportional to the shape transform of the crystal. The reflected intensity received by the detector in a small cone of solid angle dΩ, while the reciprocal-lattice vector has a small deviation ∊ from the Bragg angle θ, is given by the Laue interference function (Laue, 1936; James, 1958) and is used in papers by Kirian *et al.* (2010) and White *et al.* (2012),

*N*_{1}, *N*_{2} and *N*_{3} are the number of unit cells in the three dimensions of the parallelepiped crystal. ξ is the scalar product Δ**k** · **a** and in near-Bragg condition it is equal to *h* + Δ*h*, and likewise for the other directions. As we are only interested in the diffracted intensity close to the Bragg condition, we introduce a local reciprocal axis system and replace ξ by the nonperiodic Δ*h*. The terms in the denominator of (12) can then be written as (πΔ*h*)^{2} because they concern only small numbers,

It is more convenient to choose the reciprocal axes system such that Δ*l* is along the reciprocal-lattice vector and Δ*h* and Δ*k* are parallel to the diffracting Bragg plane *hkl* (Authier, 2001). Such a transformation can be carried out because the normal to the Bragg plane is always a reciprocal-lattice vector. (Note that Δ*h*, Δ*k* and Δ*l* are dimensionless.) The Jacobians of the transformation of the integration variables are dΩ = *V*^{*}_{cell}(*d*_{hkl}/sinθ)λ^{2}d(Δ*h*)d(Δ*k*) and d∊ = λ/(2*d*_{hkl}cosθ)d(Δ*l*) (Authier, 2001), leading to

By integration over Δ*h* and Δ*k* the diffracted intensity for a given value of ∊ is obtained,

The equivalence of this expression to that of James (1958) and Buerger (1960), as we use in *EVAL*, is shown by the following. How does Δ*l* depend on a small deviation Δθ = ∊? We write Δ*l* = d(*l*)/dθ = d(*c*/*d*_{hkl})/dθ = d(*cd*^{*}_{hkl})/dθ = *c*2cosθ/λ. *cN*_{3} can be written as *ld*_{hkl}*N*_{3}, and further we use the property that the volume of the crystal *V*_{crystal} = *N*_{1}*N*_{2}*N*_{3}*V*_{cell} and *B* = 2π*d*_{hkl}cosθ/λ. Writing (14) in terms of d∊ gives

Using *N*_{1}*N*_{2}/*V*_{cell} = *V*_{crystal}/*V*^{2}_{cell}(1/*N*_{3}) (James, 1958, p. 43) and *d*_{hkl} = λ/2sinθ, we can write

(17) is exactly the equation used in *EVAL* (2), as the number of layers *s* = *N*_{3}*l*, and using *n* = *N*_{1}*N*_{2}*N*_{3}/*V*_{crystal} we can write *V*_{crystal}/*V*^{2}_{cell} = *n*^{2}*V*_{crystal}.
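
Two steps of this derivation can be checked numerically: replacing the denominator sin^{2}(πΔ*h*) by (πΔ*h*)^{2} close to the Bragg condition, and integrating the resulting one-dimensional term over Δ*h*, which yields a factor close to the number of unit cells *N* (consistent with the product *N*_{1}*N*_{2} that appears after integration over Δ*h* and Δ*k*). The Python sketch below uses a plain midpoint rule; the function names and the choice *N* = 50 in the usage note are illustrative assumptions.

```python
import math


def laue_term(n_cells, dh):
    """Exact one-dimensional interference term sin^2(pi*N*dh)/sin^2(pi*dh)."""
    denominator = math.sin(math.pi * dh) ** 2
    if denominator < 1e-30:
        return float(n_cells ** 2)  # limiting value at the Bragg position
    return math.sin(math.pi * n_cells * dh) ** 2 / denominator


def near_bragg_term(n_cells, dh):
    """Near-Bragg approximation sin^2(pi*N*dh)/(pi*dh)^2."""
    x = math.pi * dh
    if abs(x) < 1e-15:
        return float(n_cells ** 2)  # same limiting value at dh = 0
    return math.sin(n_cells * x) ** 2 / x ** 2


def integrate(func, lower, upper, steps):
    """Midpoint-rule integral of func between lower and upper."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width
```

For example, `integrate(lambda dh: laue_term(50, dh), -0.5, 0.5, 2000)` evaluates the exact term over one reciprocal-lattice period, and the same call with `near_bragg_term` shows how little is lost by the approximation; the small difference arises from the tails beyond |Δ*h*| = 0.5.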

### Acknowledgements

We gratefully acknowledge Thomas White and Henry Chapman (CFEL, Hamburg) for discussions. This work was supported by the Council for Chemical Sciences of the Netherlands Organization for Scientific Research (NWO-CW) grant No. 175.010.2007.013.

### References

Authier, A. (2001). *Dynamical Theory of X-ray Diffraction*. Oxford University Press.

Boutet, S. *et al.* (2012). *Science*, **337**, 362–364.

Buerger, M. J. (1960). *Crystal Structure Analysis*. New York: John Wiley & Sons.

Burmeister, W. P. (2000). *Acta Cryst.* D**56**, 328–341.

Chapman, H. N. *et al.* (2011). *Nature (London)*, **470**, 73–77.

Dejoie, C., Smeets, S., Baerlocher, C., Tamura, N., Pattison, P., Abela, R. & McCusker, L. B. (2015). *IUCrJ*, **2**, 361–370.

Demirci, H. *et al.* (2013). *Acta Cryst.* F**69**, 1066–1009.

Duisenberg, A. J. M. (1992). *J. Appl. Cryst.* **25**, 92–96.

Duisenberg, A. J. M., Kroon-Batenburg, L. M. J. & Schreurs, A. M. M. (2003). *J. Appl. Cryst.* **36**, 220–229.

Gati, C., Bourenkov, G., Klinge, M., Rehders, D., Stellato, F., Oberthür, D., Yefanov, O., Sommer, B. P., Mogk, S., Duszenko, M., Betzel, C., Schneider, T. R., Chapman, H. N. & Redecke, L. (2014). *IUCrJ*, **1**, 87–94.

Hattne, J. *et al.* (2014). *Nature Methods*, **11**, 545–548.

Howell, P. L. & Smith, G. D. (1992). *J. Appl. Cryst.* **25**, 81–86.

James, R. W. (1958). *The Optical Principles of the Diffraction of X-rays*. London: G. Bell & Sons.

Kabsch, W. (2014). *Acta Cryst.* D**70**, 2204–2216.

Kirian, R. A., Wang, X., Weierstall, U., Schmidt, K. E., Spence, J. C. H., Hunter, M., Fromme, P., White, T., Chapman, H. N. & Holton, J. (2010). *Opt. Express*, **18**, 5713–5723.

Laue, M. von (1936). *Ann. Phys.* **41**, 971–988.

Leslie, A. G. W. & Powell, H. R. (2007). *Evolving Methods for Macromolecular Crystallography*, edited by R. J. Read & J. L. Sussman, pp. 41–51. Dordrecht: Springer.

Milch, J. R. & Minor, T. C. (1974). *J. Appl. Cryst.* **7**, 502–505.

Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). *Acta Cryst.* D**67**, 355–367.

Qu, K., Zhou, L. & Dong, Y.-H. (2014). *Acta Cryst.* D**70**, 1202–1211.

Ravelli, R. B. G. & McSweeney, S. M. (2000). *Structure*, **8**, 315–328.

Redecke, L. *et al.* (2013). *Science*, **339**, 227–230.

Rossmann, M. G. & van Beek, C. G. (1999). *Acta Cryst.* D**55**, 1631–1640.

Sauter, N. K. (2015). *J. Synchrotron Rad.* **22**, 239–248.

Sauter, N. K., Grosse-Kunstleve, R. W. & Adams, P. D. (2004). *J. Appl. Cryst.* **37**, 399–409.

Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). *Acta Cryst.* D**70**, 3299–3309.

Sauter, N. K., Hattne, J., Grosse-Kunstleve, R. W. & Echols, N. (2013). *Acta Cryst.* D**69**, 1274–1282.

Schreurs, A. M. M. (1998). *VIEW.* Utrecht University, The Netherlands.

Schreurs, A. M. M. (1999). *PEAKREF.* Utrecht University, The Netherlands.

Schreurs, A. M. M. (2007). *ANY.* Utrecht University, The Netherlands.

Schreurs, A. M. M., Xian, X. & Kroon-Batenburg, L. M. J. (2010). *J. Appl. Cryst.* **43**, 70–82.

Schutt, N. K. & Winkler, F. K. (1977). *The Rotation Method in Crystallography*, edited by U. W. Arndt & A. J. Wonacott, pp. 173–186. Amsterdam: North Holland.

Spence, J. C. H., Kirian, R. A., Wang, X., Weierstall, U., Schmidt, K. E., White, T., Barty, A., Chapman, H. N., Marchesini, S. & Holton, J. (2011). *Opt. Express*, **19**, 2866–2873.

Sutton, K. A., Black, P. J., Mercer, K. R., Garman, E. F., Owen, R. L., Snell, E. H. & Bernhard, W. A. (2013). *Acta Cryst.* D**69**, 2381–2394.

Thorn, A. & Sheldrick, G. M. (2011). *J. Appl. Cryst.* **44**, 1285–1287.

Uervirojnangkoorn, M., Zeldin, O. B., Lyubimov, A. Y., Hattne, J., Brewster, A. S., Sauter, N. K., Brunger, A. T. & Weis, W. I. (2015). *eLife*, **4**, e05421.

Vaney, M. C., Maignan, S., Riès-Kautt, M. & Ducruix, A. (1996). *Acta Cryst.* D**52**, 505–517.

Weik, M., Ravelli, R. B. G., Kryger, G., McSweeney, S., Raves, M. L., Harel, M., Gros, P., Silman, I., Kroon, J. & Sussman, J. L. (2000). *Proc. Natl Acad. Sci. USA*, **97**, 623–628.

White, T. A. (2014). *Philos. Trans. R. Soc. B Biol. Sci.* **369**, 20130330.

White, T. A., Barty, A., Stellato, F., Holton, J. M., Kirian, R. A., Zatsepin, N. A. & Chapman, H. N. (2013). *Acta Cryst.* D**69**, 1231–1240.

White, T. A., Kirian, R. A., Martin, A. V., Aquila, A., Nass, K., Barty, A. & Chapman, H. N. (2012). *J. Appl. Cryst.* **45**, 335–341.

Winn, M. D. *et al.* (2011). *Acta Cryst.* D**67**, 235–242.

Zachariasen, W. H. (1945). *Theory of X-ray Diffraction in Crystals*. New York: Dover.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.