research papers
Processing of X-ray snapshots from crystals in random orientations
aMax-Planck-Institut für medizinische Forschung, Jahnstrasse 29, D-69120 Heidelberg, Germany
*Correspondence e-mail: kabsch@mpimf-heidelberg.mpg.de
A functional expression is introduced that relates scattered X-ray intensities from a still or a rotation snapshot to the corresponding structure-factor amplitudes. The new approach was implemented in the program nXDS for processing monochromatic diffraction images recorded by a multi-segment detector where each exposure could come from a different crystal. For images containing indexable spots, the intensities of the expected reflections and their variances are obtained by profile fitting after mapping the contributing pixel contents to the The varying intensity decline owing to the angular distance of the reflection from the surface of the is estimated using a Gaussian rocking curve. This decline is dubbed `Ewald offset correction', which is well defined even for still images. Together with an image-scaling factor and other corrections, an explicit expression is defined that predicts each recorded intensity from its corresponding structure-factor amplitude. All diffraction parameters, scaling and correction factors are improved by post-refinement. The ambiguous case of a lower than the lattice symmetry is resolved by a method reminiscent of the technique of `selective breeding'. It selects the indexing alternative for each image that yields, on average, the highest correlation with intensities from all other images. Processing a test set of rotation images by XDS and treating the same images by nXDS as snapshots of crystals in random orientations yields data of comparable quality, clearly indicating an anomalous signal from Se atoms.
Keywords: nXDS.
1. Introduction
The availability of free-electron lasers (FELs) as a source of ultrabright X-ray pulses of femtosecond duration has provided a new approach for collecting diffraction data that makes extremely small and radiation-sensitive samples accessible to structural studies. It is expected that this new X-ray source will strongly contribute to our knowledge of the large group of biological objects such as membrane proteins that are difficult to grow as macroscopic crystals.
Destruction of the irradiated object by a single pulse is much slower than the
so that data can be collected at room temperature before the sample deteriorates owing to radiation damage. As a consequence many diffraction snapshots are required, each from a different sample in a random orientation, to yield a complete data set. For crystalline samples diffraction data are partially recorded on still images, which presents a challenging task to processing that traditional crystallographic programs were not designed to handle.This has led to the development of new software suites: (i) CrystFEL (White et al., 2012, 2013), which uses MOSFLM (Leslie & Powell, 2007; Powell et al., 2013) or DirAx (Duisenberg, 1992) for reflection indexing and merges individual intensity measurements (Kirian et al., 2010, 2011) to estimate the integrated intensity of each unique reflection, and (ii) the cctbx.xfel (Sauter et al., 2013; Hattne et al., 2014) and cctbx.spotfinder (Zhang et al., 2006) components within the Computational Crystallography Toolbox (cctbx). These programs rely on the Monte Carlo method for estimating reflection intensities, thereby assuming that other factors such as fluctuations in the incident-beam intensity, wavelength and spectrum as well as the irradiated volume of the specimen `integrate out' upon averaging (White et al., 2012; Kirian et al., 2011). It has recently been demonstrated (Barends et al., 2014) that a de novo can be determined from X-ray free-electron laser data analyzed by CrystFEL. The data comprise about 60 000 indexed images out of 2.4 million snapshots from a lysozyme heavy-atom derivative that gives a strong anomalous signal from two Gd atoms per (Barends et al., 2014).
Another complication that is not encountered in traditional data collection by the rotation method is the ambiguity in the choice of unit-cell basis vectors in cases where the lattice symmetry is higher than the crystal symmetry. The problem arises because the reflections recorded by each snapshot are indexed independently and their partial intensities are too inaccurate for any decision-making based on correlations involving different exposures (White et al., 2012). To circumvent this problem, CrystFEL generates a perfectly twinned data set of higher symmetry by merging all snapshots, which renders the subsequent structure solution more difficult. Recently, new methods for breaking the indexing ambiguity have been described (Brehm & Diederichs, 2014) and the detour via artificially twinned data sets appears to no longer be necessary.
Some of the problems in processing FEL snapshots had already been encountered more than 35 years ago when trying to solve the first virus crystal structures. Often, on account of radiation damage, the crystal had to be replaced after a single exposure covering only a small rotation range. This resulted in many partially recorded reflections and their complete intensities had to be estimated. A solution to this problem, the `post-refinement' method, was developed (Schutt & Winkler, 1977; Rossmann et al., 1979; Harrison et al., 1985; Rossmann, 1985) to derive complete intensities from refined estimates for the fractions of observed intensity, the `partiality'. For rotation images the partiality of each reflection can always be calculated as a function of orientation, the unit-cell metric, the mosaic spread of the crystal and a model of the reflection profile. The idea of the `post-refinement' method is to improve these parameters after all images have been processed by using symmetry-related, fully recorded reflections for reference.
As an alternative to the processing of snapshots by Monte Carlo integration, the software package nXDS has been developed that resumes the old `post-refinement' idea, now enriched by a model for the recorded snapshot intensities as a function of their corresponding structure-factor amplitudes and also including the correct treatment of stills. The unique reflection intensities and diffraction parameters for all crystals are refined simultaneously to minimize the discrepancy between observed and expected spot centroids and recorded intensities. The indexing ambiguity is resolved by a method reminiscent of `selective breeding'. It selects the indexing alternative for each image that yields, on average, the highest correlation with symmetry-related reflection intensities from all other images. The implementation of nXDS is derived from the standard XDS package (Kabsch, 2010a,b) in its present version, which includes the handling of multi-segment detectors.
2. Description of the diffraction experiment
Any right-handed orthonormal laboratory coordinate system {l1, l2, l3} can be chosen with its origin at the intersection point of the beam and the crystal. This reference system serves to specify the following.
|
A right-handed orthonormal segment system {d′1, d′2, d′3} and an origin vector d′0 are specified for each segment with respect to the detector coordinate system. The X-ray-sensitive area consists of identical pixels of sizes QX, QY (mm) along d′1 and d′2, respectively.
Expressing the segment origin d′0 in terms of {d′1, d′2, d′3}, the location x′ of a segment pixel x, y with respect to the detector system is given by
Thus, the laboratory coordinates x of the same pixel x, y are
where
and
The segment plane is found at a distance |F| from the crystal. F has a negative sign if d3 points towards the crystal.
For accurate integration, the spot shape and extent are modelled as Gaussian distributions involving two parameters: the standard deviations of the reflecting range (mosaicity), σM, and the combined effects of beam divergence and mosaicity, σD.
3. Processing steps of nXDS
The program package nXDS was developed for automatic determination of scaled and fully corrected reflection intensities from monochromatic X-ray diffraction images, each from a different crystal in a random orientation. nXDS uses many ideas, routines and the overall structure of the rotation data-processing program XDS (Kabsch, 2010a,b) and includes numerous new routines for the efficient evaluation of a large number of snapshots. The algorithms developed for integration, scaling and post-refinement are described in the following.
The snapshots are processed by nXDS in seven steps, named XYCORR, INIT, COLSPOT, POWDER, IDXREF, INTEGRATE and CORRECT, which are called in succession by nXDS. Information is communicated between the steps by files, which allows the repetition of selected steps with a different set of input parameters without rerunning the whole program.
3.1. XYCORR
If necessary for the detector being used, this step computes lookup tables of spatial corrections for each detector pixel. In subsequent data-processing steps, when the true coordinates of a pixel with respect to the laboratory coordinate system are needed, the correction values for the X and Y coordinates are retrieved from the tables and added to the pixel's array coordinates in the data image. This step is essentially the same as that used by XDS.
3.2. INIT
Three lookup tables are determined here that are required by the subsequent processing steps for classifying pixels in the data images as untrusted, background or belonging to a diffraction spot. This step is virtually the same as that used by XDS.
3.3. COLSPOT
Strong diffraction spots occurring in the data images are located and their centroids are saved in a file. Compared with the corresponding step in XDS a simplification resulted for snapshots from the fact that there are no neighbouring images with contributions to the same spot.
3.4. POWDER
The origin of the detector system can often be found from a powder pattern generated from the spots located in the previous step. An incorrectly specified origin could lead to a misindexing of reflections, a risk that is particularly high for processing snapshots. As the use of multi-segment detectors can prevent direct recognition of the powder circles, the following method was devised.
A plane with the incident-beam vector S0 as its normal is constructed from a second vector t = (1, 1, 1). To assure that t is non-collinear with S0, one of its components is reset to 0, namely the component corresponding to the maximum absolute value of the components of S0. If this is not unique, the first occurrence of the maximum is taken. For example, t = (1, 0, 1) if the y coordinate is the first occurrence of the absolutely largest component of S0. A right-handed orthonormal system {t1, t2, t3} can then be defined by
Now, for a spot located at pixel coordinates x, y in the COLSPOT step a scattering vector S can be calculated,
which intersects the powder plane at unit distance at coordinates
In the ideal case the scattering vectors mark a set of concentric rings: the powder pattern. The centre of the rings should be at the tip of the incident beam, which is the image centre. Often the centre of the powder rings is instead found to be offset from its ideal position, which can be interpreted to result from an incorrect value for the origin of the detector coordinate system D0.
3.5. IDXREF
For each snapshot this step tries to find a
that explains the observed diffraction spots by assigning indices to them and refines all parameters. The step returns a list of the successfully processed images and the indexed spots as well as the corresponding diffraction parameters. These parameters are refined for accurate prediction and integration of all reflections that are expected to occur in each snapshot.Short real-space lattice vectors are determined by the method of Steller et al. (1997). Extraction of a determination and indexing of the observed spots is performed as described by Kabsch (1993). The computational methods were taken from XDS, with the exception of the procedure, which had to be adapted for handling stills as well, where the concept of a rotation axis has no meaning. This is achieved by using a different target function. Instead of a rotation angle about a fixed camera axis, the smallest angle is used by which a reciprocal-lattice point could reach the by some unrestricted rotation. A closed analytical expression for this angle is derived below and it is subsequently shown how it is used in the procedure.
3.5.1. Rotating a reciprocal-lattice point to the Ewald sphere
We would like to find the smallest rotation that moves a reciprocal-lattice point onto the S0 denote the incident-beam wavevector of length 1/λ (λ is the wavelength) pointing from the X-ray source towards the crystal. Diffraction occurs along the wavevector S when the crystal is rotated so that is changed into p* on the satisfying the Laue equations
Let denote any arbitrary reciprocal-lattice vector andThe distance vector depends on the rotation used and thus is not unique.
Extending earlier work (Kabsch, 1988b), a unique rotation can be found for a given S0, that yields the shortest distance vector , provided < 2|S0| and < .
The unique rotation can be obtained by minimizing the function
where the two constraints on the solution are enforced by the Lagrange multipliers μ1, μ2. At the minimum the gradient of f(Δ) must vanish and the matrix of second derivatives must be positive definite,
The Lagrange multipliers are adjusted so that the constraints are satisfied. At a vanishing gradient of f, the are satisfied if
which implies
Conservation of the length of the reciprocal-lattice point implies that
which leads to a quadratic equation for μ1,
The solution is
From
it can be concluded that the positive sign refers to the minimum of Δ2, while the negative sign corresponds to the maximum distance from the Using the abbreviations
the closest point p* on the reachable by a rotation of is
Obviously, the vectors Δ, S, p* all lie in the plane defined by . We conclude that reaches the at p* by the shortest path if it is rotated about an axis parallel to the plane normal and passing through the origin of The rotation axis is identical to the `β-axis' (Schutt & Winkler, 1977) defined for each reflection to be perpendicular to both the incident-beam and diffracted-beam wavevectors.
In the case of the rotation method the crystal is forced to rotate about a fixed axis m2 which has a component ζ on the `β-axis'. ζ is called the reflecting-range (Kabsch, 2010b) because the angle to rotate a reflection to the about m2 increases (by 1/ζ).
Finally, the angular deviation of the reciprocal-lattice point from the
can be defined as the vector3.5.2. Initial refinement
For each image, let j enumerate the n strong spots with observed centroids Xj′, Yj′, detector segment identifying number sj and reflection indices hj, kj, lj. To each j, we denote the reciprocal-lattice point , from which a diffracted beam wavevector Sj, an angular deviation from the τj and reflecting-range expansion factors ζj can be determined as described in the previous section. With the detector segment recording the strong spot at distance Fsj, orientation and origin X0sj, Y0sj, the residuals between the calculated and observed spot centroids are (segment specified in the laboratory system)
The goal of the
procedure is to find the detector parameters, incident-beam direction and unit-cell basis vectors that minimize the target functionThe positional part of the target function pushes the calculated spot positions towards the observed centroids, while the angular part minimizes the angular deviations from the
The variance for the angular part is expected to increase with the size of the rotation range.The residuals are expanded to first order in the parameter changes so that E becomes a quadratic function of these changes. Minimization then leads to a system of normal equations whose solution is used to update the parameters. During the cell metric obeys constraints imposed by the lattice symmetry by choosing an appropriate set of free parameters for representing the allowed changes of the basis vectors (details not shown).
The minimization procedure is repeated keeping the same set of weights wX, wY, wτ until convergence is reached. New weights are then determined so that
Refinement continues with the new weights until convergence is reached again. The whole wX, wY, wτ.
procedure is terminated upon convergence of the weights3.6. INTEGRATE
Starting with the refined diffraction parameters for the successfully indexed data images, this step determines the recorded reflection intensities by two-dimensional profile fitting and saves the results on file for subsequent processing by the CORRECT step. No correction factors are applied except for compensating missing parts in the reflection profile owing to overlap with bad pixels or closely neighbouring reflections.
Similar to the integration procedure as implemented in XDS (Kabsch, 2010b), analysis of each image consists of determination of the reflection spot size and the mosaicity and of the diffraction parameters using the strong spots. This is followed by determination of a two-dimensional reflection reference profile and integration by profile fitting for all expected reflections, including the weak reflections
3.6.1. Mapping image pixels to the Ewald sphere
As described earlier (Kabsch, 1988b, 2010b) and implemented in the program XDS (Kabsch, 2010a), it is useful to represent the intensity distribution in a reflection-specific coordinate system {e1, e2, e3} so that all reflections appear as if they had followed the shortest path through the and were recorded on the surface of the sphere. This eliminates reflection-specific differences in the intensity profile caused by the oblique incidence of the diffracted beam on a flat detector segment or by crystal rotation around a fixed axis for a rotation data image.
For a reciprocal-lattice point the nearest point S on the is obtained as described above, so that a reflection-specific coordinate system can be defined as
It has its origin at the terminus of the diffracted-beam wavevector S and therefore could move depending on the specific path of through the The unit vectors e1 and e2 are tangential to the while e3 is perpendicular to e1 and p* = S − S0.
Diffraction along wavevector S′ in the neighbourhood of S is recorded at pixel X′, Y′ in the detector plane. This pixel is at a distance D from the crystal. With the detector at a distance F, an orientation d1, d2 and an origin X0, Y0 of the detector plane, we have
The corresponding coordinates ∊1, ∊2, ∊3 in the reflection-specific system on the are then
In the case of a rotation image, a reciprocal-lattice point needs a larger angle ∊3/|e1·m2| to reach the because the movement is restricted by the fixed rotation axis m2. The reflecting-range ζ = e1·m2 corrects for this effect. It is closely related to the reciprocal Lorentz correction factor for rotation images,
The reflection-specific coordinate system defined above sets up a one-to-one correspondence between the points of a region R of the ∊1∊2 plane tangential to the and a region R′ of the X′Y′ plane of the detector segment. The Jacobian of the transformation is
From
one finds
Using
one finds
and the Jacobian reduces to the simple expression
Thus, if the region R′ of the X′Y′ plane of the detector covers a single pixel, the corresponding area R in the ∊1∊2 plane is the mean value of |J| times the pixel area,
3.6.2. Intensity recorded by a detector pixel
Because of crystal mosaicity and beam divergence, the intensity of a reflection is smeared around the diffraction maximum. Following earlier work (Kabsch, 1988b, 2010b), the fraction of the total reflection intensity found in the volume element d∊1d∊2d∊3 at ∊1, ∊2, ∊3 is modelled as the product of two functions:
The first function ω12(∊1, ∊2) describes the expected intensity profile of the reflections mapped to their specific ∊1, ∊2 plane tangential to the Initially, Gaussians are assumed with the parameter σD modelling the combined effects of beam divergence and mosaicity as
σD is estimated from the observed variance in intensity of scattered rays for the strong reflections and provides information on the spot width. In a second step the initial Gaussian form for ω12 is replaced by the superposition of the profiles of all strong reflections and is used for definition of the final integration region included in profile fitting of all reflections predicted to occur in the diffraction image.
The second function, the rocking curve ω3(∊3), models the dependency of the reflection intensity on the angular distance from the surface of the as a Gaussian with the mosaicity σM of the crystal as the standard deviation. An estimate of σM is found as described previously (Kabsch, 2010b). If τ = rad is the angular deviation of the reciprocal-lattice point from the the fraction of intensity recorded on the image, i.e. the partiality of the reflection, is
The integration extends over the rotation range Δφ of the spindle during exposure of the image multiplied by the reflecting-range ζ, which corrects for the increased path length of the reflection through the when rotated around a fixed axis, i.e. ∊ = ζ·Δφ/2. Using the dimensionless variables t = τ/21/2σM and z = ∊/21/2σM the partiality of the reflection can also be given in terms of the error function erf:
Together with the above expression for the Jacobian J, the expected fraction of reflection intensity recorded by pixel X′, Y′ in the detector plane is, according to our model,
Here, and denote mean values in the region R of the ∊1∊2 plane that maps to the region R′ of the pixel at X′, Y′ in the plane of the detector.
3.7. CORRECT
After resolving possible indexing ambiguities, the raw intensities of the (reindexed) reflections as obtained from the previous step are scaled and corrected by a modified post-refinement procedure and then saved in the final reflection output file for subsequent structure-solution software packages.
As described below, the new concept of the Ewald offset correction factor as a replacement for partiality links raw intensities to structure-factor amplitudes, rendering classical post-refinement applicable even to still snapshots.
3.7.1. Correction factors
A reflection intensity , as recorded on a snapshot and returned from the INTEGRATE step, is proportional to the `true' intensity I (the squared structure-factor amplitude),
The correction factor C consists of a factor that is not affected by possible indexing ambiguities and a factor T that accounts for different scale and resolution fall-off between the snapshots.
is a product of factors modelling the Ewald offset correction (Q), Lorentz factor (L), polarization (P), air absorption (A) and sensor-thickness correction (O). These corrections are functions only of the diffraction parameters of the snapshot that recorded the reflection.
Ewald offset correction. The recorded intensities of the reflections of the same image would be directly comparable if they simultaneously obeyed the i.e. τ = 0 for all reflections. The idea is to correct the intensity by the estimated decline owing to the angular distance of the reflection from the surface of the Using the dimensionless variables t = τ/21/2σM and z = ∊/21/2σM (∊ = ζ·Δφ/2) this leads to the definition of the Ewald offset correction
which is well defined even for still images (z = 0). Q(t, z) can be calculated for each reflection from the incident-beam wavevector So, the reciprocal basis , the mosaicity σM and the oscillation range Δφ.
Lorentz factor. Assuming an infinitely thin the can be satisfied for a reciprocal-lattice point by a variation in the wavelength or the direction of the incident beam relative to the crystal. For an ideal crystal the scattered intensity is sharply concentrated around the direction of the diffraction maximum with a solid angle much smaller than the finite aperture of a detector pixel. Consequently, only integrated intensities can be observed: these are related to their squared structure factors by the Lorentz correction. Explicit forms of the corrections are available (see, for example, Zachariasen, 1945) for all conceivable methods of recording sharp diffraction maxima.
|
Polarization. The intensity of scattering from the crystal is proportional to the polarization factor
where E0 denotes the electrical field vector of the incident beam and S the diffracted beam wavevector. If n denotes the polarization plane normal and p the probability of finding the field vector E0 in this plane,
For an unpolarized incident beam, the electrical field vector E0 is found with equal probability pointing along S0 × S or (S0 × S) × S0, so that
This leads to
The chosen parametrization allows description of the effect of polarization for most data-collection scenarios (Kahn et al., 1982). Values for the two parameters n and p are provided by the user (not refined).
Air absorption. The recorded intensity is reduced from its `true' value owing to air absorption of the diffracted beam by the factor
where μ denotes the fraction of intensity loss per millimetre and D is the distance (in millimetres) between the crystal and the position of the reflection spot on the detector segment. μ is a wavelength-dependent input constant (not refined).
Sensor-thickness correction. The recorded intensity is increased owing to the effect of the oblique incidence of the diffracted beam on a sensor of finite thickness. Let δ denote the thickness of the sensor and κ the fraction of intensity loss per millimetre. If the diffracted beam makes an angle ω with the segment normal, the probability of a photon penetrating the sensor undetected is exp(−κδ/cosω). Thus, the correction
accounts for the oblique incidence of the diffracted beam. Values for δ and κ are provided by the user and are not refined.
Scaling and temperature factor. Using the parameters g and B, the factor
puts the `true' intensity for reciprocal-lattice point on the same scale as the observed one in the data image. The values of g and B adjust differences in beam intensity and in crystal disorder and volume in a resolution-dependent way. This correction factor is obtained by comparison of symmetry-equivalent reflections from different images and therefore depends on a consistent indexing choice. For resolving possible indexing ambiguities, correlations between equivalent reflection pairs from different snapshots must be computed (Brehm & Diederichs, 2014). To manage a potentially large number of snapshots and reflections, an efficient solution was implemented in nXDS.
3.7.2. Efficient calculation of correlation factors
A procedure was developed for calculating correlation factors in which the number of operations is proportional to the total number of recorded reflections of the compared snapshots. The procedure consists of two parts: initialization of the data structures and calculation of the correlation factor between the unique intensity estimates from the compared snapshots.
Initialization.
Correlation.
|
3.7.3. Indexing alternatives
For a given a) is first defined and serves as a reference basis. As each image is indexed and integrated independently, it often happens that the reflection indices refer to different (reduced) cells. The problem is to find for each image the set of possible reindexing transformations that allow the reflections to be described in terms of a rotated version of the reference basis.
and unit-cell parameters, a reciprocal cell basis in some reference orientation (Kabsch, 1988This set of reindexing transformations is determined for each image using the following procedure. From the reciprocal basis vectors used for indexing the reflections of the image in the INTEGRATE step, a and its reciprocal cell are determined. The three reciprocal vectors thus obtained are considered as reflections that need to be indexed with respect to a rotated version of the reciprocal reference basis. Possible indices of the reduced-cell reciprocal vectors with respect to the reference basis are found by simply testing all possibilities involving indices absolutely smaller than 4. For each assignment of indices a residual error for the best superposition with the reference basis is determined (Kabsch, 1976, 1978) and is used as a measure of the quality of the indexing. Index assignments related by symmetry of the reference cell are omitted, so that a list of symmetry-independent interpretations remains. This list is sorted by increasing r.m.s. of the superposition with the reciprocal reference cell. Entries in the list with an r.m.s. larger than some multiple of that of the first item are omitted.
The list may be empty if no reasonable interpretation is found for the basis vectors of the image. In this case the image is omitted from further calculations.
3.7.4. Resolving the indexing ambiguity
Ideally, only one symmetry-independent solution remains, identified using the above procedure solely by geometrical considerations. However, for nXDS the indexing ambiguity can be resolved by using one of two methods. Both methods rely on correlation factors between intensities that have been corrected by for various effects as described above. Note that the scaling corrections T are not needed here.
and pseudo-merohedral crystals, where the lattice symmetry is higher than the symmetry of the more than one choice for the reindexing transformation exists. InComparison with a reference. For each snapshot all possible reindexing transformations are tested and the indexing choice yielding the largest correlation factor with the given reference data set is selected. Here, symmetry-equivalent reflection intensities from the snapshot as well as from the reference data are merged separately prior to calculation of the correlation factor. Moreover, an initial scaling factor is determined at little additional computational effort that puts the intensities from each snapshot on the level of the reference.
Selective breeding. If no external reference data set is available, a solution is found by a method that is reminiscent of the technique of selective breeding. The method initiates a cyclic procedure with some arbitrary indexing transformation from the list of possibilities assumed by each snapshot. For each cycle the following steps are carried out.
|
This procedure usually terminates within ten cycles. The moderate amount of storage needed is proportional to the number of snapshots and to the size of the hash table. In addition one array is needed that keeps the hash-table address of the unique indices for each reflection of the whole set of images. This speeds up the computation of correlation factors between all pairs of snapshots. Within each cycle these computations can be carried out very efficiently and in parallel by a team of processors.
3.7.5. Scaling
We assume that all reflections have been consistently indexed. The reflection intensities of each image are still on different scales owing to differences in the intensity of the incident beam or irradiated crystal volume. Determination of the scale factors by the method described here is based on intensity estimates for the unique reflections occurring on each image,
where ν enumerates the symmetry-equivalent reflections and and the recorded intensities on the same image and their estimated standard deviations, respectively. denotes the correction factors as described above. A scaling correction factor for each image is determined by least-squares minimization using common unique reflections with a positive intensity that occur in more than one image.
Let h enumerate the nh different unique indices of the reflections involved in scaling and l enumerate the nl images from which the reflections come. Let j enumerate the n unique reflections included. To each j, we denote intensity , variance , unique reflection index hj and image lj. The goal of the scaling procedure is to find factors gl > 0 and mean intensities Ih > 0 by minimizing the target function (Hamilton et al., 1965)
To guarantee success of the solution method for a large set of snapshots and possibly large variations in their scaling factors, the target function is modified by using logarithms,
defining , , Gl = lngl and Jh = lnIh. This target function is quadratic, with a constant matrix of second derivatives and positive diagonal elements:
Therefore, unique directional minimizers can be defined by equating the gradients to zero:
The target function is monotonically reduced by a cyclic procedure of alternating minimizations along the J and G directions. The procedure is initiated at Gl = 0, Jh(G = 0). In each following cycle new scaling factors are found as
with the step size c chosen to maximize the reduction of the target function. Expansion of the quadratic target function at the point yields
and the resulting reduction in the target function is
Moving from G to G′ changes the mean logarithmic intensities by
Expansion of Ψ at point G′, J(G′) yields
Using the abbreviations
the reduction in the target function at completion of one cycle is
A finite value for the reduction requires a > b, which leads to an optimal step size c = a/(a − b), so that the target function is reduced by Ψ[G, J(G)] − Ψ[G′, J(G′)] = ac.
For the gradient, we have
Since the target function is non-negative, the procedure generates a bounded monotonically decreasing sequence converging to the minimum of the target function at vanishing gradient. As shown above, this also implies convergence of the sequence of scaling factors.
Obviously, the solution thus obtained is not unique because the target function does not change its value if one adds an arbitrary constant to the logarithmic scaling factor Gl and subtracts an appropriate constant vector from Jh. This amounts to just changing the common scale of all images in the original problem, which is of no importance as we are only interested in the relative scale.
It may happen that several sets of images exist that are not connected by common measurements. In this case the target function could be thought of consisting of a sum of the same type of target functions, one for each unconnected subset of images. Now there is an arbitrary common factor for each subset. Apparently, the presence of arbitrary common factors for each subset of images does not prevent convergence of the cyclic solution procedure described here.
3.7.6. Post-refinement
As mentioned above, the diffraction parameters of each image are refined in the IDXREF and INTEGRATE steps to minimize deviations between the observed and the predicted locations of the strong spots and to minimize their angular distance from the The angular part of the target function takes care of the fact that reflections can be visible only if they are close to the In addition, the distribution of τ angles thus obtained provides an initial guess for the crystal mosaicity σM. However, the angular part of the initial target cannot account for the fact that very strong reflections can still be observed even if they are farther away from the than the weaker reflections. This leads to a systematic bias in the initial parameter so that strong reflections will be predicted to be closer to the than they really are.
These deficiencies can be overcome when all images have been processed and intensity estimates for the recorded reflections are included in the ; Rossmann et al., 1979; Harrison et al., 1985; Rossmann, 1985) is extended here to handle still snapshots as well when fully recorded, measured reflections are not available for comparison and the notion of `partiality' loses its meaning. Its role is assumed by the Ewald offset correction Q defined above that is applicable for rotation images as well as stills. The `post-refinement' variant implemented here considers the possible unique reflection intensities as free parameters that are to be refined along with the diffraction parameters of each snapshot (Bolotovsky et al., 1998).
which explains why this approach has been dubbed `post-refinement'. The original idea (Schutt & Winkler, 1977The goal of the
procedure is the minimization of the target functionHere again, j enumerates the n recorded reflections from all snapshots, ΔXj, ΔXj the residuals between the calculated and observed spot centroids (see §3.5.2) and the recorded raw intensity and its standard deviation. If a spot j is strong enough so that a centroid could be determined then wj = 1, otherwise wj = 0.
Each spot j is associated with a reciprocal-lattice point close to the For each observation a correction factor Cj can be computed from the diffraction parameters that relates the recorded intensity to a unique reference intensity Ihj, where hj denote the unique indices of the reciprocal-lattice points .
The target function E depends on private parameters for each snapshot and global parameters and constants.
|
Each Ih, keeping the current parameter values constant. Minimization of the target function yields
round starts with the determination of the unique reflection intensitiesThe weights are then calculated as
and are kept throughout the wX, wY, wI.
round. The whole procedure is terminated upon convergence of the weightsDuring a E becomes a quadratic function of these changes. Minimization then leads to a system of normal equations whose solution is used for updating the parameters. The gradients of ΔXj, ΔYj and Cj are computed from analytic expressions (not shown).
round the diffraction parameters for each snapshot are corrected iteratively to minimize the target function until convergence is reached. The residuals are expanded to first order in the parameter changes so thatFortunately, these calculations are independent for each snapshot and can be performed in parallel by a team of processors. Moreover, the memory requirements are almost negligible even when the
of detector parameters is included.4. Example of data processing with nXDS
As an example to demonstrate the quality of data processed by nXDS in comparison to conventional data reduction by XDS, 20 000 consecutive rotation images were collected at 100 K from a crystal of a selenomethionine-labelled double mutant of the RNA-processing factor SCAF8 (Becker et al., 2008). Each image covers a rotation range Δφ of 0.02° and is treated by nXDS as a snapshot taken from a randomly oriented crystal. The images were collected on beamline X10SA at the Swiss Light Source, Villigen, Switzerland at a wavelength of 0.9779 Å, slightly above the Se K edge. The images were recorded by a PILATUS 6M pixel detector (Dectris AG, Baden, Switzerland) located at 300 mm distance. The crystal has P43 space-group symmetry, which is lower than the 422 lattice symmetry, implying a twofold indexing ambiguity.
The processing results are summarized in Table 1. The upper part refers to the evaluation of the images by XDS as conventional rotation data. The lower part shows the corresponding quantities as obtained from nXDS. Here, reflections were only included if their Ewald offset correction was larger than 0.7. A total of 356 854 reflections in the resolution range 15–2 Å were integrated by XDS, so that for each unique reflection almost eight symmetry-related reflections are available. Because contributions to each reflection are also recorded by adjacent images, it is not surprising that nXDS found almost ten times more reflections, which is consistent with the correspondingly higher multiplicity of observations.
|
After merging symmetry-related reflection intensities one might expect that the mean signal-to-noise ratio 〈I/σ(I)〉 would come out about the same regardless of whether the images were processed by XDS or nXDS. This is not the case: the mean signal-to-noise ratio is higher by a factor of almost three for the results from XDS. The lower accuracy of nXDS presumably results from two-dimensional instead of three-dimensional profile fitting and the lack of other corrections not carried out yet by this version of nXDS. The nearly perfect correlation factors CC1/2 (Karplus & Diederichs, 2012) between intensities of symmetry-related reflections obtained from processing by both programs reflect the excellent quality of the data images. The presence of anomalous scatterers is clearly indicated in both processing results by the highly significant value for the anomalous correlation CCano. Finally, the reflection intensities obtained from both programs are in excellent agreement, showing a of 98%.
Data processing was carried out by a 12-core machine with 16 GB memory running under Linux. Making use of the hyperthreading capability, up to 24 threads were employed for processing the images stored on a local disk. Elapsed wall-clock times for each step are listed in Table 2. COLSPOT uses a very fast spot-finding procedure but spends most of its time waiting for the next image to arrive. On average only three out of 24 threads were active. COLSPOT is a time-consuming step in nXDS because each image had to be analyzed for diffraction spots since the knowledge that the images comprise a rotation data set was not used. In contrast, XDS only requires spots from a small fraction of the images for recognizing the and for accurate of the diffraction parameters. For the same reason, the IDXREF step in nXDS takes much longer than in XDS. In the INTEGRATE step nXDS is somewhat faster than XDS because of the reduced overhead in control when only single images are involved. Compared with XDS, the CORRECT step of nXDS takes longer because of the much larger number of reflections. Furthermore, additional computations are required by the selective breeding procedure when no reference data set is available. According to Table 2, the breeding procedure required about 14 min to resolve the twofold indexing ambiguities for 20 000 snapshots and a total of 3.4 million reflections.
|
Details of the procedure are shown in Table 3 as the number of misfitting snapshots. In the first two generations both indexing alternatives are nearly randomly distributed. A small fluctuation towards one choice builds up in the third generation and quickly dominates the population. After five generations a homogeneous population is obtained.
|
Phasing of single anomalous diffraction data was performed for the XDS and nXDS processed data sets using the SHELXC/D/E program suite (Sheldrick, 2010). The results are summarized in Table 4. In brief, the marker-atom structure factors were estimated from pre-merged data using SHELXC. Subsequently, the selenium was determined in a search of 100 trials for ten putative sites while applying a high-resolution cutoff at 2.5 Å. For both data sets, SHELXD identified 14 positions, of which seven showed significantly higher occupancy when compared with the less significant positions. This is in good agreement with the expected eight selenium positions (Becker et al., 2008). Phases were further improved by density modification using SHELXE; eight sites were refined to significant occupancy and the phases obtained resulted in excellent electron-density maps with high pseudo-free correlation coefficients CCfree (Table 4) and the correct enantiomorphic setting. Peak heights at the eight heavy-atom sites are highly significant and are well above the largest noise peak. Although phasing for both data sets was unambiguous, the data set processed with nXDS showed slightly lower correlation coefficients, Patterson figures of merit (PATFOM) and heavy-atom peak heights.
|
5. Conclusion
This study describes a new approach for processing a large set of snapshots from randomly oriented crystals that does not rely on the Monte Carlo method of integration. Instead, the concept of the Ewald offset correction factor was devised to overcome difficulties arising from the use of partiality for modelling reflection intensities recorded by snapshots. The new approach has been implemented in the program nXDS that has borrowed many ideas and routines from the rotation data-processing package XDS as well as from the powerful post-refinement technique that has been in widespread use for several decades.
The implemented Ewald offset correction relies on a Gaussian model for the rocking curve, assuming a sufficiently large crystal whose shape transform can be ignored. In this case the exact functional form of the curve is not critical for reflections sufficiently close to the
In the test data set, a reflection was only included if its Ewald correction factor was larger than 0.7.As shown for the test case with fine-sliced rotation images of excellent quality, nXDS delivers results almost approaching those obtained by XDS and is able to retrieve the anomalous signal from a selenomethionine-labelled protein crystal. The source of the lower accuracy of nXDS is not yet clear. It could result from two-dimensional instead of three-dimensional profile fitting and the omission of information from weak contributions to reflections further away from the that are used only by XDS. In fact, a small improvement in overall data quality by 0.9% in 〈I/σ(I)〉 and 2.9% in CCano was observed upon the inclusion of weaker contributions when the minimum required Ewald offset correction was lowered from 0.8 to 0.7.
So far, no `real' FEL data have been processed by nXDS. These data typically vary for each snapshot in wavelength, bandwidth and crystal parameters. Although nXDS allows some of these to change for each snapshot, program modifications are likely to become necessary when the incident-beam bandwidth can no longer be substituted by a mean wavelength or if the shape transform of the crystals cannot be ignored. Presently, nXDS can only accept images that XDS can read. Work is in progress to adapt the package to also handle the detectors used at FEL beamlines and to make nXDS and its documentation available from the internet.
Acknowledgements
I thank Anton Meinhart for providing the crystal and analysis of the anomalous phasing power of the XDS and nXDS processed data, Ilme Schlichting for measuring the test data sets, discussions and support of this work, and Bruce Doak for reading the manuscript. I am grateful to Kay Diederichs and Michael Junk for inspiring communications about resolving the indexing ambiguity and scaling.
References
Barends, T. R. M., Foucar, L., Botha, S., Doak, B., Shoeman, R. L., Nass, K., Koglin, J. E., Williams, G. J., Boutet, S., Messerschmidt, M. & Schlichting, I. (2014). Nature (London), 505, 244–247. Web of Science CrossRef CAS PubMed Google Scholar
Becker, R., Loll, B. & Meinhart, A. (2008). J. Biol. Chem. 283, 22659–22669. Web of Science CrossRef PubMed CAS Google Scholar
Bolotovsky, R., Steller, I. & Rossmann, M. G. (1998). J. Appl. Cryst. 31, 708–717. Web of Science CrossRef CAS IUCr Journals Google Scholar
Brehm, W. & Diederichs, K. (2014). Acta Cryst. D70, 101–109. Web of Science CrossRef CAS IUCr Journals Google Scholar
Duisenberg, A. J. M. (1992). J. Appl. Cryst. 25, 92–96. CrossRef CAS Web of Science IUCr Journals Google Scholar
Hamilton, W. C., Rollett, J. S. & Sparks, R. A. (1965). Acta Cryst. 18, 129–130. CrossRef IUCr Journals Web of Science Google Scholar
Harrison, S. C., Winkler, F. K., Schutt, C. E. & Durbin, R. M. (1985). Methods Enzymol. 114, 211–237. CrossRef CAS PubMed Google Scholar
Hattne, J. et al. (2014). Nature Methods, 11, 545–548 Web of Science CrossRef CAS PubMed Google Scholar
Kabsch, W. (1976). Acta Cryst. A32, 922–923. CrossRef IUCr Journals Web of Science Google Scholar
Kabsch, W. (1978). Acta Cryst. A34, 827–828. CrossRef IUCr Journals Web of Science Google Scholar
Kabsch, W. (1988a). J. Appl. Cryst. 21, 67–72. CrossRef CAS Web of Science IUCr Journals Google Scholar
Kabsch, W. (1988b). J. Appl. Cryst. 21, 916–924. CrossRef CAS Web of Science IUCr Journals Google Scholar
Kabsch, W. (1993). J. Appl. Cryst. 26, 795–800. CrossRef CAS Web of Science IUCr Journals Google Scholar
Kabsch, W. (2010a). Acta Cryst. D66, 125–132. Web of Science CrossRef CAS IUCr Journals Google Scholar
Kabsch, W. (2010b). Acta Cryst. D66, 133–144. Web of Science CrossRef CAS IUCr Journals Google Scholar
Kahn, R., Fourme, R., Gadet, A., Janin, J., Dumas, C. & André, D. (1982). J. Appl. Cryst. 15, 330–337. CrossRef CAS Web of Science IUCr Journals Google Scholar
Karplus, A. & Diederichs, K. (2012). Science 336, 1030–1033. Web of Science CrossRef CAS PubMed Google Scholar
Kirian, R. A., Wang, X., Weierstall, U., Schmidt, K. E., Spence, J., Hunter, M., Fromme, P., White, T., Chapman, H. N. & Holton, J. (2010). Opt. Express, 18, 5713–5723. Web of Science CrossRef PubMed Google Scholar
Kirian, R. A., White, T. A., Holton, J. M., Chapman, H. N., Fromme, P., Barty, A., Lomb, L., Aquila, A., Maia, F. R. N. C., Martin, A. V., Fromme, R., Wang, X., Hunter, M. S., Schmidt, K. E. & Spence, J. C. H. (2011). Acta Cryst. A67, 131–140. Web of Science CrossRef CAS IUCr Journals Google Scholar
Leslie, A. G. W. & Powell, H. R. (2007). Evolving Methods for Macromolecular Crystallography, edited by R. J. Read & J. L. Sussman, pp. 41–51. Dordrecht: Springer. Google Scholar
Powell, H. R., Johnson, O. & Leslie, A. G. W. (2013). Acta Cryst. D69, 1195–1203. Web of Science CrossRef CAS IUCr Journals Google Scholar
Rossmann, M. G. (1985). Methods Enzymol. 114, 237–280. CrossRef CAS PubMed Google Scholar
Rossmann, M. G., Leslie, A. G. W., Abdel-Meguid, S. S. & Tsukihara, T. (1979). J. Appl. Cryst. 12, 570–581. CrossRef CAS IUCr Journals Web of Science Google Scholar
Sauter, N. K., Hattne, J., Grosse-Kunstleve, R. W. & Echols, N. (2013). Acta Cryst. D69, 1274–1282. Web of Science CrossRef CAS IUCr Journals Google Scholar
Schutt, C. & Winkler, F. K. (1977). The Rotation Method in Crystallography, edited by U. W. Arndt & A. J. Wonacott, pp. 173–186. Amsterdam: North-Holland. Google Scholar
Sheldrick, G. M. (2010). Acta Cryst. D66, 479–485. Web of Science CrossRef CAS IUCr Journals Google Scholar
Steller, I., Bolotovsky, R. & Rossmann, M. G. (1997). J. Appl. Cryst. 30, 1036–1040. Web of Science CrossRef CAS IUCr Journals Google Scholar
White, T. A., Barty, A., Stellato, F., Holton, J. M., Kirian, R. A., Zatsepin, N. A. & Chapman, H. N. (2013). Acta Cryst. D69, 1231–1240. Web of Science CrossRef CAS IUCr Journals Google Scholar
White, T. A., Kirian, R. A., Martin, A. V., Aquila, A., Nass, K., Barty, A. & Chapman, H. N. (2012). J. Appl. Cryst. 45, 335–341. Web of Science CrossRef CAS IUCr Journals Google Scholar
Wirth, N. (1976). Algorithms + Data Structures = Programs, pp. 264–274. New York: Prentice Hall. Google Scholar
Zachariasen, W. H. (1945). Theory of X-Ray Diffraction in Crystals. New York: Dover. Google Scholar
Zhang, Z., Sauter, N. K., van den Bedem, H., Snell, G. & Deacon, A. M. (2006). J. Appl. Cryst. 39, 112–119. Web of Science CrossRef CAS IUCr Journals Google Scholar
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.