research papers

BIOLOGICAL
CRYSTALLOGRAPHY
ISSN: 1399-0047

Data processing

Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907-1392, USA
*Correspondence e-mail: mgr@indiana.bio.purdue.edu

(Received 29 January 1999; accepted 22 June 1999)

X-ray diffraction data processing proceeds through indexing, pre-refinement of camera parameters and crystal orientation, intensity integration, post-refinement and scaling. The DENZO program has set new standards for autoindexing, but no publication has appeared which describes the algorithm. In the development of the new Data Processing Suite (DPS), one of the first aims has been the development of an autoindexing procedure at least as powerful as that used by DENZO. The resultant algorithm will be described. Another major problem which has arisen in recent years is scaling and post-refinement of data from different images when there are few, if any, full reflections. This occurs when the mosaic spread approaches or exceeds the angle of oscillation, as is usually the case for frozen crystals. A procedure which is able to obtain satisfactory results for such a situation will be described.

1. Introduction

Intensity data estimation has been an integral part of structural biology since Bragg used an ionization chamber technique to determine the energy of diffracted reflections from simple salts. Two alternative types of detector have been used: point detectors, which measure the energy of a single reflection, and area detectors [e.g. films, imaging plates, wire detectors, charge-coupled devices (CCDs)], which collect numerous reflections on the same two-dimensional device. The latter gave rise to the early rotation and oscillation photography and, subsequently, to the analogue Weissenberg and precession cameras. However, these cameras required the screening out of most of the diffracted rays in order to concentrate on the recording of only a single reciprocal-lattice plane. Xuong et al. (1968[Xuong, N., Kraut, J., Seely, O., Freer, S. T. & Wright, C. S. (1968). Acta Cryst. B24, 289-290.]) and Arndt et al. (1973[Arndt, U. W., Champness, J. N., Phizackerley, R. P. & Wonacott, A. J. (1973). J. Appl. Cryst. 6, 457-463.]) pointed out that mostly non-overlapping reflections can be selected by removing the screen but reducing the oscillation or precession angles. Fortunately, two-dimensional film-scanning devices became available at about that time, which allowed for both accurate positional determination as well as intensity determination of reflections.

Subsequent to the publication of The Rotation Method in Crystallography (Arndt & Wonacott, 1977[Arndt, U. W. & Wonacott, A. J. (1977). The Rotation Method in Crystallography. Amsterdam: North-Holland.]), the oscillation technique became the method of choice for intensity estimation of diffraction patterns from crystals of biological macromolecules. During the first decade or so of oscillation photography, it was the practice to carefully `set' a crystal with its axes oriented in known directions relative to the camera axes. The `American method' (shoot first, think later) was introduced by Rossmann & Erickson (1983[Rossmann, M. G. & Erickson, J. W. (1983). J. Appl. Cryst. 16, 629-636.]) to avoid radiation damage during the tedious crystal-setting operation and to enhance the rate of data collection while using precious synchrotron time. However, the American method required that a good indexing system was available for determining the crystal setting.

Various methods of determining X-ray intensities were described in the Arndt and Wonacott book (Arndt & Wonacott, 1977[Arndt, U. W. & Wonacott, A. J. (1977). The Rotation Method in Crystallography. Amsterdam: North-Holland.]). We developed the Purdue system (Rossmann, 1979[Rossmann, M. G. (1979). J. Appl. Cryst. 12, 225-238.]; Rossmann et al., 1979[Rossmann, M. G., Leslie, A. G. W., Abdel-Meguid, S. S. & Tsukihara, T. (1979). J. Appl. Cryst. 12, 570-581.]) on which the popular DENZO or HKL system was originally based (Otwinowski & Minor, 1997[Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307-326.]). As the precise algorithms used by HKL are not available, we initiated a project to update our old (1979) procedures. We are developing the Data Processing Suite (DPS) available at, for instance, the Cornell High Energy Synchrotron Source (CHESS), as well as other synchrotron beam-lines (see also http://bilbo.bio.purdue.edu/~viruswww/Rossmann_home/rstest.html ). This has been performed in collaboration with MacCHESS (Steve Ealick, Dan Thiel, Marian Szebenyi) and Chris Nielson of Area Detector Systems Corp.

Modern data processing can be divided into a series of steps.

  • (i) Autoindexing. This requires a peak-picking procedure (cf. Kim, 1989[Kim, S. (1989). J. Appl. Cryst. 22, 53-60.]), followed by an analysis of the position of the peaks to determine unit-cell dimensions, Bravais lattice and crystal orientation.

  • (ii) Pre-refinement of the camera parameters (crystal-to-detector distance, scanning direction relative to oscillation direction, detector tilt away from being normal to the X-ray beam), crystal orientation and effective mosaic spread (actual mosaic spread convoluted with beam divergence).

  • (iii) Intensity integration by profile fitting, assuming reflection position as calculated from the pre-refined camera and crystal parameters. [Error estimates can be made for each reflection; overlap and overloaded (non-linear response of detector) corrections can be applied; partiality of reflections can be computed.]

  • (iv) Lorentz and polarization corrections, followed by reduction to a unique asymmetric unit in reciprocal space (this is a Laue-group-dependent step). The reflections then need to be sorted on the basis of their indices reduced to a selected asymmetric unit in reciprocal space. This permits ready comparison of symmetry-related reflections which will be adjacent in the reflection list.

  • (v) Scaling of images onto a common scale.

  • (vi) Display of input and output on a graphical user interface.

Although unpublished, DENZO has an exceptionally good autoindexing procedure. It was clear that DPS would require an algorithm (cf. Steller et al., 1997[Steller, I., Bolotovsky, R. & Rossmann, M. G. (1997). J. Appl. Cryst. 30, 1036-1040.]) at least as good as DENZO if it were to become useful. As older scaling procedures depended primarily on the matching of whole reflections, we recognized that with the advent of frozen crystals and the correspondingly larger mosaic spreads, new methods of scaling and post-refinement were also required (Bolotovsky et al., 1998[Bolotovsky, R., Steller, I. & Rossmann, M. G. (1998). J. Appl. Cryst. 31, 708-717.]). We describe here our DPS procedures, which use our autoindexing algorithm, MOSFLM (Leslie, 1992[Leslie, A. G. W. (1992). Crystallographic Computing 5. From Chemistry to Biology, edited by D. Moras, A. D. Pojarny & J. C. Thierry. Oxford University Press.]) for integration and have the option of using SCALA or our SNP scaling procedure.

2. Autoindexing – introduction

A variety of techniques was suggested to determine the crystal orientation, some of which required initial knowledge of the unit-cell dimensions (Vriend & Rossmann, 1987[Vriend, G. & Rossmann, M. G. (1987). J. Appl. Cryst. 20, 338-343.]; Kabsch, 1988[Kabsch, W. (1988). J. Appl. Cryst. 21, 67-71.]), while more advanced techniques (Kim, 1989[Kim, S. (1989). J. Appl. Cryst. 22, 53-60.]; Higashi, 1990[Higashi, T. (1990). J. Appl. Cryst. 23, 253-257.]; Kabsch, 1993[Kabsch, W. (1993). J. Appl. Cryst. 26, 795-800.]) determined both unit-cell dimensions and crystal orientation. All these methods start with the determination of the reciprocal-lattice vectors, assuming that the oscillation photographs are `stills'. The methods of Higashi and of Kabsch, as well as, in part, Kim, analyze the distribution of the difference vectors generated from the reciprocal-lattice vectors. The most frequent difference vectors are taken as the basis vectors defining the reciprocal-lattice unit cell and its orientation. In addition, Kim's technique requires the input of the orientation of a likely zone-axis direction onto which the reciprocal-lattice vectors are then projected. The projections will have a periodicity distribution consistent with the reciprocal-lattice planes perpendicular to the zone axis. Duisenberg (1992[Duisenberg, A. J. M. (1992). J. Appl. Cryst. 25, 92-96.]) used a similar approach for single-point detector data, although he did not rely on prior knowledge of the zone-axis direction.

A major advance was made in the program DENZO, a part of the HKL package (Otwinowski & Minor, 1997[Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307-326.]), which not only has a robust indexing procedure but also has a useful graphical interface. The indexing technique used in the procedure has not been described, except for a few hints in the manual on the use of a fast Fourier transform (FFT). Indeed, Bricogne (1986[Bricogne, G. (1986). Editor. Proceedings of the EEC Cooperative Workshop on Position-Sensitive Detector Software (Phase III), pp. 28. Paris: LURE. ]) suggested that a three-dimensional Fourier transformation might be a powerful indexing tool. However, for large unit cells, this procedure requires an excessive amount of memory and time (Campbell, 1997[Campbell, J. W. (1997). CCP4 Newslett. 33, 5-16.]).

3. The crystal orientation matrix

The position x (x,y,z) of a reciprocal-lattice point can be given as

[{\bf x} = [\Phi][A] {\bf h}. \eqno (1)]

The matrix [Φ] is a rotation matrix around the camera's spindle axis for a rotation of φ. The vector h represents the Miller indices (hkl) and [A] defines the reciprocal unit-cell dimensions and the orientation of the crystal lattice with respect to the camera axes when φ = 0. Thus,

[[A] = \left ( \matrix {a^{*}_{x} & b^{*}_{x} & c^{*}_{x} \cr a^{*}_{y} & b^{*}_{y} & c^{*}_{y} \cr a^{*}_{z} & b^{*}_{z} & c^{*}_{z}} \right ), \eqno (2)]

where \(a^{*}_{x}\), \(a^{*}_{y}\) and \(a^{*}_{z}\) are the components of the crystal \({\bf a}^{*}\) axis with respect to the orthogonal camera axes. When an oscillation image is recorded, the position of a reciprocal-lattice point is moved from \({\bf x}_{1}\) to \({\bf x}_{2}\), corresponding to a rotation of the crystal from \(\varphi_{1}\) to \(\varphi_{2}\). The recorded position of the reflection on the detector corresponds to the point x when it is on the Ewald sphere somewhere between \({\bf x}_{1}\) and \({\bf x}_{2}\). The actual value of φ at which this crossing occurs cannot be retrieved directly from the oscillation image. We shall, therefore, assume here, as is the case in all other procedures, that [Φ][A] defines the crystal orientation in the center of the oscillation range. Defining the camera axes as in Rossmann (1979[Rossmann, M. G. (1979). J. Appl. Cryst. 12, 225-238.]), it is easy to show that a reflection recorded at the position (X,Y) on a flat detector normal to the X-ray beam at a distance D from the crystal corresponds to

[\eqalign {x &= X/[\lambda(X^{2} + Y^{2} + D^{2})^{1/2}], \cr y &= Y/[\lambda(X^{2} + Y^{2} + D^{2})^{1/2}], \cr z &= D/[\lambda(X^{2} + Y^{2} + D^{2})^{1/2}], \eqno(3)}]

where λ is the X-ray wavelength.

If an approximate [A] matrix is available, the Miller indices of an observed peak at (X,Y) can be roughly determined using (3) and (1), where

[{\bf h} = [A]^{-1} [\Phi]^{-1} {\bf x}, \eqno (4)]

with the error being dependent on the width of the oscillation range, the error in the detector parameters and errors in determining the coordinates of the centers of the recorded reflections.
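As an illustration of equations (1), (3) and (4), the following Python sketch maps a detector position to a reciprocal-space vector and then assigns approximate Miller indices. It is a minimal sketch and not the DPS implementation: the function names, the use of NumPy and the choice to pass [Φ] in as a ready-made 3 × 3 matrix (so that no camera-axis convention has to be assumed) are illustrative assumptions, and the example cell is hypothetical.

```python
import numpy as np

def detector_to_reciprocal(X, Y, D, wavelength):
    """Equation (3) as printed: vector (x, y, z) for a peak at (X, Y).

    X, Y and D share one length unit; the result has units of
    1 / (wavelength unit).
    """
    norm = wavelength * np.sqrt(X**2 + Y**2 + D**2)
    return np.array([X, Y, D], dtype=float) / norm

def approximate_indices(x, A, Phi):
    """Equation (4): h = [A]^-1 [Phi]^-1 x, returned unrounded."""
    # Solving the linear system avoids forming explicit inverses.
    return np.linalg.solve(Phi @ A, x)

# Hypothetical orthogonal reciprocal cell aligned with the camera axes.
A = np.diag([0.010, 0.020, 0.025])   # columns are a*, b*, c*
Phi = np.eye(3)                      # phi = 0, so no rotation
h_true = np.array([3.0, -2.0, 5.0])
x = Phi @ A @ h_true                 # equation (1)
h_est = approximate_indices(x, A, Phi)
```

In practice the unrounded indices would be rounded to the nearest integers, and the size of the fractional parts gives a rough measure of how well the approximate [A] matrix explains the observed peak.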

4. Fourier analysis of the reciprocal-lattice vector distribution when projected onto a chosen direction

If the members of a set of reciprocal-lattice planes perpendicular to a chosen direction t are well separated, then the projections of the reciprocal-lattice vectors onto t will have an easily recognizable periodic distribution. Unlike the procedure of Kim (1989[Kim, S. (1989). J. Appl. Cryst. 22, 53-60.]), which requires the input of a likely zone-axis direction, the present procedure tests all possible directions and analyzes the frequency distribution f(j) of the projected reciprocal-lattice vectors in each case. Also, unlike the procedure of Kim, the periodicity is determined using a one-dimensional FFT (Fig. 1).

[Figure 1]
Figure 1
Let a line t in reciprocal space be divided into discrete intervals Δt at a distance t from the origin. Then, let the frequency of reflections at (x, y, z) projected onto this line be given by the function f(t) in an interval of Δt. The one-dimensional Fourier transform of this line will be given by F(k) = [\textstyle \sum_{t=0}^{t=m\Delta t} f(t) \exp (2 \pi i k t)], where m is an integer. In the figure, the largest Fourier coefficient other than F(0) corresponds to k = 27 and measures the distance between reciprocal-lattice planes perpendicular to the line of projection. (Reprinted with permission from Steller et al., 1997[Steller, I., Bolotovsky, R. & Rossmann, M. G. (1997). J. Appl. Cryst. 30, 1036-1040.].)
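The projection-and-transform step can be sketched in a few lines of Python. This is a hedged illustration only: the function name, the number of bins and the synthetic jittered lattice are assumptions made for the example, and the binning details of the actual DPS code are not given in the text.

```python
import numpy as np

def dominant_periodicity(points, t, n_bins=512):
    """Return (k, spacing) for reciprocal-lattice points projected onto t.

    k is the index of the largest Fourier coefficient other than F(0);
    spacing is the corresponding distance between lattice planes.
    """
    t = np.asarray(t, dtype=float)
    t = t / np.linalg.norm(t)
    proj = np.asarray(points, dtype=float) @ t     # projections onto t
    f, edges = np.histogram(proj, bins=n_bins)     # frequency distribution
    F = np.abs(np.fft.rfft(f))                     # one-dimensional FFT
    k = 1 + int(np.argmax(F[1:]))                  # skip F(0)
    length = edges[-1] - edges[0]                  # sampled length along t
    return k, length / k

# Synthetic test lattice: planes 0.02 apart along x, with positional noise.
rng = np.random.default_rng(0)
d = 0.02
grid = np.arange(-10, 11) * d
pts = np.array([(x, y, z) for x in grid for y in grid[::5] for z in grid[::5]])
pts = pts + rng.normal(scale=0.15 * d, size=pts.shape)
k, spacing = dominant_periodicity(pts, t=(1.0, 0.0, 0.0))
```

For this synthetic lattice the recovered spacing should be close to the input value of 0.02; repeating the calculation for a direction that is not perpendicular to well separated planes gives a much weaker maximum.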

5. Exploring all possible directions to find a good set of basis vectors

The polar coordinates ψ,φ define the direction t, where ψ defines the angle between the X-ray beam and the chosen direction t. The Fourier analysis is performed for each direction t in the range 0 < ψ  ≤  π/2, 0 < φ ≤ 2π. A suitable angular increment in ψ was determined empirically to be about 0.03 rad (1.7°). For each value of ψ, the number of steps in φ is taken to be the closest integer to (2π sin ψ)/0.03, so that neighboring directions remain about 0.03 rad apart. This procedure results in ∼7300 separate roughly equally spaced directions. For each direction t, the distribution of the corresponding F(k) coefficients is surveyed to locate the largest local maximum. The ψ and φ values associated with the 30 largest maxima are selected for refinement by a local search procedure to obtain an accuracy of \(10^{-4}\) rad (∼0.006°). Directions are chosen from these vectors to give a linearly independent set of three basis vectors of a primitive real-space unit cell. These are then converted to the basis vectors of the reciprocal cell. The components of the three reciprocal-cell axes along the three camera axes are the nine components of the crystal orientation matrix [A] (2). The resultant unit cell is then reduced and analyzed in terms of the 44 lattice types (Burzlaff et al., 1992[Burzlaff, H., Zimmermann, H. & de Wolff, P. M. (1992). International Tables for Crystallography, Vol. A, edited by T. Hahn, pp. 738-749. Dordrecht: Kluwer Academic Publishers.]).
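The hemispherical search grid can be generated as follows. This is a sketch under stated assumptions: the function names are hypothetical, the ψ grid is taken to start at one increment from the beam direction, and the beam is assumed to lie along the third camera axis when converting angles to a unit vector.

```python
import math

def sample_directions(step=0.03):
    """Return a list of (psi, phi) pairs covering 0 < psi <= pi/2."""
    directions = []
    n_psi = int((math.pi / 2) / step)
    for i in range(1, n_psi + 1):
        psi = i * step
        # Number of phi steps so that neighbors stay about `step` apart.
        n_phi = max(1, round(2 * math.pi * math.sin(psi) / step))
        for j in range(n_phi):
            directions.append((psi, 2 * math.pi * j / n_phi))
    return directions

def unit_vector(psi, phi):
    """Direction t for polar angles (psi, phi); beam assumed along z."""
    s = math.sin(psi)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(psi))

directions = sample_directions()
n_directions = len(directions)
```

With a 0.03 rad increment this grid contains several thousand directions, the same order of magnitude as the figure quoted in the text; the exact count depends on how the end points of the ψ range are treated.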

6. The effects of errors on indexing

The components of the basis vectors parallel to the X-ray beam are necessarily rather inaccurate when applying any autoindexing procedure. This is because the usual flat detector records data only in a forward direction and because the normal oscillation angle is small, resulting in a lack of information about the extent of the reciprocal lattice along the X-ray beam. Thus, it would be an advantage to combine images of one crystal taken at different rotation angles or, best, separated by a 90° rotation. In principle, this is not difficult, as the vectors x from different orientations of the crystal can be combined with different oscillation angles δφ using (1). However, in practice, the errors in the values of camera parameters used for calculating the positions x and the assumption that the crystal is stationary for any given image introduce errors into the calculation of the position x for widely separated images.

An attempt was made to combine the reciprocal-lattice vectors derived from three separate images, taken at φ  =  0, 14.8 and 37.8°, recorded on a CCD detector using a frozen human rhinovirus 16 (HRV16) crystal at beamline SBC-19ID at the Advanced Photon Source (Argonne National Laboratory, Chicago). Each image was indexed successfully when analyzed by itself. However, on combining the information from the three images, the FFT systematically determined an [A] matrix for one of the images which contained about 30% more useful reflections than the other two images. This showed that the FFT found the dominant periodicity and that the positions of the reciprocal-lattice points for the other images did not mesh precisely with those of the dominant image on account of inaccurate camera parameters. Although unsuccessful for the purpose initially proposed, this result is particularly interesting as it shows that split crystals containing a dominant fragment would be readily indexable with the autoindexing procedure described here. Omission of the indexed reflections would then allow indexing of the minor component of the crystal.

7. Scaling and post-refinement – introduction

A least-squares procedure frequently used for scaling frames of data which contain a substantial number of `full' reflections is the Hamilton, Rollett and Sparks (HRS) method (Hamilton et al., 1965[Hamilton, W. C., Rollett, J. S. & Sparks, R. A. (1965). Acta Cryst. 18, 129-130.]). The target for this least-squares minimization is

[\Psi = \textstyle \sum \limits_h \sum \limits_i W_{h_{i}}(I_{h_{i}} - G_{m} I_{h})^{2}, \eqno (5)]

where \(I_{h}\) is the best least-squares estimate of the intensity of a reflection with reduced Miller indices h, \(I_{h_{i}}\) is the intensity of the ith measurement of reflection h, \(W_{h_{i}}\) is a weight for reflection \(h_{i}\) and \(G_{m}\) is the inverse linear scale factor for the frame m on which reflection \(h_{i}\) is recorded. The HRS expression (5) assumes that all reflections \(h_{i}\) are full; that is, their reciprocal-lattice points have completely passed through the Ewald sphere.

For all h, the values of \(I_{h}\) must correspond to a minimum in Ψ. Thus,

[(\partial \Psi / \partial I_{h}) = 0. \eqno (6)]

Therefore, \(I_{h}\) is given by

[I_{h} = \left ( \textstyle \sum \limits_{i} W_{h_{i}} G_{m} I_{h_{i}} \right ) \bigg / \left (\textstyle \sum \limits_{i} W_{h_{i}} G^{2}_{m} \right ). \eqno (7)]

Since Ψ is not linear with respect to the scale factors \(G_{m}\), the values of the scale factors have to be determined by an iterative non-linear least-squares procedure. As the scale factors are relative to each other, the HRS procedure requires that one of them is arbitrarily fixed. If there are frames which have too few or no common reflections with any other frames, the normal equations matrix will be singular.

An improved method of solving the HRS normal least-squares equations is described by Fox & Holmes (1966[Fox, G. C. & Holmes, K. C. (1966). Acta Cryst. 20, 886-891.]). Their approach is based on the singular-value decomposition of the normal equations matrix. Apart from an accelerated convergence of the least-squares procedure, the advantage of the Fox and Holmes method is that no ad hoc decision needs to be made as to which scale factor should be fixed. Furthermore, `troublesome' frames of data can be identified as causing negligibly small eigenvalues in the normal equations matrix.
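To make the structure of the HRS problem concrete, the sketch below minimizes the target (5) by alternating between equation (7) for the intensities and the corresponding closed-form update for each scale factor. This alternating scheme is an assumption chosen for brevity; it is neither the normal-equations solution nor the Fox and Holmes singular-value approach, and all names are hypothetical. The first frame is arbitrarily fixed at a scale of 1.

```python
import numpy as np

def best_intensities(refl, frame, I_obs, W, G, n_refl):
    """Equation (7): I_h = sum(W G I) / sum(W G^2) for each unique h."""
    Gm = G[frame]
    num = np.bincount(refl, weights=W * Gm * I_obs, minlength=n_refl)
    den = np.bincount(refl, weights=W * Gm**2, minlength=n_refl)
    return num / den

def hrs_scale(refl, frame, I_obs, W, n_refl, n_frames, n_iter=20):
    """Alternate intensity and scale updates; returns (G, I_h)."""
    G = np.ones(n_frames)
    for _ in range(n_iter):
        Ih = best_intensities(refl, frame, I_obs, W, G, n_refl)[refl]
        # Minimizing (5) over G_m with I_h held fixed.
        G = (np.bincount(frame, weights=W * I_obs * Ih, minlength=n_frames)
             / np.bincount(frame, weights=W * Ih**2, minlength=n_frames))
        G = G / G[0]                  # fix the arbitrary overall scale
    return G, best_intensities(refl, frame, I_obs, W, G, n_refl)

# Synthetic error-free data: four unique reflections on three frames.
I_true = np.array([100.0, 200.0, 50.0, 400.0])
G_true = np.array([1.0, 0.5, 2.0])
refl = np.repeat(np.arange(4), 3)     # unique-reflection index per observation
frame = np.tile(np.arange(3), 4)      # frame index per observation
I_obs = G_true[frame] * I_true[refl]
G_fit, I_fit = hrs_scale(refl, frame, I_obs, np.ones(12), 4, 3)
```

A frame that shares no reflections with any other frame leaves its scale factor undetermined, which is the singularity mentioned above; this sketch makes no attempt to detect that case.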

8. Generalization of the Hamilton, Rollett and Sparks equations to take into account partial reflections

In general, a Bragg reflection will occur on a number of consecutive frames as a series of partial reflections, and the full intensity can only be estimated from the measured intensities of the partial reflections. Let \(I_{h_{im}}\) represent the intensity contribution of reflection \(h_{i}\) recorded on frame m. If all the parts of reflection \(h_{i}\) are available in the data set, then

[I_{h_{i}} = \textstyle \sum \limits_{m} (I_{h_{im}} / G_{m}). \eqno (8)]

In practice, there will always occur reflections which do not have all their parts available. In such cases, the only way to estimate the full intensity of a reflection is to apply an estimated value of partiality to the measured intensities of available partial reflections.

Various models have been proposed in the literature to calculate the reflection partiality. In this study, we use Rossmann's model (Rossmann, 1979[Rossmann, M. G. (1979). J. Appl. Cryst. 12, 225-238.]; Rossmann et al., 1979[Rossmann, M. G., Leslie, A. G. W., Abdel-Meguid, S. S. & Tsukihara, T. (1979). J. Appl. Cryst. 12, 570-581.]) with Greenhough and Helliwell's correction (Greenhough & Helliwell, 1982[Greenhough, T. J. & Helliwell, J. R. (1982). J. Appl. Cryst. 15, 338-351.]). This model treats partiality as a fraction of a spherical volume swept through a nest of Ewald spheres. The coordinates of the spherical volume are defined by the Miller indices of the reflection, crystal orientation matrix and rotation angle. The divergence of the Ewald spheres accounts for the crystal mosaicity. Alternative geometrical descriptions of the reciprocal-lattice point passing through the nest of Ewald spheres have been given by Winkler et al.(1979[Winkler, F. K., Schutt, C. E. & Harrison, S. C. (1979). Acta Cryst. A35, 901-911.]), Greenhough & Helliwell (1982[Greenhough, T. J. & Helliwell, J. R. (1982). J. Appl. Cryst. 15, 338-351.]) and Bolotovsky & Coppens (1997[Bolotovsky, R. & Coppens, P. (1997). J. Appl. Cryst. 30, 65-70.]).

Provided that the reflection partiality \(p_{h_{im}}\) is known, the full intensity is estimated by

[I_{h_{i}} = I_{h_{im}}/(p_{h_{im}}G_{m}). \eqno (9)]

Equation (9) can produce as many estimates of \(I_{h_{i}}\) as there are parts of reflection \(h_{i}\), while (8) produces only one estimate of \(I_{h_{i}}\) from all parts of reflection \(h_{i}\). Having defined the relationships between measured intensities of partial reflections and estimated full intensities by expressions (8) and (9), two methods of generalizing the HRS equations can be considered.

8.1. Method 1

If a reflection \(h_{i}\) occurs on a number of consecutive frames and all intensity parts \(I_{h_{im}}\) are available in the data set, the generalized HRS target equation takes the form

[\Psi = \textstyle \sum \limits_{h} \sum \limits_{i} \sum \limits_{m} W_{h_{im}} \left \{I_{h_{im}} - G_{m} \left [I_{h} - \sum \limits_{m' \neq m} (I_{h_{im'}}/G_{m'} )\right ] \right \}^{2}. \eqno (10)]

Using (6), the best least-squares estimate of \(I_{h}\) will be

[I_{h} = {{\textstyle \sum \limits_{i} I_{h_{i}} \sum \limits_{m} W_{h_{im}} G^{2}_{m}} \over {\textstyle \sum \limits_{i} \sum \limits_{m} W_{h_{im}} G^{2}_{m}}}. \eqno (11)]

8.2. Method 2

If the theoretical partiality \(p_{h_{im}}\) of partial reflections \(h_{im}\) can be estimated, the generalized HRS target equation takes the form

[\Psi = \textstyle \sum \limits_{h} \sum \limits_{i} \sum \limits_{m} W_{h_{im}} (I_{h_{im}} - G_{m} p_{h_{im}} I_{h})^{2} \eqno (12)]

and, using (6), the best least-squares estimate of \(I_{h}\) becomes

[ I_{h} = {{\textstyle \sum \limits_{i} \sum \limits_{m} W_{h_{im}} G_{m} p_{h_{im}}I_{h_{im}}} \over {\textstyle \sum \limits_{i} \sum \limits_{m} W_{h_{im}} G^{2}_{m}}\,p^2_{h_{im}}}. \eqno (13)]

When all reflections in the data set are full, expressions (10) and (12), and (11) and (13), reduce to the `classical' HRS expressions (5) and (7). Method 1 allows refinement of the scale factors only while method 2 allows refinement of the scale factors, crystal mosaicity and orientation matrix (Table 1), because the latter two factors contribute to the calculated partiality.

Table 1
Scaling and post-refinement parameters

Parameter              Method 1   Method 2
Scale factors          Yes        Yes
Temperature factors    Yes        Yes
Crystal orientation    No         Yes
Effective mosaicity    No         Yes
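For a single unique reflection h, the method 2 target (12) and the intensity estimate (13) translate directly into code. The sketch below is illustrative only; the function names and the numerical values are assumptions, and the arrays run over every partial observation (all i and m) of the reflection.

```python
import numpy as np

def psi_method2(I_part, G, p, W, I_h):
    """Contribution of one unique reflection to the target (12)."""
    return np.sum(W * (I_part - G * p * I_h) ** 2)

def best_intensity_method2(I_part, G, p, W):
    """Equation (13): least-squares estimate of the full intensity."""
    return np.sum(W * G * p * I_part) / np.sum(W * G**2 * p**2)

# Hypothetical reflection of full intensity 1000 seen as three observations.
G = np.array([1.0, 0.8, 0.8])       # scale factors of the frames involved
p = np.array([0.3, 0.7, 1.0])       # calculated partialities
W = np.ones(3)
I_part = G * p * 1000.0             # error-free partial intensities
I_h = best_intensity_method2(I_part, G, p, W)
residual = psi_method2(I_part, G, p, W, I_h)
```

Because the partialities enter the target explicitly, derivatives of Ψ with respect to the mosaicity and orientation parameters exist through p, which is why method 2 can refine them while method 1 cannot.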

9. Selection of reflections useful for scaling

Method 1 requires that all parts of a reflection are available in order to incorporate that reflection into expression (10). Thus, reflections which occur at the beginning or end of the crystal rotation or at gaps within the rotation range must be rejected. Even when all the necessary parts of a reflection are recorded, at least one of these parts could have a problem during peak integration, thus making the rest of the reflection useless for scaling.

Method 2 allows the use of all reflections for scaling, because every observation of a partial reflection is sufficient to estimate the full reflection intensity by expression (9). However, the smaller the calculated partiality, the greater the error of the estimated full intensity. Therefore, a reasonable lower limit of calculated partiality has to be imposed in selecting partial reflections useful for scaling purposes.

Based on the above, the algorithm for selecting reflections is as follows.

  • (i) Sort all reflections in the data set according to (a) symmetry-reduced Miller indices, (b) original Miller indices, (c) oscillation range of the frame on which the reflection is recorded.

  • (ii) Reject some of the reflections according to criteria listed in Table 2.

Table 2
Hierarchy of criteria for selecting reflections for the scaling and averaging procedures

In methods 1 and 2, reject all parts of a reflection which has
 (i) no successfully integrated parts,
 (ii) no parts with significant intensity (for scaling procedure only),
 (iii) some parts entering and some parts exiting the Ewald sphere (this condition implies that the reflection is too close to the rotation axis and is partly in the blind zone).

In method 1, reject all parts of a reflection which has
 (i) any part which is not successfully integrated,
 (ii) any part which has a significant intensity, but is not predicted by the scaling program based on the crystal mosaicity and orientation matrix,
 (iii) the sum of calculated partialities different from unity by more than a user-chosen value,
 (iv) a redundancy of 1.

In method 2, reject a part of a reflection if
 (i) the calculated partiality is less than a user-chosen value,
 (ii) the intensity is insignificant,
 (iii) the calculated partiality is 1 and the redundancy is 1.
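The method 2 criteria of Table 2 amount to a simple per-observation filter. The sketch below covers only those three criteria, not the rejections common to both methods; the record layout, the default partiality limit and the use of an intensity-to-sigma ratio to define `insignificant' are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class PartialObservation:
    intensity: float      # integrated intensity of this part
    sigma: float          # estimated standard deviation
    partiality: float     # calculated partiality, between 0 and 1
    redundancy: int       # number of measurements of the unique reflection

def keep_for_method2(obs, min_partiality=0.5, min_i_over_sigma=3.0):
    """Return True if the observation survives the method 2 criteria."""
    if obs.partiality < min_partiality:                  # criterion (i)
        return False
    if obs.intensity < min_i_over_sigma * obs.sigma:     # criterion (ii)
        return False
    if obs.partiality >= 1.0 and obs.redundancy == 1:    # criterion (iii)
        return False
    return True

observations = [
    PartialObservation(900.0, 30.0, 0.8, 4),   # kept
    PartialObservation(900.0, 30.0, 0.2, 4),   # partiality too small
    PartialObservation(20.0, 30.0, 0.8, 4),    # insignificant intensity
    PartialObservation(900.0, 30.0, 1.0, 1),   # full and measured only once
]
kept = [keep_for_method2(o) for o in observations]
```

The lower limit on partiality trades completeness against accuracy, since the error of the estimated full intensity grows as the calculated partiality becomes smaller.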

10. Restraints and constraints

Scale factors will depend on intensity variations of the incident X-ray beam, variation of the developing conditions if films are used, crystal absorption and radiation damage. When using frozen crystals, scale factors will be mostly a measure of absorption variation as the crystal is rotated from frame to frame, although abrupt changes will occur when the intensity of the beam is changed, as occurs at the beginning of a new injection of electrons or positrons into the synchrotron ring (a `fill'). Hence, in general, scale factors can be constrained to follow an analytical function or restrained [adding a term \(w(G_{n} - G_{n+1})^{2}\) to Ψ, where \(G_{n}\) and \(G_{n+1}\) are scale factors for the nth and (n + 1)th frame] to minimize variation between successive frames. Such procedures will increase \(R_{\rm merge}\) because there are fewer parameters, but will increase the accuracy of the measured intensities as additional reasonable physical conditions have been applied.
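A restraint of this kind is a penalty on the squared difference between the scale factors of successive frames, summed over all neighboring pairs. A minimal sketch follows; the function name and the single shared weight w are assumptions, and a real program would presumably omit the pair that straddles a new beam fill.

```python
import numpy as np

def scale_restraint(G, w):
    """Sum over n of w * (G_n - G_{n+1})^2 for successive frames."""
    G = np.asarray(G, dtype=float)
    return w * np.sum(np.diff(G) ** 2)

smooth = scale_restraint([1.00, 1.00, 1.00], w=2.0)
jumpy = scale_restraint([1.00, 1.50, 1.00], w=2.0)
```

The penalty vanishes when all scale factors are equal and grows quadratically with frame-to-frame jumps, so the weight w controls how strongly the refined scale factors are smoothed.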

The mis-setting angles of a single crystal should remain entirely constant. Thus, in principle, the refinement should constrain the mis-setting angles to be the same for all frames associated with a single crystal in the data set. However, in practice, independent refinement of these angles can indicate problems in the data sets when there are discontinuities in the plots of setting angle versus frame number.

Unit-cell dimensions can be reasonably assumed to be the same for all crystals and might, therefore, be constrained to be such. However, the exact conditions of freezing may cause some crystal-to-crystal variation.

Mosaicity is likely to increase as radiation damage proceeds. Thus, restraint between the independently refined mosaicities of neighboring frames can be useful.

11. Generalization of the procedure for averaging reflection intensities

Once the frame scale factors are determined, they need to be applied to reflection intensities and error estimates. The intensities of reflections with the same reduced Miller indices can then be averaged.

Two methods of intensity averaging may be considered based on the two different expressions (8) and (9) for the estimates of full intensities. For method 1, the intensity average is

[\langle I_{h} \rangle = {{\textstyle \sum \limits_{i} I_{h_{i}} W_{h_{i}}} \over {\textstyle \sum \limits_{i} W_{h_{i}}}} = {{\textstyle \sum \limits_{i} \Big[ \sum \limits_{m}(I_{h_{im}} / G_{m})\Big] W_{h_{i}}} \over {\textstyle \sum \limits_{i} W_{h_{i}}}}. \eqno (14)]

When method 2 is used for averaging, the determination of \(\langle I_{h} \rangle\) is more complicated because there are as many estimates of the full intensity \(I_{h_{i}}\) as there are partial reflections \(h_{im}\). Therefore, intensity averaging for reflection h has to be performed in two steps. Firstly, for every reflection \(h_{i}\), the intensity estimates from all partial observations are averaged,

[\langle I_{h_{i}} \rangle = {{\textstyle \sum \limits_{m} W_{h_{im}} [I_{h_{im}} / (G_{m} p_{h_{im}})]} \over {\textstyle \sum \limits_{m} W_{h_{im}}}}, \eqno (15)]

where the reciprocal variance weights are \(W_{h_{im}} = G_{m}^{2}p^{2}_{h_{im}} / \sigma^{2} (I_{h_{im}})\). Secondly, the \(\langle I_{h_{i}} \rangle\) values are averaged as

[\langle I_{h} \rangle = (\textstyle \sum \limits_{i} W_{h_{i}} \langle I_{h_{i}} \rangle ) / \sum \limits_{i} W_{h_{i}}, \eqno(16)]

where \(W_{h_{i}} = 1/\sigma^{2}(\langle I_{h_{i}} \rangle)\) and \(\sigma(\langle I_{h_{i}} \rangle)\) can be derived from (15).
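The two-step average of equations (15) and (16) can be sketched as follows. The variance of each first-step mean is taken here as the reciprocal of the summed weights, which is the standard result for reciprocal-variance weighting and is an assumption about how σ is derived from (15); the function names and numbers are hypothetical.

```python
import numpy as np

def average_parts(I_part, sigma, G, p):
    """Equation (15): weighted mean of the full-intensity estimates
    from the parts of one measurement h_i. Returns (mean, variance)."""
    I_part, sigma, G, p = (np.asarray(a, dtype=float)
                           for a in (I_part, sigma, G, p))
    W = (G * p / sigma) ** 2             # reciprocal-variance weights
    estimates = I_part / (G * p)         # one estimate per part
    mean = np.sum(W * estimates) / np.sum(W)
    return mean, 1.0 / np.sum(W)

def average_reflection(measurements):
    """Equation (16): weighted mean over the measurements of reflection h."""
    means, variances = zip(*(average_parts(*m) for m in measurements))
    W = 1.0 / np.asarray(variances)
    return np.sum(W * np.asarray(means)) / np.sum(W)

# Two error-free measurements of a reflection with full intensity 500;
# each tuple is (partial intensities, sigmas, scale factors, partialities).
measurements = [
    ([200.0, 300.0], [10.0, 12.0], [1.0, 1.0], [0.4, 0.6]),
    ([450.0], [15.0], [0.9], [1.0]),
]
I_mean = average_reflection(measurements)
```

Parts with small partiality or small scale factor receive low weight, so an estimate obtained by dividing a weak partial intensity by a small partiality does not dominate the average.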

While averaging estimated intensities of full reflections, special treatment has to be given to outliers and discordant pairs (Blessing, 1997[Blessing, R. H. (1997). J. Appl. Cryst. 30, 421-426.]). For samples of three or more equivalent reflections, it is necessary to consider the absolute values of the differences between individual intensities and the median of the sample, \(|I_{h_{i}} - I_{\rm median}|\). The outliers can be detected by several statistical tests and can then be either down-weighted or rejected. When the sample consists of only two reflections, they can be considered as a `discordant pair' if the difference between their intensities is not warranted by the estimated errors and, hence, both reflections can be rejected.
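One simple way to act on these ideas is sketched below. The tests used here (a fixed multiple of the estimated standard deviation about the median, and the combined error of a pair) are illustrative assumptions and are not the specific statistical tests of Blessing (1997); the function names and thresholds are hypothetical.

```python
import numpy as np

def flag_outliers(I, sigma, n_sigma=3.0):
    """Flag measurements far from the sample median (needs >= 3 values)."""
    I = np.asarray(I, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    if I.size < 3:
        raise ValueError("median test needs three or more measurements")
    return np.abs(I - np.median(I)) > n_sigma * sigma

def is_discordant_pair(I1, sigma1, I2, sigma2, n_sigma=3.0):
    """True if two measurements differ by more than their errors warrant."""
    return abs(I1 - I2) > n_sigma * np.hypot(sigma1, sigma2)

flags = flag_outliers([100.0, 104.0, 98.0, 180.0], [5.0, 5.0, 5.0, 5.0])
discordant = is_discordant_pair(100.0, 5.0, 150.0, 5.0)
concordant = is_discordant_pair(100.0, 5.0, 110.0, 5.0)
```

The median is used as the reference because, unlike the mean, it is barely shifted by the outlier being tested.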

Averaging intensities by method 2 has an advantage over method 1 because outliers and discordant pairs can be `screened' at two levels: firstly, when the estimates of full intensity \(I_{h_{i}}\), calculated by (9) from different parts of the same reflection, are considered, and secondly, when the mean intensities \(\langle I_{h_{i}} \rangle\), calculated by (15) from different reflections, are compared.

11.1. Scale factor versus frame number

If scale factors are to make physical sense, their behavior with respect to the frame number has to be in accordance with the known changes in the beam intensity, crystal condition and detector response. Conspicuous deviations from physically reasonable behavior may be attributed to deficiency of the scaling method.

The scaling of the φX174 procapsid data (data set 1 in Table 3) was performed using methods 1 and 2 described here and SCALEPACK (Gewirth, 1996[Gewirth, D. (1996). The HKL Manual. A Description of the Programs DENZO, XDISPLAYF and SCALEPACK, 5th ed., pp. 87-90. New Haven: Yale University.]; Otwinowski & Minor, 1997[Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307-326.]; Fig. 2). The graphs (a) and (b) in Fig. 2 have four segments corresponding to four synchrotron beam fills. All three methods give scale factors within 5% of each other. The only frames for which the results differ by as much as 15% are the first and last frames of each beam fill. Both method 1 and SCALEPACK produce physically wrong results in that the scale factors of these frames look like outliers compared with the scale factors of the neighboring frames. By contrast, method 2 provides consistent scale factors for such frames. Although the SCALEPACK algorithm for scaling frames with partial reflections has never been disclosed in the literature, the similar behavior of the results obtained by method 1 and SCALEPACK suggests that SCALEPACK might be using an algorithm similar to method 1.

Table 3
Experimental information on the data sets processed by the methods described here and by SCALEPACK

The data were integrated using the program DENZO (Gewirth, 1996; Otwinowski & Minor, 1997). The mosaicity reported by DENZO was used as an initial parameter for the scaling program.

| Data set | Compound name | Ref.† | Space group | Unit-cell a (Å) | Unit-cell b (Å) | Unit-cell c (Å) | Unit-cell α (°) | Unit-cell β (°) | Unit-cell γ (°) | Mosaicity (°) | Oscillation range (°) | Total rotation (°) | Data collection information |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | φX174 procapsid protein | (a) | I2₁3 | 766.9 | 766.9 | 766.9 | 90.0 | 90.0 | 90.0 | 0.35 | 0.30 | 37.50 | CHESS, F1, Fuji IP, temperature = 120 K |
| 2 | Human rhinovirus 14 | (b) | P2₁3 | 437.3 | 437.3 | 437.3 | 90.0 | 90.0 | 90.0 | 0.30 | 0.25 | 28.25 | CHESS, F1, Fuji IP, temperature = 120 K |
| 3 | Sindbis virus capsid protein (114–264) | (c) | P1 | 35.98 | 59.54 | 71.05 | 109.4 | 101.5 | 90.1 | 0.70 | 1.00 | 201.40 | Rigaku R-AXIS, temperature = 120 K |
| 4 | Alpha3 phage | (d) | P2₁ | 290.2 | 332.1 | 337.7 | 90.0 | 94.1 | 90.0 | 0.21–0.28 | 0.25 | 180.00 | APS, 14BMC, MAR 345 scanner, temperature = 120 K |

†References: (a) Dokland et al. (1997); (b) Rossmann et al. (1985); M. G. Rossmann, C. A. Momany, B. Cheng & S. Chakravarty, unpublished results; (c) Choi et al. (1991, 1996); (d) R. Bernal, B. A. Fane & M. G. Rossmann, unpublished results.
[Figure 2]
Figure 2
Unrestrained linear scale factor as a function of frame number of the φX174 procapsid data set. Results from (a) method 1 (filled circles) and method 2 (open circles) and (b) SCALEPACK. Comparison of (c) method 2 versus method 1 and (d) SCALEPACK versus method 1. (Reprinted with permission from Bolotovsky et al., 1998.)

Attempts at scaling a data set for a frozen crystal of HRV14 (data set 2 in Table 3) failed with method 1 because of gaps in the rotation range for the first 20 frames, which made the normal equations matrix singular. When frames without useful neighbors were excluded, the cubic symmetry of the crystal was sufficient for successful scaling. Method 2, however, handled the whole data set without difficulty, and its results showed greater consistency than those obtained with SCALEPACK (Fig. 3). SCALEPACK failed to refine the scale factors of those frames which did not have a full complement of abutting frames; their scale factors remained at the initial value of 1. There are also other frames for which the scale factors found by SCALEPACK look like outliers compared with the scale factors of the neighboring frames.

[Figure 3]
Figure 3
Linear scale factor as a function of frame number for the HRV14 data set using SCALEPACK (open circles) and method 2 (filled circles). (Reprinted with permission from Bolotovsky et al., 1998.)

The accuracy of method 2 is also demonstrated by the scaling results for the Sindbis virus capsid protein (SCP), residues 114–264 (data set 3 in Table 3). The behavior of the scale factor with respect to the frame number reflects the anisotropy of a thin plate-shaped crystal (Fig. 4). For the first 38 frames (numbers 3–40), odd-numbered frames have higher scale factors than even-numbered frames. Data collection was stopped after frame number 40 and restarted. After frame number 41, odd-numbered frames have lower scale factors than even-numbered frames. This effect presumably relates to the use of the two alternative image plates with slightly different sensitivities in the R-AXIS camera.

[Figure 4]
Figure 4
Unrestrained linear scale factors, determined by method 2, as a function of even (filled circles) and odd (open circles) frame numbers for the SCP (114–264) data set. The sine-like pattern reflects the anisotropy of a thin plate-shaped crystal. (Reprinted with permission from Bolotovsky et al., 1998.)

11.2. R factor as a function of `sum of partialities' (method 1)

In order to determine the limits of tolerance which can be permitted when method 1 is used, the R factor was examined as a function of the sum of partialities for the φX174 procapsid data (Fig. 5). For this evaluation, reflections with a sum of partialities within 1 ± 0.3 were used. The R factor changes sharply when the sum of partialities is outside 1 ± 0.15. Thus, ±0.15 were acceptable limits of tolerance for this data set.
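The evaluation described above can be sketched as a binning of the R factor against the signed deviation of the sum of partialities from unity. This is a minimal illustration, not the procedure used to produce Fig. 5. The function name, the input layout and the bin width are hypothetical, and the R factor is assumed here to be the sum of absolute deviations from the mean intensity of each reflection divided by the sum of those mean intensities.

```python
from collections import defaultdict


def r_factor_by_partiality_sum(observations, bin_width=0.05, max_dev=0.3):
    """Bin an R factor by (sum of partialities - 1).

    observations : iterable of (reflection_id, intensity, sum_of_partialities),
                   one entry per estimate of a full reflection (assumed layout)
    Returns a list of (bin_center, r_factor) sorted by bin center.
    """
    eps = 1e-9  # guards the cut-off against floating-point rounding
    kept = [(h, i, s - 1.0) for h, i, s in observations
            if abs(s - 1.0) <= max_dev + eps]

    # Mean intensity of every reflection over all accepted estimates.
    groups = defaultdict(list)
    for h, i, _ in kept:
        groups[h].append(i)
    means = {h: sum(v) / len(v) for h, v in groups.items()}

    # Assumed definition: R = sum |I - <I_h>| / sum <I_h>, per bin.
    num = defaultdict(float)
    den = defaultdict(float)
    for h, i, dev in kept:
        b = round(dev / bin_width)
        num[b] += abs(i - means[h])
        den[b] += abs(means[h])
    return [(b * bin_width, num[b] / den[b]) for b in sorted(num) if den[b] > 0]


# Hypothetical estimates: (reflection, intensity, sum of partialities).
obs = [("a", 100.0, 1.00), ("a", 100.0, 1.02), ("a", 130.0, 1.20),
       ("b", 50.0, 0.99), ("b", 50.0, 1.01), ("b", 80.0, 1.50)]
for center, r in r_factor_by_partiality_sum(obs):
    print(f"{center:+.2f}  R = {r:.3f}")
```

A tolerance such as the ±0.15 quoted above would then be read off as the deviation beyond which the binned R factor rises sharply.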

[Figure 5]
Figure 5
R factor as a function of the difference of calculated sum of partialities and unity for the estimates of full reflections when method 1 is used for scaling and averaging of the φX174 procapsid data set. (Reprinted with permission from Bolotovsky et al., 1998.)

11.3. Statistics for rejecting reflections and data quality as a function of frame number

The percentage of rejected reflections with respect to the frame number in method 2 is more monotonic than in method 1 (Fig. 6). In the latter method, the frames at the beginning and end of the crystal rotation and beam fills have an especially high rejection rate because there are insufficient data available to add up to full reflections (the reasons for rejecting reflections are listed in Table 2).

[Figure 6]
Figure 6
Percentage of rejected reflections for method 1 versus method 2 for the φX174 procapsid data set. The reasons for rejecting reflections are listed in Table 2. (a) Open circles represent method 1; (b) open squares represent method 2 with mosaicity refinement; (c) open diamonds represent method 2 with no mosaicity refinement. (Reprinted with permission from Bolotovsky et al., 1998.)

The behavior of the R factor versus frame number (Fig. 7) is more monotonic with method 1 than with method 2. In method 1, the data quality estimates for neighboring frames are strongly correlated because the full reflections used in the statistics are obtained by summing up partials from consecutive frames. In contrast, in method 2 every frame produces estimates of full reflection intensities independently of the neighboring frames. Therefore, the frame R factors calculated after scaling with method 2 truly represent the data quality for individual frames.

[Figure 7]
Figure 7
R factor as a function of the frame number for the φX174 procapsid data set using (a) method 1 and (b) method 2 with no mosaicity refinement (filled circles) and method 2 with mosaicity refinement (open circles). (Reprinted with permission from Bolotovsky et al., 1998.)

11.4. Observed versus calculated partiality

The relationship between observed and calculated partialities (Fig. 8) deviates from the ideal line [p_{\rm obs} = p_{\rm calc}], especially for the smaller calculated partialities where [p_{\rm obs} \gt p_{\rm calc}]. This suggests errors in measuring [p_{\rm obs}] or calculating [p_{\rm calc}]. The latter may be improved by a post-refinement of the orientation matrix and crystal mosaicity (Rossmann et al., 1979).

[Figure 8]
Figure 8
The observed partialities plotted against calculated partialities for the φX174 procapsid data processed by method 2 with mosaicity refinement. The observed partialities for individual partial reflections were averaged in bins of calculated partialities. The broken line represents the ideal relationship [p_{\rm obs} = p_{\rm calc}]. (Reprinted with permission from Bolotovsky et al., 1998.)
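The averaging of observed partialities in bins of calculated partiality, as used for Fig. 8, can be sketched as follows. The function name and the choice of ten equal-width bins are assumptions made for illustration. The observed partiality of each partial reflection is taken as given; how it is derived from the measured intensity is not specified in this sketch.

```python
def bin_observed_partialities(pairs, n_bins=10):
    """Average observed partialities in equal-width bins of calculated partiality.

    pairs  : iterable of (p_calc, p_obs), one per partial reflection
    n_bins : number of bins spanning 0 <= p_calc <= 1 (assumed value)
    Returns a list of (bin_center, mean_p_obs) for the non-empty bins.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for p_calc, p_obs in pairs:
        if not 0.0 <= p_calc <= 1.0:
            continue  # a calculated partiality outside [0, 1] is ignored
        b = min(int(p_calc * n_bins), n_bins - 1)  # p_calc = 1 goes in the last bin
        sums[b] += p_obs
        counts[b] += 1
    return [((b + 0.5) / n_bins, sums[b] / counts[b])
            for b in range(n_bins) if counts[b]]


# Hypothetical (p_calc, p_obs) pairs.
pairs = [(0.05, 0.10), (0.08, 0.14), (0.55, 0.50), (0.95, 0.97)]
for center, mean_obs in bin_observed_partialities(pairs):
    print(f"p_calc ~ {center:.2f}   <p_obs> = {mean_obs:.3f}")
```

Points lying above the diagonal at small calculated partiality would correspond to the deviation described in the text.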

11.5. Anisotropic mosaicity

Restraint-independent refinement of mosaicity can show both the anisotropic nature of the crystal (Fig. 9) and the impact of radiation damage.

[Figure 9]
Figure 9
Variation of (unrestrained) mosaicity for a monoclinic crystal of the bacterial virus alpha3 showing the crystal anisotropy (data set 4 in Table 3) (Ricardo Bernal, April Burch, Bentley Fane and Michael Rossmann, unpublished data).

11.6. Anomalous scattering

The quality of anomalous dispersion data can be assessed by measuring the scatter [\sigma_{I_{h}}] of measurements of non-centric reflections [I_{h}] and comparing it with the scatter, [\sigma^{+}_{I_{h}}] or [\sigma^{-}_{I_{h}}], of reflections differing only in absorption while excluding Bijvoet opposites. Thus,

[\langle \sigma_{I_{h}} \rangle = (1/h) \textstyle \sum \limits_{h} \{ [1/(n-1)] \sum \limits_{n} (I_{h} - I_{hn})^{2} \}^{1/2},]

with corresponding definitions of [\sigma^{+}_{I_{h}}] and [\sigma^{-}_{I_{h}}]. The ratios [\langle \sigma_{I_{h}} \rangle / \langle \sigma^{+}_{I_{h}} \rangle] and [\langle \sigma_{I_{h}} \rangle / \langle \sigma^{-}_{I_{h}} \rangle] should, therefore, be larger than unity for significant anomalous dispersion data (Fig. 10).
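The ratios defined above can be computed with a few lines of code. The following is a minimal sketch under stated assumptions, not the implementation used for Fig. 10. The function names and the input layout (one dictionary per reflection with `plus` and `minus` lists of repeated measurements) are hypothetical, [I_{h}] inside the sum is taken to be the mean of the n measurements [I_{hn}], and reflections with fewer than two measurements are skipped.

```python
from math import sqrt


def mean_scatter(groups):
    """Mean over reflections of the sample standard deviation of the repeated
    measurements of each reflection (the quantity <sigma> in the text)."""
    sigmas = []
    for meas in groups:
        n = len(meas)
        if n < 2:
            continue  # the scatter is undefined for a single measurement
        mean = sum(meas) / n  # assumption: I_h is the mean of the I_hn
        sigmas.append(sqrt(sum((mean - x) ** 2 for x in meas) / (n - 1)))
    return sum(sigmas) / len(sigmas) if sigmas else float("nan")


def anomalous_ratios(reflections):
    """Return (<sigma>/<sigma+>, <sigma>/<sigma->).

    reflections : list of dicts {"plus": [...], "minus": [...]} holding the
                  measurements of I(+) and I(-) for each non-centric reflection
    """
    s_all = mean_scatter([r["plus"] + r["minus"] for r in reflections])
    s_plus = mean_scatter([r["plus"] for r in reflections])
    s_minus = mean_scatter([r["minus"] for r in reflections])
    return s_all / s_plus, s_all / s_minus


# Hypothetical measurements with a clear Bijvoet difference in the first reflection.
data = [{"plus": [110.0, 112.0], "minus": [90.0, 88.0]},
        {"plus": [52.0, 50.0], "minus": [48.0, 50.0]}]
r_plus, r_minus = anomalous_ratios(data)
print(f"{r_plus:.2f} {r_minus:.2f}")
```

Ratios close to unity would indicate that the Bijvoet differences are comparable to the measurement error.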

[Figure 10]
Figure 10
Quality of anomalous dispersion data for an SeMet derivative of a dioxygenase Rieske ferredoxin (Christopher Colbert and Jeffrey Bolin, unpublished data). Note the much larger scatter among measurements of [I_{h}] for data measured at the absorption edge of Se (filled circles and empty circles) as opposed to measurements remote from the edge (filled squares and empty squares). The decrease of [\langle \sigma_{I_{h}} \rangle / \langle \sigma^{+}_{I_{h}} \rangle] and of [\langle \sigma_{I_{h}} \rangle / \langle \sigma^{-}_{I_{h}} \rangle] with resolution is a consequence of the decrease of [I_{h}] values, which causes the error in measurements of [I_{h}] to approach the difference in intensity of Bijvoet opposites (measured by the inverse-beam procedure to eliminate absorption error).

12. Availability of source code

The autoindexing program is written in C, has been implemented on an SGI O2 workstation and its source code is available via the WWW (http://bilbo.bio.purdue.edu/~viruswww/Rossmann_home/rstest.html). The run time is sufficiently short for the autoindexing procedure to be run interactively.

The generalized procedure for scaling and averaging crystallographic data with partial reflections has been implemented as a C-language program SNP and tested on various data sets collected from crystals of biological macromolecules (Table 3). The source code is available via the WWW (http://bilbo.bio.purdue.edu/~viruswww/Rossmann_home/rstest.html).

Acknowledgements

This paper is largely based on two previous publications (Steller et al., 1997; Bolotovsky et al., 1998) concerning autoindexing and scaling and representing the work of Ingo Steller and Robert Bolotovsky, respectively, while postdoctoral fellows at Purdue University. We are very grateful for the support given to the development of DPS by Chris Nielson of ADSC and the staff of MacCHESS (including Steve Ealick, Dan Thiel and Marian Szebenyi) at Cornell University. Also, we would like to thank our colleagues at Purdue University and elsewhere who have provided many helpful suggestions. We are also anxious to acknowledge the outstanding help of Sharon Wilder in many parts of the work, including the preparation of this manuscript. This work was supported by a National Science Foundation grant (MCB-9527131) to MGR.

References

Arndt, U. W., Champness, J. N., Phizackerley, R. P. & Wonacott, A. J. (1973). J. Appl. Cryst. 6, 457–463.
Arndt, U. W. & Wonacott, A. J. (1977). The Rotation Method in Crystallography. Amsterdam: North-Holland.
Blessing, R. H. (1997). J. Appl. Cryst. 30, 421–426.
Bolotovsky, R. & Coppens, P. (1997). J. Appl. Cryst. 30, 65–70.
Bolotovsky, R., Steller, I. & Rossmann, M. G. (1998). J. Appl. Cryst. 31, 708–717.
Bricogne, G. (1986). Editor. Proceedings of the EEC Cooperative Workshop on Position-Sensitive Detector Software (Phase III), pp. 28. Paris: LURE.
Burzlaff, H., Zimmermann, H. & de Wolff, P. M. (1992). International Tables for Crystallography, Vol. A, edited by T. Hahn, pp. 738–749. Dordrecht: Kluwer Academic Publishers.
Campbell, J. W. (1997). CCP4 Newslett. 33, 5–16.
Choi, H. K., Lee, S., Zhang, Y. P., McKinney, B. R., Wengler, G., Rossmann, M. G. & Kuhn, R. J. (1996). J. Mol. Biol. 262, 151–167.
Choi, H. K., Tong, L., Minor, W., Dumas, P., Boege, U., Rossmann, M. G. & Wengler, G. (1991). Nature (London), 354, 37–43.
Dokland, T., McKenna, R., Ilag, L. L., Bowman, B. R., Incardona, N. L., Fane, B. A. & Rossmann, M. G. (1997). Nature (London), 389, 308–313.
Duisenberg, A. J. M. (1992). J. Appl. Cryst. 25, 92–96.
Fox, G. C. & Holmes, K. C. (1966). Acta Cryst. 20, 886–891.
Gewirth, D. (1996). The HKL Manual. A Description of the Programs DENZO, XDISPLAYF and SCALEPACK, 5th ed., pp. 87–90. New Haven: Yale University.
Greenhough, T. J. & Helliwell, J. R. (1982). J. Appl. Cryst. 15, 338–351.
Hamilton, W. C., Rollett, J. S. & Sparks, R. A. (1965). Acta Cryst. 18, 129–130.
Higashi, T. (1990). J. Appl. Cryst. 23, 253–257.
Kabsch, W. (1988). J. Appl. Cryst. 21, 67–71.
Kabsch, W. (1993). J. Appl. Cryst. 26, 795–800.
Kim, S. (1989). J. Appl. Cryst. 22, 53–60.
Leslie, A. G. W. (1992). Crystallographic Computing 5. From Chemistry to Biology, edited by D. Moras, A. D. Pojarny & J. C. Thierry. Oxford University Press.
Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307–326.
Rossmann, M. G. (1979). J. Appl. Cryst. 12, 225–238.
Rossmann, M. G., Arnold, E., Erickson, J. W., Frankenberger, E. A., Griffith, J. P., Hecht, H. J., Johnson, J. E., Kamer, G., Luo, M., Mosser, A. G., Rueckert, R. R., Sherry, B. & Vriend, G. (1985). Nature (London), 317, 145–153.
Rossmann, M. G. & Erickson, J. W. (1983). J. Appl. Cryst. 16, 629–636.
Rossmann, M. G., Leslie, A. G. W., Abdel-Meguid, S. S. & Tsukihara, T. (1979). J. Appl. Cryst. 12, 570–581.
Steller, I., Bolotovsky, R. & Rossmann, M. G. (1997). J. Appl. Cryst. 30, 1036–1040.
Vriend, G. & Rossmann, M. G. (1987). J. Appl. Cryst. 20, 338–343.
Winkler, F. K., Schutt, C. E. & Harrison, S. C. (1979). Acta Cryst. A35, 901–911.
Xuong, N., Kraut, J., Seely, O., Freer, S. T. & Wright, C. S. (1968). Acta Cryst. B24, 289–290.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.