Robust indexing for automatic data collection

Improved methods for indexing diffraction patterns from macromolecular crystals are presented. The novel procedures include a more robust way to verify the position of the incident X-ray beam on the detector, an algorithm to verify that the deduced lattice basis is consistent with the observations, and an alternative approach to identify the metric symmetry of the lattice.


Introduction
Large-scale macromolecular crystallography projects, including structural genomics efforts (Stevens et al., 2001), are placing increasing demands on synchrotron beamline facilities worldwide. In response, new methods are being developed to increase ef®ciency and throughput. Many beamlines are now being equipped with sample-handling robots Snell et al., 2004), and new graphical user interfaces provide the experimentalist with¯exible control over the data acquisition process (McPhillips et al., 2002). Additional efforts are under way to provide a measure of automation to the subsequent stages of data reduction (Leslie et al., 2002) and structure solution Brunzelle et al., 2003).
Availability of convenient crystal handling at the beamline has enabled users to perform rapid screening experiments, wherein large numbers of similar crystals are brie¯y examined, with the best ones being identi®ed for later collection of full data sets. Reasons to screen multiple samples include optimization of the cloning, expression, and puri®cation techniques involved in protein production, determination of the most favorable crystallization and cryocooling conditions, and investigation of large numbers of macromolecule±ligand complexes. Furthermore, even crystals prepared under identical conditions can be heterogeneous. For each sample, the typical protocol involves acquiring a narrow (0.25±1.0 rotation) oscillation image at a standard distance, or if time permits, acquiring two images at rotation settings separated by 90 , to reject crystals with unacceptably high anisotropic diffraction. If control is fully automated, the data can be acquired in 1±2 min per sample.
Once acquired, images must be analyzed to determine if the diffraction pattern can be indexed. Indexing the crystal lattice and determining the likely Bravais symmetry permits the user to predict whether a given crystal can potentially yield a complete data set. This step must be completed in real time so that the user can select samples for further study. Presently there are several software packages available to assist with this (P¯ugrath, 1997, 1999Otwinowki & Minor, 1997;Leslie, 2001;Kabsch, 2001). Most include graphical user interfaces, which serve the important function of allowing the experimenter to visually con®rm whether the diffraction pattern predicted by indexing matches the observations. If indexing is to be eliminated as the rate-limiting step, it must be very reliable even in the absence of this visual inspection. Indeed, it would be ideal if the indexing program ran in the background concurrently with data collection, so that results appear in close to real time as the images are acquired. Naive attempts to automate this process with shell scripts, however, revealed systematic problems with determining the beam center, lattice basis and symmetry. As we will show, these common problems can be detected algorithmically and corrected automatically. We have developed LABELIT (Lawrence Berkeley Laboratory Indexing Toolbox), a Python/C++ package capable of handling dif®cult indexing cases.

Computational methods
The overall approach to indexing is summarized in the¯ow chart in Fig. 1. After the brightest Bragg spots are selected, spot positions are converted to reciprocal space, and the spot distribution is analyzed to detect periodicities corresponding to lattice spacings. The analysis of lattice spacings also gives suf®cient information to improve the initial estimate for the direct-beam position. Three vectors showing signi®cant periodicity in the spot distribution are chosen as the three basis vectors of the unit cell. A computational check is made to assure that the basis forms a primitive lattice rather than one in which some predicted spots are systematically absent. Once the model parameters are re®ned in a triclinic setting, tests are performed on the unit-cell dimensions to detect possible symmetry elements, which are then combined to produce a list of Bravais types consistent with the data. Cell re®nement with symmetry-based restraints produces a ®nal set of indexing solutions suitable for other procedures, such as determining the data collection strategy, or integrating the data set.
Although Fig. 1 depicts data analysis running continuously through from start to ®nish, LABELIT will terminate and provide a report of dif®culties if the input data are unsuitable for processing at any step.

Choice of candidate Bragg reflections for indexing
In order to facilitate indexing, an effort is made to select candidate spots that are most likely to be Bragg re¯ections. The algorithm, DISTL, is described in detail elsewhere (Zhang et al., 2004). Of all local maxima on the image, only those with a peak height higher than a cutoff multiple of the local background noise are considered, thus eliminating weak re¯ections. The image is then divided into very thin concentric shells centered around the direct-beam position, and the pixel intensity distribution is examined to eliminate ice or powder rings. Filters are then applied based on spot size, intensity and shape. Candidate Bragg re¯ections are expected to have a unimodal distribution of pixel intensities; that is, a single peak rather than a group of closely associated local maxima. Therefore, spots with more than two local maxima are rejected; those with two are permitted because some mediumintensity spots have bimodal distributions due to statistical noise. Also, pairs of spots are rejected when they are separated by less than 1.2 diameters, assuring that artifacts of crystal splitting do not degrade the indexing process. The upper resolution limit is chosen conservatively (Method 2 of Zhang et al., 2004) to avoid re¯ections at very high diffraction angles, which might cause indexing to fail (if the unit-cell dimensions are large enough, high-resolution spots can overlap, making it dif®cult to assign accurate Miller indices). If fewer than 40 good spots on any image remain, no attempt is made to index the lattice. Ideally, 300 spots with the highest signal-to-noise ratios are chosen on each image.
Ignoring the weakly diffracting Bragg re¯ections poses a potential problem if there is pseudo-centering in the lattice, which creates systematically weak re¯ections that can be overlooked. Unfortunately, it is not possible to correct this situation simply by including all weak re¯ections, since the indexing process would likely be overwhelmed by experimental noise. We favor the approach of initially indexing with the brightest Bragg re¯ections, and then performing a targeted search for systematically weak spots if the deduced lattice is centered. Although we have not implemented this yet, we plan to automate this process with LABELIT in the near future.

Detection by Fourier analysis of likely basis vectors
The crystal lattice is deduced starting with the set of N candidate Bragg spots determined according to x2.1. At the outset the position of each spot on the detector is measured. The rotational setting ' at which the spot satis®es the re¯ection conditions is not known accurately, but is taken here to be the mid-value of the oscillation range. These positional and angular data are then used to derive the reciprocal-space position x for each spot, following the conventions of Rossmann (1979). In cases using images at two different rotation settings (usually 90 apart in ', where ' is the goniometer rotation), the respective lists of reciprocal-space vectors are merged. It should be noted that merging the lists in this way implicitly ignores the possibility that spots at the same detector coordinates on adjacent images might be partial slices through the same Bragg peak. We avoid this problem by choosing not to index pairs of images that are adjacent or separated by less than 4 in '.
The procedure of Steller et al. (1997) is then used to determine which unit directions t are likely to form basis vectors for the periodic crystal lattice. Although we use the published method exactly, it is summarized here to provide a foundation for the following section. For every direction t chosen from a large set evenly spaced within the hemisphere, the projections p are calculated for all observed reciprocalspace points x onto t, The range of p values is divided into m bins, each having an appropriately granular width Áp. This allows us to construct a reciprocal-space frequency series f(j), giving the number of observed projections in the jth interval. Peaks in this series ( Fig. 2a) suggest the locations of possible reciprocal-lattice planes perpendicular to t. To ®nd out how well the observed reciprocal-space points are described by periodic planes, one can take the discrete Fourier transform Overall indexing procedure.
Peaks in the power spectrum |F(k)| correspond to strong periodicities along t (Fig. 2c). In particular, the ®rst and largest peak at k = l (not counting the peak at k = 0) is used to quantify the periodicity. After compiling a short list of t directions having the largest values of |F(l)|, each of these directions is re®ned with a ®ne grid search to maximize the |F(l)| value. Finally, directions that are collinear duplicates are rejected and the list is resorted, giving a set of $20 candidate directions {t}. Each candidate represents a possible unit-cell basis vector with real-space length d l=mÁp: 3 Only three directions are eventually chosen as true basis vectors (x2.4). The other directions in the set represent either linear combinations of the true basis vectors, lattice vectors due to the presence of a second crystal, or false periodicities.

A fast grid search to improve the direct-beam position
Since the direct X-ray beam often cannot be directly observed on the diffraction image, its position must be determined independently. This raises the possibility of systematic error in one's prior belief about the beam position. If the prior belief is inaccurate, this will be re¯ected in the calculated reciprocal-space coordinates x of each observed Bragg re¯ection (Rossmann, 1979) and this will in turn compromise the autoindexing procedure. However, it is possible to use the Fourier coef®cients F(k) derived from the given X-ray beam position to infer a better estimate of the true beam position. As will be described in detail in x3.1, a better estimate can be derived even if the input beam position differs from the true beam position by more than the smallest interspot spacing.
Recall that in general the frequency series f(j) may be recovered by the backwards Fourier transform where the complex coef®cients F(k) exhibit Hermitian symmetry, F(k) = F(m À k)*. Assume now that a particular direction t 0 P {t} is a basis vector for the unit cell, and therefore exhibits strong periodic groupings when the reciprocal-space points are considered in projection (Fig. 2a). The essential information regarding this periodicity can be effectively modelled with the single Fourier coef®cient F(l). By replacing the Fourier sum in equation (4) with a single term where k = l, f(j) can be approximated with the simpli®ed expression f j 9 Pj cos l 2jl=m; 5 where l is the polar angle in the conventional representation F(l) A l expi l .
As expected, crests in the sinusoidal plot of equation (5) (Fig. 2b) closely correlate with the projections of observed reciprocal-lattice points in Fig. 2(a). Since the true beam position coincides with a reciprocal-lattice plane at x = 0, it will fall on one of the crests. This implies that P(j) in equation (5) can be viewed as giving the un-normalized conditional probability (in the Bayesian sense) that the true beam center projects onto direction t 0 at interval j, given the prior belief about the beam center position.
The true beam center cannot be deduced from equation (5) alone. As depicted in Fig. 2(b), each of several crests could potentially correspond to the true beam. Moreover, even if a particular crest at interval j is singled out, an in®nite number of possible beam positions in the laboratory frame would be associated with this interval. This situation is summarized in Fig. 3(a), a contour map showing the probability that a given pair of laboratory coordinates is the true beam center. However, if three or more linearly independent directions t 0 , t 1 , t 2 P {t} are considered simultaneously, the possible laboratory coordinates of the true beam center can be constrained to a small set of points, as shown in Fig. 3(b). This contour plot has been computed as the sum of Fig. 3(a) and probability maps from two additional directions. For this example, the three directions t 0 , t 1 , t 2 were selected to be the primitive axes of the unit cell.
In practice, one does not know which directions t 0 , t 1 , t 2 will ultimately be chosen as basis vectors. Instead, probability fringes from all directions in the set {t} can be combined to help assure that the beam position is suf®ciently overdetermined. An example of the resulting probability map is shown in Fig. 3(c). As will be discussed in x3.1, the search for the true beam position is con®ned to an area (indicated by a black circle in Fig. 3c) within a radius S centered on the priorbelief beam center. Within this search radius, the true beam position is taken to be the peak position of the largest cluster, where clusters are ranked by integrated area. In the example illustrated, the true beam position is at the center of the panel.
Note in Fig. 3(c) that if the search radius S is made too large, there is the potential of misidentifying one of the other red Analysis of observed Bragg re¯ections projected in reciprocal space onto a direction later determined to be the a axis of the sample's orthorhombic unit cell. Panel (a) shows the histogram f(j) of re¯ections projected onto this axis, while panel (c) gives its power spectrum. The peak at l = 18 corresponds to a lattice periodicity along this axis of 36.4 A Ê . The single Fourier coef®cient F(18) can be used [equation (5)] to create an approximate model for f(j), shown in panel (b). Note that the projection of the true beam position onto the axis, indicated by an asterisk (*), corresponds exactly to a crest in this model. Panels (a)  peaks (not the one in the center) as the true beam position. This ambiguity disappears if data are available from two images collected at rotational settings 90 apart. If probability fringes from two images are combined, an unambiguous true beam position may be obtained over a much larger search radius (Fig. 3d).
For use in the next section, the directions {t} must be improved based on the new estimated beam position. When two images are used, it is suf®cient simply to re-re®ne each existing t using the ®ne grid search mentioned in x2.2. However, if only one image is used for indexing, the t directions are poorly determined at this point, and are unreliable when used for subsequent indexing steps. In this case, a completely new search for t directions is performed, with the new beam position as input.

Indexing the diffraction pattern
From the set {t}, three unit directions and associated realspace lengths [equation (3)] can now be chosen to form basis vectors for the unit cell: It is convenient to express this basis as the matrix of cell-axis components taken with respect to the orthogonal camera axes (Rossmann, 1979), calculated when ' = 0. Also, the inverse of this matrix gives the reciprocal-space components: With these component matrices in hand, it is now possible to take the reciprocal-space coordinates of each observed Bragg spot, which were previously expressed in the reciprocal orthonormal basis (x in x2.2), and re-express them in terms of the reciprocal unit-cell basis vectors: is a rotation matrix around the camera's spindle corresponding to the ' setting of the particular image. Integervalued Miller indices h are computed by rounding the realvalued components of f. It is important to realise that this simplistic method for calculating h does not account for the ®nite oscillation range Á' of each diffraction image, which tends to produce overlapping spots whose Miller index assignments are ambiguous. To guard against this, f is recalculated at each limit of the oscillation range (' + Á'/2 and ' À Á'/2). If the same value of h is not produced in each case, the particular Bragg spot under consideration is disregarded for all subsequent calculations. Similarly, Bragg spots are ignored when they are too close to the rotation spindle for accurate evaluation.

Detection and correction of a non-primitive basis
As noted in x2.3, the set {t} is most likely to contain vectors that can form a primitive basis of the crystal lattice. However, it is also possible for {t} to contain directions that are linear combinations of primitive basis vectors, which should not be used to index the lattice. Fig. 4(a) illustrates a case where choosing a group of three highly ranked directions (i.e. directions with large |F(l)| values) leads to misindexing. The apparent reciprocal cell basis (Fig. 4a, inset) predicts too many Bragg spots; further inspection reveals that re¯ections are only observed when h + l = 2n, where n is an integer. In general, the data must be scrutinized to ®nd re¯ection conditions of the form g Á h g 0 h g 1 k g 2 l Mn; 9 where g 0 , g 1 and g 2 are small integer coef®cients and the modulus M is a small prime number, usually 2, 3 or 5. A transformation must then be applied to the incorrect basis [a* H , b* H , c* H ] to produce the primitive one, Map of a small portion of the imaging detector centered on the true beam position, depicting conditional probability contours for the location of the direct beam, given an input beam position (black dot). This particular input beam position is 0.8 mm from the true value. Probabilities were determined using the projections of reciprocal-space points from one image onto (

The matrix [T] must have integer coef®cients and determinant
M, but is otherwise not uniquely determined. We propose the following algorithm to enumerate the re¯ection conditions [g, M] and the associated transformations [T].
Step 1. Construct the list G 0 of all possible integer 3-tuples, with each tuple element having a magnitude up to the maximum expected modulus M; typically g = (À5 g 0 5, À5 g 1 5, À5 g 2 5). Sort the list in order of increasing |g|, and omit g = (0, 0, 0).
Step 2. Construct G 1 , a copy of G 0 . Remove all items in G 1 having g Á g larger than a cutoff value, usually 6. Also delete tuples collinear to items higher on the list, i.e. g and g are collinear if g Â g = 0. The remaining elements of G 1 are possible values for g in equation (9). Note that the cutoff values M 5 and g Á g 6 are chosen empirically to permit the indexing of a large test set of diffraction images using the fewest computational cycles. These cutoffs give 37 g values in combination with three prime moduli (2, 3, 5) to give 111 formulae for re¯ection conditions.
Step 3. For each re¯ection condition [g, M] construct the matrix [T]. Begin by considering that every point on the reciprocal lattice obeys equation (9). In particular, the correct basis vectors will also follow this rule. Therefore, for the ®rst row of [T] (the coef®cients giving the correct basis vector a*) pick the ®rst item g P G 0 that satis®es g Á g = Mn. Tuple g becomes the ®rst row of [T]. For the second row of [T], select the ®rst item g P G 0 not collinear with g and satisfying the equation g Á g = Mn. Finally, the third row of [T] becomes the ®rst item g P G 0 not coplanar with g and g , and satisfying the equation g Á g = Mn. If the determinant of [T] is negative, the ®rst and second rows are switched.
In order to decide whether a basis set is primitive, each of the 111 re¯ection conditions enumerated in Step 2 is separately considered. Since the sample size of candidate Bragg spots is at most 600, the computational loop through all conditions is quite rapid ($7 ms on a 2.8 GHz Intel Xeon processor). Allowing for a generous percentage of outliers (typically 20%), if the remaining spot candidates ful®ll equation (9), a match is declared and the basis set is transformed with [T]. In Fig. 4(b), the correct basis set is immediately obtained from the matrix The loop is then repeated to search for any further systematic absences. Repetition of the loop allows us to limit the procedure to prime number moduli.

Selecting the best basis combination
The choice of basis vectors noted in equation (6) is not unique, since there are $20 directions in the set {t} to choose from. Certain combinations of directions may be immediately ruled out since they lead to unit cells with nearly zero volume. Speci®cally, if V is the cell volume and a, b and c are the cell axis lengths, a cutoff requirement that V > abc/100 does not lead to a signi®cant loss of generality. For the remaining candidate bases, primitiveness is imposed (x2.5) and the basis choice is scored using a number of measures:  entering into the calculation of (a) after overlaps and axial spots are removed; (c) the root-mean-squared difference between observed and predicted laboratory coordinates of the candidate Bragg spots (r.m.s.d.); and (d) the fraction of candidate Bragg spots correctly predicted by the basis choice. Measures (a)±(d) provide the raw comparisons needed to pick a single high-scoring basis for all further work, with measures (c) and (d) being most useful.
The heuristic for choosing the basis was formulated empirically with the goal of producing a correct indexing solution in the least amount of computational time. With a well indexing crystal, many combinations of directions from {t} lead to similarly high scores, and give nearly identical unit-cell parameters. In such cases it is suf®cient to try only a few combinations before making the ®nal basis selection. With poor crystals it is often necessary to do an exhaustive search of possible basis vectors from {t} before choosing the best basis, or deciding that the diffraction pattern cannot be indexed. The actual method in the package represents an attempt to accommodate these two extremes, with the realisation that further ®ne-tuning may be bene®cial. The implementation is presented as a high-level script in order to allow future changes or adaptations by the end user.

Cell reduction and refinement
For subsequent symmetry determination, it is necessary to calculate the transformation to a reduced basis as discussed in x9.3 of the International Tables for Crystallography, Vol. A (Burzlaff et al., 1996). The essential requirement for the subsequent steps is that the reduced basis has vectors of minimum length. The reduced basis de®ned in the International Tables for Crystallography ful®lls this requirement, but conventional iterative cell reduction algorithms leading to Buerger-reduced cells (Buerger, 1957;Gruber, 1973) or Niggli-reduced cells (Kr Ïivy Â & Gruber, 1976) are numerically unstable. A comprehensive treatment of this problem is given by Grosse-Kunstleve et al. (2004a). Except where otherwise speci®ed below, we adopted the minimum reduction presented in that work because it is fast and combines numerical stability with maximum portability.
After reduction, 12 model parameters are re®ned using conjugate-gradient minimization, with the minimization target being the root-mean-square difference (r.m.s.d.) between the observed and predicted Bragg spot positions introduced in x2.6. It should be emphasized that the target function includes only abstracted information about the positions of the $600 spots chosen for autoindexing; no information is present about pixel intensities on the original image. The ®rst round of minimization adjusts the x and y coordinates of the direct beam, the next adds the crystal-to-detector distance, and the ®nal round adds the nine components of the [A] matrix. First derivatives of the target function with respect to each parameter are calculated for the LBFGS minimization algorithm (Liu & Nocedal, 1989) implemented in our package CCTBX (Grosse-Kunstleve et al., 2002). After minimization, the minimum reduction is applied again.
An estimate of the effective mosaic spread of the crystal is obtained separately. Diffraction patterns are calculated (Rossmann, 1979) using many trial values of mosaic spread ranging from 0 to 1.5 . The effective mosaic spread is taken to be the minimum value, which correctly predicts the observed positions of 80% of the $600 Bragg spot candidates used for indexing. The 80% requirement is chosen to allow a small fraction of outliers due to non-Bragg scattering or other pathologies. If the image is so poor that no value of mosaicity less than 1.5 will cover 80% of the spots, no further estimate is made.

Determination of the metric symmetry
With the diffraction pattern indexed, the ®nal issue addressed is the crystal symmetry. Knowing the reduced basis, it is possible to ®nd the highest possible Bravais symmetry consistent with its metric properties. Lower-symmetry lattices cannot be ruled out until the Bragg spot intensity data are scaled.
A fundamental concern here is that all data derived from experimental observations have some degree of uncertainty, leading to an imperfect knowledge of the reduced basis. Since lattice symmetry is by de®nition exact, any algorithm to deduce the Bravais type must necessarily use tolerances when testing symmetry conditions. We evaluated two distinct procedures against this criterion, ultimately adopting one of them.
In method A, the Niggli-reduced basis (Kr Ïivy Â & Gruber, 1976) is classi®ed in terms of the 44 lattice characters listed in Table 9.3.1 of the International Tables for Crystallography (Burzlaff et al., 1996). This approach tests elements of the metric tensor For example, one of the primitive tetragonal characters requires that (i) A = B, (ii) D = E = F = 0, and (iii) DEF 0. Equality tests such as (i) are evaluated within a tolerance parameter, set to 4% to accommodate normal levels of experimental uncertainty. For testing orthogonality conditions such as (ii) D = 0, similar reasoning implies that this expression should be considered true if the magnitude of the direction cosine between b and c is <0.04. It is more dif®cult to accommodate experimental uncertainty in evaluating condition (iii). If D = 0 or E = 0 or F = 0 (within 4% tolerance) then expression (iii) must be forced to be true even if DEF is numerically > 0. When necessary, it is possible to impose this condition by pre-multiplying two of the three basis vectors (a, b, c) by À1. While these procedures are adequate for most cells, we were able to identify cases where in®nitesimal uncertainties in the basis vectors caused the Bravais type to be misidenti®ed (x3.3). We therefore adopted method B, in which the full Bravais symmetry is generated from a list of twofold rotational axes calculated from either the minimum-reduced or Nigglireduced basis. Given the cell dimensions, the procedure of Le Page (1982) can identify a twofold axis by asking whether normal vectors to sets of real-and reciprocal-space planes coincide within a given angular tolerance . The tolerance is normally set to 1.4 , the minimum value needed to accommodate a wide number of test data sets. It is only necessary to consider a prede®ned list of 1379 pairs of normals to exhaustively ®nd all candidate twofolds, and the code to implement this is fast and compact. Each discovered twofold is then converted to a matrix operator representation (Grosse-Kunstleve et al., 2004b). These symmetry operators are used as generators in a group-multiplication procedure to produce the complete symmetry group, which in turn is identi®ed as one of the 14 Bravais types. An auxiliary procedure lists all possible subgroups. These are ranked by the maximum tolerance needed to accommodate all the twofold rotational axes of the subgroup. Transformations to standard settings are determined automatically according to Grosse-Kunstleve (1999).

Final restrained minimization
With the list of candidate Bravais settings, ®nal parameter minimizations are performed (one for each candidate setting) to impose the metric conditions implied by the symmetry. Conjugate-gradient minimization is performed on the 12 parameters described above (x2.7), plus up to ®ve additional restraints derived from symmetry. The derivation of these restraints is discussed in a separate paper (Grosse-Kunstleve et al., 2004b). At this point, it is sometimes possible to rule out the highest-symmetry candidate settings; for example, if the highest-symmetry setting produces a re®ned residual twice as high as that for the triclinic setting.
Indexing is now complete, with the ®nal result being a set of parameters for each of the remaining candidate Bravais types. Optionally, these parameters may be converted into ®les suitable for MOSFLM (Leslie, 2001) input, so that Bragg re¯ections can be integrated and further analysis performed.

Estimation of the direct-beam position is robust
It is well known that indexing relies critically on knowing the true position m at which the incident X-ray beam intersects the imaging detector. Although the beam coordinates can be re®ned to some degree after the indexing solution is found, the initial error Ám in the position must be small enough to converge to the correct solution. The zone of convergence may be estimated by considering the unit-cell dimensions of the crystal. If the reduced basis vectors have lengths a, b, c, then the smallest spot-to-spot separation in the low-angle portion of the image will be of order L 9 D/ max (a, b, c), where is the X-ray wavelength and D is the crystal-to-detector distance. Clearly, if Ám is of order L or higher, it will not be possible to index the diffraction pattern correctly.
To determine if the zone of convergence could be extended using the Fourier coef®cient method, we considered a much larger region on the diffraction image, within a radius of 2.5L of the true beam center. In this experiment, a rectangular grid was superimposed onto this region, and each point r was separately treated as a prior-belief beam center, in a grid search for the true beam center m using the method outlined in x2.3. To determine if r lay within the zone of convergence, the true beam position was sought within a radius S = 1.3|r À m|. Determination of the new putative beam position was followed by indexing, least-squares parameter re®nement, and Bravais lattice selection as outlined above (xx2.4±2.8). The procedure was considered successful for a particular point r if it gave the correct beam center, Bravais lattice and unit cell. In a control experiment, we used the grid point r directly for the indexing step, relying only on subsequent least-squares parameter re®nement to improve the beam position.
A typical set of results is shown in Fig. 5. As expected, the control experiment produces a reliable solution only with an initial beam position error Ám 0.4L (Figs. 5a and 5b). With the Fourier coef®cient method, this particular image can be reliably indexed with any Ám 0.6L (Fig. 5c). The results improve even more dramatically when Bragg spots are combined from two images with ' settings 90 apart (Fig. 5d). In this case, the correct solution is obtained whenever Ám 1.2L = 1.8 mm. Note that to produce this result from the analysis of two images, one must assume that the detector remains stationary throughout the experiment. This is generally true for modern charge couple device (CCD) detectors and stationary phosphor imaging plates, but not for earlier detectors such as ®lm or manually exchangeable imaging plates.
This simulation suggests that the beam position is likely to be discovered if Bragg spots are combined from two images and the search radius is set to S = L. Since the reduced basis is not available a priori to calculate L, it is reasonable instead to use the candidate cell lengths {d} corresponding to each direction in the set {t}. The search radius is set to S D=maxfdg: 13 At present, knowing the beam center is generally considered to be a solved problem. Synchrotron beamlines, for example, often use a separate experiment to determine the beam coordinates before the crystal sample is exposed. However, there is a small but ®nite failure rate associated with such procedures. Invariably the ®nal analysis requires visual inspection of the diffraction image, to con®rm that the indexing solution agrees with the observed Bragg spots. If indexing fails, the correctness of the direct-beam position is typically the ®rst item to be checked. In future applications, such as automated data collection, it will be necessary for indexing to occur automatically and nearly¯awlessly, without time-consuming manual intervention. The Fourier coef®cient method is attractive in this regard. It relaxes the stringency with which the beam center must be determined by other methods, and its built-in grid search effectively replaces the trial-and-error methods of visual inspection. It will be especially useful for crystals with large unit cells, where the indexing of closely spaced Bragg re¯ections is sensitive to small errors in the prior beam estimate.
In the preceding discussion, it is understood that the priorbelief beam position is obtained from information tabulated in the image ®le header, created at the time the data are acquired. A well known issue is that this tabulated position may be expressed in eight possible coordinate systems, with various detector manufacturers having chosen among the different conventions (Gewirth, 2003). It is important for the coordinate system to be properly identi®ed prior to indexing, as it enters into the calculation of reciprocal-space coordinates x. Once this determination has been made, it is normally applicable to all images collected with a particular detector at a particular beamline.

Re-indexing solves a long-standing problem
Any procedure for deducing the lattice basis from experimental data must address the question of whether the resulting basis is primitive (Fig. 4). Caution is required because the candidate basis vectors in the list {t} are necessarily ranked by their ability to describe spatial periodicity in the data, that is, in order of highest |F(l)|, and not by shortest real-space vector length d. Yet it is imperative that the chosen vectors have minimal real-space length to assure that they are not linear combinations of the true basis vectors, as described in x2.5. Our test suite of 177 successfully indexed crystals provides numerous cautionary examples: in 90 of these samples, at least one combination of high-ranking candidate basis vectors [equation (6)] needs to be corrected by the procedure of x2.5, to assure proper indexing in the triclinic setting. Although this problem has long been alluded to in the literature (see for example x5.1 of Henry & Lonsdale, 1965), it has not, to our knowledge, been addressed directly by published procedures for indexing macromolecular diffraction data. In the paper by Steller et al. (1997), for example, basis vectors are chosen by the particular criterion of minimizing the number of re¯ections for which any component of f À h (x2.4) is greater than 0.2. We have found that using this criterion (or any of the other criteria suggested in x2.6) produces misindexing of the type illustrated in Fig. 4(a), in a small percentage of cases.
To compensate for this lack of adequate methods, macromolecular crystallography users have relied until now on interactive graphical indexing programs, which allow indexing by trial and error. If a basis set looks incorrect, program parameters such as the beam center and crystal-to-detector distance are slightly altered, which has the effect of producing a different basis choice. In the present context, however, we aim to index crystal lattices without the need for visual inspection of the result. The test presented in equation (9) permits automatic recognition of this important class of misindexed results, thus improving the reliability of any automated system.

Correct identification of rhombohedral symmetry by detection of twofolds
We now examine an instance in which correct identi®cation of the primitive basis can nevertheless lead to improper identi®cation of the lattice symmetry, when method A (x2.8) is used for symmetry determination. Consider case (i), a hexagonal rhombohedral setting, with a = b = 10, c = 30, as depicted in Fig. 6(a), with the Niggli-reduced basis (a, b, c) as indicated. Using this basis to determine the metric symmetry of the unit cell, it can be shown algebraically that the metric tensor [equation (12)] satis®es these conditions: A = B; D = E = F = A/2. Looking for these conditions in Table 9.3.1 of the International Tables for Crystallography (Burzlaff et al., 1996), one indeed discovers that they de®ne character #9, one of four hexagonal rhombohedral characters. Note that since there is no prior information to bias the choice of basis, there is no guarantee that (a, b, c) will be selected as a candidate basis for the cell; in fact, it is equally likely that the vectors (a H , b H , c H ) will be selected as shown in Fig. 6(b), case (ii). Although this basis is not reduced, application of the Kr Ïivy Â & Gruber (1976) Figure 5 Ability to index the diffraction pattern as described in x3.1, given an input beam position at various coordinates on the detector. The portion of the detector shown is the same as in Fig. 3, with the true beam position shown as a black dot. The sample is the same as in Figs. 2 and 3, with the lowangle spot-to-spot separation from the longest cell dimension being L = 1.5 mm. Red pixels are input beam positions that lead to correct re®ned beam parameters, unit-cell dimensions and Bravais symmetry. Positions where otherwise correct indexing gives monoclinic symmetry (instead of orthorhombic) are given in yellow. Incorrect indexing is shown in lavender, and white pixels indicate that no indexing was possible. The ®rst two panels show the control where the input beam position is used directly for indexing, with Bragg re¯ections from either one (a) or two (b) images. Note that in the single image used for panel (a), the c axis is parallel to the incident beam. Consequently, two lattice angles are poorly determined, accounting for the preponderance of input beam positions giving monoclinic rather than orthorhombic symmetry. When two images are used (b) this ambiguity disappears. The lattice-like arrangement of lavender patches in panel (a) corresponds to the lattice of probable (but incorrect) beam positions in Fig. 3(c). In the last two panels, the true beam position is predetermined by a grid search, based on one (c) or two (d) images. algorithm immediately recovers the Niggli-reduced basis (a, b, c), and again the conditions are ful®lled for rhombohedral character #9. Now we include experimental uncertainty in the basis vectors. Again considering case (i), beginning with basis (a, b, c) we can ask what will happen if vectors b or c are altered by small angular amounts. For either vector, a small perturbation does not alter the fact that the basis is in Nigglireduced form. Method A (x2.8) allows a tolerance on the metric conditions, thus the ®nding that the likely lattice is of character #9 is unchanged. This is exactly as expected: a small uncertainty should not change the ®nal conclusions about symmetry. In contrast, suppose that the starting point is the case (ii) basis set (a H , b H , c H ), and consider perturbations of either b H or c H in the ab plane as shown in Fig. 6(c). Here the situation is quite different: application of the Kr Ïivy Â -Gruber algorithm gives the Niggli-reduced basis (a, b, c) when either b H or c H is rotated slightly counterclockwise, but a different Niggli-reduced basis when either is rotated clockwise. The alternate reduced basis satis®es different metric conditions, giving a monoclinic C-centered cell. Thus in case (ii), both experimental uncertainty and in®nitesimally small theoretical perturbations can lead to misidenti®cation of symmetry under method A.
It is appropriate to ask under what experimental conditions this can occur. We investigated the ability to determine the symmetry of a crystal in space group R32, with a = b = 143 A Ê , c = 519 A Ê . As explained in x2.6, the candidate basis (three directions) is selected from the set {t}, based on a ranking of how well the basis vectors can predict the input Bragg spots.   Determination of metric symmetry from a single oscillation image as a function of the input beam position. The true beam position is indicated by a black dot. Green pixels indicate correct determination of rhombohedral symmetry, while yellow denotes input parameters that incorrectly lead to a monoclinic C-centered lattice, and cyan denotes a triclinic lattice. Grey pixels indicate that no indexing is possible. Method A (x2.8) was used to determine symmetry in panel (a), while method B was used in panel (b).
Small differences in experimental parameters, particularly the direct-beam position, can in¯uence the relative ranking of the candidate bases. We set up a two-dimensional grid around the true beam position, with each grid element in turn being considered as the assumed beam center. For the purpose of this test, no Fourier coef®cients were used to re-determine the beam center (x2.3), nor was any parameter re®nement undertaken; we simply took the highest scoring candidate basis, applied the Kr Ïivy Â -Gruber reduction, and attempted to determine the Bravais symmetry. Fig. 7(a) shows how well method A determines the symmetry when different grid points are assumed to be the beam center. The correct rhombohedral symmetry is deduced in fewer than half of the cases. Moreover, there is no zone of convergence for either the correct rhombohedral or the incorrect monoclinic symmetry; the symmetry determination oscillates in a chaotic manner as the imposed beam center moves across the image. Both symmetries can be deduced with input beam estimates within 0.1 mm of the true beam center. While Fig. 7(a) shows the results of indexing one 0.8 oscillation image, similar results were obtained for joint indexing of two images collected 90 apart in ' (not shown). In contrast, method B produces the correct solution across most of the grid (Fig. 7b). These results support method B as a computationally tractable alternative that can unambiguously identify metric symmetry elements in the lattice. The method is thus well suited for inclusion in automatic systems.

Conclusions
While automated processing carries great potential bene®t for the beamline user, it also places high demands for robustness upon its component algorithms and software. Problem areas that can be instantly recognized by the human experimentalist using a graphical interface may go unnoticed by an automated system, with potentially disastrous results for subsequent analysis steps. The methods presented above, if incorporated into the beamline control environment, will quickly produce reliable indexing and symmetry solutions immediately after the images have been acquired. A 2.8 GHz Intel Xeon processor typically requires 7 s to choose Bragg re¯ections from a pair of 10 Mbyte images, and 11 s for the remainder of the analysis.
It is particularly striking how a pair of images (Fig. 5d) can yield a much more robust direct-beam position than a single image. It is also likely that the derived unit-cell dimensions are more accurate, since two images together sample reciprocal space more completely. With modern, scriptable beamline control software, it is typically easy to set up standard protocols to acquire these two images. One need only assume that the sample is rigidly af®xed to the goniometer and is well centered in the beam. Thus it is important to exercise care when mounting and cryocooling the crystal.
We have successfully used LABELIT to process images from about 50 crystal forms. Notably, once the program parameters are switched to accept a high-resolution cutoff of 15 A Ê , LABELIT is able to index correctly tetragonal I-centered ribosome crystals with unit-cell dimensions a = b = 674 A Ê , c = 2776 A Ê (A. Vila-Sanjurjo & J. Cate, unpublished results). Therefore, the toolbox is expected to be generally applicable to all crystallographic experiments that use the oscillation method.

Program availability
LABELIT is organized as a hybrid software package in which the high-level scripts directing the algorithm use the Python scripting language, while the numerically intensive calculations are executed by optimized compiled C++ code. LABELIT makes extensive use of code objects from our open-source Computational Crystallography Toolbox , and is therefore an example of the bene®ts of code reuse. Python bindings for C++ objects are provided by the Boost.Python library (Abrahams & Grosse-Kunstleve, 2003), and SCons (http://www.scons.org) is used as a build facility. LABELIT can be installed on a number of computing platforms, and customized for various data collection environments by modifying the included example scripts.
LABELIT will be available as a Web service at http:// cci.lbl.gov/labelit, enabling general users to upload raw image data and retrieve the model parameters from the resulting indexing solutions. LABELIT will also be available for download to non-commercial users.