research papers
Ray-tracing analytical absorption correction for X-ray crystallography based on tomographic reconstructions
^{a}Oxford eResearch Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford OX1 3QG, United Kingdom, ^{b}Diamond Light Source, Harwell Science & Innovation Campus, Didcot OX11 0DE, United Kingdom, ^{c}Rosalind Franklin Institute, Harwell Science & Innovation Campus, Didcot OX11 0QX, United Kingdom, ^{d}Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom, and ^{e}Department of Life Sciences, Imperial College London, Exhibition Road, London SW7 2AZ, United Kingdom
^{*}Correspondence email: wes.armour@oerc.ox.ac.uk, armin.wagner@diamond.ac.uk
Processing of single-crystal X-ray diffraction data from area detectors can be separated into two steps. First, raw intensities are obtained by integration of the diffraction images, and then data correction and reduction are performed to determine structure-factor amplitudes and their uncertainties. The second step considers the diffraction geometry, sample illumination, decay, absorption and other effects. While absorption is only a minor effect in standard macromolecular crystallography (MX), it can become the largest source of uncertainty for experiments performed at long wavelengths. Current software packages for MX typically employ empirical models to correct for the effects of absorption, with the corrections determined by minimizing the differences in intensities between symmetry-equivalent reflections; these models are well suited to capturing smoothly varying experimental effects. However, at very long wavelengths, empirical methods become unreliable for modelling strong absorption effects with high fidelity. This problem is particularly acute when data multiplicity is low. This paper presents an analytical absorption correction strategy (implemented in the new software AnACor) based on a volumetric model of the sample derived from X-ray tomography. Individual path lengths through the different sample materials for all reflections are determined by a ray-tracing method. Several approaches to absorption correction (spherical harmonics correction, analytical absorption correction and a combination of the two) are compared for two samples: the membrane protein OmpK36 GD, measured at a wavelength of λ = 3.54 Å, and chlorite dismutase, measured at λ = 4.13 Å. Data set statistics, the peak heights in the anomalous difference Fourier maps and the success of experimental phasing are used to compare the results from the different absorption correction approaches.
The strategies using the new analytical absorption correction are shown to be superior to the standard spherical harmonics corrections. While the improvements are modest for the 3.54 Å data, the analytical absorption correction outperforms spherical harmonics for the longer-wavelength data (λ = 4.13 Å), which is also reflected in the reduced amount of data required for successful experimental phasing.
1. Introduction
In X-ray crystallography, the intensities of reflections are proportional to the square of their structure-factor amplitudes (I ∝ |F|^{2}). Several factors need to be considered when calculating structure-factor amplitudes from measured intensities, such as sample illumination, decay and absorption corrections (Monaco & Artioli, 2002). Away from absorption edges, sample absorption is approximately proportional to the cube of the wavelength (Arndt, 1984). It depends on the chemical composition, density, shape and size of the sample, which includes the crystal as well as the surrounding materials such as the sample mount, mother liquor, or oils and glues used to mount the crystals. High-quality structure determination relies on accurate structure-factor amplitudes; hence, correcting the measured intensities by calculating absorption correction factors is critical. For a crystal which is not surrounded by mother liquor or mounted in a loop, the Bragg intensities after absorption correction are given by I_{h}^{corr} = I_{h}/A_{h}, and the absorption correction factor A_{h} for the reflection h in a crystallography experiment is given by

$$A_{h} = \frac{1}{V}\int_{V}\exp\{-\mu[L_{1}(x,y,z)+L_{2}(x,y,z)]\}\,\mathrm{d}V \qquad (1)$$

where L_{1}(x, y, z) and L_{2}(x, y, z) (hereafter referred to as L_{1} and L_{2}) are the incident and diffracted X-ray path lengths for each crystal element dV, V is the crystal volume and μ is the linear absorption coefficient of the crystal (Albrecht, 1939). Since the resulting volumetric integral is intractable for irregularly shaped crystals, absorption correction for multi-faced crystals has been performed by numerical methods (Busing & Levy, 1957; DeTitta, 1985). As an alternative approach, the crystal can be partitioned into fundamental tetrahedra and the integral calculated over all the tetrahedra (Howells, 1950; de Meulenaer & Tompa, 1965; Clark & Reid, 1995). Both analytical and numerical absorption corrections require an accurate description of the shape and dimensions of the crystal. One solution, implemented in the APEX3 software (Bruker, 2012), is to determine and index all the crystal faces visually and perform an analytical absorption correction. However, this is difficult when the shape of the crystal is not a regular polyhedron. In addition, the presence of other materials surrounding the crystal, such as mother liquor and the sample mount, adds further complications: these materials, with different absorption coefficients, contribute only to absorption, not to diffraction. Semi-empirical methods (North et al., 1968; Kopfmann & Huber, 1968), based on intensity measurements and assumptions about the incident and diffracted beams, do not rely on knowledge of the sample shape. However, they require multi-axis goniometers, and the additional data needed for the azimuthal scans can contribute significantly to radiation damage at modern synchrotron light sources.
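As a back-of-the-envelope illustration of the approximate λ³ dependence noted above, the ratio of absorption coefficients relative to a reference wavelength of 1 Å (an assumed value typical of standard MX, chosen here only for illustration, and ignoring absorption edges) can be estimated for the two wavelengths used in this study:

```latex
\frac{\mu(\lambda)}{\mu(1\,\text{\AA})} \approx \left(\frac{\lambda}{1\,\text{\AA}}\right)^{3},
\qquad
\lambda = 3.54\,\text{\AA}: \; 3.54^{3} \approx 44,
\qquad
\lambda = 4.13\,\text{\AA}: \; 4.13^{3} \approx 70
```

Because the transmission in equation (1) falls off exponentially with μ times the path length, an increase of this size can turn a negligible correction into a dominant one.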
Empirical methods which are independent of the sample geometry were developed either based on Fourier series of the incident and diffracted beams (Katayama et al., 1972; Walker & Stuart, 1983) or by using spherical harmonics (Blessing, 1995) to minimize the residual between the intensities of symmetry-related reflections. With the introduction of large area detectors, these numerical methods for obtaining an empirical absorption correction have become popular. Spherical harmonics are now the basis for absorption correction in most data reduction software packages for macromolecular crystallography (MX), such as AIMLESS (Evans & Murshudov, 2013), HKL-3000 (Minor et al., 2006), SADABS (Sheldrick, 1996) and DIALS (Winter et al., 2018; Beilsten-Edmands et al., 2020), while XDS uses alternative numerical methods without spherical harmonics (Kabsch, 2010). However, the efficacy of empirical methods depends on having a large number of symmetry-equivalent reflections, which can be difficult to achieve when data multiplicity is low, e.g. for radiation-sensitive crystals in low-symmetry space groups.
As the analytical absorption correction does not depend on refining parameters to minimize differences between structure-factor amplitudes of symmetry-related reflections, its success does not rely on data multiplicity. To calculate absorption correction factors analytically for a sample with an irregular shape, its shape and orientation have to be characterized in detail. Previous work using optical microscopy to reconstruct a 3D model of the sample, containing crystal, sample mount and mother liquor, showed that absorption correction was viable and advantageous at lower levels of data multiplicity (Leal et al., 2008; Strutz, 2011). An alternative approach to obtain a 3D model of the sample is X-ray tomography, which has been applied to characterize or visualize crystals (Merrifield et al., 2011; Warren et al., 2013). The use of tomographic reconstructions and segmentations as a basis for absorption correction was previously suggested by Brockhauser et al. (2008). This enables the calculation of X-ray path lengths through the different materials in the sample (crystal, sample mount and mother liquor), as illustrated in Fig. 1.
While X-ray absorption is not normally considered an issue at standard wavelengths in MX, it is a major limiting factor in long-wavelength crystallography. Beamline I23 at Diamond Light Source, UK (Wagner et al., 2016), is a unique synchrotron instrument operating in a wavelength range between 1.1 and 5.9 Å, giving access to the absorption edges of several light elements of biological significance, such as calcium, potassium, chlorine, sulfur and phosphorus. The largest anomalous signal for sulfur is expected close to its K absorption edge (λ = 5.02 Å). However, the difficulty of correcting for the increased sample absorption at very long wavelengths compromises the overall data quality, reducing the measured anomalous signal. Using standard absorption correction protocols, the optimal wavelength for single-wavelength anomalous diffraction experiments based on sulfur (S-SAD) has been found to be λ = 2.75 Å (El Omari et al., 2023), clearly indicating the need for more sophisticated methods to exploit the full potential of long-wavelength crystallography.
In this paper, we introduce AnACor, a computer program that employs a ray-tracing method to estimate the path lengths of the incident and diffracted X-rays through the sample from a tomographic reconstruction, in order to calculate absorption correction factors for long-wavelength X-ray diffraction data. The effectiveness of AnACor is demonstrated for long-wavelength data sets collected at 3.54 Å, on a crystal of the membrane protein OmpK36 GD, and at 4.13 Å, on a crystal of the heme-binding enzyme chlorite dismutase (Cld). OmpK36 GD, referred to simply as `OmpK36', is a 373 amino acid outer membrane porin from Klebsiella pneumoniae involved in nutrient and antibiotic diffusion in Gram-negative bacteria (Wong et al., 2019), while Cld is a heme b-containing homodimeric oxidoreductase from Cyanothece sp. PCC7425, consisting of 181 amino acids per monomer. These two samples were chosen for this study because they crystallize in low-symmetry space groups, posing a challenge for the conventional absorption correction methods used in standard X-ray diffraction scaling programs.
2. Methods
2.1. Experiment workflow and data preparation
Crystals of OmpK36 were prepared and cryoprotected as previously described, without modification (Wong et al., 2019). OmpK36 crystallized as rods in space group C2, with three monomers present in the asymmetric unit. Large sample-to-sample variations required extensive screening of crystals. The crystal selected for this study had dimensions of 260 × 30 × 30 µm. Cld crystals were produced using a protocol based on previously reported conditions (Schaffner et al., 2017), with further details provided in the supporting information, section S2. The crystal used in this study had dimensions of 190 × 150 × 90 µm and was indexed in P1, with two monomers in the asymmetric unit.
All experiments were performed at the long-wavelength MX beamline I23 at Diamond Light Source, UK. The in-vacuum sample environment comprises the cylindrical P12M detector and a multi-axis goniometer, enabling the collection of complete diffraction data from crystals in low-symmetry space groups even at the longest wavelengths. A tomography camera is integrated into the beamline sample environment, allowing easy transition between the two experimental modes (Kazantsev et al., 2021). Sample preparation for in-vacuum data collection followed the standard protocol for beamline I23 (Duman et al., 2021). For the OmpK36 crystal, 3 × 360° of data were collected at a wavelength of λ = 3.54 Å with 0.1 s exposure per 0.1° rotation and a beam transmission of 50%, using a top-hat X-ray beam adjusted to 240 × 150 µm. To ensure completeness of the data, two of the three data sets were collected using kappa goniometry, with the kappa axis rotated to −70° and the phi axis positioned at 0° and −120°. Each of the three data sets was measured with a flux of 1.36 × 10^{11} photons s^{−1}, which resulted in a total absorbed dose of 6.5 MGy per data set, as calculated by RADDOSE-3D (Zeldin et al., 2013). Since the Cld crystal diffracted to higher resolution than the OmpK36 crystal, we chose a low-dose data collection strategy. In total, 22 × 360° of data were collected at a wavelength of λ = 4.13 Å with a 350 × 350 µm top-hat beam, using an exposure of 0.1 s per 0.1°. With a beam transmission of 5%, the measured flux of 6.7 × 10^{9} photons s^{−1} yielded an absorbed dose of 0.1 MGy per data set. Two of the 22 data sets were collected with the kappa and phi goniometer axes at 0°, while the rest were recorded at κ = −70° and 20 different phi values between −120° and 120°. The diffraction data were indexed and integrated with DIALS (Winter et al., 2018), providing a kappa/phi orientation matrix, raw intensities, incident vectors, scattering vectors and goniometer angles.
The diffraction experiment was immediately followed by tomography data collection at the same X-ray wavelength. One 180° tomography data set was collected for each crystal, with the kappa and phi axes set at 0°, a beam size of 700 × 700 µm and 100% transmission, using a propagation distance of 4.9 mm between scintillator and sample. For OmpK36, 1800 projections, 30 flat-field images (without sample) and 30 dark images (without X-rays) were collected with an exposure of 0.15 s per 0.1° rotation. The measured flux for this data set was 1.5 × 10^{12} photons s^{−1}, resulting in a total absorbed dose of 4.8 MGy. For the Cld crystal, 900 projections, 20 flat-field and 20 dark images were collected with an exposure of 0.28 s per 0.2° rotation and a measured flux of 4.3 × 10^{11} photons s^{−1}, yielding a total absorbed dose of 0.8 MGy.

The tomography data were processed using the SAVU pipeline (Kazantsev et al., 2022), with a processing routine consisting of standard flat-field correction, followed by ring artefact removal (Vo et al., 2018) and reconstruction. For OmpK36, the reconstruction step was performed by iterative methods via the ToMoBAR module in SAVU (Kazantsev et al., 2021), as its edge-enhancing properties gave improved results. For Cld, where the data showed better contrast, the filtered back-projection (TomoPy) module (Gürsoy et al., 2014) was used instead. No contrast transfer function correction was applied. Flat-field images, raw projections and flat-field-corrected projections for both samples are shown in Fig. 2. For ease of segmentation, reconstruction was performed on cropped data, to eliminate as much of the background as possible and reduce the size of the images. The OmpK36 data were cropped from an initial volume of 1600 × 1200 × 1200 voxels to 1220 × 1001 × 1001 voxels, while the Cld data were reduced to 1310 × 1181 × 1181 voxels. The pixel size in the tomography images, determined from previous beamline calibrations, was 0.3 × 0.3 µm. Manual segmentation was performed with the visualization software Avizo (Thermo Fisher), providing a 3D model in which every voxel is annotated as one of the different sample materials. On the basis of these 3D sample models, the absorption correction factors were calculated and exported to the scaling module in DIALS (Beilsten-Edmands et al., 2020) to further correct the diffraction intensities. Published structures, Protein Data Bank (PDB) ID 6rck (Wong et al., 2019) for OmpK36 and PDB ID 5mau (Schaffner et al., 2017) for Cld, were used as starting models for the Dimple pipeline (https://ccp4.github.io/dimple/).
The `--anode' option (Thorn & Sheldrick, 2011) was used to calculate anomalous difference Fourier maps and anomalous peak heights, and the `--free-r-flags' option in the Refmac (Murshudov et al., 1997) step ensured the same R_{free} flags for all absorption correction strategies. The Crank2 phasing pipeline (Skubák & Pannu, 2013) was used for experimental phasing by single-wavelength anomalous diffraction (SAD) with identical input parameters for the different strategies: the AFRO and PRASA modules were chosen for the F_{A} estimation and substructure determination steps, respectively, with the latter step using 4000 trials and resolution cutoffs of 2.7 Å for Cld and 3.4 Å for OmpK36.
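The flat-field correction step described above normalizes each raw projection by the averaged flat-field and dark images, giving approximately the fraction of the beam transmitted through the sample, which is what section 2.3 relies on. The following is a minimal NumPy sketch of that standard formula, assuming the images are already loaded as arrays; the function name is hypothetical and this is not SAVU's implementation.

```python
import numpy as np


def flat_field_correct(projections, flats, darks, eps=1e-6):
    """Return transmission-like images (P - D) / (F - D).

    projections: (n, rows, cols) raw projections P
    flats:       (k, rows, cols) flat-field images F (beam on, no sample)
    darks:       (j, rows, cols) dark images D (no X-rays)
    """
    flat = np.mean(flats, axis=0)  # average flat-field image
    dark = np.mean(darks, axis=0)  # average dark image
    # Clip the denominator so dead or very dim detector pixels do not divide by zero.
    denom = np.clip(flat - dark, eps, None)
    return (np.asarray(projections, dtype=float) - dark) / denom


# Usage with synthetic data: a uniform sample transmitting 50% of the beam.
darks = np.full((20, 4, 4), 10.0)
flats = np.full((20, 4, 4), 110.0)
projections = np.full((3, 4, 4), 60.0)
corrected = flat_field_correct(projections, flats, darks)
```

Averaging several flat-field and dark images, as in the acquisition scheme above, reduces the noise that each of them would otherwise propagate into every corrected projection.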
2.2. Analytical absorption correction
For the calculation of the absorption correction factors, the integral [equation (1)] is calculated over the crystal volume only (Angel, 2004), since the crystal is the only source of X-ray diffraction. To move from the continuous integral in equation (1) to a discrete equation, we replace crystal elements dV by crystal voxels ΔV from the tomographic reconstruction (Leal et al., 2008). This allows the integral over the volume V to be replaced by a sum over the crystal voxels, and since V = NΔV for N voxels of equal size, equation (1) can be rewritten discretely as

$$A_{h} = \frac{1}{N}\sum_{n=1}^{N}\exp\left[-\mu\left(L_{1}^{(n)}+L_{2}^{(n)}\right)\right] \qquad (2)$$

where N is the number of crystal voxels in the 3D model exposed to the X-ray beam. The sample in a crystallography experiment typically contains more than one material; therefore, the absorption correction factor for a crystal voxel n can be rewritten as

$$A_{h}^{(n)} = \exp\left(-\sum_{m}\mu_{m}L_{m}^{(n)}\right) \qquad (3)$$

where μ_{m} is the linear absorption coefficient of material m and L_{m}^{(n)} represents the sum of the incident path length L^{(n)}_{m1} and the diffracted path length L^{(n)}_{m2} through material m, as shown in Fig. 1. A_{h} is then the average of A_{h}^{(n)} over all N crystal voxels, as in equation (2).
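Equations (2) and (3) reduce to a matrix-vector product followed by an average. The sketch below assumes the per-voxel, per-material path lengths have already been computed and stored in an array; the function name, the array layout and the units (µm and µm^-1) are illustrative assumptions rather than AnACor's interface.

```python
import numpy as np


def absorption_factor(path_lengths, mu):
    """Discrete absorption factor A_h for one reflection.

    path_lengths: (N, M) array; entry [n, m] is L_m^(n), the incident plus
                  diffracted path length through material m for crystal voxel n.
    mu:           (M,) array of linear absorption coefficients mu_m
                  (in inverse units of the path lengths).
    """
    L = np.asarray(path_lengths, dtype=float)
    mu = np.asarray(mu, dtype=float)
    per_voxel = np.exp(-(L @ mu))  # equation (3): A_h^(n) for every voxel
    return float(per_voxel.mean())  # equation (2): average over the N voxels


# Two voxels, two materials (e.g. crystal and mother liquor); lengths in µm,
# coefficients in µm^-1 (illustrative values only).
A = absorption_factor([[100.0, 0.0], [0.0, 50.0]], [0.01, 0.02])
I_corrected = 250.0 / A  # the observed intensity is divided by A_h
```

In this toy case both voxels have a total attenuation exponent of 1, so A is exp(−1) ≈ 0.37 and the corrected intensity is about 2.7 times the observed one.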
The final squared structure-factor amplitudes are obtained by combining these absorption correction factors with the overall scale factor and other standard correction and scaling techniques.

2.3. Absorption coefficients
Absorption coefficients are determined experimentally, using the intensity values in the flat-field-corrected tomograms [Figs. 2(c), 2(f)] as estimates of the ratio between the transmitted and incident intensities. The distances through each material required for the calculation are obtained from the 3D segmentation models. The 3D models of the OmpK36 and Cld samples in different orientations are presented in Fig. 3. To ensure that the transmitted intensities in the tomograms and the path lengths from the segmentation model are aligned, a Python script is used to superpose the 2D projection of the model onto the tomogram. Areas of the flat-field-corrected tomograms affected by phase contrast are excluded from the analysis by applying morphological shrinking. Transmission values are taken from areas of the flat-field-corrected projection images where only solvent is present, using the half of these pixels with the longest linear path lengths through the mother liquor. Next, the Beer–Lambert law is applied on a pixel-by-pixel basis to calculate the absorption coefficients. The absorption coefficient of the mother liquor is then defined as the median of the resulting values. This value is used in the calculation of the absorption coefficients of the other materials (e.g. crystal or protein/detergent aggregate) according to their corresponding path lengths. A library of loop absorption coefficients, based on tomographic reconstructions of empty loops, is available for the different loops used on beamline I23. The measured absorption coefficients are presented in Table 1. The composition and density of the protein/detergent aggregate are unknown, but its absorption coefficient, the largest of all materials, is consistent with the flat-field-corrected projection image presented in Fig. 2(c).
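The two Beer–Lambert steps described above can be sketched as follows, assuming the flat-field-corrected transmission values and the matching path lengths from the segmentation model have already been extracted for each pixel (alignment and morphological shrinking are omitted). The function names are hypothetical, and taking the median for the second material is an assumption made here for symmetry with the mother-liquor step.

```python
import numpy as np


def mu_from_solvent_pixels(transmission, liquor_lengths):
    """Mother-liquor coefficient from pixels whose rays cross only liquor.

    Keeps the half of the pixels with the longest paths, applies the
    Beer-Lambert law mu = -ln(T) / L per pixel and returns the median.
    """
    T = np.asarray(transmission, dtype=float)
    L = np.asarray(liquor_lengths, dtype=float)
    longest = np.argsort(L)[len(L) // 2:]  # indices of the longest 50% of paths
    mu = -np.log(T[longest]) / L[longest]
    return float(np.median(mu))


def mu_second_material(transmission, liquor_lengths, other_lengths, mu_liquor):
    """Coefficient of a second material (e.g. crystal) from mixed-path pixels.

    Beer-Lambert with two materials: -ln(T) = mu_liquor * L_liquor + mu_x * L_x,
    solved per pixel for mu_x; the median over pixels is returned (assumption).
    """
    T = np.asarray(transmission, dtype=float)
    Ll = np.asarray(liquor_lengths, dtype=float)
    Lx = np.asarray(other_lengths, dtype=float)
    mu_x = (-np.log(T) - mu_liquor * Ll) / Lx
    return float(np.median(mu_x))


# Synthetic check: liquor with mu = 0.01 µm^-1, crystal with mu = 0.03 µm^-1.
L_liq = np.array([10.0, 20.0, 30.0, 40.0])
mu_l = mu_from_solvent_pixels(np.exp(-0.01 * L_liq), L_liq)
L_mix_liq, L_mix_xtal = np.array([50.0, 60.0]), np.array([20.0, 30.0])
T_mix = np.exp(-(0.01 * L_mix_liq + 0.03 * L_mix_xtal))
mu_c = mu_second_material(T_mix, L_mix_liq, L_mix_xtal, mu_l)
```

Restricting the solvent estimate to the longest paths favours pixels with stronger attenuation, where the logarithm of the transmission is less sensitive to noise.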

2.4. Implementation details
A ray-tracing method is applied to compute the path lengths L^{(n)}_{m} for each crystal voxel n of the reflection h in equation (3). For a crystal voxel n, two rays originating from the voxel are considered: one traced back along the incoming X-ray beam and one along the diffracted beam. After the goniometer rotation for the reflection h is applied, these rays propagate through the 3D segmented model, and the coordinates of each voxel they cross, along with its material label, are recorded. The path lengths L^{(n)}_{m} through material m can then be determined from the distances between the coordinates of the material boundaries. Combining these with the absorption coefficients of the corresponding materials gives the absorption factor for crystal voxel n [equation (3)]. Finally, the total absorption factor for the reflection h is calculated by averaging over all crystal voxels according to equation (2).
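The path-length step can be illustrated with a simple fixed-step ray march through the labelled voxel volume. This is a minimal sketch under stated assumptions (voxel i spans [i, i + 1), label 0 is vacuum, each half-voxel step is assigned to the material at its midpoint); the text above describes determining path lengths from material-boundary coordinates, whereas this sketch only sums fixed steps, and all names are hypothetical.

```python
import numpy as np


def trace_ray(labels, start, direction, voxel_size, n_materials, step=0.5):
    """Accumulate path length per material along a ray through a label volume.

    labels:     3D integer array of material labels (0 = background/vacuum).
    start:      ray origin in voxel coordinates (voxel i spans [i, i + 1)).
    direction:  ray direction (normalized here).
    step:       marching step in voxel units; each step is assigned to the
                material found at its midpoint (fixed-step approximation).
    """
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    origin = np.asarray(start, dtype=float)
    shape = np.array(labels.shape)
    lengths = np.zeros(n_materials)
    t = 0.0
    while True:
        mid = origin + (t + 0.5 * step) * d  # midpoint of the current step
        idx = np.floor(mid).astype(int)
        if np.any(idx < 0) or np.any(idx >= shape):
            break  # the ray has left the model
        lengths[labels[tuple(idx)]] += step * voxel_size
        t += step
    return lengths


def voxel_path_lengths(labels, voxel, s0, s1, voxel_size, n_materials):
    """L_m^(n) for one crystal voxel: incident path (traced back along -s0)
    plus diffracted path (traced along s1), both starting at the voxel centre."""
    centre = np.asarray(voxel, dtype=float) + 0.5
    incident = trace_ray(labels, centre, -np.asarray(s0, dtype=float),
                         voxel_size, n_materials)
    diffracted = trace_ray(labels, centre, s1, voxel_size, n_materials)
    return incident + diffracted


# A 10-voxel crystal cube (label 1) padded by one voxel of vacuum (label 0).
labels = np.zeros((12, 12, 12), dtype=int)
labels[1:11, 1:11, 1:11] = 1
L = voxel_path_lengths(labels, (1, 6, 6), s0=(1, 0, 0), s1=(1, 0, 0),
                       voxel_size=0.3, n_materials=2)
```

With the diffracted beam along the incident beam (2θ = 0), the incident and diffracted paths together span the full 10-voxel chord of the crystal, i.e. 3.0 µm at a voxel size of 0.3 µm.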
Rotating the entire 3D segmented model according to the goniometer rotation matrix for each absorption factor calculation would be computationally intensive. Instead, AnACor inverts the goniometer matrix and rotates the vectors of the incoming and diffracted beams to calculate the path lengths. The tomography experiments are always performed at the kappa/phi orientation κ = 0° and ϕ = 0°. To correct data from diffraction experiments with other kappa/phi orientations, the vectors of both the incoming and diffracted beams must also be transformed with the kappa/phi orientation matrices taken from the DIALS experiment model. Hence, the overall transformed beam vectors are obtained by applying the inverse of the combined goniometer rotation (including the kappa/phi setting) to the vector of either the incoming or the diffracted beam taken from the DIALS reflection data. The resulting direction vectors are used in the ray-tracing method. The incident beam is assumed to have a top-hat profile, so no additional beam profile correction is applied. If the crystal is larger than the incident X-ray beam, a discriminator in the ray-tracing algorithm determines whether a crystal voxel is inside the X-ray beam.
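The equivalence behind rotating the beam vectors instead of the voxel model can be shown in a few lines: for a rotation matrix R the inverse is its transpose, so projecting a rotated sample point onto a fixed beam gives the same result as projecting the unrotated point onto the inversely rotated beam. The sketch below also includes a simple top-hat beam test. The axis directions and composition order are placeholders (in practice the matrices would come from the DIALS experiment model), and all function names are hypothetical.

```python
import numpy as np


def rotation_matrix(axis, angle_deg):
    """Right-handed rotation about a (normalized) axis, via Rodrigues' formula."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    a = np.radians(angle_deg)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(a) * K + (1.0 - np.cos(a)) * (K @ K)


def beam_in_model_frame(beam_vector, goniometer_matrix):
    """Express a lab-frame beam vector in the frame of the unrotated 3D model.

    Rotating the sample by R is equivalent to rotating the beam by R^-1, and
    for a rotation matrix R^-1 = R^T, so the voxel model itself is never moved.
    """
    return goniometer_matrix.T @ np.asarray(beam_vector, dtype=float)


def in_tophat_beam(points, u, v, width, height):
    """True for lab-frame points (relative to the beam centre) inside a
    rectangular top-hat beam spanned by unit vectors u (width) and v (height)."""
    p = np.atleast_2d(np.asarray(points, dtype=float))
    return (np.abs(p @ u) <= width / 2) & (np.abs(p @ v) <= height / 2)


# Hypothetical combined setting: a kappa-like rotation followed by a scan rotation.
R = rotation_matrix([0, 0, 1], 90) @ rotation_matrix([1, 0, 1], -70)
beam = np.array([1.0, 0.0, 0.0])
p_model = np.array([0.2, -0.4, 0.9])
# Same projection whether the sample point or the beam is rotated:
same = np.isclose((R @ p_model) @ beam, p_model @ beam_in_model_frame(beam, R))
```

Transforming two vectors per reflection costs a handful of multiplications, whereas resampling a volume of about 10^9 voxels for every reflection would dominate the run time.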
The absorption correction software AnACor 1.0 is written in Python to facilitate future integration into DIALS (Winter et al., 2018). To enhance computational efficiency, NumPy 1.23.2 (Harris et al., 2020) is used for data loading and preprocessing, and Numba 0.56.2 (Lam et al., 2015) is used for just-in-time (JIT) compilation. A typical protein crystallography data set contains hundreds of thousands of reflections. There are typically millions of crystal voxels in a 3D model, and each path length calculation can involve determining thousands of voxels along the incident and diffracted X-ray paths. Consequently, calculating all absorption correction factors for samples in protein crystallography is computationally expensive. To mitigate this, a systematic sampling method with a sampling interval of 2000 is applied. This sampling approach relies on the sorted arrangement of the crystal voxels, in which neighbouring entries belong to subsections of the crystal with similar path lengths (L_{1} and L_{2}). Selecting every 2000th voxel from this sorted list ensures that sampling is applied uniformly across the crystal. It therefore captures the essential characteristics of the sample with far fewer data points, maintaining the accuracy of the equation (2) calculation while reducing the computational load.
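A minimal sketch of the systematic sampling step, assuming a simple raster (lexicographic) ordering of voxel indices. The exact sort key used by AnACor is not specified here, so the ordering is an assumption, and the function name is hypothetical.

```python
import numpy as np


def systematic_sample(voxel_coords, interval=2000):
    """Sort crystal voxel coordinates and keep every `interval`-th voxel.

    voxel_coords: (N, 3) integer array of crystal voxel indices.
    np.lexsort uses its last key as the primary key, so this orders voxels
    by column 0, then column 1, then column 2 (a raster order).
    """
    coords = np.asarray(voxel_coords)
    order = np.lexsort((coords[:, 2], coords[:, 1], coords[:, 0]))
    return coords[order][::interval]


# 1000 crystal voxels of a 10 x 10 x 10 block, supplied in random order.
crystal = np.argwhere(np.ones((10, 10, 10), dtype=bool))
shuffled = np.random.default_rng(0).permutation(crystal)
sample = systematic_sample(shuffled, interval=100)
```

With an interval of 2000, the number of path-length calculations per reflection drops by the same factor, e.g. from 2 million crystal voxels to 1000.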
Parallel computing is implemented with the built-in multiprocessing package in Python, and the calculations for all reflections are distributed evenly across the CPU cores. With sampling and parallel computing, the analytical absorption correction of one data set of OmpK36 and Cld takes about 40 and 30 min, respectively, on a cluster node with 48 CPU cores, with a total RAM usage of around 200 GB.
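The even distribution of reflections over CPU cores can be sketched with the standard-library multiprocessing module. This is a hypothetical structure rather than AnACor's code; `worker` stands for any top-level function that computes absorption factors for one chunk of reflections and returns them as a list.

```python
from multiprocessing import Pool

import numpy as np


def split_evenly(items, n_workers):
    """Split reflections into n_workers contiguous chunks whose sizes differ by at most one."""
    return np.array_split(np.asarray(items), n_workers)


def run_parallel(worker, items, n_workers):
    """Apply `worker` to each chunk and concatenate the results in the original order.

    With n_workers > 1 the chunks are processed in separate processes, so `worker`
    must be a picklable top-level function (and, on platforms that spawn new
    processes, this should be called under `if __name__ == "__main__":`).
    """
    chunks = split_evenly(items, n_workers)
    if n_workers == 1:
        results = [worker(chunk) for chunk in chunks]
    else:
        with Pool(processes=n_workers) as pool:
            results = pool.map(worker, chunks)  # map preserves chunk order
    return [value for chunk_result in results for value in chunk_result]
```

Because each reflection's absorption factor is independent of the others, contiguous chunks need no communication between processes; only the read-only voxel model has to be available to every worker, which is one reason memory use grows with the number of processes.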
To evaluate the accuracy of the ray-tracing method with and without sampling of the tomographic volume, the calculated absorption factors were compared with previously published numerical solutions (Maslen, 2004). Three simulated shapes consisting of crystal material only were considered: cubic, cylindrical and spherical. For consistency, a voxel size of 0.3 × 0.3 µm and the same sampling interval of 2000 were applied. Both approaches gave errors smaller than 0.5% for the cubic and cylindrical shapes. The errors for the spherical shape were smaller than 0.75%, with the exception of those at 90°. Results for a smaller voxel size of 0.1 × 0.1 µm indicate that the error is dominated by the voxel size rather than by the sampling. More details can be found in the supporting information, section S1.
The codes and further explanations of the algorithm are available at https://github.com/yishunlu222/AnACor_public.
2.5. Absorption correction strategies
Data scaling is performed by the dials.scale program in DIALS (Beilsten-Edmands et al., 2020) using the following custom scaling model:

$$g_{hl} = C_{hl}\,D_{hl}\,S_{hl}\,A_{hl}$$

where g_{hl} is the overall inverse scale factor to be determined for the lth observation of symmetry-unique reflection h. The scale factors are determined by optimizing the scaling model parameters using a least-squares target function as previously described (Beilsten-Edmands et al., 2020). C_{hl}, D_{hl} and S_{hl} are, respectively, the scale term, the decay term and the spherical harmonics correction term of the default physical model, and A_{hl} is the analytical absorption correction factor. The absorption correction factors are pre-calculated by AnACor for each reflection and are not optimized during the scaling process.
The scale term models intensity variations as a function of rotation, while the decay term is a function of resolution and rotation. The spherical harmonics term corrects the intensities with a model that depends on the incoming and scattered beam paths. The `absorption_level = high' option in dials.scale (Winter et al., 2022) was used for all approaches that included this term; this option reduces the program's restraints on S_{hl} and uses six orders of spherical harmonics basis functions, allowing high and complex levels of absorption to be modelled. The `anomalous = False' option in dials.scale was used, as the low multiplicity of individual data sets was found to lead to an unstable error model for some data sets when the option `anomalous = True' was used.
To evaluate the analytical absorption correction by ray tracing in AnACor, four approaches are compared:
(i) No absorption correction (labelled as NO) (g_{hl} = C_{hl}D_{hl}).
(ii) Spherical harmonics correction (default in dials.scale, SH) (g_{hl} = C_{hl}D_{hl}S_{hl}).
(iii) Analytical absorption correction described in this work (AC) (g_{hl} = C_{hl}D_{hl}A_{hl}).
(iv) Analytical absorption correction described in this work, combined with spherical harmonics correction (ACSH) (g_{hl} = C_{hl}D_{hl}S_{hl}A_{hl}).
The parameters for each part of the scaling model (except A_{hl}) are jointly refined against the integrated intensities in each case and will therefore differ between approaches, i.e. the refined C_{hl}, D_{hl} and S_{hl} are not the same for the NO, SH, AC and ACSH strategies. The combination of the analytical absorption correction with spherical harmonics allows the effect of absorption to be corrected by an accurate analytical model, while still enabling the spherical harmonics model to correct for any residual effects.
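Assuming a multiplicative model in which the inverse scale factor g_hl is the product of the scale (C), decay (D), spherical harmonics (S) and pre-calculated analytical (A) terms, the four strategies differ only in which terms enter the product. The sketch below, with hypothetical names and fixed input values, shows how the terms combine and how an observation is put on a common scale by dividing by g_hl; in dials.scale, C, D and S are refined by least squares rather than supplied.

```python
import numpy as np


def inverse_scale(C, D, S=None, A=None):
    """Multiplicative inverse scale factor g_hl for one of the four strategies.

    C, D: scale and decay terms (refined in dials.scale; given as inputs here).
    S:    spherical harmonics term, or None to omit it (NO and AC strategies).
    A:    pre-calculated analytical absorption factor, or None (NO and SH).
    """
    g = np.asarray(C, dtype=float) * np.asarray(D, dtype=float)
    if S is not None:
        g = g * np.asarray(S, dtype=float)
    if A is not None:
        g = g * np.asarray(A, dtype=float)
    return g


I_obs = np.array([44.0, 10.0])
C, D = np.array([2.0, 1.0]), np.array([0.5, 1.0])
S, A = np.array([1.1, 1.0]), np.array([0.4, 0.5])
g_acsh = inverse_scale(C, D, S=S, A=A)  # ACSH: g = C * D * S * A
I_scaled = I_obs / g_acsh               # intensities on a common scale
```

Keeping A fixed while C, D and S are refined means that, in the ACSH strategy, S only has to absorb whatever the analytical model leaves unexplained.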
3. Results
In crystallography, various metrics, such as R factors (Weiss & Hilgenfeld, 1997; Diederichs & Karplus, 1997; Weiss, 2001), correlation coefficients (Karplus & Diederichs, 2012) and signal-to-noise ratios, are used to evaluate data quality. Additionally, for long-wavelength crystallography, peak heights in the phased anomalous difference Fourier maps are important quality indicators (Yang et al., 2003). These metrics are used in combination with the success of experimental phasing by SAD to assess the three different absorption correction strategies and compare them with scaling without absorption correction.
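For reference, the merging R factor reported in the results, R_merge, compares each observation with the mean of its group of symmetry equivalents. A minimal sketch of the standard definition follows, assuming the observations have already been mapped to symmetry-unique indices (a step omitted here); the function name is hypothetical.

```python
from collections import defaultdict

import numpy as np


def r_merge(miller_indices, intensities):
    """R_merge = sum_h sum_l |I_hl - <I_h>| / sum_h sum_l I_hl.

    miller_indices are assumed to be already mapped to the symmetry-unique
    (asymmetric-unit) index h, so equal tuples are symmetry equivalents.
    """
    groups = defaultdict(list)
    for hkl, intensity in zip(miller_indices, intensities):
        groups[tuple(hkl)].append(intensity)
    numerator = 0.0
    denominator = 0.0
    for observations in groups.values():
        obs = np.asarray(observations, dtype=float)
        numerator += np.abs(obs - obs.mean()).sum()
        denominator += obs.sum()
    return numerator / denominator


hkl = [(1, 0, 0), (1, 0, 0), (0, 1, 0), (0, 1, 0)]
value = r_merge(hkl, [90.0, 110.0, 45.0, 55.0])
```

Here each pair deviates by 10% from its mean, giving R_merge = 30/300 = 0.1.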
Merging statistics (based on three data sets for OmpK36 and 22 for Cld) are presented in Table 2. As expected, for both samples, all four strategies result in similar resolution ranges, completeness and numbers of unique reflections. All three approaches to dealing with absorption unsurprisingly lead to significant improvements in data quality over the data without correction.

For OmpK36, the analytical absorption correction (AC) gives merging R factors equivalent to those from the spherical harmonics correction (SH), with an overall R_{merge} of 0.119 for both. Notably, the AC strategy leads to an increase in the mean I/σ(I), from 16.42 (SH) to 21.37 (AC), and a stronger anomalous signal, as measured by the anomalous slope (1.69 with AC, as opposed to 1.31 with SH). The anomalous slope (Evans, 2006) is the slope of the central region of a normal probability plot of anomalous differences: a slope greater than one indicates that the anomalous differences are, in aggregate, larger than their uncertainties. The combination of AC and SH corrections (ACSH) gives further improvements in the merging R factors, signal-to-noise ratio and anomalous signal, with the R_{merge} decreasing to 0.105, the mean I/σ(I) increasing to 24.92 and the anomalous slope increasing to 1.91. In Fig. 4(a), the anomalous peak heights from sulfur atoms for the three correction strategies are compared with those obtained without absorption correction for OmpK36. In total, 12 sulfur sites are found, from two methionine residues and two sulfate ions per monomer of the trimeric structure. A significant increase in peak heights is observed with all three absorption correction methods. AC generally gives better results than SH, with the exception of the heights of MET310 in chain B and SO41 in chain C, which are larger in the SH data. Overall, the ACSH strategy brings further improvements in peak heights, except for the weakest anomalous peak, SO42, where AC and ACSH perform similarly. Detailed information on the anomalous peaks of OmpK36 can be found in Tables S3–S6 in the supporting information. The refinement statistics for all strategies follow a similar trend to the merging statistics, with R factors being lowest for ACSH. SAD phasing was performed as a further test of the efficacy of the analytical absorption corrections. Phasing was attempted with one, two and all three data sets.
The results, summarized in Table 3, show that the ACSH strategy outperforms the others, requiring only two data sets for successful phasing despite an overall completeness of only 89.2% and a multiplicity of 8.3. Both AC and SH need all three data sets (98.9% overall completeness, multiplicity of 11.0), while the NO strategy is unsuccessful. The numbers of correct residues automatically built into the experimental maps are identical for the three successful strategies, indicating that the maps are of a similar standard and that the lower data completeness used for the ACSH approach has no impact.
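The anomalous slope (Evans, 2006) described above can be sketched as a straight-line fit to the central part of a normal probability plot. This is an illustrative implementation, not the code of any scaling program; the plotting positions (i + 0.5)/n and the ±1.5 cutoff defining the "central region" are hypothetical choices.

```python
from statistics import NormalDist

import numpy as np


def anomalous_slope(delta_anom, sigma, central=1.5):
    """Slope of the central part of a normal probability plot of dAnom/sigma.

    The sorted normalized differences are plotted against expected standard
    normal quantiles at plotting positions (i + 0.5) / n, and a straight line is
    fitted to points whose expected quantile lies within +/- `central`.
    """
    z = np.sort(np.asarray(delta_anom, dtype=float) / np.asarray(sigma, dtype=float))
    n = z.size
    normal = NormalDist()
    expected = np.array([normal.inv_cdf((i + 0.5) / n) for i in range(n)])
    mask = np.abs(expected) <= central
    slope, _intercept = np.polyfit(expected[mask], z[mask], 1)
    return float(slope)


# Idealized case: differences exactly twice their uncertainties, so the slope is 2.
quantiles = np.array([NormalDist().inv_cdf((i + 0.5) / 1001) for i in range(1001)])
slope = anomalous_slope(2.0 * quantiles, np.ones(1001))
```

Fitting only the central region keeps a few outliers in the tails, for example from poorly measured reflections, from dominating the estimate.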

For Cld, the merging R factors, I/σ(I) and anomalous slopes are noticeably better for AC than for SH, and all merging statistics improve further with the combined ACSH correction. In contrast to OmpK36, where data quality indicators changed little between the SH and AC strategies, for Cld the analytical absorption correction (AC) gives substantially better data statistics than SH. For instance, R_{merge} decreases from 0.163 with SH to 0.112 with AC, and further to 0.095 with the ACSH treatment. The overall mean I/σ(I) also increases, from 20.22 for SH to 44.73 for the ACSH strategy, with the high-resolution shell I/σ(I) following this trend. The anomalous slope increases from 1.36 with SH to 2.48 and 2.5 for AC and ACSH, respectively. This indicates an impressive improvement in the anomalous signal as a result of applying analytical absorption corrections.
The anomalous peak heights for the different absorption correction strategies for Cld are shown in Fig. 4(b). In addition to three methionines and one cysteine per polypeptide chain, each Cld monomer also binds an Fe-containing heme ligand and a Cl^{−} anion. A single SO_{4}^{2−} anion could be identified for the dimer, bringing the total number of anomalous scatterers to 13. SH leads to higher anomalous peak heights than no absorption correction. In line with the improved merging statistics, the anomalous signal in AC and ACSH is stronger than that in SH, with ACSH giving the highest anomalous peak heights overall. While for OmpK36 the improvements in peak heights given by the AC and ACSH strategies over SH are quite modest, for Cld the increase from SH to AC/ACSH is more substantial. For the largest peaks, MET99 and CYS132, peak heights increase from 14σ with SH to 17σ and 18σ with AC and ACSH, respectively. Further details of anomalous peak heights for Cld may be found in the supporting information, Tables S7–S10. The experimental phasing results for this sample (presented in Table 3) show that the AC and ACSH strategies perform very similarly, with successful phasing requiring only two out of 22 data sets, at an overall completeness of 83.3% and an overall multiplicity of 4.4. For the SH strategy, three data sets are needed, with a higher overall completeness of 94.7% and multiplicity of 5.8. These results follow the same pattern as the data quality indicators discussed above, where the AC strategy outperforms the SH approach. Experimental phasing is unsuccessful for the Cld data without absorption correction, even after merging all 22 data sets.
To illustrate the extent of the AC and SH corrections, histograms of the per-reflection analytical absorption correction factors and spherical harmonics correction terms are presented in Fig. 5 for OmpK36 and Cld. For both data sets, the spherical harmonics terms obtained with the SH correction strategy are distributed over a large range (0.5–1.5). With the ACSH strategy, the inclusion of the absorption correction factors (shown on the right of Fig. 5) leads to unimodal distributions over a narrower range (0.7–1.3) centred around 1. As the `no correction' value of the spherical harmonics terms is 1.0, this shows that the analytical correction accounts for most of the absorption. Fitting the additional spherical harmonics terms in the ACSH strategy then further improves the internal consistency compared with AC alone, allowing correction for additional systematic effects present in the data.
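A comparison of the kind shown in Fig. 5 can be made by histogramming the refined per-reflection terms and reporting how tightly they cluster around the no-correction value of 1.0. The sketch below assumes that the terms have already been exported as a one-dimensional array. How to extract them from a particular scaling program is not covered. The histogram range (0.5 to 1.5) and the 0.7 to 1.3 window simply mirror the ranges quoted above, and the function name is hypothetical.

```python
import numpy as np


def summarize_correction_terms(terms, window=(0.7, 1.3), hist_range=(0.5, 1.5), bins=20):
    """Summarize how per-reflection correction terms are distributed around 1.0."""
    terms = np.asarray(terms, dtype=float)
    counts, edges = np.histogram(terms, bins=bins, range=hist_range)
    inside = (terms >= window[0]) & (terms <= window[1])
    return {
        "median": float(np.median(terms)),
        "fraction_in_window": float(np.mean(inside)),
        "max_deviation_from_1": float(np.max(np.abs(terms - 1.0))),
        "counts": counts,  # histogram counts, e.g. for plotting against edges
        "edges": edges,
    }


# Hypothetical terms: a narrow distribution indicates little residual correction
summary = summarize_correction_terms([0.8, 1.0, 1.2, 1.4])
print(summary["median"], summary["fraction_in_window"])
```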
4. Discussion and conclusion
In this study we demonstrate the successful application of analytical absorption corrections based on 3D reconstructions from X-ray tomography, implemented in AnACor. We describe the algorithm for calculating the path lengths from 3D models by a ray-tracing method. Two very-long-wavelength experiments on crystals of the proteins OmpK36 and Cld indicate that this approach substantially improves data quality and the success of experimental phasing compared with the standard scaling protocol based on spherical harmonics. Scaling without any absorption correction, presented as a control, unsurprisingly yields the poorest data quality statistics and anomalous peak heights, and experimental phasing is unsuccessful for both samples. This clearly indicates that data quality is severely affected by absorption effects, demonstrating the need for absorption corrections.
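The central step, computing the path lengths through each material of the segmented model by ray tracing, can be illustrated with a minimal sketch. This is not AnACor's implementation. It assumes:

- a labelled voxel array with one integer label per material (0 for the surrounding vacuum);
- per-material linear absorption coefficients `mu` in inverse voxel units;
- uniform illumination of the crystal;
- simple fixed-step marching in place of whatever traversal scheme AnACor actually uses.

The transmission for one reflection is then the Beer-Lambert factor exp[−Σ_m μ_m (L_in,m + L_out,m)] averaged over all crystal voxels. All function names are hypothetical.

```python
import numpy as np


def path_lengths(labels, start, direction, n_materials, step=0.25):
    """Length travelled through each material label along a ray (fixed-step marching).

    labels: 3D integer array, one material label per voxel (voxel edge = 1 unit).
    start: start point in voxel coordinates (voxel i spans [i, i + 1) on each axis).
    Each step is assigned to the voxel containing its midpoint; accuracy is limited
    by `step`, and marching stops once that midpoint leaves the model volume.
    """
    d = np.array(direction, dtype=float)
    d /= np.linalg.norm(d)
    pos = np.array(start, dtype=float)
    shape = np.array(labels.shape)
    lengths = np.zeros(n_materials)
    while True:
        idx = np.floor(pos + 0.5 * step * d).astype(int)
        if np.any(idx < 0) or np.any(idx >= shape):
            break  # the ray has left the 3D model
        lengths[labels[tuple(idx)]] += step
        pos += step * d
    return lengths


def mean_transmission(labels, mu, incident_dir, diffracted_dir, crystal_label=1, step=0.25):
    """Beer-Lambert transmission averaged over all crystal voxels for one reflection.

    The incident path is traced from each voxel centre back towards the source
    (along -incident_dir); the diffracted path is traced along diffracted_dir.
    """
    mu = np.asarray(mu, dtype=float)
    back = -np.asarray(incident_dir, dtype=float)
    factors = []
    for centre in np.argwhere(labels == crystal_label) + 0.5:
        l_in = path_lengths(labels, centre, back, len(mu), step)
        l_out = path_lengths(labels, centre, diffracted_dir, len(mu), step)
        factors.append(np.exp(-np.dot(mu, l_in + l_out)))
    return float(np.mean(factors))


# Hypothetical test geometry: a 10-voxel crystal rod with both beams along +x,
# so every voxel sees a total path of 10 units with mu = 0.1 (result ~ exp(-1))
rod = np.ones((10, 1, 1), dtype=int)
print(mean_transmission(rod, mu=[0.0, 0.1], incident_dir=[1, 0, 0], diffracted_dir=[1, 0, 0]))
```

Fixed-step marching keeps the sketch short but is slow and only accurate to about one step length. An exact voxel traversal such as the Amanatides-Woo algorithm would give exact segment lengths per voxel.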
Data from OmpK36, which crystallizes in the monoclinic space group C2, were collected at a wavelength of λ = 3.54 Å. A clear trend is visible: the analytical absorption correction (AC) performs better than the spherical harmonics correction (SH), and the combination of the two (ACSH) improves the data even further. Although the overall improvements in the statistics are small, the fact that the OmpK36 structure could be solved after ACSH correction using only two-thirds of the data needed for the AC and SH strategies clearly highlights the importance of such an improvement. The same trend is observed for the Cld data (P1, λ = 4.13 Å). Here the difference between AC and ACSH is small, but both outperform the spherical harmonics correction. This is reflected in particular in the outcome of experimental phasing: two data sets are sufficient for both AC and ACSH, whereas three data sets are needed to solve the structure from data corrected by SH. In general, the combined ACSH approach gives the best results for both samples and wavelengths, as it can model additional systematic effects present in the experimental data.
X-ray absorption increases approximately with the cube of the wavelength, so a change from λ = 1.0 Å to λ = 4.13 Å leads to a roughly 70-fold increase in absorption coefficients (4.13³ ≈ 70.4). The analytical absorption correction compensates for this increase, as reflected in the narrow, unimodal distribution of the resulting spherical harmonics terms centred around 1.0 in the two ACSH cases. OmpK36 crystallizes in a monoclinic space group and Cld in a triclinic one. Combined with the asymmetry of the cylindrical P12M detector, which has an aspect ratio of 2:1, this leads to a low overall data multiplicity of five for OmpK36 and only three for Cld, as well as poor data completeness, for a single 360° data set. Unlike the spherical harmonics correction, which is refined by minimizing the differences between symmetry-equivalent reflections, the analytical absorption correction does not depend on multiple observations. It is therefore ideally suited to crystals in low-symmetry space groups and to radiation-sensitive crystals at long wavelengths.
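The 70-fold figure follows directly from the cubic dependence. The quick check below assumes the λ³ scaling holds exactly, which is only an approximation that breaks down near absorption edges. It also gives the corresponding factor for the OmpK36 wavelength. The helper name is illustrative.

```python
def absorption_ratio(wavelength, reference=1.0):
    """Ratio mu(wavelength) / mu(reference), assuming mu scales as wavelength**3."""
    return (wavelength / reference) ** 3


for lam in (3.54, 4.13):
    # Wavelengths used for OmpK36 and Cld, relative to a 1.0 Å reference
    print(f"{lam} Å: about {absorption_ratio(lam):.0f} times the absorption at 1.0 Å")
```

This prints factors of about 44 for OmpK36 and 70 for Cld. Because transmission falls off as exp(−μL), such increases turn absorption from a minor effect into the dominant systematic error.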
AnACor can correct data collected in multiple crystal orientations and in cases where the beam is smaller than the sample. Future work will enable the use of experimentally determined beam profiles and improve the efficiency and speed of the software. Currently, the bottleneck is the manual segmentation step used to create the 3D models. The increased phase contrast at long wavelengths and limitations of the current beamline hardware, in particular the sphere of confusion of the goniometer, lead to blurred boundaries in the tomographic reconstructions. The resulting inaccuracies in the segmented 3D model can affect both the path length and the calculations. The next stage of this work is therefore to understand, quantify and reduce these errors in the 3D model. Analytical absorption corrections are beneficial not only for long-wavelength macromolecular crystallography but also for highly absorbing samples in chemical crystallography. In this work, the segmented 3D model was obtained by X-ray tomography on beamline I23 at Diamond Light Source. However, AnACor can also be used for analytical absorption corrections of data from other sources, as long as a file with annotated voxels is provided and the relation between the coordinate systems of the 3D model and the diffraction experiment is known.
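To make the coordinate-system requirement concrete, the sketch below expresses a lab-frame beam direction in the frame of the voxel model at a given rotation angle. The geometry is an assumption made for illustration only:

- a single-axis goniometer with a right-handed rotation by φ about a known axis;
- a fixed, hypothetical `model_from_lab` matrix that aligns the tomographic reconstruction with the diffraction geometry at φ = 0.

Real multi-axis goniometers and AnACor's own conventions may differ, and both function names are hypothetical.

```python
import numpy as np


def rotation_matrix(axis, angle_deg):
    """Right-handed rotation by angle_deg about `axis` (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    theta = np.radians(angle_deg)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def lab_to_model(vector_lab, axis, phi_deg, model_from_lab=None):
    """Express a lab-frame direction in the voxel-model frame at rotation angle phi.

    The sample is assumed to be rotated by R(phi), so lab vectors are brought into
    the sample frame with the inverse rotation R(phi).T, then mapped into the model
    frame by the (hypothetical) alignment matrix model_from_lab.
    """
    if model_from_lab is None:
        model_from_lab = np.eye(3)  # assume model and lab frames coincide at phi = 0
    R = rotation_matrix(axis, phi_deg)
    return model_from_lab @ (R.T @ np.asarray(vector_lab, dtype=float))


# Incident beam along +x in the lab, sample rotated by 30 degrees about z
print(lab_to_model([1.0, 0.0, 0.0], axis=[0.0, 0.0, 1.0], phi_deg=30.0))
```

The incident and diffracted directions transformed in this way are what a path-length calculation through the voxel model needs for each reflection.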
Supporting information
Supporting information. DOI: https://doi.org/10.1107/S1600576724002243/yr5123sup1.pdf
Footnotes
‡Joint first authors
Acknowledgements
The authors acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility in carrying out this work (https://dx.doi.org/10.5281/zenodo.22558).
Funding information
AMO and JJAGK were supported by Diamond Light Source and the UK Science and Technology Facilities Council (STFC). AMO acknowledges the Biotechnology and Biological Sciences Research Council, and is the recipient of a Wellcome Investigator Award 210734/Z/18/Z and a Royal Society Wolfson Fellowship RSWF\R2\182017.
References
Albrecht, G. (1939). Rev. Sci. Instrum. 10, 221–222. CrossRef Google Scholar
Angel, R. J. (2004). J. Appl. Cryst. 37, 486–492. Web of Science CrossRef CAS IUCr Journals Google Scholar
Arndt, U. W. (1984). J. Appl. Cryst. 17, 118–119. CrossRef CAS Web of Science IUCr Journals Google Scholar
BeilstenEdmands, J., Winter, G., Gildea, R., Parkhurst, J., Waterman, D. & Evans, G. (2020). Acta Cryst. D76, 385–399. Web of Science CrossRef IUCr Journals Google Scholar
Blessing, R. H. (1995). Acta Cryst. A51, 33–38. CrossRef CAS Web of Science IUCr Journals Google Scholar
Brockhauser, S., Di Michiel, M., McGeehan, J. E., McCarthy, A. A. & Ravelli, R. B. G. (2008). J. Appl. Cryst. 41, 1057–1066. Web of Science CrossRef CAS IUCr Journals Google Scholar
Bruker (2012). APEX. Bruker AXS Inc., Madison, Wisconsin, USA. Google Scholar
Busing, W. R. & Levy, H. A. (1957). Acta Cryst. 10, 180–182. CrossRef CAS IUCr Journals Web of Science Google Scholar
Clark, R. C. & Reid, J. S. (1995). Acta Cryst. A51, 887–897. CrossRef CAS Web of Science IUCr Journals Google Scholar
DeTitta, G. T. (1985). J. Appl. Cryst. 18, 75–79. CrossRef CAS Web of Science IUCr Journals Google Scholar
Diederichs, K. & Karplus, P. A. (1997). Nat. Struct. Mol. Biol. 4, 269–275. CrossRef CAS Web of Science Google Scholar
Duman, R., Orr, C. M., Mykhaylyk, V., El Omari, K., Pocock, R., Grama, V. & Wagner, A. (2021). J. Vis. Exp. 170, e62364. Google Scholar
El Omari, K., Duman, R., Mykhaylyk, V., Orr, C. M., LatimerSmith, M., Winter, G., Grama, V., Qu, F., Bountra, K., Kwong, H. S., Romano, M., Reis, R. I., Vogeley, L., Vecchia, L., Owen, C. D., Wittmann, S., Renner, M., Senda, M., Matsugaki, N., Kawano, Y., Bowden, T. A., Moraes, I., Grimes, J. M., Mancini, E. J., Walsh, M. A., Guzzo, C. R., Owens, R. J., Jones, E. Y., Brown, D. G., Stuart, D. I., Beis, K. & Wagner, A. (2023). Commun. Chem. 6, 219. Web of Science CrossRef PubMed Google Scholar
Evans, P. (2006). Acta Cryst. D62, 72–82. Web of Science CrossRef CAS IUCr Journals Google Scholar
Evans, P. R. & Murshudov, G. N. (2013). Acta Cryst. D69, 1204–1214. Web of Science CrossRef CAS IUCr Journals Google Scholar
Gürsoy, D., De Carlo, F., Xiao, X. & Jacobsen, C. (2014). J. Synchrotron Rad. 21, 1188–1193. Web of Science CrossRef IUCr Journals Google Scholar
Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., GérardMarchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C. & Oliphant, T. E. (2020). Nature, 585, 357–362. Web of Science CrossRef CAS PubMed Google Scholar
Howells, R. G. (1950). Acta Cryst. 3, 366–369. CrossRef IUCr Journals Web of Science Google Scholar
Kabsch, W. (2010). Acta Cryst. D66, 125–132. Web of Science CrossRef CAS IUCr Journals Google Scholar
Karplus, P. A. & Diederichs, K. (2012). Science, 336, 1030–1033. Web of Science CrossRef CAS PubMed Google Scholar
Katayama, C., Sakabe, N. & Sakabe, K. (1972). Acta Cryst. A28, 293–295. CrossRef IUCr Journals Web of Science Google Scholar
Kazantsev, D., Duman, R., Wagner, A., Mykhaylyk, V., Wanelik, K., Basham, M. & Wadeson, N. (2021). J. Synchrotron Rad. 28, 889–901. Web of Science CrossRef CAS IUCr Journals Google Scholar
Kazantsev, D., Wadeson, N. & Basham, M. (2022). SoftwareX, 19, 101157. Google Scholar
Kopfmann, G. & Huber, R. (1968). Acta Cryst. A24, 348–351. CrossRef IUCr Journals Web of Science Google Scholar
Lam, S. K., Pitrou, A. & Seibert, S. (2015). Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, pp. 1–6. Association for Computing Machinery. Google Scholar
Leal, R. M. F., Teixeira, S. C. M., Rey, V., Forsyth, V. T. & Mitchell, E. P. (2008). J. Appl. Cryst. 41, 729–737. Web of Science CrossRef CAS IUCr Journals Google Scholar
Maslen, E. N. (2004). International Tables for Crystallography, 3rd ed., edited by E. Prince, Vol. C, ch. 6.3.3, pp. 600–608. Dordrecht: Kluwer. Google Scholar
Merrifield, D. R., Ramachandran, V., Roberts, K. J., Armour, W., Axford, D., Basham, M., Connolley, T., Evans, G., McAuley, K. E., Owen, R. L. & Sandy, J. (2011). Meas. Sci. Technol. 22, 115703. Web of Science CrossRef Google Scholar
Meulenaer, J. de & Tompa, H. (1965). Acta Cryst. 19, 1014–1018. CrossRef IUCr Journals Web of Science Google Scholar
Minor, W., Cymborowski, M., Otwinowski, Z. & Chruszcz, M. (2006). Acta Cryst. D62, 859–866. Web of Science CrossRef CAS IUCr Journals Google Scholar
Monaco, H. L. & Artioli, G. (2002). Fundamentals of Crystallography, 2nd ed., edited by H. Giacovazzo, ch. 5, pp. 376–388. Oxford University Press. Google Scholar
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255. CrossRef CAS Web of Science IUCr Journals Google Scholar
North, A. C. T., Phillips, D. C. & Mathews, F. S. (1968). Acta Cryst. A24, 351–359. CrossRef IUCr Journals Web of Science Google Scholar
Schaffner, I., Mlynek, G., Flego, N., Pühringer, D., LibisellerEgger, J., Coates, L., Hofbauer, S., Bellei, M., Furtmüller, P. G., Battistuzzi, G., Smulevich, G., DjinovićCarugo, K. & Obinger, C. (2017). ACS Catal. 7, 7962–7976. Web of Science CrossRef CAS PubMed Google Scholar
Sheldrick, G. M. (1996). SADABS. University of Göttingen, Germany. Google Scholar
Skubák, P. & Pannu, N. S. (2013). Nat. Commun. 4, 2777. Web of Science PubMed Google Scholar
Strutz, T. (2011). IEEE/ACM Trans. Comput. Biol. Bioinf. 8, 797–807. Web of Science CrossRef Google Scholar
Thorn, A. & Sheldrick, G. M. (2011). J. Appl. Cryst. 44, 1285–1287. Web of Science CrossRef CAS IUCr Journals Google Scholar
Vo, N. T., Atwood, R. C. & Drakopoulos, M. (2018). Opt. Express, 26, 28396–28412. Web of Science CrossRef PubMed Google Scholar
Wagner, A., Duman, R., Henderson, K. & Mykhaylyk, V. (2016). Acta Cryst. D72, 430–439. Web of Science CrossRef IUCr Journals Google Scholar
Walker, N. & Stuart, D. (1983). Acta Cryst. A39, 158–166. CrossRef CAS Web of Science IUCr Journals Google Scholar
Warren, A. J., Armour, W., Axford, D., Basham, M., Connolley, T., Hall, D. R., Horrell, S., McAuley, K. E., Mykhaylyk, V., Wagner, A. & Evans, G. (2013). Acta Cryst. D69, 1252–1259. Web of Science CrossRef CAS IUCr Journals Google Scholar
Weiss, M. S. (2001). J. Appl. Cryst. 34, 130–135. Web of Science CrossRef CAS IUCr Journals Google Scholar
Weiss, M. S. & Hilgenfeld, R. (1997). J. Appl. Cryst. 30, 203–205. CrossRef CAS Web of Science IUCr Journals Google Scholar
Winter, G., Beilsten–Edmands, J., Devenish, N., Gerstel, M., Gildea, R. J., McDonagh, D., Pascal, E., Waterman, D. G., Williams, B. H. & Evans, G. (2022). Protein Sci. 31, 232–250. Web of Science CrossRef CAS PubMed Google Scholar
Winter, G., Waterman, D. G., Parkhurst, J. M., Brewster, A. S., Gildea, R. J., Gerstel, M., FuentesMontero, L., Vollmar, M., MichelsClark, T., Young, I. D., Sauter, N. K. & Evans, G. (2018). Acta Cryst. D74, 85–97. Web of Science CrossRef IUCr Journals Google Scholar
Wong, J. L., Romano, M., Kerry, L. E., Kwong, H.S., Low, W.W., Brett, S. J., Clements, A., Beis, K. & Frankel, G. (2019). Nat. Commun. 10, 1–10. Web of Science PubMed Google Scholar
Yang, C., Pflugrath, J. W., Courville, D. A., Stence, C. N. & Ferrara, J. D. (2003). Acta Cryst. D59, 1943–1957. Web of Science CrossRef CAS IUCr Journals Google Scholar
Zeldin, O. B., Gerstel, M. & Garman, E. F. (2013). J. Appl. Cryst. 46, 1225–1230. Web of Science CrossRef CAS IUCr Journals Google Scholar
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.