research papers
Rigid-body motion is the main source of diffuse scattering in protein crystallography
^{a}Crystal and Structural Chemistry, Bijvoet Center for Biomolecular Research, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
^{*}Correspondence email: l.m.j.kroonbatenburg@uu.nl
The origin of diffuse X-ray scattering from protein crystals has been the subject of debate over the past three decades regarding whether it arises from correlated atomic motions within the molecule or from rigid-body disorder. Here, a supercell approach to modelling diffuse scattering is presented that uses ensembles of molecular models representing rigid-body motions as well as internal motions as obtained from ensemble refinement. This approach allows oversampling of reciprocal space and comparison with equally oversampled diffuse data, thus allowing the maximum information to be extracted from experiments. It is found that most of the diffuse scattering comes from correlated motions within the unit cell, with only a minor contribution from longer-range correlated displacements. Rigid-body motions, and in particular rigid-body translations, make by far the dominant contribution to the diffuse scattering, and internal motions give only a modest addition. This suggests that modelling biologically relevant protein dynamics from diffuse scattering may present an even larger challenge than was thought.
1. Introduction
X-ray crystallography has been the main method for solving macromolecular structures for several decades. With the advent of highly brilliant X-ray sources and photon-counting pixel-array detectors, it has evolved into a highly automated technique, even for very small micrometre-sized crystals of large molecular complexes; this has allowed its widespread use by structural biologists. Crystallography makes use of the enhancement of X-ray scattering caused by the periodic arrangement of molecules in a lattice, and data-collection and structure-solution techniques focus on obtaining the intensities of the Bragg reflections and using them to refine a structural model. Any background scattering is removed in the integration process and is treated as a nuisance rather than as a carrier of information. However, correlated motion or disorder of atoms in the crystal causes diffuse scattering outside the Bragg peaks (note that X-ray diffraction experiments cannot distinguish between static and dynamic disorder). While amplitudes of motion result in the B factors, it is the correlation in motion that is exclusively contained in the diffuse scattering. It is estimated that for a protein crystal with a modest B factor of 20 Å^{2} the total diffuse scattered intensity exceeds that of the Bragg intensity beyond a resolution of 3.8 Å (Clarage et al., 1992). Access to information on correlated motion of biomolecules could provide insight into their dynamics, which are generally considered to be crucial to their function (Henzler-Wildman & Kern, 2007). Understanding and modelling the diffuse scattering potentially adds valuable information to what we can learn from Bragg scattering (Meisburger et al., 2017).
The first attempts at interpreting diffuse scattering in terms of protein-molecule motions or internal mobilities were made in the 1980s and 1990s. In a seminal paper, Caspar et al. (1988) developed a liquid-like model to explain the observed variational diffuse scattering features of rhombohedral insulin crystals (see Section 2 for a description of the various types of diffuse scattering). They found that the two main features that were observed, broad cloudy scattering and narrower halos around the Bragg peaks, could be modelled by two displacement correlation functions with coupling distances of 6 and 20–30 Å, respectively. They ruled out the possibility that the diffuse scattering was caused by low-frequency lattice vibrations which would give rise to thermal diffuse scattering (TDS), as these would produce much narrower halos. In contrast, their observations indicated significant correlation between nearest-neighbour molecules. In a later paper by Clarage et al. (1992), this approach was further extended and applied to triclinic and tetragonal lysozyme crystals. Again, for each crystal two components of the diffuse scattering could be modelled: a short-range correlation of internal movements with a coupling distance of 6 Å, which was interpreted as changes of torsion angles in the backbone or neighbouring side-chain displacements, and long-range lattice-coupled displacements of 50 Å in distance. In contrast to these findings, Pérez et al. (1996) concluded that rigid-body movements are the major contribution to the diffuse scattering of tetragonal lysozyme crystals. Their model reproduced the shape of the observed diffuse patches (speckles) with roughly equal contributions from translational and rotational displacements. A further argument for this model is that the B factors of C^{α} positions are reproduced. Molecular-dynamics simulations of orthorhombic lysozyme (Héry et al., 1998) further supported rigid-body translations, although it was suggested that only the backbone atoms form the rigid core, with the side chains forming separate rigid bodies.
In the following years, Wall and co-workers (Wall, Clarage et al., 1997; Wall, Ealick et al., 1997) published methods to extract three-dimensional diffuse scattering maps from experimental data. Until then, all data had been extracted from single (still) images and mapped onto the two-dimensional detector plane by intersection with the Ewald sphere. They applied their techniques to staphylococcal nuclease and calmodulin crystals and fitted the diffuse scattering in both cases using Caspar's liquid-like motional models, although in the latter case there were additional streaks in the scattering data caused by nearest-neighbour coupling that required an anisotropic treatment.
The debate on whether the variational diffuse scattering is caused by internal correlated motion or rigid-body translations and rotations became dormant for some time, but has recently been revived, starting with a series of papers by Van Benschoten and Wall (Van Benschoten et al., 2015, 2016; Wall, 2018). In the first paper, diffuse scattering maps are generated from translation–libration–screw (TLS) models as used in the structural refinement of protein crystal structures. However, different selections of TLS groups produced markedly different diffuse patterns. In a very enlightening paper, Van Benschoten et al. (2016) showed that three-dimensional diffuse scattering data can be obtained from routine data collections from protein crystals using the highly brilliant X-ray sources that are currently available and modern pixel-array detectors (PADs). They analysed the diffuse scattering of cyclophilin A (CypA) and trypsin using various models and concluded that TLS models did not agree well with the data, but that normal-mode (NM) analysis and liquid-like motion (LLM) models gave much better agreement. In contrast, Ayyer et al. (2016) concluded that the continuous scattering visible as a speckle pattern in XFEL data beyond the 4.5 Å Bragg limit from crystals of the integral membrane-protein complex photosystem II is caused by translational lattice disorder. The diffuse scattering then becomes the incoherent sum of many (rotationally) aligned single-molecule diffraction patterns. Iterative phasing of the continuous diffraction gave Fourier amplitudes and phases to 3.3 Å resolution and much-improved electron density. This method is further detailed in Chapman et al. (2017). Recently, Peck and co-workers showed evidence for longer-range intermolecular correlated motions, i.e. longer than the size of one molecule, in three different protein crystals (Peck et al., 2018), and Polikanov and Moore suggested displacements arising from acoustic lattice vibrations in ribosome crystals, implying low-frequency motions of whole molecules (Polikanov & Moore, 2015). Previously, this long-range order had also been observed by Doucet & Benoit (1987) for orthorhombic lysozyme.
Models for diffuse scattering from protein crystals can be subdivided into those that use analytical expressions with only a few parameters, such as the liquid-like motion model, and those that use molecular model coordinates, such as normal-mode analysis, TLS models and molecular-dynamics simulations. None of these approaches has given a conclusive structural interpretation of the correlated motion that is responsible for diffuse scattering. A comprehensive review containing an excellent section on diffuse scattering can be found in Meisburger et al. (2017). The quality indicators that should be used to quantify the agreement between modelled and experimental diffuse data have not yet been well established in the field. For Bragg data, R_{work} and R_{free} in structural refinement and real-space electron-density correlation coefficients between model and observed data are well accepted.
In this work, we study how diffuse scattering is built up from various structural contributions in the full three-dimensional reciprocal space. We simulate diffuse scattering from an ensemble of molecular models that represent disorder in crystals through rigid-body motions and/or internal motions. For this, we sampled rigid-body translations and rotations from Gaussian distributions based on the refined B-factor fingerprint or sampled poses from motions described by TLS models, and generated ensembles from ensemble refinement of the crystal structures (Burnley et al., 2012) to model internal motions. The diffuse maps were calculated by our newly developed supercell method, allowing sampling of reciprocal space in between integer Miller indices. We extracted diffuse scattering intensities from experimental diffraction data of CypA and lysozyme and converted these to full three-dimensional reciprocal-space maps. Since the diffuse signal is continuous through reciprocal space, sampling only on Bragg spots can lead to a loss of information. The size of the pixels and the rotation scan width of the images allow oversampling of reciprocal space by a factor of 5–10, i.e. 5^{3}–10^{3} more voxels can be assigned than just those belonging to integer Miller indices. We will show that rigid-body contributions to diffuse scattering are dominant by analysing different aspects. (i) We calculate linear correlation coefficients (CCs) between the maps and compare these with literature values. (ii) We visually inspect intensity distributions (speckle patterns) in both the calculated and experimental two-dimensional and three-dimensional diffuse maps. (iii) We calculate the contribution of internal motion to the diffuse features. (iv) We make an unbiased estimate of the structural unit that is responsible for the diffuse scattering by calculating the Patterson function of the experimental diffuse data.
2. Theory of diffuse scattering from disordered crystals
Diffuse scattering caused by static or dynamic disorder can be understood by considering the general equation for the total scattering of a crystal in terms of a lattice summation of unit cells containing the scattering atoms,
The first double summation is over all periodic lattice points with positional vectors R_{N} in three dimensions; the second term runs over the positional coordinates of atoms in the unit cells. Q is the vectorial difference between the incident and scattered wavevectors and has length 1/d = 2sinθ/λ. If the crystal were strictly ordered, the total diffracted intensity^{1} would be
where N_{a} is the number of unit cells along the a axis, and likewise for N_{b} and N_{c}, and F is the structure factor of every unit cell. Let us consider deviations of atoms from their ideal positions in the unit cells. Each atom j will be displaced by a vector δ_{j} from its average position 〈r_{j}〉. The total scattering then becomes
The variation of atom positions produces diffuse scattering and is dependent on the type of motion or disorder. Four classes can be distinguished.
Equation (3) is the general equation for describing diffuse scattering and can be expanded in several ways. We follow James (1958) in deriving the results for random, independent and isotropic displacements of atoms. Averaging over the unit cells reduces the last exponential to exp{−2π^{2}[(〈δ_{j} − δ_{k}〉)·Q]^{2}} (where use is made of a Taylor expansion, cut off after the quadratic term) and in addition 〈δ_{j} − δ_{k}〉^{2} ≃ 〈δ_{j}^{2}〉 + 〈δ_{k}^{2}〉.
where N_{t} is the number of unit cells in the crystal. The last term is the usual Bragg intensity modulated with the Debye–Waller factor, and peaks at reciprocal-lattice points because of the lattice sum. The first term is the diffuse scattering of type (i) that is spherical around the incident beam, and the reduction in intensity by the Debye–Waller factor from the Bragg part reappears in the diffuse scattering.
Now, suppose that the unit cell contains one molecule (P1 symmetry) and that the molecules have random isotropic translational displacements. The atomic displacements are thus fully correlated and all atoms within a unit cell are displaced over the same vector δ_{N}. The subscripts j and k in (3) can be dropped and, following the same reasoning as above, we obtain
where Ft[〈ρ(r)〉] is the Fourier transform of the average electron density and 〈δ〉 is the average displacement. We see that the diffuse scattering is proportional to the squared Fourier transform of the unit-cell density. In the case of symmetry-related molecules that are displaced independently, Ayyer et al. (2016) have shown that the diffuse scattering is proportional to the incoherent sum of the squared Fourier transforms of the independent rigid units. This principle was exploited by Ayyer et al. (2016) and Chapman et al. (2017), who used continuous scattering from translationally disordered crystals for phasing beyond the Bragg diffraction limit. The diffuse scattering is that of type (ii). It is important to note that the maximum diffuse scattered intensity is achieved by these rigid-body translations as all atoms move in a correlated fashion and the Fourier transform of the whole molecule appears in (5). Also note that increasing the average displacement 〈δ〉 (i.e. increasing the disorder of the crystal) does not change the diffuse pattern (the Fourier transform) but only scales the intensities.
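The statement that rigid-body translations leave the diffuse pattern unchanged and only scale it can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the calculation used in this work: it is one-dimensional, uses point atoms with unit scattering factors, and draws the rigid shifts from a Gaussian of standard deviation `sigma`; all function names are hypothetical. Under these assumptions the variance of the structure factor should approach |F_0|^2[1 − exp(−4π^2σ^2q^2)], i.e. the squared molecular transform multiplied by a smooth factor.

```python
import cmath
import math
import random


def structure_factor(q, positions, shift=0.0):
    """1D structure factor of unit point scatterers, all displaced by `shift`."""
    return sum(cmath.exp(2j * math.pi * q * (x + shift)) for x in positions)


def simulated_diffuse(q, positions, sigma, n_copies=20000, seed=0):
    """<|F|^2> - |<F>|^2 over copies with Gaussian rigid-body shifts."""
    rng = random.Random(seed)
    f_values = [structure_factor(q, positions, rng.gauss(0.0, sigma))
                for _ in range(n_copies)]
    mean_f = sum(f_values) / n_copies
    mean_i = sum(abs(f) ** 2 for f in f_values) / n_copies
    return mean_i - abs(mean_f) ** 2


def analytic_diffuse(q, positions, sigma):
    """|F0|^2 [1 - exp(-4 pi^2 sigma^2 q^2)] for Gaussian translations."""
    f0 = structure_factor(q, positions)
    damping = math.exp(-4.0 * math.pi ** 2 * sigma ** 2 * q ** 2)
    return abs(f0) ** 2 * (1.0 - damping)
```

Comparing `simulated_diffuse` with `analytic_diffuse` for a few values of `q` and `sigma` shows the speckle positions set by |F_0|^2 alone, with `sigma` entering only through the smooth prefactor.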
An effort to derive such equations to incorporate rotational disorder was undertaken by Moore (2009). It followed from his paper that the diffuse scattering caused by rotational disorder looks completely different from that of translational disorder. If atomic displacements are correlated in a complex way, including rigid-body rotations, it is easier to rearrange (3) by incorporating all atomic displacements into the varying structure factors (Welberry, 1985; Moss et al., 2003),
where R_{M} is the difference vector between unit-cell origins. (6) can be rewritten as
The first part is the Bragg scattering; the second part, which contains a possible correlation between unit cells R_{M} apart, is responsible for the diffuse scattering. When correlations exist between atom motions on length scales larger than the unit cell, sharp diffuse scattering of types (iii) and (iv) is observed. It is convenient to rewrite the second part of (7) in terms of correlation coefficients (Moss et al., 2003). In this paper, we are only concerned with diffuse scattering of type (ii). Thus, if no correlations across unit cells exist, (7) reduces to
The first part is the Bragg scattering, which becomes N_{t}〈F〉^{2} after integration over the peak width resulting from the finite size of the crystal. The second part is the diffuse scattered intensity and is commonly rewritten as
I_{D} = N_{t}(〈|F|^{2}〉 − |〈F〉|^{2}),   (9)
the well known Guinier equation for modelling diffuse scattering caused by motions within the unit cell, which we exploited in this work. Thus, for such motions it is sufficient to calculate the variance in structure factors.
3. Materials and methods
3.1. Diffraction data for CypA and hen egg-white lysozyme
Experimental data for cyclophilin A (CypA) were obtained from the SBGrid Data Bank (https://data.sbgrid.org/dataset/68; Fraser, 2015). The data were recorded on beamline 11-1 at Stanford Synchrotron Radiation Lightsource using a Dectris PILATUS 6M pixel-array detector, a rotation range of 180°, a rotation scan width of 0.5° and an exposure time of 0.2 s. The data were from a single crystal at an ambient temperature of 293 K with minimal surrounding mother liquor. The data were indexed with DirAx (Duisenberg, 1992); unit-cell and instrument parameters were refined with Peakref (Schreurs, 1999b). A significant offset from the horizontal orientation of the spindle axis was found with some 5° of reorientation of the crystal during the scan. Refined unit-cell matrices were used for reciprocal-space reconstruction. The structural models were generated based on refinement by Van Benschoten et al. (2016) and deposited as PDB entry 5f66.
Crystals of hen egg-white lysozyme (Sigma–Aldrich, Schnelldorf, Germany) were obtained using the hanging-drop vapour-diffusion method with a protein concentration of 25 mg ml^{−1}. The crystals had dimensions of 100 × 100 × 20 µm. Data were collected on beamline ID30A-3 at the European Synchrotron Radiation Facility (ESRF) using a Dectris EIGER X 4M detector. One crystal was mounted on a MicroMesh Crystal Mount (MiTeGen) and kept at constant humidity using the HC1 Humidity Control Device (Sanchez-Weatherby et al., 2009) and ambient temperature (293 K). Images were recorded over a rotation range of 180° and were fine-sliced in 0.1° per image with 0.01 s exposure. Images were merged into 1° frames prior to indexing with DirAx. The unit-cell matrix was refined with Peakref (Schreurs, 1999b) and reflection data were processed with EVAL15 (Schreurs et al., 2010) to 1.3 Å resolution (Supplementary Table S1) and scaled using SADABS (Sheldrick, 1996). The structure was refined against these data using phenix.refine (Adams et al., 2010; Supplementary Table S1).
3.2. Reconstruction of diffuse scattering maps in reciprocal space
All of the software used to generate diffuse scattering maps forms part of the EVAL software suite (Adams et al., 2010; Schreurs, 1999a). For each image, bad-pixel masks were generated. These comprise panel gaps (indicated by a pixel value of `−1' in the image file) and a user-defined beamstop shadow. To remove parasitic scattering of air and solvent surrounding the crystal and inelastic Compton scattering, a circularly averaged profile was subtracted. This profile was constructed using pixels with values of less than 0.5 of the maximum pixel intensity in the image and was corrected for polarization of the synchrotron beam. When subtracting the radial profile the polarization was reintroduced. To isolate the diffuse scattering, Bragg spots had to be removed. Methods have been described in the literature that use knowledge of Bragg peak positions. Masks are located at predicted reflection positions and, within these, pixels are removed only if they deviate significantly from the background (Polikanov & Moore, 2015; Peck et al., 2018). An alternative method that is not dependent on predicting reflection positions and that is often used to remove sharp features in images is mode filtering (Wall, Ealick et al., 1997). Each pixel value is replaced by the most common pixel intensity in a box around that pixel. We took this approach and investigated how well Bragg reflections were removed depending on the kernel size. We found that a kernel size of 21 × 21 pixels was needed to remove the Bragg spots completely. Background and Bragg peak removal is implemented in VIEW (Schreurs, 1999a). Examples of the resulting images containing only diffuse scattering for CypA and lysozyme are shown in Fig. 1.
Once the radial scattering and Bragg peaks have been removed, the pixels are transformed to reciprocal space by the software IMG2HKL, which is part of the EVAL package (Schreurs, 1999a). In fact, every pixel represents a voxel extending in the rotation direction over the scan width. The eight corners are mapped to reciprocal space and the intensity is divided over the voxels that are touched in the new grid. We chose to define the new grid in terms of (h_{s}, k_{s}, l_{s}) indices for easy comparison with the simulated diffuse maps (see Section 3). The indices correspond to rational fractions of Miller indices of the original unit cell. For CypA we used a 9 × 8 × 5 supercell, allowing sub-Miller-index sampling in multiples of 1/9, 1/8 and 1/5 in the a*, b* and c* directions, respectively. For lysozyme we used a 5 × 5 × 10 supercell. In both cases the target voxels represent roughly the same dimension in Å^{−1}. The resolution limit of the pixel data we used was 2.0 Å in both cases. During the mapping, image voxel intensities are corrected for Lorentz and polarization factors and accumulated in the target voxels (h_{s}, k_{s}, l_{s}). Thus, the final values are proportional to squared structure factors. However, a particular region in reciprocal space can occur twice in a rotation scan ranging over less than 360°: once to the left and once to the right of the rotation axis. Target voxel intensities are corrected for this number of occurrences; voxels that are not hit stay blank.
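The mode filtering used above for Bragg-spot removal can be illustrated with a small stand-alone routine. This is a minimal sketch and not the implementation in VIEW: the function name, the list-of-rows image representation, the treatment of negative values as masked pixels and the tie-breaking rule are all assumptions made for the example. A 21 × 21 pixel kernel corresponds to `half_width=10`.

```python
from collections import Counter


def mode_filter(image, half_width):
    """Replace each pixel by the most common value in a square window.

    `image` is a list of rows of integer counts. Negative values mark
    masked pixels (e.g. panel gaps); they are ignored and left unchanged.
    The window is (2 * half_width + 1) pixels wide, clipped at the edges.
    """
    n_rows, n_cols = len(image), len(image[0])
    filtered = [row[:] for row in image]
    for r in range(n_rows):
        for c in range(n_cols):
            if image[r][c] < 0:
                continue
            rows = range(max(0, r - half_width), min(n_rows, r + half_width + 1))
            cols = range(max(0, c - half_width), min(n_cols, c + half_width + 1))
            counts = Counter(image[rr][cc] for rr in rows for cc in cols
                             if image[rr][cc] >= 0)
            # Assumption: ties are broken in favour of the lowest value.
            best = max(counts.values())
            filtered[r][c] = min(v for v, n in counts.items() if n == best)
    return filtered
```

Because a Bragg spot occupies only a minority of the pixels in a sufficiently large window, the most common value in the window is that of the smooth diffuse background, which is why the kernel must be large compared with the spot size.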
3.3. Molecular ensembles for modelling disorder
All calculations were performed with custom-made scripts using cctbx (Grosse-Kunstleve et al., 2002). Four types of motion models, three rigid-body motion models and one rigid-body plus internal motion model, were generated for comparison with the measured and extracted experimental diffuse scattering. The three rigid-body-only models (Fig. 2, top panels) were fitted to the C^{α} B-factor fingerprint of the refined structure (target B in Fig. 2). Rotation angles were selected from a one-dimensional normal distribution, while translation vectors were extracted from a three-dimensional multivariate distribution. The rotation axis is a randomly generated vector. The variances of the normal distributions from which the rotation angles and translational displacements were generated were fitted by a simplex minimization (scitbx.simplex) on the difference between the C^{α} B-factor trace and the B factors obtained from the root-mean-square fluctuation (r.m.s.f.) of 100 asymmetric units generated from the distributions. The disorder models then consist of 100 asymmetric units created with the fitted variances of either the translational distribution, the rotational distribution or a mixture of the two.
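The sampling of rigid-body poses and the conversion of the resulting r.m.s.f. to B factors can be sketched as follows. This is an illustration under stated assumptions rather than the cctbx scripts used here: rotations are taken about the centroid of the coordinates, the translational distribution is isotropic, angles are in radians, and the relation B = (8π^2/3)〈|u|^2〉 for isotropic displacements is used. All function names are hypothetical, and the simplex fit of the variances is not shown.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly distributed direction from normalised Gaussian components."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            return [x / norm for x in v]


def rotate(point, axis, angle):
    """Rodrigues rotation of `point` about unit `axis` through the origin."""
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(a * p for a, p in zip(axis, point))
    cross = [axis[1] * point[2] - axis[2] * point[1],
             axis[2] * point[0] - axis[0] * point[2],
             axis[0] * point[1] - axis[1] * point[0]]
    return [p * c + cr * s + a * dot * (1.0 - c)
            for p, cr, a in zip(point, cross, axis)]


def sample_poses(coords, sigma_trans, sigma_rot, n_models=100, seed=0):
    """Rigid-body copies: rotate about the centroid (assumed), then translate."""
    rng = random.Random(seed)
    n = len(coords)
    centre = [sum(p[i] for p in coords) / n for i in range(3)]
    models = []
    for _ in range(n_models):
        axis = random_unit_vector(rng)
        angle = rng.gauss(0.0, sigma_rot)                        # radians
        shift = [rng.gauss(0.0, sigma_trans) for _ in range(3)]  # angstrom
        model = []
        for p in coords:
            local = [p[i] - centre[i] for i in range(3)]
            r = rotate(local, axis, angle)
            model.append([r[i] + centre[i] + shift[i] for i in range(3)])
        models.append(model)
    return models


def b_factors_from_rmsf(models):
    """Isotropic B = (8 pi^2 / 3) <|u|^2>, with u taken from the ensemble mean."""
    n_models, n_atoms = len(models), len(models[0])
    b_values = []
    for j in range(n_atoms):
        mean = [sum(m[j][i] for m in models) / n_models for i in range(3)]
        msd = sum(sum((m[j][i] - mean[i]) ** 2 for i in range(3))
                  for m in models) / n_models
        b_values.append(8.0 * math.pi ** 2 / 3.0 * msd)
    return b_values
```

In this picture a translation-only model gives a flat B-factor trace, whereas rotations give B factors that grow with distance from the rotation centre, which is what allows the two variances to be fitted to a C^{α} B-factor fingerprint.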
To model the internal motion of a protein in a crystal, ensemble refinement as implemented in phenix.refine (Burnley et al., 2012; Adams et al., 2010) was used. A parameter sweep over p_{TLS}, d_{TMP} and τ_{x} was performed (Burnley et al., 2012). The ensemble with the lowest R_{free} was chosen as the `best' ensemble and used for further calculations (Supplementary Table S1). Before ensemble refinement is started, it is common practice to fit TLS matrices to the B factors of the input model (a refined crystal structure) and to subtract their contribution (BTLS) from the B-factor column. This prevents the refinement from sampling large-scale motion and forces the sampling of internal atomic fluctuations (Burnley et al., 2012). For the diffuse scattering calculations presented here, these per-molecule TLS motions are reintroduced to the generated ensemble models. This is performed by fitting the rotation and translation variances to the C^{α} BTLS trace found in the B-factor column of the ensemble models, similar to the method described above. The resulting translation and rotation operations are then randomly applied to models from the ensemble to create asymmetric units describing internal motion and BTLS (Fig. 2, bottom panels).
As performed previously by Van Benschoten et al. (2015), we also calculated diffuse scattering from TLS models that were fitted to refined anisotropic displacement parameters U_{ij}. The eigenvalues of (input U_{ij} − fitted U_{ij}) were restricted to be positive. The S-matrix components were always set to zero. Fitted TLS matrices were used to generate ensembles of structures using phenix.tls_as_xyz (Urzhumtsev et al., 2015).
3.4. Calculation of diffuse scattering from molecular ensembles
We use supercells to sample diffuse scattering in reciprocal space in between the Bragg peaks at fractional Miller indices. The supercell crystals are of very limited size (5–10 unit cells in each dimension). However, all of the equations in Section 2 hold for these small crystals as long as F_{N} = Ft[ρ(r)_{N}] is calculated at (h_{s}, k_{s}, l_{s}) values that are integer multiples of fractional (h, k, l). Otherwise shape transform ripples will dominate the diffuse pattern (Neder & Proffen, 2008), which does not occur in the observed diffraction patterns unless the crystals are truly nanometre-sized (Chapman et al., 2011). Thus, we implement (9) by calculating the structure-factor variance of N_{s} supercells,
N_{s} is 100 throughout this paper. The asymmetric units describing the disorder are prepared for diffuse scattering calculation by setting all B factors to 0 and all occupancies to 1. Supercell parameters are chosen in such a way that the supercell crystals are close to cubic, and the smallest supercell dimension is five unit cells in a row. This ensures that the reciprocal-space voxels in the final map will be close to cubic as well. Once the supercell dimensions have been chosen, the symmetry operations of the space group and unit-cell translations of the crystal are determined, forming a complete set of operations to fill the supercell. For each of the elements in the set, an asymmetric unit from the disorder model is chosen at random and the corresponding operation is applied. The coordinate file, P1 space group and supercell size are passed on to mmtbx.utils.fmodel_from_xray_structure to be Fourier transformed to a resolution of 2 Å. A bulk-solvent model is used to represent the solvent. The structure factors and phases are written to a binary structure-factor file (.mtz). This is repeated 100 times in order to sample the full disorder that we want our supercells to represent. The process is performed in parallel using the easy_mp functionality in cctbx. 〈F(h_{s}, k_{s}, l_{s})〉^{2}_{100} and 〈F(h_{s}, k_{s}, l_{s})^{2}_{100}〉 are then calculated, after which a final .mtz file is written containing the Miller indices from the supercell and the columns I_{Bragg}, I_{tot} and I_{diff} (I_{tot} − I_{Bragg}) that follow from (10) and (11).
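The final averaging step over the supercell structure factors can be sketched as below. This is a minimal illustration, not the cctbx-based script: the function name is hypothetical and the structure factors are assumed to be available as a nested list with one row of complex values per supercell and one column per (h_{s}, k_{s}, l_{s}) voxel, rather than being read from .mtz files.

```python
def diffuse_columns(f_ensemble):
    """Per-voxel I_Bragg, I_tot and I_diff from an ensemble of structure factors.

    f_ensemble[n][v] is the complex structure factor of supercell n at voxel v.
    I_Bragg = |<F>|^2, I_tot = <|F|^2>, I_diff = I_tot - I_Bragg.
    """
    n_cells = len(f_ensemble)
    n_voxels = len(f_ensemble[0])
    i_bragg, i_tot, i_diff = [], [], []
    for v in range(n_voxels):
        mean_f = sum(row[v] for row in f_ensemble) / n_cells
        mean_i = sum(abs(row[v]) ** 2 for row in f_ensemble) / n_cells
        i_bragg.append(abs(mean_f) ** 2)
        i_tot.append(mean_i)
        i_diff.append(mean_i - abs(mean_f) ** 2)
    return i_bragg, i_tot, i_diff
```

The diffuse column is a variance and is therefore never negative; voxels at which all supercells give the same structure factor carry Bragg intensity only.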
The final diffuse intensities were placed in an array after applying Friedel symmetry to all Miller indices. This array was written to a CCP4 .map-style file with unit-cell constants in Å^{−1} describing the reciprocal-space dimensions. No other symmetry operations were applied. The supercells are built with the space-group symmetry of the crystals and thus the calculated diffuse maps should have the corresponding point-group symmetry.
For large supercells these calculations can become computationally intensive. For example, for the lysozyme diffuse scattering calculations discussed in this paper, the 5 × 5 × 10 supercell was a = b = 394.16, c = 382.32 Å, α = β = γ = 90.0°. This resulted in a supercell containing 250 unit cells, each filled with eight molecules made up of 1000 non-H atoms. The FFT resulted in a list of 15 550 023 structure factors. The 100 temporary .mtz files took up 297 MB of disk space each and the final .mtz file was 356 MB in size. The map file used for further analysis had a file size of 230 MB.
3.5. Analysis of calculated diffuse scattering
To compare experimental and model maps, the origins of the maps are aligned and a combined mask of unmeasured and non-calculated voxels is constructed. Non-calculated voxels in the model maps were set to 0. Calculated and experimental maps are scaled by their total unmasked intensities. The maps were displayed with UCSF Chimera (Pettersen et al., 2004) for visual comparison. Linear correlation coefficients (CCs) between all unmasked points are calculated using cctbx array_family flex.linear_correlation. The correlation coefficients between voxels corresponding to the original Bragg reflections are calculated by masking the non-Bragg voxels.
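The masked linear correlation coefficient can be written out explicitly as follows. This is a plain-Python stand-in for illustration, not the flex.linear_correlation call used in the scripts; the function name and the flattened-list representation of the maps and mask are assumptions.

```python
import math


def masked_cc(map_a, map_b, mask):
    """Pearson correlation over voxels where `mask` is True.

    `map_a`, `map_b` and `mask` are flattened maps of equal length; the mask
    is True for voxels that are both measured and calculated.
    """
    pairs = [(a, b) for a, b, keep in zip(map_a, map_b, mask) if keep]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two unmasked voxels")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("a map is constant over the unmasked voxels")
    return cov / math.sqrt(var_a * var_b)
```

Since the linear CC is invariant under scaling of either map, the scaling by total unmasked intensity matters for display and for difference maps but not for the CC itself.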
Radially averaged intensities of the scaled maps are calculated by masking everything that is not within the resolution shell and calculating the mean in 20 resolution bins. Maps containing the radial average per voxel are constructed, saved and subtracted from the original maps. Correlation coefficients between these isotropically corrected maps are calculated as above.
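The removal of the isotropic component can be sketched as below. This is a simplified illustration: the function name is hypothetical, the voxels are given as flattened lists of |Q| = 1/d and intensity, and the bins are assumed to be of equal width in 1/d, which is not stated in the text.

```python
def subtract_radial_mean(q_values, intensities, n_bins=20):
    """Return intensities minus the mean of their resolution shell.

    `q_values` holds 1/d per voxel; shells are `n_bins` bins of equal width
    in 1/d between the smallest and largest value (an assumption).
    """
    q_min, q_max = min(q_values), max(q_values)
    width = (q_max - q_min) / n_bins
    if width == 0.0:
        bins = [0] * len(q_values)
    else:
        bins = [min(int((q - q_min) / width), n_bins - 1) for q in q_values]
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for b, value in zip(bins, intensities):
        sums[b] += value
        counts[b] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [value - means[b] for b, value in zip(bins, intensities)]
```

The returned values are the anisotropic residuals, which average to zero within every shell and can be passed directly to a correlation calculation.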
Scripts are available on GitHub (https://github.com/kroonlab/scud).
4. Results
4.1. Experimental diffuse maps
The maps reconstructed from images as described in Section 3 have point-group symmetry 1 and are subsequently symmetrized using Friedel symmetry (linear CC of 0.86 for CypA and 0.78 for lysozyme) or the Laue symmetry of the crystals, which is mmm for CypA (CC = 0.74) and 4/mmm for lysozyme (CC = 0.53). The diffuse maps for CypA [Figs. 1(b) and 1(c)] and lysozyme [Figs. 1(e) and 1(f)] viewed along the l axis (c*) in the −1 and the higher mmm and 4/mmm symmetries, respectively, show that in the lower symmetry the noise level is quite high and averaging in mmm or 4/mmm improves the maps enormously. For lysozyme, Figs. 1(e) and 1(f) show that the fourfold symmetry is present in the lower symmetry map. We verified that every target voxel (h_{s}, k_{s}, l_{s}) was hit multiple times: for CypA the most frequent number of hits in a 9 × 8 × 5 oversampled map with point-group symmetry 1 was 44, but ranged from 0 to 507. Zero hits occur from detector-panel gaps, the beamstop shadow and the cusp region of the rotation scan. For lysozyme, in the 5 × 5 × 10 oversampled map these values were 78 and 0–502. Voxel dimensions in the rotation direction (φ range) are large in the case of wide slicing. We investigated what the consequence is for mapping into reciprocal space. When fine-slicing the lysozyme data at 0.3°, instead of at 1° as we used initially, the most frequent number of hits per target voxel increased to 100 and ranged between 0 and 1467, which implies that the subdivision of every voxel into 3.3 voxels does not generate 3.3 times the number of hits, and that many of them map to the same target voxel. The two maps look quite similar (CC = 0.62). The original data were fine-sliced to 0.1°, but this brought the diffuse scattering to the single-photon noise level and no good diffuse maps could be obtained. We conclude that a scan width of 0.5–1° is probably best for obtaining sufficient signal in the diffuse maps in the usual experimental setup at synchrotron beamlines.
The subtraction of radial mean background intensity leads to negative pixel values in the diffuse maps. Chapman et al. (2017) have developed an improved method for background subtraction by using a discrete noisy Wilson distribution, by which average background intensities and their variance are determined. This method avoids the oversubtraction of background, while getting rid of almost all negative intensities. We did not correct the diffuse image intensities to obtain only positive intensities. The speckle structure, the distribution of intensities and linear correlation coefficients are not affected by the maps containing negative intensities.
We noticed that in projections of the complete three-dimensional diffuse maps intensities accumulated on the Bragg layers perpendicular to a* and b* in CypA and to c* in lysozyme (Supplementary Fig. S1). Such features could not be observed in individual slices as they are very weak. We confirmed that the kernel in our mode filter (21 × 21 pixels) was sufficiently large to not leave part of the Bragg spots behind (judged after mapping to three-dimensional reciprocal space), so we rule out these features being caused by Bragg peaks. Similar observations were made by Polikanov & Moore (2015). They found troughs between adjacent rows where the Bragg reflections were removed in diffuse patterns of ribosome. These features must be related to the lattice disorder rather than diffuse scattering caused by motion within the unit cell. Polikanov and Moore were able to reproduce this type of diffuse scattering using a model for acoustic displacement waves. By writing diffuse scattering in terms of structure-factor variances and structure-factor correlation coefficients between unit cells [which corresponds to our equation (7) and diffuse scattering of type (iii)], Moss et al. (2003) concluded that in soft molecular crystals the correlation coefficients fall off rapidly with q, the scattering vector, resulting in a broad acoustic peak at the Bragg positions. Such weak acoustic lattice vibrations must therefore be present in both CypA and lysozyme.
4.2. Calculated diffuse maps
Molecules (asymmetric units) randomly picked from the disorder models described previously were used to construct supercells [Fig. 3(a); Section 3]. The Fourier transforms of these supercells sample reciprocal space on and between the integer reciprocal-lattice points of the original unit cell (Section 3). A Fourier transform of a single supercell [Fig. 3(b)] shows Bragg reflections of the original unit cell and a weak diffuse scattering pattern. When 100 supercells are Fourier transformed and the average total intensities are calculated, this results in well defined diffuse scattering under and between the Bragg reflections [Fig. 3(c)]. The Bragg reflections obey the symmetry and systematic absences of the original space group (P4_{3}2_{1}2; see the systematic absences in the h_{s} = 0 and k_{s} = 0 directions). Diffuse scattering is calculated as the difference between the total scattering and the Bragg scattering.
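The supercell procedure can be sketched in one dimension. The example below is only a toy illustration under stated assumptions: atoms are unit point scatterers, each molecule is displaced by an independent Gaussian rigid-body translation, and the Bragg term is taken to be the squared modulus of the ensemble-averaged amplitude. The function names and all parameter values are hypothetical.

```python
import cmath
import random


def supercell_amplitude(atoms, shifts, h_s):
    """Structure amplitude of a 1D supercell at supercell index h_s.

    atoms  : fractional coordinates of the atoms of one molecule
    shifts : one rigid-body translation per unit cell (unit-cell fractions)
    h_s    : integer index on the supercell reciprocal lattice, so that
             h_s / n_cells is the index in units of the original cell
    """
    n_cells = len(shifts)
    h = h_s / n_cells
    total = 0j
    for cell, shift in enumerate(shifts):
        for x in atoms:
            total += cmath.exp(2j * cmath.pi * h * (cell + x + shift))
    return total


def diffuse_intensity(atoms, n_cells, n_supercells, sigma, h_max, seed=0):
    """Average total intensity minus the Bragg intensity, per supercell index."""
    rng = random.Random(seed)
    indices = range(h_max * n_cells + 1)
    mean_f = [0j for _ in indices]    # ensemble-averaged amplitude
    mean_i = [0.0 for _ in indices]   # ensemble-averaged total intensity
    for _ in range(n_supercells):
        shifts = [rng.gauss(0.0, sigma) for _ in range(n_cells)]
        for k in indices:
            f = supercell_amplitude(atoms, shifts, k)
            mean_f[k] += f / n_supercells
            mean_i[k] += abs(f) ** 2 / n_supercells
    # Assumed Bragg term: |<F>|^2; the remainder is the diffuse part.
    return [i - abs(f) ** 2 for i, f in zip(mean_i, mean_f)]


molecule = [0.10, 0.25, 0.45]  # hypothetical three-atom molecule
diffuse = diffuse_intensity(molecule, n_cells=8, n_supercells=100,
                            sigma=0.03, h_max=4)
```

Indices that are multiples of `n_cells` correspond to the Bragg positions of the original cell; all other indices sample between them, which is what provides the oversampling. With this definition the diffuse term is a variance of the amplitude and is therefore non-negative.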
4.3. Comparison of the diffuse scattering between models and data
Linear correlation coefficients were computed between all calculated maps and the data (Table 1; Section 3). For CypA, the modelled scattering from translational disorder has a correlation coefficient (CC) of 0.46 with the measured diffuse scattering; disorder modelled using a mix of translation and rotation gives a CC of 0.47 (Table 1). Van Benschoten et al. (2016) recorded the CypA data set and showed that diffuse scattering fitted by a liquid-like motion model resulted in a correlation coefficient of 0.518. However, the authors only compared the anisotropic components of both the measured and calculated diffuse scattering in their analysis. If we remove the isotropic components from the data (very little is left because of the radially averaged background subtraction) and from the models, we obtain a CC of 0.51 for our translation-only model and a CC of 0.53 for a model with mixed rotation and translation, and thus we obtain comparable agreement.
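The two kinds of correlation coefficient discussed here can be sketched as follows. This is a minimal illustration, assuming that maps are given as flat lists of intensities with a resolution-shell label per voxel and that the anisotropic component is obtained by subtracting the shell mean; the helper names and the toy numbers are hypothetical and are not taken from the data.

```python
import math


def pearson_cc(a, b):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def remove_isotropic(values, shells):
    """Subtract the mean of each resolution shell, leaving the anisotropic part."""
    totals = {}
    for value, shell in zip(values, shells):
        entry = totals.setdefault(shell, [0.0, 0])
        entry[0] += value
        entry[1] += 1
    return [value - totals[shell][0] / totals[shell][1]
            for value, shell in zip(values, shells)]


# Toy maps: eight voxels in two resolution shells (hypothetical values).
shells = [0, 0, 0, 0, 1, 1, 1, 1]
model = [10.0, 12.0, 10.0, 12.0, 4.0, 6.0, 4.0, 6.0]
data = [20.0, 23.0, 21.0, 24.0, 9.0, 11.0, 8.0, 12.0]

cc_all = pearson_cc(model, data)
cc_aniso = pearson_cc(remove_isotropic(model, shells),
                      remove_isotropic(data, shells))
```

Because the isotropic fall-off with resolution is shared by most models, comparing only the anisotropic parts is a stricter test of the speckle pattern itself.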

For lysozyme, lower correlations between rigid-body models and the data were obtained than for CypA (CC = 0.29 for mixed translation and rotation). However, the agreement improves when considering only the diffuse scattering at the original Bragg positions (Table 1). The anisotropic components of the data and the calculated maps show an even better agreement: a CC of 0.45 for the mixed rigid-body disorder model.
The addition of internal motion to the rigid-body disorder models did not improve the correlation coefficients with the data. For CypA these correlation coefficients are comparable to those of rigid-body models (CC of 0.47 for Ensemble+BTLS versus 0.47 for the mixed-disorder model), while for lysozyme the coefficients become worse. Modelled diffuse scattering maps show high correlation coefficients amongst each other (Table 1). The only exception is the poor resemblance of translation- and rotation-calculated maps (CC < 0.55), which is consistent with the findings of Moore (2009).
We generated an ensemble of molecules from refined TLS matrices, a method that was used previously by Van Benschoten et al. (2015), and calculated linear cross-correlations between the modelled scattering and the data. For CypA, CC_{all} and CC_{aniso} are 0.46 and 0.51, which are comparable to the translation CC values (CC_{all} of translation versus TLS of 0.93). For lysozyme, the CC with data for TLS models improved compared with translation models (CC_{all} = 0.33, CC_{aniso} = 0.37). This shows that the anisotropic translation matrix from the TLS model more accurately describes the true (anisotropic) translation behaviour (Supplementary Fig. S3).
5. Discussion
Correlated motional disorder of atoms within the unit cells produces diffuse scattering of type (ii) (see Section 2). Such motions can be rigid-body movement of whole molecules or internal conformational mobility, or combinations thereof. We generated molecular models to describe such motions using the supercell method and calculated full oversampled three-dimensional diffuse maps. Diffuse maps from rigid-body models have a remarkable resemblance to experimental diffuse maps, as discussed below. Firstly, the linear correlation coefficients are comparable to those in earlier work by Van Benschoten et al. (2016) for CypA, but are lower for lysozyme. The latter is likely to be caused by the noisier experimental data, as the CC between symmetrized and original maps is only 0.53 and fine- and wide-sliced data sets from the same image data produce maps with a CC of 0.62. Secondly, the two-dimensional zero zone slices (Fig. 4) and three-dimensional maps for both CypA and lysozyme (Supplementary Fig. S2) clearly show that throughout reciprocal space experimental diffuse features are reproduced by the mixed rigid-body models. Thirdly, the introduction of internal motion models in addition to rigid-body motions, which were obtained from ensemble refinement and were not specifically optimized to reproduce the diffuse scattering, does not improve the agreement (Table 1). Internal motions appear to only modulate the rigid-body diffuse scattering (compare the two lower rows in Fig. 4), although substantial motions occur (see, for example, the ensembles representing internal motions of CypA depicted in Fig. 5).
The crystals considered here have only a moderate degree of packing disorder (diffraction to 1.15 and 1.3 Å resolution for CypA and lysozyme, respectively), but this is still sufficient to produce this type of diffuse scattering. Ayyer et al. (2016) and Chapman et al. (2017) observed continuous diffraction in the XFEL data of photosystem II (PSII) crystals that diffracted to only 4.5 Å resolution. They assumed this to be caused by translational displacements of individual molecules and showed that the total diffuse scattering is the incoherent sum of that of displaced symmetry-related molecules. This assumption allowed them to use oversampling techniques as practised in coherent diffractive imaging and thereby to iteratively phase to higher resolution than the Bragg diffraction. An unbiased estimation of the structural unit that is responsible for the continuous scattering was obtained from the size of the speckles in the diffraction pattern and its autocorrelation function, which indicates that for PSII this is a dimer. To verify our above results, we made such an independent estimation of the structural unit responsible for the diffuse scattering in CypA and lysozyme by calculating the autocorrelation function from our experimental diffuse maps. This is similar to calculating a Patterson function from Bragg data, as is common practice in crystallography. Indeed, we could feed the CCP4 Patterson module with our (h_{s}, k_{s}, l_{s}, I_{diff}) array (Fig. 6). We found a size of 30–40 Å, corresponding to one molecule for both CypA and lysozyme, and consistent with our rigid-body models. A critical review (Wall et al., 2018) questions the assumptions made by Ayyer and Chapman. We discuss some of the issues raised below.
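The link between finely sampled intensities and the size of the scattering unit can be illustrated with a one-dimensional toy calculation. The sketch below is not the CCP4 Patterson calculation; it assumes a uniform box-shaped object on a periodic grid and uses a plain discrete Fourier transform, with all names and sizes chosen for illustration only.

```python
import cmath


def dft(values, sign):
    """Plain O(n^2) discrete Fourier transform; sign = -1 forward, +1 backward."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
            for j, v in enumerate(values))
        for k in range(n)
    ]


n = 64       # grid points in the (oversampled) box
width = 10   # extent of the toy scattering unit, in grid points

# Toy density: a uniform object occupying `width` grid points.
density = [1.0 if j < width else 0.0 for j in range(n)]

# Continuous-like intensities sampled on the fine reciprocal grid.
intensity = [abs(f) ** 2 for f in dft(density, -1)]

# Back-transforming the intensities gives the autocorrelation function.
autocorr = [c.real / n for c in dft(intensity, +1)]

# Largest separation at which the autocorrelation is still non-zero.
extent = max(d for d in range(n // 2) if autocorr[d] > 1e-6 * autocorr[0])
```

For this box-shaped object the autocorrelation decreases linearly and vanishes beyond separations equal to the object size, so its extent gives a phase-free estimate of the dimension of the unit that scatters coherently.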
Our conclusions are different from previous work, where internal correlated motions were held to be responsible for diffuse scattering. LLM models for CypA (Van Benschoten et al., 2015; Peck et al., 2018) and tetragonal lysozyme (Clarage et al., 1992) give fair agreement with diffuse scattering data, and likewise elastic network models for other protein crystals (Riccardi et al., 2010). In both approaches the diffuse scattering is proportional to a convolution of the Fourier transform of the Patterson of the displaced structure and the Fourier transform of a displacement correlation function. This leads to speckles distributed over all of reciprocal space. The parameters in this model have been fitted to the diffuse scattering, and indeed its global appearance resembles that from the rigid-body translations (see Fig. 4 in Peck et al., 2018). We have calculated from the Fourier transform of the exponential displacement correlation function that a correlation length of 7.1 Å, as Van Benschoten et al. (2016) found, leads to a speckle size of 1/33 Å^{−1}, which is roughly in agreement with the size of the rigid unit as determined from the autocorrelation function of our diffuse data. In contrast, our ensemble structures that model internal correlated motions make only a small contribution to the diffuse scattering maps. Our models are from ensemble refinement of the Bragg data and are not fitted to correlated motion, so may not be fully representative, although we assume that the force field in the ensemble refinement ensures at least some correlated motions. Obviously, the motion that has the largest correlation between atoms is rigid-body translation, as all atoms move in a fully concerted manner, and it will therefore always dominate the diffuse scattering [see equation (5) and the discussion below it]. If only smaller structural units move in a correlated fashion the variances in structure factors are not that large [equation (8)] and the diffuse intensities are much smaller.
Molecular-dynamics simulations have been used to predict diffuse scattering with some success, especially since it was realized that long sampling times (>1 ns) were needed to reach convergence (Clarage et al., 1995). Héry et al. (1998) concluded from MD simulations of one unit cell that in orthorhombic lysozyme crystals the molecules move only partially as rigid bodies, i.e. only the backbone atoms move as such. However, comparison with the data was only visual and on a single detector image. 10 ns MD simulations of the staphylococcal nuclease crystal by Meinhold & Smith (2005a,b) and subsequent principal component analysis (PCA) showed that the five lowest-frequency large-amplitude components reproduce the main features of diffuse scattering. Whole-molecule motion was found to only represent part of the mean-square fluctuations, although these might be limited by periodic boundary conditions in the simulations. This restriction was overcome by Wall (2018) through MD simulations of 2 × 2 × 2 unit cells of the same protein. The agreement with diffuse scattering in terms of CC (0.68) is better than before. Unfortunately, limited insight is given into the three-dimensional diffuse maps as only one intersection with the was shown and only averaged diffuse intensities in resolution shells. Furthermore, it is left unclear whether rigid-body translations occurred in the simulations, which is very possible because only unit-cell centre-of-mass translations were removed in the MD protocol, and with 32 molecules in the supercell there is plenty of room for relative motions of the molecules. In a recent paper, Peck et al. (2018) reanalysed the diffuse scattering of CypA using the same data that we used here and that were made public by Van Benschoten et al. (2016). Their conclusion is that intermolecular correlations are needed to explain the diffuse intensities that they extracted from the data.
The analysis was based on a liquid-like motion model that was extended to include nearest-neighbour motional correlations. Although in the current paper we noted that evidence for longer-range correlated motions is indeed found, we believe that their data actually still contain parts of the Bragg reflections and that their large CC (0.71) can be attributed to these. Our diffuse maps look completely different, as we did not rely on predicted locations and the size of the Bragg reflections, but used mode filtering instead.
Simulated diffuse maps have an isotropic component that is part of the correlated motion, which we would prefer not to subtract. Clearly, the way we analysed the experimental data, by subtracting radially averaged background scattering, leads to the removal of all isotropic scattering, and as a consequence CC_{aniso} (Table 1) is larger than CC_{all}. Improvements in this step of data processing in order to obtain better estimates of background scattering along the lines laid out by Chapman et al. (2017) will most likely give better agreement. One might question whether CC values in the range 0.45–0.6 are sufficient to conclude that any of the motion models are correct. We think that a large part of the disagreement comes from the noisy data and the processing methods. It is only after considering the features in full three-dimensional oversampled diffuse maps that we gained confidence in the validity of the rigid-body motion model.
We believe that our current approach by forward modelling of diffuse scattering in oversampled full three-dimensional reciprocal space from well defined ensembles with translational, rotational and internal correlated motions clearly shows the dominant influence of rigid-body translational disorder in protein crystals. Despite this, correlated internal motions could have an effect on the diffuse intensities. The challenge will be to model their weak contribution in order to reveal protein dynamics (Wall et al., 2014). We are currently developing an ensemble-refinement technique that uses the total scattering, i.e. Bragg intensities and diffuse scattered intensities. Realistic conformational motions, next to the rigid-body motions, can potentially be obtained from this kind of structural refinement.
Supporting information
Supplementary Table and Figures. DOI: https://doi.org/10.1107/S2052252519000927/cw5019sup1.pdf
Footnotes
^{1}Although the peak intensity as follows from equation (1) is proportional to N^{2}, after integration over the Bragg peak volume it is proportional to N.
Acknowledgements
We thank N. M. Pearce and P. Gros for discussions and reading the manuscript, and N. M. Pearce for generating TLS matrices.
Funding information
We thank the Netherlands Organization for Scientific Research (NWO) for financial support through grant 711.013.006.
References
Adams, P. D., Afonine, P. V., Bunkóczi, G., Chen, V. B., Davis, I. W., Echols, N., Headd, J. J., Hung, L.-W., Kapral, G. J., Grosse-Kunstleve, R. W., McCoy, A. J., Moriarty, N. W., Oeffner, R., Read, R. J., Richardson, D. C., Richardson, J. S., Terwilliger, T. C. & Zwart, P. H. (2010). Acta Cryst. D66, 213–221.
Ayyer, K., Yefanov, O. M., Oberthür, D., Roy-Chowdhury, S., Galli, L., Mariani, V., Basu, S., Coe, J., Conrad, C. E., Fromme, R., Schaffer, A., Dörner, K., James, D., Kupitz, C., Metz, M., Nelson, G., Xavier, P. L., Beyerlein, K. R., Schmidt, M., Sarrou, I., Spence, J. C. H., Weierstall, U., White, T. A., Yang, J.-H., Zhao, Y., Liang, M., Aquila, A., Hunter, M. S., Robinson, J. S., Koglin, J. E., Boutet, S., Fromme, P., Barty, A. & Chapman, H. N. (2016). Nature (London), 530, 202–206.
Burnley, B. T., Afonine, P. V., Adams, P. D. & Gros, P. (2012). Elife, 1, e00311.
Caspar, D. L. D., Clarage, J., Salunke, D. M. & Clarage, M. (1988). Nature (London), 332, 659–662.
Chapman, H. N., Fromme, P., Barty, A., White, T. A., Kirian, R. A., Aquila, A., Hunter, M. S., Schulz, J., DePonte, D. P., Weierstall, U., Doak, R. B., Maia, F. R. N. C., Martin, A. V., Schlichting, I., Lomb, L., Coppola, N., Shoeman, R. L., Epp, S. W., Hartmann, R., Rolles, D., Rudenko, A., Foucar, L., Kimmel, N., Weidenspointner, G., Holl, P., Liang, M., Barthelmess, M., Caleman, C., Boutet, S., Bogan, M. J., Krzywinski, J., Bostedt, C., Bajt, S., Gumprecht, L., Rudek, B., Erk, B., Schmidt, C., Hömke, A., Reich, C., Pietschner, D., Strüder, L., Hauser, G., Gorke, H., Ullrich, J., Herrmann, S., Schaller, G., Schopper, F., Soltau, H., Kühnel, K.-U., Messerschmidt, M., Bozek, J. D., Hau-Riege, S. P., Frank, M., Hampton, C. Y., Sierra, R. G., Starodub, D., Williams, G. J., Hajdu, J., Timneanu, N., Seibert, M. M., Andreasson, J., Rocker, A., Jönsson, O., Svenda, M., Stern, S., Nass, K., Andritschke, R., Schröter, C.-D., Krasniqi, F., Bott, M., Schmidt, K. E., Wang, X., Grotjohann, I., Holton, J. M., Barends, T. R. M., Neutze, R., Marchesini, S., Fromme, R., Schorb, S., Rupp, D., Adolph, M., Gorkhover, T., Andersson, I., Hirsemann, H., Potdevin, G., Graafsma, H., Nilsson, B. & Spence, J. C. H. (2011). Nature (London), 470, 73–77.
Chapman, H. N., Yefanov, O. M., Ayyer, K., White, T. A., Barty, A., Morgan, A., Mariani, V., Oberthuer, D. & Pande, K. (2017). J. Appl. Cryst. 50, 1084–1103.
Clarage, J. B., Clarage, M. S., Phillips, W. C., Sweet, R. M. & Caspar, D. L. (1992). Proteins, 12, 145–157.
Clarage, J. B., Romo, T., Andrews, B. K., Pettitt, B. M. & Phillips, G. N. (1995). Proc. Natl Acad. Sci. USA, 92, 3288–3292.
Doucet, J. & Benoit, J.-P. (1987). Nature (London), 325, 643–646.
Duisenberg, A. J. M. (1992). J. Appl. Cryst. 25, 92–96.
Fraser, J. S. (2015). X-ray Diffraction Data from Cyclophilin A, Source of 4YUO Structure. https://dx.doi.org/10.15785/SBGRID/68.
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.
Henzler-Wildman, K. & Kern, D. (2007). Nature (London), 450, 964–972.
Héry, S., Genest, D. & Smith, J. C. (1998). J. Mol. Biol. 279, 303–319.
James, R. W. (1958). The Optical Principles of the Diffraction of X-rays. London: G. Bell & Sons.
Meinhold, L. & Smith, J. C. (2005a). Phys. Rev. Lett. 95, 218103.
Meinhold, L. & Smith, J. C. (2005b). Biophys. J. 88, 2554–2563.
Meisburger, S. P., Thomas, W. C., Watkins, M. B. & Ando, N. (2017). Chem. Rev. 117, 7615–7672.
Moore, P. B. (2009). Structure, 17, 1307–1315.
Moss, D. S., Harris, G. W., Wostrack, A. & Sansom, C. (2003). Crystallogr. Rev. 9, 229–277.
Neder, R. B. & Proffen, T. (2008). Diffuse Scattering and Defect Structure Simulations: A Cook Book using the Program DISCUS, p. 240. Oxford University Press.
Peck, A., Poitevin, F. & Lane, T. J. (2018). IUCrJ, 5, 211–222.
Pérez, J., Faure, P. & Benoit, J.-P. (1996). Acta Cryst. D52, 722–729.
Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C. & Ferrin, T. E. (2004). J. Comput. Chem. 25, 1605–1612.
Polikanov, Y. S. & Moore, P. B. (2015). Acta Cryst. D71, 2021–2031.
Riccardi, D., Cui, Q. & Phillips, G. N. (2010). Biophys. J. 99, 2616–2625.
Sanchez-Weatherby, J., Bowler, M. W., Huet, J., Gobbo, A., Felisaz, F., Lavault, B., Moya, R., Kadlec, J., Ravelli, R. B. G. & Cipriani, F. (2009). Acta Cryst. D65, 1237–1246.
Schreurs, A. M. M. (1999a). EVAL Program Suite. Utrecht University, The Netherlands. https://www.crystal.chem.uu.nl/distr/eval.
Schreurs, A. M. M. (1999b). Peakref. Utrecht University, The Netherlands. https://www.crystal.chem.uu.nl/distr/eval/documentation/ccd/peakref/doc/index.html.
Schreurs, A. M. M., Xian, X. & Kroon-Batenburg, L. M. J. (2010). J. Appl. Cryst. 43, 70–82.
Sheldrick, G. M. (1996). SADABS. University of Göttingen, Germany.
Urzhumtsev, A., Afonine, P. V., Van Benschoten, A. H., Fraser, J. S. & Adams, P. D. (2015). Acta Cryst. D71, 1668–1683.
Van Benschoten, A. H., Afonine, P. V., Terwilliger, T. C., Wall, M. E., Jackson, C. J., Sauter, N. K., Adams, P. D., Urzhumtsev, A. & Fraser, J. S. (2015). Acta Cryst. D71, 1657–1667.
Van Benschoten, A. H., Liu, L., Gonzalez, A., Brewster, A. S., Sauter, N. K., Fraser, J. S. & Wall, M. E. (2016). Proc. Natl Acad. Sci. USA, 113, 4069–4074.
Wall, M. E. (2018). IUCrJ, 5, 172–181.
Wall, M. E., Adams, P. D., Fraser, J. S. & Sauter, N. K. (2014). Structure, 22, 182–184.
Wall, M. E., Clarage, J. B. & Phillips, G. N. (1997). Structure, 5, 1599–1612.
Wall, M. E., Ealick, S. E. & Gruner, S. M. (1997). Proc. Natl Acad. Sci. USA, 94, 6180–6184.
Wall, M. E., Wolff, A. M. & Fraser, J. S. (2018). Curr. Opin. Struct. Biol. 50, 109–116.
Welberry, T. R. (1985). Rep. Prog. Phys. 48, 1543–1594.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.