How best to use photons

Different modes of data collection are explored, and the effect of flux and multiplicity on the resulting quality of the data set is discussed. Advice is offered on how to collect data in the absence of prior knowledge of the sample.


Introduction
The principal limit on the completeness and accuracy of crystallographic data from third-generation synchrotron sources is often sample lifetime, i.e. radiation damage. With CCD detectors this presented a specific challenge: to obtain sufficiently strong data to overcome detector read-out noise whilst obtaining a complete data set, ideally to the highest possible resolution. Strategy programs such as BEST (Popov & Bourenkov, 2003) were developed with exactly this challenge in mind. With the advent of photon-counting detectors, however, the possibility arises of recording far weaker data and instead relying on multiplicity of measurements to obtain improvements to the quality of the data, rather than increasing photon counts for individual observations. Therefore, this raises the question of how best to use the photons that may be scattered within the lifetime of the sample.
While software exists which may estimate the lifetime of samples given a detailed knowledge of the beamline and sample composition (Murray et al., 2004; Zeldin et al., 2013), and strategy programs exist to exploit this information, these are sensitive to the initial input and require a detailed knowledge of the beam profile, intensity and sample composition. The aim here is to arrive at a protocol that may be used in the absence of this preparation but should still yield a good-quality data set, i.e. a general strategy rather than a sample-specific one.
In arriving at such a strategy, there are four specific questions that must be answered.
(i) Is a larger number of weak observations equivalent to a smaller number of stronger observations with the same total photon counts?
(ii) If very weak data are recorded, are they useful?
(iii) Given a reasonable multiplicity of observations, how can the presence of radiation damage be detected, and where is the optimum point to 'truncate' the data set?
(iv) Given data from multiple samples, is it better to combine weak complete sets or stronger partial ones?
These questions will be considered in sequence, with example data sets used to illustrate each point. Extensive use will be made of merging statistics, and the reader is directed to https://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/R-factors for a refresher, if needed.
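As a reminder of how the merging statistics used throughout are formed, the following minimal sketch computes Rmerge, Rmeas and Rp.i.m. from a list of symmetry-reduced observations using their standard textbook definitions. The function name and the input layout (pairs of Miller index and intensity) are assumptions made for illustration and do not correspond to any particular program's interface.

```python
from collections import defaultdict
from math import sqrt


def merging_r_factors(observations):
    """Return (R_merge, R_meas, R_pim) for an iterable of (hkl, intensity).

    hkl is assumed to be already reduced to the asymmetric unit, so that
    symmetry-equivalent observations share the same key.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    num_merge = num_meas = num_pim = denom = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue  # a single observation carries no agreement information
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        num_merge += abs_dev                       # no multiplicity correction
        num_meas += sqrt(n / (n - 1)) * abs_dev    # multiplicity-independent
        num_pim += sqrt(1 / (n - 1)) * abs_dev     # precision of the mean
        denom += sum(intensities)
    return num_merge / denom, num_meas / denom, num_pim / denom


example = [((1, 0, 0), 90.0), ((1, 0, 0), 110.0)] + [((0, 1, 0), 200.0)] * 4
r_merge, r_meas, r_pim = merging_r_factors(example)
```

The only difference between the three statistics is the per-reflection weighting by multiplicity n, which is why Rmerge rises with multiplicity for weak data while Rp.i.m. falls.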

Strength versus multiplicity
Any data-collection strategy that depends on multiplicity of measurements must first ask if, in the absence of significant radiation damage, the results of a high-multiplicity low-dose experiment are equivalent to the same number of photons scattered from the same crystal over fewer reflections. Recording fewer, stronger reflections (whilst still a complete set) may be an effective strategy if (i) the sample lifetime is well known, (ii) data size (disk storage) is a factor and (iii) acquisition time is a major consideration. If the sample lifetime is not well known, for example a novel protein where the sample behaviour has not been previously characterized, there is a strong argument for a conservative approach to data collection, i.e. recording more data with a lower intensity beam, such that in the event of radiation damage being found the data may be cut back post mortem, reducing multiplicity but ideally not completeness.
To address this question, data were recorded on Diamond Light Source beamline I24 from three cubic insulin samples, deliberately grown to be comparable in size to the beam (details in Appendix D in the Supporting Information). The total dose (i.e. full-beam seconds) for each was kept as close as possible to constant, as well as keeping it low to reduce effects of damage, resulting in relatively weak but comparable data sets; the data-collection parameters are listed in Table 1. All data were recorded with an exposure time of 20 ms per frame at 0.9686 Å, with the total rotation and transmission adjusted to give approximately the same total dose of around 0.16 MGy, as estimated by RADDOSE-3D (Zeldin et al., 2013). For each sample, multiple data sets were recorded with varying total rotation and transmission, in a randomly selected order, with the first scan repeated at the end to allow direct comparison. In all cases no signs of significant radiation damage were detected, and the results of structure refinement were comparable.
All had around 0.4 full-beam seconds of data collected, around 1.2 × 10^12 photons. While the Rmerge values vary as expected, the Rp.i.m. values are relatively consistent (Fig. 1). An additional sample was collected where the total dose was around eight times higher, with the corresponding improvement in Rp.i.m., indicating that the dominant factor in the precision of the measurements was the total number of scattered photons. As such, there is no evidence that recording higher multiplicity, weaker measurements has any detrimental effect on the overall data quality or final resolution limit. In particular, the final resolution limits as estimated by CC1/2 ≈ 0.5 for each of the data sets recorded on the three crystals were comparable. It is important to note that there are practical limits to this, as the data must be strong enough that spot finding and indexing remain successful.

Figure 1
Merging statistics for 12 comparable data sets from three samples (A, left; B, middle; C, right) where the total number of scattered photons was kept approximately constant while the transmission and total rotation range varied to assess the effects on the total data quality.

Table 1
Merging statistics for 12 comparable data sets from three samples (A, B, C) where the total number of scattered photons was kept approximately constant while the transmission and total rotation range varied to assess the effects on the total data quality.
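The equivalence argued for in this section can be illustrated with an idealized simulation, assuming pure Poisson counting noise and nothing else (no background, no systematic error, no radiation damage). The function name, the count levels and the number of simulated reflections are hypothetical choices for illustration only.

```python
import numpy as np


def merged_relative_error(mean_counts, multiplicity, n_reflections=2000, seed=0):
    """Relative scatter of the merged intensity under pure counting noise.

    Each of n_reflections identical reflections is measured `multiplicity`
    times; every measurement is a Poisson draw with mean `mean_counts`.
    """
    rng = np.random.default_rng(seed)
    counts = rng.poisson(mean_counts, size=(n_reflections, multiplicity))
    merged = counts.mean(axis=1)  # merged (averaged) intensity per reflection
    return merged.std() / merged.mean()


# Both strategies spend the same 1600 expected counts per unique reflection.
few_strong = merged_relative_error(mean_counts=400.0, multiplicity=4)
many_weak = merged_relative_error(mean_counts=25.0, multiplicity=64)
poisson_limit = 1.0 / np.sqrt(1600.0)

print(f"4 x 400 counts : {few_strong:.4f}")
print(f"64 x 25 counts : {many_weak:.4f}")
print(f"1/sqrt(total)  : {poisson_limit:.4f}")
```

In this idealization both strategies approach the same limit set by the total number of counts, which is the behaviour observed for Rp.i.m. in the measured data; real data additionally require that individual images remain strong enough to index.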

Transmission ladder
In many cases the expected lifetime for a sample will not be known a priori. However, there will usually be fairly well known extrema, for example a minimum and maximum typical lifetime, which may differ by one or more orders of magnitude. In this situation, a conservative strategy for data collection could be to acquire first an exceedingly weak full rotation, i.e. well below an anticipated lifetime dose of the sample, then the same rotation with 4, 16 and perhaps 64 times the dose, in principle doubling the Poisson-derived I/σ(I) each cycle. It is highly likely that the later runs will have substantial radiation damage; however, if this is observed, the previous run should still give complete data, or as complete as possible given the geometric constraints. The earlier low-dose data may also be suitable for molecular replacement or substructure determination, whereas subsequent (potentially somewhat damaged) data could be more suitable for structure refinement, as a higher resolution may have been achieved. Conversely, the stronger but radiation-damaged data could be useful for determining an initial sample orientation, which could then be used to process the weaker data.
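The arithmetic behind the ladder can be sketched as follows, assuming that I/σ(I) is limited by counting statistics alone and therefore scales as the square root of the dose. The function name is hypothetical, and the cumulative dose column is included only as a reminder that each run adds to the damage already incurred.

```python
from math import sqrt


def transmission_ladder(base_dose=1.0, steps=4, factor=4):
    """Return (dose, expected I/sigma gain, cumulative dose) for each run.

    Doses are relative to the first, weakest run; the gain assumes
    Poisson-limited data, i.e. I/sigma(I) proportional to sqrt(dose).
    """
    ladder = []
    cumulative = 0.0
    for k in range(steps):
        dose = base_dose * factor**k
        cumulative += dose
        ladder.append((dose, sqrt(dose / base_dose), cumulative))
    return ladder


for dose, gain, total in transmission_ladder():
    print(f"dose x{dose:g}: I/sigma x{gain:g}, cumulative dose x{total:g}")
```

With a factor of four between runs, the final run alone delivers roughly three quarters of the accumulated dose, so the earlier runs cost little in terms of sample lifetime.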

Difference maps for ligand binding
Ligand-binding studies for drug discovery are a common use of data collection at synchrotron sources. In such cases the majority of the atomic positions are well known, so even imprecise data may be adequate to observe the differences between the sample under study and the existing model, thus showing any ligands. This may be demonstrated by taking a sequence of data sets from a sample with a ligand, with a range of transmissions, and computing difference maps for each.
Data were collected at Diamond Light Source beamline I03 from a thaumatin crystal prepared following standard protocols with tartrate in the crystallization conditions. Each data set was recorded as 3600 × 0.1° images with a 40 ms exposure period, with transmissions as close as possible to 1/16, 1/4, 1, 4, 16 and 64% (i.e. ~1 × 10^9 to ~1 × 10^12 photons s^-1) for a total of six runs. The steps in transmission were chosen to give an approximate doubling of I/σ(I) due to counting statistics (Fig. 2 and Table 2).
Each data set was processed independently with xia2/DIALS (Winter, 2010; Winter et al., 2018) to a fixed resolution of 1.6 Å, and DIMPLE (http://ccp4.github.io/dimple/) was run to compute a difference map, using a model of thaumatin without tartrate present. As can be seen in Fig. 3, even though the merging statistics are very poor from the weakest data set, the map shows clear difference density which is reproduced by the subsequent data sets. The structure refinement also shows a good agreement between the model and the data, though the stronger data sets before radiation damage becomes apparent give slightly improved statistics.
This clearly demonstrates that though the data are very weak and show rather high merging residuals, the averaged data are nevertheless useful for ligand identification, and can be acquired with as little as one tenth of a full-beam-second worth of exposure. While thaumatin crystals are well known to be robust in the beam, clear signs of radiation damage such as a significant fall-off in resolution were visible in the 16% and 64% data sets. The question of radiation damage will be revisited in Section 5.

Symmetry determination and molecular replacement
Traditional data-collection strategies from e.g. EDNA (Incardona et al., 2009) rely on acquiring a small number of 'screening' images from which the lattice symmetry is derived via indexing. In the majority of cases this will result in the correct lattice; however, in some circumstances accidental symmetry in the unit-cell parameters (e.g. an orthorhombic primitive lattice with a = b) may give misleading results. This may only be discovered subsequently, once a full data set has been collected and the integrated intensities have been analysed. Such an analysis may, however, be successfully performed with a very low dose data set. Similarly, molecular replacement is principally dependent on the low-resolution (from ∞ to ~4-2.5 Å) data (Evans & McCoy, 2008), so intensities resulting from a low-dose sweep may be useful for assessing molecular replacement models.

Figure 2
Merging statistics for thaumatin data sets recorded with transmissions from 1/16% to 64%.
To demonstrate this, data were collected from four crystals of cyclin-dependent kinase 2 (CDK2), kindly provided by Arnaud Basle of Newcastle University, UK, with stepped transmission data collected as for thaumatin above. Although the crystals have orthorhombic P2₁2₁2₁ symmetry, the unit-cell b and c axes are very similar in length, giving a pseudo-tetragonal lattice. Analysis of the intensities with POINTLESS (Evans, 2011), even of the very weakest data set, clearly shows the presence of three twofold axes and the absence of the fourfold (Table 3). As such, even if the stepped transmission approach is not used for data collection, there may be substantial value in collecting a relatively complete low-dose data set rather than a sequence of single images separated in ω for screening. The full processing results from all sets for all crystals are shown in Table S9 in the Supporting Information.
After processing, the data were taken forward to molecular replacement with PHASER (McCoy et al., 2007) using as a search model PDB entry 1hck (Schulze-Gahmen et al., 1996). Despite the low overall I/σ(I) of the weakest data (~5), molecular replacement was successful in every case, as judged by TFZ scores in the range 46.7-59.6. As such, even very weak or low-dose data may be useful for assessing the crystal symmetry and testing molecular replacement solutions prior to acquiring full data sets for final structure determination and refinement, though in this case even the weakest data set gave a good refined structure.

Table 2
Merging and refinement statistics for thaumatin data sets recorded with transmissions from 1/16% to 64%, processed to a fixed resolution of 1.6 Å.
Clearly the weakest of these data are suffering from poor precision in the intensity measurements, which rapidly improve as a greater dose is applied. There is, however, a point of diminishing returns between 1 and 16% where radiation damage becomes a greater factor in data quality than counting statistics, with the optimum data for refinement around 1%, as judged by Rfree.

Table 3
Point-group symmetry-analysis scores for individual rotational symmetry operations for a low-dose data set from CDK2.

Exploration of parameter space with insulin
Data were collected from four cubic insulin crystals on Diamond Light Source beamline I03. Each data set consisted of 4800 images at 0.15° per 0.04 s, at a wavelength of 1.2 Å, 6.25% transmission (~3.1 × 10^11 photons s^-1) and at a distance such that the inscribed circle on the detector was at 1.4 Å. Despite the low transmission, each data set showed signs of very mild radiation damage (shown in Appendix D in the Supporting Information). However, each data set also contained sufficient anomalous signal to allow phasing via S-SAD with SHELXC/D/E (Sheldrick, 2010), making them useful for exploring parameter space.
For a given total dose, the choice will be between strength and multiplicity, as discussed earlier in Section 2. Here, however, this may be explored in more depth by either taking subsets of the data or applying an a posteriori transmission adjustment by digital attenuation.
3.3.1. Digital attenuation. In a monochromatic synchrotron beamline, the photon flux is controlled (for a given source configuration) by attenuator foils or wedges, which absorb a predictable fraction of the primary beam. Obviously the absorbed photons could have contributed to background, Bragg diffraction or simply passed through the sample, so the filter transmission has the overall effect of approximately scaling the image. It is important to note that this is not a simple scaling, since all processes involved are stochastic.
To reproduce this process in silico, care must be taken to ensure the stochastic processes are reproduced. The scheme in Fig. 4 is designed to reproduce this: for each count recorded on every pixel of every image a random value is drawn from [0.0, 1.0). If this random value is less than the desired transmission factor T, the count is kept in the data, otherwise it is rejected. This will therefore maintain the statistical structure of the data, whilst reducing the intensity in the background and reflections equivalently. This is illustrated in Fig. 4 for one reflection on one image. Clearly, any radiation damage present in the original data will continue to be present in the attenuated data.

Figure 3
Difference maps (rendered at 3σ) derived from thaumatin data, showing the tartrate molecule from the crystallization conditions, for data recorded with transmission from 1/16 to 64%. Signs of radiation damage are clearly visible in the electron density in the last of these data sets. Of particular interest is the similarity in the maps (b)-(e): by eye there is very little difference in the maps despite the factor of 64 difference in transmission used.
Use of this attenuation scheme will therefore allow a fairer comparison of the effects of transmission with the size of the data set, though radiation damage is not taken into consideration. This scheme is only applicable to data from a photon-counting pixel-array detector, since the events must be individually recorded and uncorrelated with one another.
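The attenuation scheme described above can be sketched in a few lines of NumPy. Keeping each recorded count independently with probability T is statistically equivalent to drawing, for every pixel, a binomial variate with the recorded count as the number of trials, which avoids an explicit loop over counts. This is an illustrative sketch rather than the DIALS implementation referred to in the text: the function name is hypothetical, and the treatment of negative pixel values as detector flags to be left untouched is an assumption.

```python
import numpy as np


def digitally_attenuate(image, transmission, rng=None):
    """Thin the photon counts in `image` to a fraction `transmission`.

    Each recorded count survives independently with probability T, so a
    pixel holding n counts becomes a Binomial(n, T) draw. Pixels with
    non-positive values (assumed to be empty or flagged) are copied as-is.
    """
    if not 0.0 <= transmission <= 1.0:
        raise ValueError("transmission must lie in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image)
    attenuated = image.copy()
    has_counts = image > 0
    attenuated[has_counts] = rng.binomial(image[has_counts], transmission)
    return attenuated


rng = np.random.default_rng(1)
frame = np.array([[0, 10, 1000], [-1, 5, 200]])
quarter = digitally_attenuate(frame, 0.25, rng=rng)
```

Because the thinning is applied count by count, background and Bragg peaks are attenuated in the same way and a Poisson-distributed input remains Poisson distributed, which a simple multiplication by T would not achieve.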
3.3.2. Results. The merging statistics for each combination of transmission and subset of the data are shown for the first insulin crystal in Table 4 and Fig. 5. Data for all crystals are included in the Supporting Information. In the table, each row in principle corresponds to comparable data sets, i.e. the same total photon count, though data sets with a wider rotation range will include more of the small amount of radiation damage present in the original data. As may be expected, the overall Rmeas value for each of the transmission values remains approximately constant; however, between transmissions the values change by a smaller factor than would be expected from counting statistics alone. The total summation-integrated counts for each processed data set behave as expected, deviating only a couple of percent from the desired total, which should be expected as the illuminated volume of the crystal will vary as the crystal is rotated.

Figure 4
Examples of a digitally attenuated diffraction spot for transmissions 1 to 4^-5, and a scheme showing the mechanism for digitally attenuating data in place, for a transmission factor T. The command line for running the DIALS implementation is included in Appendix D in the Supporting Information.
Based on the merging statistics alone, for a given total dose the best overall Rp.i.m. comes from the higher multiplicity weaker data, which is slightly counterintuitive given the radiation damage. The outer shell Rp.i.m. values, however, are generally better for the stronger, lower multiplicity measurements. This may reflect the increased sensitivity of high-resolution data to radiation damage, but could also reflect the increased sensitivity of weak, high-resolution data to systematic effects: a greater number of unique paths through the crystal will increase the spread of absorption paths sampled and therefore the spread relating to insufficient fidelity in absorption modelling, as the large samples (around 100 µm) and the wavelength of 1.2 Å are sufficient to give around a 5% chance of photon re-absorption based on a linear attenuation coefficient from RADDOSE-3D of 5.83 × 10^-4 µm^-1. While this may negatively affect the precision of the high-resolution intensities, it is not clear that this would affect the accuracy of the averaged intensities. The merging statistics may therefore be inconclusive in deciding on a high-multiplicity or high-dose strategy. Similar conclusions can be drawn for all four crystals, from results shown in the Supporting Information.

Table 4
Merging statistics for data derived from the first insulin crystal, with digital transmission applied.
The data are indexed by the transmission factor from 1 to 1/8 and the total rotation included (i.e. all 720°, first 360°, 180° and 90° of the data). Data sets in each row are in principle comparable, as the product of the rotation and transmission factor is constant.
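The re-absorption estimate quoted in this section follows from the Beer-Lambert law. A minimal sketch of the arithmetic is given below, assuming a single straight path through the full sample thickness, with the path length expressed in the same length unit as the attenuation coefficient; the function name is hypothetical.

```python
from math import exp


def absorption_probability(mu, path_length):
    """Probability that a photon is absorbed along a path (Beer-Lambert).

    mu is the linear attenuation coefficient and path_length the distance
    travelled, both in the same length unit.
    """
    return 1.0 - exp(-mu * path_length)


# Values quoted in the text: coefficient 5.83e-4 per unit length, path ~100.
p_absorbed = absorption_probability(5.83e-4, 100.0)
print(f"chance of re-absorption: {100 * p_absorbed:.1f}%")
```

Since the product of coefficient and path is small, the result is close to the product itself, i.e. a few percent, which is comparable in size to the agreement expected between strong symmetry-equivalent measurements.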
3.3.3. Substructure determination. For most users the most useful measure of data quality is whether the data answer the experimental question. For ligand-binding studies this is a relatively low bar, as much of the structural information is known a priori. For experimental phasing, however, almost all of the structural information is derived from the experimental data. For phasing with SHELXC/D/E, the SHELXE phasing step is particularly effective if the data are high resolution and the solvent fraction is large: both of which apply to these insulin data where the solvent fraction is around 64%.
Therefore the success of the substructure determination will be used as the metric for data comparison here.
For the substructure determination a fairly standard SHELXC/D script was run, with 10 000 trials using data to 1.9 Å, seeking three disulfides, and histograms of the combined figure of merit (CFOM = CCall + CCweak) used to assess success. From Fig. 6(a), it is clear that substructure determination was generally unsuccessful for the data sets with 1/8 of the original photon count. Manual verification of the subsequent phasing with SHELXE confirmed that the overall phasing process was unsuccessful. For the data with 1/4 of the original photon counts [1/4-720°, 1/2-360° and 1-180°; Fig. 6(b)] some of the trials gave potentially useful solutions for the 1/4-720° and 1/2-360° sets. Subsequent phasing with SHELXE showed a substantial contrast difference between the hands and interpretable maps from both sets, with only 1000 trials run. For the last comparison set with half of the original photon count [Fig. 6(c)] both sets unsurprisingly gave good solutions. Inspection of the histograms suggests roughly the same number of useful solutions, indicating that the two sets are effectively equivalent in terms of substructure determination.

Resolution limits for weak data
A clear advantage of using a higher total dose is that the data are generally significant [as measured by CC1/2 or I/σ(I)] to a higher resolution as the effects of random errors are reduced. Digital attenuation can be used to show that even very weak data can be sensibly interpreted and arrive at the correct symmetry, albeit with substantially poorer merging statistics. 360° of data were taken from cubic insulin crystal 3 and attenuated by factors of 4^-n for values of n in the range 0-6 (i.e. from 100% of the photons to 1/4096 of them). Fig. 7 shows the total counts in the data set and the processed resolution using xia2/DIALS (full statistics shown in the Supporting Information). The trends as presented are remarkably linear, as the resolution limits are well within the linear regime of the Wilson plot, so doubling the I/σ(I) of the data will give a corresponding increase in 1/d_min^2. The gradient of this line depends on the overall B factor of the crystal. The corollary of this is that an increase in transmission by a factor of around 256 was necessary to improve the resolution limit by 0.5 Å. Clearly this behaviour is sample dependent, and most samples diffract rather less well than insulin, with a higher intrinsic B factor. This, however, emphasises the value of using lower transmissions: the reduction in resolution for using a quarter of the dose will, in general, be much more modest, whilst the damage will be massively reduced. Recording data from multiple isomorphous samples may be a practical way of improving the resolution, as the total number of scattered photons can increase without increasing the damage to individual samples. Similar results to those presented here have been reported in Yamamoto et al. (2017), though there the emphasis was on achieving resolution via high-flux beamlines whereas here we highlight the massive increase in photon count necessary to achieve a modest increase in resolution.

Figure 5
Merging statistics for data derived from the first insulin crystal, with digital transmission applied. The data are indexed by the transmission factor from 1 to 1/8 (i.e. equivalent photon flux from 3.1 × 10^11 to 3.9 × 10^10 photons s^-1) and the total rotation included (i.e. all 720°, first 360°, 180° and 90° of the data). Data sets included in each plot are in principle comparable, as the product of the rotation and transmission factor is constant.
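The linear relationship between signal strength and 1/d_min^2 can be made explicit under a simplified Wilson model. The sketch below assumes that the mean I/σ(I) at resolution d falls off as exp[-B/(2d^2)] with a noise level that does not depend on resolution, and that the resolution limit is set where I/σ(I) crosses a fixed threshold; multiplying the signal by a factor f then adds (2/B) ln f to 1/d_min^2. The function names and the B factor of 20 Å^2 are hypothetical values for illustration, not fitted to the data presented here.

```python
from math import log, sqrt


def inv_d2_gain(signal_factor, b_factor):
    """Increase in 1/d_min^2 (A^-2) when I/sigma(I) is scaled by signal_factor.

    Assumes <I/sigma(I)> proportional to exp(-B / (2 d^2)), i.e. a straight
    Wilson plot and resolution-independent noise.
    """
    return (2.0 / b_factor) * log(signal_factor)


def d_min_after(d_min_before, signal_factor, b_factor):
    """Resolution limit (A) after scaling the signal by signal_factor."""
    inv_d2 = 1.0 / d_min_before**2 + inv_d2_gain(signal_factor, b_factor)
    if inv_d2 <= 0.0:
        raise ValueError("signal too weak for this simple model")
    return 1.0 / sqrt(inv_d2)


B = 20.0  # hypothetical overall B factor in A^2
per_doubling = inv_d2_gain(2.0, B)
new_limit = d_min_after(2.0, 2.0, B)
print(f"each doubling of I/sigma adds {per_doubling:.3f} A^-2 to 1/d_min^2")
print(f"2.00 A limit becomes {new_limit:.2f} A after one doubling")
```

In this model every doubling of I/σ(I) buys the same increment in 1/d_min^2, and that increment shrinks as B grows, consistent with the observation that samples with a higher B factor gain less resolution from the same increase in dose.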

Diminishing returns
In the absence of radiation damage, increasing the multiplicity of observations will always improve the precision of the average intensity measurements, all other things being equal. Indeed, collecting high multiplicity data from one or several crystals is a well established mechanism for improving data quality (see e.g. Liu et al., 2011). If, however, the repeated measurements are through the same path through the crystal and on the same detector position, they may suffer the same systematic errors and therefore do little to improve the accuracy of the average measurements. Also, in reality, radiation damage is rarely undetectable for very high multiplicity data sets, as shown from the following.
Data were collected from a standard thermolysin test crystal with very low transmission (0.05%, giving ~2.5 × 10^9 photons s^-1) on Diamond Light Source beamline I03. Eight data sets each consisting of 7200 × 0.1° images were recorded, and the structure refined against the first set (Winter et al., 2018) and re-refined against data consisting of the first one, two, four and all eight data sets (Table 5). Although the Rmerge is very high, corresponding to the very weak individual observations, the multiplicity is extremely high (from 70 to around 600-fold). As may be seen from Fig. 8, the Rp.i.m. and CC1/2 values improve for each data set, roughly in line with the multiplicity of measurements. There are, however, signs of modest radiation damage (Fig. 9). The results of refinement do not show such substantial improvements, suggesting that the precision of the measurements (i.e. number of scattered photons) is not a significant factor (in this case) in the overall quality of the final model, comparable with the outcomes in Section 3.1.

Figure 6
Histograms of combined figure of merit (CFOM = CCall + CCweak) from SHELXD for 10 000 trials for comparison data sets with (a) 1/8, (b) 1/4 and (c) 1/2 of the original total photon count.

Figure 7
Resolution (derived from CC1/2 ≈ 0.5) versus total counts for digitally attenuated cubic insulin data, for attenuations in the range 0.0244% to 100%. The corresponding resolution limits increase from 2.15 to 1.29 Å.

Radiation damage
With modern third- and fourth-generation synchrotron sources, radiation damage is the greatest limit on collecting data. Most obviously the problem of damage will become apparent as poorer diffraction on later images in the data set. By this time there is clearly nothing that can be done to correct the experiment; however, it may be possible to recover something from the data if a high multiplicity strategy has been employed. Alternatively, this outcome may be used to give some insight into sample lifetime for subsequent data collections: the so-called 'sacrificial crystal' (Leal et al., 2011).
In either case the data should be appropriately analysed to estimate the useful sample lifetime.

Figure 8
Merging statistics for weak thermolysin data sets, for one, two, four and eight double rotations (i.e. 720° data sets) at very low transmission.

Table 5
Merging statistics for weak thermolysin data sets, for one, two, four and eight double rotations (i.e. 720° data sets) at very low transmission.

Analysis statistics
The most obvious effect of radiation damage during the diffraction experiment is the fall-off in resolution during the data set. This may be determined either by eye, by inspecting the diffraction images, or by using the spot-finding tools in data-processing software. At most facilities some kind of online analysis performing spot finding with e.g. DIALS, DISTL (Zhang et al., 2006) or Cheetah (Barty et al., 2014) will give feedback on the number of strong spots and an estimate of the resolution, sampled at points throughout the data set. While the interpretation of this feedback may be complicated by the effects of diffraction anisotropy, poor sample centering, differing unit-cell lengths and 'fresh' crystal being rotated into the beam, the idea that the sample at the end of the experiment is isomorphous with the one at the start can be tested. Fig. 10(a) shows a case where no radiation damage is apparent, with the first run of thermolysin data from Section 4, with the plot derived from spots found on all images and averaged over ten-image intervals (i.e. 1°). While a certain amount of point-to-point variation is obvious, the overall trend is flat as expected, with a modest periodic variation. It is important to note that the resolution value here is a substantial underestimate compared with the final high multiplicity scaled and merged data set.
In cases where the radiation damage is more obvious the fall-off in diffracting resolution can be dramatic. Fig. 10(b) shows such a case for a crystal of BRD4, recorded as part of a lifetime study on Diamond Light Source beamline I03. Data were collected as 9600 exposures of 40 ms at 0.9762 Å with 50% beam (~3.8 × 10^11 photons s^-1), each corresponding to 0.15° of rotation (i.e. a total of four full rotations). While there are clearly some interesting features in the diffraction as the sample is rotated, the overall trend is clearly downward after the first eighth of the data set. In this case attempting to recover a complete set from the beginning of the data or collecting from a fresh sample with much lower transmission may be advisable.

Figure 9
Rmerge versus frame number for 8 × 720° data sets, showing a steady increase in the statistic alongside a periodic variation due to illuminated volume.
In some cases radiation damage may be present but less severe. The third example (Fig. 10c) was collected as part of the same lifetime study, from a crystal of CDK2. Data were collected with the same parameters used for BRD4, with a much more modest fall-off in diffraction during the scan, suggesting that a substantial part or indeed the whole data set could be used downstream.
After integration and scaling, however, the Rmerge versus batch plot from AIMLESS [Fig. 11(a)] shows clear indications of radiation damage, with data at the middle of the exposure agreeing better than the extrema (Evans & Murshudov, 2013).
The Rd plot [Fig. 11(b)] (Diederichs, 2006) shows a clear positive gradient, indicating the presence of radiation damage, though without suggesting a point where this damage becomes problematic. In response to this challenge a new statistic was developed, Rcp, which accumulates the pairwise differences throughout the data set.

Rcp
The statistic Rcp was derived from some of the principles behind Rd some time ago (Winter, 2009) but was never formally published, though it has been referenced (Evans, 2011). The derivation started from the principle, analogous to Rd, that comparing measurements in a pairwise manner stabilizes the statistic with respect to multiplicity of measurements, avoiding the difference between Rmerge and Rmeas. However, where Rd accumulates the differences between measured intensities I_hj on a baseline of dose (or image number) difference, Rcp accumulates all differences up to this dose or image number. At the time when the statistic was developed (late 2000s) interleaved MAD experiments were in vogue for structural genomics, so the intention was to accumulate the statistic across multiple wavelengths following how they were collected. For the most straightforward mode of data collection, i.e. high multiplicity experiments as discussed in this section, the interpretation of the statistic is relatively simple: once you have a complete set of observations, the statistic will remain constant if the new measurements you are bringing into the data set agree with the existing ones, and will increase if they agree, on average, less well than the pairwise observations to date agree. As with all statistics of this nature, it is effectively impossible to disentangle radiation damage from changes in illuminated volume and diffraction anisotropy unless greater than 360° of data have been measured. If you have a sufficient multiplicity of measurements, however, the trends should be clear.
Fig. 12 shows the statistic computed for the thermolysin data used previously. From the completeness curve it is clear that an almost complete data set has been acquired after around 400 images; however, a little more anomalous data are acquired after 180° of rotation. Beyond this point, no new measurements are being made; however, the repeated observations are in agreement with those measured to this point.
At the very earliest stages the statistic is very poorly sampled, so should not be considered reliable (this is comparable with Rd at the far right end of the plot). Including additional measurements will, in this case, improve the precision of the average intensities as expressed in Rp.i.m., as the new observations are drawn from the same population.
In the case of the CDK2 data (Fig. 13) complete data are acquired after around 1200 images (180°) and the Rcp statistic stays approximately level until about 360° have been collected, after which it increases in a monotonic manner. While including the new measurements may improve the Rp.i.m., this will be misleading, as the new measurements are from measurably if slightly different populations. Indeed, as may be seen in Table 6, including all the measurements in the data set does not give the improvement which could be expected in Rp.i.m., which drops in the outer shell from 0.103 to 0.084 when the quantity of observations is quadrupled. In this case the choice should be made by the experimenter as to how much data to include in the downstream analysis, which may in turn depend on the experimental objectives. For reference, the total dose to the sample from 2400 images (360°) was estimated to be 3.5 MGy, though this is complicated by the sample being substantially larger than the beam.
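The behaviour of a cumulative pairwise statistic of this kind can be illustrated with the following sketch, written from the verbal description above. The normalization by the mean intensity of each pair is an assumption made by analogy with Rd, and the function name and input layout (Miller index, zero-based batch number, intensity) are hypothetical, so this should not be read as the reference implementation.

```python
from collections import defaultdict
from itertools import combinations


def cumulative_pairwise_r(observations, n_batches):
    """Return a list whose entry b uses every pair measured up to batch b.

    observations: iterable of (hkl, batch, intensity), with hkl reduced to
    the asymmetric unit and batch a zero-based integer image number.
    """
    by_hkl = defaultdict(list)
    for hkl, batch, intensity in observations:
        by_hkl[hkl].append((batch, intensity))

    numerator = [0.0] * n_batches
    denominator = [0.0] * n_batches
    for measurements in by_hkl.values():
        # All pairs of symmetry-equivalent observations (quadratic in
        # multiplicity, acceptable for a sketch).
        for (b1, i1), (b2, i2) in combinations(measurements, 2):
            available_from = max(b1, b2)  # pair exists once both are recorded
            numerator[available_from] += abs(i1 - i2)
            denominator[available_from] += 0.5 * (i1 + i2)

    values, num, den = [], 0.0, 0.0
    for n, d in zip(numerator, denominator):
        num += n
        den += d
        values.append(num / den if den > 0 else float("nan"))
    return values


obs = [((1, 0, 0), 0, 100.0), ((1, 0, 0), 1, 100.0), ((1, 0, 0), 2, 50.0)]
r_values = cumulative_pairwise_r(obs, n_batches=3)
```

In the small example the statistic is undefined for the first batch, zero once two agreeing observations are available, and rises when a third, weaker observation is added, mirroring the flat-then-rising behaviour described for the CDK2 data.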

Multiple crystals
The conventional approach to data collection from multiple crystals focuses on constructing a complete set from samples that are highly radiation sensitive. However, as is well established in the literature (see e.g. Liu et al., 2011), combining multiple complete data sets can aid in phasing experiments. By the same token, collecting data from multiple samples also allows the choice of which data to take forward to be made on the basis of downstream analysis. Finally, the intention is to gain structural insight into a biological molecule or complex, rather than a specific sample, so averaging across multiple samples should improve the accuracy of the averaged intensities, as sample-to-sample variations in e.g. crystal shape and orientation are averaged out.

Sample selection
Before the arrival of photon-counting pixel-array detectors, screening a few samples before selecting the best for data collection was common practice, as acquiring a full data set could take many minutes. With pixel-array detectors on third-generation sources it becomes possible to carefully record a complete 180° or 360° in under a minute, raising the prospect of recording a complete data set from every sample and deciding later how best to use the measurements. The simplest option is to select the data set with the greatest precision to a given resolution limit (i.e. lowest overall R p.i.m. ) or the strongest high-resolution data. Table 7 shows the merging statistics for the first 360° of each of the original cubic insulin data sets used in Section 3.3. While they are similar overall, it may be tempting to select the first as it has the highest overall I/σ(I), or the second or fourth as they have the highest I/σ(I) in the outer shell. Substructure determination with the fourth (Fig. 14) was in fact unsuccessful with 1000 trials, with the third sample having the greatest overall number of successful trials: taking the data forward in parallel was therefore helpful in making a sensible choice.

Figure 12
R cp and completeness versus batch for the first sweep of weak thermolysin data from Section 4, showing that essentially complete data are present after about 1800 images, and no increase in R cp throughout the data set.

Figure 13
R cp and completeness versus batch for CDK2, showing complete data after around 1200 images but substantial increases in the R cp statistic after 2400 images (360°).

Table 6
Merging statistics for CDK2 for the full data set (four full rotations), one half and one quarter, the last as recommended by interpretation of R cp .
All data are processed to a fixed resolution limit of 1.3 Å to enable straightforward comparison. Though the R p.i.m. in the outer resolution shell improves slightly in the full data set, it is a long way short of the improvement which could be expected from the fourfold increase in multiplicity.

Combining crystals
One well established technique for improving the quality of data sets (Liu et al., 2011) is to combine the data from multiple samples. An obvious question to ask is whether, in the absence of radiation damage, collecting a given amount of data from multiple samples is equivalent to collecting the same total dose from a single sample. In the general case, of course, radiation damage will be more substantial with the higher dose; however, data can be collected carefully to minimize damage and give data with which this hypothesis can be tested. Table 8 shows the merging statistics of seven 'equivalent' data sets: 360° from each of the four insulin crystals, 180° from 1 + 2 and 3 + 4 and 90° from 1 + 2 + 3 + 4. In all cases the R p.i.m. and R meas are comparable, suggesting that the combined data sets are equivalent, i.e. that the samples are truly isomorphous. Clearly if radiation damage is not substantial, and the samples are isomorphous, then combining the complete 360° from each set is sensible as this will improve the overall data set. Table 9 shows the merging statistics for sample 1, then 1 + 2, 1 + 2 + 3 and 1 + 2 + 3 + 4 combined, with the expected improvement in I/σ(I) and R p.i.m. . Critically, the success rate of substructure trials for phasing (Fig. 15) improves with the addition of data from each sample, indicating that the combined data set is more useful than any of the individuals, as may be expected.

Table 7
Merging statistics for four 360° data sets from cubic insulin. Each data set was recorded with a low transmission to reduce the impact of radiation damage.
A fixed resolution limit of 1.4 Å was used for side-by-side comparisons.
Columns: crystals 1, 2, 3 and 4 (360° each). Space group: I2₁3 for all four crystals.

Table 8
Reproduced statistics from Table 7, with combined half data sets from samples 1 + 2 and 3 + 4 and quarter data sets of 1 + 2 + 3 + 4, showing comparable statistics in all cases to the resolution limit of 1.4 Å.

In situ data collection at room temperature
The examples presented so far in this section combined data sets from multiple crystals in order to improve the overall data quality. In some cases it is simply not possible to collect a complete data set from any one individual crystal, in particular for small, weakly diffracting crystals, or for room-temperature in situ experiments (Axford et al., 2012). In such cases, it is necessary to combine many severely incomplete data sets from many crystals in order to obtain a complete data set. Each individual data set covers a limited region of reciprocal space as a result of small crystal size, radiation damage or limitations of experimental setup (e.g. in situ data collection). The challenge is then to select and combine these partial data sets to give the optimal data set for downstream phasing and refinement. In this section we describe some of the challenges involved using the example of in situ experimental phasing of a proteinase K heavy-atom derivative.

Figure 14
Histograms of combined figure of merit (CFOM = CC all + CC weak ) from SHELXD for 10 000 trials for the first 360° from each of the four insulin crystals. Despite similar merging statistics, the trials for crystal 3 were much more successful than crystal 4.

Figure 15
Histograms of combined figure of merit (CFOM = CC all + CC weak ) from SHELXD for 10 000 trials for the first 360° from crystal 1, 1 + 2, 1 + 2 + 3 and 1 + 2 + 3 + 4. As may be expected from the merging statistics, the data from two, three and four crystals give increasingly successful substructure determination.
6.3.1. In situ experimental phasing of a proteinase K heavy-atom derivative. In situ data collection was performed on both native and heavy-atom derivatives of proteinase K microcrystals. Data were collected on beamline I24 at Diamond Light Source with a Dectris PILATUS3 6M detector, using a 9 × 6 µm beam with a flux of approximately 2 × 10¹² photons s⁻¹. Data were collected with an oscillation range of 0.1° and exposure time of 0.01 s per image. Data collection was performed across two beamline visits, with 63 and 82 Au-derivative data sets collected across the two visits, giving a total of 145 Au data sets. 50 images (5°) of data were collected per crystal for the first visit of Au data, and 25 images (2.5°) per crystal for the second, based on experience from the first visit. In addition, 83 native data sets were collected in a single visit, with 25 images from each.
6.3.2. Data processing. 136 individual Au data sets were successfully processed with xia2/DIALS, with initial indexing, refinement and integration performed in the primitive triclinic (P1) setting. Clustering on unit-cell parameters (Zeldin et al., 2015) identified a cluster containing 133 data sets in P4/mmm symmetry, with median unit-cell parameters a = b = 68.47, c = 103.88 Å, α = β = γ = 90°. Analysis with dials.cosym and with dials.symmetry, the latter implementing the algorithms of POINTLESS (Evans, 2006), identified the Laue group as 422. Joint refinement of unit-cell parameters using dials.two_theta_refine gave overall unit-cell parameters of a = b = 68.48, c = 103.95 Å, α = β = γ = 90°. Scaling with dials.scale gave the merging statistics in Table 10. Additionally, the Au data sets from the two visits were processed independently.
Radiation-damage analysis was performed by calculating the R cp statistic presented in Section 5.2, under the assumption that each crystal received an equivalent dose per image (Fig. 16). From Fig. 16(a) it can be seen that after reaching a minimum somewhere between 25 and 30 images, R cp begins to climb steadily, suggesting that cutting the data after 25 images may reduce the effects of radiation damage. Therefore, scaling of all 136 data sets was repeated as above, this time using only the first 25 images of each data set.
Merging statistics for all data sets are presented in Table 10.
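The multi-crystal use of R cp described above can be sketched as follows. The formula for R cp is not reproduced in this text, so the form used here is an assumption: absolute pairwise differences between symmetry-equivalent observations, normalized by the pairwise mean intensity as in R d , with each pair entering the running sums at the later of its two image numbers. The function name r_cp and the (hkl, batch, intensity) input layout are hypothetical. For multi-crystal data the batch is the image number within each data set, which encodes the assumption of equivalent dose per image.

```python
from collections import defaultdict
from itertools import combinations


def r_cp(observations, max_batch):
    """Cumulative pairwise R factor as a function of batch (image number).

    observations: iterable of (hkl, batch, intensity), where hkl is the
    symmetry-reduced Miller index and batch counts from 1.
    Returns a list whose element b-1 is the statistic computed from all
    pairs of equivalent observations with both batches <= b, or None
    where no pairs are available yet. The normalization is an assumption.
    """
    by_hkl = defaultdict(list)
    for hkl, batch, intensity in observations:
        by_hkl[hkl].append((batch, intensity))

    # Bin each pair at the batch where it first becomes available.
    numerator = [0.0] * (max_batch + 1)
    denominator = [0.0] * (max_batch + 1)
    for equivalents in by_hkl.values():
        for (b1, i1), (b2, i2) in combinations(equivalents, 2):
            b = max(b1, b2)
            if b <= max_batch:
                numerator[b] += abs(i1 - i2)
                denominator[b] += 0.5 * (i1 + i2)

    # Running sums give the cumulative statistic.
    result = []
    num = den = 0.0
    for b in range(1, max_batch + 1):
        num += numerator[b]
        den += denominator[b]
        result.append(num / den if den > 0 else None)
    return result


# Toy data: one reflection measured consistently on images 1 and 2,
# then with half the intensity on image 3 (mimicking damage).
toy = [((1, 0, 0), 1, 100.0), ((1, 0, 0), 2, 100.0), ((1, 0, 0), 3, 50.0)]
print(r_cp(toy, 3))
```

A flat curve followed by a steady rise, as in the toy data once image 3 is included, is the signature used above to choose a cut-off. With real data the pair count grows quadratically with multiplicity, so a production implementation would need something more efficient than explicit enumeration of pairs.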
6.3.3. Phasing. Substructure determination using single isomorphous replacement with anomalous scattering (SIRAS) was possible with SHELXD (Fig. 17a). The heavy-atom derivative data sets were collected across two separate beamline visits. To test the effects of multiplicity on phasing success, substructure determination was attempted separately on data sets coming from a single visit, and on data from both visits combined. Fig. 17(b) shows the map contrast versus cycle number after density modification with SHELXE. Given the potential for radiation damage in some of the data sets identified above, phasing was also attempted using only data from the first 25 images of each data set. Using only the first 25 images gave improved phases for both heavy-atom substructure and density modification, as judged by the SHELXD combined figure of merit (CFOM = CC all + CC weak ) and SHELXE map contrast respectively. The resulting density-modified phases and heavy-atom phases are shown along with the SHELXE poly-Ala trace in Fig. 17(e). Substructure determination by single-wavelength anomalous diffraction (SAD) was unsuccessful using data from either visit alone, or using all data combined. However, when using only data from the first 25 images of each data set, a successful substructure solution was obtained (Fig. 17c). Unfortunately, the phases were not of good enough quality for subsequent density modification with SHELXE. Nonetheless, this demonstrates that careful selection of the data, in particular avoiding inclusion of radiation-damaged data, can be crucial in determining the success of experimental phasing. The correctness of the substructure from SAD phasing was verified by comparison with the SIRAS substructure using the program phenix.emma (Adams et al., 2010).

Table 10
Merging statistics for native and Au-derivative data sets of proteinase K. Statistics are reported for Au-derivative data sets collected separately across two beamline visits, with all data combined, and using only the first 25 images from each data set.
Anomalous difference maps were calculated with ANODE (Thorn & Sheldrick, 2011), using refined models obtained by running DIMPLE on each data set. For all Au data sets two significant anomalous peaks were found. Using all data sets combined gave stronger anomalous peaks than when only using data from a single beamline visit. However, the strongest anomalous peaks were obtained when using only the first 25 images from each data set (Fig. 17d).
While the assumption that all samples are affected by the radiation at the same rate is hard to justify, the effect of individual variation in a population of more than 100 samples is likely to be modest. As such, looking at the population as a whole is reasonable as well as pragmatic, as the entire search space consists of around 10¹⁴⁵ permutations. It is also worth noting that the completeness of around 90% is an unavoidable feature of some in situ data sets, as the samples have preferred orientations with respect to the crystallization plate.

Discussion and practical recommendations
Consider again the four questions set out earlier.
(i) Is it the case that a larger number of weak observations is equivalent to a smaller number of stronger observations with the same total photon counts? Does the speed of collection matter?
(ii) If very weak data are recorded, are they useful?
(iii) Given a reasonable multiplicity of observations, how is radiation damage detected and how do we decide where to cut the data set?
(iv) Given data from multiple samples, how is it best to combine the data, i.e. is it better to combine weak complete sets or stronger partial ones?
Overall, the question of how to use the photons in the absence of radiation damage seems equivocal: by and large the 'quality' of the data as assessed by merging statistics is dominated by the total number of scattered photons, at least in the low-dose regime. Of course, radiation damage is rarely absent, so a high-multiplicity/low-dose strategy is a more conservative plan for data collection, provided that a photon-counting detector is used. In general, if a multi-axis goniometer is available and multiple low-dose sweeps are to be recorded, changes in orientation between sweeps (i.e. changes in the settings of the additional goniometer axes) will help to improve the average accuracy of the data. In the absence of any insight into the sample lifetime, an effective strategy for acquiring a useful data set from a single sample is to record a full rotation with low flux, say O(10¹⁰) photons per degree, then quadruple the transmission [which will, in the absence of radiation damage and by counting statistics alone, double the I/σ(I) of the data] and repeat until clear signs of radiation damage are seen. In the infinite limit the dose deposited before the 'useful' data set is roughly one third of the dose of the final set (1/4 + 1/16 + ... = 1/3). If the last two sets are used (i.e. the 'useful' one and the one before with one quarter of the dose) the 'wasted' dose (i.e. exposure of the sample to X-rays which do not contribute to the final data set) drops to around one twelfth. As shown earlier, these weaker data sets can also be useful for confirming the symmetry of the sample, performing molecular replacement or computing difference maps for ligand identification. In terms of radiation-damage detection, the R d statistic (Diederichs, 2006) can be an effective tool in determining the presence of damage, though it gives little insight into the point at which this damage becomes evident.
The R cp statistic presented in Section 5.2 overcomes this limitation and may therefore be a useful tool when combined with high-multiplicity/low-dose data collection, and when data are collected in situ and the configuration space to explore in terms of cutting back data sets is vast. Finally, the question of combining data from multiple samples and the best data to use remains open. Clearly, assessing isomorphism from effectively complete data sets will be more straightforward than from narrow sweeps; however, the form of data may ultimately be dictated by the mode of data collection, i.e. in situ collection brings geometric limitations. It is however useful to note that combining data from multiple isomorphous samples will almost certainly improve the quality of the final measurements.

Figure 17
Experimental phasing results for Au derivatives of proteinase K. (a) Histograms of combined figure of merit (CFOM = CC all + CC weak ) from SIRAS substructure determination with SHELXD for 10 000 trials, with data from two separate visits individually and combined. (b) Map contrast versus cycle number for density modification with SHELXE. Solid lines indicate the best hand, while dashed lines correspond to the inverted hand. (c) Histograms of combined figure of merit (CFOM = CC all + CC weak ) from SAD substructure determination with SHELXD for 10 000 trials, with data from two separate visits individually and combined. (d) Anomalous peak heights calculated with ANODE. (e) The density-modified (blue) and heavy-atom substructure (orange) phases, contoured at 3σ, and poly-Ala traced model output by SHELXE after substructure solution with SIRAS.
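The dose bookkeeping behind the quadrupling strategy discussed in this section can be checked with a short calculation. This is a minimal sketch in which each sweep receives four times the dose of the previous one and only the last one or two sweeps are kept. The function name wasted_dose_fraction and its parameters are hypothetical, and the model ignores anything other than the relative dose per sweep.

```python
import math


def wasted_dose_fraction(n_sweeps, factor=4.0, n_used=1):
    """Dose spent on discarded sweeps, relative to the dose of the final sweep.

    n_sweeps: total number of sweeps recorded, each with `factor` times
              the dose of the previous one.
    n_used:   number of final sweeps kept for the 'useful' data set.
    """
    if not 1 <= n_used <= n_sweeps:
        raise ValueError("n_used must be between 1 and n_sweeps")
    doses = [factor**k for k in range(n_sweeps)]
    discarded = doses[: n_sweeps - n_used]
    return sum(discarded) / doses[-1]


# Counting statistics alone: signal-to-noise scales as the square root
# of the photon count, so quadrupling transmission doubles I/sigma(I).
print(math.sqrt(4.0))                       # 2.0

# Long series approximate the infinite limit quoted in the text.
print(wasted_dose_fraction(12, n_used=1))   # close to 1/3
print(wasted_dose_fraction(12, n_used=2))   # close to 1/12
```

In practice only a handful of sweeps would be recorded, in which case the wasted fraction is slightly smaller than these limiting values.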
As such, the practical recommendations may be summarized as follows.
Collect carefully! In the absence of any insight into the lifetime of your samples, use a low transmission (aiming for e.g. 10¹¹ photons s⁻¹ into a 30 µm beam) and build up from there. Particular care should be taken with microfocus beamlines.
Given a sensible lifetime estimate, record high-multiplicity data to allow the data set to be truncated later, ideally changing sample orientation between data sets if possible.
Take any detector dead-time into consideration when choosing an exposure time for shutterless data collection: with some detectors, such as the Dectris EIGER2 X, this is negligible, while for others (e.g. the Dectris PILATUS3 X 2M) it can be as much as 24% of the total frame exposure time.
Consider combining data from multiple (isomorphous) samples: if the samples really are representative of the molecule under study and the experiment reproducible, the combined data should be better.
If combining data from multiple samples, analyse the data as they are collected to assess completeness, isomorphism and usefulness of the combined data. For phasing experiments this should include attempts at substructure determination.
Experiment with using different data-processing packages as well as inspecting all available automated processing -the 'best' software may be case dependent and some programs may work better than others for your combination of sample, experiment hardware and mode of data collection.
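As an illustration of the dead-time recommendation above, the fraction of each frame lost to read-out can be estimated from the frame time and the detector read-out time. The numbers below (0.95 ms read-out, 4 ms and 10 ms frames) are illustrative assumptions chosen to give a figure close to the 24% quoted above. They are not taken from this text and should be checked against the detector specification. The function name dead_time_fraction is hypothetical.

```python
def dead_time_fraction(frame_time_s, readout_time_s):
    """Fraction of each frame during which the detector is not counting."""
    if not 0.0 <= readout_time_s < frame_time_s:
        raise ValueError("read-out time must be non-negative and shorter than the frame")
    return readout_time_s / frame_time_s


# Assumed values for illustration only.
READOUT_S = 0.00095
for frame_s in (0.004, 0.01):
    lost = dead_time_fraction(frame_s, READOUT_S)
    print(f"{frame_s * 1000:.0f} ms frame: {lost:.1%} of the exposure is not recorded")
```

Because the sample is still irradiated during read-out in shutterless collection, this fraction is dose that contributes nothing to the data, which is why longer frames at lower transmission can be preferable on detectors with appreciable dead-time.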
Following these guidelines may increase the computational expense of data analysis and the data storage requirements for archiving. It is worth noting however that low-dose pixel-array data compresses very well (using gzip the total storage for a data set is roughly proportional to the total counts in the images) and that careful collection of data may remove the need for collecting from similar samples on a future visit. Of course, the main benefit of the approach presented here is to increase the success rate of X-ray diffraction experiments by limiting the impact of radiation damage, giving the best possible use of your samples and ultimately the best use of photons.