Efficient merging of data from multiple samples for determination of anomalous substructure

The benefits of using local scaling and optimization of anomalous signal (as implemented in PHENIX) for merging data sets from many crystals for determination of the substructure for weak anomalous scatterers are examined.


Introduction
The use of high-multiplicity Bijvoet data from multiple samples has been shown to be effective in the determination of anomalous substructure in difficult cases (Akey, Brown, Dutta et al., 2014; Liu et al., 2012). These difficult cases include crystals with weak or few anomalous scatterers, poorly diffracting crystals and/or data collected far from the energy of peak absorption. While in principle increasing the multiplicity of merged data sets improves the data quality, complications arise from non-isomorphism when data from many samples are merged and from radiation damage when more data are collected from single crystals. Culling data, either by identifying and excluding outlier crystals or by removing the later diffraction images from radiation-damaged data sets, may be effective when merging data from many samples. Choosing an 'optimal' combination of data sets requires an assessment of individual data sets and, if radiation damage is suspected, a decision on how much data to include from each sample. Compatible data sets can be chosen by a variety of metrics, including unit-cell variations (Foadi et al., 2013; Liu et al., 2012, 2013), anomalous difference and/or intensity correlation coefficients (Giordano et al., 2012; Liu et al., 2012, 2013; Evans, 2006) and merging R factors (Evans, 2006).
A modification to data-merging strategies that relies on the weighting of data sets prior to merging has been implemented in phenix.scale_and_merge as part of the PHENIX suite of software. The program phenix.scale_and_merge uses a multi-part strategy to optimize the accuracy of the anomalous differences in a merged data set. The strategy combines an algorithm for local scaling, corrections for data anisotropy and, perhaps most importantly, a weight for each data set based on the correlation of its anomalous differences with those of the average data. In principle, local scaling can ameliorate the effects of radiation damage, and weighting individual data sets allows the inclusion of more samples in the final merged data than does data culling. While phenix.scale_and_merge attempts to use as many samples as are available, outlier data sets are identified and excluded.
Here, we evaluate the strategy of local scaling and data-set weighting in comparison with the treatment of data with a global-scaling protocol. The system is the flavivirus NS1 protein, for which we solved the crystal structure from the anomalous scattering of sulfur (Akey, Brown, Dutta et al., 2014). The S-SAD data are highly suitable for this experiment because complete data were recorded with a sixfold multiplicity of Bijvoet pairs for each of 24 crystals and with a threefold multiplicity for each of four other crystals. Previously, we reported that in certain instances better substructure solutions resulted when 'outlier' data sets were excluded. This was confirmed when data were added (from best to worst) to create a merged data set: in the absence of accounting for inter-data-set variances, adding the 'worst' data sets degraded the accuracy of anomalous differences. The identification of outlier crystals can be tedious, and the crystals identified may vary depending on the metric used.
Figure 1 (a) CCano of individual data sets for 28 crystals compared with the reference 28-crystal merged data. Δano from merged and scaled data (forward and inverse) for each sample was compared with Δano from the complete, local-scaled 28-crystal data set. Two data sets (red traces; samples Nos. 152 and 155) are outliers. (b) Correlation coefficients (cumulative to a dmin of 3.5 Å) of structure-factor amplitudes (F) from scaled and merged data for each crystal (forward and inverse wedges combined) for each pairwise combination of samples. Data are coloured on a percentile scale (percentile rankings and correlation values are shown in the key) from blue (lowest correlation) to red (highest correlation). Bold lines separate the crystals of the 14-, 18-, 23- and 28-crystal data sets. This plot is symmetric about the diagonal.
We tested whether the data-set weighting implemented in phenix.scale_and_merge would downplay the deleterious effects of slightly non-isomorphous crystals and eliminate the need to manually exclude data from outlier crystals.
We solved the crystal structure of the flavivirus NS1 protein by native sulfur SAD phasing at low resolution using multi-crystal data, followed by phase extension to 3.0 Å resolution, which yielded a high-quality electron-density map (Akey, Brown, Dutta et al., 2014). A total of 28 data sets were collected, 18 of which were combined for the original structure solution. Here, we investigate a local-scaling and anomalous-optimization strategy for weighting and combining individual data sets into a final high-multiplicity merged data set. As the focus is on optimizing the determination of the anomalous substructure, the data analysis is limited to anomalous signal quality indicators in the low-resolution shells where an anomalous signal was detectable (to 4.0 Å resolution). The metrics for assessing merged data quality were the correlation (CCano) of observed anomalous differences (ΔF) with those from either an atomic model or a reference data set, the ratio of anomalous difference to total structure-factor amplitude (⟨|ΔF|/F⟩) and the quality of the anomalous substructure determined from a merged data set using SHELXD (Sheldrick, 2010; number of sites found and CFOM). ⟨|ΔF|/F⟩ is a useful metric because it can be calculated in the absence of structural information, it appears to be a sensitive indicator of anomalous signal quality and it does not rely on error estimates. While the signal-to-noise estimate of anomalous differences [⟨|ΔF|/σ(ΔF)⟩] could also be used, sigma values vary with the method used to estimate them and, especially for small differences and weak F, they may be an unreliable metric for evaluating different merging strategies.
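The two metrics that do not require a model, CCano against a reference and ⟨|ΔF|/F⟩, can be illustrated with a short sketch. It assumes that Bijvoet-pair amplitudes F(+h) and F(−h) are already matched reflection by reflection, takes ΔF = F(+h) − F(−h) and uses the Bijvoet-pair mean as F. The helper names are hypothetical and the data are synthetic, not NS1 measurements.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def anomalous_metrics(f_plus, f_minus, dano_ref):
    """Return (CCano, <|dF|/F>) for Bijvoet-pair amplitudes F(+h), F(-h).

    dano_ref holds reference anomalous differences for the same reflections,
    taken from a model or from a reference data set.
    """
    dano = [fp - fm for fp, fm in zip(f_plus, f_minus)]
    f_mean = [(fp + fm) / 2.0 for fp, fm in zip(f_plus, f_minus)]
    cc_ano = pearson(dano, dano_ref)
    ratio = sum(abs(d) / f for d, f in zip(dano, f_mean)) / len(dano)
    return cc_ano, ratio


# Toy data: anomalous differences drawn with a standard deviation of 1.5% of F.
rng = random.Random(0)
n_refl = 2000
f_true = [rng.uniform(50.0, 150.0) for _ in range(n_refl)]
d_true = [rng.gauss(0.0, 0.015 * f) for f in f_true]
f_plus = [f + d / 2.0 for f, d in zip(f_true, d_true)]
f_minus = [f - d / 2.0 for f, d in zip(f_true, d_true)]

clean = anomalous_metrics(f_plus, f_minus, d_true)
# Add independent measurement error (sd 2.0) to each Bijvoet mate.
noisy = anomalous_metrics([x + rng.gauss(0.0, 2.0) for x in f_plus],
                          [x + rng.gauss(0.0, 2.0) for x in f_minus],
                          d_true)
print("clean: CCano=%.3f <|dF|/F>=%.4f" % clean)
print("noisy: CCano=%.3f <|dF|/F>=%.4f" % noisy)
```

In this toy case, adding measurement error lowers CCano but raises ⟨|ΔF|/F⟩. For a weak anomalous scatterer, a higher ⟨|ΔF|/F⟩ therefore need not mean a stronger signal.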

Data collection and processing
As previously described (Akey, Brown, Dutta et al., 2014), data were collected from 28 individual crystals on beamline 23ID-D (GM/CA) at the Advanced Photon Source (APS), Argonne National Laboratory. As a compromise between minimizing sample X-ray absorption and maximizing sulfur anomalous signal strength, data were collected at 7.1 keV (Liu et al., 2012), where the theoretical f′ is 0.365 e⁻ and the theoretical f″ is 0.703 e⁻. At this energy, the estimated anomalous signal is approximately 1.5% (|ΔF|/F) based on the sulfur content (six disulfides and five methionines in 352 amino acids). To obtain complete anomalous data and to minimize systematic errors in anomalous differences owing to radiation damage, all data were recorded in an inverse-beam geometry (5° wedges) so that true Friedel pairs were recorded close in time. For each crystal, 2 × 90° of data were collected ('forward' from goniometer setting 0° to 90° and 'inverse' from goniometer setting 180° to 270°, in interleaved 5° wedges), with the exception of two samples (Nos. 152 and 155) for which only 2 × 45° were collected. Data were processed using XDS to a resolution limit (dmin) of 2.9 Å for all trials reported here (Kabsch, 2010a,b). For each crystal, the forward and inverse passes were processed separately but with common crystal-orientation parameters. A reference data set was assigned to ensure that all data were indexed with the same axis convention (space group P321). The forward and inverse passes for each crystal were scaled in XDS and output without merging.
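The interleaved inverse-beam geometry can be written down as a recording schedule. This sketch assumes that each 5° forward wedge is immediately followed by its inverse wedge 180° away. The function and its parameters are hypothetical and do not correspond to any beamline control software.

```python
def inverse_beam_schedule(sweep=90.0, wedge=5.0, inverse_offset=180.0):
    """Return (pass, phi_start, phi_end) tuples in an assumed recording order.

    Assumes each forward wedge is immediately followed by the matching
    inverse wedge (phi + inverse_offset), so Friedel mates are recorded close in time.
    """
    schedule = []
    for i in range(int(round(sweep / wedge))):
        phi0 = i * wedge
        schedule.append(("forward", phi0, phi0 + wedge))
        schedule.append(("inverse", phi0 + inverse_offset, phi0 + inverse_offset + wedge))
    return schedule


full = inverse_beam_schedule()             # 2 x 90 deg, as for most NS1 crystals
short = inverse_beam_schedule(sweep=45.0)  # 2 x 45 deg, as for samples 152 and 155
print(len(full), len(short))  # 36 18
print(full[:3])   # [('forward', 0.0, 5.0), ('inverse', 180.0, 185.0), ('forward', 5.0, 10.0)]
print(full[-1])   # ('inverse', 265.0, 270.0)
```

With the defaults, the schedule contains 36 wedges (18 forward, 18 inverse) covering 180° in total, and the Friedel mates of any forward wedge are recorded in the very next wedge.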
2.1.1. Assessing individual crystal data. For the current analysis, the 28 data sets from individual crystals were merged with phenix.scale_and_merge (using default settings and optimize_anomalous=True). Correlation coefficients (CCano) between the observed anomalous differences for each crystal and the anomalous differences of a fully merged data set including all data from all crystals were calculated as a function of resolution (Fig. 1a). The individual data sets clustered with respect to CCano, with the exception of the two samples for which only 45° (×2) of data were collected (Nos. 152 and 155). Pairwise correlations of amplitudes (F) were calculated for all possible crystal pairs (Fig. 1b). This analysis identified three samples (Nos. 117, 119 and 233) that appeared to be outliers. Interestingly, these three crystals were centrally located in the narrow distribution of unit-cell parameters.
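A pairwise amplitude-correlation matrix like that of Fig. 1(b) can be sketched as follows. The data layout ({hkl: F} per sample), the minimum number of common reflections and the median-CC outlier rule are illustrative assumptions. The text does not state a numerical criterion for calling a crystal an outlier.

```python
import math
import random
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_cc(datasets, min_common=20):
    """Correlation of amplitudes for every pair of data sets.

    datasets maps a sample name to {hkl: F}; only reflections measured
    in both data sets of a pair are used.
    """
    cc = {}
    for a, b in combinations(sorted(datasets), 2):
        common = sorted(datasets[a].keys() & datasets[b].keys())
        if len(common) < min_common:
            continue  # too few shared reflections for a meaningful CC
        cc[(a, b)] = pearson([datasets[a][h] for h in common],
                             [datasets[b][h] for h in common])
    return cc


def flag_outliers(cc, threshold):
    """Flag samples whose median pairwise CC falls below threshold (an assumed rule)."""
    per_sample = {}
    for (a, b), value in cc.items():
        per_sample.setdefault(a, []).append(value)
        per_sample.setdefault(b, []).append(value)
    return sorted(name for name, values in per_sample.items()
                  if statistics.median(values) < threshold)


# Toy example: four consistent crystals and one with large, non-isomorphism-like errors.
rng = random.Random(1)
truth = {(h, k, l): rng.uniform(20.0, 200.0)
         for h in range(6) for k in range(6) for l in range(6)}
datasets = {}
for name, sigma in [("c1", 5.0), ("c2", 5.0), ("c3", 5.0), ("c4", 5.0), ("odd", 80.0)]:
    datasets[name] = {hkl: f + rng.gauss(0.0, sigma) for hkl, f in truth.items()}

cc = pairwise_cc(datasets)
print(flag_outliers(cc, threshold=0.8))
```

The median is used so that a single poorly matching partner does not drag down an otherwise consistent crystal.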
2.1.2. Comparison of local to global scaling. Data from 28, 23, 18 and 14 crystals were scaled and merged using both AIMLESS for global scaling (Evans & Murshudov, 2013) and phenix.scale_and_merge for local scaling. Data merged with AIMLESS used the scale keywords constant, bfactor on and brotation 20 to ensure true global scaling for these trials. The groups with fewer crystals were chosen as described previously. Briefly, pairwise correlations among the 28 samples identified 18 that were deemed to be compatible and were initially used to solve the NS1 structure. Retrospective analysis included testing all subsets of the 18 samples, of which four did not appear to contribute to substructure identification. Elimination of these four resulted in the 14-sample data set. Reprocessing of the ten initial outlier samples corrected mistakes in the initial processing and brought five more samples into 'compliance'; together with the original 18, these compose the 23-sample data set. The five remaining outliers either had only 45° (×2) of data collected (Nos. 152 and 155; Fig. 1a) or had poor correlation coefficients (F) to the remaining samples (Nos. 117, 119 and 233; Fig. 1b). The crystals had unit-cell parameters that agreed to within 1%, and those that deviated most from the average had a correlated deviation in the refined detector distance. Thus, unit-cell differences were not a useful determinant for data-set clusters in this case. Observed anomalous differences in the merged data sets were compared with calculated anomalous differences using ⟨|ΔF|/F⟩ (Dauter & Adamiak, 2001; Hendrickson et al., 1985; Smith & Hendrickson, 2001) and CCano, where CCano was calculated between the observed anomalous differences and the calculated anomalous differences based on the refined model (phenix.refine using PDB entry 4tpl and explicit sulfur f′ and f″ values). For both metrics, the local-scaled, anomalous-optimized data were in better agreement with the calculated anomalous differences than were the global-scaled data (Figs. 2a and 2b).
The average ⟨|ΔFcalc|/Fcalc⟩ (Fig. 2a, red curves) illustrates the increase in anomalous signal from disulfide scattering for data in which individual S atoms are not resolved (solid red curve) compared with ⟨|ΔFcalc|/Fcalc⟩ from a model in which the disulfides were artificially broken by rotation of the cysteine side-chain torsion (dashed red curve). In all cases, ⟨|ΔFobs|/Fobs⟩ was closer to expectation for the local-scaled, anomalous-optimized data than for the global-scaled data (solid versus dashed black, orange, green and blue curves), consistent with a lower level of noise in the anomalous signal for the local-scaled data. Although counterintuitive, a low value of ⟨|ΔFobs|/Fobs⟩ for a data set from crystals with weak anomalous scattering indicates more useful anomalous differences than a high value, because a high value reflects the errors in measurement of |ΔFobs| (Dauter & Adamiak, 2001; Hendrickson & Ogata, 1997). The CCano values were smaller in the lowest resolution shell than in the next lowest shell (Fig. 2b). We surmise that this is owing to the difficulty in modelling bulk-solvent contributions to the overall structure factors, an effect that would have the largest impact at low resolution. Alternatively, systematic errors in measurement of the very low-resolution data may contribute. With either the local-scaled, anomalous-optimized or the global-scaled protocol, going from the 14- to the 18- and 23-crystal data sets yielded incremental improvements in the agreement with model anomalous differences (Fig. 2b). However, inclusion of the 'outlier' data sets (going from 23 to 28 data sets) had a marginal effect on the local-scaled data but severely degraded the anomalous signal in the global-scaled data. At higher resolutions, the local-scaled data had slightly better agreement with the model phases.
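The difference between a single global scale and local scales can be shown with a toy model of dose-dependent intensity decay. This is a generic neighbourhood-ratio scheme along one dose-like coordinate, not the local-scaling algorithm of phenix.scale_and_merge, whose choice of neighbourhood and restraints is not reproduced here. The 30% linear decay, the window half-width and the function names are assumptions.

```python
import math
import random


def global_scale(obs, ref):
    """One least-squares scale factor k minimising sum((ref - k * obs) ** 2)."""
    return sum(r * o for r, o in zip(ref, obs)) / sum(o * o for o in obs)


def local_scales(obs, ref, coord, half_width):
    """One scale per observation: sum(ref) / sum(obs) over neighbours within half_width."""
    scales = []
    for c in coord:
        num = den = 0.0
        for o, r, c2 in zip(obs, ref, coord):
            if abs(c2 - c) <= half_width:
                num += r
                den += o
        scales.append(num / den)
    return scales


def rms_relative_error(scaled, ref):
    """Root-mean-square of (scaled / ref - 1)."""
    return math.sqrt(sum((s / r - 1.0) ** 2 for s, r in zip(scaled, ref)) / len(ref))


# Toy model: intensities decay linearly by up to 30% with accumulated dose.
rng = random.Random(2)
n = 1000
dose = [i / (n - 1) for i in range(n)]                  # 0 (first image) .. 1 (last image)
ref = [rng.uniform(50.0, 150.0) for _ in range(n)]      # reference intensities
obs = [r * (1.0 - 0.3 * d) for r, d in zip(ref, dose)]  # damaged observations

k = global_scale(obs, ref)
err_global = rms_relative_error([k * o for o in obs], ref)
s = local_scales(obs, ref, dose, half_width=0.05)
err_local = rms_relative_error([si * o for si, o in zip(s, obs)], ref)
print("global: %.3f  local: %.3f" % (err_global, err_local))
```

A single scale factor cannot follow a decay that varies across the data set, whereas the windowed scale largely removes it. On real, noisy data the window must be wide enough to average out measurement error, which is one reason local scales become less reliable where the data are weak.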
Figure 2 Comparison of local-scaled, anomalous-optimized data with global-scaled data. (a) Average fractional anomalous difference [⟨|ΔF|/F⟩] versus resolution for 28-crystal (black), 23-crystal (orange), 18-crystal (green) and 14-crystal (blue) data sets merged with phenix.scale_and_merge (solid lines) or AIMLESS (dashed lines). ⟨|ΔFcalc|/Fcalc⟩ values were calculated from the refined model (PDB entry 4tpl) with intact disulfides (solid red lines) and with artificially broken disulfides (dashed red lines; S atoms were moved by changing each Cys rotamer). (b) Cumulative CCano between ΔFobs for merged data and ΔFcalc from the refined model. (c) CFOM (CCall + CCweak) of the best solution among 10 000 SHELXD trials at each limiting resolution. There is no plot for the global-scaled 28-crystal data because SHELXD failed to find any solutions.
Substructure identification (using SHELXD) was more robust for the 14- and 18-data-set combinations when using local-scaled, anomalous-optimized data, with higher correlation coefficients (CFOM; Fig. 2c). With 23 data sets, the local- and global-scaling protocols worked equally well, but when the 'outlier' data sets were included SHELXD failed to find any solution for the global-scaled data. We suspect that the data-set weighting in the anomalous-optimization protocol, rather than the local scaling per se, alleviated the poisoning of the combined data sets.
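One way a correlation-based data-set weight could work is sketched below. For a data set equal to truth plus independent noise, CC² against the truth is σ²(truth)/[σ²(truth) + σ²(noise)], so CC²/(1 − CC²) equals the signal-to-noise variance ratio, i.e. an inverse-variance weight up to a constant. The weight formula, the leave-one-out consensus and the zero-weight cut-off are assumptions chosen for illustration. The exact weighting used by phenix.scale_and_merge is not described here and may differ.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weight_from_cc(cc, cc_min=0.0):
    """Weight CC^2 / (1 - CC^2); sets at or below cc_min get zero weight.

    Against the truth this equals a set's signal-to-noise variance ratio;
    against a noisy consensus it is only a monotonic proxy for it.
    """
    if cc <= cc_min:
        return 0.0
    cc = min(cc, 0.999)  # guard against division by zero
    return cc * cc / (1.0 - cc * cc)


def weighted_merge(danos, cc_min=0.0):
    """Merge anomalous differences from several data sets (same reflections, same order).

    Each set is weighted by its CC with the leave-one-out mean of the other sets.
    Returns (merged, weights).
    """
    n_sets, n_ref = len(danos), len(danos[0])
    weights = []
    for i, d in enumerate(danos):
        others = [danos[j] for j in range(n_sets) if j != i]
        loo_mean = [sum(o[k] for o in others) / len(others) for k in range(n_ref)]
        weights.append(weight_from_cc(pearson(d, loo_mean), cc_min))
    total = sum(weights)
    merged = [sum(w * d[k] for w, d in zip(weights, danos)) / total for k in range(n_ref)]
    return merged, weights


# Toy data: six good sets, two noisy sets and one unrelated ('poisoning') set.
rng = random.Random(3)
n_ref = 2000
truth = [rng.gauss(0.0, 1.0) for _ in range(n_ref)]
danos = [[t + rng.gauss(0.0, 2.0) for t in truth] for _ in range(6)]
danos += [[t + rng.gauss(0.0, 6.0) for t in truth] for _ in range(2)]
danos.append([rng.gauss(0.0, 2.0) for _ in range(n_ref)])

merged, weights = weighted_merge(danos)
unweighted = [sum(d[k] for d in danos) / len(danos) for k in range(n_ref)]
print("weights:", [round(w, 3) for w in weights])
print("CC(weighted, truth)=%.3f  CC(unweighted, truth)=%.3f"
      % (pearson(merged, truth), pearson(unweighted, truth)))
```

In the toy data the unrelated set receives a near-zero weight without being explicitly excluded. This is, in spirit, how weighting can limit the damage done by outlier data sets without manual culling.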

Merged data quality is strongly correlated with multiplicity. The merged data sets from 28, 23, 18 and 14 crystals have an unusually high multiplicity of anomalous differences (78-fold to 150-fold). We expanded the analysis to merged data sets with more typical multiplicities ranging from 12-fold to 45-fold. Merged data sets were generated from eight, six, four and two crystals that were chosen based on the highest correlation of ΔF with the 28-crystal reference data set (Fig. 1). As expected, the anomalous signal quality steadily decreased with decreasing multiplicity (Fig. 3). Additionally, data combinations using fewer than 14 crystals failed in anomalous substructure determination with SHELXD (Table 1).
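The dependence on multiplicity follows from simple averaging. If each of n independent observations of a unit-variance anomalous difference carries noise with standard deviation σ, the merged value carries noise of variance σ²/n, and its expected correlation with the truth is 1/√(1 + σ²/n). The simulation below checks this relation. The choice σ = 5 (a weak signal relative to the noise) and the neglect of non-isomorphism and systematic error are assumptions, so this is not a model of the NS1 data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulated_cc(multiplicity, n_refl=2000, sigma_obs=5.0, seed=0):
    """CC between true and merged anomalous differences at a given multiplicity.

    Each observation is truth + Gaussian noise (sd sigma_obs), with unit-variance
    truth; the merged value is the mean of `multiplicity` observations.
    """
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 1.0) for _ in range(n_refl)]
    merged = [t + sum(rng.gauss(0.0, sigma_obs) for _ in range(multiplicity)) / multiplicity
              for t in truth]
    return pearson(merged, truth)


def expected_cc(multiplicity, sigma_obs=5.0):
    """Analytic CC: the noise variance of the mean is sigma_obs**2 / multiplicity."""
    return 1.0 / math.sqrt(1.0 + sigma_obs ** 2 / multiplicity)


multiplicities = [2, 12, 45, 150]
for m in multiplicities:
    print(m, round(simulated_cc(m), 3), round(expected_cc(m), 3))
```

For σ = 5, the analytic CC values at 2-, 12-, 45- and 150-fold multiplicity are about 0.27, 0.57, 0.80 and 0.93. The gain per added observation shrinks as multiplicity grows.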

Figure 3 Metrics of anomalous signal for merged data sets from different numbers of crystals. (a) ⟨|ΔF|/F⟩ for local-scaled, anomalous-optimized data for merged data sets from two (11.5-fold multiplicity), four (23.0-fold), six (34.5-fold), eight (45.5-fold), 14 (77.5-fold), 18 (100.2-fold), 23 (128.8-fold) and 28 (150.0-fold) crystals, with multiplicities indicated in the legend. (b) CCano (ΔFobs compared with ΔFcalc) by resolution shell as a function of increasing multiplicity, shown for individual resolution bins. (c) Cumulative CCano for data sets of variable multiplicity.
Table 1 Correct sulfur sites found (of 23 possible sites: 12 S–S, ten Met and one SO4²⁻) with SHELXD.
Local-scale handling of radiation damage and non-isomorphism. Both non-isomorphism and radiation damage have deleterious effects on the ability to estimate accurate anomalous differences for multi-crystal data sets. These effects are expected to differ in magnitude and to depend on the details of data collection and on the crystals. Thus, the effect on signal quality when multiplicity is reduced may depend on whether the reduction is achieved by limiting the number of crystals or by limiting the number of experimental observations per crystal. In the case of NS1, a total of 180° of data were recorded from nearly all crystals (90° for each of the forward and inverse passes, recorded in interleaved 5° wedges). The diffraction images from the end of data collection for each of the 28 crystals exhibited a visible reduction in the diffraction limit, which was up to 2.9 Å at the start of data collection and typically degraded by 0.5–0.8 Å by the end. Thus, we anticipated that excluding the data most affected by radiation damage, i.e. the forward and inverse images recorded last, might improve the anomalous signal relative to data sets in which an equivalent multiplicity was achieved by excluding crystals. We created 'damage-minimized' data sets for all 28 crystals by excluding the last-recorded 15°, 30° or 45° of data and then merged these data with local scaling. Each of these 'damage-minimized' merged data sets was compared with a data set of similar multiplicity that included the full 90° of inverse-beam data from fewer crystals [the first 75° of data from 28 crystals (127.5-fold multiplicity) versus 90° of data from 23 crystals (128.8-fold multiplicity), the first 60° of data from 28 crystals (104.6-fold multiplicity) versus 90° of data from 18 crystals (100.2-fold multiplicity), and the first 45° of data from 28 crystals (80.2-fold multiplicity) versus 90° of data from 14 crystals (77.5-fold multiplicity)]. The full 90° data set from 28 crystals had a 150.0-fold multiplicity of anomalous pairs. At equivalent multiplicity, the anomalous signal metrics were roughly equivalent for the damage-minimized and the crystal-reduced data sets (Fig. 4, Table 1). This suggests that, to a first approximation and in the case of the NS1 data, discrepancies owing to non-isomorphism and discrepancies owing to radiation damage were handled equally well by the local-scaling protocol. Nevertheless, we note that the ⟨|ΔF|/F⟩ and CCano metrics are slightly but consistently better for the reduced-crystal data sets than for the damage-minimized data sets with matched multiplicity.

Summary
In principle, local scaling will better account for intensity differences that arise either from radiation damage or from errors in modelling sample absorption. Here, data for each of several crystals were locally scaled, and individual crystal data sets were merged with anomalous optimization. This scaling-and-merging procedure resulted in a merged multi-crystal data set with lower noise in the anomalous differences than was observed for the same multi-crystal data scaled globally and merged without anomalous optimization. Local scaling resulted in more accurate estimates of anomalous differences at lower data multiplicity, while data-set weighting to optimize anomalous differences down-weighted suspect data (23- versus 28-crystal data sets), thus minimizing the need to cull data from a combined meta-data set. Both local scaling and data-set weighting contributed to greater success in substructure determination. The most striking result of our investigation is that the overall quality of anomalous differences was similar for data sets of equal multiplicity, whether data multiplicity was reduced by excluding crystals or by excluding the most radiation-damaged segments of data from each crystal. For each of the three multiplicities tested (128-fold, 100-fold and 80-fold), the crystal-reduced data were slightly superior to the damage-minimized data according to the criteria ⟨|ΔF|/F⟩ and correlation to calculated anomalous differences. We had anticipated the opposite: that excluding data to minimize the effects of radiation damage would result in a substantial improvement in the quality of anomalous differences. Two effects may contribute to the finding that achieving a given multiplicity by using fewer crystals was slightly better than using less data from each crystal. Firstly, NS1 crystals diffracted to ~3 Å resolution, but we analysed only the strongest data (to 4.0 Å resolution), as the anomalous differences beyond this limit were not useful. The lowest resolution data are expected to be least influenced by radiation damage, so the anomalous differences to 4.0 Å resolution may be preserved despite visible damage at the ~3 Å limit during the 90° (×2) of data collection from each crystal. The second effect is a reduction in the multiplicity of observed anomalous differences within each crystal for the damage-minimized data sets (from sixfold in the full 90° data sets to threefold in the 45° data sets). For weak anomalous differences at low resolution (estimated to be 1.5% of F for NS1; solid red curve in Fig. 4a), the greater multiplicity was evidently more important than avoiding the apparently modest effects of radiation damage at 4.0 Å resolution and below. This is also clear in the pairwise CCano values: anomalous differences in the data of the two crystals from which only 45° (×2) of data were recorded were far more poorly correlated with anomalous differences from other crystals, even though the amplitudes (F) were well correlated.
Figure 4 Comparison of damage-minimized and crystal-reduced multi-crystal data sets. (a) ⟨|ΔF|/F⟩ versus resolution for 28-, 23-, 18- and 14-crystal data sets (solid lines) and damage-minimized 75°, 60° or 45° data sets (dashed lines) of matched multiplicity. Calculated values of ⟨|ΔF|/F⟩ from the refined model are shown in red. (b) Cumulative CCano comparing ΔF for merged data with those calculated from the refined model. (c) CFOM (CCall + CCweak) of the best solution from 10 000 SHELXD trials at each limiting resolution.
Anomalous substructure determination relies almost exclusively on low-resolution, typically high-intensity data, and we have seen that local scaling and anomalous optimization improve the quality of anomalous differences in multi-crystal data sets. This may not be the best approach for generating structure-factor amplitudes to the diffraction limit of the crystals, as local scale factors in high-resolution shells are generated using only the weakest data. A more global scheme may be better suited to preparing a multi-crystal data set for model refinement. The combination of local scaling and anomalous optimization appears to provide a robust and efficient method for combining data from many crystals to maximize the quality of the anomalous signal in difficult problems.