Figure 4
Results of data-processing methods on training data. In (a), (b) and (c), values are Pearson's product–moment correlation coefficients and r.m.s. denotes root-mean-square. (a) Inter-feature correlation for three example feature pairs. (b) Distributions of all 210 inter-feature correlation coefficients versus resolution. Coefficients are converted to r.m.s. values; boxes span the interquartile range, the median is shown as a horizontal bar and values outside the interquartile range are shown as dots. (c) Summary of inter-feature and versus-resolution correlations for ED and CC features at each stage of data processing. Inter-feature (inter-feat.) values are correlations between grouped features; versus-resolution (vs reso.) values are correlations between features and crystallographic resolution. Values are given for each stage of the data-processing workflow described in Fig. 2(a). (d) ED scores binned by resolution. Blue and red boxes represent, for water and sulfate respectively, the range of training-data ED scores within one standard deviation of the mean (horizontal bar) of each resolution bin.
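
The quantities summarized in panels (b) and (c) can be illustrated with a minimal sketch, given here only as an aid to reading the caption and not as the procedure used to produce the figure. It assumes the features are held as columns of a NumPy array and that the r.m.s. is taken over the set of pairwise Pearson coefficients. The function names `pairwise_pearson` and `rms`, the synthetic data and the count of 21 features are all hypothetical; 21 is chosen only because 21 × 20 / 2 = 210 matches the number of feature pairs quoted above.

```python
import numpy as np


def pairwise_pearson(features):
    """Pearson coefficients for every unordered pair of feature columns.

    `features` has shape (n_samples, n_features); the result has
    n_features * (n_features - 1) / 2 entries.
    """
    corr = np.corrcoef(features, rowvar=False)  # (n_features, n_features)
    upper = np.triu_indices_from(corr, k=1)     # strictly above the diagonal
    return corr[upper]


def rms(values):
    """Root-mean-square of a set of values."""
    return float(np.sqrt(np.mean(np.square(values))))


# Synthetic stand-in data: 500 samples of 21 hypothetical features.
rng = np.random.default_rng(0)
demo = rng.normal(size=(500, 21))

coeffs = pairwise_pearson(demo)
print(coeffs.size)   # number of feature pairs
print(rms(coeffs))   # one summary value for the whole set of pairs
```

Taking the r.m.s. rather than the plain mean prevents positive and negative coefficients from cancelling, so the summary reflects the strength of correlation regardless of sign.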