Figure 5
Performance of four trained XGBoost models on noisy synthesized data from the testing set. Twenty sampled SWAXS profiles with low, medium and high error levels are shown in the top row. The subsequent rows show boxed panels, each containing four histograms of predictions made by the indicated models: noise-free, noisy, sparsely sampled and densely sampled. The vertical lines mark the real values, extracted from detailed molecular analysis. The transparency of the histograms encodes the error level: the higher the error, the more transparent the lines. Overall, all the trained models perform well on noisy data with reasonable error levels (low and medium). As the error level increases, corresponding to an unphysically low signal-to-noise ratio, outlier values start to appear and the prediction distribution spreads. Even in this extreme case, however, some of the peak values still recapitulate the real ones.
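As a rough illustration of the kind of evaluation the figure summarizes, the sketch below shows how several trained XGBoost regressors could be applied to noisy test profiles and their prediction distributions compared against a reference value in per-model histograms. This is not the authors' code; all file names, variable names (`models`, `X_test_noisy`, `y_true`) and the single-parameter setup are hypothetical placeholders.

```python
# Minimal sketch (assumed workflow, not the published implementation):
# evaluate four trained XGBoost models on noisy SWAXS-like profiles and
# plot a histogram of predictions per model, with the reference value marked.
import numpy as np
import xgboost as xgb
import matplotlib.pyplot as plt

# Hypothetical: four previously trained boosters, keyed by training condition.
models = {
    "noise-free": xgb.Booster(model_file="model_noise_free.json"),
    "noisy": xgb.Booster(model_file="model_noisy.json"),
    "sparse": xgb.Booster(model_file="model_sparse.json"),
    "dense": xgb.Booster(model_file="model_dense.json"),
}

# Hypothetical noisy test profiles (rows = sampled profiles) and the
# reference value obtained from the detailed molecular analysis.
X_test_noisy = np.load("swaxs_profiles_noisy.npy")
y_true = 3.2  # placeholder reference value

dtest = xgb.DMatrix(X_test_noisy)

fig, axes = plt.subplots(1, len(models), figsize=(12, 3), sharey=True)
for ax, (name, booster) in zip(axes, models.items()):
    preds = booster.predict(dtest)       # one prediction per sampled profile
    ax.hist(preds, bins=30, alpha=0.6)   # spread of predictions for this model
    ax.axvline(y_true, color="k", lw=1)  # vertical line: reference value
    ax.set_title(name)
fig.tight_layout()
plt.show()
```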
