Figure 3
Summary of the training, validation and testing of five XGBoost models on different structural descriptors. The variances are reported in the last row. The 10-fold CV results give the regression mean-squared error (MSE) or classification accuracy averaged over the 10 folds, together with the standard deviation across the folds. Note that we used 750 CARTs in the 10-fold CV and 7500 CARTs in the training process. The shaded models are identified subjectively as poor, based on their 10-fold CV results, their performance on all the datasets and a comparison with the other models trained on the same structural descriptor. Overall, the numbers suggest that the XGBoost model is able to learn the patterns in the training data and to generalize to unseen testing data. This characteristic implies that the model has the potential to be applied to noisy experimental data and to different molecular systems.
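The 10-fold CV summary described in the caption (a per-fold MSE, then its mean and standard deviation across folds) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `kfold_indices`, `cv_mse` and `mean_baseline` are hypothetical, the fold assignment by shuffling is an assumption, and the population standard deviation is used because the caption does not say which estimator was applied.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    fit_predict: Callable[[list, list, list], List[float]],
    k: int = 10,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (mean, standard deviation) of the per-fold MSE.

    fit_predict(X_train, y_train, X_test) is a hypothetical callback that
    trains a model on the training folds and predicts the held-out fold.
    """
    fold_mse = []
    for held_out in kfold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        preds = fit_predict(
            [X[i] for i in train],
            [y[i] for i in train],
            [X[i] for i in held_out],
        )
        errors = [(p - y[i]) ** 2 for p, i in zip(preds, held_out)]
        fold_mse.append(statistics.fmean(errors))
    # Assumption: population standard deviation across the k folds.
    return statistics.fmean(fold_mse), statistics.pstdev(fold_mse)


def mean_baseline(X_train: list, y_train: list, X_test: list) -> List[float]:
    """Stand-in model that predicts the training mean for every test point."""
    m = statistics.fmean(y_train)
    return [m] * len(X_test)
```

In the setting of the caption, `fit_predict` would presumably wrap a gradient-boosted tree regressor limited to 750 trees, and a classification model would report the per-fold accuracy in place of the MSE; both substitutions are left to the reader here.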

Volume 7| Part 5| September 2020| Pages 870-880
ISSN: 2052-2525