Buccaneer model building with neural network fragment selection

A neural network trained to identify unfavourable fragments and therefore improve protein model building in the Buccaneer software is described.

S1. The decision tree training
S1.1 The training data sets
We ran Buccaneer with two different seeds, 10 and 20, using the default parameters set by the Buccaneer developers. The experiment ran on a 173-node high-performance cluster with 7024 Intel Xeon Gold/Platinum cores and a total memory of 42 TB. Buccaneer built 1187 protein structures with each seed. Buccaneer's evaluation indicators, R-work and R-free, were obtained for every model built with either seed. For each evaluation indicator, we replaced the actual value with a label: 'Y' when the model built using seed 10 was better, and 'N' otherwise. The model built using seed 10 was deemed better overall when its completeness was at least 5% higher than that of the model built using seed 20. This outcome was recorded as the label 'Y' or 'N' (Figure S1).
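The labelling rule above can be sketched as follows. This is a minimal illustration, not the authors' code. The indicator names are hypothetical, and all values are assumed to be percentages. The 5% threshold is assumed to be an absolute difference in percentage points; a relative threshold is an equally plausible reading. The label computed for completeness doubles as the predicted (target) label.

```python
from typing import Dict

# Hypothetical indicator names. Completeness is better when higher;
# R-work and R-free are better when lower.
HIGHER_IS_BETTER = {"completeness": True, "r_work": False, "r_free": False}

# Assumption: values are in percent and the 5% threshold is absolute.
MIN_GAIN = 5.0


def label(seed10: float, seed20: float, higher_is_better: bool,
          min_gain: float = MIN_GAIN) -> str:
    """Return 'Y' if the seed-10 value beats the seed-20 value by at least min_gain."""
    gain = seed10 - seed20 if higher_is_better else seed20 - seed10
    return "Y" if gain >= min_gain else "N"


def label_model(seed10: Dict[str, float], seed20: Dict[str, float]) -> Dict[str, str]:
    """Replace each indicator value of one model by its 'Y'/'N' label."""
    return {
        name: label(seed10[name], seed20[name], better_high)
        for name, better_high in HIGHER_IS_BETTER.items()
    }


labels = label_model(
    {"completeness": 82.0, "r_work": 30.0, "r_free": 34.0},
    {"completeness": 75.0, "r_work": 31.0, "r_free": 33.0},
)
target = labels["completeness"]  # predicted label: completeness gain >= 5%
```

In this example, completeness improves by 7 points and is labelled 'Y'. The one-point changes in R-work and R-free fall short of the threshold and are labelled 'N'.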
Fig. S1. Evaluation indicators of the protein models built using the two seeds. (a) Models built using seed 10. (b) Models built using seed 20. (c) The labelled training features and the predicted label. Each evaluation indicator is replaced by 'Y' or 'N' according to the difference between its values for the models built using seeds 10 and 20; an improvement of at least 5% is required for the label 'Y'.

S2. The performance of the neural networks with and without the features that contributed less than 0.01 to the model performance

Fig. S2. Comparison of structure completeness, R-work, R-free and structure correlation between Buccaneer and the Buccaneer with neural network (Buccaneer(NN)) variants, using ten thresholds and the Freedman-Diaconis rule, for the JCSG experimental phasing data sets with original and truncated resolutions. The regions where Buccaneer(NN) is better than Buccaneer (either below or above the diagonal) are indicated in the diagrams. The inset boxplot depicts the difference in the four evaluation indicators achieved by Buccaneer(NN) and Buccaneer.
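The captions do not describe how the Freedman-Diaconis rule is combined with the thresholds. For reference only, the rule sets a histogram bin width h = 2·IQR·n^(-1/3), where IQR is the interquartile range of n values. The sketch below assumes NumPy, and its function names are hypothetical. NumPy implements the same width rule as `np.histogram_bin_edges(values, bins="fd")`.

```python
import numpy as np


def freedman_diaconis_width(values) -> float:
    """Freedman-Diaconis bin width: h = 2 * IQR * n**(-1/3)."""
    x = np.asarray(values, dtype=float)
    q25, q75 = np.percentile(x, [25, 75])
    return 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)


def freedman_diaconis_edges(values) -> np.ndarray:
    """Equal-width bin edges spanning the data, with the bin count set by the FD width."""
    x = np.asarray(values, dtype=float)
    width = freedman_diaconis_width(x)
    if width > 0:
        n_bins = max(1, int(np.ceil((x.max() - x.min()) / width)))
    else:
        # Zero IQR (e.g. heavily repeated values): fall back to a single bin.
        n_bins = 1
    return np.linspace(x.min(), x.max(), n_bins + 1)
```

For the values 1 to 8, the IQR is 3.5 and n^(-1/3) is 0.5, so the width is 3.5.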

S4. Comparison of structure completeness, R-work, R-free and structure correlation between Buccaneer and Buccaneer with neural network for the recently deposited experimental phasing data sets

Fig. S3. Comparison of structure completeness, R-work, R-free and structure correlation between Buccaneer and Buccaneer with neural network (Buccaneer(NN)), using ten thresholds and the Freedman-Diaconis rule, for the recently deposited experimental phasing data sets. The regions where Buccaneer(NN) is better than Buccaneer (either below or above the diagonal) are indicated in the figures. The inset boxplot depicts the difference in the four evaluation indicators achieved by Buccaneer(NN) and Buccaneer.
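Whether a point below or above the diagonal favours Buccaneer(NN) depends on the indicator. Completeness and structure correlation are better when higher; R-work and R-free are better when lower. The minimal sketch below uses hypothetical function and field names and assumes plain per-data-set lists as inputs. It computes the per-data-set differences that an inset boxplot would summarise and counts the data sets on each side of the diagonal.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Comparison:
    differences: List[float]  # Buccaneer(NN) minus Buccaneer, one per data set
    nn_better: int            # data sets on the Buccaneer(NN)-better side of the diagonal
    baseline_better: int      # data sets on the Buccaneer-better side
    ties: int                 # data sets lying on the diagonal


def compare(nn: Sequence[float], baseline: Sequence[float],
            higher_is_better: bool) -> Comparison:
    """Paired comparison of one evaluation indicator across data sets."""
    if len(nn) != len(baseline):
        raise ValueError("paired results must have the same length")
    diffs = [a - b for a, b in zip(nn, baseline)]
    # Flip the sign for indicators such as R-work and R-free, where lower is better.
    sign = 1.0 if higher_is_better else -1.0
    nn_better = sum(1 for d in diffs if sign * d > 0)
    baseline_better = sum(1 for d in diffs if sign * d < 0)
    ties = len(diffs) - nn_better - baseline_better
    return Comparison(diffs, nn_better, baseline_better, ties)


completeness = compare([80.0, 70.0, 60.0], [75.0, 72.0, 60.0], higher_is_better=True)
r_free = compare([0.30, 0.35], [0.32, 0.35], higher_is_better=False)
```

Keeping the sign convention in one place means that the same counting code serves all four indicators.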

S5. Comparison of structure completeness, R-work, R-free and structure correlation between Buccaneer and Buccaneer with neural network for the MR data sets

Fig. S4. Comparison of structure completeness, R-work, R-free and structure correlation between Buccaneer and Buccaneer with neural network (Buccaneer(NN)), using ten thresholds and the Freedman-Diaconis rule, for the MR data sets. The regions where Buccaneer(NN) is better than Buccaneer (either below or above the diagonal) are indicated in the figures. The inset boxplot depicts the difference in the four evaluation indicators achieved by Buccaneer(NN) and Buccaneer.

S6.1 Data sets
The MR models were downloaded from PDB-REDO for the same data sets as the PDB MR models. PDBSET was used to extract the target chain from each PDB-REDO model. Rather than repeating the molecular replacement, we used GESAMT to superpose each PDB-REDO model (the moving model) onto the corresponding PDB model (the reference). REFMAC was then run for ten cycles to refine the PDB-REDO models. This gives a clear comparison of the impact of using PDB-REDO structures on model building, because it eliminates any differences due to changes in the MR results.

Fig. S5. Comparison of structure completeness, R-work and R-free between Buccaneer and Buccaneer with neural network (Buccaneer(NN)), using ten thresholds, with MR models taken from the PDB and from PDB-REDO. The regions where Buccaneer(NN) is better than Buccaneer (either below or above the diagonal) are indicated in the figures. The inset boxplot depicts the difference in the evaluation indicators achieved by Buccaneer(NN) and Buccaneer.