findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM

Chojnowski, G.; Simpkin, A.J.; Leonardo, D.A.; Seifert-Davila, W.; Vivas-Ruiz, D.E.; Keegan, R.M.; Rigden, D.J.

doi:10.1107/S2052252521011088

Figure 4
Sequence-identification and assignment benchmarks for EM models. (a) Identity of a sequence identified for models built de novo using ARP/wARP as a function of HMMsearch best-single-domain sequence-alignment score. (b) Identity of a sequence assigned to continuous fragments of deposited EM models as a function of the sequence-assignment score (p value) for protein-fragment lengths of 10, 50 and 100 residues selected at random from test-set models. The continuous curves on the plots are logistic regression estimates of the probability that an identified sequence will have at least 80% sequence identity to the reference model. The orange circles represent three reference chains with register errors that were not used for the logistic regression calculations.
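
The logistic regression curves described in the caption can be illustrated with a minimal sketch. It assumes each benchmark case is reduced to a score and a binary label (1 if the identified sequence has at least 80% identity to the reference, 0 otherwise). The function names, the Newton solver and the toy data are illustrative assumptions; they are not the authors' code or data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(scores, labels, n_iter=25):
    """Fit P(label = 1 | score) = sigmoid(a + b * score) by Newton's method.

    Hypothetical helper: a two-parameter maximum-likelihood fit,
    not the procedure used in the paper.
    """
    a = b = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(scores, labels):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g0 += y - p
            g1 += (y - p) * x
            # Hessian of the negative log-likelihood
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break  # Hessian is (nearly) singular; stop updating
        # Newton step: solve the 2x2 system H * d = g
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def predict(a, b, score):
    """Estimated probability that the identity is at least 80%."""
    return sigmoid(a + b * score)


# Toy, made-up data: scores and whether identity >= 80% was reached.
scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(scores, labels)
print(f"P(identity >= 80% | score = 8) = {predict(a, b, 8):.2f}")
```

Plotting `predict(a, b, s)` over a range of scores `s` gives a continuous curve of the kind shown in the figure. Cases excluded from the fit, such as the chains with register errors, would simply be left out of `scores` and `labels`.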

IUCrJ

Volume 9 | Part 1 | January 2022 | Pages 86-97

ISSN: 2052-2525

CRYO | EM

Open access
