
Figure 4
Coverage versus error plot of different sequence comparison methods. Five sequence comparison methods are evaluated, each using statistical scores (E or P values), on the PDB40D-B database (Brenner et al., 1998). In this analysis, the best method is SSEARCH, which finds 18% of relationships at 1% errors per query (EPQ). FASTA with ktup = 1 and WU-BLAST2 are almost as good. In the coverage versus error plot, the x axis indicates the fraction of all homologs in the database (known from structure) that have been detected, i.e. the number of detected pairs of proteins with the same fold divided by the total number of pairs from a common superfamily. PDB40D contains a total of 4522 homologs, so a score of 10% indicates identification of 452 relationships. The y axis reports the number of EPQ. Because 1323 queries are made in the PDB40D all-versus-all comparison, 13 errors correspond to 0.01, or 1% EPQ. The y axis is presented on a log scale to show results over the widely varying degrees of accuracy that may be desired. The graph demonstrates the trade-off between sensitivity and selectivity: as more homologs are found (moving to the right), more errors are made (moving up). The ideal method would lie in the lower right corner of the graph, which corresponds to identifying many evolutionary relationships without selecting unrelated proteins. Copyright National Academy of Sciences USA; Brenner et al. (1998), used with permission.
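
The arithmetic behind the two axes can be illustrated with a minimal Python sketch. It is not the procedure used by Brenner et al.; it only assumes that each reported hit carries a statistical score (lower meaning more significant) and a flag saying whether the pair is a true homolog. The function name `coverage_vs_epq` and the toy hit list are hypothetical; the constants 4522 and 1323 are taken from the caption.

```python
from typing import Iterable, List, Tuple

TOTAL_HOMOLOGS = 4522  # homologous pairs in PDB40D (from the caption)
NUM_QUERIES = 1323     # queries in the all-versus-all comparison (from the caption)


def coverage_vs_epq(
    hits: Iterable[Tuple[float, bool]],
    total_homologs: int = TOTAL_HOMOLOGS,
    num_queries: int = NUM_QUERIES,
) -> List[Tuple[float, float]]:
    """Return (coverage, errors-per-query) points for a list of scored hits.

    hits: (score, is_homolog) pairs; a lower score is assumed to be more
    significant, as with E or P values.
    """
    points = []
    true_pos = 0
    false_pos = 0
    # Walk from the most to the least significant hit, as if the score
    # threshold were being relaxed step by step.
    for _score, is_homolog in sorted(hits, key=lambda h: h[0]):
        if is_homolog:
            true_pos += 1
        else:
            false_pos += 1
        points.append((true_pos / total_homologs, false_pos / num_queries))
    return points


# Figures quoted in the caption:
detected_at_10_percent = round(0.10 * TOTAL_HOMOLOGS)  # 452 relationships
epq_for_13_errors = 13 / NUM_QUERIES                   # roughly 0.01 (1% EPQ)

# Toy data (hypothetical), with small totals so the fractions are easy to read.
toy_hits = [(1e-10, True), (1e-5, True), (0.01, False), (0.5, True)]
toy_curve = coverage_vs_epq(toy_hits, total_homologs=4, num_queries=2)
```

Each successive point moves right when a true homolog is accepted and up when an unrelated pair is accepted, which is the sensitivity versus selectivity trade-off the plot displays.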

Acta Cryst. D: Structural Biology
ISSN: 2059-7983