Figure 4
Functional coverage (fraction of structural families with given GO annotations that can be solved with X-ray structures) of the 4719 considered GO annotations across the three superkingdoms of life and viruses. Proteins with given GO annotations were mapped into modeling families. (a) Coverage under the assumption that a GO annotation is covered when a given fraction of its structural families is solved. The solid lines assume that an annotation is covered when one or more of its annotated modeling families has an obtainable structure. The dashed and dotted lines assume that an annotation is covered when at least 50% or all, respectively, of its modeling families are structurally covered. The vertical lines show the cutoff values that correspond to the 25th centile, the median, and the 75th centile of the crystallization propensity scores of the clustered proteins from the PDB data set. (b) Number of GO annotations in a given superkingdom (y axis) that have at least a given fraction of modeling families amenable to structure solution (x axis). We assume that a given modeling family can be structurally covered if it includes at least one protein with a crystallization propensity above the cutoff value given on the x axis in (a), or above the median score (0.498) for the PDB structures clustered at 30% sequence identity in (b); the remaining structures in that family can be obtained using homology modeling. To ensure statistically sound estimates and to accommodate the incompleteness of the GO annotations, we limited the analysis to annotations with at least 20 modeling families.
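The coverage rules stated in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the data layout (a dictionary mapping each annotation to its modeling families, each holding a list of crystallization propensity scores) and all function names are assumptions made for illustration, as is the choice to report the result as the fraction of eligible annotations that count as covered. Only the 0.498 median cutoff, the "at least one protein above the cutoff" rule, the three coverage criteria, and the 20-family minimum come from the caption.

```python
from typing import Dict, List, Optional

# Values taken from the caption.
MEDIAN_PDB_SCORE = 0.498   # median propensity of PDB structures clustered at 30% identity
MIN_FAMILIES = 20          # annotations with fewer modeling families are excluded

# Hypothetical layout: annotation -> modeling family -> propensity scores of its proteins.
Families = Dict[str, List[float]]
Annotations = Dict[str, Families]


def family_is_covered(scores: List[float], cutoff: float) -> bool:
    """A family is covered if at least one protein scores above the cutoff."""
    return any(score > cutoff for score in scores)


def covered_fraction(families: Families, cutoff: float) -> float:
    """Fraction of an annotation's modeling families that are covered."""
    covered = sum(family_is_covered(s, cutoff) for s in families.values())
    return covered / len(families)


def functional_coverage(
    annotations: Annotations,
    cutoff: float = MEDIAN_PDB_SCORE,
    required_fraction: Optional[float] = None,
    min_families: int = MIN_FAMILIES,
) -> float:
    """Fraction of eligible annotations that count as covered.

    required_fraction=None -> one or more covered families (solid lines)
    required_fraction=0.5  -> at least 50% of families     (dashed lines)
    required_fraction=1.0  -> all families                 (dotted lines)
    """
    eligible = [f for f in annotations.values() if len(f) >= min_families]
    if not eligible:
        return 0.0
    n_covered = 0
    for families in eligible:
        fraction = covered_fraction(families, cutoff)
        if required_fraction is None:
            is_covered = fraction > 0
        else:
            is_covered = fraction >= required_fraction
        n_covered += is_covered
    return n_covered / len(eligible)
```

Sweeping `cutoff` over a range of values with each of the three `required_fraction` settings would give curves of the kind described for panel (a); fixing `cutoff` at the median and tabulating `covered_fraction` per annotation corresponds to the quantity on the x axis of panel (b).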