Figure 5
Coverage by X-ray structures of GO annotations in the H. sapiens proteome. (a) Functional coverage of the H. sapiens proteome (the number of GO annotations with X-ray structures divided by the number of all available GO annotations in H. sapiens). The line colors correspond to the results for all GO annotation types (all) and for biological process (P), molecular function (F) and cellular component (C) annotations. Human proteins with given GO annotations were mapped into modeling families. A modeling family is considered structurally covered if it includes at least one protein with a crystallization propensity above the cutoff value given on the x axis; the remaining structures in that family can be obtained using homology modeling. The solid lines assume that a given GO annotation is covered when one or more of its annotated modeling families has an obtainable structure. The dashed and dotted lines assume that a given annotation is covered when at least 50% and all, respectively, of its modeling families are structurally covered. The vertical lines show the cutoff values that correspond to the 25th centile, the median and the 75th centile of the crystallization propensity scores of the clustered proteins from the PDB data set. To ensure statistically sound estimates, we limited the analysis to annotations with at least 20 modeling families. (b) The current (black line) and the attainable (violet, red and yellow lines) coverage by X-ray structures of the annotated proteins in the complete H. sapiens proteome. The y axis shows the percentage of annotations that have at least x% of their modeling families covered, where the value of x is given on the x axis. Lines labeled as the 25th, 50th and 75th centiles show the coverage by X-ray structures under the assumption that a given protein can be solved if its score is higher than the 25th, the 50th (median) and the 75th centile, respectively, of the propensity scores of the clustered structures from the PDB data set.
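The coverage calculation described for panel (a) can be illustrated with a minimal Python sketch. It is not the code used to produce the figure. The function names, the dictionary-based inputs and the `min_fraction` parameter are assumptions made for illustration: `min_fraction=0.0` stands for the solid-line rule (one or more families covered), `0.5` for the dashed-line rule and `1.0` for the dotted-line rule.

```python
from typing import Dict, List, Set


def family_is_covered(scores: List[float], cutoff: float) -> bool:
    """A modeling family counts as covered if at least one member
    has a crystallization propensity above the cutoff."""
    return any(score > cutoff for score in scores)


def functional_coverage(
    annotation_families: Dict[str, Set[str]],
    family_scores: Dict[str, List[float]],
    cutoff: float,
    min_fraction: float = 0.0,
    min_families: int = 20,
) -> float:
    """Fraction of annotations that are covered at a given cutoff.

    annotation_families maps an annotation to its modeling families
    (hypothetical input layout); family_scores maps a family to the
    propensity scores of its proteins. Annotations with fewer than
    min_families families are skipped, as in the caption.
    """
    eligible = 0
    covered = 0
    for families in annotation_families.values():
        if len(families) < min_families:
            continue
        eligible += 1
        n_covered = sum(
            family_is_covered(family_scores.get(family, []), cutoff)
            for family in families
        )
        if min_fraction <= 0.0:
            # Solid-line rule: one or more covered families suffice.
            is_covered = n_covered >= 1
        else:
            # Dashed (0.5) or dotted (1.0) rule: required share of families.
            is_covered = n_covered / len(families) >= min_fraction
        covered += is_covered
    return covered / eligible if eligible else 0.0
```

Evaluating `functional_coverage` over a range of `cutoff` values, once per value of `min_fraction`, would give one curve per line style in panel (a).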