
Figure 3
Coverage by X-ray structures for the 1486 complete proteomes considered, grouped into the three superkingdoms of life and viruses. (a) The current coverage ('PDB coverage' plots) and the coverage attainable by combining X-ray crystallography with homology modeling (the remaining plots). Each point is an individual proteome, and points are joined into lines according to the criteria used. The x axis lists all considered proteomes, sorted by their coverage by X-ray structures; the colors and markers of the points indicate their taxonomic category. The coverage quantifies the fraction of modeling families in a given proteome that currently are or can be structurally solved. A modeling family can be structurally covered if it includes at least one protein with a crystallization propensity above the median propensity of the clustered proteins from the PDB; the remaining structures in that family can be obtained by homology modeling. The top two lines show the coverage when modeling families are established at different levels of sequence identity (25 and 30%); 30% corresponds to the current limits of homology modeling. The '50% seq ident' line is used to analyze protein families that share similar functions. The line labeled 'random target selection' shows the coverage by X-ray structures when the target in a given modeling family is selected at random instead of using the chain with the highest crystallization propensity. The two lines labeled 'PDB coverage' refer to the actual (current) coverage based on homology modeling (assuming the ability to predict structures at 30 or 50% identity) using existing structures in the PDB as templates. The dotted red line indicates the position of the human proteome. Smaller proteomes (<100 modeling families) were excluded to ensure statistically sound estimates of propensities.
(b) Changes in the coverage by X-ray structures over time. The four lines labeled with dates refer to the actual coverage based on homology modeling (assuming the ability to predict structures at 30% identity) using the structures available in the PDB at the given time as templates. The inset shows the growth of the average coverage aggregated over all considered proteins, each superkingdom, viruses and human proteins.
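
The coverage criterion described in (a) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dictionary layout (family identifier mapped to a list of per-protein propensity scores) and all numeric values are hypothetical and chosen only for illustration. The sketch assumes that a family counts as covered when its selected target has a propensity above the median of a reference set, with the target being either the highest-scoring member or a randomly chosen member.

```python
import random
import statistics
from typing import Dict, List


def coverage_best_target(families: Dict[str, List[float]], threshold: float) -> float:
    """Fraction of families whose highest-propensity member exceeds the threshold.

    Every family is assumed to contain at least one propensity score.
    """
    if not families:
        raise ValueError("at least one modeling family is required")
    covered = sum(1 for scores in families.values() if max(scores) > threshold)
    return covered / len(families)


def coverage_random_target(
    families: Dict[str, List[float]], threshold: float, rng: random.Random
) -> float:
    """Fraction of families covered when one member per family is picked at random."""
    if not families:
        raise ValueError("at least one modeling family is required")
    covered = sum(1 for scores in families.values() if rng.choice(scores) > threshold)
    return covered / len(families)


# Hypothetical reference propensities standing in for the clustered PDB proteins.
reference_scores = [0.3, 0.5, 0.7]
median_threshold = statistics.median(reference_scores)

# Hypothetical proteome: four modeling families with made-up propensity scores.
example_families = {
    "fam_a": [0.2, 0.7],
    "fam_b": [0.1, 0.3],
    "fam_c": [0.9],
    "fam_d": [0.55, 0.4],
}

best = coverage_best_target(example_families, median_threshold)
rand = coverage_random_target(example_families, median_threshold, random.Random(0))
print(f"best-target coverage: {best:.2f}, random-target coverage: {rand:.2f}")
```

Under these assumptions, random selection can never cover more families than selecting the highest-propensity member, which matches the ordering of the corresponding lines described in the caption.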

Acta Cryst. D: Structural Biology
ISSN: 2059-7983