Acta Crystallographica Section D

Biological Crystallography

Volume 69, Part 11 (November 2013)

research papers

Acta Cryst. (2013). D69, 2186-2193    [ doi:10.1107/S0907444913027157 ]

An estimated 5% of new protein structures solved today represent a new Pfam family

J. Mistry, E. Kloppmann, B. Rost and M. Punta

Abstract: High-resolution structural knowledge is key to understanding how proteins function at the molecular level. The number of entries in the Protein Data Bank (PDB), the repository of all publicly available protein structures, continues to increase, with more than 8000 structures released in 2012 alone. The authors of this article have studied how structural coverage of the protein-sequence space has changed over time by monitoring the number of Pfam families that acquired their first representative structure each year from 1976 to 2012. Twenty years ago, for every 100 new PDB entries released, an estimated 20 Pfam families acquired their first structure. By 2012, this had decreased to only about five families per 100 structures. The reasons behind the slower pace at which previously uncharacterized families are being structurally covered were investigated. It was found that although more than 50% of current Pfam families are still without a structural representative, this set is enriched in families that are small, functionally uncharacterized or rich in problematic features such as intrinsically disordered and transmembrane regions. While these are important constraints, the article discusses why it may not yet be time to give up the pursuit of a targeted but more comprehensive structural coverage of the protein-sequence space.

Keywords: Pfam families; structural coverage; protein-sequence space.
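The headline metric in the abstract (Pfam families acquiring their first structure per 100 new PDB entries in a given year) can be illustrated with a short sketch. This is not the authors' pipeline: the input format (one tuple per PDB entry holding a hypothetical entry ID, a release year and the set of Pfam families it matches) and the function name new_families_per_100 are assumptions made for illustration, and the toy data below are invented.

```python
from collections import Counter


def new_families_per_100(entries):
    """Per release year, count Pfam families seen for the first time, per 100 PDB entries.

    Assumed (hypothetical) input: an iterable of (entry_id, release_year, families)
    tuples, where `families` is the set of Pfam families matched by that entry.
    """
    first_year = {}               # family -> earliest year it has any structure
    entries_per_year = Counter()  # year -> number of PDB entries released that year

    for _entry_id, year, families in entries:
        entries_per_year[year] += 1
        for fam in families:
            # Keep the earliest year; input does not need to be sorted by year.
            if fam not in first_year or year < first_year[fam]:
                first_year[fam] = year

    # Number of families whose first structure appeared in each year.
    new_per_year = Counter(first_year.values())

    return {
        year: 100 * new_per_year[year] / n_entries
        for year, n_entries in sorted(entries_per_year.items())
    }


# Invented toy data: two entries in 1992 and four in 1993.
toy = [
    ("e1", 1992, {"FamA"}),
    ("e2", 1992, {"FamB"}),
    ("e3", 1993, {"FamA", "FamC"}),
    ("e4", 1993, set()),
    ("e5", 1993, {"FamB"}),
    ("e6", 1993, {"FamD"}),
]
print(new_families_per_100(toy))  # {1992: 100.0, 1993: 50.0}
```

In the toy data, 1993 has four new entries but only two families (FamC and FamD) appear for the first time, so the rate halves even though more structures were released. This is the kind of decline the abstract reports (from about 20 to about 5 families per 100 entries).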


Microsoft Excel (XLSX) file (4466.3 kbytes)
[ doi:10.1107/S0907444913027157/ba5211sup1.xlsx ]
Excel spreadsheet with all PDB chains used in this study and their matches to Pfam families. Column 1 lists every PDB chain considered here (format: PDB ID immediately followed by chain ID, PDBidCHAINid); column 2 lists all pfam_scan matches for that PDB chain; column 3 lists all PDBfam matches for that PDB chain. How the matches were calculated is described in the Methods section of the paper.
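One way to work with rows of this spreadsheet is sketched below. It splits the chain identifier and compares the pfam_scan and PDBfam match sets for each chain. This is a minimal sketch under stated assumptions: the PDB ID is taken to be the classic 4-character code, the separator between multiple matches within a cell is not documented here (the sep="," default is a guess), and the helper names are hypothetical.

```python
def split_chain_id(chain):
    """Split a 'PDBidCHAINid' string, assuming a classic 4-character PDB ID."""
    chain = str(chain).strip()
    return chain[:4], chain[4:]


def parse_matches(cell, sep=","):
    """Turn one spreadsheet cell into a set of Pfam matches.

    The separator between multiple matches is an assumption; empty cells give an empty set.
    """
    if cell is None:
        return set()
    return {m.strip() for m in str(cell).split(sep) if m.strip()}


def compare_row(row, sep=","):
    """Compare the pfam_scan (column 2) and PDBfam (column 3) matches for one chain."""
    chain, pfam_scan_cell, pdbfam_cell = row[:3]
    pdb_id, chain_id = split_chain_id(chain)
    scan = parse_matches(pfam_scan_cell, sep)
    pdbfam = parse_matches(pdbfam_cell, sep)
    return {
        "pdb_id": pdb_id,
        "chain_id": chain_id,
        "both": scan & pdbfam,
        "pfam_scan_only": scan - pdbfam,
        "pdbfam_only": pdbfam - scan,
    }


# Made-up row for illustration only.
print(compare_row(("9xyzA", "FamA,FamB", "FamB,FamC")))
```

Rows could be read from the downloaded file with a third-party reader such as openpyxl (load_workbook with read_only=True, then iter_rows(values_only=True) on the worksheet). Whether the sheet has a header row, and which separator the cells actually use, should be checked against the file itself.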




Copyright © International Union of Crystallography