SCOP, Structural Classification of Proteins Database: Applications to Evaluation of the Effectiveness of Sequence Alignment Methods and Statistics of Protein Structural Data

Hubbard, T.J.P.; Ailey, B.; Brenner, S.E.; Murzin, A.G.; Chothia, C.

doi:10.1107/S0907444998009172

Figure 3
Entries are shown for PDB files 1DAN and 1CFI in the SCOP FASTA format files for (a) PDB chains and (b) SCOP domains, and (c) in the SCOP domain definition flat file. The format of (a) and (b) is: >scopid scopcode [,scopcode] (region) Description SEQUENCE.

scopid is six characters for chains (cXXXXY) and seven characters for domains (dXXXXYZ), where the prefix c or d indicates chain or domain; XXXX is the PDB code; Y is the PDB chain; and Z is an arbitrary number indicating the domain (i.e., the first part of the sequence is not necessarily labelled dXXXXY1). For entries with an unlabelled chain, `_' is used for Y. For domains composed of multiple chains, Y becomes `.' and the chain information is embedded in the region element. For entries with only a single domain, `_' is used for Z.

scopcode is a domain classification identifier of the format a.b.c.d.e.f, where a is class; b is fold; c is superfamily; d is family; e is species; and f is protein. Thus, entries with a.b.c in common are from the same superfamily, and so on. If the scopid is for a PDB chain that contains more than one type of domain, a series of scopcodes is listed, separated by `,'. Note that scopcodes change with each release of SCOP, whereas scopids change only if the domain organization of that PDB entry is revised.

region is found only in entries where scopid is a domain that is part of a PDB chain, and specifies a range with respect to the sequence in the corresponding scopid chain entry. This does not necessarily correspond to the range of residue numbers in the PDB entry. Description is a description of the entry, extracted directly from the PDB header in the case of chains and from SCOP in the case of domains.

The format of (c) is similar: scopid<TAB>pdbid<TAB>pdbregion<TAB>fullscopcode. Differences are: scopid is always a domain code; pdbid is the PDB id (XXXX from scopid).
pdbregion is similar to region but is of the format chain:start-end, where start and end are PDB residue numbers (from ATOM records) and do not relate to the index of the corresponding sequence in the FASTA format file. fullscopcode is equivalent to scopcode except for the leading zeros and the initial number (which is currently unused). These values map to the corresponding page in SCOP for the domain of that line, such that the page for d1cfi_ is http://scop.mrc-lmb.cam.ac.uk/scop/data/1.007.024.001.001.003.html in this release of SCOP. However, these page numbers (and the associated scopcodes) change with each release. The correct way to refer to d1cfi_ is: http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?sid=d1cfi_.

1CFI (bottom) is an example of the simplest type of entry: it has a single chain (unlabelled), which is also a single domain. 1DAN is one of the most complex examples. It has four chains: T, U, H and L. H is a single-chain domain (d1danh_). L is a chain that contains three domains (d1danl1, d1danl2, d1danl3). There are two more domains: one is the second part of chain U (d1danu1); the other is composed of all of chain T and the first part of chain U (d1dan.1). Note that the sequence of this last domain is composed of fragments from two chains concatenated with a lower case `x'. The same is done where a domain is composed of two parts of the same chain, interrupted by an insertion domain. Note also the differences between the region (in b) and pdbregion (in c) records, which show how different sequence indices and PDB residue numbers can be.
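
The identifier conventions described in this caption can be illustrated with a minimal Python sketch. It is not part of SCOP or of the original article: the names `ScopId`, `parse_scopid`, `full_to_scopcode` and `parse_flat_line` are hypothetical. The sketch assumes only what the caption states, namely six-character chain ids (cXXXXY), seven-character domain ids (dXXXXYZ), tab-separated flat-file lines, and a fullscopcode that differs from a scopcode by an initial unused number and leading zeros. The pdbregion field is kept as raw text, since the caption does not specify how multi-chain regions are written.

```python
from typing import NamedTuple, Optional


class ScopId(NamedTuple):
    kind: str               # "chain" (prefix c) or "domain" (prefix d)
    pdb_code: str           # XXXX, the PDB code
    chain: Optional[str]    # Y; None if unlabelled ('_'), "." if multi-chain
    domain: Optional[str]   # Z; None for chain ids and single-domain entries


def parse_scopid(scopid: str) -> ScopId:
    """Split a scopid into its parts, following the layout in the caption."""
    if len(scopid) == 6 and scopid.startswith("c"):
        kind, z = "chain", None
    elif len(scopid) == 7 and scopid.startswith("d"):
        kind, z = "domain", scopid[6]
    else:
        raise ValueError(f"not a cXXXXY or dXXXXYZ identifier: {scopid!r}")
    y = scopid[5]
    return ScopId(
        kind=kind,
        pdb_code=scopid[1:5],
        chain=None if y == "_" else y,
        domain=None if z in (None, "_") else z,
    )


def full_to_scopcode(fullscopcode: str) -> str:
    """Drop the initial (unused) number and strip leading zeros."""
    fields = fullscopcode.split(".")[1:]
    return ".".join(str(int(field)) for field in fields)


def parse_flat_line(line: str):
    """Parse one scopid<TAB>pdbid<TAB>pdbregion<TAB>fullscopcode line."""
    scopid, pdbid, pdbregion, fullscopcode = line.rstrip("\n").split("\t")
    sid = parse_scopid(scopid)
    if sid.kind != "domain":
        raise ValueError("flat-file scopids are always domain codes")
    # pdbregion is returned unparsed (assumption: its full syntax is unknown).
    return sid, pdbid, pdbregion, full_to_scopcode(fullscopcode)


# Identifiers taken from the caption's 1DAN example.
print(parse_scopid("d1danl1"))
print(parse_scopid("d1dan.1"))
# The fullscopcode that appears in the page URL quoted in the caption.
print(full_to_scopcode("1.007.024.001.001.003"))
```

Because scopcodes change with each release whereas scopids do not, a sketch like this would be better used to index data by scopid than by the converted code.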

STRUCTURAL
BIOLOGY

ISSN: 2059-7983

Volume 54| Part 6| November 1998| Pages 1147-1154
