Volume 60 | Received 12 September 2003
The architecture of metal coordination groups in proteins
Institute of Cell and Molecular Biology, University of Edinburgh, Michael Swann Building, Mayfield Road, Edinburgh EH9 3JR, Scotland
A set of tables is presented and a survey given of the architecture of metal coordination groups in a representative set of protein structures from the Protein Data Bank [Bernstein et al. (1977
Keywords: metal coordination groups.
Metal atoms or ions occur widely in association with proteins and have a variety of functions. In some cases the metal is part of the active site for a catalytic process; in others the metal appears to play a role in maintaining structure. Knowledge and understanding of the architecture of protein molecules (see, for example, Lesk, 2001) play a key role in understanding their function. In a similar way, knowledge of the architecture of different metal coordination groups within proteins is important in addition to an understanding of the different chemical behaviour of the metals. The Biological Chemistry of the Elements (Frausto da Silva & Williams, 1991) provides an excellent account of both the chemical behaviour of the different metals and the biological significance of metal coordination groups.
Two metal coordination groups are illustrated in Fig. 1. The first aspects of interest are the number and nature of the donor groups around the metal atom or ion, the metal-to-donor atom distances and the angles between metal-donor bonds. In very many cases the protein molecule is, in the coordination chemist's terminology, a multidentate ligand, so we are also interested in the number and nature of the amino-acid donor groups from the protein chain, their relative positions in the amino-acid sequence and the size and conformation of the resulting chelate rings. How does the protein-chain conformation adapt to the requirements of the metal coordination? We want to know what generalizations, if any, can be made about these different properties and how far they can be predicted. Vallee & Auld (1990) commented on the significance of the spacing between donor residues in 12 zinc enzyme structures and suggested how the observed long and short spacings contributed to effectiveness in catalytic function; a wealth of additional data is now available.
Figure 1. Schematic illustration of coordination groups; see text for definitions.
This paper presents a set of tables which allow comparisons of donor groups, chelate-loop sizes and conformations in 600 metal coordination groups. They are for the metals Ca, Mg, Mn, Fe, Cu, Zn, Na and K, the most commonly occurring metals in biological chemistry and the commonest in the Protein Data Bank [PDB (Bernstein et al., 1977); available through the RCSB (Berman et al., 2000)], which is the primary source of the information used here. For Na and K the borderline between a `coordination compound' and an electrostatic association of ions is certainly debatable, but regardless of the description of the bonding it is useful to describe the geometric situation around these ions as found in protein crystals. This study concentrates on coordination groups where amino-acid side chains or the main-chain carbonyl group provide donor atoms: haem groups, iron-sulfur clusters and chlorophyll derivatives have been excluded (there are some specialized articles about these, e.g. Parisini et al., 1999; Maher et al., 1999; Chong et al., 1999; see also Huber et al., 2001). In previous articles (Harding, 2000, 2001), data on coordination numbers and on metal-to-donor atom distances and angles in proteins have been gathered for these eight metals. For all the coordination groups in the tables presented here, the distances, angles and coordination group shape may be found at http://tanna.bch.ed.ac.uk/; for a more extensive range of metal coordination groups the Metalloprotein Database (MDB) is valuable (Castagnetto et al., 2002).
The definition of a coordination group used here requires the donor atoms to be within specified target distances of the metal atom; this definition is objective, but it is narrow and it excludes the second and third coordination shells around a metal atom that are generally considered to be important in enzyme activity (see, for example, Duda et al., 2003). Similarly, in building a library of structural motifs of metal coordination sites with catalytic activity, MacArthur & Thornton (2002) include functional groups substantially further from the metal than a simple bond distance; their motifs, including the three-dimensional coordinates of donor atoms, can be used as templates or probes for a systematic classification of sites.
The results and discussion given here are based entirely on a `representative set' of proteins, a set within which none has more than 30% sequence identity with any other. There are various difficulties in comparing the frequency of occurrence of different coordination groups or donor patterns in proteins using the PDB. The proteins whose structures have been deposited in the PDB are far from a random sample. The use of a `representative set' of proteins is a simple expedient to obtain a fairly diverse sample, but is based on sequence similarity of the whole protein chain, not just the part in the immediate vicinity of and more relevant to the metal coordination group. Furthermore, many protein crystals contain two or more copies of the protein molecule in the crystal asymmetric unit; allowance for this has been made in several different ways at different stages of this project. Even with these allowances, the set of proteins in the PDB is by no means a random selection; small differences in statistics of distributions should not be thought to be significant, only broad trends.
`Target distances' for different types of metal-donor atom bond were based on the distances observed in accurately determined small-molecule crystal structures (Harding, 2001, 2002). The metal coordination number is the number of donor atoms within the target distance + 0.75 Å; some of these are normally donor atoms from amino-acid side chains within the protein or the O atoms of main-chain carbonyl groups, but they may also include water-molecule O atoms or atoms from other non-protein small molecules present at the metal site.
Metal coordination groups can be drawn very schematically, as in Fig. 1. In this work, a donor atom is defined entirely on geometric criteria: it must be within the already established target distance + 0.75 Å of the metal atom. Single-letter amino-acid codes are used to specify the donor groups (of the protein) and O indicates a main-chain carbonyl O atom as a donor. We thus describe the coordination group shown in Fig. 1(a) as CHCC Zn 2 18 3, since Zn is coordinated to S of cysteine (n), where n is the sequence number, N of histidine (n + 2), S of cysteine (n + 2 + 18) and S of cysteine (n + 2 + 18 + 3). The total coordination number (CN) is 4. The sequence differences, seqdif, in the three chelate loops are 2, 18 and 3. In the first chelate loop there are seven backbone atoms of the donor residues and the residue between, as well as atoms from both side chains, making a ring of 14 atoms altogether (if Nε2 of histidine coordinates). The relative sequence number of each donor amino acid in the coordination group is given by relseq; in this example there are cysteines at relseq = 0, 20 and 23, and histidine at relseq = 2. nspan is the sequence-number difference between the last and first amino-acid donors, which is the sum of all the seqdifs between them; nspan is 23 in this example. Many metal coordination groups also include water molecules or donor atoms from small non-protein molecules, for example as in Fig. 1(b). It has also been useful to look at the chelate loops, which are the building blocks of coordination groups, i.e. the adjacent pairs of donors, such as CH 2, HC 18, CC 3 in this example. For full identification of a particular coordination group or a chelate loop, the protein name and the residue number and chain letter of the first amino acid must be given, e.g. the above group occurs in 1a1i at A137 (and another at A165). In comparing the compositions of metal coordination groups it has been necessary to treat the carboxylate group as one donor whether it is monodentate or bidentate, since the distinction between these is unreliable in structures determined at lower resolutions.
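The notation defined above can be made concrete with a short sketch. The function name `describe_group` and the returned dictionary layout are illustrative assumptions, not part of the MP program; only the quantities relseq, nspan and the chelate loops follow the definitions in the text.

```python
from itertools import accumulate


def describe_group(donors, seqdifs):
    """Derive relseq, nspan and chelate loops from a coordination-group label.

    donors  -- string of one-letter donor codes in sequence order, e.g. "CHCC"
               (O denotes a main-chain carbonyl oxygen donor)
    seqdifs -- sequence differences between adjacent donors, e.g. [2, 18, 3]
    This helper is hypothetical; it only restates the definitions in the text.
    """
    if len(seqdifs) != len(donors) - 1:
        raise ValueError("need exactly one seqdif per adjacent donor pair")
    # relseq: position of each donor relative to the first donor
    relseq = [0] + list(accumulate(seqdifs))
    # nspan: sum of all seqdifs, i.e. relseq of the last donor
    nspan = relseq[-1]
    # chelate loops: adjacent donor pairs with their seqdif
    loops = [(donors[i] + donors[i + 1], seqdifs[i]) for i in range(len(seqdifs))]
    return {"relseq": list(zip(donors, relseq)), "nspan": nspan, "loops": loops}


group = describe_group("CHCC", [2, 18, 3])
print(group["relseq"])  # donors paired with their relseq values
print(group["nspan"])
print(group["loops"])
```

For the Fig. 1(a) example this reproduces the values quoted in the text: cysteines at relseq 0, 20 and 23, histidine at relseq 2, nspan of 23 and the chelate loops CH 2, HC 18 and CC 3.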
The basis for generating the coordination-group tables is the program MP (Harding, 2001), which reads a PDB file, extracts the coordinates and occupancy of each metal atom and of all atoms within 3.6 Å of the metal atom and summarizes all the coordination information. Lists of PDB codes were obtained using the Jena Image Library search facility (http://www.imb-jena/ImgLibPDB/pages/hetDir/PSE2HET.shtml) for structures containing each of the metals. From these lists, proteins and protein-nucleic acid complexes were selected whose structures had been determined by diffraction to a resolution of 2.5 Å or better, and the program MP was run for all that were available in the RCSB release of July 2001 (except that the July 2002 release was used for potassium proteins in order to augment the very small number of available structures). Additional smaller programs then gave information on coordination group descriptions for the full lists or for selections from them. One such selection is a `representative set' which excludes any structure which has more than 30% sequence identity with any other in the set; the culled PDB files of Dunbrack (2001) were used to make this selection.
An atom is identified here as a donor when its distance from the metal atom is within target distance + tolerance. The target distances have been carefully established using appropriate small-molecule compounds from the Cambridge Structural Database (CSD; Allen & Kennard, 1993a,b) and checking against high-resolution protein structures (Harding, 1999, 2000, 2001; the results of a check using 167 protein structures determined up to April 2003, with resolution of 1.25 Å or better, are given at http://tanna.bch.ed.ac.uk). Errors in determination of atom positions, especially in low-resolution structures, might result in incorrect decisions on whether or not an atom is within the metal coordination group. For this reason, structures determined at resolutions poorer than 2.5 Å are not included. The tolerance was set at 0.75 Å after examining the distribution of the differences between observed and target distances. When the resolution is <1.8 Å there should be no `wrong decisions' about whether an atom is within the metal coordination group; when the resolution is poorer, but still <2.5 Å, a few `wrong decisions' will inevitably be made, but their number should be well under 5% of the whole. Less reliable decisions are indicated by a high r.m.s. deviation from target distances and/or additional donor atoms within distances up to target + 0.95 Å. A few metal atoms in the coordination-group tables have coordination numbers lower than would normally be expected (i.e. <5 for Ca, <4 for Mg, Mn, Fe and Zn and <3 for Cu). Usually this is the result of a failure to identify a donor group such as a water molecule in the electron-density map, but in a few cases it could be the result of a shortcoming in the software, which does not (yet) detect when the metal atom is coordinated to a donor group in a neighbouring asymmetric unit of the crystal. Metal coordination groups in which any atoms are disordered or have occupancy less than 0.7 are omitted.
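The geometric donor criterion can be sketched as follows. This is not the MP program: the function `find_donors` is hypothetical, and the target distances in `EXAMPLE_TARGETS` are round placeholder numbers chosen only for illustration, not the values established in the cited work. Only the 0.75 Å tolerance and the 0.7 occupancy cut-off are taken from the text.

```python
import math

TOLERANCE = 0.75       # Å, tolerance stated in the text
MIN_OCCUPANCY = 0.7    # groups containing atoms below this occupancy are omitted

# Placeholder target distances in Å (assumed for illustration only).
EXAMPLE_TARGETS = {("ZN", "N"): 2.0, ("ZN", "S"): 2.3, ("ZN", "O"): 2.0}


def find_donors(metal, metal_xyz, atoms, targets=EXAMPLE_TARGETS,
                tolerance=TOLERANCE):
    """Return (label, distance) for atoms within target + tolerance of the metal.

    atoms -- iterable of (label, element, xyz, occupancy) tuples.
    Returns None if any accepted donor has occupancy below MIN_OCCUPANCY,
    mirroring the rule that such coordination groups are left out.
    """
    donors = []
    for label, element, xyz, occupancy in atoms:
        target = targets.get((metal, element))
        if target is None:
            continue  # no target distance defined for this metal-donor pair
        d = math.dist(metal_xyz, xyz)
        if d <= target + tolerance:
            if occupancy < MIN_OCCUPANCY:
                return None
            donors.append((label, round(d, 2)))
    return donors


site = [
    ("CYS SG", "S", (2.3, 0.0, 0.0), 1.0),
    ("HIS NE2", "N", (0.0, 2.1, 0.0), 1.0),
    ("GLU OE1", "O", (0.0, 0.0, -2.7), 1.0),   # long, but inside target + 0.75
    ("HOH O", "O", (0.0, 0.0, 3.2), 1.0),      # outside target + 0.75
]
donors = find_donors("ZN", (0.0, 0.0, 0.0), site)
print(len(donors), donors)  # coordination number and donor list
```

With these placeholder targets the glutamate oxygen at 2.7 Å is counted as a donor while the water at 3.2 Å is not, which illustrates how a generous tolerance admits long contacts that the r.m.s. deviation from target distances would then flag as less reliable.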
There are frequently two or more identical protein chains within the crystal asymmetric unit. In comparisons of chelate loops these were all included initially and the r.m.s. difference in φ and ψ evaluated over the range relseq = -10 to a relseq of 10 beyond the end of the chelate loop; when the r.m.s. difference in φ and ψ over the range was less than 15°, the redundant chains were eliminated. In a few cases the r.m.s. difference was 20-25°, which probably represents uncertainties in interpretation of maps rather than true differences in conformation. Subsequently, whenever the PDB file included two or more protein chains with equivalent numbering, only the first was used. Even this does not work perfectly. There are a few cases, mostly with resolution in the range 2-2.5 Å, in which different coordination groups are identified for otherwise equivalent chains within the crystal asymmetric unit. (In 1kev for example, we find Zn CHD 22 91 at A353, but Zn CHED 22 1 90 at B353; the distance Zn-O of glutamate in the second is 2.43 Å, rather improbable for monodentate glutamate.)
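A minimal sketch of the redundancy test follows, assuming each chain is supplied as a list of (φ, ψ) pairs in degrees over the same aligned relseq range. The function names are hypothetical, and the wrapping of angle differences into the range (-180°, 180°] is an assumption about how periodicity would be handled; the text states only the 15° threshold.

```python
import math

REDUNDANCY_THRESHOLD = 15.0  # degrees, threshold stated in the text


def angle_diff(a, b):
    """Difference a - b in degrees, wrapped into (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def rms_phi_psi_difference(chain_a, chain_b):
    """R.m.s. difference over all phi and psi values of two aligned chains."""
    if len(chain_a) != len(chain_b) or not chain_a:
        raise ValueError("chains must be non-empty and of equal length")
    total = 0.0
    for (phi_a, psi_a), (phi_b, psi_b) in zip(chain_a, chain_b):
        total += angle_diff(phi_a, phi_b) ** 2 + angle_diff(psi_a, psi_b) ** 2
    # two angles per residue contribute to the mean
    return math.sqrt(total / (2 * len(chain_a)))


chain_a = [(-60.0, -45.0), (179.0, 120.0)]
chain_b = [(-60.0, -45.0), (-179.0, 120.0)]  # 179 and -179 differ by only 2 degrees
rms = rms_phi_psi_difference(chain_a, chain_b)
print(rms, rms < REDUNDANCY_THRESHOLD)
```

Without the wrapping step the pair 179° and -179° would be treated as differing by 358°, so nearly identical conformations close to the ±180° boundary would be kept as if they were distinct.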
The coordination-group tables, illustrated by a small selection of Zn coordination groups in Table 1 and given in full as supplementary Table 1D, were thus assembled. Furthermore, the program which generated the lists could also generate MOLSCRIPT input files, which allowed quick viewing of a coordination group (similar to the examples in Fig. 6).
Local programs were further developed (i) to select from the coordination-group lists particular sequences for comparison, for example all occurrences of a particular chelate loop, and (ii) to extract the requisite atomic positions from the PDB files, calculate and store the torsion angles φ, ψ, ω, χ1, χ2 and assign the φ, ψ angles to categories according to their positions in the Ramachandran plot (see §4.1 for categories used). In comparisons of chelate loops, additional output included conformation categories from relseq = -10 to a relseq of 10 beyond the end of the chelate loop, aligned amino-acid sequences over the same range (also extracted from the PDB) and protein names and resolution; this output was the basis for the files of Table 4W (at http://tanna.bch.ed.ac.uk/arch/). Torsion angles could then be compared graphically or analytically, most conveniently by evaluating the r.m.s. difference between φ, ψ in all pairs of protein chains over any selected range in the (aligned) sequences; this allowed chelate loops with the same or similar conformations to be identified quickly. For a set of similar chelate loops, the mean and standard deviation of φ and ψ at each relseq position were then evaluated. Graphical superpositions of selected coordination groups and chelate loops were made with INSIGHTII, but since this is quite slow the preliminary analysis of torsion angles is essential. The versatile CSD program VISTA (Allen & Kennard, 1993a,b) was also used in some comparisons.
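The torsion-angle step can be illustrated with the standard four-atom dihedral formula. This is a generic sketch, not the local program described above; the function `torsion_angle` and its helpers are hypothetical names. For backbone angles the usual atom choices are C(i-1), N(i), Cα(i), C(i) for φ and N(i), Cα(i), C(i), N(i+1) for ψ.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees (cis = 0, trans = 180).

    Positive when, looking from p2 towards p3, the bond p3-p4 is rotated
    clockwise relative to the bond p2-p1.
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n2 = _cross(b2, b3)
    # atan2 form: y = |b2| * b1 . (b2 x b3), x = (b1 x b2) . (b2 x b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


# Idealized test geometry (not taken from any PDB entry)
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Using `atan2` rather than an arccosine of the normal vectors keeps the sign of the angle, which is needed to place a residue correctly in the Ramachandran plot.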
In the chelate-loop comparisons, fold families were obtained (manually) from SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/ ); in a few cases, the secondary-structure categories in chelate loops were examined [taken manually from PDBSUM (http://www.biochem.ucl.ac.uk/bsm/pdbsum ), where they are established with the program PROMOTIF] and the immediate geometry around the Zn (bond angles, coordination shape, bond length from http://tanna.bch.ed.ac.uk ). These details are in Table 4W (at http://tanna.bch.ed.ac.uk/arch/ ).
Tables for all eight metals are deposited as supplementary Table 1D. Table 1 illustrates some of the data stored for a small selection of Zn coordination groups; not shown here but also stored in these files are (i) the increase in coordination number corresponding to an increase in coordination sphere radius of 0.2 Å, (ii) water molecules and other non-protein donors in the coordination group, (iii) the EC enzyme number when it is given in the PDB, (iv) part of the header name from the PDB file, (v) the names in the PDB file of the metal and the first donor atom and (vi) the sequence of residue conformations in each of the chelate loops (in full when the loops contain up to five residues; abbreviated for larger loops). The tables can be downloaded and searched for particular coordination groups or other features and sorted or otherwise manipulated in, for example, Microsoft EXCEL. For each coordination group the metal-donor atom distances and bond angles can be found at http://tanna.bch.ed.ac.uk (or at http://metallo.scripps.edu/).
There is much diversity in the coordination groups and different metals have very different characteristics. The preferences of different metals for different amino-acid donors are shown in Tables 2(a) and 2(b). Oxygen donors (carboxylate, amide, water etc.) are almost never found in the same coordination group as cysteine, although either may occur alongside histidine. Tables 2(c), 2(d) and 2(e) summarize, for Ca and Zn coordination groups, metal coordination numbers and chelate-loop sizes and Table 2(f) lists the most commonly occurring chelate loops for each; fuller details are deposited for these and all the other metals (supplementary Table 2D).
In Ca proteins the EF-hand (see Pidcock & Moore, 2001; Nelson & Chazin, 1998; see also http://structbio.vanderbilt.edu/cabp_database/) is a very dominant structural motif, with 27 examples of the coordination group DDDOE 2 2 2 5 or its close relatives in this set of representative proteins, and for Zn the pattern CCCC 3 n 3 with n = 10-20 is common in zinc fingers and related proteins. Apart from these, identical coordination groups (same donors, same residue separation) do not often recur in these tables; when they do the proteins usually have related folds and functions, but even then the conformations may differ, especially in the larger chelate loops. Fig. 2 shows an example. A detailed study was made of all the recurring Ca and Zn coordination groups, their conformations, amino-acid sequences etc. and these are available in Table 3W (at http://tanna.bch.ed.ac.uk/arch/).
Figure 2. The coordination group Zn CCCC 3 3 8 showing its conformation in 1het_A 97 (green) and in 1e3j_A 96 (blue). The coordinating cysteines are labelled C96 in the blue chain, C97 in the green chain etc. 1het is an alcohol dehydrogenase; 1e3j is a ketose reductase. The backbone atoms of residues of the first two chelate loops, CCC 3 3, have been superposed using the program LSQKAB from the CCP4 suite (Collaborative Computational Project, Number 4, 1994); their conformations are the same, r.m.s. displacement 0.23 Å, whereas there are marked differences in the larger chelate loop, CC 8. This figure was prepared using MOLSCRIPT (Kraulis, 1991) and RASTER3D (Merritt & Murphy, 1994).
While whole coordination groups are not often repeated here, except for the Ca EF-hands, some chelate loops that are their components occur frequently in different unrelated proteins, although other chelate loops are found only once or a few times. Small chelate loops, particularly seqdif = 2, are very common for calcium, whereas for zinc seqdif = 3 and larger loops are much more common. In coordination groups with only two protein donors these donors are rarely more than ten residues apart, which is understandable on simple stability grounds. When there are three or more protein donor groups it is common for there to be at least one large loop. Large chelate loops will usually serve the function of holding two parts of the polypeptide chain close to each other; this may be at the active site or simply to provide stability for the whole structure. It is common for long and short chelate loops to alternate in the protein-chain sequence and uncommon for a long loop to follow another long loop; a short loop following another short loop is uncommon in zinc coordination groups, but common in calcium groups.
Within the chelate loops the nature of the amino acids which are not donors is very varied, even in small loops with the same conformation, but glycine plays an important part in many. The average glycine content over all proteins is 6.9% (evaluated using http://www.expasy.org/tools/pscale/A.A.SWISS-PROT.html for the whole SWISS-PROT database). For all the coordination groups studied there is a 10-15% probability that the amino acid following a donor, i.e. at relseq = +1, is glycine and there is a similar probability for the amino acid preceding a donor; in each position the probability is about twice that in a random sequence. In calcium coordination groups the probability is even higher than in complexes of other metals, rising to 18% in calcium coordination groups with small loops (seqdif = 1-3). High coordination numbers and/or small chelate loops lead to the greatest steric congestion; this should account for the higher frequency of glycine in positions adjacent to donors.
Residues containing donor atoms or adjacent to donor atoms have been examined to see whether any particular conformations are favoured in metal coordination. Conformations have been assigned to categories which are regions of a Ramachandran plot (a) following Hovmöller et al. (2002) and (b) in a way related to proposals of Efimov (1993), as shown in Fig. 3. The Efimov-type conformations are given in supplementary Table 1D for the residues in each coordination group. Their distributions are shown in Table 3.