
Volume 60 
Part 5 
Pages 849-859  
May 2004  

Received 12 September 2003
Accepted 20 February 2004

The architecture of metal coordination groups in proteins

Marjorie M. Harding*

Institute of Cell and Molecular Biology, University of Edinburgh, Michael Swann Building, Mayfield Road, Edinburgh EH9 3JR, Scotland
Correspondence e-mail: marjorie.harding@ed.ac.uk

A set of tables is presented and a survey given of the architecture of metal coordination groups in a representative set of protein structures from the Protein Data Bank [Bernstein et al. (1977), J. Mol. Biol. 112, 535-542; Berman et al. (2000), Nucleic Acids Res. 28, 235-242]. The structures have been determined to a resolution of 2.5 Å or better; the metals considered are Ca, Mg, Mn, Fe, Cu, Zn, Na and K, with particular emphasis on Ca and Zn and the exclusion of haem groups and Fe/S clusters; the proteins are a representative set in which none has more than 30% sequence identity with any other. In them the metal is coordinated by several donor groups from different amino-acid residues in the protein chain and often also by water or other small molecules. The tables, for ~600 metal coordination groups, include information on the conformations of the protein chain in the region around the metal and reliability indicators. They illustrate the wide variety of coordination numbers, chelate-loop sizes and other properties and the different characteristics of different metals. They show that glycine has a particular significance in the position adjacent to a donor residue, especially in Ca coordination groups. They also show that metal coordination does not appear to lead to significant distortions of the torsion angles φ, ψ from their normally allowed values. Very few metal coordination groups occur more than once in the representative set and when they do they are usually related in fold and function; they have similar but not necessarily identical conformations. However, individual chelate loops, for example Zn(-C-X-X'-C-), in which both cysteines are coordinated to Zn through S, and X and X' are any amino acids, are repeated frequently in many different and unrelated proteins. Not all chelate loops with the same composition have the same conformation, but for smaller loops there are usually one or two strongly preferred and well defined conformations. Quite frequently more than one metal coordination group is associated with one protein chain; these proteins are identified.

Keywords: metal coordination groups.

1. Introduction

Metal atoms or ions occur widely in association with proteins and have a variety of functions. In some cases the metal is part of the active site for a catalytic process; in others the metal appears to play a role in maintaining structure. Knowledge and understanding of the architecture of protein molecules (see, for example, Lesk, 2001[Lesk, A. M. (2001). Introduction to Protein Architecture, p. 21. Oxford University Press.]) play a key role in understanding their function. In a similar way, knowledge of the architecture of different metal coordination groups within proteins is important in addition to an understanding of the different chemical behaviour of the metals. The Biological Chemistry of the Elements (Frausto da Silva & Williams, 1991[Frausto da Silva, J. J. R. & Williams, R. J. P. (1991). The Biological Chemistry of the Elements. Oxford: Clarendon Press.]) provides an excellent account of both the chemical behaviour of the different metals and the biological significance of metal coordination groups.

Two metal coordination groups are illustrated in Fig. 1. The first aspects of interest are the number and nature of the donor groups around the metal atom or ion, the metal-to-donor atom distances and the angles between metal-donor bonds. In very many cases the protein molecule is, in the coordination chemist's terminology, a multidentate ligand, so we are also interested in the number and nature of the amino-acid donor groups from the protein chain, their relative positions in the amino-acid sequence and the size and conformation of the resulting chelate rings. How does the protein-chain conformation adapt to the requirements of the metal coordination? We want to know what generalizations, if any, can be made about these different properties and how far they can be predicted. Vallee & Auld (1990[Vallee, B. L. & Auld, D. S. (1990). Proc. Natl Acad. Sci. USA, 87, 220-224.]) commented on the significance of the spacing between donor residues in 12 zinc enzyme structures and suggested how the observed long and short spacings contributed to effectiveness in catalytic function; a wealth of additional data is now available.

[Figure 1]
Figure 1
Schematic illustration of coordination groups; see text for definitions.

This paper presents a set of tables which allow comparisons of donor groups, chelate-loop sizes and conformations in ~600 metal coordination groups. They are for the metals Ca, Mg, Mn, Fe, Cu, Zn, Na and K, the most commonly occurring metals in biological chemistry and the commonest in the Protein Data Bank [PDB (Bernstein et al., 1977[Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535-542.]); available through the RCSB (Berman et al., 2000[Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235-242.])], which is the primary source of the information used here. For Na and K the borderline between a `coordination compound' and an electrostatic association of ions is certainly debatable, but regardless of the description of the bonding it is useful to describe the geometric situation around these ions as found in protein crystals. This study concentrates on coordination groups where amino-acid side chains or the main-chain carbonyl group provide donor atoms: haem groups, iron-sulfur clusters and chlorophyll derivatives have been excluded (there are some specialized articles about these, e.g. Parisini et al., 1999[Parisini, E., Capozzi, F., Lubini, P., Lamzin, V., Luchinat, C. & Sheldrick, G. M. (1999). Acta Cryst. D55, 1773-1784.]; Maher et al., 1999[Maher, M. J., Xiao, Z., Wilce, M. C. J., Guss, J. M. & Wedd, A. G. (1999). Acta Cryst. D55, 962-968.]; Chong et al., 1999[Chong, K. T., Miyazaki, G., Morimoto, H., Oda, Y. & Park, S.-Y. (1999). Acta Cryst. D55, 1291-1300.]; see also Huber et al., 2001[Huber, R., Wieghardt, K., Poulos, T. & Messerschmidt, A. (2001). Editors. Handbook of Metalloproteins. Chichester: Wiley.]). In previous articles (Harding, 2000[Harding, M. M. (2000). Acta Cryst. D56, 857-867.], 2001[Harding, M. M. (2001). Acta Cryst. D57, 401-411.]), data on coordination numbers and on metal-to-donor atom distances and angles in proteins have been gathered for these eight metals. For all the coordination groups in the tables presented here, the distances, angles and coordination group shape may be found at http://tanna.bch.ed.ac.uk/ ; for a more extensive range of metal coordination groups the Metalloprotein Database (MDB) is valuable (Castagnetto et al., 2002[Castagnetto, J. M., Hennessy, S. W., Roberts, V. A., Getzoff, E. D., Tainer, J. A. & Pique, M. E. (2002). Nucleic Acids Res. 30, 379-382.]).

The definition of a coordination group used here requires the donor atoms to be within specified target distances of the metal atom; this definition is objective, but it is narrow and it excludes the second and third coordination shells around a metal atom that are generally considered to be important in enzyme activity (see, for example, Duda et al., 2003[Duda, D., Govindasamy, L., Agbandje-McKenna, M., Tu, C., Silverman, D. N. & McKenna, R. (2003). Acta Cryst. D59, 93-104.]). Similarly, in building a library of structural motifs of metal coordination sites with catalytic activity, MacArthur & Thornton (2002[MacArthur, M. W. & Thornton, J. M. (2002). Private communication.]) include functional groups substantially further from the metal than a simple bond distance; their motifs, including the three-dimensional coordinates of donor atoms, can be used as templates or probes for a systematic classification of sites.

The results and discussion given here are based entirely on a `representative set' of proteins, a set within which none has more than 30% sequence identity with any other. There are various difficulties in comparing the frequency of occurrence of different coordination groups or donor patterns in proteins using the PDB. The proteins whose structures have been deposited in the PDB are far from a random sample. The use of a `representative set' of proteins is a simple expedient to obtain a fairly diverse sample, but is based on sequence similarity of the whole protein chain, not just the part in the immediate vicinity of and more relevant to the metal coordination group. Furthermore, many protein crystals contain two or more copies of the protein molecule in the crystal asymmetric unit; allowance for this has been made in several different ways at different stages of this project. Even with these allowances, the set of proteins in the PDB is by no means a random selection; small differences in statistics of distributions should not be thought to be significant, only broad trends.

2. Some definitions

`Target distances' for different types of metal-donor atom bond were based on the distances observed in accurately determined small-molecule crystal structures (Harding, 2001[Harding, M. M. (2001). Acta Cryst. D57, 401-411.], 2002[Harding, M. M. (2002). Acta Cryst. D58, 872-874.]). The metal coordination number is the number of donor atoms within the target distance + 0.75  Å; some of these are normally donor atoms from amino-acid side chains within the protein or the O atoms of main-chain carbonyl groups, but they may also include water-molecule O atoms or atoms from other non-protein small molecules present at the metal site.

Metal coordination groups can be drawn very schematically, as in Fig. 1. In this work, a donor atom is defined entirely on geometric criteria: it must be within the already established target distance + 0.75 Å of the metal atom. Single-letter amino-acid codes are used to specify the donor groups (of the protein) and O indicates a main-chain carbonyl O atom as a donor. We thus describe the coordination group shown in Fig. 1(a) as CHCC Zn 2 18 3, since Zn is coordinated to S of the cysteine with sequence number n, N of histidine (n + 2), S of cysteine (n + 2 + 18) and S of cysteine (n + 2 + 18 + 3). The total coordination number (CN) is 4. The sequence differences, seqdif, in the three chelate loops are 2, 18 and 3. In the first chelate loop there are seven backbone atoms of the donor residues and the residue between, as well as atoms from both side chains, making a ring of 14 atoms altogether (if Nε of histidine coordinates). The relative sequence number of each donor amino acid in the coordination group is given by relseq; in this example there are cysteines at relseq = 0, 20 and 23, and histidine at relseq = 2. nspan is the sequence-number difference between the last and first amino-acid donors, which is the sum of all the seqdifs between them; nspan is 23 in this example. Many metal coordination groups also include water molecules or donor atoms from small non-protein molecules, for example as in Fig. 1(b). It has also been useful to look at the chelate loops, which are the building blocks of coordination groups, i.e. the adjacent pairs of donors, such as CH 2, HC 18, CC 3 in this example. For full identification of a particular coordination group or a chelate loop, the protein name and the residue number and chain letter of the first amino acid must be given, e.g. the above group occurs in 1a1i at A137 (and another at A165). In comparing the compositions of metal coordination groups it has been necessary to treat the carboxylate group as one donor whether it is monodentate or bidentate, since the distinction between these is unreliable in structures determined at lower resolutions.
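The naming scheme defined above can be sketched in a few lines of Python. This is only an illustration of the definitions of seqdif, relseq and nspan, not the program used in this work; the function name and the dictionary layout are assumptions. The residue numbers 139, 157 and 160 are derived from the first residue (A137 in 1a1i) and the seqdifs 2, 18 and 3 quoted in the text.

```python
def describe_coordination_group(metal, donors):
    """Build the group name, seqdif, relseq and nspan for donors on one chain.

    donors: list of (donor_code, residue_number) pairs, where donor_code is a
    single-letter amino-acid code, or "O" for a main-chain carbonyl O atom.
    """
    ordered = sorted(donors, key=lambda d: d[1])        # order along the chain
    codes = [code for code, _ in ordered]
    numbers = [number for _, number in ordered]
    seqdif = [b - a for a, b in zip(numbers, numbers[1:])]
    relseq = [n - numbers[0] for n in numbers]
    nspan = numbers[-1] - numbers[0]                     # equals sum(seqdif)
    name = " ".join(["".join(codes), metal] + [str(s) for s in seqdif])
    loops = [f"{codes[i]}{codes[i + 1]} {seqdif[i]}" for i in range(len(seqdif))]
    return {"name": name, "seqdif": seqdif, "relseq": relseq,
            "nspan": nspan, "chelate_loops": loops}


group = describe_coordination_group(
    "Zn", [("C", 137), ("H", 139), ("C", 157), ("C", 160)]
)
print(group["name"])           # CHCC Zn 2 18 3
print(group["relseq"])         # [0, 2, 20, 23]
print(group["nspan"])          # 23
print(group["chelate_loops"])  # ['CH 2', 'HC 18', 'CC 3']
```

Water molecules and other non-protein donors are deliberately left out of this sketch, since they carry no sequence number.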

3. Methods and procedures

The basis for generating the coordination-group tables is the program MP (Harding, 2001[Harding, M. M. (2001). Acta Cryst. D57, 401-411.]), which reads a PDB file, extracts the coordinates and occupancy of each metal atom and of all atoms within 3.6 Å of the metal atom and summarizes all the coordination information. Lists of PDB codes were obtained using the Jena Image Library search facility (http://www.imb-jena/ImgLibPDB/pages/hetDir/PSE2HET.shtml ) for structures containing each of the metals. From these lists protein and protein-nucleic acid complexes were selected with structures determined by diffraction to a resolution of 2.5 Å or better, and the program MP was run for all that were available in the RCSB release of July 2001 (except that the July 2002 release was used for potassium proteins in order to augment the very small number of available structures). Additional smaller programs then gave information on coordination group descriptions for the full lists or for selections from them. One such selection is a `representative set' which excludes any structure which has more than 30% sequence identity with any other in the set; the culled PDB files of Dunbrack (2001[Dunbrack, R. (2001). Culling the PDB by Resolution and Sequence Identity, http://www.fccc.edu/research/labs/dunbrack/culledpdb.html .]) were used to make this selection.

3.1. Concerning coordination-group definition

An atom is identified here as a donor when its distance from the metal atom is within target distance + tolerance. The target distances have been carefully established using appropriate small-molecule compounds from the Cambridge Structural Database (CSD, Allen & Kennard, 1993a[Allen, F. H. & Kennard, O. (1993a). Chem. Des. Autom. News, 8, 1.],b[Allen, F. H. & Kennard, O. (1993b). Chem. Des. Autom. News, 8, 31-37.]) and checking against high-resolution protein structures (Harding, 1999[Harding, M. M. (1999). Acta Cryst. D55, 1432-1443.], 2000[Harding, M. M. (2000). Acta Cryst. D56, 857-867.], 2001[Harding, M. M. (2001). Acta Cryst. D57, 401-411.]; the results of a check using 167 protein structures determined up to April 2003, with resolution of 1.25 Å or better, are given at http://tanna.bch.ed.ac.uk ). Errors in determination of atom positions, especially in low-resolution structures, might result in incorrect decisions on whether or not an atom is within the metal coordination group. For this reason, structures determined at resolutions poorer than 2.5 Å are not included. The tolerance was set at 0.75 Å after examining the distribution of the differences between observed and target distances. When the resolution is better than 1.8 Å there should be no `wrong decisions' about whether an atom is within the metal coordination group; when the resolution is poorer, but still within 2.5 Å, a few `wrong decisions' will inevitably be made, but their number should be well under 5% of the whole. Less reliable decisions are indicated by a high r.m.s. deviation from target distances and/or additional donor atoms within distances up to target + 0.95 Å. A few metal atoms in the coordination-group tables have coordination numbers lower than would normally be expected (i.e. <5 for Ca, <4 for Mg, Mn, Fe and Zn and <3 for Cu). Usually this is the result of a failure to identify a donor group such as a water molecule in the electron-density map, but in a few cases it could be the result of a shortcoming in the software, which does not (yet) detect when the metal atom is coordinated to a donor group in a neighbouring asymmetric unit of the crystal. Metal coordination groups in which any atoms are disordered or have occupancy less than 0.7 are omitted.
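The geometric donor test (distance within target + tolerance) and the r.m.s. deviation used as a reliability indicator can be sketched as follows. This is a minimal illustration, not the program MP. The target distances in `TARGET` are rough placeholder values chosen only for the example and are not the published target distances, and keying them by element rather than by donor-atom type is a simplifying assumption.

```python
import math

TOLERANCE = 0.75  # Å, the tolerance quoted in the text

# Placeholder target distances in Å, keyed by (metal, donor element).
# Illustrative values only: NOT the published target distances.
TARGET = {("ZN", "S"): 2.3, ("ZN", "N"): 2.0, ("ZN", "O"): 2.0}


def find_donors(metal, metal_xyz, atoms, targets=TARGET, tol=TOLERANCE):
    """Return (element, distance, distance - target) for each accepted donor.

    atoms: iterable of (element, (x, y, z)) for atoms near the metal.
    """
    donors = []
    for element, xyz in atoms:
        target = targets.get((metal, element))
        if target is None:
            continue                      # no target defined for this pair
        distance = math.dist(metal_xyz, xyz)
        if distance <= target + tol:
            donors.append((element, distance, distance - target))
    return donors


def rms_deviation(donors):
    """R.m.s. deviation of metal-donor distances from their targets."""
    return math.sqrt(sum(dev ** 2 for _, _, dev in donors) / len(donors))


nearby = [("S", (2.3, 0.0, 0.0)), ("N", (0.0, 2.1, 0.0)), ("O", (0.0, 0.0, 3.5))]
donors = find_donors("ZN", (0.0, 0.0, 0.0), nearby)
print(len(donors))                      # 2 (the O atom at 3.5 Å is rejected)
print(round(rms_deviation(donors), 2))  # 0.07
```

Calling `find_donors` a second time with `tol=0.95` and comparing the donor counts gives a simple version of the check for additional atoms just outside the coordination sphere.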

3.2. Redundant protein chains

There are frequently two or more identical protein chains within the crystal asymmetric unit. In comparisons of chelate loops these were all included initially and the r.m.s. difference in φ and ψ evaluated over the range relseq = -10 to a relseq of 10 beyond the end of the chelate loop; when the r.m.s. difference in φ and ψ over the range was less than 15°, the redundant chains were eliminated. In a few cases the r.m.s. difference was 20-25°, which probably represents uncertainties in interpretation of maps rather than true differences in conformation. Subsequently, whenever the PDB file included two or more protein chains with equivalent numbering, only the first was used. Even this does not work perfectly. There are a few cases, mostly with resolution in the range 2-2.5 Å, in which different coordination groups are identified for otherwise equivalent chains within the crystal asymmetric unit. (In 1kev for example, we find Zn CHD 22 91 at A353, but Zn CHED 22 1 90 at B353; the distance Zn-O of glutamate in the second is 2.43 Å, rather improbable for monodentate glutamate.)
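A minimal sketch of the redundancy test follows, assuming that the r.m.s. difference is pooled over both φ and ψ and that angle differences are wrapped into the range -180° to 180°; neither detail is stated in the text, and the function names are illustrative.

```python
import math


def angle_difference(a, b):
    """Smallest signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def rms_phi_psi_difference(chain1, chain2):
    """R.m.s. difference over aligned residues given as (phi, psi) in degrees."""
    if len(chain1) != len(chain2) or not chain1:
        raise ValueError("chains must be aligned, equal in length and non-empty")
    total = 0.0
    for (phi1, psi1), (phi2, psi2) in zip(chain1, chain2):
        total += angle_difference(phi1, phi2) ** 2
        total += angle_difference(psi1, psi2) ** 2
    return math.sqrt(total / (2 * len(chain1)))


def is_redundant(chain1, chain2, threshold=15.0):
    """True when two chains differ by less than the 15 degree limit in the text."""
    return rms_phi_psi_difference(chain1, chain2) < threshold


chain_a = [(-60.0, -45.0), (179.0, 120.0)]
chain_b = [(-60.0, -45.0), (-179.0, 120.0)]   # 179 and -179 differ by only 2 degrees
print(rms_phi_psi_difference(chain_a, chain_b))  # 1.0
print(is_redundant(chain_a, chain_b))            # True
```

The wrap-around matters because torsion angles near ±180° would otherwise appear to differ by almost 360°.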

3.3. Coordination group tables and comparisons of composition and conformation

The coordination-group tables, illustrated by a small selection of Zn coordination groups in Table 1 and given in full as supplementary Table 1D, were thus assembled. Furthermore, the program which generated the lists could also generate MOLSCRIPT input files, which allowed quick viewing of a coordination group (similar to the examples in Fig. 6).

Table 1
A small part of the deposited Table 1D for Zn coordination groups illustrating some of the information stored

Table 1D, with the complete tables for eight metals, has been deposited as supplementary material (and is also available at http://tanna.bch.ed.ac.uk/arch/ ). np is the number of donors from the protein chain; nw is the number of water molecules; nn is the number of non-protein donor groups; dons are the amino-acid donor groups in the order in which they occur in the polypeptide chain, using the normal single-letter codes for amino acids and O for the main-chain carbonyl O atom; sd1 to sd7 are the seqdifs (-99 signifies that the donors are from two different polypeptide chains, -1 is given when the second donor is water or another non-amino-acid donor); his indicates whether histidine coordination is by ND or NE; cn is the total number of donor groups, including water molecules and small-molecule ligands, always treating carboxylate as one group (the coordination number, as it would be defined by a chemist, is then number of donor groups + number of bidentate carboxylate groups); r.m.s. is the r.m.s. deviation of metal-to-donor atom distances within the coordination sphere from target distances, which is a useful indicator of quality (0 is good, 0.5 is poor); res is the resolution (Å) of the structure determination; carbi indicates bidentate carboxylate groups, e.g. ..b. indicates that the third of four donor groups appears to be a bidentate carboxylate. (For additional information stored, see §4.)

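| cngpname | nspan | np | nw | nn | dons | met | sd1 | sd2 | sd3 | sd4 | his | cn | r.m.s. | res | carbi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1dsz_A 1135 | 20 | 4 | | | CCCC | Zn | 3 | 14 | 3 | -1 | .... | 4 | 0.1 | 1.7 | .... |
| 1dcq_A 264 | 23 | 4 | | | CCCC | Zn | 3 | 17 | 3 | -1 | .... | 4 | 0.1 | 2.1 | .... |
| 1ee8_A 238 | 23 | 4 | | | CCCC | Zn | 3 | 17 | 3 | -1 | .... | 4 | 0.1 | 1.9 | .... |
| 1a8h_ 127 | 20 | 4 | | | CCCH | Zn | 3 | 14 | 3 | -1 | ...d | 4 | 0.2 | 2.0 | .... |
| 1vfy_A 176 | 27 | 4 | | | CCCH | Zn | 3 | 21 | 3 | -1 | ...d | 4 | 0.1 | 1.1 | .... |
| 1ah7_ 55 | 67 | 4 | 1 | | DHHD | Zn | 14 | 49 | 4 | -1 | .de. | 5 | 0.2 | 1.5 | .... |
| 1hxr_A 23 | 74 | 4 | | | CCCC | Zn | 3 | 68 | 3 | -1 | .... | 4 | 0.1 | 1.6 | .... |
| 1psz_A 67 | 213 | 4 | | | HHED | Zn | 72 | 66 | 75 | -1 | ee.. | 4 | 0.3 | 2.0 | ..b. |
| 1vhh_ 141 | 42 | 3 | 1 | | HDH | Zn | 7 | 35 | -1 | -1 | e.d | 4 | 0.1 | 1.7 | ... |
| 1lbu_ 154 | 43 | 3 | 1 | | HDH | Zn | 7 | 36 | -1 | -1 | e.d | 4 | 0.2 | 1.8 | ... |
| 1amp_ 117 | 139 | 3 | 1 | | DEH | Zn | 35 | 104 | -1 | -1 | ..e | 4 | 0.3 | 1.8 | .b. |
| 1cg2_A 141 | 244 | 3 | 1 | | DEH | Zn | 35 | 209 | -1 | -1 | ..e | 4 | 0.2 | 2.5 | .b. |
| 1hzy_A 201 | 29 | 2 | 2 | 1 | HH | Zn | 29 | -1 | -1 | -1 | de | 5 | 0.2 | 1.3 | |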
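The relation given in the table legend (chemist's coordination number = number of donor groups + number of bidentate carboxylate groups) can be applied directly to the cn and carbi columns. The short sketch below assumes only that each bidentate carboxylate is flagged by one `b` in the carbi string, as described in the legend; the function name is illustrative.

```python
def chemist_coordination_number(cn, carbi):
    """Donor-group count plus one extra for each bidentate carboxylate."""
    return cn + carbi.count("b")


# Values taken from the rows of Table 1 for 1psz_A 67 and 1dsz_A 1135.
print(chemist_coordination_number(4, "..b."))  # 5
print(chemist_coordination_number(4, "...."))  # 4
```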

Local programs were further developed (i) to select from the coordination-group lists particular sequences for comparison, for example all occurrences of a particular chelate loop, and (ii) to extract the requisite atomic positions from the PDB files, calculate and store the torsion angles φ, ψ, ω, χ1, χ2 and assign the φ, ψ angles to categories according to their positions in the Ramachandran plot (see §4.1 for categories used). In comparisons of chelate loops, additional output included conformation categories from relseq = -10 to a relseq of 10 beyond the end of the chelate loop, aligned amino-acid sequences over the same range (also extracted from the PDB) and protein names and resolution; this output was the basis for the files of Table 4W (at http://tanna.bch.ed.ac.uk/arch/ ). Torsion angles could then be compared graphically or analytically, most conveniently by evaluating the r.m.s. difference between φ, ψ in all pairs of protein chains over any selected range in the (aligned) sequences; this allowed chelate loops with the same or similar conformations to be identified quickly. For a set of similar chelate loops, the mean and standard deviation of φ and ψ at each relseq position were then evaluated. Graphical superpositions of selected coordination groups and chelate loops were made with INSIGHTII, but since this is quite slow the preliminary analysis of torsion angles is essential. The versatile CSD program VISTA (Allen & Kennard, 1993a[Allen, F. H. & Kennard, O. (1993a). Chem. Des. Autom. News, 8, 1.],b[Allen, F. H. & Kennard, O. (1993b). Chem. Des. Autom. News, 8, 31-37.]) was also used in some comparisons.
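The torsion-angle step can be illustrated with a standard dihedral-angle calculation from four atomic positions. This is a generic textbook formula written as a sketch, not the local program described above. By the usual convention, φ is the torsion C(i-1)-N-Cα-C and ψ is N-Cα-C-N(i+1); the coordinates in the example are artificial.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)                 # normal to the plane p1, p2, p3
    n2 = _cross(b2, b3)                 # normal to the plane p2, p3, p4
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Artificial coordinates: the fourth atom is rotated by a quarter turn
# about the central bond relative to the first atom.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

For φ, the four points would be the C atom of residue i-1 followed by N, Cα and C of residue i, read from the ATOM records of the PDB file.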

In the chelate-loop comparisons, fold families were obtained (manually) from SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/ ); in a few cases, the secondary-structure categories in chelate loops were examined [taken manually from PDBSUM (http://www.biochem.ucl.ac.uk/bsm/pdbsum ), where they are established with the program PROMOTIF] and the immediate geometry around the Zn (bond angles, coordination shape, bond length from http://tanna.bch.ed.ac.uk ). These details are in Table 4W (at http://tanna.bch.ed.ac.uk/arch/ ).

4. Results and discussion

Tables for all eight metals are deposited as supplementary Table 1D. Table 1 illustrates some of the data stored for a small selection of Zn coordination groups; not shown here but also stored in these files are (i) the increase in coordination number corresponding to an increase in coordination sphere radius of 0.2 Å, (ii) water molecules and other non-protein donors in the coordination group, (iii) the EC enzyme number when it is given in the PDB, (iv) part of the header name from the PDB file, (v) the names in the PDB file of the metal and the first donor atom and (vi) the sequence of residue conformations in each of the chelate loops (in full when the loops contain up to five residues; abbreviated for larger loops). The tables can be downloaded and searched for particular coordination groups or other features and sorted or otherwise manipulated in, for example, Microsoft EXCEL. For each coordination group the metal-donor atom distances and bond angles can be found at http://tanna.bch.ed.ac.uk (or at http://metallo.scripps.edu/ ).

There is much diversity in the coordination groups and different metals have very different characteristics. The preferences of different metals for different amino-acid donors are shown in Tables 2(a) and 2(b). Oxygen donors (carboxylate, amide, water etc.) are almost never found in the same coordination group as cysteine, although either may occur alongside histidine. Tables 2(c), 2(d) and 2(e) summarize, for Ca and Zn coordination groups, metal coordination numbers and chelate-loop sizes and Table 2(f) lists the most commonly occurring chelate loops for each; fuller details are deposited for these and all the other metals (supplementary Table 2D).

Table 2
Constitution of metal coordination groups in the representative set of proteins

(a) Numbers of occurrences of different kinds of donor groups (from amino-acid side chains) in metal coordination groups with two or more protein donors. M.ch. O indicates that a main-chain carbonyl O atom is the donor.

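| Metal | D, N | E, Q | S, T | H | C | M | K, R | Y | M.ch. O | All |
|---|---|---|---|---|---|---|---|---|---|---|
| Ca | 339 | 127 | 34 | 3 | - | - | - | 1 | 309 | 813 |
| Mg | 88 | 42 | 38 | 3 | - | - | 1 | 2 | 54 | 228 |
| Mn | 51 | 30 | 3 | 22 | 1 | - | - | - | 6 | 113 |
| Fe | 12 | 30 | - | 60 | 18 | 3 | - | 5 | 7 | 135 |
| Cu | 2 | 3 | 3 | 77 | 26 | 10 | - | 1 | 4 | 126 |
| Zn | 63 | 50 | 1 | 179 | 206 | 1 | 3 | - | 10 | 517 |
| Na | 22 | 12 | 6 | - | - | - | 1 | - | 93 | 135 |
| K | 16 | 17 | 18 | - | - | - | - | - | 79 | 130 |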

(b) Amino-acid types (%) for main-chain carbonyl oxygen donors, all metals combined, using categories based on those of Lesk (2001).

Glycine (G) 13
Other small amino acids (A, S, T) 14
Medium and large hydrophobic amino acids 37
Acidic (D, E) 12
Basic (K, R) 14
Polar (N, Q, H) 11

(c) Coordination numbers of Ca and Zn in coordination groups with two or more protein donors.

| Coordination No. | 2 | 3 | 4 | 5 | 6 | 7 | >8 | All |
|---|---|---|---|---|---|---|---|---|
| No. Ca coordination groups | 2 | 6 | 13 | 36 | 110 | 22 | 1 | 190 |
| No. Zn coordination groups | 7 | 19 | 89 | 31 | 3 | - | - | 149 |

(d) Numbers of protein donor groups interacting with Ca and Zn.

| No. protein donor groups | 1 | 2 | 3 | 4 | 5 | 6 | 7 | All |
|---|---|---|---|---|---|---|---|---|
| No. Ca coordination groups | 27 | 29 | 26 | 45 | 61 | 27 | 2 | 228 |
| No. Zn coordination groups | 33 | 21 | 51 | 76 | - | - | - | 184 |

(e) Distribution of chelate-loop sizes for Ca and Zn coordination groups.

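| seqdif | | 1 | 2 | 3 | 4 | 5 | 6-10 | 11-19 | 20-29 | 30-49 | 50-99 | 100-199 | 200-499 | All |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ca | 31 | 56 | 237 | 68 | 14 | 38 | 16 | 29 | 28 | 48 | 22 | 16 | 3 | 606 |
| Zn | 9 | 9 | 37 | 69 | 29 | 13 | 26 | 38 | 30 | 31 | 40 | 18 | 5 | 354 |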

(f) Most commonly occurring chelate loops for Ca and Zn. (See supplementary Table 2D for other metals and Table 2W at http://tanna.bch.ed.ac.uk/arch/ for numbers of all chelate loops for each metal.) For commonly occurring Ca and Zn donor pairs, individual details are given in Table 4W (http://tanna.bch.ed.ac.uk/arch/ ), including amino-acid sequences through the chelate loop and before and after it, conformation described by Efimov type, name of protein from PDB header and resolution, together with a summary of the agreement found by analysis of the torsion angles.

| Metal | Total No. coordination groups | No. chelate loops | Commonest donor pairs (number) |
|---|---|---|---|
| Ca | 190 | 606 | DD 2 (35), DO 1 (19), OE 5 (27), OD 0 (12), DN 2 (16), DO 2 (38), OO 2 (38), OD 2 (32), [ON 2 (6)], NO 2 (15), OO 3 (20), OD 3 (12), [ON 3 (6)] |
| Zn | 149 | 354 | HH 2 (11), HH 4 (18), CC 2 (9), CC 3 (53), CC 5 (9) |

In Ca proteins the EF-hand (see Pidcock & Moore, 2001[Pidcock, E. & Moore, G. R. (2001). J. Biol. Inorg. Chem. 6, 479-489.]; Nelson & Chazin, 1998[Nelson, M. R. & Chazin, W. J. (1998). Biometals, 11, 297-318.]; see also http://structbio.vanderbilt.edu/cabp_database/ ) is a dominant structural motif, with 27 examples of the coordination group DDDOE 2225 or its close relatives in this set of representative proteins, and for Zn the pattern CCCC 3 n 3 with n = 10-20 is common in zinc fingers and related proteins. Apart from these, identical coordination groups (same donors, same residue separation) do not often recur in these tables; when they do the proteins usually have related folds and functions, but even then the conformations may differ, especially in the larger chelate loops. Fig. 2 shows an example. A detailed study was made of all the recurring Ca and Zn coordination groups, their conformations, amino-acid sequences etc. and these are available in Table 3W (at http://tanna.bch.ed.ac.uk/arch/ ).

[Figure 2]
Figure 2
The coordination group Zn CCCC 3 3 8 showing its conformation in 1het_A 97 (green) and in 1e3j_A 96 (blue). The coordinating cysteines are labelled C96 in the blue chain, C97 in the green chain etc. 1het is an alcohol dehydrogenase; 1e3j is a ketose reductase. The backbone atoms of residues of the first two chelate loops, CCC 3 3, have been superposed using the program LSQKAB from the CCP4 suite (Collaborative Computational Project, Number 4, 1994[Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760-763.]); their conformations are the same, r.m.s. displacement 0.23  Å, whereas there are marked differences in the larger chelate loop, CC 8. This figure was prepared using MOLSCRIPT (Kraulis, 1991[Kraulis, P. (1991). J. Appl. Cryst. 24, 946-950.]) and RASTER3D (Merritt & Murphy, 1994[Merritt, E. A. & Murphy, M. E. P. (1994). Acta Cryst. D50, 869-873.]).

While whole coordination groups are not often repeated here, except for the Ca EF-hands, some chelate loops that are their components occur frequently in different unrelated proteins, although other chelate loops are found only once or a few times. Small chelate loops, particularly seqdif = 2, are very common for calcium, whereas for zinc seqdif = 3 and larger loops are much more common. In coordination groups with only two protein donors these donors are rarely more than ten residues apart, which is understandable on simple stability grounds. When there are three or more protein donor groups it is common for there to be at least one large loop. Large chelate loops will usually serve the function of holding two parts of the polypeptide chain close to each other; this may be at the active site or simply to provide stability for the whole structure. It is common for long and short chelate loops to alternate in the protein-chain sequence and uncommon for a long loop to follow another long loop; a short loop following another short loop is uncommon in zinc coordination groups, but common in calcium groups.

4.1. Residue conformations and the significance of glycine

Within the chelate loops the nature of the amino acids which are not donors is very varied, even in small loops with the same conformation, but glycine plays an important part in many. The average glycine content over all proteins is 6.9% (evaluated using http://www.expasy.org/tools/pscale/A.A.SWISS-PROT.html for the whole SWISS-PROT database). For all the coordination groups studied there is a 10-15% probability that the amino acid following a donor, i.e. at relseq = +1, is glycine and there is a similar probability for the amino acid preceding a donor; in each position the probability is about twice that in a random sequence. In calcium coordination groups the probability is even higher than in complexes of other metals, rising to 18% in calcium coordination groups with small loops (seqdif = 1-3). High coordination numbers and/or small chelate loops lead to the greatest steric congestion; this should account for the higher frequency of glycine in positions adjacent to donors.
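The comparison with the background glycine frequency can be reproduced with a small counting routine. The sketch below assumes that donor positions are given as zero-based indices into a one-letter sequence; the sequence shown is an invented example, not one of the proteins studied, and only the 6.9% background value comes from the text.

```python
BACKGROUND_GLYCINE = 0.069  # average glycine content quoted in the text


def glycine_fraction_adjacent(sequence, donor_indices, offset=1):
    """Fraction of residues at donor position + offset that are glycine.

    offset=+1 looks at the residue following each donor (relseq = +1),
    offset=-1 at the residue preceding it. Positions off the chain end
    are skipped.
    """
    hits = 0
    total = 0
    for index in donor_indices:
        neighbour = index + offset
        if 0 <= neighbour < len(sequence):
            total += 1
            if sequence[neighbour] == "G":
                hits += 1
    return hits / total if total else float("nan")


def enrichment(fraction, background=BACKGROUND_GLYCINE):
    """Ratio of an observed glycine frequency to the background frequency."""
    return fraction / background


# Invented sequence with hypothetical donors at indices 1, 4 and 7.
print(glycine_fraction_adjacent("ADGKDGEC", [1, 4, 7]))  # 1.0
# The 10-15% range in the text is about 1.4 to 2.2 times the background.
print(round(enrichment(0.10), 2), round(enrichment(0.15), 2))  # 1.45 2.17
```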

Residues containing donor atoms or adjacent to donor atoms have been examined to see whether any particular conformations are favoured in metal coordination. Conformations have been assigned to categories which are regions of a Ramachandran plot (a) following Hovmöller et al. (2002[Hovmöller, S., Zhou, T. & Ohlson, T. (2002). Acta Cryst. D58, 768-776.]) and (b) in a way related to proposals of Efimov (1993[Efimov, A. V. (1993). Prog. Biophys. Mol. Biol. 60, 201-239.]), as shown in Fig. 3. The Efimov-type conformations are given in supplementary Table 1D for the residues in each coordination group. Their distributions are shown in Table 3.

Table 3
Distributions of conformations

(a) Distribution of conformations of amino acids according to the categories helix, sheet, turn and `other' defined by Hovmöller et al. (2002) in all the metal coordination groups treated here. The conformation definitions are shown in Fig. 3(a). For comparison, two distributions from Hovmöller et al. (2002[Hovmöller, S., Zhou, T. & Ohlson, T. (2002). Acta Cryst. D58, 768-776.]) are given: the first is for all amino acids in their set of non-redundant and representative protein chains and the second for the subset of these which are classified (FAST) as random coil.

| | | Helix (%) | Sheet (%) | Turn (%) | Other (%) | No. observations |
|---|---|---|---|---|---|---|
| All metal coordination groups | Non-glycine donors | 42 | 53 | 4 | 1 | 2042 |
| | Non-glycine adjacent to donors | 50 | 46 | 4 | | 3417 |
| All metal coordination groups | Glycine donors | 14 | 22 | 19 | 44 | 77 |
| | Glycine adjacent to donors | 24 | 21 | 36 | 18 | 442 |
| Compare whole PDB (Hovmöller et al., 2002) | All | 51 | 43 | 5 | 2 | 237384 |
| | Classified as random coil | 32 | 54 | 11 | 4 | 96442 |

(b) Distribution of conformations of donor amino acids and of amino acids adjacent to donors in all the metal coordination groups treated here. The categories are based on those of Efimov (1993) and are shown in Fig. 3(b). The comparison sample is for all amino acids in the structures of nine Ca-containing proteins, determined with resolution ≤1.4 Å.

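| | b (%) | d (%) | k (%) | a (%) | g (%) | j (%) | Other (%) | No. observations |
|---|---|---|---|---|---|---|---|---|
| Metal coordination groups | | | | | | | | |
| Non-glycine donors | 48 | 3 | 15 | 26 | 3 | | 4 | 2042 |
| Non-glycine adjacent to donors | 43 | 2 | 16 | 32 | 3 | | 3 | 3417 |
| Glycine donors | 16 | | 5 | 8 | 18 | 30 | 23 | 77 |
| Glycine adjacent to donors | 17 | 1 | 7 | 17 | 35 | 11 | 12 | 442 |
| Sample of all amino acids in nine Ca proteins | 45 | 2 | 13 | 28 | 5 | 2 | 5 | 1846 |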

(c) Distribution of conformations according to PROCHECK categories. The categories are `core', `allowed', `generous' and `not' (see text). The distributions are for all amino acids other than glycine and proline which provide donors in metal coordination groups or which are adjacent in the amino-acid sequence to one or more such donors; the recommendations are from the current CCP4 instructions for structures determined at a resolution of <2.0 Å (Collaborative Computational Project, Number 4, 1994).