Volume 36 
Part 1 
Pages 34-42  
February 2003  

Received 23 July 2002
Accepted 3 October 2002

Improved dihedral-angle restraints for protein structure refinement

John P. Priestle

Because of the relatively low-resolution diffraction of typical protein crystals, structure refinement is usually carried out employing stereochemical restraints to increase the effective number of observations. Well defined values for bond lengths and angles are available from small-molecule crystal structures. Such values do not exist for dihedral angles because of the concern that the strong crystal contacts in small-molecule crystal structures could distort the dihedral angles. This paper examines the dihedral-angle distributions in ultra-high-resolution protein structures (1.2 Å or better) as a means of analysing the population frequencies of dihedral angles in proteins and compares these with the stereochemical restraints currently used in one of the more widely used molecular-dynamics refinement packages, X-PLOR, and its successor, CNS. Discrepancies between the restraints used in these programs and what is actually seen in high-resolution protein structures are examined and an improved set of dihedral-angle restraint parameters is derived from these inspections.

Keywords: dihedral angles; protein refinement; stereochemical restraints; X-PLOR; CNS.

1. Introduction

Protein crystals tend to diffract to much lower resolution than small-molecule crystals. This means that few data are available for refinement of the structure relative to the number of parameters being refined, so the data must be supplemented by known stereochemical (geometry) restraints on, e.g., bond lengths, bond angles, planar groups, chiral centres and dihedral (torsion) angles. Generally the `ideal' values for these restraints come from small-molecule crystal structures, for which high-resolution diffraction ensures robust structure refinement without stereochemical restraints and thus provides unbiased precise values for these parameters (Engh & Huber, 1991[Engh, R. A. & Huber, R. (1991). Acta Cryst. A47, 392-400.]). Unbiased dihedral-angle parameters are difficult to derive from small-molecule crystal structures because the strong crystal contacts in these crystals (ironically the reason they diffract so well) can potentially distort these angles. Dihedral-angle restraints in protein refinement programs are therefore usually derived from the principle that the most likely dihedral angles are those that minimize steric contact between substituent groups on the atoms defining the dihedral angle. In the protein structure refinement program X-PLOR (Brunger et al., 1987[Brunger, A. T., Kuriyan, J. & Karplus, M. (1987). Science, 235, 458-460.]) and its successor CNS (Accelrys, San Diego), the dihedral energy is defined by the equation

[E_{\rm dihe} = k[1 + \cos(n\theta + \delta)], \eqno (1)]

in which k is a force constant, n is the multiplicity of the rotation, [theta] is the current torsion angle in the structure, and [delta] is an offset angle (phase shift) for this dihedral-angle type that defines the minimum energy (= `ideal' values). Since the minimum energy occurs when n[theta] + [delta] = 180°, one can determine the appropriate value for [delta] for a known `ideal' torsion angle ([theta]min):

[\delta = 180 - n\theta_{\rm min}.\eqno (2)]
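As a small numerical illustration of equations (1) and (2), the following Python sketch evaluates the dihedral energy and derives the offset angle from a chosen minimum. The function names are illustrative only and are not part of X-PLOR or CNS; all angles are in degrees, and wrapping the offset into the range 0 to 360° is a convenience assumed here rather than something stated in the text.

```python
import math

def dihedral_energy(theta_deg, k, n, delta_deg):
    """Equation (1): E = k[1 + cos(n*theta + delta)], angles in degrees."""
    return k * (1.0 + math.cos(math.radians(n * theta_deg + delta_deg)))

def offset_from_minimum(theta_min_deg, n):
    """Equation (2): delta = 180 - n*theta_min, wrapped into [0, 360)."""
    return (180.0 - n * theta_min_deg) % 360.0

# Example: a threefold restraint whose minima should fall at 60, 180 and -60
delta = offset_from_minimum(60.0, 3)
for theta in (60.0, 180.0, -60.0, 0.0):
    # energy is ~0 at the three minima and 2k at the eclipsed angle of 0
    print(theta, round(dihedral_energy(theta, 1.0, 3, delta), 6))
```

With a multiplicity of 1 and a single minimum at -65.8°, the same function gives an offset of 245.8°.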

In this paper, only variable dihedral angles, i.e. the backbone conformation angles [varphi], [psi] and [omega] and the side-chain [chi] angles, will be examined. Dihedral-angle restraints used to maintain planarity, e.g. in aromatic side chains, will not be considered since their preferred conformations are already well defined.

Since ultra-high-resolution protein structures are not dependent on stereochemical restraints for proper refinement, they should provide an unbiased source for determining protein dihedral-angle propensities. Examination of their population frequencies should lead to proper values for n and [delta], while appropriate force constants can be derived from examination of the width of the distributions around the minima.

2. Materials and methods

2.1. Selection of ultra-high-resolution protein structures

All crystal structures refined with diffraction data to 1.2  Å resolution or better deposited in the Protein Data Bank (Rutgers University) as of 26 October 2001 were selected. Where multiple structures of the same protein met this criterion, only the structure resulting from the highest-resolution data was taken. In addition, to reduce the percentage of residues involved in crystal contacts, only proteins consisting of more than 150 residues were used.

2.2. Dihedral-angle calculation

The torsion angles were calculated with the program TORCHK (J. Priestle, unpublished results) derived from the subroutine TORSON from the structure refinement program PROLSQ (Hendrickson & Konnert, 1980[Hendrickson, W. A. & Konnert, J. H. (1980). Computing in Crystallography, edited by R. Diamond, S. Ramaseshan & K. Venkatesan, pp. 13.01-13.23. Bangalore: Indian Academy of Sciences.]). The program checks that adjacent residues in the coordinate list are actually bonded (C-N distance < 2.5  Å) before calculating the backbone dihedral angles [varphi], [psi] and [omega]. When alternate conformations existed, only the major one, as deduced from the occupancies, was used. When more than one copy of the protein existed in the coordinate file, only the A chain was taken, with the exception of one protein (methyl-coenzyme M reductase), which had a dimer of heterogeneous trimers, for which one copy of each of the unique chains (A, B and C) was used. There was no cutoff for ignoring atoms based on temperature factor, occupancy, or other criteria. A fourth side-chain torsion angle ([chi]4) for proline was defined (C[gamma]-C[delta]-N-C[alpha]), which is usually not considered, but is a restrained dihedral angle nevertheless. Dihedral angles defined to help maintain planarity in aromatic amino acid side chains were not examined as these angles are already well defined (exactly 0° or 180°).
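The source code of TORCHK and TORSON is not reproduced here. The following Python sketch only illustrates the standard vector formula for a signed torsion angle from four atomic positions, together with a distance check like the C-N criterion described above. The function names and the tuple-based coordinate format are assumptions made for the example.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (-180 to 180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def is_bonded(c_atom, n_atom, cutoff=2.5):
    """True if the C(i)-N(i+1) distance is below the cutoff (in angstroms)."""
    d = _sub(n_atom, c_atom)
    return math.sqrt(_dot(d, d)) < cutoff
```

For a backbone [varphi] angle the four points would be C(i-1), N(i), C[alpha](i) and C(i), and the angle would only be computed when `is_bonded` holds for the preceding peptide link.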

2.3. Torsion-angle analysis

The output from TORCHK for the 46 proteins was then sorted by amino acid type and the individual torsion-angle types grouped together and analysed using Microsoft Excel. In the X-PLOR parameter file, torsion angles are defined according to their innermost atom types, with the exception of the [varphi] angle, which is defined by all four atom types. Early in the analysis it was found necessary to establish a new X-PLOR atom type for the C[alpha] of proline (CH1E-->CH1P), since its geometry is considerably different from the other amino acids with respect to torsion angles. In total, twenty dihedral restraints define all the torsion angles in a protein structure.

The torsion angles measured for a single type were sorted and grouped in 10° bins for frequency analysis. The frequency plots theoretically should reflect the inverse of the energy plots for that dihedral-angle type (low predicted energy should correspond to high population). The multiplicity (n) was determined by visual inspection of the population frequencies. The angle offsets ([delta]) were derived from the torsion-angle minima ([theta]min), which were ascertained by determining the average angle around the frequency maxima. Because of skewing, these were not necessarily the maxima themselves. The X-PLOR dihedral energy formula, while simple, is rather constrained. It demands that the energy minima (population maxima) be evenly spaced and always be exactly (360/n)° apart. This was not always the case and sometimes judicious choices of [delta] had to be made that obeyed this stringent requirement, yet reflected the observed angle population distribution as well as possible. A special case is the side-chain torsion angles of proline, which are constrained by the ring system to angles of roughly +30° or -30°, which translates to a multiplicity of 6 (360°/60° between energy minima). Unavoidably, this then also implies possible minima at the very unlikely angles of ±90° and ±150°.
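The analysis itself was carried out in a spreadsheet. As a rough equivalent, the Python sketch below groups angles into 10° bins on the circle and averages the angles lying near a chosen peak. The function names and the window parameter are hypothetical and chosen only for illustration.

```python
import math

def bin_torsions(angles_deg, width=10.0):
    """Count angles in bins centred on -180, -170, ..., 170 degrees.

    The bin centred on -180 also collects angles near +180, so both ends
    of the circle are counted together.
    """
    nbins = int(round(360.0 / width))
    counts = [0] * nbins
    for a in angles_deg:
        i = int(math.floor((a + 180.0 + width / 2.0) / width)) % nbins
        counts[i] += 1
    centres = [-180.0 + width * i for i in range(nbins)]
    return centres, counts

def mean_near(angles_deg, centre, half_window):
    """Mean of the angles within half_window of centre (wrap-aware)."""
    devs = [((a - centre + 180.0) % 360.0) - 180.0 for a in angles_deg]
    near = [d for d in devs if abs(d) <= half_window]
    return centre + sum(near) / len(near)
```

Averaging within a window around the peak, rather than taking the most populated bin, allows for the skewed distributions mentioned above.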

To check whether there were significant differences in the dihedral-angle frequencies as a function of resolution within this group of 46 high-resolution structures, four representative dihedral angles, namely [omega] + [chi]5 of arginine, [chi]2 of an aromatic amino acid (tyrosine), a side-chain dihedral angle of proline ([chi]1), and the most common side-chain dihedral angle (between CH2E atom types), were analysed in eleven structures solved at `low' resolution (1.2  Å) and in the nine highest-resolution structures (<1.0  Å), and then compared.

2.4. Force-constant determination

In addition to determining more appropriate n and [delta] values based on the observed frequency populations, force constants that are properly scaled to the other stereochemical restraints must also be established. These are generally derived from the observed deviations around the mean (`ideal') values and are directly proportional to 1/[sigma]^2. For dihedral-angle restraints this is complicated by the fact that the energy function has multiple minima and is not the usual parabolic function, but a cosine function that approximates a parabolic function near the mean, as seen from the Taylor series for the cosine:

[\cos(\alpha) = 1 - \alpha^{2}/2! + \alpha^{4}/4! - \alpha^{6}/6! + \alpha^{8}/8!\ldots, \eqno (3)]

where the angle [alpha] is in radians.

As long as [alpha] is small (error of <0.01 for deviations of <40°), the decrease of the cosine (increase in dihedral-angle restraint energy) is proportional to one-half the square of the deviation from the ideal: ([alpha]model - [alpha]ideal)^2/2. The function is further complicated by the fact that it usually has multiple minima, which has the effect of `squeezing' the energy function (making it increase more quickly) at higher values of n. Lastly, absolute values for the force constants must be found that are in line with those from the other stereochemical restraints. As detailed in §3, the force constant is proportional to 2/([sigma]^2 n^2). The absolute value can be determined by examining the current force constants for bond lengths and angles as a function of 1/[sigma]^2 of their parameters.
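The quoted error bound on the parabolic approximation can be checked numerically. This short Python sketch (the function name is illustrative) compares the cosine with the first two terms of equation (3).

```python
import math

def harmonic_error(alpha_deg):
    """Absolute difference between cos(alpha) and the parabola 1 - alpha^2/2."""
    a = math.radians(alpha_deg)
    return abs(math.cos(a) - (1.0 - a * a / 2.0))

# The error grows roughly as alpha^4/24 and stays just below 0.01 at 40 degrees
for deg in (10, 20, 30, 40, 50):
    print(deg, round(harmonic_error(deg), 4))
```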

2.5. Implementation and assessment

The X-PLOR protein topology file protein.top was modified to include a new atom type (CH1P) for the C[alpha] of proline. The X-PLOR protein parameter file protein_rep.param was modified to include the new proline torsion angles. The force constants, multiplicities and/or angle offsets of the dihedral-angle restraints were modified to reflect the torsion-angle propensities actually observed in the high-resolution protein structures.

Assessment of the new parameters versus the original values is not trivial. The dihedral energy of the system cannot be used, since this is directly proportional to the force constants selected. In addition, the original dihedral parameters leave the backbone conformational angles ([varphi], [psi]) unrestrained, i.e. they had a force constant of zero. One possible assessment criterion could be the root-mean-square deviation (r.m.s.d.) from ideal values, but this value is also flawed since it depends on the multiplicity of rotation. Higher multiplicities automatically produce narrower ranges of possible deviation, since once a torsion angle deviates too far, it `belongs' to the next minimum. A normalized measure of the r.m.s.d., i.e. r.m.s.d. divided by the maximum possible deviation (r/m.d.), should better demonstrate the distribution of the torsion angles around the minimum and can be compared with what it would be if the torsion angles were randomly distributed (r/m.d. ≈ 0.577). Note that an incorrect selection of [theta]min can lead to deviations being systematically worse than random.
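The normalized measure can be sketched as follows, assuming that each angle is assigned to its nearest minimum and that the maximum possible deviation is half the spacing between minima, i.e. (180/n)°. The function name is illustrative.

```python
import math

def normalized_rmsd(angles_deg, theta_min_deg, n):
    """r.m.s.d. from the nearest minimum divided by the maximum deviation."""
    period = 360.0 / n          # spacing between equivalent minima
    max_dev = period / 2.0      # largest possible deviation, 180/n degrees
    sq = 0.0
    for a in angles_deg:
        # signed deviation from the nearest minimum, in [-max_dev, max_dev)
        d = ((a - theta_min_deg + max_dev) % period) - max_dev
        sq += d * d
    return math.sqrt(sq / len(angles_deg)) / max_dev

# Evenly spread angles give a value close to 1/sqrt(3)
uniform = [i / 10.0 - 180.0 for i in range(3600)]
print(round(normalized_rmsd(uniform, 60.0, 3), 3))
```

Angles that cluster midway between the assumed minima give values approaching 1, which is how a poorly chosen [theta]min can score worse than random.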

The best evidence that the new restraints are better, of course, is whether the use of these parameters causes protein structures, especially those being refined with low-resolution diffraction data, to refine better, i.e. overall better R factors and stereochemistry. Unfortunately, this can only be determined after a significantly large number of structures have been refined independently using both sets of parameters and a comparison made. This is, regrettably, beyond the scope of the current study, but it seems reasonable to assume that dihedral-angle restraints that reflect actual angle propensities in protein structures can only be an improvement over parameters that tend to draw the structure into improbable conformations.

3. Results

3.1. Torsion-angle analysis of high-resolution protein structures

Forty-six ultra-high-resolution structures were selected, which ranged in resolution from 1.20 to 0.78 Å and in size from 151 to 501 residues (Table 1). The vast majority (85%) had a single protein chain in the asymmetric unit. Nine were present as dimers and one hexamer (a dimer of trimers) was seen. Most of the structures were refined with SHELXL (Sheldrick & Schneider, 1997[Sheldrick, G. M. & Schneider, T. R. (1997). Methods Enzymol. 277, 319-343.]). REFMAC (Murshudov et al., 1997[Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240-255.]) was used in nine cases; X-PLOR was used for two structures, while CNS and RESTRAIN (Driessen et al., 1989[Driessen, H., Haneef, M. I. J., Harris, G. W., Howlin, B., Khan, G. & Moss, D. S. (1989). J. Appl. Cryst. 22, 510-516.]) were each used once. SHELXL does not use dihedral (torsion) angle restraints because of their multimodal functionality and because unrestrained dihedral angles provide an independent stereochemical check. REFMAC does not restrain dihedral angles per se, but does provide specific van der Waals contact parameters for 1-4 atom pairs.

Table 1
The 46 ultra-high-resolution (equal to or better than 1.2 Å) protein structures (>150 amino acids) selected for investigating protein dihedral angles

PDB code Protein Resolution (Å) Refinement program Reference
1A6M Oxy-myoglobin 1.00 SHELXL Vojtechovsky et al. (1999)[Vojtechovsky, J., Chu, K., Berendzen, J., Sweet, R. M. & Schlichting, I. (1999). Biophys. J. 77, 2153-2174.]
1A7S Heparin binding protein 1.12 SHELXL Karlsen et al. (1998)[Karlsen, S., Iversen, L. F., Larsen, I. K., Flodgaard, H. J. & Kastrup, J. S. (1998). Acta Cryst. D54, 598-609.]
1AMM [gamma]-B crystallin 1.20 RESTRAIN Kumaraswamy et al. (1996)[Kumaraswamy, V. S., Lindley, P. F., Slingsby, C. & Glover, I. D. (1996). Acta Cryst. D52, 611-622.]
1ATG Periplasmic molybdate-binding protein 1.20 REFMAC Lawson et al. (1997)[Lawson, D. M., Williams, C. E., White, D. J., Choay, A. P., Mitchenall, L. A. & Pau, R. N. (1997). J. Chem. Soc. Dalton Trans. pp. 3981-3984.]
1B6G Haloalkane dehalogenase 1.15 SHELXL Ridder et al. (1999)[Ridder, I. S., Rozeboom, H. J. & Dijkstra, B. W. (1999). Acta Cryst. D55, 1273-1290.]
1BXO Penicillopepsin 0.95 SHELXL Kuhn et al. (1998)[Kuhn, P., Knapp, M., Soltis, S. M., Ganshaw, G., Thoene, M. & Bott, R. (1998). Biochemistry, 37, 13446-13452.]
1BYI Dethiobiotin synthase 0.97 SHELXL Sandalova et al. (1999)[Sandalova, T., Schneider, G., Kack, H. & Lindqvist, Y. (1999). Acta Cryst. D55, 610-624.]
1C0P D-Amino acid oxidase 1.20 SHELXL Umhau et al. (2000)[Umhau, S., Pollegioni, L., Molla, G., Diederichs, K., Welte, W., Pilone, M. S. & Ghisla, S. (2000). Proc. Natl Acad. Sci. USA, 97, 12463-12468.]
1CEX Cutinase 1.00 SHELXL Longhi et al. (1997)[Longhi, S., Czjzek, M., Lamzin, V., Nicolas, A. & Cambillau, C. (1997). J. Mol. Biol. 268, 779-799.]
1CXQ ASV integrase core domain 1.02 SHELXL Lubkowski et al. (1999)[Lubkowski, J., Dauter, Z., Yang, F., Alexandratos, J., Merkel, G., Skalka, A. M. & Wlodawer, A. (1999). Biochemistry, 38, 13512-13522.]
1D2U Nitrophorin 4 1.15 SHELXL Roberts et al. (2001)[Roberts, S. A., Weichsel, A., Qiu, Y., Shelnutt, J. A., Walker, F. A. & Montfort, W. R. (2001). Biochemistry, 40, 11327-11337.]
1D5T Guanine nucleotide dissociation inhibitor 1.04 SHELXL Luan et al. (2000)[Luan, P., Heine, A., Zeng, K., Moyer, B., Greasley, S. E., Kuhn, P., Balch, W. E. & Wilson, I. A. (2000). Traffic, 1, 270-281.]
1DS1 Clavaminate synthase 1.08 SHELXL Zhang et al. (2000)[Zhang, Z. H., Ren, J., Stammers, D. K., Baldwin, J. E., Harlos, K. & Schofield, C. J. (2000). Nature Struct. Biol. 7, 127-133.]
1E4M Myrosinase 1.20 REFMAC Burmeister et al. (2000)[Burmeister, W. P., Cottaz, S., Rollin, P., Vasella, A. & Henrissat, B. (2000). J. Biol. Chem. 275, 39385-39393.]
1E9G Inorganic pyrophosphatase 1.15 SHELXL Heikinheimo et al. (2001)[Heikinheimo, P., Tuominen, V., Ahonen, A. K., Teplyakov, A., Cooperman, B. S., Baykov, A. A., Lahti, R. & Goldman, A. (2001). Proc. Natl Acad. Sci. USA, 98, 3121-3126.]
1EUW dUTPase 1.05 SHELXL González et al. (2001)[González, A., Larsson, G., Persson, R. & Cedergren-Zeppezauer, E. (2001). Acta Cryst. D57, 767-774.]
1FN8 Trypsin 0.81 SHELXL Rypniewski et al. (2001)[Rypniewski, W. R., Ostergaard, P. R., Noerregaard-Madsen, M., Dauter, M. & Wilson, K. S. (2001). Acta Cryst. D57, 8-19.]
1FSG Hypoxanthine-guanine phosphoribosyltransferase 1.05 SHELXL Heroux et al. (1999)[Heroux, A., White, E. L., Ross, L. J., Davis, R. L. & Borhani, D. W. (1999). Biochemistry, 38, 14495-14506.]
1FXM Xylanase I 1.14 SHELXL Teixeira et al. (2001)[Teixeira, S., Lo Leggio, L., Pickersgill, R. & Cardin, C. (2001). Acta Cryst. D57, 385-392.]
1FY2 Aspartyl dipeptidase 1.20 CNS Hakansson et al. (2000)[Hakansson, K., Wang, A. H.-J. & Miller, C. G. (2000). Proc. Natl Acad. Sci. USA, 97, 14097-14102.]
1G66 Acetylxylan esterase 0.90 SHELXL Ghosh et al. (2001)[Ghosh, D., Sawicki, M., Lala, P., Erman, M., Pangborn, W., Eyzaguirre, J., Gutierrez, R., Jornvall, H. & Thiel, D. J. (2001). J. Biol. Chem. 276, 11159-11166.]
1G8T Sm endonuclease 1.10 REFMAC Shlyapnikov et al. (2000)[Shlyapnikov, S. V., Lunin, V. V., Perbandt, M., Polyakov, K. M., Lunin, V. Y., Levdikov, V. M., Betzel, C. & Mikhailov, A. M. (2000). Acta Cryst. D56, 567-572.]
1GA6 Serine-carboxyl proteinase 1.00 SHELXL Wlodawer et al. (2001)[Wlodawer, A., Li, M., Dauter, Z., Gustchina, A., Uchida, K., Oyama, H., Dunn, B. M. & Oda, K. (2001). Nature Struct. Biol. 8, 442-446.]
1GCI Subtilisin 0.78 SHELXL Kuhn et al. (1998)[Kuhn, P., Knapp, M., Soltis, S. M., Ganshaw, G., Thoene, M. & Bott, R. (1998). Biochemistry, 37, 13446-13452.]
1HBN Methyl-coenzyme M reductase 1.16 REFMAC Grabarse et al. (2001)[Grabarse, W., Mahlert, F., Duin, E. C., Goubeaud, M., Shima, S., Thauer, R. K., Lamzin, V. & Ermler, U. (2001). J. Mol. Biol. 309, 315-330.]
1HE2 Biliverdin IX [beta] reductase 1.20 SHELXL Pereira et al. (2001)[Pereira, P. J. B., Macedo-Ribeiro, S., Parraga, A., Perez-Luque, R., Cunningham, O., Darcy, K., Mantle, T. J. & Collect, M. (2001). Nature Struct. Biol. 8, 215-220.]
1HET Liver alcohol dehydrogenase 1.15 REFMAC Meijers et al. (2001)[Meijers, R., Morris, R. J., Adolph, H. W., Merli, A., Lamzin, V. S. & Cedergren-Zeppezauer, E. S. (2001). J. Biol. Chem. 276, 9316-9321.]
1HVB DD-Carboxypeptidase 1.17 SHELXL Lee et al. (2001)[Lee, W., McDonough, M. A., Kotra, L. P., Li, Z.-H., Silvaggi, N. R., Takeda, Y., Kelly, J. A. & Mobashery, S. (2001). Proc. Natl Acad. Sci. USA, 98, 1427-1431.]
1I4U [alpha]-Crustacyanin 1.15 SHELXL Gordon et al. (2001)[Gordon, E. J., Leonard, G. A., McSweeney, S. & Zagalsky, P. F. (2001). Acta Cryst. D57, 1230-1237.]
1I76 Catalytic domain of matrix metalloproteinase-8 1.20 SHELXL Gavuzzo et al. (2000)[Gavuzzo, E., Pochetti, G., Mazza, F., Gallina, C., Gorini, B., D'Alessio, S., Pieper, M., Tschesche, H. & Tucker, P. A. (2000). J. Med. Chem. 43, 3377-3385.]
1IC6 Serine protease proteinase K 0.98 REFMAC Betzel et al. (2001)[Betzel, C., Gourinath, S., Kumar, P., Kaur, P., Perbandt, M., Eschenburg, S. & Singh, T. P. (2001). Biochemistry, 40, 3080-3088.]
1IXH Phosphate-binding protein 0.98 SHELXL Wang et al. (1997)[Wang, Z., Luecke, H., Yao, N. & Quiocho, F. A. (1997) Nature Struct. Biol. 4, 519-522.]
1JCJ Deoxyribose-phosphate aldolase 1.10 SHELXL Heine et al. (2001)[Heine, A., Desantis, G., Luz, J. G., Mitchell, M., Wong, C.-H. & Wilson, I. A. (2001). Science, 294, 369-374.]
1JK3 Metalloelastase 1.09 REFMAC Lang et al. (2001)[Lang, R., Kocourek, A., Braun, M., Tschesche, H., Huber, R., Bode, W. & Maskos, K. (2001). J. Mol. Biol. 312, 731-742.]
1MFM Copper, zinc superoxide dismutase 1.02 SHELXL Ferraroni et al. (1999)[Ferraroni, M., Rypniewski, W., Wilson, K. S., Viezzoli, M. S., Banci, L., Bertini, I. & Mangani, S. (1999) J. Mol. Biol. 288, 413-426.]
1MUN Adenine glycosylase 1.20 SHELXL Guan et al. (1998)[Guan, Y., Manuel, R. C., Arvai, A. S., Parikh, S. S., Mol, C. D., Miller, J. H., Lloyd, S. & Tainer, J. A. (1998) Nature Struct. Biol. 5, 1058-1064.]
1NLS Concanavalin A 0.94 SHELXL Deacon et al. (1997)[Deacon, A., Gleichmann, T., Kalb, A. J., Price, H., Raftery, J., Bradbrook, G., Yariv, J. & Helliwell, J. R. (1997). J. Chem. Soc. Faraday Trans. 93, 4305-4312.]
1QJ4 Hydroxynitrile lyase 1.10 SHELXL Gruber et al. (1999)[Gruber, K., Gugganig, M., Wagner, U. G. & Kratky, C. (1999). Biol. Chem. 380, 993-1000.]
1QLW Esterase 1.10 REFMAC Bourne et al. (2000)[Bourne, P. C., Isupov, M. N. & Littlechild, J. A. (2000). Structure, 8, 143-151.]
1QNJ Elastase 1.10 SHELXL Würtele et al. (2000)[Würtele, M., Hahn, M., Hilpert, K. & Hohne, W. (2000). Acta Cryst. D56, 520-523.]
1QQ4 [alpha]-Lytic protease 1.20 X-PLOR Derman & Agard (1999)[Derman, A. I. & Agard, D. A. (1999). Nature Struct. Biol. 7, 394-397.]
1QTW Endonuclease IV 1.02 SHELXL Hosfield et al. (1999)[Hosfield, D. J., Guan, Y., Haas, B. J., Cunningham, R. P. & Tainer, J. A. (1999). Cell, 98, 397-408.]
2NLR Endoglucanase 1.20 SHELXL Sulzenbacher et al. (1999)[Sulzenbacher, G., Mackenzie, L. F., Wilson, K. S., Withers, S. G., Dupont, C. & Davies, G. J. (1999). Biochemistry, 38, 4826-4833.]
2PTH Peptidyl-tRNA hydrolase 1.20 X-PLOR Schmitt et al. (1997)[Schmitt, E., Mechulam, Y., Fromant, M., Plateau, P. & Blanquet, S. (1997). EMBO J. 16, 4760-4769.]
3SIL Neuraminidase 1.05 SHELXL Garman et al. (1996)[Garman, E. F., Wouters, J., Vimr, E., Laver, G. & Sheldrick, G. M. (1996). Acta Cryst. A52, C-8.]
8A3H Endoglucanase 0.97 REFMAC Varrot et al. (2001)[Varrot, A., Schulein, M., Fruchard, S., Driguez, H. & Davies, G. J. (2001). Acta Cryst. D57, 1739-1742.]

The 46 high-resolution protein structures provided 65290 dihedral angles from 14007 amino acid residues, with populations ranging from a minimum of 183 cysteine residues to 1367 alanine residues, so that all amino acid and dihedral-angle types had statistically large enough populations (Table 2).

Table 2
Dihedral angles defined by X-PLOR using the X-PLOR atom nomenclature (Brunger et al., 1987[Brunger, A. T., Kuriyan, J. & Karplus, M. (1987). Science, 235, 458-460.]) plus atom types added by Engh & Huber (1991[Engh, R. A. & Huber, R. (1991). Acta Cryst. A47, 392-400.]), the actual protein torsion angles defined by these atoms, the number of such torsion angles observed in 46 high-resolution protein structures, the average torsion-angle minima found ([theta]min) and the standard deviations ([sigma]) around these minima

Angle Examples N [theta]min (°) [sigma] (°)
NH1-CH1E [varphi] (except Gly and Pro) 12012 -89.2, 60.0 29.9
NH1-CH2G [varphi] (Gly) 1270 -90.9, 92.2 30.8
NH1-CH1P# [varphi] (Pro) 659 -65.8 10.4
CH1E-C [psi] (except Gly) 12671 -27.0, 138.1 24.8
CH2G-C [psi] (Gly) 1269 -7.0, 179.2 28.3
C-NH1 [omega], [chi]5 (Arg) 14532 179.3, 0.5 6.2
CH1E-CH2E [chi]1 (unbranched C[beta]), [chi]2 (Ile, Leu) 9007 -65.8, 64.7, 179.6 11.6
CH1E-CH1E [chi]1 (Ile, Thr, Val) 2413 -60.4, 62.7, 178.3 8.0
CH2E-CH2E [chi]2 (Arg, Gln, Glu, Lys, Met), [chi]3 (Arg, Lys), [chi]4 (Lys) 4150 -67.9, 69.4, 179.5 15.7
CH2E-C [chi]2 (Asn, Asp), [chi]3 (Gln, Glu) 2449 -12.3, 165.1 41.0
CH2E-C5 [chi]2 (His) 299 -83.0, 97.6 34.4
CH2E-CF [chi]2 (Phe) 496 -70.4, 83.2 26.0
CH2E-C5W [chi]2 (Trp) 191 -77.3, 82.7 33.8
CH2E-CY [chi]2 (Tyr) 483 -76.1, 84.1 22.8
CH2E-SM [chi]3 (Met) 220 -67.7, 72.4, 181.4 19.0
CH2E-NH1 [chi]4 (Arg) 533 -89.5, 91.8, 181.0 16.4
CH1P-CH2E# [chi]1 (Pro) 659 -24.0, 25.7 8.0
CH2E-CH2P [chi]2 (Pro) 659 -33.4, 34.2 8.9
CH2P-CH2P [chi]3 (Pro) 659 -30.6, 27.2 7.8
CH2P-N [chi]4 (Pro) 659 -12.0, 15.6 6.5
#Newly defined dihedral angle.

Analysis of a dihedral-angle type consisted of bringing all measured occurrences of that type into a single column of a Microsoft Excel worksheet. These were then sorted and the frequencies grouped into 10° bins starting with -185° and ending with +185°. The first and last bins (each effectively only 5° wide, since the angles run from -180° to +180°) cover the same angular region and were therefore added together. These frequencies were then plotted and examined. Multiplicity of rotation (n) was usually easily determined by visual inspection (Fig. 1). Average dihedral angles around the maxima were determined and adjusted to be (360/n)° apart. Root-mean-square deviations from these average angles were then calculated.

[Figure 1]
Figure 1
Frequency of torsion angles observed in 46 ultra-high-resolution protein structures in 10° bins. See Table 2 for conversion between torsion angles defined by X-PLOR atom types and those labeled using the standard protein conformation angle nomenclature. Note that unlike the rest, the peptide [omega]-angle/[chi]5 of arginine runs from 0 to 360° to avoid splitting the peak across the two ends.

With two new dihedral-angle definitions necessitated by the introduction of the new atom type CH1P for the C[alpha] of proline, 20 different dihedral-angle types for proteins were defined by X-PLOR/CNS. Table 2 lists these definitions, along with the locations of the torsion angles in the proteins, the number measured and what their frequency maxima and r.m.s.d. were calculated to be, based on the torsion angles observed in the 46 high-resolution protein structures. Table 3 presents a comparison of four representative dihedral angles calculated from only the `low'-resolution structures (1.2 Å) and the highest-resolution structures (<1.0 Å).

Table 3
Comparison of four representative dihedral-angle frequency distributions (mean ± r.m.s.d.) determined from the eleven `low'-resolution structures (1.2 Å) and the nine highest-resolution structures (<1.0 Å)

The means are always within 3° of each other (usually much less). The widths of the distributions are only significantly different for the [omega] peptide dihedral angle, which has more to do with the refinement programs used (see §3.3), and for the dihedral angles CH2E-CH2E.

Angle High resolution Low resolution
[omega], [chi]5 (Arg) 179.2 ± 6.2° 179.4 ± 5.2°
CH2E-CH2E -66.1 ± 15.4° -66.8 ± 18.8°
  68.1 ± 16.6° 69.2 ± 20.4°
  178.9 ± 12.4° 178.9 ± 13.6°
[chi]2 (Tyr) -74.1 ± 19.5° -76.6 ± 19.1°
  81.3 ± 16.0° 80.1 ± 17.4°
[chi]1 (Pro) -23.6 ± 8.8° -23.7 ± 7.5°
  26.3 ± 7.1° 27.3 ± 8.8°

3.2. Calculation of appropriate force constants

Generally, the force constants of stereochemical restraints (`weights' in classical least-squares refinement) are directly proportional to 1/[sigma]^2, where [sigma] is the standard deviation of the population of observed values around the mean (`ideal') value. Determining the force constants based on population distributions for dihedral angles is slightly more complicated. In the first place, the energy function itself is not the usual parabolic function [E_q = k(q_obs - q_model)^2]. However, the cosine function used for dihedral-angle restraints in CNS does approximate this function near the ideal values, although it increases only as [Delta]^2/2 instead of [Delta]^2 [see equation (3)]. A further complication is that for the dihedral energy function, the energy increases more quickly with increasing multiplicity (n). Since the cosine function is acting like a parabolic function here, the effect is to overweight the energy by n^2. The dihedral-angle restraint force constants are therefore directly proportional to 2/([sigma]^2 n^2).

The appropriate absolute values of the dihedral-angle restraint force constants should be scaled to the other stereochemical restraints. The bond-distance and bond-angle restraints most commonly used with X-PLOR are those suggested by Engh & Huber (1991[Engh, R. A. & Huber, R. (1991). Acta Cryst. A47, 392-400.]), based on their examination of small-molecule bond lengths and angles. Their force constants were calculated to give the same distribution of lengths and angles in protein structures as seen in small-molecule crystal structures when subjected to molecular dynamics at room temperature in the absence of other forces and then scaled to be consistent with the existing dihedral and improper angle parameters. Examination of the bond-distance force constants shows that kbond = 0.592/[sigma]^2 ([sigma] in Å) and for bond angles kangle = 0.592/[sigma]^2 ([sigma] in radians). To keep the new dihedral restraint force constants consistent with the other stereochemical restraints, they should therefore be defined as

[k_{\rm dihe} = 2 \times 0.592 / (\sigma^{2}n^{2}) \eqno (4)]

([sigma] in radians, n = multiplicity of rotation). Using the observed [sigma] values listed in Table 2, converted to radians, the force constants for the dihedral-angle restraints were calculated according to equation (4) and are listed in Table 4.
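Equation (4) is straightforward to evaluate. The Python sketch below (function and constant names are illustrative) applies it to two of the [sigma] values from Table 2, using the new multiplicities given in Table 4.

```python
import math

K_SCALE = 0.592  # scale factor quoted above for bond and angle restraints

def dihedral_force_constant(sigma_deg, n):
    """Equation (4): k = 2 * 0.592 / (sigma^2 * n^2), with sigma in radians."""
    sigma = math.radians(sigma_deg)
    return 2.0 * K_SCALE / (sigma * sigma * n * n)

# sigma values taken from Table 2, multiplicities from Table 4
print(round(dihedral_force_constant(10.4, 1), 2))  # NH1-CH1P, phi of proline
print(round(dihedral_force_constant(6.2, 2), 2))   # C-NH1, omega
```

The results agree with the corresponding knew entries in Table 4 (35.93 and 25.28) to within about 0.01, with the small difference presumably arising from rounding of the tabulated [sigma].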

3.3. Comparison of new and old dihedral restraints

New multiplicities of rotation (n), offset angles ([delta]) and force constants (k) were calculated for all 20 dihedral restraints based on their observed frequency distribution in 46 high-resolution protein structures (Table 2). In 16 cases the frequency modality implied by the X-PLOR restraints did not agree with the observed distributions (Fig. 1, Table 4). In four of these cases, this had no influence on the refinement procedure, since these angles are usually left unrestrained (force constant = 0), although these angles are included in the calculation of the r.m.s.d. for dihedral angles. The currently used force constants (protein_rep.param or protein.param) are all considerably larger than the observed r.m.s.d. of the torsion angles would indicate. This implies that the torsion-angle restraints are overweighted in the current parameter files relative to the other stereochemical restraints, which is a potentially disturbing state of affairs, since they are generally being restrained to incorrect `ideal' values. It should be noted that these dihedral force constants were increased by a factor of three from the original X-PLOR force constants, not because an analysis of torsion angles suggested that this should be so, but because it was noted that the force constants of the Engh & Huber (1991[Engh, R. A. & Huber, R. (1991). Acta Cryst. A47, 392-400.]) bond-distance and -angle parameters on average were, respectively, roughly three times and seven times larger than the original X-PLOR force constants (Weis, 1992[Weis (1992). Comment in distributed X-PLOR parameter file protein.param.]), although Engh & Huber (1991[Engh, R. A. & Huber, R. (1991). Acta Cryst. A47, 392-400.]) explicitly state that their bond and angle parameters are already scaled to the original dihedral and improper angle restraints.

Table 4
Comparison of the original and newly proposed dihedral restraint parameters

In only four cases did the old parameters agree with the observed torsion-angle distributions. The energy is calculated using the same (new) force constant for comparison purposes. r/m.d. is the root-mean-square deviation normalized by dividing by the maximum possible deviation. For a random distribution r/m.d. ≈ 0.577. For six of the old dihedral-angle parameters, r/m.d. is worse than random because the `ideal' population maxima are far enough away from the observed maxima to create a distribution of deviations that are systematically worse than random. Note that for the new parameters, r/m.d. is always smaller than for the old parameters, even in cases where the maximum possible deviation (m.d.) is much smaller.

Angle nold [delta]old kold nnew [delta]new knew <Eold> <Enew> r/m.d.old r/m.d.new
NH1-CH1E 3     2   1.08 0.79 0.48 0.50 0.33
NH1-CH2G 3     2 -2 1.02 0.70 0.41 0.45 0.34
NH1-CH1P 3   5 1 246 35.93 6.04 0.59 0.20 0.06
CH1E-C 3     2 -112 1.58 1.65 0.44 0.59 0.28
CH2G-C 3     2 188 1.21 1.31 0.47 0.61 0.31
C-NH1 2 180 1250 2 180 25.28 0.56 0.56 0.07 0.07
CH1E-CH2E 3   5 3   3.21 0.51 0.51 0.19 0.19
CH1E-CH1E 3   5 3   6.75 0.56 0.56 0.13 0.13
CH2E-CH2E 3   5 3   1.75 0.45 0.45 0.26 0.26
CH2E-C 3   5 2 210 0.58 0.43 0.43 0.60 0.46
CH2E-C5 3   5 2 -14 0.82 0.55 0.41 0.45 0.38
CH2E-CF 3   5 2 -12 1.44 0.98 0.46 0.44 0.29
CH2E-C5W 3   5 2 -6 0.85 1.02 0.39 0.62 0.44
CH2E-CY 3   5 2