REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use
One of the most important aspects of macromolecular structure refinement is the use of prior chemical knowledge. Bond lengths, bond angles and other chemical properties are used in restrained refinement as subsidiary conditions. This contribution describes the organization and some aspects of the use of the flexible and human/machine-readable dictionary of prior chemical knowledge used by the maximum-likelihood macromolecular-refinement program REFMAC5. The dictionary stores information about monomers which represent the constitutive building blocks of biological macromolecules (amino acids, nucleic acids and saccharides) and about numerous organic/inorganic compounds commonly found in macromolecular crystallography. It also describes the modifications the building blocks undergo as a result of chemical reactions and the links required for polymer formation. More than 2000 monomer entries, 100 modification entries and 200 link entries are currently available. Algorithms and tools for updating and adding new entries to the dictionary have also been developed and are presented here. In many cases, the REFMAC5 dictionary allows entirely automatic generation of restraints within REFMAC5 refinement runs.
Macromolecular crystal structure analysis can be regarded as an application of Bayesian statistics. This type of statistical analysis is centred on Bayes' theorem, which can be formulated as
$$P(x; F) \propto P(x)\,P(F; x),$$
where P(x; F) is the probability distribution of the model's parameters x given the experimental data F, P(x) is the prior probability distribution of x and P(F; x) is the likelihood of x given F. A perspective on macromolecular crystallography formulated from the Bayesian viewpoint can be found in Bricogne (1997).
An essential and distinctive component of the Bayesian treatment and analysis of experimental data is the notion of prior knowledge. P(x) embeds, in real space, the most important prior information available for macromolecular crystal structure analysis. This type of information can be loosely divided into two families: (i) the available three-dimensional structures of macromolecules deposited within the Protein Data Bank and (ii) the relative invariance of elementary chemical properties such as bond lengths, bond angles, chiral volumes and planes.
A very important, although heavily underused, source of prior information for macromolecular experimental techniques is the PDB (Bernstein et al., 1977; Berman et al., 2002). It is very likely that many features of newly determined structures are already present within the PDB. This aspect of the utilization of the available information is growing rapidly and there are already some applications of it in such branches of crystal structure analysis as model building (Jones et al., 1991) and density modification (Terwilliger, 2003). In the future, a greater utilization of this type of information can be envisaged. Careful analysis and statistically sensible use of this information will definitely enhance and extend the applicability of the currently available experimental techniques for macromolecular-structure analysis (e.g. crystallography). This information can also be used in other branches of computational macromolecular biology, such as three-dimensional structure prediction and homology modelling.
The importance of using known chemical properties such as bond lengths and bond angles as subsidiary conditions in macromolecular crystallographic refinement has been recognized for a long time (Waser, 1963; Diamond, 1971; Jack & Levitt, 1978; Konnert & Hendrickson, 1980). The primary justification for the use of these properties is that the experimental data alone are not sufficient to completely define the three-dimensional structure of macromolecules. Therefore, in order to extract information from the experiment while retaining chemical integrity it is necessary to use some prior chemical information. This is true at virtually all resolutions. Even when data at atomic or subatomic resolution are available, some restraints are needed to rationalize regions of the molecule which are too disordered. Moreover, the availability of data at very high resolutions encourages the analysis of chemical properties, such as charge densities, which are not considered at lower resolution. This in turn increases the number of parameters to be refined. To keep the observations-to-parameters ratio at a reasonable value, the use of prior chemical knowledge is required.
The Bayesian framework provides a natural platform for the incorporation of prior knowledge for data analysis. The purpose of the present contribution is to describe the design, organization and some practical aspects of the use of the dictionary of prior chemical information used by the maximum-likelihood macromolecular crystallographic refinement program REFMAC5 (Murshudov et al., 1997) from the CCP4 suite (Collaborative Computational Project, Number 4, 1994).
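As a worked illustration of how restraints enter this framework, Bayes' theorem can be rewritten in logarithmic form. The following is only a sketch under the common assumption of independent Gaussian priors; the likelihood is written here as P(F; x), and the symbols b_i, b_i^0 and σ_i are introduced for illustration and are not taken from the text above.

```latex
-\log P(x; F) = -\log P(F; x) \;-\; \log P(x) \;+\; \text{const},
\qquad
-\log P(x) = \sum_i \frac{\left[\, b_i(x) - b_i^{0} \,\right]^{2}}{2\sigma_i^{2}} \;+\; \text{const}.
```

Here b_i(x) would be a geometric quantity (for example a bond length) calculated from the model, b_i^0 its ideal value and σ_i its standard uncertainty. Under these assumptions, minimizing the left-hand side balances agreement with the experimental data against departure from ideal geometry, which is the role that the dictionary values play in restrained refinement.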
In general, two main approaches have been used by various programs to organize prior chemical information. The first approach uses the concept of chemical atom types. The second approach is based on larger monomer fragments.
In the atom-type approach chemical elements are assigned to different atom types depending on their chemical environment. For example, C atoms characterized by different degrees of sp-hybridization or aromatic C atoms surrounded by different neighbours constitute different atom types. The most popular atom types are those used by the AMBER (Allinger, 1977; Pearlman et al., 1995) and CHARMM (Brooks et al., 1983) programs. All possible bond distances between pairs of atom types as well as all possible angles defined by three atom types and torsion angles defined for four atom types are tabulated. These values are used to define the initial bond lengths, angles, torsion angles and other chemical parameters of a compound. All these values are generally refined using some semi-empirical energy function (see Ponder & Case, 2003; Brooks et al., 1983). Usually, such an energy function requires well defined point charges on each atom type.
Although the atom-type approach has been and is being successfully used in crystallographic refinement [X-PLOR (Brünger, 1992) and CNS (Brünger et al., 1998)], it is exposed to a few potential problems. Firstly, for general cases, the number of possible atom types is enormously large and the number of possible bonds and angles is so huge that it becomes impractical to store all required information. To deal with this problem, simplifications are used in many cases (Allinger, 1977). Secondly, atom types and their parameters have been carefully analysed only in a limited number of cases. When less usual atom types are encountered, tabulated values might not be directly transferable. An additional problem of the atom-type approach arises in the case of metal coordination. A metal atom can have very different coordination properties while preserving the same formal oxidation state. As a result, metric properties related to the same metal-atom type are extremely hard to handle in an automatic manner using atom types.
The second approach used to encode prior chemical knowledge, the monomer approach, is particularly suited to the case of biological macromolecules (proteins, DNA/RNA, polysaccharides). It explicitly uses the fact that these compounds are made up of repeating units. These building blocks (monomers), which can formally exist as independent entities, form larger molecules by undergoing reactions that link monomers together. In general, the linkage of monomers partly changes their nature; for example, by introducing or removing atoms. It can also generate new bonds, angles, torsion angles, planes and chiralities. The monomer approach handles in a natural way various changes affecting the monomers themselves. Many proteins are modified after translation or during their activity. Binding of carbohydrates to asparagine residues or phosphorylation of serines are two such examples. These modifications alter the characteristics of monomers whilst largely preserving their intimate nature. They can be handled by describing the change brought about on monomers in terms of atoms added to and removed from the original monomer.
Pioneering refinement programs such as PROLSQ (Konnert & Hendrickson, 1980) and NUCLSQ (Westhof et al., 1988) used the monomer approach. However, in both programs links were hard-coded. In addition, PROLSQ dealt only with polypeptides, whereas NUCLSQ dealt only with nucleic acids. Extension to other polymers such as sugars or mixtures of different polymers was almost impossible and required substantial modifications to the code.
The full exploitation of the advantages of the monomer approach requires dynamic definition of links and modifications; for example, by the use of code-independent external data files. It also requires the availability of accurate complete descriptions of monomers, links and modifications. Ideally, a package that uses this dictionary design should have tools to add new entries and to update old ones.
The dictionary used by the program REFMAC5 has been designed according to flexibility criteria. It is largely based on the monomer approach described in the previous section and allows dynamic definition of links and modifications. It contains carefully analysed descriptions for most common monomers, modifications and links. When necessary, owing to the impossibility of storing all possible information, prior chemical knowledge is managed semi-automatically using the atom-type approach.
The REFMAC5 dictionary is written in an extended mmCIF format (Bourne et al., 1997). This is based on the STAR style (Hall, 1991) and the CIF format (Hall et al., 1991) used in small-molecule crystallography. The attractive side of the mmCIF format is that any data file based on it can easily be extended without affecting the functionalities of programs already using it.
The REFMAC5 dictionary contains a list of monomers, modifications and links along with their descriptions. Monomer descriptions define the stereochemical parameters of independent compounds. Modifications and links encapsulate the changes brought about on them by chemical reactions. Modifications typically act on a single monomer, whilst links join monomers together.
The currently distributed version of the dictionary has entries for all amino acids as well as for many of their possible modifications, for all nucleic acids and some of their modifications and for most common sugars and their modifications. It also has entries for many organic and inorganic compounds frequently encountered when solving macromolecular structures. As some monomers have several well established common names, the dictionary contains a list of synonyms capable of handling them. The dictionary also contains frequently encountered links such as trans/cis and methylated peptide links, sugar–sugar and sugar–protein links, as well as DNA/RNA links.
More than 2000 monomer entries, 100 modification entries and 200 link entries are currently available. Such a large dictionary covers most common users' needs. A full list of monomers, modifications and links available within the REFMAC5 dictionary can be found at the web page https://www.ysbl.york.ac.uk/~alexei/dictionary.html .
The dictionary can be extended easily by users. Users can create and organize personal monomer entries as well as modifications and links. In case of conflict, a user's definitions always override those stored within the distributed dictionary.
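The override rule can be pictured with a minimal sketch. It assumes, purely for illustration, that entries are held in tables keyed by entry name; the function name and the placeholder descriptions are hypothetical and do not reflect how REFMAC5 stores its dictionary internally.

```python
def merge_entries(distributed, user):
    """Return the effective entry table; user entries win on name clashes."""
    merged = dict(distributed)  # copy, so the distributed table is left untouched
    merged.update(user)         # a user's definition overrides the distributed one
    return merged


# Placeholder descriptions (hypothetical, for illustration only).
distributed = {"ALA": "distributed ALA description",
               "SER": "distributed SER description"}
user = {"SER": "user SER description",
        "LIG": "user LIG description"}

effective = merge_entries(distributed, user)
```

In this sketch the user's SER entry replaces the distributed one, the new LIG entry is added and ALA is taken unchanged from the distributed table.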
At present, the dictionary is used mainly by the program REFMAC5 for restrained refinement. However, its organization is such that it can easily be used by other programs dealing with macromolecules; for example, the model-building program COOT (Emsley & Cowtan, 2004). Applications for molecular simulation and modelling that use the REFMAC5 dictionary are currently being developed.
For a monomer to be completely defined, information must be available about its constituent atom(s) and, if present, about its bonds, angles, torsion angles, planes and chiral centres. An example of a complete monomer description is given in Fig. 1.
Monomers are described by the following categories.
Ideal values for bond lengths and bond angles for standard amino acids present in the dictionary have been taken from Engh & Huber (1991). Ideal values for bond lengths and angles for nucleic acids have been taken from Kennard & Taylor (1982). Ideal values for bond lengths and angles for most saccharides have been taken from Saenger (1983).
At present, about 1000 monomers of the 2000 available in the REFMAC5 dictionary are present with a complete description. The remaining monomers are present with a minimal description. Work is in progress to deliver in the shortest time possible a dictionary in which all entries are present with checked complete descriptions.
A modification is a formalism which describes changes brought about on a single monomer by chemical reactions. An example of modification is shown in Fig. 2(a). Its dictionary description is given in Fig. 3. A modification allows atoms, bonds, angles, torsion angles, planes and chiral centres to be added to or deleted from monomers. The use of modifications greatly reduces the number of monomer descriptions that need to be stored and allows proper description of links between monomers, as some of them require monomers to first undergo modifications prior to linkage. Modifications can also be used for non-chemical changes on monomers such as changes in residue name. This is a convenient way of handling cases of multiple monomer names. In such cases the modification keyword is `RENAME'. This keyword is also used to overcome the three-letter restriction imposed by the PDB convention.
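The idea of a modification as a set of additions and deletions can be sketched as follows. This is a minimal illustration restricted to atoms and bonds; the class names, field names and the simplified serine fragment are assumptions made for the example and do not reproduce the dictionary format shown in Fig. 3.

```python
from dataclasses import dataclass, field


def bond(a, b):
    """An unordered bond between two atom names."""
    return frozenset((a, b))


@dataclass
class Monomer:
    name: str
    atoms: set = field(default_factory=set)
    bonds: set = field(default_factory=set)


@dataclass
class Modification:
    name: str
    delete_atoms: set = field(default_factory=set)
    add_atoms: set = field(default_factory=set)
    add_bonds: set = field(default_factory=set)


def apply_modification(monomer, mod, new_name):
    """Return a new monomer with the modification applied."""
    atoms = (monomer.atoms - mod.delete_atoms) | mod.add_atoms
    # Bonds involving a deleted atom disappear together with that atom.
    kept = {b for b in monomer.bonds if not (b & mod.delete_atoms)}
    bonds = kept | mod.add_bonds
    if any(not b <= atoms for b in bonds):
        raise ValueError("modification refers to an atom that does not exist")
    return Monomer(new_name, atoms, bonds)


# Simplified side-chain fragment (illustrative atom names only).
ser = Monomer("SER", {"CB", "OG", "HG"}, {bond("CB", "OG"), bond("OG", "HG")})
phospho = Modification("phosphorylation", delete_atoms={"HG"},
                       add_atoms={"P"}, add_bonds={bond("OG", "P")})
modified = apply_modification(ser, phospho, "SER-modified")
```

Storing one modification and applying it on demand in this way avoids keeping a separate full description for every modified form of a monomer.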
The link formalism allows the joining of monomers together. An example of a link is shown in Fig. 2(b). Its description is given in Fig. 4. Links can be considered to be the external counterpart of monomer descriptions. Whereas monomer descriptions give the internal structure of single chemical compounds, link descriptions define in detail the result of chemical reactions between monomers. Link descriptions contain information about the monomers or the group of monomers they act on as well as about the modifications these monomers should undergo prior to linkage. In the current version of the dictionary, a link can form only one bond. However, the introduction of several angles, torsion angles, planes and chiral centres is allowed.
Although the REFMAC5 dictionary is largely based on monomers, it also contains an atom-type library. At present, it contains about 200 atom types. It includes all chemical elements as well as many atom types commonly encountered in chemistry. Each entry has information about the chemical element the atom type belongs to as well as about its van der Waals (VDW) and ionic radii. The atom-type library also contains information about possible bonds between atom types. For many pairs of atom types, bond orders and bond lengths are tabulated. Angles corresponding to some of the atom-type triplets are also listed. The atom-type library is in mmCIF format. Therefore, it can easily be updated and extended. A full list of all atom-type library entries can be found at the web page https://www.ysbl.york.ac.uk/~alexei/dictionary.html .
The bond lengths listed in the atom-type library have been taken from the International Tables for Crystallography (Allen et al., 1992; Orpen et al., 1992). VDW and ionic radii of atoms have been taken from various sources including Greenwood & Earnshaw (1989) and Cotton & Wilkinson (1972). Unfortunately, to our knowledge there is no single general reference for bond angles. Some of the angles have been taken from examples from the Cambridge Structural Database (Allen, 2002), while others have been derived using general information about atoms, i.e. their hybridization and the nature of the surrounding atoms.
The atom-type library serves two main purposes: (i) it provides information about VDW and ionic radii as well as about atoms' hydrogen-bonding capability that is used to define non-bonding interactions in the course of refinement and (ii) it provides information about initial bond lengths and angles when new monomer entries are created.
Generation of restraints for crystallographic refinement requires that all monomers present in the file which represent the structure to be refined are completely described.
In addition to the REFMAC5 dictionary, we have developed tools that allow the generation of complete monomer descriptions when these are not available. This is typically the case when new monomers, i.e. monomers not present in the dictionary, or dictionary entries stored with only a minimal description are encountered. The most usual new monomers are ligands that are not commonly found in macromolecular crystallographic analysis, which are of interest principally in the context of particular crystallographic projects. The program LIBCHECK is part of the CCP4 package and can be used to create complete monomer dictionary entries. LIBCHECK is best used through its graphical front-end SKETCHER (Potterton et al., 2003), which is part of the CCP4i interface.
A complete monomer description can be generated once a minimal description is available. The algorithm used in this procedure is described in §5.3. Users have two main ways to create minimal descriptions for new monomers: (i) from their chemical structure and (ii) from their Cartesian coordinates.
If the only reliable information available about a new monomer is its chemical structure, the best way to create its minimal description is with the aid of the program SKETCHER. The monomer is simply drawn, specifying its constituent atoms and bonds (Fig. 5a). The information represented graphically is essentially the minimal monomer description. Using this information, LIBCHECK creates the complete monomer description using the algorithm described in §5.3. If desired, REFMAC5 can be invoked to idealize the structure. After the optimization, all information returns to SKETCHER and is displayed (Fig. 5b). The user can therefore check whether the desired description has been produced.
Although the primary purpose of LIBCHECK is to generate dictionary entries from which restraints for refinement can be created, this program can also be utilized to generate monomer Cartesian coordinates. This is particularly useful when a set of initial coordinates is required for model building. The program LIBCHECK can be invoked from the graphical interface SKETCHER to generate monomer Cartesian coordinates from the complete monomer description. As this description contains all necessary information, it is perfectly suited for coordinate generation. To this end, the Z-matrix representation of the monomer is used. This representation is commonly encountered in computational chemistry (Leach, 1997) and is closely related to the internal representation of monomers.
In general, the Z matrix contains 3N − 6 parameters, where N is the number of atoms present in the monomer. This matrix does not define the overall position or orientation of the monomer. The first atom of the monomer is arbitrarily placed at the origin of the Cartesian coordinate system. For the second atom, only the bond length with respect to the first atom is stored. This atom can therefore be given any triplet of coordinates that satisfies the known bond length. For the third atom, the bond length with respect to the second atom and the bond angle formed with the first two atoms are given. Any position that satisfies these two parameters can be given to the third atom. Starting from the fourth atom, all atoms are defined by bond lengths, bond angles and torsion angles.
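The construction described above can be sketched in a few lines. This is a minimal illustration, not LIBCHECK's implementation: the row layout of the Z matrix, the choice of the x axis for the second atom and the xy plane for the third, and all function names are assumptions made for the example. Collinear reference atoms are not handled.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def zmatrix_to_cartesian(zmat):
    """Rows: (), (i, r), (i, r, j, angle), then (i, r, j, angle, k, torsion).

    i, j, k index earlier atoms; r is the bond length to atom i, angle is
    new-i-j and torsion is new-i-j-k, both in degrees.
    """
    coords = []
    for n, row in enumerate(zmat):
        if n == 0:                      # first atom: placed at the origin
            coords.append((0.0, 0.0, 0.0))
        elif n == 1:                    # second atom: only a bond length
            coords.append((row[1], 0.0, 0.0))
        elif n == 2:                    # third atom: placed in the xy plane
            i, r, j, ang = row
            c, t = coords[i], math.radians(ang)
            u = _unit(_sub(coords[j], c))   # points from atom i towards atom j
            coords.append((c[0] + r * math.cos(t) * u[0],
                           c[1] + r * math.sin(t), 0.0))
        else:                           # general case: length, angle, torsion
            i, r, j, ang, k, tor = row
            c, b, a = coords[i], coords[j], coords[k]
            t, p = math.radians(ang), math.radians(tor)
            bc = _unit(_sub(c, b))
            nrm = _unit(_cross(_sub(b, a), bc))
            m = _cross(nrm, bc)
            d = (-r * math.cos(t),
                 r * math.sin(t) * math.cos(p),
                 r * math.sin(t) * math.sin(p))
            coords.append(tuple(c[q] + d[0] * bc[q] + d[1] * m[q] + d[2] * nrm[q]
                                for q in range(3)))
    return coords


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    return math.degrees(math.atan2(math.sqrt(_dot(b2, b2)) * _dot(b1, n2),
                                   _dot(n1, n2)))


# Four atoms need 1 + 2 + 3 = 6 = 3N - 6 internal parameters.
zmat = [(), (0, 1.5), (1, 1.5, 0, 110.0), (2, 1.0, 1, 109.5, 0, 60.0)]
coords = zmatrix_to_cartesian(zmat)
```

The helper `dihedral` is included only so that the generated coordinates can be checked against the torsion angle that was requested.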
If reliable Cartesian coordinates are available for a monomer, its minimal description can be extracted from them. To this end, the monomer-connectivity graph is first derived from atom coordinates. Two atoms are considered bonded if their distance is shorter than a certain threshold which is element-dependent. If atoms belong to the group (C, N, O, S, P, B, I, Cl) they are considered bonded if their distance is within 1.8 Å. If one atom belongs to the above list and the other is a heavy atom they are considered bonded if their distance is shorter than 2.3 Å. If atoms are heavy atoms they are considered bonded if their distance is within 2.8 Å. H atoms are considered bonded to any atom if their distance is shorter than 1.2 Å.
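The distance thresholds above translate directly into a connectivity test. The following is a minimal sketch under two assumptions that the text does not spell out: a "heavy atom" is taken to be any non-hydrogen element outside the listed group, and both "within" and "shorter than" are treated as a strict inequality. The function names and the test coordinates are illustrative.

```python
import math
from itertools import combinations

# Element group named in the text (upper-case element symbols).
LIGHT = {"C", "N", "O", "S", "P", "B", "I", "CL"}


def bond_threshold(e1, e2):
    """Maximum bonded distance in angstroms for a pair of elements."""
    e1, e2 = e1.upper(), e2.upper()
    if "H" in (e1, e2):
        return 1.2
    n_light = (e1 in LIGHT) + (e2 in LIGHT)
    if n_light == 2:
        return 1.8
    if n_light == 1:
        return 2.3          # one listed element, one heavy atom
    return 2.8              # two heavy atoms


def derive_connectivity(atoms):
    """atoms: list of (name, element, (x, y, z)); returns a set of bonds."""
    bonds = set()
    for (n1, e1, p1), (n2, e2, p2) in combinations(atoms, 2):
        if math.dist(p1, p2) < bond_threshold(e1, e2):
            bonds.add(frozenset((n1, n2)))
    return bonds


# Made-up coordinates chosen only to exercise each threshold.
atoms = [("C1", "C", (0.0, 0.0, 0.0)),
         ("O1", "O", (1.43, 0.0, 0.0)),
         ("H1", "H", (2.0, 0.77, 0.0)),
         ("ZN", "ZN", (1.43, 2.1, 0.0))]
bonds = derive_connectivity(atoms)
```

Because the test is purely geometric, it is only as reliable as the input coordinates, which is why high-quality coordinates are required.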
Bond orders and an approximation to atom types are then defined iteratively. The procedure is schematically as follows.
Once all bond orders have been defined, a complete monomer description can be created according to the algorithm presented in the following section. It is important to stress that monomer descriptions should be derived from Cartesian coordinates only when these are of high quality. Small errors in bond lengths affect the derived bond orders and thus the final description of the monomer. Reliable coordinates for small molecules can be obtained from the Cambridge Structural Database (CSD; Allen, 2002) if their crystallographic structures have been determined. Alternatively, if only approximate coordinates are available these can be improved using molecular-mechanics or quantum-chemical geometry-optimization tools available within various packages.
Once a minimal monomer description is at hand, it can be used to derive the complete description. In brief, the algorithm used for this purpose is the following.
Two web resources can produce complete monomer descriptions compatible with REFMAC5. The first resource is hosted at the European Bioinformatics Institute (Golovin et al., 2004) and can be accessed at the web address https://www.ebi.ac.uk/msd-srv/chempdb/cgi-bin/cgi.pl . The second is the program PRODRG (van Aalten et al., 1996), which can be found at the web address https://davapc1.bioch.dundee.ac.uk/programs/prodrg/prodrg.html . Other programs, such as the CORINA suite (Sadowski et al., 1994) available from https://www2.chemie.uni-erlangen.de/software/corina/index.html , can also give coordinates that can be used to create REFMAC5 monomer descriptions. CORINA can be used with help of the CACTVS (Ihlenfeldt et al., 1994) interface.
Generation of restraints for the purpose of crystallographic refinement is an entirely automatic process conducted within REFMAC5 refinement runs when all monomers listed in a coordinate file are present in the dictionary with a complete description. This is the case when protein and nucleic acid structures as well as their complexes are refined. It is also the case when common sugars and organic/inorganic compounds are present in the structure. How to deal with ligands for which no dictionary entries are available and how to create complete descriptions for them has been described in §5.
For restraints to be applied in refinement, the various monomers present in a coordinate file (typically in PDB format) need first to be recognized in the dictionary so that information can be taken from their complete descriptions; if minimal descriptions are available for some of them, complete descriptions can be created as described in §5.3.
Atom coordinates corresponding to the various monomers present in the file are used to create a calculated graphical representation of the monomers. These monomers are matched against those present in the dictionary. Matching is carried out at the level of monomer and atom names and on the basis of connectivity. If some atoms are missing in the input file, the current implementation assumes an inaccuracy in the input file. These atoms are simply flagged as missing atoms. If requested they can be restored.
Failure in monomer matching activates a quick search of the whole dictionary. A sketchy description of the algorithm used in this search is as follows.
In the current version of the algorithm the properties for the file monomer described above must be mirrored exactly by those of the dictionary monomer. In principle, softer criteria could also be used. This option is not yet available.
If no dictionary monomer can be matched with the quick-search procedure, a more expensive global search can be performed. This search uses Ullman's graph-matching algorithm (Ullman, 1976), which allows exact subgraph matching. If even the global search is unable to find a matching monomer in the dictionary, the file monomer is treated as a `new monomer' and its complete description is generated from its coordinates as described in §5.2.
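To illustrate what graph-based matching of monomers involves, the following sketch performs subgraph matching on labelled graphs. It is a plain backtracking search, not the matrix-refinement procedure of the cited algorithm, and it compares only element labels and bonded contacts; the graph representation and all names are assumptions made for the example.

```python
def _adjacency(graph):
    """Build a neighbour table from a list of (atom, atom) bonds."""
    adj = {atom: set() for atom in graph["elements"]}
    for a1, a2 in graph["bonds"]:
        adj[a1].add(a2)
        adj[a2].add(a1)
    return adj


def find_subgraph_match(pattern, target):
    """Map every pattern atom onto a distinct target atom, or return None.

    Both graphs are dicts with 'elements' ({atom: element}) and 'bonds'
    (list of atom pairs). Every pattern bond must also exist in the target.
    """
    p_adj, t_adj = _adjacency(pattern), _adjacency(target)
    # Place the most connected pattern atoms first to prune early.
    order = sorted(pattern["elements"], key=lambda a: -len(p_adj[a]))

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        p = order[len(mapping)]
        for t in target["elements"]:
            if t in mapping.values():
                continue
            if pattern["elements"][p] != target["elements"][t]:
                continue
            if len(t_adj[t]) < len(p_adj[p]):
                continue
            # Every already-placed neighbour of p must be bonded to t.
            if all(mapping[q] in t_adj[t] for q in p_adj[p] if q in mapping):
                mapping[p] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[p]      # backtrack and try the next candidate
        return None

    return extend({})


pattern = {"elements": {"c": "C", "o": "O"}, "bonds": [("c", "o")]}
target = {"elements": {"C1": "C", "O1": "O", "N1": "N"},
          "bonds": [("C1", "O1"), ("C1", "N1")]}
match = find_subgraph_match(pattern, target)

peroxide = {"elements": {"a": "O", "b": "O"}, "bonds": [("a", "b")]}
no_match = find_subgraph_match(peroxide, target)
```

The cost of such a search grows rapidly with the number of atoms, which is consistent with the global search being reserved for cases in which the quick search fails.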
Monomers in the file are assigned to different chains depending on their group types (peptide, DNA/RNA, pyranose, non-polymer) and other criteria. In the case of polypeptides and DNA/RNA group types, chain-identification codes, consecutive residue numbering and optionally distance criteria are employed. If distance criteria are used, consecutive monomers in the coordinate file are considered to be linked if their distance is shorter than a predefined value. For example, in the case of peptide monomers, monomers are considered to be linked if the distance between the C atom of residue i and the N atom of residue i + 1 is less than 1.728 Å. Distance criteria are useful when dealing with insertions and deletions. If none of the criteria above is fulfilled, a `gap' link is created. For saccharide-group types, distance and same chain-identification code criteria are used. Non-polymers cannot form chains.
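The distance criterion for peptides can be sketched as follows. This illustration applies only the 1.728 Å test between consecutive residues and ignores chain-identification codes and residue numbering; the data layout, the function name and the coordinates are assumptions made for the example.

```python
import math

PEPTIDE_CN_MAX = 1.728  # angstroms, the value quoted in the text


def assign_peptide_links(residues, max_distance=PEPTIDE_CN_MAX):
    """Classify each consecutive residue pair as 'link' or 'gap'.

    residues: list of dicts holding 'C' and 'N' coordinates, in file order.
    """
    links = []
    for current, following in zip(residues, residues[1:]):
        d = math.dist(current["C"], following["N"])
        links.append("link" if d < max_distance else "gap")
    return links


# Made-up coordinates: the second pair is too far apart to be bonded.
residues = [{"N": (-2.4, 0.0, 0.0), "C": (0.0, 0.0, 0.0)},
            {"N": (1.33, 0.0, 0.0), "C": (3.7, 0.0, 0.0)},
            {"N": (7.5, 0.0, 0.0), "C": (9.9, 0.0, 0.0)}]
links = assign_peptide_links(residues)
```

A purely distance-based test of this kind does not depend on residue numbering, which is what makes it useful for files containing insertions and deletions.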
Once chains have been identified, the program applies standard links according to the chain's group types. For example, for peptide chains trans/cis-peptide links are defined according to the ω angle. If the angle is between 80 and 280° the trans conformation is assumed. Similarly, standard links for polysaccharides and DNA/RNA are also defined.
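The ω criterion amounts to a simple range test. In the sketch below the angle is first mapped into the range 0 to 360°, so that the common convention of reporting ω between −180 and 180° is also covered; assigning cis to everything outside the quoted range and excluding the boundary values are assumptions, as is the function name.

```python
def classify_peptide_link(omega_deg):
    """Return 'trans' if omega lies between 80 and 280 degrees, else 'cis'."""
    omega = omega_deg % 360.0   # e.g. -175 becomes 185
    return "trans" if 80.0 < omega < 280.0 else "cis"


examples = {w: classify_peptide_link(w) for w in (180.0, -175.0, 5.0, -10.0)}
```

The wide acceptance window means that even strongly distorted peptides are assigned to one of the two standard links rather than being left undefined.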
Once chains have been defined, standard links such as disulfide bridges and sugar–protein links are automatically activated on the basis of distance criteria. If user-defined links are found in the header of the coordinate file, these are used. Distance criteria are also used to recognize potential links that are not part of the dictionary. For example, if two atoms are found to be very close, a potential link between them is proposed. By default, links determined in this way are not used unless explicitly requested.
After all standard links and modifications have been defined, other links, such as links between alternative conformations and symmetry-related atoms, are dealt with. By default, monomers in alternative conformations are assigned the same links. For example, if two cysteine residues (Cys1 and Cys2) are present in double conformation (A and B), by default both will be considered involved in SS links even if only one pair is engaged in a disulfide bridge according to distance criteria. `S' links will be formed between the SG atoms of ACys1 and ACys2 and between the SG atoms of BCys1 and BCys2, respectively. The user has the possibility of modifying the default assignments, as shown in Fig. 7. The same figure shows an example of how links between symmetry-related elements are defined.
Links between alternative conformations are a flexible tool that can handle complex situations. For example, in the crystal structure of β-mannanase in complex with 2,4-dinitrophenyl 2-deoxy-2-fluoro-β-mannotrioside the unhydrolysed substrate and a covalent intermediate are present simultaneously (Ducros et al., 2002; Fig. 8). In the unhydrolysed form of the substrate the sugar moiety MAF is bound to the BEN compound, whereas in the covalent intermediate MAF is bound to a glutamate residue. In the intermediate the CD–OE1 and CD–OE2 bonds of the glutamate are no longer equivalent. Once the descriptions of all links and modifications required have been defined, links between alternative conformations can handle this complicated type of situation (see Fig. 7).
A flexible machine/human-readable dictionary of monomers, links, modifications and related items has been created and tested on a wide range of compounds. The dictionary is currently used for macromolecular restrained refinement by the program REFMAC5. It can also be used by other macromolecular programs such as model-building and macromolecular-modelling and simulation applications.
Flexibility in the organization of the dictionary allows researchers to add personal entries and to override existing descriptions. The most common crystallographic restraints are dealt with in an automatic manner. Complicated cases can also be handled with some user intervention.
Tools and algorithms have been developed to update and add new entry descriptions semi-automatically. If reliable coordinates are available, `ideal' restraints can be extracted from them. When monomer descriptions are created from chemical structures, target values for restraints are taken from a built-in atom-type library. Currently, work to improve restraints using Cambridge Structural Database tools and quantum-chemical calculations is being considered.
Although addition of new entries is at present fairly automatic, the generation of links and modifications requires user intervention. Automation of the latter process requires a database of possible reactions encountered in macromolecules. Link descriptions allow not only the definition of covalent bonds but also of other bonds. The automatic handling of Watson–Crick restraints between base pairs is currently under development. Future versions of the dictionary will contain tools to use popular computational chemical file formats such as SMILES (Weininger, 1988) and MDL MOLFILES (Dalby et al., 1992).
The dictionary is distributed by CCP4 under the Part 0 licence that is LGPL-compatible. Programs and interface are available from CCP4 under the Part 2 licence. Neither programs nor dictionary nor algorithms have been patented in order to make sure that they are available to users as well as to the developer community.
‡Current address: IFOM – The FIRC Institute of Molecular Oncology, Via Adamello 16, 20139 Milano, Italy
This work was supported by grants from the Wellcome Trust (GNM, RAS), BBSRC (AAV, AAL) and CCP4 (LP, SM, FL). We thank people from the YSBL, CCP4 staff and the user community for their continuous support and encouragement.
Aalten, D. van, Bywater, R., Findlay, J., Hendlich, M., Hooft, R. & Vriend, G. (1996). J. Comput. Aided Mol. Des. 10, 255–262.
Allen, F. H. (2002). Acta Cryst. B58, 380–388.
Allen, F., Kennard, O., Watson, D., Brammer, L., Orpen, A. & Taylor, R. (1992). International Tables for Crystallography, Vol. C, edited by A. J. C. Wilson, pp. 685–706. Dordrecht: Kluwer Academic Publishers.
Allinger, N. (1977). J. Am. Chem. Soc. 99, 8127.
Berman, H. M., Battistuz, T., Bhat, T. N., Bluhm, W. F., Bourne, P. E., Burkhardt, K., Feng, Z., Gilliland, G. L., Iype, L., Jain, S., Fagan, P., Marvin, J., Padilla, D., Ravichandran, V., Schneider, B., Thanki, N., Weissig, H., Westbrook, J. D. & Zardecki, C. (2002). Acta Cryst. D58, 899–907.
Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.
Bourne, P., Berman, H., McMahon, B., Watenpaugh, K., Westbrook, J. & Fitzgerald, P. (1997). Methods Enzymol. 277, 571–590.
Bricogne, G. (1997). Methods Enzymol. 276, 361–423.
Brooks, B., Bruccoleri, R., Olafson, B., States, D., Swaminathan, S. & Karplus, M. (1983). J. Comput. Chem. 4, 187–217.
Brünger, A. T. (1992). X-PLOR Manual, Version 3.1. Yale University, New Haven, USA.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Cahn, R., Ingold, C. & Prelog, V. (1966). Angew. Chem. 78, 413–447.
Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760–763.
Cotton, F. & Wilkinson, G. (1972). Advanced Inorganic Chemistry. New York: Interscience.
Dalby, A., Nourse, J., Hounshell, D., Gushurst, A., Grier, D., Leland, B. & Laufer, J. (1992). J. Chem. Inf. Comput. Sci. 32, 244–255.
Diamond, R. (1971). Acta Cryst. A27, 436–452.
Ducros, V., Zechel, D., Murshudov, G., Gilbert, H., Szabo, L., Stoll, D., Withers, S. & Davies, G. (2002). Angew. Chem. Int. Ed. 41, 2824–2827.
Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126–2132.
Engh, R. A. & Huber, R. (1991). Acta Cryst. A47, 392–400.
Golovin, A. et al. (2004). Nucleic Acids Res. 32, D211–D216.
Greenwood, N. & Earnshaw, A. (1989). Chemistry of the Elements. Oxford: Pergamon Press.
Hall, S. (1991). J. Chem. Inf. Comput. Sci. 31, 326–333.
Hall, S., Allen, A. & Brown, I. (1991). Acta Cryst. A47, 655–685.
Ihlenfeldt, W., Takahashi, Y., Abe, H. & Sasaki, S. (1994). J. Chem. Inf. Comput. Sci. 34, 109–116.
IUPAC (1979). Nomenclature of Organic Chemistry, Sections A, B, C, D, E, F and H. Oxford: Pergamon Press.
Jack, A. & Levitt, M. (1978). Acta Cryst. A34, 931–935.
Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Acta Cryst. A47, 110–119.
Kennard, O. & Taylor, R. (1982). J. Am. Chem. Soc. 104, 3209–3212.
Konnert, J. & Hendrickson, W. (1980). Acta Cryst. A36, 344–350.
Leach, A. (1997). Molecular Modelling: Principles and Applications. Singapore: Longman.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.
Orpen, A., Brammer, L., Allen, F., Kennard, O., Watson, D. & Taylor, R. (1992). International Tables for Crystallography, Vol. C, edited by A. J. C. Wilson, pp. 707–791. Dordrecht: Kluwer Academic Publishers.
Pearlman, D., Case, D., Caldwell, J., Ross, W., Cheatham, T. III, DeBolt, S., Ferguson, D., Seibel, G. & Kollman, P. (1995). Comput. Phys. Commun. 91, 1–41.
Ponder, J. & Case, D. (2003). Adv. Protein Chem. 66, 27–85.
Potterton, E., Briggs, P., Turkenburg, M. & Dodson, E. (2003). Acta Cryst. D59, 1131–1137.
Sadowski, J., Gasteiger, J. & Klebe, G. (1994). J. Chem. Inf. Comput. Sci. 34, 1000–1008.
Saenger, W. (1983). Principles of Nucleic Acid Structure. Berlin: Springer-Verlag.
Terwilliger, T. C. (2003). Acta Cryst. D59, 1688–1701.
Ullman, J. (1976). J. Assoc. Comput. Mach. 23, 31–42.
Waser, J. (1963). Acta Cryst. 16, 1091–1094.
Weininger, D. (1988). J. Chem. Inf. Comput. Sci. 28, 31–36.
Westhof, E., Dumas, P. & Moras, D. (1988). Acta Cryst. A44, 112–123.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.