research papers
AceDRG: a stereochemical description generator for ligands
a Structural Studies, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, England, and b Institute of Biotechnology, Saulėtekio al. 7, LT-10257 Vilnius, Lithuania
*Correspondence e-mail: garib@mrc-lmb.cam.ac.uk
The program AceDRG is designed for the derivation of stereochemical information about small molecules. It uses local chemical and topological environment-based atom typing to derive and organize bond lengths and angles from a small-molecule database: the Crystallography Open Database (COD). Information about the hybridization states of atoms, whether they belong to small rings (up to seven-membered rings), ring aromaticity and nearest-neighbour information is encoded in the atom types. All atoms from the COD have been classified according to the generated atom types. All bonds and angles have also been classified according to the atom types and, in a certain sense, bond types. Derived data are tabulated in a machine-readable form that is freely available from CCP4. AceDRG can also generate stereochemical information, provided that the basic bonding pattern of a ligand is known. The basic bonding pattern is perceived from one of the computational chemistry file formats, including SMILES, mmCIF, SDF MOL and SYBYL MOL2 files. Using the bonding chemistry, atom types, and bond and angle tables generated from the COD, AceDRG derives the `ideal' bond lengths, angles, plane groups, aromatic rings and chirality information, and writes them to an mmCIF file that can be used by the refinement program REFMAC5 and the model-building program Coot. Other refinement and model-building programs such as PHENIX and BUSTER can also use these files. AceDRG also generates one or more coordinate sets corresponding to the most favourable conformation(s) of a given ligand. AceDRG employs RDKit for chemistry perception and for initial conformation generation, as well as for the interpretation of SMILES strings, SDF MOL and SYBYL MOL2 files.
Keywords: AceDRG; refinement; ligand chemistry; Crystallography Open Database; RDKit.
1. Introduction
Macromolecular crystallography (MX) is the most widely used experimental technique in structural biology that allows the study of three-dimensional structures of macromolecules in atomic, and sometimes electronic, detail, which is an essential step in understanding biological processes. In recent years, single-particle cryo-EM has made substantial advances (Kühlbrandt, 2014) and thus is now being used routinely. Both techniques allow the derivation of snapshots of reactions or molecular binding processes. For this type of study, a structure of a single molecule is often not sufficient; it is more common to study structures of macromolecules in complex with small ligands mimicking intermediate states or close to a transition state. Moreover, the quality and quantity of the experimental data are often deficient (low resolution with small signal-to-noise ratio). This means that the data alone are not sufficient to derive chemically and structurally sensible atomic models; the data must be supplemented by prior knowledge pertaining to the chemistry and structure of the molecules under study in order to address the problem of missing high-resolution information (Murshudov et al., 2011; Nicholls et al., 2012; Schröder et al., 2010; Adams et al., 2010; Smart et al., 2012). Experimental data produced by MX and cryo-EM usually contain long-range information. As the resolution of the data increases, shorter and shorter-range information becomes available. Owing to the mobility of atoms and dynamic/static disorder, even at very high resolution electronic details are not visible, the signal is reduced and thus local resolution is reduced. Additional information is almost always needed. The most widely used information is that regarding the chemistry of bonds and angles (Vagin et al., 2004). This was recognized a long time ago, and has been used to stabilize atomic structure when only limited and noisy data are available. 
For amino acids and nucleic acids the `ideal' values have been tabulated a number of times (Engh & Huber, 1991, 2001; Parkinson et al., 1996). There are several good software tools designed for the derivation of accurate values for the bonds and angles in small molecules (Moriarty et al., 2009; Smart et al., 2011; Schüttelkopf & van Aalten, 2004). These are either based on molecular-mechanics force fields, Mogul (Bruno et al., 2004) from the Cambridge Structural Database (CSD) or semi-empirical quantum-chemical (QM) calculations (Rocha et al., 2006). Programs such as LIBCHECK (Vagin et al., 2004) and JLigand (Lebedev et al., 2012) available from CCP4 (Winn et al., 2011) can also be used to generate ligand descriptions with sufficient quality.
Information regarding the local geometry of a small compound can be derived using two different approaches.
Although the number of structures (around 367 000) in the COD (Gražulis et al., 2012) is almost three times lower than that (around 900 000) in the CSD (Groom et al., 2016), one main advantage of the COD is that it is free, in the sense that all its data have been placed in the public domain by the COD contributors, and derived data can be freely distributed. Therefore, testing developed algorithms using the COD is relatively easy. However, the developed algorithm and its implementation in AceDRG is such that any source of reliable data, including the CSD or high-level QM-derived structures, can be used to regenerate/supplement the existing database of atom types, bonds and angles. Moreover, the CSD already offers very good state-of-the-art tools for the derivation of ligand descriptions based on entries in the CSD, specifically Mogul (Bruno et al., 2004). It should be noted that phenix.elbow (Moriarty et al., 2009) from the PHENIX software suite (Adams et al., 2010), grade (Smart et al., 2011) from Global Phasing and pyrogen from Coot (Emsley et al., 2010) use Mogul to generate accurate ligand descriptions. We decided to develop an alternative algorithm to derive bonds and angles from the COD and generate ligand descriptions. In designing the algorithms and software, we were mindful that the database should be dynamically extensible, i.e. as the number of small molecules increases, or new sources of small-molecule structures become available, this database can be updated with little effort.
The Protein Data Bank (PDB; Berman et al., 2002) is a rich source of information about structures and macromolecular chemistry. The wider community of biologists often use the entries deposited in the PDB without having much background in structural biology. Therefore, it is necessary to make sure that the entries deposited are of sufficient reliability and accuracy, and that they are consistent with the experimental data as well as with prior chemical and structural information. The PDB has done excellent work in the organization of data, including that pertaining to ligand chemistry (Dimitropoulos et al., 2006; Feng et al., 2004). However, despite the efforts by the wwPDB, there are still a number of errors in the PDB, especially regarding ligands (Pozharski et al., 2013; Weichenberger et al., 2013). There have even been some claims that most of the errors in the PDB are owing to errors in ligands (Liebeschuetz et al., 2012; Reynolds, 2014). Most of the errors can be attributed to overinterpretation and misinterpretation of the electron density, with the experimenter having a strong desire to see ligand electron density, which is often the focus of studies involving ligand–protein complexes. However, the number of errors owing to the inaccurate chemical description of ligands is not negligible. In general, it would be very hard to describe ligand geometry if incorrect ligand chemistry is assumed. However, it is possible to reduce such errors by accurately designing a software program with some chemical intelligence. AceDRG is designed to reduce such errors, giving a sufficiently accurate ligand description and thus helping to reduce the number of errors in the PDB.
The current version of AceDRG makes extensive use of tools available in the computational chemistry toolkit RDKit (https://www.rdkit.org).
Organization of this paper. In §2, we briefly introduce the program AceDRG. We then describe atom types including ring and aromaticity perception in §3. In §4, we describe the organization of derived atom-type, bond and angle tables. In §5, we describe the derivation of stereochemical information about ligands. §6 gives several examples of application. Finally, we summarize the current state and give our views on future perspectives. This paper attempts to describe the algorithms implemented in the program AceDRG. The source code is available from CCP4 (Winn et al., 2011) under the LGPL license; further details can be found in the code and documentation.
2. AceDRG
AceDRG is a multifunctional software tool that analyses molecules in small-molecule databases (currently only the COD), extracts all atom types, bond lengths and angles from those databases, and organizes them in a hierarchical manner. It reads an input file containing basic chemical information about a ligand, such as a bonding graph and stereochemistry. It derives atom types from the bonding graph and maps them to those extracted from the small-molecule database. It can also generate one or more coordinate sets corresponding to energetically favourable conformation(s) of ligands.
3. Atom types
The atom typing used in AceDRG encapsulates the local topological and chemical environments of atoms. This includes the atom's number of bonds and those of its neighbours (up to the third neighbours) and, if they belong to ring(s), information regarding ring size and aromaticity. The current algorithm only considers the extended organic set of atoms: B, C, N, O, S, P, Se, F, Cl, Br, I and H. These atoms cover 93% of the chemical entities contained in the PDB. Dealing with metals requires a different approach; they will be dealt with in the future.
Since the hybridization state of atoms and the number, size and aromaticity of rings play essential roles in atom-type definitions, we shall describe them next.
3.1. Hybridization
Hybridization perception for atoms is performed in several steps. In the first step, for each atom the default hybridization state is assigned using the rules described in Table 1. In brief, hybridization is defined by the difference between the atomic valence and the number of connections. For example, a C atom with four bonds is always sp3, one with three bonds is sp2 and one with two bonds is sp1 (sp). By default, N and B are sp3 if they have three or four bonds, and sp2 if they have two bonds. Formal charges are also assigned during default hybridization assignment. For example, if N has four bonds then its formal charge is +1 and if B has four bonds then its formal charge is −1 (i.e. the difference between the valence and the number of connections). See Table 1 for more details.
For C atoms the default rules are sufficient. However, for some other atoms, such as N, B and O, further refinement is needed.
If an atom is N or B and it has three connections then its hybridization state is revised according to the local chemical environment. If accurate atomic coordinates are available for a particular molecule, for example those from the COD after validation (Long et al., 2017), then we refine the hybridization state as follows. If three atoms are connected to the target atom then three vectors are formed. If all three vectors are co-planar (i.e. they are on the same plane) then the target atom is considered to be sp2. Co-planarity of vectors is equivalent to the statement that one of the vectors is perpendicular to the normal of a plane formed by the remaining two vectors. If abs(θ − 90°) < c then two vectors are considered perpendicular, where θ is the angle between the third vector and the normal of the plane formed by the two remaining vectors. The current value of the parameter c is 5°, although it can be readjusted if necessary. This approach was found to be useful when classifying all atoms from the COD. When working with only a bonding graph, such as from a SMILES string (Weininger, 1988; Weininger et al., 1989), this approach is not applicable.
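The default rules quoted in the text and the coordinate-based co-planarity test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not AceDRG's own code: the function names `default_hybridization` and `is_coplanar_centre` are hypothetical, only the cases spelled out in the text are encoded (Table 1 covers more), and the tolerance defaults to the 5° value of c given above.

```python
import math


def default_hybridization(element, n_bonds):
    """Return (state, formal_charge) for the default cases quoted in the text.

    Cases that the text does not spell out return (None, 0); they are
    covered by Table 1 of the paper, which is not reproduced here.
    """
    if element == "C":
        return {4: "sp3", 3: "sp2", 2: "sp"}.get(n_bonds), 0
    if element in ("N", "B"):
        state = {4: "sp3", 3: "sp3", 2: "sp2"}.get(n_bonds)
        charge = 0
        if n_bonds == 4:
            # Four-connected N carries +1, four-connected B carries -1.
            charge = 1 if element == "N" else -1
        return state, charge
    return None, 0


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def is_coplanar_centre(centre, neighbours, c_deg=5.0):
    """True if the three bond vectors from `centre` are co-planar.

    theta is the angle between the third bond vector and the normal of the
    plane spanned by the first two; the test is abs(theta - 90) < c_deg.
    """
    if len(neighbours) != 3:
        raise ValueError("the test is defined for exactly three neighbours")
    v1, v2, v3 = [tuple(n[i] - centre[i] for i in range(3)) for n in neighbours]
    normal = _cross(v1, v2)
    denom = math.sqrt(_dot(v3, v3)) * math.sqrt(_dot(normal, normal))
    if denom == 0.0:
        raise ValueError("degenerate geometry: collinear or zero-length bonds")
    cos_theta = max(-1.0, min(1.0, _dot(v3, normal) / denom))
    theta = math.degrees(math.acos(cos_theta))
    return abs(theta - 90.0) < c_deg
```

A three-connected N whose neighbours pass `is_coplanar_centre` would be reclassified from the default sp3 to sp2; a clearly pyramidal centre keeps the default.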
If there are no reliable coordinates available then the decision regarding the hybridization state is made according to the local environment of the atom (this part of the algorithm applies for N and B with three bonds). [...] sp3. If both connected atoms are not H atoms, and at least one of them has default sp2 hybridization then the O atom is considered to be sp2.
The hybridization states of H and halogen atoms are always set to `none', as they do not affect the atom types of connected atoms and therefore their classification.
3.2. Ring perception
The set of smallest rings is determined using a modified version of the algorithm described by Downs et al. (1989). The articles by Figueras (1996), Hanser et al. (1996) and Leach et al. (1990) were also consulted. Since the rings are used as part of atomic classification in AceDRG, we need only obtain information regarding any rings containing the atom being classified. Moreover, we need only consider rings with limited size; in the current version we only use rings containing up to seven atoms. The algorithm can be considered as a limited depth-first search algorithm. The depth of the search depends on the maximum ring size to include. However, the algorithm is flexible enough to be extended to larger ring sizes. The algorithm is as follows.
This algorithm finds all small cycles up to a given size. Moreover, the algorithm also gives the list of atoms belonging to the same ring.
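The idea of a depth-limited search for the rings containing one atom can be sketched as follows. This is a generic depth-first search written for illustration, not the modified Downs et al. (1989) procedure used in AceDRG; the names `build_adjacency` and `rings_containing` are hypothetical. The sketch reports every simple cycle through the atom up to `max_size`, identified by its set of atoms, so for small fused systems it can also return an enclosing cycle that a smallest-ring algorithm would discard.

```python
def build_adjacency(bond_list):
    """Turn a list of (atom, atom) bonds into a neighbour dictionary."""
    adjacency = {}
    for a, b in bond_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def rings_containing(atom, adjacency, max_size=7):
    """Return all cycles of at most `max_size` atoms that pass through `atom`.

    Each ring is a frozenset of atoms, so the two traversal directions of
    the same cycle collapse into a single entry.
    """
    found = set()

    def walk(current, path):
        for nxt in adjacency[current]:
            if nxt == atom:
                # Back at the start: a ring needs at least three atoms.
                if len(path) >= 3:
                    found.add(frozenset(path))
            elif nxt not in path and len(path) < max_size:
                walk(nxt, path + [nxt])

    walk(atom, [atom])
    return sorted(found, key=lambda ring: (len(ring), sorted(ring)))


# Naphthalene-like graph: two six-membered rings sharing the bond 4-5.
naphthalene = build_adjacency([
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
    (4, 6), (6, 7), (7, 8), (8, 9), (9, 5),
])
shared_atom_rings = rings_containing(4, naphthalene)  # both six-membered rings
```

Limiting the depth to the maximum ring size keeps the search local to the atom being classified, which is all that atom typing requires.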
3.3. Aromaticity
A ring or fused-ring system is considered to be aromatic if all atoms belonging to the system are in the sp2 state and the number of π electrons obeys Hückel's 4n + 2 rule (Coulson et al., 1978), where n is an integer. Table 2 describes the number of π electrons that each atom contributes to the total count. The algorithm and the table are extended versions of those described by Balaban (1985). In addition to the rules described by Balaban (1985), we add several rules for S, P and O atoms (Wilberg et al., 2001; Chivers, 2005; Chivers & Manners, 2009; Krygowski et al., 2009; Fowler et al., 2004). The whole fused-ring system is considered first. If the whole system obeys Hückel's rule then the whole system, and thus each ring in the system, is considered to be aromatic, even though some of the contributing rings may not obey this rule. If the whole system does not obey the 4n + 2 rule then each of the smallest rings in the system is considered one by one. If any of the smallest rings obeys this rule then it is considered to be aromatic.
3.3.1. Example: FAD and FDA
Flavin adenine dinucleotide (FAD) is a redox cofactor, and in many biological processes it is converted to dihydroflavine-adenine dinucleotide (FDA) by accepting two electrons and two protons (Fig. 2). Both FAD and FDA contain fused three-ring systems: flavin groups in oxidized (FAD) and reduced (FDA) forms. However, these fused systems are very different. In FAD all three rings are contained in one aromatic system. In FDA the outer rings are aromatic, whilst the middle ring is not. As a result, the flavin ring plane can be bent more in FDA than in FAD (Walsh & Miller, 2003).
According to the π-electron count in the flavin of FAD there are 14 π electrons, making it aromatic (14 = 4n + 2, with n = 3). In FDA the number of π electrons in the ring system is 16 (16 = 4n with n = 4). Therefore, the flavin of FDA is considered to be anti-aromatic. In FDA the outer rings both contain six π electrons, making both of them aromatic rings, whilst the middle ring contains eight π electrons and thus is not considered to be aromatic.
Fig. 3 shows different numbers of π electrons in a series of sulfur–nitrogen rings, according to the rules shown in Table 2. These are in agreement with suggestions made in other studies (Wilberg et al., 2001; Fowler et al., 2004; Chivers, 2005; Chivers & Manners, 2009).
3.4. Atom types
Once the bonding graph, atom hybridization, ring membership, size and aromaticities are known then the atoms can be classified using their local topological and chemical environments. For example, in Fig. 4 atom C23 is in the class with identifier code C[5,6a](C[5,5]C[5,5]C[5,6]H)(C[5,6a]C[6a]C[5])(C[6a]C[6a,6a]H){1|O<1>,2|C<4>,2|H<1>,4|C<3>}.
This means that the original atom is a C atom and it belongs to a five-membered non-aromatic and a six-membered aromatic ring (represented by C[5,6a]). It has three first neighbours. The first of those neighbours is a C atom, which belongs to two five-membered rings. This neighbour has three second neighbours: one of them is a C atom belonging to two five-membered rings, the next is also a C atom belonging to five and six-membered rings, and the third is an H atom. Obviously, this first-neighbour atom also connects to the original atom. Similarly, the second first neighbour of the original atom is a C atom belonging to two rings: one five-membered non-aromatic ring and one six-membered aromatic ring. This atom also has two additional neighbours: a C atom belonging to a five-membered non-aromatic ring and a C atom belonging to a six-membered aromatic ring. The third first neighbour of the original atom is a C atom in a six-membered aromatic ring with two additional neighbours: a C atom in two six-membered aromatic rings and an H atom. Finally, the third-neighbour composition of the original atom is as follows: an O atom with one bond, two C atoms with four bonds, two H atoms with one bond and four C atoms with three bonds.
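To make the layout of such an identifier concrete, the following sketch splits the example code into its three parts: the root atom, the first-neighbour groups and the third-neighbour composition. The grammar is inferred from this single example, so the regular expressions and the name `parse_atom_class` are assumptions rather than AceDRG's own parser.

```python
import re

# An element symbol optionally followed by a ring annotation such as [5,6a].
ATOM_TOKEN = re.compile(r"[A-Z][a-z]?(?:\[[^\]]*\])?")


def parse_atom_class(code):
    """Split an atom-class code into (root, neighbour_groups, third_neighbours).

    neighbour_groups is a list with one entry per first neighbour; each entry
    lists the first neighbour itself followed by its own further neighbours.
    third_neighbours maps descriptors such as 'C<3>' to their counts.
    """
    match = re.fullmatch(r"([^({]+)((?:\([^)]*\))*)(?:\{([^}]*)\})?", code)
    if match is None:
        raise ValueError("unrecognized atom-class code: %r" % code)
    root, groups, third = match.groups()
    neighbour_groups = [ATOM_TOKEN.findall(g)
                        for g in re.findall(r"\(([^)]*)\)", groups)]
    third_neighbours = {}
    for item in (third.split(",") if third else []):
        count, kind = item.split("|")
        third_neighbours[kind] = int(count)
    return root, neighbour_groups, third_neighbours


example = ("C[5,6a](C[5,5]C[5,5]C[5,6]H)(C[5,6a]C[6a]C[5])"
           "(C[6a]C[6a,6a]H){1|O<1>,2|C<4>,2|H<1>,4|C<3>}")
root, neighbour_groups, third_neighbours = parse_atom_class(example)
```

Here `root` is the C[5,6a] atom itself, `neighbour_groups` holds the three bracketed groups and `third_neighbours` records the counts inside the braces.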
Evidently, each atom class encodes its local chemical environment. The number of such atom classes derived from the COD is around 260 000. Since the space of atom classes is very large, if not infinite, it can be expected that some atom types for a new ligand might not be in the list of atom types derived from the COD. One must remember that the purposes of small-molecule and macromolecular crystallography are very different, and thus it can be expected that they have a tendency to target different types of chemical compounds in their studies. Therefore, the probability of a given atom class being absent from the COD, or any other large database of small molecules, is not negligible. Consequently, it is necessary to have some generalization of atom classes. In other words, we need to be able to reduce the information encoded in the atom types in a way that does not lose too much information. The generalization used depends on particular bonds and angles; these are described in the next section.
4. Tables of bonds and angles
Once all atom types have been identified and classified, AceDRG creates and organizes tables pertaining to bonds and angles. Since the number of potentially different atom types is infinite, it is possible for a pair of atom types in a given compound, as defined above, to not be in the list of bonds. Therefore, we need well organized tables of atom types, bonds and angles for the fast and efficient searching of exact atom types as well as fast generalization, if and when needed.
The bond tables are organized in a hierarchical manner, with seven levels with various generalizations and fine-tuning. We refer to each level as the `generalized' atom types. These levels are (i) hash code, (ii) combination of hybridization states of atoms, (iii) information about inter-ring and intra-ring bonds, (iv) first-neighbour connections, (v) details about the first-neighbour connections, (vi) atom types without third-neighbour information and (vii) the full atom types. The first level, hash code, encodes basic properties of the individual atom types. The second and third levels contain information about the bonds. The remaining levels comprise properties of the atom types.
The hash codes encode essential chemical properties of the atoms. Each property is defined as an integer number referring to the position of the atom in the property list. These include (i) the position of the element in the periodic table, i.e. an integer representation of element names, (ii) the number of connections, (iii) the size of the smallest ring that the atom belongs to and (iv) whether the atom is part of an aromatic ring. If required, other chemical properties can be added. The hash level can already be considered to be a relatively fine-grained atom typing; the number of AceDRG hash-level atom types (around 180) is more than that in the REFMAC energy library (around 100) for the same extended organic set. One advantage of this hash-code-level atom typing over the current REFMAC energy library is that it uses a constructive algorithm, allowing it to be extended easily by adding more chemical information.
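The four properties that make up a hash-level atom type can be gathered into a small hashable key, as sketched below. The actual integer encoding used by AceDRG is not given in the text, so the tuple layout, the name `hash_key` and the convention of 0 for `not in a ring' are all assumptions made for illustration.

```python
from typing import NamedTuple

# Atomic numbers for the extended organic set handled by the algorithm.
ATOMIC_NUMBER = {"H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9, "P": 15,
                 "S": 16, "Cl": 17, "Se": 34, "Br": 35, "I": 53}


class HashKey(NamedTuple):
    atomic_number: int   # position of the element in the periodic table
    n_connections: int   # number of bonded atoms
    smallest_ring: int   # size of the smallest ring, 0 if none (assumed)
    aromatic: bool       # whether the atom is part of an aromatic ring


def hash_key(element, n_connections, ring_sizes, aromatic):
    """Build the coarse, hash-level description of one atom."""
    return HashKey(ATOMIC_NUMBER[element], n_connections,
                   min(ring_sizes, default=0), bool(aromatic))


# An aromatic C with three connections in a five- and a six-membered ring.
key = hash_key("C", 3, [5, 6], True)
```

Because the key is a plain tuple it can index a dictionary of tabulated values directly, and a further property can be added by appending one more field.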
4.1. Bonds
The bond tables are organized to facilitate fast searching for atom-type pairs and, if a given pair is not in the table, then to quickly find a reasonable approximation to the atom-type pair, and thus bond lengths. Each line corresponds to a bond record, which comprises the following information about the bonded atom-pair type.
At each level, the average bond length, standard deviation and the number of observations are stored. Note that the number of observations and standard deviations are used in further decision-making.
4.2. Searching for bond values in the AceDRG tables
Searching the table (Long et al., 2017) for a given pair of atom types is performed level by level. If an exact match is found, and the number of observations at this level is more than four, then the corresponding bond length and standard deviation are taken. If not then we repeat the search at a higher level. At each level, we check the number of observations used to calculate the mean bond length and standard deviation. If the standard deviation is more than 0.03, or the number of observations is less than four, then we go to the next level. Otherwise, we accept the mean bond length and standard deviation from this level.
If no candidate entries are found that satisfy these two conditions up to the hash level, we select the lowest level with more than four observations. This applies to all levels where no matching of `generalized' atom types happens. If there is no match of `generalized' atom types, even at the hash level, then we use atom types from the REFMAC energy library and use the corresponding simplified bond lengths as fall-back values. In this case, the standard deviation is assigned to be 0.02. It should be noted that in the test of 9000 ligands from the Chemical Component Dictionary (CCD) of the PDB we have not seen a single case where the use of REFMAC energy types is necessary.
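The level-by-level search can be sketched as follows. The sketch assumes that the candidate entries for one atom-type pair have already been looked up and are supplied in order from the most specific level to the hash level, with `None` where no `generalized' type matched. The name `select_bond_value`, the reading of `lowest level' as `most specific level' and the treatment of exactly four observations are assumptions; the thresholds (0.03 for the standard deviation, four observations, 0.02 for the fall-back standard deviation) are those quoted in the text.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class BondStats(NamedTuple):
    mean: float  # mean bond length
    sd: float    # standard deviation
    n_obs: int   # number of observations


def select_bond_value(levels: Sequence[Optional[BondStats]],
                      fallback_length: float,
                      max_sd: float = 0.03,
                      min_obs: int = 4,
                      fallback_sd: float = 0.02) -> Tuple[float, float]:
    """Pick (bond length, standard deviation) from hierarchical table entries."""
    matched = [stats for stats in levels if stats is not None]
    # First pass: accept the most specific level that is both well
    # populated and tight enough.
    for stats in matched:
        if stats.n_obs >= min_obs and stats.sd <= max_sd:
            return stats.mean, stats.sd
    # Second pass: relax the spread criterion and take the most specific
    # level that still has more than `min_obs` observations.
    for stats in matched:
        if stats.n_obs > min_obs:
            return stats.mean, stats.sd
    # Nothing usable: fall back to the simplified energy-library value.
    return fallback_length, fallback_sd


value = select_bond_value(
    [BondStats(1.330, 0.050, 10), BondStats(1.340, 0.020, 50)],
    fallback_length=1.50,
)
```

In this usage example the first entry is rejected because its standard deviation exceeds 0.03, so the more general second entry is returned.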
5. Ligand description and coordinate generation
Fig. 5 shows a flow chart describing the derivation of stereochemical information and coordinate set(s) using basic chemistry as input. The workflow is relatively simple and comprises four steps.
5.1. Chemistry sanitization
AceDRG first uses RDKit to sanitize the molecule, making sure that it is consistent with basic chemistry, for example that the numbers of connections and valences are consistent. Then, using functional groups, it assigns formal charges to atoms of groups such as carboxyl, amine, sulfate and phosphate groups. In total there are 25 functional groups used by AceDRG at the moment. The number of functional groups can be extended without difficulty.
5.2. Planes
If an atom is in the sp2 state then it, together with all atoms that it is bonded to, is assumed to be on the same plane. If an individual ring is aromatic, all atoms in the ring and their connected outside-ring atoms are in a plane. If a fused multiple-ring system is aromatic then all atoms in each of the smallest rings, together with the atoms that they are bonded to, are considered to be on the same plane. This allows some deformation of large planar systems, such as flavin rings, during refinement if the experimental data are sufficiently strong to indicate that there must be a departure from planarity. However, all atoms of the smallest rings will try to stay on the same plane.
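These plane rules can be sketched as a small function that builds one plane per sp2 atom and one plane per aromatic smallest ring. The name `plane_groups` is hypothetical, and dropping planes that are wholly contained in a larger plane is a design choice of this sketch, not something stated in the text.

```python
def plane_groups(bond_list, sp2_atoms, aromatic_rings):
    """Return a list of atom sets, each of which should be restrained planar.

    bond_list      -- iterable of (atom, atom) pairs
    sp2_atoms      -- set of atoms in the sp2 state
    aromatic_rings -- list of sets of atoms, one per aromatic smallest ring
    """
    adjacency = {}
    for a, b in bond_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    planes = []
    # Each sp2 atom is co-planar with the atoms bonded to it.
    for atom in sorted(sp2_atoms):
        planes.append({atom} | adjacency.get(atom, set()))
    # Each aromatic ring is co-planar with its directly attached atoms.
    for ring in aromatic_rings:
        plane = set(ring)
        for atom in ring:
            plane |= adjacency.get(atom, set())
        planes.append(plane)

    # Drop planes already implied by a larger one (largest planes first).
    unique = []
    for plane in sorted(planes, key=len, reverse=True):
        if not any(plane <= kept for kept in unique):
            unique.append(plane)
    return unique


# Benzene-like example: ring atoms 0-5, each carrying one substituent 6-11.
benzene_bonds = [(i, (i + 1) % 6) for i in range(6)] + [(i, i + 6) for i in range(6)]
benzene_planes = plane_groups(benzene_bonds, set(range(6)), [set(range(6))])
```

For a fused aromatic system this yields one plane per smallest ring rather than a single plane for the whole system, which is what leaves room for a flavin-like moiety to bend.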
5.3. Chiralities
Just like in a SMILES string, the chiral centres in the monomer library (generated by AceDRG) are local chiralities. That is, if the central atom is sp3 and the number of bonded non-H atoms is not less than three then the atom is considered to be a chiral centre. If the Cahn–Ingold–Prelog (CIP) priorities of at least two atoms (lone pairs of electrons are considered to be dummy atoms) are the same, or the input file does not have any indication of chiral centres (by coordinates or otherwise), then the sign of the chiral volume is assumed to be `both', indicating that at least two atoms bonding to the central atom can swap places without changing stereochemistry. In some cases, chiral volume signs can be assigned even for nonchiral centres. This can be useful because the atom names in the PDB file make nonchiral centres chiral by nomenclature. If the CIP priorities of atoms bonded to the central atom are different, or the input file indicates that this centre must be chiral with definite sign, then the program considers this centre as a genuine chiral centre with definite sign.
6. Examples of application
Here, we use two examples from the PDB to demonstrate AceDRG-generated dictionary values in practice. In general, the bond lengths and angles generated by AceDRG seem to be reasonably accurate (Tucker & Steiner, 2017). The first example aims to demonstrate that although the bond values generated from AceDRG are more accurate, and thus the refined structure should in principle be better in terms of chemical structure, the differences between structures refined using different dictionary values are so small that they are barely visible by eye and are unlikely to cause incorrect biological conclusions. The second example demonstrates the importance of aromaticity perception, and how it may affect inferred biological conclusions.
6.1. Example 1: PDB entry 3o8h, ligand name O8H (Willand et al., 2010)
The electron density corresponding to the ligand (Fig. 6) is of sufficient quality, with the exception of the iodinated benzene ring (this might be owing to radiation damage resulting in partial cleavage of the I atom, causing slight disorder of the benzene ring). Fig. 6 demonstrates that AceDRG perceives the aromaticity of the rings correctly. The bond distance between N21 and N22 in the PDB file is around 1.22 Å, which is shorter than it should be. The corresponding AceDRG-derived bond length is around 1.32 Å (for the full dictionary, see Supporting Information), which seems to reflect the fact that this ring is aromatic and the bond length is longer than a double bond (around 1.24 Å) but shorter than a single bond (around 1.41 Å). Unfortunately, with current PDB entries it is impossible to compare AceDRG-derived dictionary values (or values produced by any other software) with those used during the analysis of the PDB structures. Nevertheless, despite the apparently large differences between the bond lengths, the overlaid ligands before and after refinement with AceDRG dictionary values show very little visible difference.
6.2. Example 2: FAD versus FADH2, PDB entry 3hdy, ligand name FDA
As is well known, FAD/FADH2 conversion plays an important role in many biological processes. However, there seems to be a great deal of confusion in labelling and refining this cofactor. One of the problems is that in many calculations the flavin group is assumed to be a flat plane. However, even in FAD the flavin can be bent, although not as much as in FADH2. There is also a half-oxidized state of the flavin moiety that is usually not considered in detail. The reason for this is that the half-oxidized state is an intermediate between the fully reduced and fully oxidized states, and the probability of observing this state in isolation is very small. However, if the structural environment is favourable then the half-oxidized state could be stabilized. In general, while testing various ligands we came to the conclusion that there needs to be some initiative similar to PDB_REDO (Joosten et al., 2012) to reanalyse all ligands in the PDB. The most challenging part of such a project would be the analysis of the stability of compounds in isolation and in the structural environments that they are in. FAD is one of the examples that requires special attention.
PDB entry 3hdy (Partha et al., 2009) contains several ligands, including FAD and FDA, representing FAD and FADH2. Our focus is only on FDA. Fig. 7 shows the geometry and the electron density before and after refinement using AceDRG dictionary values (for the full dictionary, see Supporting Information). It is evident that after refinement the flavin plane becomes flatter, and deformation of the plane is smooth over the whole flavin moiety. Analysis of the electron density and ligand alone cannot give a definite answer about the oxidation state of the ligand; one would need to use other complementary techniques for this. However, electron density and ligand geometry, if handled with care, can become crucial pieces of evidence suggesting favourability of one or another state.
7. Conclusions and future perspectives
The program AceDRG has been designed to extract and organize atom types from small-molecule databases. The current version uses the freely available COD, although the algorithms and implementations are flexible, and any source of reliable small-molecule coordinate sets can be used to supplement/update/replace the relevant tables.
Tests show that AceDRG works reasonably well for a large class of cases without metals. However, there are still problems with some of the cases. One case to note is N with three connections where one of the atoms it is bonded to is sp2. Owing to similarities in electronic structures, we expect B to exhibit a similar type of behaviour. By default these atoms are considered to be sp2, with some correction added in order to account for the local environment. In many cases, we can make decisions regarding the hybridization states of N using the local environment. However, there are a number of cases where it is hard, if possible at all, to make such decisions. One can imagine cases where the same type of N atom with similar covalent environments might have sp2 or sp3 hybridization depending on their environment. The N atom within the piperidine group is one such example. If this N atom is bound to an sp2 C atom then it can be in the sp2 or sp3 state. Moreover, there may be cases where the hybridization state of N could be an intermediate between sp2 and sp3. To deal with such cases, small-molecule databases such as the COD would need to be analysed and such cases identified. Subsequently, the hybridization state and thus the geometric parameters of the whole compound would need to be adjusted depending on the environment.
Metal-containing compounds pose special problems. There are a multitude of problems that need to be contemplated carefully and dealt with robustly before we can claim that we can deal with metal-containing compounds for MX and cryo-EM fitting.
By default the current version of AceDRG presumes that the pH of the environment of the compound is 7.0, but this can be overridden by the user. This approach covers a sufficiently large class of problems. However, one can imagine cases where the local environment of a ligand is different and the same ligand can exist with different protonation states in different environments. If there are only one or two protonation states then such cases can be tabulated and the decision as to which ligand geometry definition to use can be made during model building and refinement. However, if the number of protonation states is very large then a better approach could be interactively changing the protonation states of particular regions of a ligand during model building. This would require interaction between model-building programs (e.g. Coot) and ligand description-generator programs (e.g. AceDRG). This may allow sufficient flexibility, although it requires the user to have sufficient knowledge about chemistry. If such an approach is to be used then the programs should be able to guide users by suggesting the best possible protonation states of particular regions in a particular environment.
AceDRG treats each tautomer as an independent ligand. If the number of different states is small then it would be possible to generate descriptions of all tautomers and use them as and when they are needed. However, if the number of such states is very large then the interaction between Coot and AceDRG must be designed so that the most appropriate tautomer can be selected depending on the environment. Again, the model-building program (Coot) must have a certain amount of chemical intelligence in order to suggest the state that is most consistent with the current environment of the ligand.
One of the problems that has not been dealt with here is the positions of H atoms. The positions of H atoms in models from the COD (and from the CSD) are unlikely to be sufficiently accurate unless the experiments were based on neutron diffraction. In many cases H atoms are added in their riding positions (Sheldrick, 2008), and thus H atoms in the COD (and the CSD) are unlikely to reflect the observations alone; they reflect the prior chemical knowledge used by the programs that generated them, and the experimental data only to some degree. Even if H atoms have been refined using experimental data alone, their positions are unlikely to be particularly accurate. If neutron diffraction is used then H-atom positions will reflect the positions of protons. If X-ray diffraction is used then H-atom positions reflect the positions of electrons; even at very high resolution these positions are much less accurate than those of heavier atoms. In general, we need to consider X-ray, neutron and electron diffraction experiments: X-rays see electrons, neutrons see nuclei and electrons see both. Thus, in future updates of the dictionary of monomers we will need to consider all of these cases. Perhaps we will need to carry out high-level QM calculations for a small set of molecules in order to derive proton and electron positions for various atom types. Even this will not be a complete solution to the hydrogen problem: the position of and the electron density around H atoms may depend on their environments.
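The distinction between electron-based and nucleus-based H-atom positions can be illustrated by a riding H atom that keeps its bond direction but is placed at a different distance from its parent atom depending on the experiment. The Python sketch below is a minimal example under that assumption. The two C-H distances (0.97 Å for X-ray and 1.09 Å for neutron data) are typical illustrative values, not entries from the AceDRG tables, and the function and dictionary names are hypothetical.

```python
import math

# Illustrative target C-H distances in angstroms (assumed values):
# X-ray data locate the shifted electron density, neutron data the nucleus.
TARGET_CH_DISTANCE = {"xray": 0.97, "neutron": 1.09}


def reposition_h(parent, hydrogen, distance):
    """Move H along the parent->H direction so that |parent - H| = distance."""
    direction = [h - p for h, p in zip(hydrogen, parent)]
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0.0:
        raise ValueError("parent and hydrogen positions coincide")
    scale = distance / length
    return tuple(p + c * scale for p, c in zip(parent, direction))


# A C atom at the origin with an H atom at the X-ray-style distance along x.
carbon = (0.0, 0.0, 0.0)
h_xray = (0.97, 0.0, 0.0)

# The same riding H atom expressed at the nuclear (neutron-style) distance.
h_neutron = reposition_h(carbon, h_xray, TARGET_CH_DISTANCE["neutron"])
```

A dictionary that stores both distances for each bond type would allow the same ligand description to be used with X-ray, neutron or electron diffraction data by selecting the appropriate target value.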
AceDRG is a standalone program, distributed by CCP4, which can be used via the command line or embedded within a graphical user interface. Currently it does not have its own GUI. In future, programs such as JLigand (Lebedev et al., 2012) and Lidia (Emsley et al., 2010) will need to be adapted to make the program accessible to a wider range of users.
Supporting information
AceDRG description of the ligand O8H. DOI: https://doi.org/10.1107/S2059798317000067/ba5260sup1.txt
AceDRG description of FDA (FADH2). DOI: https://doi.org/10.1107/S2059798317000067/ba5260sup2.txt
Acknowledgements
This work was supported by the Medical Research Council (grant No. MC_UP_A025_1012). RAN is funded by CCP4/STFC (grant No. PR140014). SG, AV and AM have received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 689868 and from the Research Council of Lithuania (grant No. MIP-025/2013, 2011–2013). AM was supported by the European Social Fund and Lithuanian state budget (grant No. VP1-2.2-ŠMM-09-V-01-003).
References
Adams, P. D. et al. (2010). Acta Cryst. D66, 213–221.
Balaban, A. T. (1985). J. Chem. Inf. Model. 25, 334–343.
Berman, H. M. et al. (2002). Acta Cryst. D58, 899–907.
Brown, I. D. (2009). Chem. Rev. 109, 6858–6919.
Bruno, I. J., Cole, J. C., Kessler, M., Luo, J., Motherwell, W. D. S., Purkis, L. H., Smith, B. R., Taylor, R., Cooper, R. I., Harris, S. E. & Orpen, A. G. (2004). J. Chem. Inf. Comput. Sci. 44, 2133–2144.
Chivers, T. (2005). A Guide to Chalcogen–Nitrogen Chemistry. Singapore: World Scientific.
Chivers, T. & Manners, I. (2009). Inorganic Rings and Polymers of the p-Block Elements: From Fundamentals to Applications. London: Royal Society of Chemistry.
Clark, M., Cramer, R. D. & Van Opdenbosch, N. (1989). J. Comput. Chem. 10, 982–1012.
Coulson, C. A., O'Leary, B. & Mallion, R. B. (1978). Hückel Theory for Organic Chemists. London: Academic Press.
Dalby, A., Nourse, J. G., Hounshell, W. D., Gushurst, A. K. I., Grier, D. L., Leland, B. A. & Laufer, J. (1992). J. Chem. Inf. Model. 32, 244–255.
Dimitropoulos, D., Ionides, J. & Henrick, K. (2006). Curr. Protoc. Bioinformatics, Unit 14.3. https://doi.org/10.1002/0471250953.bi1403s15.
Downs, G. M., Gillet, V. J., Holliday, J. D. & Lynch, M. F. (1989). J. Chem. Inf. Model. 29, 172–187.
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.
Engh, R. A. & Huber, R. (1991). Acta Cryst. A47, 392–400.
Engh, R. A. & Huber, R. (2001). International Tables for Crystallography, Vol. F, edited by M. G. Rossmann & E. Arnold, pp. 382–392. Dordrecht: Kluwer Academic Publishers.
Feng, G., Chen, L., Maddula, L., Akcan, O., Oughtred, R., Berman, H. M. & Westbrook, J. (2004). Bioinformatics, 20, 2153–2155.
Figueras, J. (1996). J. Chem. Inf. Comput. Sci. 36, 986–991.
Fowler, P. W., Rees, C. W. & Soncini, A. (2004). J. Am. Chem. Soc. 126, 11202–11212.
Gražulis, S., Chateigner, D., Downs, R. T., Yokochi, A. F. T., Quirós, M., Lutterotti, L., Manakova, E., Butkus, J., Moeck, P. & Le Bail, A. (2009). J. Appl. Cryst. 42, 726–729.
Gražulis, S., Daškevič, A., Merkys, A., Chateigner, D., Lutterotti, L., Quirós, M., Serebryanaya, N. R., Moeck, P., Downs, R. T. & Le Bail, A. (2012). Nucleic Acids Res. 40, D420–D427.
Groom, C. R., Bruno, I. J., Lightfoot, M. P. & Ward, S. C. (2016). Acta Cryst. B72, 171–179.
Hanser, T., Jauffret, P. & Kaufmann, G. A. (1996). J. Chem. Inf. Model. 36, 1146–1152.
Joosten, R. P., Joosten, K., Murshudov, G. N. & Perrakis, A. (2012). Acta Cryst. D68, 484–496.
Karunan Partha, S., van Straaten, K. E. & Sanders, D. A. (2009). J. Mol. Biol. 394, 864–877.
Krygowski, T. M., Cyrański, M. K. & Matos, M. A. R. (2009). Aromaticity in Heterocyclic Compounds. Berlin, Heidelberg: Springer-Verlag.
Kühlbrandt, W. (2014). Elife, 3, e03678.
Leach, A. R., Dolata, D. P. & Prout, K. (1990). J. Chem. Inf. Model. 30, 316–324.
Lebedev, A. A., Young, P., Isupov, M. N., Moroz, O. V., Vagin, A. A. & Murshudov, G. N. (2012). Acta Cryst. D68, 431–440.
Liebeschuetz, J., Hennemann, J., Olsson, T. & Groom, C. R. (2012). J. Comput. Aided Mol. Des. 26, 169–183.
Long, F., Nicholls, R. A., Emsley, P., Gražulis, S., Merkys, A., Vaitkus, A. & Murshudov, G. N. (2017). Acta Cryst. D73, 103–111.
McNicholas, S., Potterton, E., Wilson, K. S. & Noble, M. E. M. (2011). Acta Cryst. D67, 386–394.
Moriarty, N. W., Grosse-Kunstleve, R. W. & Adams, P. D. (2009). Acta Cryst. D65, 1074–1080.
Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367.
Nicholls, R. A., Long, F. & Murshudov, G. N. (2012). Acta Cryst. D68, 404–417.
Parkinson, G., Vojtechovsky, J., Clowney, L., Brünger, A. T. & Berman, H. M. (1996). Acta Cryst. D52, 57–64.
Pozharski, E., Weichenberger, C. X. & Rupp, B. (2013). Acta Cryst. D69, 150–167.
Reynolds, C. H. (2014). ACS Med. Chem. Lett. 5, 727–729.
Rocha, G. B., Freire, R. I., Simas, A. M. & Stewart, J. J. P. (2006). J. Comput. Chem. 27, 1101–1111.
Schröder, G. F., Levitt, M. & Brunger, A. T. (2010). Nature (London), 464, 1218–1222.
Schüttelkopf, A. W. & van Aalten, D. M. F. (2004). Acta Cryst. D60, 1355–1363.
Sheldrick, G. M. (2008). Acta Cryst. A64, 112–122.
Smart, O. S., Womack, T. O., Flensburg, C., Keller, P., Paciorek, W., Sharff, A., Vonrhein, C. & Bricogne, G. (2012). Acta Cryst. D68, 368–380.
Smart, O. S., Womack, T. O., Sharff, A., Flensburg, C., Keller, P., Paciorek, W., Vonrhein, C. & Bricogne, G. (2011). Grade v.1.1.1. Global Phasing Ltd, Cambridge, England.
Steiner, R. & Tucker, J. (2017). Acta Cryst. D73, 93–102.
Szabo, A. & Ostlund, N. S. (1989). Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory. New York: McGraw–Hill.
Touw, W. G., van Beusekom, B., Evers, J. M. G., Vriend, G. & Joosten, R. P. (2016). Acta Cryst. D72, 1110–1118.
Vagin, A. A., Steiner, R. A., Lebedev, A. A., Potterton, L., McNicholas, S., Long, F. & Murshudov, G. N. (2004). Acta Cryst. D60, 2184–2195.
Walsh, J. D. & Miller, A.-F. (2003). J. Mol. Struct. 623, 185–195.
Weichenberger, C. X., Pozharski, E. & Rupp, B. (2013). Acta Cryst. F69, 195–200.
Weininger, D. (1988). J. Chem. Inf. Model. 28, 31–36.
Weininger, D., Weininger, A. & Weininger, J. L. (1989). J. Chem. Inf. Model. 29, 97–101.
Wilberg, E., Wilberg, N. & Holleman, A. F. (2001). Inorganic Chemistry. New York: Academic Press.
Willand, N., Desroses, M., Toto, P., Dirié, B., Lens, Z., Villeret, V., Rucktooa, P., Locht, C., Baulard, A. & Deprez, B. (2010). ACS Chem. Biol. 5, 1007–1013.
Winn, M. D. et al. (2011). Acta Cryst. D67, 235–242.
Zheng, H., Langner, K. M., Shields, G. P., Hou, J., Kowiel, M., Allen, F. H., Murshudov, G. N. & Minor, W. (2017). Acta Cryst. D73, https://doi.org/10.1107/S2059798317000584.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.