- 1. Introduction
- 2. Methods
- 3. Results and discussion
- 4. Examples of application
- 5. Conclusions and future perspectives
- A1. Mode identification and distance statistics
- B1. Maximum-likelihood estimation (MLE)
- B2. Symmetrized von Mises estimation
- B3. Algorithm for bond-angle estimation
- C1. Overview of Procrustes matching
- C2. Procrustes method formulation
- C3. Steps in Procrustes matching
- C4. Determining optimal correspondence by permutations
- C5. Resulting Procrustes distance
- D1. Extraction of the metal environment from the component dictionary and the macromolecular model
- D2. Matching the metal environment with the coordination library
- D3. Compiling statistics: bonds
- D4. Compiling statistics: angles
- Supporting information
- References
research papers
Improving macromolecular structure refinement with metal-coordination restraints
aInstitute of Molecular Biology and Biotechnology, Ministry of Science and Education, 11 Izzat Nabiyev, Baku, Azerbaijan, bMRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom, cBiological Sciences, Institute for Life Sciences, University of Southampton, Southampton SO17 1BJ, United Kingdom, and dStructural Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8904, Japan
*Correspondence e-mail: keitaro-yamashita@g.ecc.u-tokyo.ac.jp, garib@mrc-lmb.cam.ac.uk
This article is part of the Proceedings of the 2024 CCP4 Study Weekend.
Metals are essential components for the structure and function of many proteins. However, accurate modelling of their coordination environments remains a challenge due to the complexity and diversity of metal-coordination geometries. To address this, a method is presented for extracting and analysing coordination information, including bond lengths and angles, from the Crystallography Open Database. By using these data, comprehensive descriptions of metal-containing components are generated. A stereochemical information generator for a particular component within a specific macromolecule leverages an example PDB/mmCIF file containing the component to account for the actual surrounding environment. A matching process has been developed and implemented to align the derived metal structures with idealized coordinates from a coordination geometry library. Additionally, various strategies, depending on the quality of the matches, were employed to compile distance and angle statistics for the refinement of macromolecular structures. The developed methods were implemented in a new program, MetalCoord, that classifies and utilizes the metal-coordination geometry. The effectiveness of the developed algorithms was tested using metal-containing components from the PDB. As a result, metal-containing components from the CCP4 monomer library have been updated. The updated monomer dictionaries, in concert with the derived restraints, can be used in most structural biology computations, including macromolecular crystallography, single-particle cryo-EM and even molecular mechanics.
Keywords: refinement; restraints; metal-coordination geometry; macromolecular crystallography; cryo-EM.
1. Introduction
The determination of the three-dimensional structures of macromolecules and their complexes with various ligand molecules is an important step in understanding the biological processes in which they participate. The most widely used experimental techniques for this purpose are macromolecular crystallography (MX) and single-particle analysis (SPA) using electron cryo-microscopy (cryo-EM). In both methods, particularly when data are limited to medium and low resolution, the experimental data alone are insufficient to precisely position all atoms. Therefore, Bayesian statistics, utilizing prior knowledge about the building blocks of macromolecules and ligand molecules, are employed. For this approach to be effective, accurate bond lengths, angles and torsion angles, along with their associated standard deviations, must be tabulated and stored in a monomer library (Vagin et al., 2004). When new components are encountered, their stereochemical description should be created and provided to refinement and model-building programs. Such descriptions can be generated using high-quality software tools, including eLBOW (Moriarty et al., 2009) from the Phenix package, grade (Smart et al., 2021) from Global Phasing and AceDRG (Long et al., 2017) from CCP4. Although these programs can generate stereochemical information for most chemical components, they encounter difficulties with metal-containing components. As a result, the model quality around the metal atoms in many macromolecular atomic structures is often lower than that of the rest of the atomic model.
The automatic definition of stereochemistry around metal atoms without additional information is a challenging problem. Bond lengths and angles depend on various factors, including the charge of the metal, its coordination geometry and the chemistry of the surrounding atoms. Additionally, the same metal can exist in two or more different states within the same component, depending on the protein environment. Another complication is that the bonding pattern around metal atoms in metal-containing components is often incomplete. Generally, it is not feasible to isolate a metal-containing component from its environment. Generating stereochemical information for such components in isolation and applying it later during refinement and model building is difficult, if not impossible. These components only become complete when they are within proteins and surrounded by protein and/or solvent atoms. Furthermore, in many cases metals are part of an active site, and during the catalytic reaction of macromolecules it is not uncommon for oxidation states, coordination geometry and stereochemical information to change (see, for example, Bolton et al., 2024). In other words, the context is important. Recent statistical analyses of metal-binding sites in metalloproteins by Bazayeva et al. (2024) have also provided valuable insights into typical metal-coordination distances. These data serve as reference information for refinement and validation, highlighting the variability and complexity of metal interactions across different environments.
There have been several significant efforts to address the challenges of dealing with metals in macromolecular structures, most notably checkMyMetal and metalPDB (Zheng et al., 2014; Putignano et al., 2018). Additionally, there are software tools and data tables that focus on specific metals (Moriarty et al., 2009; Touw et al., 2016; Harding et al., 2010). However, to effectively address the current issues in the PDB and to minimize future problems, it is essential to develop a sufficiently general and versatile tool that can handle most metal-containing components. Such a tool should be capable of generating accurate stereochemical information even when the metal is in a different environment.
This contribution describes a set of methods for extracting coordination information, along with the corresponding bond lengths and angles, from the small-molecule Crystallography Open Database (COD; Gražulis et al., 2009) and for using these data to generate a comprehensive description of components, including details around the metal that account for the actual environment in which the metal atoms are situated. A method for generating context-dependent stereochemical information has also been developed and implemented.
2. Methods
2.1. Extraction and organization of the metal environment
2.1.1. Selection of COD entries
Crystal structures from the Crystallography Open Database (COD; Gražulis et al., 2009), determined using single-crystal X-ray diffraction with a resolution better than 0.82 Å and an R factor below 0.1, were selected for further analysis. While these criteria do not entirely eliminate incorrect structures, subsequent filtering steps ensure that the structures included in the statistical analysis are of adequate quality. Moreover, structures with partially occupied non-H atoms within the unit cell were omitted from the study. These selection criteria are similar to those employed by Long et al. (2017). From the filtered data, only entries containing at least one metal atom in the asymmetric unit were considered for further examination (Table 1).
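The selection criteria above can be expressed as a single predicate. This is a minimal sketch rather than the authors' pipeline: the `CodEntry` record, its field names and the small `METALS` set are hypothetical stand-ins for whatever the real COD parsing code provides.

```python
from dataclasses import dataclass

# Illustrative subset of metal elements (assumption); the full list used in the study is not given here.
METALS = {"Na", "Mg", "Al", "K", "Ca", "Mn", "Fe", "Co", "Ni", "Cu", "Zn", "Mo", "Cd", "Pt"}


@dataclass
class CodEntry:
    """Hypothetical summary of one COD entry (field names are illustrative)."""
    cod_id: str
    method: str            # experimental method
    resolution: float      # resolution in Å (smaller is better)
    r_factor: float
    min_occupancy: float   # lowest occupancy of any non-H atom
    asu_elements: set      # element symbols present in the asymmetric unit


def passes_selection(entry: CodEntry) -> bool:
    """Apply the Section 2.1.1 criteria to one entry (sketch)."""
    return (
        entry.method == "single-crystal X-ray diffraction"
        and entry.resolution < 0.82
        and entry.r_factor < 0.1
        and entry.min_occupancy >= 1.0          # no partially occupied non-H atoms
        and bool(entry.asu_elements & METALS)   # at least one metal in the asymmetric unit
    )
```

Keeping the filter as one predicate makes it easy to count how many entries fail each criterion separately when reproducing a table such as Table 1.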
2.1.2. Generation of the metal environment
For each selected crystal structure, all atoms within three unit cells in each of the x, y and z directions were generated using all of the symmetry operators of the crystal. For each metal in the asymmetric unit, all atoms within the distance d₁₂ ≤ α(r₁ + r₂) were extracted and saved in a file, where d₁₂ is the distance between the considered atoms and r₁ (metal) and r₂ are their `covalent' radii (Cordero et al., 2008). Three sets of coordinates were generated with α = 1.1, α = 1.2 and α = 1.3. All files were then divided into two sets: (i) those without any metal–metal `bonds' and (ii) those with at least one metal–metal `bond'. The total number of metal-environment structures generated was 429 579. Of these, 228 063 files, which did not contain any metal–metal bonds, were used for further analysis. It is important to note that a single crystal may contain multiple metals, either with the same identity or with different identities. Environments were extracted for all metal atoms within the asymmetric unit of the crystal.
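The distance criterion d₁₂ ≤ α(r₁ + r₂) can be sketched as follows for atoms that have already been expanded by symmetry into Cartesian coordinates. The radii dictionary is a small illustrative subset of the values of Cordero et al. (2008), and the function name and tuple layout are hypothetical.

```python
import math

# Covalent radii in Å: a small illustrative subset of Cordero et al. (2008); C is the sp3 value.
COVALENT_RADII = {"Zn": 1.22, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def metal_environment(metal, atoms, alpha):
    """Return the atoms with d12 <= alpha * (r1 + r2) around one metal.

    metal: (element, (x, y, z)); atoms: list of (element, (x, y, z)),
    assumed to be already expanded by the crystal symmetry operators.
    """
    m_el, m_xyz = metal
    r1 = COVALENT_RADII[m_el]
    selected = []
    for el, xyz in atoms:
        d12 = math.dist(m_xyz, xyz)
        # d12 > 0 skips the metal itself if it is present in the atom list.
        if 0.0 < d12 <= alpha * (r1 + COVALENT_RADII[el]):
            selected.append((el, xyz, round(d12, 3)))
    return selected
```

Calling the function with α = 1.1, 1.2 and 1.3 produces the three nested environments described above; a wider α can only add atoms, never remove them.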
The current coordination geometry classes do not include cases with metal–metal bonds. Such cases will be considered in the future, although metal–metal interactions are extremely rare in macromolecules.
2.1.3. Classification of metal environments
We began with 31 ideal metal-coordination classes (Table 2), denoted as `pre-existing'. To create coordination classes that are independent of the metal and ligand identity, the bond lengths between metal and nonmetal atoms were normalized to a value of 1. The limitations of this normalization are partially mitigated by employing full Procrustes matching with scaling (Dryden & Mardia, 2016; Appendix C).
Metal-environment structures extracted from the COD were assigned to the coordination classes through an iterative process. Initially, all H atoms were removed from the files as a preprocessing step. For each file, we extracted all atoms with metal–nonmetal distances of less than 1.3 × (r₁ + r₂). Coordination class assignment was then attempted using combinatorial Procrustes analysis (Appendix C). If the assignment was unsuccessful, atoms within a distance of 1.2 × (r₁ + r₂) were considered, followed by atoms within 1.1 × (r₁ + r₂). Upon successful class assignment, the metal and the corresponding atoms were extracted and saved as separate files.
This iterative method for assigning structures to coordination classes ensured that each structure was assigned to the class with the highest possible coordination number. This approach helped to minimize complications arising from slight variations in bond lengths that could affect the coordination geometry. For instance, a structure initially classified as octahedral could be reclassified as square planar if two opposite vertices are excluded due to slightly longer than expected bond lengths. Similarly, removing one vertex might shift the classification to square pyramidal.
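The widest-shell-first cascade can be sketched as a loop over the three distance factors. Here `match_class` is a hypothetical callback standing in for the combinatorial Procrustes assignment of Appendix C; it is assumed to return a class name, or `None` when no class fits.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def assign_class(
    contacts: Sequence[Tuple[str, float, float]],      # (atom id, d12, r1 + r2)
    match_class: Callable[[List[str]], Optional[str]],
    factors: Sequence[float] = (1.3, 1.2, 1.1),
):
    """Try the widest shell first so that the highest possible coordination number wins."""
    for f in factors:
        shell = [atom for atom, d12, r_sum in contacts if d12 <= f * r_sum]
        cls = match_class(shell)
        if cls is not None:
            return f, cls, shell
    return None  # unclassified at every cutoff
```

For example, with four contacts at 2.0 Å, two at 2.4 Å and one at 2.3 Å (all with r₁ + r₂ = 1.88 Å) and a toy matcher that only recognizes four- and six-coordinate shells, the seven-atom shell at factor 1.3 is rejected and the function returns a four-atom square-planar assignment at factor 1.2.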
Following the initial classification, the structures underwent additional review based on the following criteria.
When structures could not be matched to any of the existing coordination classes, new idealized structures were created, and the matching and classification process was repeated. Consequently, the total number of coordination classes increased to 95 (Table 2). Table 3 lists the metal atoms with their likely coordination for the cases with more than 500 members. A table containing all elements with their classes can be found in the supporting information.
The above iterative process of assigning and defining classes allowed us to classify the majority, although not all, of the coordination environments within the data set (Table 1).
2.2. Metal-containing component description generation
The algorithm for generating stereochemical information involves three steps.
2.2.1. Algorithm 1: metal–ligand description generation
2.2.2. Algorithm 2: initial metal–ligand description generation
2.2.3. Algorithm 3: update the stereochemical information around metal atoms
2.3. Implementation
The algorithms for generating initial stereochemical information and coordinates for metal-containing components have been implemented in the AceDRG program. The matching, extraction, compilation and application of stereochemical information pertaining to metal environments have been incorporated into a new program, MetalCoord.
The MetalCoord program operates in two primary modes and one secondary mode.
2.4. Program availability
AceDRG is available as part of the CCP4 suite, whereas MetalCoord can be accessed on GitHub at https://github.com/Lekaveh/MetalCoordAnalysis together with a tutorial describing its application. The program will also be included in the next version of CCP4. Servalcat, which can now perform geometry optimization and maximum-likelihood crystallographic refinement, is available both from CCP4 and on GitHub at https://github.com/keitaroyam/servalcat. The entire monomer library, along with the updated entries, is available from an upcoming version of CCP4 as well as on GitHub at https://github.com/MonomerLibrary/monomers.git.
3. Results and discussion
Our primary objective was to update the descriptions of metal-containing components provided by CCP4 (Agirre et al., 2023), as they have not been revised since their introduction in the early 2000s (Vagin et al., 2004). Although some frequently used components, such as haem, vitamin B12 (monomer codes HEM and B12) and certain iron–sulfur clusters (for example monomer codes SF4, SF3 and FS2), have been sporadically revised and manually corrected, there have been no systematic efforts to review and amend all metal-containing components. This revision is long overdue, and we are now addressing these issues.
To update the descriptions, we initially reviewed all 756 entries (as of February 2024) in the Chemical Component Dictionary (CCD; Dimitropoulos et al., 2006) that contain metal atoms. While many of these entries are correct, we manually assessed each one to identify and rectify potential chemical inaccuracies. We discovered that at least 50 of the CCD entries exhibit varying degrees of inaccuracy. Some issues relate to structural integrity, while others could lead to incorrect chemical interpretations. It is important to note that in many cases the structures in the PDB entries are correct; however, their chemistry from the CCD does not meet the same standards. This discrepancy is presumably due to miscommunication between the PDB and depositors.
Problematic cases could be roughly divided into two classes.
Out of 884 CCD metal-containing entries, we updated 809 using AceDRG, MetalCoord and Servalcat. This set includes all non-obsolete metal-containing ligands with more than one atom, except for eight ligands containing boron clusters, which were excluded. Before updating, we needed to correct the chemistry of over 90 of them. Besides employing the updated monomer library, it is generally recommended to use MetalCoord in stats mode to generate external restraints prior to macromolecular structure refinement. This ensures that the correct coordination geometry is identified and that the corresponding bond lengths and angles are applied.
4. Examples of application
From refinements of numerous structures while testing the updated CCP4 monomer library, we present a few example cases to demonstrate the improvement in refinement stability and structure model quality.
The structures shown in this section were re-refined using Servalcat employing the updated monomer library and restraints based on MetalCoord analysis. The cryo-EM SPA structure refinements were carried out in the refine_spa_norefmac mode and the crystal structure refinements in the refine_xtal_norefmac mode against structure-factor amplitudes. The refined structures, along with the scripts used, are publicly available at https://doi.org/10.5281/zenodo.13694559. The refinement statistics are briefly reported in Supplementary Table S1 and selected external restraints used during refinement are listed in Supplementary Table S2.
4.1. Haem-like components
Haem-like cofactors which bind a metal cation in their centre play fundamental roles in numerous large biomolecular complexes, including photosystems and respiratory complexes.
The beneficial impact of the updated library can be shown on the structure of monomeric photosystem II from Synechocystis (PDB entry 6wj6) determined using cryo-EM SPA at a resolution of 2.58 Å (Gisriel et al., 2020). Our re-refinement improved the chemical correctness of the model as well as its agreement with the experimental density. The updated dictionary for chlorophyll A (monomer code CLA) allowed modelling of the magnesium cation out of the porphyrin plane (Fig. 3a). This enabled interaction with Thr179 via a water molecule. It should be noted that the maximum coordination number of the magnesium ion in chlorophyll A was set to five for the generation of the restraints for refinement (option -c 5 for MetalCoord in stats mode). Otherwise, irrelevant C atoms close to some magnesium ions were taken into account owing to inaccurate input coordinates, which caused an incorrect increase in the magnesium coordination number to six. When using the further refined coordinates, MetalCoord interpreted these problematic cases correctly (i.e. a maximum coordination number of five) without any extra option being specified. Furthermore, in this structure model, the modelling of the iron-cation coordination in the haem molecules (monomer code HEM) with the neighbouring histidine residues was considerably improved (Fig. 3b).
4.2. Hybrid iron–sulfur–oxygen cluster
The dictionary for the hybrid iron–sulfur–oxygen cluster (monomer code FS2) was incorrectly defined in the CCP4 monomer library in the past. The outlier analysis in Servalcat of the crystal structure of the hybrid cluster protein (PDB entry 1w9m), solved at a resolution of 1.35 Å (Aragão et al., 2008; Fig. 3c) using the old dictionary (from CCP4 version 9.0.004), reported 14 bond-length and 16 bond-angle outliers with a Z-score higher than 5 for atoms of the FS2 monomer, despite the structure being correct. Consequently, this dictionary was manually revised (with FE5—FE6, FE6—FE7, FE5—O1 and FE8—O9 bonds removed, as they were either redundant or incorrect) and subsequently optimized in MetalCoord. Refinement using the updated dictionary resulted in only one significant outlier: a distance between the FE7 atom of the cluster and the hydroxyl group of Glu268. Such specific molecular interactions cannot be adequately described in a component dictionary file. Nevertheless, MetalCoord provides an analysis that generates external restraints suitable for a particular structure when an input model file is provided (see Appendix D1). In this case, the restraints generated for the cluster also include the `ideal' value for the problematic distance mentioned (1.99 ± 0.13 Å), corresponding to trigonal bipyramidal coordination geometry, which is close to the distance observed in the deposited structure (2.14 Å).
Although the monomer library can be considered to be a reasonable starting point for metal-containing components, we also recommend running MetalCoord in stats mode while specifying a structure in the input. This will provide additional restraints suitable for the particular case, including molecular interactions.
4.3. Aluminium coordination depending on chemical context
The exact conformation of a molecule generally depends on its chemical environment. In the case of metal-containing components, the surrounding environment can also influence the metal-coordination geometry. For instance, the Al atom in aluminium trifluoride (monomer code AF3) within the nitrogenase-like dark-operative protochlorophyllide oxidoreductase complex (PDB entry 2ynm; Moser et al., 2013) exhibits trigonal bipyramidal coordination (Fig. 3d), whereas it adopts an octahedral (square bipyramidal) coordination (Fig. 3e) in the dUTPase (PDB entry 4dl8; Hemsworth et al., 2013).
A dictionary in the monomer library can accurately describe only a single conformation of a metal-containing component, for example the trigonal bipyramidal coordination of the aluminium centre in aluminium trifluoride, which is the default option in the library. However, the MetalCoord program analyses the ideal bond geometry while considering the metal environment when an input structure is provided (see Appendix D1). This allows the definition of a component dictionary and restraints suited to a particular chemical context, for example the octahedral coordination in the dUTPase example.
4.4. Ferricyanide
Due to the suboptimal treatment of metal atoms in the past, the harmonic restraints for the ferricyanide ions [Fe(CN)₆]³⁻ (monomer code FC6, with partial occupancy) in the crystal structure of bilirubin oxidase (Koval', Švecová et al., 2019; Malý et al., 2020) were applied in conjunction with a modified component dictionary to prevent geometric distortion. This type of restraint fixes atoms to their current positions, which is generally not an appropriate approach. MetalCoord now provides information to define more appropriate restraints and dictionaries based on ligand chemistry, which can be considered a more relevant refinement strategy. The result of the re-refinement is shown in Fig. 3(f).
Furthermore, in this structure, additional restraints were generated for copper cations. However, the automatically assigned coordination number of four proved to be incorrect. To optimize the MetalCoord run, the option -c 3 was used to set the maximum coordination number to three.
4.5. Zinc and haem in nitric oxide reductase
The crystal structure of nitric oxide reductase (PDB entry 3ayf; Matsumoto et al., 2012) contains a zinc ion that interacts with a water molecule placed close to the iron centre of the haem molecule. MetalCoord analysis of the zinc ion reported two possible coordination geometries: trigonal bipyramidal or square pyramidal. Thus, two independent refinements were performed in Servalcat, using restraints based on each of these two options. Given the data quality, the two coordination possibilities were indistinguishable in the resulting coordinates.
5. Conclusions and future perspectives
This contribution addresses one of the longstanding challenges within the CCP4 suite, and perhaps within the broader field of atomic structure derivation: the treatment of metal-containing compounds. With the aid of the current AceDRG, MetalCoord and Servalcat software, the refinement of ligands with metals should now be semi-automatic. Given the versatility of metals and their responsiveness to environmental variations, it is recommended to generate and apply restraints specific to each structure under study before each refinement session. This approach will ensure that the correct coordination geometry is identified and utilized.
In many cases, it may be reasonable to define the component without the metal and then add the metals as separate components. If this approach is adopted, MetalCoord should be used in the stats mode. The program will then generate appropriate restraints for each metal atom based on its current environment. This approach would be effective in many cases; however, it is still necessary to derive an accurate monomer library distributed by CCP4.
Although the metal-containing components in the current version of the monomer library can be considered to be satisfactory, there is much more work to be done. One future direction should involve comparative analyses of metal-coordination geometries between small-molecule databases (such as COD) and the PDB. In the current work, we used a naive, agnostic approach with the assumption that metals in macromolecules and small molecules are equally distributed. However, it is likely that biological macromolecules utilize metals that are readily available in the environment where the organism resides. To prioritize research and methodological developments aimed at improving macromolecular structures, it is necessary to conduct statistical and comparative analyses of small-molecule and macromolecular structure databases.
Another important direction is the validation of metal environments in deposited structures. While there are well-established validation tools and protocols for proteins and DNA/RNA, and some exist for ligands, particularly for bonding and nonbonding interactions, such tools are not yet available for metals and their environments. The resources within MetalCoord could be further utilized for this purpose. Additionally, validation of charges in the local environment might also aid in the correct interpretation of chemistry. In light of the application of machine-learning techniques to derive, interpret and predict structures, chemically accurate structure derivation and annotations are more important than ever.
In the context of X-ray crystallography, further validation could be achieved through anomalous scattering, making it crucial to retain Friedel pairs in PDB submissions.
During data acquisition (using X-rays, electrons or even neutrons), metals may undergo changes in oxidation states, altering their coordination geometry (Carugo & Carugo, 2005; Yano et al., 2005; Hattne et al., 2018). While MetalCoord can generate restraints for uniform oxidation-state changes, partial oxidation presents a challenge due to the coexistence of multiple coordination geometries. To account for this, MetalCoord can generate restraints for user-defined alternative conformations corresponding to different oxidation states. Semi-automation of this process will require the integration of tools such as molecular graphics, chemoinformatics and precise difference-map calculations. MetalCoord will serve as a key component within this integrated workflow.
Another important issue that is easy to underestimate is the communication between depositors and the PDB. Enhancing this communication is essential to ensure the accuracy and reliability of metal-containing structures. While the deposition process for non-metal-containing components has seen substantial improvements, considerable work remains to optimize the deposition and documentation of metal-containing components.
APPENDIX A
Distance and angle statistics
A1. Mode identification and distance statistics
The metal–ligand distance distributions for metals can exhibit multiple modes. To identify these modes, determine their occurrence probabilities and estimate the widths (standard deviations) of the corresponding modes, we use the following procedure.
Silverman's method (Silverman, 1981) is used to determine the number and approximate locations of the modes. This method applies kernel density estimation to the data, resulting in a smooth empirical density. The local maxima of this density are found using simple scanning methods. Different kernel sizes are tested and the smallest bandwidth that results in the specified number of modes is selected.
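Silverman's critical-bandwidth idea can be sketched with a Gaussian kernel: count the maxima of the density on a grid and bisect for the smallest bandwidth that yields at most the requested number of modes. Bisection is justified because, for a Gaussian kernel, the number of modes does not increase as the bandwidth grows (Silverman, 1981). The grid size, bracketing interval and function names are assumptions for this sketch.

```python
import numpy as np


def count_modes(data, bandwidth, grid_size=512):
    """Count local maxima of a Gaussian kernel density estimate evaluated on a grid."""
    data = np.asarray(data, dtype=float)
    pad = 3.0 * bandwidth
    grid = np.linspace(data.min() - pad, data.max() + pad, grid_size)
    # Unnormalized KDE: the normalization constant does not move the maxima.
    z = (grid[:, None] - data[None, :]) / bandwidth
    dens = np.exp(-0.5 * z**2).sum(axis=1)
    # Simple scan: a point is a maximum if it rises from the left and does not fall to the right.
    inner = dens[1:-1]
    is_max = (inner > dens[:-2]) & (inner >= dens[2:])
    return int(is_max.sum()), grid[1:-1][is_max]


def critical_bandwidth(data, n_modes, lo=1e-4, iters=50):
    """Smallest bandwidth whose KDE has at most n_modes maxima, found by bisection."""
    data = np.asarray(data, dtype=float)
    hi = data.max() - data.min()  # assumed wide enough to give a single mode
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_modes(data, mid)[0] <= n_modes:
            hi = mid
        else:
            lo = mid
    return hi, count_modes(data, hi)[1]
```

The returned mode positions are only approximate; they serve as starting points for the mixture-model refinement described next.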
Once the number and approximate positions of the modes have been identified, a Gaussian mixture model (GMM; see, for example, Bishop, 2006) method is used to optimize the mode positions, the probabilities of the occurrences of modes and their widths.
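The GMM refinement step can be sketched as expectation-maximization (EM) on a one-dimensional mixture, started from the mode positions found by the kernel density search. This is a minimal NumPy illustration with hypothetical names; the crude initial widths, the small floor on the standard deviations and the stopping rule are assumptions, not details from the text.

```python
import numpy as np


def fit_gmm_1d(x, means, n_iter=500, tol=1e-10, min_sd=1e-6):
    """EM for a one-dimensional Gaussian mixture started from given mode positions.

    Returns mixing probabilities (mode occurrences), means and standard deviations.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(means, dtype=float).copy()
    k = mu.size
    w = np.full(k, 1.0 / k)
    sd = np.full(k, x.std() / k)  # crude starting widths (assumption)
    prev = -np.inf
    for _ in range(n_iter):
        # E step: weighted component densities and responsibilities.
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M step: closed-form updates of weights, means and widths.
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.maximum(np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk), min_sd)
        loglik = np.log(total).sum()  # log likelihood of the parameters before this M step
        if loglik - prev < tol:
            break
        prev = loglik
    return w, mu, sd
```

Starting EM from the KDE modes, rather than from random positions, fixes the number of components and keeps them attached to the modes that were actually detected.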
APPENDIX B
Symmetrized von Mises distribution
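Given the circular nature of bond angles, the von Mises distribution (see, for example, Dryden & Mardia, 2016) is used to model their distribution,

$$f(\alpha \mid \alpha_0, X) = \frac{1}{2\pi I_0(X)} \exp\left[X\cos(\alpha - \alpha_0)\right],$$

where α₀ is the mean angle, X is a measure of the concentration and I₀ is the zeroth-order modified Bessel function of the first kind. Bond angles are generally symmetric, meaning that the density of the angle is symmetric: f(α) = f(−α) = f(2π − α). Therefore, a symmetrized von Mises distribution seems to be more appropriate (if the true angle is close to π, all observed angles will be less than π, so an estimate based on the ordinary distribution will also be less than π),

$$f_s(\alpha \mid \alpha_0, X) = \frac{1}{2}\left[f(\alpha \mid \alpha_0, X) + f(-\alpha \mid \alpha_0, X)\right] = \frac{\exp(X\cos\alpha\cos\alpha_0)\cosh(X\sin\alpha\sin\alpha_0)}{2\pi I_0(X)}.$$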
It can be verified that this distribution is symmetric around 0 and π.
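The symmetrized von Mises distribution can also be conveniently expressed as

$$f_s(\alpha \mid X_1, X_2) = \frac{\exp(X_1\cos\alpha)\cosh(X_2\sin\alpha)}{2\pi I_0(X)},$$

where X₁ = X cos α₀, X₂ = X sin α₀ and X = √(X₁² + X₂²).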
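A quick numerical check of the symmetrized density, assuming it has the form exp(X₁ cos α) cosh(X₂ sin α)/[2π I₀(X)] with X = √(X₁² + X₂²), which equals the average of the ordinary von Mises densities at α and −α. The function names are hypothetical; `numpy.i0` supplies I₀.

```python
import numpy as np


def von_mises_pdf(a, a0, kappa):
    """Ordinary von Mises density with mean a0 and concentration kappa."""
    return np.exp(kappa * np.cos(a - a0)) / (2.0 * np.pi * np.i0(kappa))


def sym_von_mises_pdf(a, x1, x2):
    """Symmetrized density exp(X1 cos a) cosh(X2 sin a) / (2 pi I0(X)) with X = hypot(X1, X2)."""
    kappa = np.hypot(x1, x2)
    return np.exp(x1 * np.cos(a)) * np.cosh(x2 * np.sin(a)) / (2.0 * np.pi * np.i0(kappa))
```

Evaluated on a grid, the symmetrized form matches the average of the two ordinary densities, integrates to one over [0, 2π) and is mirror-symmetric about π. For very large X (above roughly 700) the exponentials and `numpy.i0` overflow in double precision, so a log-scaled Bessel function would be needed there.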
For a given data set of angles, the parameters X₁ and X₂ are estimated using maximum-likelihood estimation (MLE).
B1. Maximum-likelihood estimation (MLE)
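For the ordinary von Mises distribution, the negative log-likelihood function is

$$LL(\alpha_0, X) = K\log\left[2\pi I_0(X)\right] - X\sum_{i=1}^{K}\cos(\alpha_i - \alpha_0),$$

where K is the number of data points and αᵢ are the observed angles.

Expressed in terms of X₁ and X₂,

$$LL(X_1, X_2) = K\log\left[2\pi I_0(X)\right] - X_1\sum_{i=1}^{K}\cos\alpha_i - X_2\sum_{i=1}^{K}\sin\alpha_i.$$

Define

$$\bar{C} = \frac{1}{K}\sum_{i=1}^{K}\cos\alpha_i, \qquad \bar{S} = \frac{1}{K}\sum_{i=1}^{K}\sin\alpha_i.$$

The negative log likelihood becomes

$$LL_0 = K\left\{\log\left[2\pi I_0(X)\right] - X_1\bar{C} - X_2\bar{S}\right\}.$$

The minimum of LL₀ is found, and the corresponding values of X₁ and X₂ are used to estimate the angle and width parameters using the relationships

$$\alpha_0 = \operatorname{atan2}(\bar{S}, \bar{C}), \qquad m(X) = \left(\bar{C}^2 + \bar{S}^2\right)^{1/2},$$

with m(X) = I₁(X)/I₀(X).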
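At the minimum of the ordinary von Mises negative log likelihood, α₀ = atan2(S̄, C̄) and m(X) = √(C̄² + S̄²), where C̄ and S̄ are the mean cosine and mean sine of the observed angles. Because m(X) increases monotonically from 0 to 1, X can be found by bisection. In this minimal NumPy sketch, the Bessel ratio is computed from the integral representation Iₙ(X) = (1/π)∫₀^π exp(X cos t) cos(nt) dt, scaled by exp(−X) to avoid overflow; the function names are hypothetical.

```python
import numpy as np

_T = np.linspace(0.0, np.pi, 4001)  # quadrature grid for the Bessel integrals


def bessel_ratio(k):
    """m(k) = I1(k) / I0(k) by the trapezoid rule (the step size cancels in the ratio)."""
    w = np.exp(k * (np.cos(_T) - 1.0))  # exp(k cos t) scaled by exp(-k)
    trap = lambda y: y.sum() - 0.5 * (y[0] + y[-1])
    return trap(w * np.cos(_T)) / trap(w)


def fit_von_mises(angles, max_kappa=1e7):
    """MLE of the mean angle and concentration of an ordinary von Mises distribution."""
    a = np.asarray(angles, dtype=float)
    c_bar, s_bar = np.cos(a).mean(), np.sin(a).mean()
    alpha0 = np.arctan2(s_bar, c_bar) % (2.0 * np.pi)
    r_bar = np.hypot(c_bar, s_bar)  # target value of m(X)
    lo, hi = 0.0, 1.0
    while bessel_ratio(hi) < r_bar and hi < max_kappa:  # bracket the root of m(X) = r_bar
        hi *= 2.0
    for _ in range(60):  # m is increasing, so bisect
        mid = 0.5 * (lo + hi)
        if bessel_ratio(mid) < r_bar:
            lo = mid
        else:
            hi = mid
    return alpha0, 0.5 * (lo + hi)
```

Because the integrand is even and periodic, the trapezoid rule over [0, π] converges very quickly, so a few thousand points give the ratio to near machine precision for the concentrations typical of bond angles.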
B2. Symmetrized von Mises estimation
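To estimate α₀ near π, we use the symmetrized distribution. The negative log likelihood is

$$LL_1 = K\log\left[2\pi I_0(X)\right] - X_1\sum_{i=1}^{K}\cos\alpha_i - \sum_{i=1}^{K}\log\cosh(X_2\sin\alpha_i).$$

Once LL₁ has been minimized, the values of X₁ and X₂ are used to estimate α₀ and X.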
B3. Algorithm for bond-angle estimation
For K observations αᵢ (i = 1, …, K), initial values of α₀ and X are estimated using the method of moments.
These initial values are further refined using the Fisher scoring method. The Newton–Raphson optimization method was found to be fast and accurate for this case.
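A sketch of the symmetrized fit, assuming LL₁/K = log[2π I₀(X)] − X₁C̄ − (1/K)Σ log cosh(X₂ sin αᵢ), where C̄ is the mean cosine of the angles, and taking the ordinary circular-moment estimates as starting values (the exact moment formulas used by the authors are not reproduced here). The Newton–Raphson step uses the analytic gradient and Hessian, with m′(X) = 1 − m(X)/X − m(X)², and a backtracking line search keeps every step downhill. All names are hypothetical.

```python
import numpy as np

_T = np.linspace(0.0, np.pi, 4001)  # quadrature grid for the Bessel integrals


def _trap(y):
    # Trapezoid sum on the uniform grid _T, without the step size.
    return y.sum() - 0.5 * (y[0] + y[-1])


def log_i0(k):
    # log I0(k) = k + log[(1/pi) * integral_0^pi exp(k (cos t - 1)) dt], overflow-free.
    w = np.exp(k * (np.cos(_T) - 1.0))
    return k + np.log(_trap(w) * (_T[1] - _T[0]) / np.pi)


def bessel_ratio(k):
    # m(k) = I1(k) / I0(k) from the same scaled integrand.
    w = np.exp(k * (np.cos(_T) - 1.0))
    return _trap(w * np.cos(_T)) / _trap(w)


def neg_loglik(p, c_bar, s):
    # LL1 / K without the constant log(2 pi); p = (X1, X2), s = sin(alpha_i).
    x1, x2 = p
    log_cosh = np.logaddexp(x2 * s, -x2 * s) - np.log(2.0)
    return log_i0(np.hypot(x1, x2)) - x1 * c_bar - log_cosh.mean()


def grad_hess(p, c_bar, s):
    k = np.hypot(*p)
    u = p / k
    m = bessel_ratio(k)
    dm = 1.0 - m / k - m * m  # derivative of m(k)
    th = np.tanh(p[1] * s)
    g = m * u - np.array([c_bar, np.mean(s * th)])
    h = dm * np.outer(u, u) + (m / k) * (np.eye(2) - np.outer(u, u))
    h[1, 1] -= np.mean(s * s * (1.0 - th * th))  # second derivative of -log cosh term
    return g, h


def _solve_kappa(r):
    # Bisection for m(k) = r, used only for the starting value.
    lo, hi = 0.0, 1.0
    while bessel_ratio(hi) < r:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if bessel_ratio(mid) < r else (lo, mid)
    return 0.5 * (lo + hi)


def fit_symmetrized_von_mises(angles, n_iter=50):
    a = np.asarray(angles, dtype=float)
    s, c_bar = np.sin(a), np.cos(a).mean()
    a0 = np.arctan2(s.mean(), c_bar)  # circular-moment starting values (assumption)
    k0 = _solve_kappa(min(np.hypot(c_bar, s.mean()), 0.999))
    p = k0 * np.array([np.cos(a0), np.sin(a0)])
    f = neg_loglik(p, c_bar, s)
    for _ in range(n_iter):
        g, h = grad_hess(p, c_bar, s)
        try:
            step = -np.linalg.solve(h, g)  # Newton-Raphson direction
        except np.linalg.LinAlgError:
            step = -g
        if g @ step >= 0.0:  # not a descent direction: use steepest descent
            step = -g
        lam = 1.0
        while lam > 1e-10:  # backtracking line search
            f_new = neg_loglik(p + lam * step, c_bar, s)
            if f_new < f:
                break
            lam *= 0.5
        else:
            break  # no further decrease possible
        p, f = p + lam * step, f_new
    x1, x2 = p
    return np.arctan2(abs(x2), x1), np.hypot(x1, x2)
```

Because LL₁ is even in X₂, the sign of X₂ is arbitrary; the sketch therefore reports α₀ = atan2(|X₂|, X₁), which always lies in [0, π], as a bond angle should.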
APPENDIX C
Procrustes matching method
C1. Overview of Procrustes matching
Procrustes matching is a statistical technique (Dryden & Mardia, 2016; Crosilla et al., 2019) that is used to compare and align two or more shapes by eliminating differences in location, scale and orientation. This method is extensively applied in fields such as structural biology, morphometrics, computer vision and other areas where shape analysis is crucial. The primary objective of the Procrustes method is to achieve the optimal superimposition of two sets of points, ensuring that the corresponding points are as close as possible, according to the least-squares criterion.
C2. Procrustes method formulation
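Given two sets of points, X = [x₁, x₂, …, xₙ] and Y = [y₁, y₂, …, yₙ], where xᵢ, yᵢ ∈ ℝᵐ are coordinates of points in m-dimensional space, the Procrustes method minimizes the objective function

$$G(b, R, t) = \left\| Y - \left(bRX + t\,\mathbf{1}_n^{\mathsf{T}}\right) \right\|_F^2 = \sum_{i=1}^{n} \left\| y_i - (bRx_i + t) \right\|^2,$$

where b is a scaling factor, R is an orthogonal rotation matrix, t is an m-dimensional translation vector, ∥ · ∥F denotes the Frobenius norm and 1ₙ is the n-dimensional vector of ones.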
The objective is to find the optimal parameters b, R and t that align configuration X to configuration Y such that the sum of squared distances between corresponding points is minimized.
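For full Procrustes matching with scaling, the optimal parameters have a closed form through the singular value decomposition (SVD) of the cross-covariance of the centred configurations. This is a minimal NumPy sketch, assuming points are stored as rows of n × m arrays (so the objective reads ∥Y − (bXR + 1tᵀ)∥²F) and, by default, restricting R to proper rotations; whether MetalCoord allows reflections is not stated here, and the function name is hypothetical.

```python
import numpy as np


def procrustes(X, Y, allow_reflection=False):
    """Full Procrustes fit: find b, R, t minimizing ||Y - (b X R + 1 t^T)||_F^2.

    X and Y are n x m arrays whose rows are corresponding points.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean  # remove location
    U, S, Vt = np.linalg.svd(Xc.T @ Yc)  # SVD of the m x m cross-covariance
    d = np.ones_like(S)
    if not allow_reflection and np.linalg.det(U @ Vt) < 0:
        d[-1] = -1.0  # flip the weakest axis so that det(R) = +1
    R = U @ np.diag(d) @ Vt  # optimal orientation
    b = (S * d).sum() / (Xc**2).sum()  # optimal scale
    t = y_mean - b * x_mean @ R  # optimal translation
    rss = ((Y - (b * X @ R + t)) ** 2).sum()  # minimized objective
    return b, R, t, rss
```

Centring first decouples translation from the other parameters; the rotation then depends only on the SVD, and the scale follows from the trace of the singular values.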
C3. Steps in Procrustes matching
C4. Determining optimal correspondence by permutations
When only the correspondence between the metal atoms is known, and the correspondence between the other atoms in the structures is unknown, we test all possible permutations of the points in one configuration relative to the other. The objective is to find the permutation that minimizes the Procrustes distance. This method is feasible only for a small number of points, which is generally the case here: the maximum number of atoms, excluding the metal, is 24, but this number is rarely more than ten. For a larger number of points, alternative methods, such as the stochastic Procrustes method, should be considered.
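The exhaustive search over correspondences can be sketched by wrapping an SVD-based Procrustes fit in `itertools.permutations`, keeping the metal (row 0) fixed and permuting only the ligand rows of the ideal configuration. The function names are hypothetical, and the residual sum of squares (in units of the ideal configuration) is used here as the Procrustes distance; any normalization used by MetalCoord is not reproduced.

```python
import itertools

import numpy as np


def procrustes_rss(X, Y):
    """Residual of the best fit b X R + t to Y over scale, proper rotation and translation."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc.T @ Yc)
    d = np.ones_like(S)
    if np.linalg.det(U @ Vt) < 0:
        d[-1] = -1.0  # keep det(R) = +1
    trace = (S * d).sum()
    return (Yc**2).sum() - trace**2 / (Xc**2).sum()


def best_correspondence(observed, ideal):
    """Metal is row 0 in both arrays and stays fixed; only ligand rows are permuted."""
    best_rss, best_perm = np.inf, None
    for perm in itertools.permutations(range(1, len(ideal))):
        rss = procrustes_rss(observed, ideal[[0, *perm]])
        if rss < best_rss:
            best_rss, best_perm = rss, (0, *perm)
    return best_rss, best_perm
```

With k ligands the loop visits k! permutations (3 628 800 for k = 10), which is why exhaustive search is practical only for small coordination numbers. Comparing the best residual across candidate ideal geometries (for example tetrahedral versus square planar) then selects the coordination class.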
APPENDIX D
Details of extraction of metal-coordination information
D1. Extraction of the metal environment from the component dictionary and the macromolecular model
To determine the coordination geometry around the metal in the macromolecular model, the model file (typically PDB or mmCIF) and/or the monomer CIF file containing the component of interest must first be read and processed. In stats mode, the monomer CIF file is not required.
In the update mode, if more than one instance of the metal-containing component is present in the model file, the one with the lowest B value and the highest average occupancy is selected. In this mode, the monomer CIF file is read first and each metal in the component is considered in turn. For each metal, the bonds to the metal and the corresponding bonded atoms are saved in a separate object. If any atom bonded to any of the metal atoms is absent from the model file, the program terminates with an appropriate error message. Atoms from the model file that are close to the metal atoms are also added to the object, after further filtering (see below).
In the stats mode, only the macromolecular model file is read. All instances of the metal-containing component are considered one after another. Again, each metal within the specified component is considered. All atoms in the model file, including the component itself, are added to an object if they are close to the considered metal (the neighbour list is calculated using GEMMI). Filtering is again applied to reduce the probability of selecting incorrect atoms.
When adding atoms to the tentative list of atoms potentially forming bonds with the metal, their alternative-location (altloc) codes must match if both the metal and the atom under consideration have one.
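The alternative-location rule, combined with a distance criterion, can be sketched in plain Python. This is an illustration under assumptions: the `Atom` record, the function names and the 3.0 Å cutoff are hypothetical, an empty string stands for "no altloc code", and the further filtering steps of the program are not included.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    element: str
    xyz: tuple          # Cartesian coordinates in ångströms
    altloc: str = ""    # "" means the atom has no alternative-location code


def altlocs_compatible(metal, atom):
    """Altloc codes must agree only when both atoms carry one."""
    if metal.altloc and atom.altloc:
        return metal.altloc == atom.altloc
    return True


def candidate_ligands(metal, atoms, cutoff=3.0):
    """Atoms within a (hypothetical) distance cutoff of the metal whose altloc
    codes are compatible with the metal's code."""
    return [a for a in atoms
            if a is not metal
            and math.dist(metal.xyz, a.xyz) <= cutoff
            and altlocs_compatible(metal, a)]
```

An atom without an altloc code is therefore compatible with every conformer of the metal, whereas two atoms that both carry codes must belong to the same conformer.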
Atoms around the metal are selected using the following filtering procedure (filtering is applied only to atoms that come from the model file and are not in the list of atoms bonded to the metal).
If symmetry-related atoms make contact with the metal, the following additional procedure is used [note that all atoms are considered in step (i) and the remaining atoms are then considered in step (ii)].
Here, we define occupancy as the crystallographic occupancy: the proportion of the atom in the asymmetric unit.
D2. Matching the metal environment with the coordination library
To accurately characterize the metal environments, we employ a combinatorial Procrustes matching procedure that aligns the metal and its environment (derived as described above) with idealized coordinates from the coordination library. In the current implementation, only 54 of the 95 coordination classes are used. Coordination classes in which nonmetal atoms are bonded to each other are excluded; the exceptions are `sandwich-like structures', in which all atoms of at least one ring form bonds with the metal.
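The class-selection rule can be expressed as a test on the graph of bonds between the nonmetal atoms of a coordination class. In this sketch (an assumed representation, with hypothetical names), a class is described by its number of ligand atoms and a list of ligand–ligand bonds. A class is kept if it has no such bonds, or if those bonds close at least one ring. Because every atom of a coordination class is bonded to the metal by construction, any such ring has all of its atoms bonded to the metal. Rings are detected with a union–find structure: a bond joining two atoms that are already in the same connected component closes a cycle.

```python
def has_ring(n_atoms, bonds):
    """True if the undirected bond graph on atoms 0..n_atoms-1 contains a cycle."""
    parent = list(range(n_atoms))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    edges = {tuple(sorted(b)) for b in bonds}  # ignore duplicated bonds
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            return True   # this bond closes a cycle
        parent[ri] = rj   # merge the two components
    return False


def class_is_used(n_atoms, ligand_bonds):
    """Hypothetical filter: keep classes without ligand-ligand bonds, or whose
    ligand-ligand bonds form at least one ring (sandwich-like classes)."""
    if not ligand_bonds:
        return True
    return has_ring(n_atoms, ligand_bonds)
```

For example, an octahedral class has no ligand–ligand bonds and is kept, a class containing a single bonded pair of donors is rejected, and a ferrocene-like class with two five-membered rings is kept.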
D3. Compiling statistics: bonds
The program uses several strategies for compiling bond and angle statistics. For calculations of bond statistics, the following strategies are used.
D4. Compiling statistics: angles
For calculations of angle statistics, the following strategies are used.
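Whichever strategy is chosen, the raw observations for angle statistics are ligand–metal–ligand angles measured in the matched environments. A minimal helper for measuring them is sketched below, assuming plain Cartesian coordinates in ångströms; the function names are hypothetical.

```python
import itertools
import math


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees, computed from Cartesian coordinates."""
    u = [p - q for p, q in zip(a, vertex)]
    v = [p - q for p, q in zip(b, vertex)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] so rounding cannot push acos outside its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def ligand_metal_ligand_angles(metal, ligands):
    """All L-M-L angles, keyed by the ligand index pair (i, j) with i < j."""
    return {(i, j): angle_deg(ligands[i], metal, ligands[j])
            for i, j in itertools.combinations(range(len(ligands)), 2)}
```

For a regular octahedron this yields twelve 90° angles and three 180° angles, one for each pair of trans ligands.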
Supporting information
Link https://doi.org/10.5281/zenodo.13694559
This archive contains re-refined macromolecular structures presented in this article. The JSON files describing metal coordination were generated using MetalCoord. These JSON files were converted to restraints using the included script json2restraints.py. The structures were re-refined using Servalcat.
List of all coordination classes within MetalCoord. DOI: https://doi.org/10.1107/S2059798324011458/von5002sup1.csv
Full list of frequencies of coordination classes for each metal. DOI: https://doi.org/10.1107/S2059798324011458/von5002sup2.csv
Idealized coordinates of coordination classes. DOI: https://doi.org/10.1107/S2059798324011458/von5002sup3.tgz
Supporting tables of restraints and refinement statistics. DOI: https://doi.org/10.1107/S2059798324011458/von5002sup4.pdf
Footnotes
‡These authors contributed equally to this work.
1Given that multiple R factors may be listed in a COD entry and not all may be present, we selected crystal structures where at least one of the R factors is less than 0.1.
2Here we use the term `user' loosely; it can refer to an actual human user or another program, for example CCP4i2.
3A compound is considered to be a sandwich-like structure if it contains one or two cyclopentadienyl rings bonded to a metal.
4A structure is considered to be haem-like if it contains at least one metal atom and four five-membered rings, each consisting of four C atoms and one N atom, with the N atoms forming bonds with the metal atom.
Acknowledgements
We thank Jake Grimmett, Ivan Clayson and Toby Darling from the MRC–LMB Scientific Computing Department for computing support and resources. We thank Marcin Wojdyr for his help with GEMMI implementation. Part of this work was conducted during KB's visit to the MRC Laboratory of Molecular Biology.
Funding information
The following funding is acknowledged: Medical Research Council (grant No. MC_UP_A025_1012 to Garib N. Murshudov); University of Southampton (STFC grant No. 8521412 to Ivo Tews).
References
Agirre, J., Atanasova, M., Bagdonas, H., Ballard, C. B., Baslé, A., Beilsten-Edmands, J., Borges, R. J., Brown, D. G., Burgos-Mármol, J. J., Berrisford, J. M., Bond, P. S., Caballero, I., Catapano, L., Chojnowski, G., Cook, A. G., Cowtan, K. D., Croll, T. I., Debreczeni, J. É., Devenish, N. E., Dodson, E. J., Drevon, T. R., Emsley, P., Evans, G., Evans, P. R., Fando, M., Foadi, J., Fuentes-Montero, L., Garman, E. F., Gerstel, M., Gildea, R. J., Hatti, K., Hekkelman, M. L., Heuser, P., Hoh, S. W., Hough, M. A., Jenkins, H. T., Jiménez, E., Joosten, R. P., Keegan, R. M., Keep, N., Krissinel, E. B., Kolenko, P., Kovalevskiy, O., Lamzin, V. S., Lawson, D. M., Lebedev, A. A., Leslie, A. G. W., Lohkamp, B., Long, F., Malý, M., McCoy, A. J., McNicholas, S. J., Medina, A., Millán, C., Murray, J. W., Murshudov, G. N., Nicholls, R. A., Noble, M. E. M., Oeffner, R., Pannu, N. S., Parkhurst, J. M., Pearce, N., Pereira, J., Perrakis, A., Powell, H. R., Read, R. J., Rigden, D. J., Rochira, W., Sammito, M., Sánchez Rodríguez, F., Sheldrick, G. M., Shelley, K. L., Simkovic, F., Simpkin, A. J., Skubak, P., Sobolev, E., Steiner, R. A., Stevenson, K., Tews, I., Thomas, J. M. H., Thorn, A., Valls, J. T., Uski, V., Usón, I., Vagin, A., Velankar, S., Vollmar, M., Walden, H., Waterman, D., Wilson, K. S., Winn, M. D., Winter, G., Wojdyr, M. & Yamashita, K. (2023). Acta Cryst. D79, 449–461.
Aragão, D., Mitchell, E. P., Frazão, C. F., Carrondo, M. A. & Lindley, P. F. (2008). Acta Cryst. D64, 665–674.
Bazayeva, M., Andreini, C. & Rosato, A. (2024). Acta Cryst. D80, 362–376.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Berlin, Heidelberg: Springer-Verlag.
Bolton, R., Machelett, M. M., Stubbs, J., Axford, D., Caramello, N., Catapano, L., Malý, M., Rodrigues, M. J., Cordery, C., Tizzard, G. J., MacMillan, F., Engilberge, S., von Stetten, D., Tosha, T., Sugimoto, H., Worrall, J. A. R., Webb, J. S., Zubkov, M., Coles, S., Mathieu, E., Steiner, R. A., Murshudov, G., Schrader, T. E., Orville, A. M., Royant, A., Evans, G., Hough, M. A., Owen, R. L. & Tews, I. (2024). Proc. Natl Acad. Sci. USA, 121, e2308478121.
Carugo, O. & Djinović Carugo, K. (2005). Trends Biochem. Sci. 30, 213–219.
Cordero, B., Gómez, V., Platero-Prats, A. E., Revés, M., Echeverría, J., Cremades, E., Barragán, F. & Alvarez, S. (2008). Dalton Trans., pp. 2832–2838.
Crosilla, F., Beinat, A., Fusiello, A., Maset, E. & Visintini, D. (2019). Advanced Procrustes Analysis Models in Photogrammetric Computer Vision. Cham: Springer International.
Dimitropoulos, D., Ionides, J. & Henrick, K. (2006). Curr. Protoc. Bioinformatics, 15, 14.
Dryden, I. L. & Mardia, K. V. (2016). Statistical Shape Analysis, with Applications in R, 2nd ed. Chichester: John Wiley & Sons.
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.
Gisriel, C. J., Zhou, K., Huang, H.-L., Debus, R. J., Xiong, Y. & Brudvig, G. W. (2020). Joule, 4, 2131–2148.
Gražulis, S., Chateigner, D., Downs, R. T., Yokochi, A. F. T., Quirós, M., Lutterotti, L., Manakova, E., Butkus, J., Moeck, P. & Le Bail, A. (2009). J. Appl. Cryst. 42, 726–729.
Greenwood, N. & Earnshaw, A. (1997). Chemistry of the Elements, 2nd ed. Oxford: Butterworth-Heinemann.
Harding, M. M., Nowicki, M. W. & Walkinshaw, M. D. (2010). Crystallogr. Rev. 16, 247–302.
Hattne, J., Shi, D., Glynn, C., Zee, C.-T., Gallagher-Jones, M., Martynowycz, M. W., Rodriguez, J. A. & Gonen, T. (2018). Structure, 26, 759–766.
Hemsworth, G. R., González-Pacanowska, D. & Wilson, K. S. (2013). Biochem. J. 456, 81–88.
Koval', T., Švecová, L., Østergaard, L. H., Skalova, T., Dušková, J., Hašek, J., Kolenko, P., Fejfarová, K., Stránský, J., Trundová, M. & Dohnálek, J. (2019). Sci. Rep. 9, 13700.
Landrum, G. (2016). RDKit. https://github.com/rdkit/rdkit/releases/tag/Release_2016_09_4.
Lewandowski, E. M., Skiba, J., Torelli, N. J., Rajnisz, A., Solecka, J., Kowalski, K. & Chen, Y. (2015). Chem. Commun. 51, 6186–6189.
Long, F., Nicholls, R. A., Emsley, P., Gražulis, S., Merkys, A., Vaitkus, A. & Murshudov, G. N. (2017). Acta Cryst. D73, 112–122.
Malý, M., Diederichs, K., Dohnálek, J. & Kolenko, P. (2020). IUCrJ, 7, 681–692.
Matsumoto, Y., Tosha, T., Pisliakov, A. V., Hino, T., Sugimoto, H., Nagano, S., Sugita, Y. & Shiro, Y. (2012). Nat. Struct. Mol. Biol. 19, 238–245.
Moriarty, N. W., Grosse-Kunstleve, R. W. & Adams, P. D. (2009). Acta Cryst. D65, 1074–1080.
Moser, J., Lange, C., Krausze, J., Rebelein, J., Schubert, W.-D., Ribbe, M. W., Heinz, D. W. & Jahn, D. (2013). Proc. Natl Acad. Sci. USA, 110, 2094–2098.
Putignano, V., Rosato, A., Banci, L. & Andreini, C. (2018). Nucleic Acids Res. 46, D459–D464.
Silverman, B. W. (1981). J. R. Stat. Soc. Ser. B Stat. Methodol. 43, 97–99.
Smart, O. S., Sharff, A., Holstein, J., Womack, T., Flensburg, C., Keller, P., Paciorek, W., Vonrhein, C. & Bricogne, G. (2021). Grade2, version 1.6.0. Global Phasing Ltd, Cambridge, United Kingdom.
Touw, W. G., van Beusekom, B., Evers, J. M. G., Vriend, G. & Joosten, R. P. (2016). Acta Cryst. D72, 1110–1118.
Vagin, A. A., Steiner, R. A., Lebedev, A. A., Potterton, L., McNicholas, S., Long, F. & Murshudov, G. N. (2004). Acta Cryst. D60, 2184–2195.
Wojdyr, M. (2022). J. Open Source Softw. 7, 4200.
Yamashita, K., Palmer, C. M., Burnley, T. & Murshudov, G. N. (2021). Acta Cryst. D77, 1282–1291.
Yano, J., Kern, J., Irrgang, K.-D., Latimer, M. J., Bergmann, U., Glatzel, P., Pushkar, Y., Biesiadka, J., Loll, B., Sauer, K., Messinger, J., Zouni, A. & Yachandra, V. K. (2005). Proc. Natl Acad. Sci. USA, 102, 12047–12052.
Zheng, H., Chordia, M. D., Cooper, D. R., Chruszcz, M., Müller, P., Sheldrick, G. M. & Minor, W. (2014). Nat. Protoc. 9, 156–170.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.