research papers
High-throughput automated molecular replacement for small-molecule MicroED data
Department of Chemistry, University of California, Riverside, 900 University Avenue, Riverside, CA 92521, USA
*Correspondence e-mail: [email protected]
Interest in electron diffraction (ED) for the structural characterization of both proteins and small molecules has grown significantly over the last decade. While ab initio phasing methods remain the gold standard for ED data from small-molecule samples, radiation damage from the electron beam during data collection and poor crystallinity of nanocrystalline samples can make these methods infeasible, particularly for challenging molecules that exhibit conformational flexibility. Molecular replacement (MR) is the most commonly used phasing method for protein ED data and can circumvent issues related to diminished data quality. However, its application to small molecules has been limited by the lack of methods for generating suitable trial conformations. Herein, a high-throughput automated MR workflow has been developed and used to solve a novel ED structure of corilagin, a macrocyclic gallotannin of pharmaceutical relevance, which could not be solved by ab initio phasing. The method was validated against three similar macrocycles with known structures (paritaprevir-α, paritaprevir-β and grazoprevir) at data resolution limits of 1.0, 1.2, 1.4, 1.5, 1.6, 1.8 and 2.0 Å. At all of these resolutions, for all three structures, the workflow succeeded and produced solutions with R factors and RMSD values (relative to the ab initio solved structures) within acceptable limits.
Keywords: small-molecule electron diffraction; MicroED; 3DED; electron crystallography; molecular replacement; high-throughput methods; automation algorithms.
CCDC reference: 2543412
1. Introduction
Microcrystal electron diffraction (MicroED), also known as 3D electron diffraction (3DED), has steadily gained traction in crystallography over the last decade (Shi et al., 2013; Jones et al., 2018; Mu et al., 2021; Gemmi et al., 2019; Danelius et al., 2023b; Haymaker & Nannenga, 2024). For small-molecule structures, following early developments in ab initio structure determination directly from powder formulations (Gorelik et al., 2012; van Genderen et al., 2016; Jiang et al., 2022), recent advances have demonstrated the ability to determine hydrogen-atom positions (Palatinus et al., 2017) and absolute configuration (Brázda et al., 2019), highlighting the method's potential and applicability. The ease of sample preparation is a significant advantage of MicroED for small molecules and has contributed substantially to its widespread adoption as a structure determination technique. In contrast, sample preparation remains a challenging and rate-limiting step in macromolecular MicroED. For small-molecule analyses, nanocrystalline material can frequently be obtained directly from powdered samples, eliminating the need for crystallization screening. Nevertheless, powder-derived specimens may still present obstacles to obtaining high-resolution diffraction data. Such limitations typically result from inadequate crystallinity or from electron-beam-induced radiation damage during data acquisition (Hattne, 2021). In our experience, these limitations make roughly 50% of MicroED datasets from complex molecules impossible to phase by direct methods, i.e. by solving the phase problem from the diffraction amplitudes alone (Sheldrick et al., 2012). When the data resolution is insufficient for direct methods, global optimization techniques such as simulated annealing have been used successfully for crystal structure determination from ED data (Feyand et al., 2012; Burla et al., 2015; Gemmi et al., 2019; Woollam et al., 2020; Andrusenko et al., 2020; Lightowler et al., 2022). Global optimization requires prior knowledge of the molecular connectivity (Brunger, 1991; Burla et al., 2015; Woollam et al., 2020), a condition that is met in many cases.
In protein crystallography, where lower-quality diffraction data are the norm owing to similarly poor crystallinity (Timofeev & Samygina, 2023), molecular replacement (MR) is the most effective and widely used alternative to ab initio methods, accounting for more than 80% of X-ray structures deposited in the Protein Data Bank (PDB) (Burley & Berman, 2021). MR is conceptually simple: the orientation and position of a trial structure in the crystal are optimized using a series of rotation and translation functions so that the structure-factor amplitudes calculated from the placed model agree as closely as possible with the amplitudes derived from the experimental diffraction data (Rossmann, 1990). For MR to succeed, a major fragment of the trial structure must share the conformation of the correct structure. If this fragment is unknown, conformational searches using global optimization methods such as simulated annealing are needed to screen different conformations. Because the optimization in an MR algorithm is driven by the differences between calculated and experimental structure factors, the conformation of the trial structure supplied to MR is of critical importance: MR will succeed only when the trial model closely resembles the true structure (Abergel, 2013).
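As a toy illustration of this principle (not of how PHASER works, which uses likelihood-based rotation and translation functions), the sketch below places a hypothetical planar "molecule" of identical point scatterers in a 2D cell with plane group p1, simulates "observed" amplitudes from the model rotated by 40°, and recovers that orientation by a brute-force search that minimizes the amplitude R factor. The coordinates, cell edge and reflection range are arbitrary assumptions. In p1 the amplitudes do not depend on where the model sits in the cell, so only its orientation needs to be searched here.

```python
import cmath
import math

# Hypothetical planar "molecule": Cartesian coordinates (Å) of identical point scatterers.
MODEL = [(0.0, 0.0), (1.5, 0.0), (2.3, 1.2), (3.8, 1.2), (4.6, 0.0), (4.0, -1.3)]
CELL = 12.0  # assumed square 2D cell edge (Å), plane group p1
HKL = [(h, k) for h in range(-6, 7) for k in range(-6, 7) if (h, k) != (0, 0)]


def rotate(coords, angle_deg):
    """Rotate 2D coordinates about their centroid by angle_deg degrees."""
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(cx + c * (x - cx) - s * (y - cy), cy + s * (x - cx) + c * (y - cy))
            for x, y in coords]


def amplitudes(coords):
    """|F(hk)| for unit point scatterers: |sum_j exp(2*pi*i*(h*x_j + k*y_j))|, x_j fractional."""
    frac = [(x / CELL, y / CELL) for x, y in coords]
    return [abs(sum(cmath.exp(2j * math.pi * (h * u + k * v)) for u, v in frac))
            for h, k in HKL]


def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|); no scale factor is needed in this toy."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


f_obs = amplitudes(rotate(MODEL, 40.0))  # simulated "experimental" amplitudes
scores = {angle: r_factor(f_obs, amplitudes(rotate(MODEL, angle))) for angle in range(360)}
best = min(scores, key=scores.get)
print(best, scores[best])
```

The search returns 40° (or the equivalent 220°): in two dimensions a 180° rotation is equivalent to inversion, which leaves every amplitude unchanged, so amplitudes alone cannot separate the two orientations. A trial model with the wrong conformation would leave a residual R factor at every orientation, which is why MR needs a trial model close to the true structure.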
With the recent surge in popularity of large, complex and conformationally flexible molecules as drug candidates that can target previously undruggable proteins [e.g. flat or featureless proteins with no apparent binding site (DeGoey et al., 2018)], structural elucidation of these compounds is important for their further development. These compounds are referred to as `beyond Rule of 5' (bRo5) compounds, as their molecular properties (molecular weight, lipophilicity, number of hydrogen-bond donors and acceptors) lie beyond the Lipinski Rule of 5 (Ro5) (Lipinski et al., 1997). bRo5 compounds, such as macrocycles, are inherently flexible and are best described as conformational ensembles in solution. Because bond rotations in macrocycles are highly constrained and correlated, generating the full conformational space is computationally demanding; as a result, global optimization methods are rarely used as the primary approach for producing macrocyclic conformational ensembles. Instead, many conformer generators exist, based on a range of methods [e.g. machine learning (Jiang et al., 2022), molecular mechanics (Watts et al., 2014), molecular dynamics (Poongavanam et al., 2018) and distance geometry (Seidel et al., 2023)]. Recent work has shown that distance geometry methods can sample the conformational space of these complex molecules to produce structures within a 2 Å RMSD cutoff of the experimentally determined structure, with noticeably improved efficiency compared with more conventional molecular mechanics methods (Seidel et al., 2023). While an RMSD of 2 Å is within the suggested limit for MR in protein crystallography, the corresponding RMSD cutoff of 0.25 Å for MR of small molecules is much stricter (van de Streek & Neumann, 2010). Recent work has shown that MR can indeed be applied to small-molecule systems by combining it with fragmentation of the target molecule into rigid fragments and conformation screening using a multicomponent MR search (Gorelik et al., 2023; Deschner et al., 2025). This approach should prove useful for traditional Ro5 compounds that can easily be fragmented into rigid moieties, but it is not readily applicable to bRo5 candidates that lack obvious fragmentation patterns.
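Whether a molecule can be cut into rigid fragments can be read off its bond graph: only bonds that lie in no ring (graph bridges) can be cut, whereas every bond in a macrocyclic core belongs to a ring. The following minimal sketch finds bridges with Tarjan's low-link depth-first search on a hypothetical toy graph (atom indices only, bond orders ignored); in practice a cheminformatics toolkit would perceive rings and rotatable bonds from the real structure.

```python
from collections import Counter, defaultdict


def find_bridges(edges):
    """Return the bonds that lie in no ring (graph bridges), using Tarjan's low-link DFS."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
        graph[b].append(a)
    disc, low, bridges = {}, {}, []

    def dfs(node, parent):
        disc[node] = low[node] = len(disc)  # discovery order
        for nbr in graph[node]:
            if nbr == parent:
                continue
            if nbr in disc:  # already visited: node and nbr lie on a common ring
                low[node] = min(low[node], disc[nbr])
            else:
                dfs(nbr, node)
                low[node] = min(low[node], low[nbr])
                if low[nbr] > disc[node]:  # nothing below nbr links back above node
                    bridges.append(tuple(sorted((node, nbr))))

    for start in list(graph):
        if start not in disc:
            dfs(start, None)
    return sorted(bridges)


# Hypothetical toy molecule: a 12-membered macrocycle (atoms 0-11) carrying a
# three-bond linker (3-12-13-14) to a six-membered ring (14-19) with one methyl (20).
macrocycle = [(i, (i + 1) % 12) for i in range(12)]
linker = [(3, 12), (12, 13), (13, 14)]
ring6 = [(14, 15), (15, 16), (16, 17), (17, 18), (18, 19), (19, 14)]
edges = macrocycle + linker + ring6 + [(17, 20)]

degree = Counter(atom for bond in edges for atom in bond)
bridges = find_bridges(edges)
# Non-terminal bridges are the torsions at which rigid fragments could be separated.
rotatable = [(a, b) for a, b in bridges if degree[a] > 1 and degree[b] > 1]
print(rotatable)  # [(3, 12), (12, 13), (13, 14)]
```

None of the macrocycle's twelve bonds is a bridge, so the core cannot be split further, yet it is the core that carries most of the conformational flexibility. This is the situation the fragmentation approach cannot handle.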
Herein, we expand the capabilities of MR for small-molecule systems by designing an algorithm that automates structure solution of macrocyclic bRo5 molecules from MicroED data without any fragmentation. The input ensemble for our automated MR was calculated using the open-source conformer ensemble generator CONFORGE (Seidel et al., 2023), which samples macrocycle conformers with a purely stochastic approach based on distance geometry and force-field-driven structure refinement. Paritaprevir and grazoprevir [Fig. 1(a) and (b)] were chosen as validation structures to ensure that the high-throughput automated molecular replacement (HAMR, Fig. 2) algorithm works properly, and corilagin [Fig. 1(c)] was chosen as a novel structure to highlight the capability of the algorithm.
| Figure 1 Macrocyclic `beyond Rule of 5' (bRo5) molecules used in this work: (a) paritaprevir and (b) grazoprevir, used to validate the HAMR method, and (c) corilagin, solved with it. |
| Figure 2 (a) Visual representation of HAMR algorithm logic as applied to a validation compound – paritaprevir-α at 1.4 Å data resolution. (b) Generalized logical flowchart for the most significant steps involved in the HAMR algorithm. In step 2, solutions can be ranked via LLG or R factor. In step 3, ΔRMSD represents the average RMSD change for each dihedral angle during a full 360° rotation in steps of 10°. In step 6, the amount of filtering is specified during setup and can be omitted for a more complete search of conformational space. |
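The ΔRMSD quantity in step 3 can be illustrated with the sketch below. It adopts one plausible reading of the definition (an assumption, since the caption gives no formula): the atoms on one side of a bond are rotated about that bond in 10° increments through a full turn, and the RMSD from the starting coordinates is averaged over the 35 non-trivial positions. Dihedrals that move many atoms far from the axis score high; those that only spin a small terminal group score low.

```python
import math


def rotate_about_bond(point, a, b, angle_deg):
    """Rotate `point` by angle_deg about the axis through a and b (Rodrigues' formula)."""
    axis = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]
    v = [point[i] - a[i] for i in range(3)]
    t = math.radians(angle_deg)
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    return tuple(a[i] + v[i] * math.cos(t) + k_cross_v[i] * math.sin(t)
                 + k[i] * k_dot_v * (1.0 - math.cos(t)) for i in range(3))


def rmsd(p, q):
    """RMSD between matched atom lists in the same frame (no superposition)."""
    sq = sum((x - y) ** 2 for u, w in zip(p, q) for x, y in zip(u, w))
    return math.sqrt(sq / len(p))


def delta_rmsd(coords, bond, moving, step=10):
    """Mean RMSD from the starting coordinates over one full turn of a dihedral,
    rotating only the atoms in `moving` about `bond` in `step`-degree increments."""
    a, b = coords[bond[0]], coords[bond[1]]
    values = []
    for angle in range(step, 360, step):
        rotated = [rotate_about_bond(c, a, b, angle) if i in moving else c
                   for i, c in enumerate(coords)]
        values.append(rmsd(coords, rotated))
    return sum(values) / len(values)


# Hypothetical example: atom 2 sits 1 Å from the bond 0-1 and is swung through a full turn.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(round(delta_rmsd(coords, (0, 1), moving={2}), 4))
```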
Paritaprevir and grazoprevir are exemplary validation targets: both are early bRo5 compounds used successfully in the clinic as FDA-approved macrocyclic drugs, and both have MicroED structures solved by ab initio methods (Danelius et al., 2023a; Bu et al., 2024; Wieske et al., 2026). Paritaprevir is an especially good validation candidate because it has two known polymorphic conformers (referred to here as paritaprevir-α and paritaprevir-β) that differ by an all-atom RMSD of approximately 0.8 Å; to be useful for MicroED data that may represent multiple conformers, the method must be able to discriminate between them. Finally, corilagin was chosen both because of its many potential medical uses documented in the literature (Li et al., 2018) and as a real-world case of a MicroED dataset that could not be solved by ab initio methods despite ample good-quality data.
2. Materials and methods
2.1. Materials
Corilagin was purchased from Invivochem. Approximately 0.5 mg of powder was dissolved in ultrapure H2O in a 4 ml scintillation vial, and the solution was air-dried in a fume hood. The dried residue was then scraped from the glass wall.
2.2. Grid preparation
The grid preparation followed the procedure described previously (Unge et al., 2024; Lin et al., 2024). One TEM grid coated with a continuous carbon support film (200 mesh, 3.05 mm OD, Ted Pella) was glow-discharged in negative mode for 60 s on each side at 15 mA in a PELCO easiGlow (Ted Pella). The grid was gently mixed with the scraped dry residue in the scintillation vial, removed and clipped for loading into a cryo-TEM.
2.3. MicroED data collection
MicroED data collection followed the procedure described previously (Unge et al., 2024; Lin et al., 2024), using EPU-D (Thermo Fisher Scientific) on a Talos Arctica cryogenic transmission electron microscope (Thermo Fisher Scientific) operating at 80 K with an acceleration voltage of 200 kV, corresponding to a wavelength of 0.0251 Å. The whole-grid atlas was acquired at a magnification of 210×, and microcrystals were screened at a magnification of 3400×. Once a single microcrystal was identified, the selected-area aperture size 50 (1.4 µm in diameter) was inserted, and the microscope was switched to diffraction mode to record still diffraction images at eucentric height under parallel-beam conditions (C2 lens intensity 45.2%, C2 aperture size 70 and spot size 11). Once high-resolution diffraction spots were observed, MicroED data were recorded continuously on a Falcon III detector (Thermo Fisher Scientific) at 1 s per frame while the stage rotated continuously from −69° to +69° at 0.6° s−1. Each single-crystal dataset was collected as a movie in MRC format with a total electron fluence of 2.30 e Å−2.
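The wavelength and rotation parameters quoted above can be cross-checked with a few lines of arithmetic. The sketch below computes the relativistic electron wavelength at 200 kV and, assuming the sweep is recorded without gaps and the fluence is spread evenly over the frames, the number of frames, the rotation per frame and the fluence per frame.

```python
import math

# CODATA 2018 values (SI units)
PLANCK = 6.62607015e-34              # J s
ELECTRON_MASS = 9.1093837015e-31     # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
SPEED_OF_LIGHT = 299792458.0         # m s^-1


def electron_wavelength_angstrom(voltage):
    """Relativistic electron wavelength (Å) for an accelerating voltage in volts."""
    energy = ELEMENTARY_CHARGE * voltage  # kinetic energy, J
    rest_energy = ELECTRON_MASS * SPEED_OF_LIGHT ** 2
    momentum = math.sqrt(2.0 * ELECTRON_MASS * energy * (1.0 + energy / (2.0 * rest_energy)))
    return PLANCK / momentum * 1e10


wavelength = electron_wavelength_angstrom(200e3)

# Bookkeeping for the continuous-rotation sweep described above
tilt_start, tilt_end = -69.0, 69.0  # degrees
rotation_speed = 0.6                # degrees per second
frame_time = 1.0                    # seconds per frame
total_fluence = 2.30                # e/Å^2 for the whole sweep

sweep_time = (tilt_end - tilt_start) / rotation_speed  # seconds
n_frames = round(sweep_time / frame_time)
degrees_per_frame = rotation_speed * frame_time
fluence_per_frame = total_fluence / n_frames           # e/Å^2 per frame

print(f"wavelength {wavelength:.4f} Å; {n_frames} frames of {degrees_per_frame:.1f}°; "
      f"{fluence_per_frame:.3f} e/Å^2 per frame")
```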
2.4. MicroED data processing
For each single-crystal dataset, the diffraction images were extracted from the raw MRC movie and converted to SMV format using the MRC2SMV software, freely available at https://cryoem.ucla.edu/microed (Hattne et al., 2015). The diffraction images were processed in XDS, applying a resolution cutoff at a mean I/σ(I) ≥ 1.0 and CC1/2 ≥ 0.3 in the highest-resolution shell (Kabsch, 2010; Brehm et al., 2023). Reflection intensities from different single-crystal datasets were scaled and merged in XSCALE (Kabsch, 2010; Brehm et al., 2023), and reflections were converted to SHELX HKL format in XDSCONV (Kabsch, 2010; Brehm et al., 2023). For corilagin, because no solution could be obtained from ab initio phasing using SHELXT or SHELXD (Sheldrick, 2015; Schneider & Sheldrick, 2002), the reflections were converted to CCP4 MTZ format in XDSCONV (Kabsch, 2010; Brehm et al., 2023) for phasing by MR. For grazoprevir, paritaprevir-α and paritaprevir-β, the reflection files from previously published MicroED structures (Danelius et al., 2023a; Bu et al., 2024; Wieske et al., 2026) were used directly and converted from SHELX HKL format to CCP4 MTZ format using F2MTZ in the CCP4i program suite (Potterton et al., 2003). To simulate lower-resolution data for paritaprevir-α, paritaprevir-β and grazoprevir, the highest-resolution CCP4 MTZ reflection files were truncated at 1.0, 1.2, 1.4, 1.5, 1.6, 1.8, 2.0, 2.5 and 3.0 Å using mtzutils in the CCP4 program suite (Potterton et al., 2003). The crystallographic data for corilagin are given in Table 2, and those for grazoprevir, paritaprevir-α and paritaprevir-β in the supporting information, Tables S3–S25.
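To make the effect of these resolution cuts concrete, the sketch below computes d-spacings for an orthorhombic cell (using the corilagin cell dimensions from the crystal data table purely as an example; the validation compounds have their own cells) and counts how many unique reflections survive each cut. It is a hypothetical illustration of what truncation at d_min does, not a replacement for mtzutils, and it ignores systematic absences and the measured completeness.

```python
import math

# Orthorhombic cell of corilagin from the crystal data table (Å)
CELL_A, CELL_B, CELL_C = 6.96, 15.89, 24.41


def d_spacing(h, k, l):
    """Orthorhombic interplanar spacing: 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2."""
    return 1.0 / math.sqrt((h / CELL_A) ** 2 + (k / CELL_B) ** 2 + (l / CELL_C) ** 2)


def resolution_cut(reflections, d_min):
    """Keep only reflections with d >= d_min (Å), i.e. truncate the data at d_min."""
    return [hkl for hkl in reflections if d_spacing(*hkl) >= d_min]


# Hypothetical complete index list: all h, k, l >= 0 (the unique set for Laue class mmm)
# out to 1.0 Å, ignoring systematic absences.
full = [(h, k, l)
        for h in range(8) for k in range(17) for l in range(26)
        if (h, k, l) != (0, 0, 0) and d_spacing(h, k, l) >= 1.0]

for d_min in (1.0, 1.2, 1.4, 1.5, 1.6, 1.8, 2.0):
    print(f"{d_min:.1f} Å: {len(resolution_cut(full, d_min))} unique reflections")
```

Because the number of reflections inside a sphere of radius 1/d_min in reciprocal space grows as (1/d_min)³, the 2.0 Å cut keeps only a small fraction of the reflections available at 1.0 Å, which is why structural detail is lost as the cutoff is lowered.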
2.5. Automated molecular replacement
Initial conformers were generated from SMILES notation using the CONFORGE package, as provided within the Python bindings of the C++ CDPKit library, with default settings: a 20 kcal mol−1 energy window, 300 output conformers, a 0.5 Å minimum RMSD between conformers and a 2 h maximum generation time (Seidel et al., 2023). For corilagin, the number of sampled conformers was additionally left unlimited during CONFORGE conformer generation, which was necessary to sample a wider range of the conformational landscape of this molecule. Similar settings were used for conformer generation with RDKit's ETKDGv3 (Wang et al., 2020).
During HAMR cycles, conformers were generated by modifying dihedral angles using the Python bindings of the C++ RDKit library. Structure factors were calculated using the Python bindings of the C++ gemmi library, via its density calculator in electron-scattering mode with approximate isotropic temperature-factor fitting and scaling and without any solvent mask (Wojdyr, 2022). Molecular replacement was performed with PHASER, as provided in the CCP4 program suite, by calling the binary executable directly (Potterton et al., 2003; McCoy et al., 2007).
As part of a pre-validation check before initiating MR, PHASER performs a composition check to ensure that the volume of the structure fits inside the asymmetric unit (ASU) defined by the unit-cell dimensions (McCoy et al., 2007). However, because PHASER was developed almost exclusively for proteins and other macromolecules, with only limited provision for small molecules as protein ligands, this composition check is often inaccurate for small molecules, causing the pre-validation check to fail incorrectly. To circumvent this, the composition of the ASU was defined as a single hydrogen atom, which always passes the pre-validation check. Once this check has passed, PHASER updates the composition with the actual structure provided, so the workaround does not affect the validity of the MR results.
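One way to see why a protein-oriented composition check can misfire for small molecules is through the Matthews coefficient, V_M = V/(Z·M), and the solvent fraction implied by the protein-calibrated relation V_solv = 1 − 1.23/V_M. The sketch below applies this arithmetic to the corilagin cell volume from the crystal data table, assuming Z = 4 (one molecule per asymmetric unit in P21221). It illustrates the arithmetic only and does not reproduce the criterion PHASER actually applies.

```python
# Standard atomic weights (g/mol)
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(formula):
    """Molar mass (g/mol, numerically Da) of a formula given as {element: count}."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


def matthews(cell_volume, z, mass):
    """Matthews coefficient V_M (Å^3/Da) and the solvent fraction implied by the
    protein-calibrated relation V_solv = 1 - 1.23 / V_M."""
    vm = cell_volume / (z * mass)
    return vm, 1.0 - 1.23 / vm


CELL_VOLUME = 2699.61  # Å^3, corilagin crystal data
Z = 4                  # assumption: one molecule per asymmetric unit in P21221

for label, formula in (("C27H22O18", {"C": 27, "H": 22, "O": 18}),
                       ("C27H22O18.2H2O", {"C": 27, "H": 26, "O": 20})):
    vm, solvent = matthews(CELL_VOLUME, Z, molar_mass(formula))
    print(f"{label}: V_M = {vm:.2f} Å^3/Da, implied solvent fraction = {solvent:+.2f}")
```

Both formulations give V_M close to 1 Å³/Da, well below the values typical of protein crystals (most fall between roughly 1.7 and 3.5 Å³/Da), and a negative implied solvent fraction. By protein standards, a densely packed small-molecule crystal with little or no solvent therefore looks as if the molecule cannot fit, which is consistent with the false failures described above.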
2.6. Refinement
Refinement was performed on the solutions after completion of HAMR cycling, using the refine program of the PHENIX program suite called directly as a binary executable (Liebschner et al., 2019). The Rfree test set was generated before every refinement from 5% of the reflections, except for paritaprevir-α at 1.8 Å resolution, for which 10% was used. For all refinements, the strategy comprised individual sites, individual sites in real space, individual isotropic atomic displacement parameters (ADPs), simulated annealing in Cartesian and torsion space, and optimization of the XYZ and ADP weights, using electron scattering form factors, with five cycles performed in each case.
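The working and free R factors reported in the Results are computed in the same way but over disjoint sets of reflections. The sketch below flags a random ~5% test set and evaluates R = Σ||Fo| − k|Fc||/Σ|Fo| on each set, with a least-squares scale k fitted to the working reflections only. The amplitudes are simulated, and PHENIX's own flag generation and scaling are more elaborate; this is an illustration of the definitions, not of phenix.refine.

```python
import random


def assign_free_flags(n_reflections, fraction=0.05, seed=0):
    """Randomly flag about `fraction` of the reflections for the R_free test set."""
    rng = random.Random(seed)
    return [rng.random() < fraction for _ in range(n_reflections)]


def r_work_r_free(f_obs, f_calc, free_flags):
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|) for the working and test sets.
    The scale k is a least-squares fit to the working set only and is reused for the test set."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    k = sum(o * c for o, c in work) / sum(c * c for _, c in work)

    def r_value(pairs):
        return sum(abs(o - k * c) for o, c in pairs) / sum(o for o, _ in pairs)

    return r_value(work), r_value(test)


# Hypothetical amplitudes: a model that matches the data up to 10% random error
rng = random.Random(1)
f_calc = [rng.uniform(5.0, 100.0) for _ in range(2000)]
f_obs = [abs(f * (1.0 + rng.gauss(0.0, 0.1))) for f in f_calc]
flags = assign_free_flags(len(f_obs))
r_work, r_free = r_work_r_free(f_obs, f_calc, flags)
print(f"{sum(flags)} test reflections; Rwork = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because this toy model is not refined against the working set, Rwork and Rfree come out similar; in a real refinement Rfree is expected to exceed Rwork, and a large gap indicates overfitting.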
3. Results
3.1. Initial fragmentation MR of known structures
Initially, we attempted to use the fragmentation-based multicomponent MR search for small molecules developed by Gorelik et al. (2023). However, as is apparent from the structures of paritaprevir and grazoprevir shown in Fig. 1, neither compound can obviously be divided into purely rigid fragments. While paritaprevir contains rigid pyrimidine and benzoquinoline side chains, its sulfonyl side chain and core are highly flexible. Similarly, grazoprevir contains a rigid tert-butyl side chain, but again its sulfonyl side chain and core are highly flexible. Nevertheless, as an initial trial, we separated the macrocyclic core and all side chains, regardless of flexibility, into individual fragments as our best approximation of appropriate MR search models. Attempting this with all possible permutations of fragments and core did not yield a solved structure, highlighting the necessity of rigid moieties for the fragmentation-based MR procedure.
3.2. Initial conformer generation analysis
We therefore devised a new strategy in which a series of conformers was generated for each compound and each conformer was then phased individually by MR in an automated fashion using a Python script. We tested two open-source, freely available conformer generators based on distance geometry (CONFORGE and ETKDGv3) for this purpose. The initial MR phasing results were evaluated by the log-likelihood gain (LLG) and the translation-function Z-score (TFZ), the two best predictors of model accuracy in MR (McCoy et al., 2005). Although it has been suggested that, for small molecules, TFZ- and LLG-based analysis cannot readily discriminate between structures that are only somewhat similar to the correct structure (Gorelik et al., 2023), these metrics are useful for eliminating faulty solutions that bear no resemblance to the correct one. Of the two conformer generators, CONFORGE gave the best average LLG and TFZ scores for grazoprevir, paritaprevir-α and paritaprevir-β, suggesting that it produces the highest-quality conformers for our method (Table S1).
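The screening step described here, in which TFZ is used to discard clearly wrong placements and LLG to rank the rest, can be sketched as below. The result records are hypothetical, and the TFZ cutoff of 8 is borrowed from a rule of thumb commonly quoted for macromolecular MR; the thresholds actually used by HAMR are not given here, and, as noted above, such cutoffs may discriminate poorly between similar small-molecule conformers.

```python
from typing import List, NamedTuple


class MRResult(NamedTuple):
    """One molecular-replacement trial (hypothetical record, not a PHASER data structure)."""
    conformer_id: int
    llg: float
    tfz: float


def screen(results: List[MRResult], min_tfz: float = 8.0, keep: int = 3) -> List[MRResult]:
    """Drop placements with low TFZ, then rank the survivors by decreasing LLG."""
    plausible = [r for r in results if r.tfz >= min_tfz]
    return sorted(plausible, key=lambda r: r.llg, reverse=True)[:keep]


# Hypothetical scores, for illustration only
trials = [MRResult(0, 85.2, 9.1), MRResult(1, 40.3, 4.2), MRResult(2, 120.7, 10.4),
          MRResult(3, 95.0, 7.6), MRResult(4, 101.9, 8.3)]
shortlist = screen(trials)
print([r.conformer_id for r in shortlist])  # [2, 4, 0]
```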
Although this trend was useful and pointed us in the right direction, after refinement none of the solutions had acceptable Rwork or Rfree values or a qualitatively low RMSD relative to the solution found by ab initio methods, so further modification of the conformers was necessary to produce a valid solution by MR.
3.3. HAMR of grazoprevir, paritaprevir-α and paritaprevir-β
We therefore devised a scheme that automatically modifies the conformers generated by CONFORGE, changing each dihedral angle individually and assessing the accuracy of the modified conformers using both the R factor of the initially phased electrostatic potential map and the LLG of the re-phased modified conformer, as shown visually in Fig. 2. This process is repeated automatically for each dihedral angle outside the macrocyclic core; dihedrals within the core are avoided because of the difficulty of modifying dihedral angles within a cyclic structure while maintaining correct chemical bonding. Instead, the conformational space of the macrocyclic core is sampled by CONFORGE, as each HAMR attempt optimizes a different input structure with a different core conformation. The system was designed as a genetic algorithm in which the top solutions from each cycle are used as starting models for the next cycle until all dihedral angles have been fully optimized, since the problem can be viewed as a simple global optimization of the LLG, for which genetic algorithms are well suited (Katoch et al., 2021).
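The cycle structure described above can be sketched as follows. Each cycle optimizes one non-ring dihedral: every model kept from the previous cycle is expanded into variants with that torsion set to 0, 10, ..., 350°, all variants are scored, and the best few are kept. The score function is a hypothetical stand-in for the LLG (or R factor) that HAMR obtains from MR, and the sketch implements only elitist selection with single-dihedral mutation (no crossover), so strictly it is a deterministic beam search rather than a full genetic algorithm; it is not the authors' implementation.

```python
from typing import Callable, List, Tuple

Model = Tuple[float, ...]  # one torsion value (degrees) per non-ring dihedral


def mutate_dihedral(model: Model, index: int, step: int = 10) -> List[Model]:
    """All variants of `model` with dihedral `index` set to 0, step, ..., 360 - step degrees."""
    return [model[:index] + (float(angle),) + model[index + 1:] for angle in range(0, 360, step)]


def optimize(start: List[Model], score: Callable[[Model], float], keep: int = 3) -> Model:
    """Elitist cycles: optimize one dihedral per cycle, keeping the `keep` best models."""
    population = list(start)
    for index in range(len(start[0])):
        candidates = {variant for model in population for variant in mutate_dihedral(model, index)}
        population = sorted(candidates, key=score, reverse=True)[:keep]
    return population[0]


# Hypothetical stand-in for the LLG: higher when torsions are closer to a hidden target.
TARGET = (120.0, 250.0, 60.0)


def toy_score(model: Model) -> float:
    def circular_gap(a: float, b: float) -> float:
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return -sum(circular_gap(a, t) for a, t in zip(model, TARGET))


best = optimize([(0.0, 0.0, 0.0), (180.0, 90.0, 300.0)], toy_score)
print(best)  # (120.0, 250.0, 60.0)
```

Because each torsion is optimized while the others are held fixed, a scheme like this can miss optima that require two torsions to change together; keeping several models per cycle (`keep`) partly mitigates this, and restarting from several CONFORGE core conformations mirrors how HAMR samples the macrocycle itself.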
After the best CONFORGE conformer has been selected and the chosen dihedral angle modified, the R factor of the modified conformer, placed in the same position in the unit cell as the previously phased result, is used to filter which modified conformers are worth taking through MR. This filter was introduced because MR is the largest bottleneck in the algorithm, and in our testing this R factor served as an approximate proxy for the initial validity of a modified conformer, although MR is still needed to determine the pose of the conformer within the unit cell and arrive at a final solved structure. Similarly, to avoid unnecessary MR computations, a de-duplication check is performed in each cycle, in which conformers within 0.02 Å RMSD of an already phased solution are discarded; testing structurally diverse conformers amounts to a wider search of the total conformational space.
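The two cost-saving filters described above, an R-factor pre-screen of each modified conformer in the existing pose and a 0.02 Å RMSD de-duplication against already phased conformers, might be combined as in the sketch below. The R-factor cutoff (`max_r`), the data layout and the in-place RMSD (matched atom order, no superposition, which is reasonable when conformers share a pose) are assumptions for illustration, as is also checking for duplicates within the current batch.

```python
import math
from typing import List, Sequence, Tuple

Coords = List[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD between matched atom lists in the same frame (no superposition)."""
    return math.sqrt(sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v)) / len(a))


def select_for_mr(candidates: Sequence[Tuple[Coords, float]], phased: List[Coords],
                  max_r: float = 0.6, dup_cutoff: float = 0.02) -> List[Coords]:
    """Return the candidates worth sending to MR, best pre-screen R factor first.

    candidates: (coordinates, R factor in the previously phased pose) pairs.
    phased: coordinates of conformers that have already been through MR.
    max_r is a hypothetical cutoff; dup_cutoff is the 0.02 Å de-duplication radius.
    """
    selected: List[Coords] = []
    for coords, r_pre in sorted(candidates, key=lambda item: item[1]):
        if r_pre > max_r:
            break  # sorted by R, so every remaining candidate is worse: skip costly MR runs
        if any(rmsd(coords, old) < dup_cutoff for old in phased + selected):
            continue  # near-duplicate of a conformer already phased or already queued
        selected.append(coords)
    return selected


# Hypothetical two-atom example
base = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near_copy = [(0.0, 0.0, 0.0), (1.51, 0.0, 0.0)]  # RMSD to base is about 0.007 Å
distinct = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0)]
queue = select_for_mr([(near_copy, 0.30), (distinct, 0.35), (base, 0.90)], phased=[base])
print(len(queue))
```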
As mentioned above, this method was applied to grazoprevir, paritaprevir-α and paritaprevir-β at each dataset's highest resolution (0.99, 0.85 and 0.95 Å, respectively) and at 1.0, 1.2, 1.4, 1.5, 1.6, 1.8, 2.0 and 2.5 Å resolution to simulate lower-quality data; all results are summarized in Table 1, and further refinement statistics are provided in Tables S3–S27. Example electrostatic potential maps for all validation compounds are shown in Fig. 3: all atoms fall within the electrostatic potential map, but the sharpness of the map decreases dramatically with decreasing resolution. To investigate the observed trends in RMSD, Rwork and Rfree, we performed MR and refinement with the same settings on the ab initio solved structures and observed trends similar to those of the HAMR solved structures (Table S2).
| Figure 3 HAMR structure solutions (light blue sticks) for paritaprevir-α (a–c), paritaprevir-β (d–f) and grazoprevir (g–j) at varying resolutions with the resultant 2mFo − DFc map (dark blue mesh). 2mFo − DFc maps are contoured at 3.0, 2.5 and 1.5σ above the mean for 1.0, 1.5 and 2.0 Å resolution cutoffs, respectively. |
3.4. HAMR of corilagin
Having validated the method on grazoprevir, paritaprevir-α and paritaprevir-β, we applied it to a macrocycle without a known structure, corilagin, which our lab had been unable to solve with ab initio methods, probably because of disorder arising from the hexose ring and multiple hydroxyl groups (Willart et al., 2010). The HAMR procedure for corilagin was identical to that for the validation compounds except for one additional setting in the initial CONFORGE conformer generation: the number of sampled conformers, which by default is limited to 2000 and is normally sufficient, was set to unlimited. Because corilagin is somewhat less flexible than the validation compounds, many duplicate conformers appeared, and with the default limit CONFORGE exited early without sufficiently sampling the conformational space. The HAMR output structure showed void volumes along the crystallographic a axis [Fig. 4(b)]. Although voids do not inherently prevent structure determination, the relatively poor, low-efficiency crystal packing they reflect can contribute to difficulties in solving a structure with ab initio methods (Steed & Steed, 2015). In addition, unlike for the validation compounds, weak positive electrostatic potential difference densities were observed in the HAMR output; these were attributed to two unmodelled, partially occupied water-molecule sites in the void volumes, as the preparation of corilagin involved recrystallization from water. HAMR does not currently support the automatic addition of solvent molecules, so these water molecules were added manually and the structure was refined again with the same settings as before to arrive at the final structure. The data for the corilagin structure obtained from this HAMR run after manual solvent addition are summarized in Table 2, and the structure is depicted in Fig. 4(a).
| Figure 4 (a) HAMR solution structure (light blue sticks) of corilagin at 1.1 Å resolution with the resultant 2mFo − DFc map (dark blue mesh), contoured at 3.0σ above the mean. (b) Crystal packing analysis of the HAMR output for corilagin, showing void volumes along the crystallographic a axis. |
4. Discussion
4.1. Trends in data
As can be seen in Table 1, the R factors generally improve as the resolution cutoff is lowered, i.e. lower-resolution MicroED data produce better statistics, until a limit is reached at which only limited useful structural information can be resolved from the data; this limit is 2.5 Å for all validation compounds. This inverse trend arises primarily from the noticeable increase in I/σ(I) at lower resolution cutoffs (Fig. 5): because the R factor measures the discrepancy between observed and calculated amplitudes, and measurement noise contributes directly to that discrepancy, it should decrease as the measurement error decreases, which is what we observe. The RMSD between the ab initio solved structure and the HAMR solved structure shows the opposite trend, generally increasing as the resolution decreases. This is again intuitive: as the resolution decreases, the number of reflections also decreases, so less structural information can be extracted and the model at lower resolution is less accurate. However, all RMSD values are reasonable and do not suggest that HAMR produces a different conformer from the ab initio methods, as they are well below the commonly used threshold of 0.5 Å RMSD for distinguishing unique conformers (Olanders et al., 2020). Instead, the RMSDs reflect slight differences in the coordinates of every atom rather than large conformational changes (Figs. S1–S3).
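The link between noise and the R factor can be illustrated with a toy simulation (not a model of the actual datasets): a perfect model is compared with amplitudes perturbed by Gaussian noise whose relative size follows from I/σ(I) by first-order error propagation (for F = √I, σ(F)/F ≈ σ(I)/(2I)). A single I/σ(I) for all reflections is assumed, and the approximation becomes poor for very weak data.

```python
import random


def simulated_r_factor(i_over_sigma, n=5000, seed=1):
    """R factor of a perfect model against amplitudes carrying Gaussian noise.
    Assumes sigma(F)/F ~ 1 / (2 * I/sigma(I)), the first-order error propagation for F = sqrt(I)."""
    rng = random.Random(seed)
    noise = 1.0 / (2.0 * i_over_sigma)
    f_calc = [rng.uniform(10.0, 100.0) for _ in range(n)]
    f_obs = [abs(f * (1.0 + rng.gauss(0.0, noise))) for f in f_calc]
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


for i_over_sigma in (2.0, 4.0, 8.0, 16.0):
    print(i_over_sigma, round(simulated_r_factor(i_over_sigma), 3))
```

Even with a perfect model, R falls steadily as I/σ(I) rises, which is the qualitative behavior seen in Fig. 5.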
| Figure 5 Comparison of refined Rwork values of the HAMR solutions for the validation compounds with the I/σ(I) values of the datasets after the resolution cutoffs were applied, showing an inverse relationship between these two statistics. |
4.2. Limitations of HAMR
As mentioned previously, HAMR reliably solved the structures of paritaprevir and grazoprevir down to 2.0 Å resolution. Beyond this resolution, however, the method failed to provide meaningful structural information. This threshold is evident from the R factors, which exceed 0.45 at 2.5 Å for all structures. Correspondingly, the RMSD values for the validation compounds also increase significantly, exceeding 1 Å in all cases, further indicating that the practical resolution limit is approximately 2 Å, consistent with the observed R factors.
Additionally, at lower resolutions (>1.4 Å) the electrostatic potential map becomes less detailed for all compounds, as can be seen in Fig. 3. This is expected, but it makes similar scatterers difficult to distinguish. This is a noticeable limitation at these resolutions, as can be seen in Table 1, where the unadjusted RMSD for paritaprevir-β above 1.5 Å lies well above the RMSD values at higher resolutions. Because carbon and nitrogen have similar scattering factors, and because the electrostatic potential map is blurred, the method cannot easily distinguish between these atoms at lower resolutions. The pyrimidine side chain of paritaprevir therefore appears pseudo-C2 symmetric, and for paritaprevir-β above 1.4 Å the HAMR output shows no preference between the correct orientation of this ring and one flipped by 180° about an axis in the plane of the ring. Acknowledging this limitation, we corrected the RMSD for paritaprevir-β in Table 1 by calculating it from heavy-atom positions alone, without considering atomic species; this is shown in the table as the adjusted RMSD. After this correction, a trend similar to that for paritaprevir-α and grazoprevir, namely a slight decrease in RMSD with increasing resolution, is observed. At high resolution, however, the method distinguishes these atoms easily and arrives at the correct solution for both paritaprevir-α and paritaprevir-β.
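The adjusted RMSD can be illustrated with the sketch below: it takes the smallest RMSD over all one-to-one assignments of model atoms to reference atoms, ignoring element type, so that a ring flipped onto its own pseudo-symmetric positions is not penalized. The brute-force permutation search and the idealized ring geometry are assumptions for illustration; a practical version would restrict the matching to the ambiguous group and use an assignment algorithm such as the Hungarian method.

```python
import itertools
import math


def rmsd(a, b):
    """RMSD between matched coordinate lists (same frame, no superposition)."""
    return math.sqrt(sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v)) / len(a))


def species_agnostic_rmsd(ref, model):
    """Smallest RMSD over all one-to-one assignments of model atoms to reference atoms,
    ignoring element type. Brute force (n! permutations), so only for a few atoms."""
    return min(rmsd(ref, [model[i] for i in perm])
               for perm in itertools.permutations(range(len(model))))


# Hypothetical six-membered heteroaromatic ring (radius 1.39 Å) in the xy plane, atom k at
# 60k degrees; with labels C, N, C, C, N, C the two N atoms sit at 60 and 240 degrees.
ref = [(1.39 * math.cos(math.radians(60 * k)), 1.39 * math.sin(math.radians(60 * k)), 0.0)
       for k in range(6)]
# Ring flipped by 180 degrees about the x axis (the attachment bond): the same set of sites,
# but each labelled atom now occupies its mirror site, so label matching swaps C and N sites.
flipped = [(x, -y, -z) for x, y, z in ref]

print(round(rmsd(ref, flipped), 3), round(species_agnostic_rmsd(ref, flipped), 3))
```

For the flipped ring, the label-matched RMSD is r√2 (about 1.97 Å) because four of the six atoms sit on swapped sites, whereas the species-agnostic RMSD is essentially zero.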
Interestingly, we do not observe this indistinguishability in paritaprevir-α, presumably because of the difference in data quality between the two datasets. While both datasets have high completeness (89.0% and 98.4%, respectively), paritaprevir-α has a noticeably higher signal-to-noise ratio than paritaprevir-β: the I/σ(I) values of the highest-resolution datasets are 2.8 and 2.3, respectively. The higher I/σ(I) of the paritaprevir-α dataset thus provides enough information to distinguish carbon from nitrogen in this case, suggesting that HAMR may be relatively robust to small differences in data completeness but markedly sensitive to data noise. The same correlation is observed for the R factors: as I/σ(I) increases and the data become less noisy, both Rwork and Rfree improve, as can be seen in Fig. 5. This explains why the R factors, which reflect model accuracy, improve at lower resolution despite the reduced information content, and further supports the observed sensitivity of our method to dataset noise.
Finally, we have tested this method only on pharmaceutically relevant macrocyclic small molecules. While in principle it should work well for non-macrocyclic molecules, this remains unexplored. Similarly, we have tested the method only on MicroED data; however, both PHASER and PHENIX can perform the same calculations on X-ray data, so such data should also be compatible with HAMR.
4.3. Estimated time to completion
As mentioned previously, the largest bottleneck in this algorithm is the MR process, because with the default HAMR settings several thousand conformers must be phased by MR. Because MR of small molecules has not been explored systematically, it is difficult to predict how long a single MR calculation will take, and no clear trend emerged during method development. On average, completion at the highest resolution cutoff takes of the order of one to five hours for all compounds on a personal laptop. The time to completion is similar at most resolutions, except at low resolutions (approximately ≥1.5 Å), where multiple HAMR cycles must be performed because the LLG has less discriminatory power at these resolutions, noticeably increasing the time to completion.
5. Conclusions
In this paper, we report a novel high-throughput workflow for structure determination of pharmaceutically relevant molecules using automated MR and MicroED data. The workflow was validated at resolutions from 1.0 to 2.0 Å against three ab initio solved structures, showing good agreement between all HAMR solutions and the corresponding ab initio structures, with publishable R factors. Additionally, the method was used to solve the previously unknown structure of corilagin, a macrocycle of pharmaceutical relevance that could not be solved with ab initio methods. We anticipate that this method will find widespread use for even more complex and conformationally flexible molecules than those described here, including important bioactive toxins and natural products, which typically yield data at resolutions poorer than 1.0 Å. The tool is available for download on request.
Supporting information
CCDC reference: 2543412
Tables S1-S25 and figures S1-S3. DOI: https://doi.org/10.1107/S2052252526002095/vq5007sup1.pdf
Coordinates for corilagin in CIF format. DOI: https://doi.org/10.1107/S2052252526002095/vq5007sup2.cif
Coordinates for corilagin in PDB format. DOI: https://doi.org/10.1107/S2052252526002095/vq5007sup3.txt
Reflection data for corilagin in MTZ format. DOI: https://doi.org/10.1107/S2052252526002095/vq5007sup4.bin
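Crystal data for corilagin
| Parameter | Value |
|---|---|
| Formula | C27H22O18·2(H2O) |
| Crystal system, space group | Orthorhombic, P2₁22₁ |
| a (Å) | 6.96 |
| b (Å) | 15.89 |
| c (Å) | 24.41 |
| V (Å³) | 2699.61 |
| T (K) | 100 |
| Crystal description | Plate, white |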
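As a quick consistency check on the crystal data, the volume of an orthorhombic cell is simply V = abc, and a calculated density follows from ρ = Z·M/(N_A·V). The sketch below assumes Z = 4, i.e. one C27H22O18·2H2O unit per asymmetric unit (consistent with the single corilagin molecule and two waters in the atomic coordinate table) and four general positions in this primitive orthorhombic space group. Z and the atomic masses are not given in this section, so the density printed here is an illustration under that assumption, not a value reported in the paper.

```python
# Conventional standard atomic masses (g/mol) for the elements present.
MASS = {"C": 12.011, "H": 1.008, "O": 15.999}
AVOGADRO = 6.02214076e23  # mol^-1

def molar_mass(counts):
    """Sum of atomic masses for a formula given as {element: count}."""
    return sum(MASS[el] * n for el, n in counts.items())

a, b, c = 6.96, 15.89, 24.41   # cell edges in Å; orthorhombic, so all angles are 90°
volume = a * b * c             # V = abc for an orthorhombic cell, in Å^3

# C27H22O18·2H2O written as a single formula: C27H26O20.
m = molar_mass({"C": 27, "H": 26, "O": 20})
z = 4  # ASSUMPTION: four formula units per cell (one per general position)

# 1 Å^3 = 1e-24 cm^3, so the density comes out in g/cm^3.
density = z * m / (AVOGADRO * volume * 1e-24)

print(f"V = {volume:.2f} Å^3")  # V = 2699.61 Å^3
print(f"M = {m:.1f} g/mol, Dx = {density:.3f} g/cm^3")
```

The recomputed volume reproduces the tabulated 2699.61 Å³; with Z = 4 the calculated density comes out at about 1.65 g cm⁻³.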
Fractional atomic coordinates and isotropic or equivalent isotropic displacement parameters (Å²) for corilagin
| Atom | x | y | z | Uiso*/Ueq |
|---|---|---|---|---|
| C1 | 0.143391 | 0.475459 | 1.17837 | 0.0269* |
| C10 | −0.043965 | 0.257459 | 1.31733 | 0.0409* |
| C11 | −0.118534 | 0.207868 | 1.27485 | 0.0552* |
| C12 | −0.154598 | 0.122782 | 1.28354 | 0.0593* |
| C13 | −0.231465 | 0.073254 | 1.24196 | 0.0541* |
| C14 | −0.273563 | 0.107741 | 1.19136 | 0.0578* |
| C15 | −0.238937 | 0.19226 | 1.18218 | 0.0447* |
| C16 | −0.162931 | 0.241914 | 1.22368 | 0.0420* |
| C17 | −0.128448 | 0.334363 | 1.21049 | 0.0343* |
| C18 | −0.013793 | 0.438769 | 1.14462 | 0.0290* |
| C19 | 0.026724 | 0.438328 | 1.08419 | 0.0324* |
| C2 | 0.343678 | 0.443174 | 1.16445 | 0.0377* |
| C20 | 0.23046 | 0.410134 | 1.06891 | 0.0404* |
| C21 | 0.246551 | 0.274451 | 1.02651 | 0.0310* |
| C22 | 0.22931 | 0.179485 | 1.0338 | 0.0424* |
| C23 | 0.267528 | 0.125048 | 0.99091 | 0.0474* |
| C24 | 0.250718 | 0.038893 | 0.998775 | 0.0477* |
| C25 | 0.194827 | 0.007930 | 1.04957 | 0.0531* |
| C26 | 0.156465 | 0.062493 | 1.09222 | 0.0494* |
| C27 | 0.173563 | 0.148333 | 1.08444 | 0.0443* |
| C3 | 0.371551 | 0.356138 | 1.18591 | 0.0356* |
| C4 | 0.263936 | 0.283765 | 1.26477 | 0.0500* |
| C5 | 0.137069 | 0.294401 | 1.31525 | 0.0317* |
| C6 | 0.203592 | 0.343677 | 1.35834 | 0.0514* |
| C7 | 0.090661 | 0.355446 | 1.40451 | 0.0661* |
| C8 | −0.088937 | 0.318441 | 1.40713 | 0.0465* |
| C9 | −0.155172 | 0.269857 | 1.36383 | 0.0457* |
| O1 | 0.144396 | 0.565012 | 1.16908 | 0.0390* |
| O10 | −0.172557 | 0.388297 | 1.24159 | 0.0420* |
| O11 | −0.045115 | 0.351544 | 1.15953 | 0.0305* |
| O12 | −0.003161 | 0.521463 | 1.06354 | 0.0415* |
| O13 | 0.237212 | 0.324168 | 1.07395 | 0.0356* |
| O14 | 0.266954 | 0.303526 | 0.982184 | 0.0400* |
| O15 | 0.289799 | −0.016237 | 0.955432 | 0.0690* |
| O16 | 0.176868 | −0.078792 | 1.0581 | 0.0657* |
| O17 | 0.099856 | 0.031215 | 1.14327 | 0.0575* |
| O18 | 0.383046 | 0.445188 | 1.10402 | 0.0361* |
| O2 | 0.341092 | 0.357459 | 1.24245 | 0.0507* |
| O3 | 0.294827 | 0.216364 | 1.24564 | 0.0518* |
| O4 | 0.156753 | 0.404659 | 1.44814 | 0.0443* |
| O5 | −0.203735 | 0.330209 | 1.45363 | 0.0448* |
| O6 | −0.336925 | 0.232726 | 1.3669 | 0.0569* |
| O7 | −0.113218 | 0.086155 | 1.33446 | 0.0542* |
| O8 | −0.26681 | −0.012461 | 1.25056 | 0.0375* |
| O9 | −0.350431 | 0.057206 | 1.14986 | 0.0499* |
| H1 | 0.058190 | 0.591444 | 1.19349 | 0.0489* |
| H2 | 0.113937 | 0.459788 | 1.22073 | 0.0342* |
| H3 | 0.447126 | 0.483325 | 1.18493 | 0.0474* |
| H10 | 0.018391 | 0.093581 | 1.34269 | 0.0670* |
| H11 | −0.317385 | −0.036312 | 1.21815 | 0.0471* |
| H12 | −0.485488 | 0.062493 | 1.1497 | 0.0619* |
| H13 | −0.271264 | 0.219762 | 1.14265 | 0.0556* |
| H14 | −0.139655 | 0.476592 | 1.15322 | 0.0369* |
| H15 | −0.07227 | 0.394464 | 1.06535 | 0.0409* |
| H16 | 0.10273 | 0.555698 | 1.07309 | 0.0519* |
| H17 | 0.257902 | 0.430525 | 1.02733 | 0.0504* |
| H18 | 0.310632 | 0.149592 | 0.951377 | 0.0589* |
| H19 | 0.173419 | −0.032788 | 0.938677 | 0.0849* |
| H20 | 0.101724 | −0.102329 | 1.02975 | 0.0811* |
| H21 | 0.145977 | 0.067401 | 1.17108 | 0.0711* |
| H22 | 0.143678 | 0.191253 | 1.11783 | 0.0552* |
| H4 | 0.516666 | 0.335244 | 1.17731 | 0.0447* |
| H5 | 0.269396 | 0.31391 | 1.1667 | 0.0447* |
| H6 | 0.343821 | 0.372941 | 1.35584 | 0.0637* |
| H7 | 0.11523 | 0.460795 | 1.44343 | 0.0552* |
| H8 | −0.144253 | 0.304973 | 1.48432 | 0.0560* |
| H9 | −0.360057 | 0.214539 | 1.40316 | 0.0703* |
| O19 | −0.080891 | −0.111077 | 1.15965 | 0.0750* |
| H25 | 0.061782 | −0.111077 | 1.15965 | |
| H26 | −0.128448 | −0.060052 | 1.14048 | |
| O20 | −0.631896 | 0.20384 | 1.45408 | 0.0940* |
| H23 | −0.489223 | 0.20384 | 1.45408 | |
| H24 | −0.679453 | 0.254865 | 1.43491 | |
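Because the cell is orthorhombic, converting the fractional coordinates above to Cartesian ones only requires scaling x, y and z by a, b and c, after which interatomic distances follow directly. The sketch below uses the cell edges from the crystal data and a few rows of the coordinate table to compute the C1–O1 distance and one water O–H distance. It applies no symmetry operators or lattice translations, so it is valid only for atoms within the same asymmetric unit; the helper names are illustrative.

```python
import math

# Cell edges in Å from the crystal data; orthorhombic, so alpha = beta = gamma = 90°.
CELL = (6.96, 15.89, 24.41)

# A few fractional coordinates (x, y, z) copied from the atomic coordinate table.
ATOMS = {
    "C1": (0.143391, 0.475459, 1.17837),
    "O1": (0.144396, 0.565012, 1.16908),
    "O19": (-0.080891, -0.111077, 1.15965),
    "H26": (-0.128448, -0.060052, 1.14048),
}

def to_cartesian(frac):
    """Orthogonalize fractional coordinates; for an orthorhombic cell the
    orthogonalization matrix is diag(a, b, c)."""
    return tuple(f * edge for f, edge in zip(frac, CELL))

def distance(name1, name2):
    """Cartesian distance in Å between two atoms of the same asymmetric unit
    (no symmetry operators or lattice translations are applied)."""
    return math.dist(to_cartesian(ATOMS[name1]), to_cartesian(ATOMS[name2]))

print(f"C1-O1: {distance('C1', 'O1'):.3f} Å")      # C1-O1: 1.441 Å
print(f"O19-H26: {distance('O19', 'H26'):.3f} Å")  # O19-H26: 0.993 Å
```

The C1–O1 distance of about 1.44 Å is close to a typical C–O single bond. For monoclinic or triclinic cells the diagonal scaling would have to be replaced by the full orthogonalization matrix.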
Footnotes
‡These authors contributed equally to this work.
Acknowledgements
We thank the UCLA MicroED Imaging Center (MEDIC) for access to expertise and equipment. MEDIC is supported by funds from NIGMS P41GM136508. This research was supported by the University of California, Riverside.
References
Abergel, C. (2013). Acta Cryst. D69, 2167–2173.
Andrusenko, I., Potticary, J., Hall, S. R. & Gemmi, M. (2020). Acta Cryst. B76, 1036–1044.
Brázda, P., Palatinus, L. & Babor, M. (2019). Science 364, 667–669.
Brehm, W., Triviño, J., Krahn, J. M., Usón, I. & Diederichs, K. (2023). J. Appl. Cryst. 56, 1585–1594.
Brunger, A. T. (1991). Annu. Rev. Phys. Chem. 42, 197–223.
Bu, G., Danelius, E. L. H., Wieske, L. H. E. & Gonen, T. (2024). Adv. Biol. 8, 2300570.
Burla, M. C., Caliandro, R., Carrozzini, B., Cascarano, G. L., Cuocci, C., Giacovazzo, C., Mallamo, M., Mazzone, A. & Polidori, G. (2015). J. Appl. Cryst. 48, 306–309.
Burley, S. K. & Berman, H. M. (2021). Structure 29, 515–520.
Danelius, E., Bu, G., Wieske, L. H. & Gonen, T. (2023a). ACS Chem. Biol. 18, 2582–2589.
Danelius, E. K., Patel, B., Gonzalez, B. & Gonen, T. (2023b). Curr. Opin. Struct. Biol. 79, 102549.
DeGoey, D. A., Chen, H. J., Cox, P. B. & Wendt, M. D. (2018). J. Med. Chem. 61, 2636–2651.
Deschner, F., Mostert, D., Daniel, J. M., Voltz, A., Schneider, D. C., Khangholi, N., Bartel, J., Pessanha de Carvalho, L., Brauer, M., Gorelik, T. E., Kleeberg, C., Risch, T., Haeckl, F. P. J., Herraiz Benítez, L., Andreas, A., Kany, A. M., Jézéquel, G., Hofer, W., Müsken, M., Held, J., Bischoff, M., Seemann, R., Brötz-Oesterhelt, H., Schneider, T., Sieber, S., Müller, R. & Herrmann, J. (2025). Cell Chem. Biol. 32, 586–602.e15.
Feyand, M., Mugnaioli, E., Vermoortele, F., Bueken, B., Dieterich, J. M., Reimer, T., Kolb, U., de Vos, D. & Stock, N. (2012). Angew. Chem. Int. Ed. 51, 10373–10376.
Gemmi, M., Mugnaioli, E., Gorelik, T. E., Kolb, U., Palatinus, L., Boullay, P., Hovmöller, S. & Abrahams, J. P. (2019). ACS Cent. Sci. 5, 1315–1329.
Gorelik, T. E., Lukat, P., Kleeberg, C., Blankenfeldt, W. & Mueller, R. (2023). Acta Cryst. A79, 504–514.
Gorelik, T. E., van de Streek, J., Kilbinger, A. F. M., Brunklaus, G. & Kolb, U. (2012). Acta Cryst. B68, 171–181.
Hattne, J. (2021). CryoEM: Methods and Protocols, pp. 309–319. New York: Springer US.
Hattne, J., Reyes, F. E., Nannenga, B. L., Shi, D., de la Cruz, M. J., Leslie, A. G. W. & Gonen, T. (2015). Acta Cryst. A71, 353–360.
Haymaker, A. & Nannenga, B. L. (2024). Curr. Opin. Struct. Biol. 84, 102741.
Jiang, R., Gogineni, T., Kammeraad, J., He, Y., Tewari, A. & Zimmerman, P. M. (2022). J. Comput. Chem. 43, 1880–1886.
Jones, C. G., Martynowycz, M. W., Hattne, J., Fulton, T. J., Stoltz, B. M., Rodriguez, J. A., Nelson, H. M. & Gonen, T. (2018). ACS Cent. Sci. 4, 1587–1592.
Kabsch, W. (2010). Acta Cryst. D66, 125–132.
Katoch, S., Chauhan, S. S. & Kumar, V. (2021). Multimed. Tools Appl. 80, 8091–8126.
Li, X., Deng, Y., Zheng, Z., Huang, W., Chen, L., Tong, Q. & Ming, Y. (2018). Biomed. Pharmacother. 99, 43–50.
Liebschner, D., Afonine, P. V., Baker, M. L., Bunkóczi, G., Chen, V. B., Croll, T. I., Hintze, B., Hung, L.-W., Jain, S., McCoy, A. J., Moriarty, N. W., Oeffner, R. D., Poon, B. K., Prisant, M. G., Read, R. J., Richardson, J. S., Richardson, D. C., Sammito, M. D., Sobolev, O. V., Stockwell, D. H., Terwilliger, T. C., Urzhumtsev, A. G., Videau, L. L., Williams, C. J. & Adams, P. D. (2019). Acta Cryst. D75, 861–877.
Lightowler, M., Li, S., Ou, X., Zou, X., Lu, M. & Xu, H. (2022). Angew. Chem. 134, e202114985.
Lin, J. G., Bu, J., Unge, J. & Gonen, T. (2024). Adv. Sci. 11, 2406494.
Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. (1997). Adv. Drug Deliv. Rev. 23, 3–25.
McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.
McCoy, A. J., Grosse-Kunstleve, R. W., Storoni, L. C. & Read, R. J. (2005). Acta Cryst. D61, 458–464.
Mu, X., Gillman, C., Nguyen, C. & Gonen, T. (2021). Annu. Rev. Biochem. 90, 431–450.
Olanders, G., Alogheli, H., Brandt, P. & Karlén, A. (2020). J. Comput. Aided Mol. Des. 34, 231–252.
Palatinus, L., Brázda, P., Boullay, P., Perez, O., Klementová, M., Petit, S., Eigner, V., Zaarour, M. & Mintova, S. (2017). Science 355, 166–169.
Poongavanam, V., Danelius, E., Peintner, S., Alcaraz, L., Caron, G., Cummings, M. D., Wlodek, S., Erdelyi, E., Hawkins, P. C. D., Ermondi, G. & Kihlberg, J. (2018). ACS Omega 3, 11742–11757.
Potterton, E., Briggs, P., Turkenburg, M. & Dodson, E. (2003). Acta Cryst. D59, 1131–1137.
Rossmann, M. G. (1990). Acta Cryst. A46, 73–82.
Schneider, T. R. & Sheldrick, G. M. (2002). Acta Cryst. D58, 1772–1779.
Seidel, T., Permann, C., Wieder, O., Kohlbacher, S. M. & Langer, T. (2023). J. Chem. Inf. Model. 63, 5549–5570.
Sheldrick, G. M. (2015). Acta Cryst. A71, 3–8.
Sheldrick, G. M., Gilmore, C. J., Hauptman, H. A., Weeks, C. M., Miller, R. & Usón, I. (2012). International Tables for Crystallography, Vol. F, ch. 16.1, pp. 413–432. https://doi.org/10.1107/97809553602060000850.
Shi, D., Nannenga, B. L., Iadanza, M. G. & Gonen, T. (2013). eLife 2, e01345.
Steed, K. M. & Steed, J. W. (2015). Chem. Rev. 115, 2895–2933.
Timofeev, V. & Samygina, V. (2023). Crystals 13, 71.
Unge, J. J., Lin, J., Weaver, A., Sae Her, A. & Gonen, T. (2024). Adv. Sci. 11, 2400081.
van de Streek, J. & Neumann, M. A. (2010). Acta Cryst. B66, 544–558.
van Genderen, E., Clabbers, M. T. B., Das, P. P., Stewart, A., Nederlof, I., Barentsen, K. C., Portillo, Q., Pannu, N. S., Nicolopoulos, S., Gruene, T. & Abrahams, J. P. (2016). Acta Cryst. A72, 236–242.
Wang, S., Witek, J., Landrum, G. A. & Riniker, S. (2020). J. Chem. Inf. Model. 60, 2044–2058.
Watts, K., Dalal, P., Tebben, A. J., Cheney, D. L. & Shelley, J. C. (2014). J. Chem. Inf. Model. 54, 2680–2696.
Wieske, L. H. E., Bu, G., Erdélyi, M., Kihlberg, J., Gonen, T. & Danelius, E. R. (2026). Chem. A Eur. J. 32, e02256.
Willart, J. F., Dujardin, N., Dudognon, E., Danède, F. & Descamps, M. (2010). Carbohydr. Res. 345, 1613–1616.
Wojdyr, M. (2022). J. Open Source Softw. 7, 4200.
Woollam, G. R., Das, P. P., Mugnaioli, E., Andrusenko, I., Galanis, A. S., van de Streek, J., Nicolopoulos, S., Gemmi, M. & Wagner, T. (2020). CrystEngComm 22, 7490–7499.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.