Can the results of quantum refinement be improved with a continuum-solvation model?

Quantum refinement has been shown to be a powerful approach to interpret and improve macromolecular crystal structures. Previous studies have shown that the results of quantum refinement can be improved if the charge of the quantum mechanical (QM) system is reduced by adding neutralizing groups. Here it is shown that a similar improvement can be obtained if the original highly charged QM system is instead immersed in a continuum solvent in the QM calculations.


Introduction
X-ray crystallography is the prime method for obtaining threedimensional structures of proteins. Owing to the limited resolution of protein crystal structures, it is normally necessary to introduce empirical restraints in the crystallographic refinement to ensure that the structures make chemical sense (i.e. give reasonable bond lengths and angles) (Kleywegt & Jones, 1997). These restraints are usually derived from highresolution structures (Engh & Huber, 1991). In the language of computational chemistry, they represent a molecular mechanics force field. Therefore, standard crystallographic refinement optimizes an energy function of the form: where E X-ray is a crystallographic goodness-of-fit criterion, reflecting how well the current model (coordinates, occupancies and atomic displacement parameters), reproduce the experimental data, E MM is the empirical restraints and w A is a weight factor determining the relative importance of the two terms. This shows that protein crystal structures to a significant degree are actually computational and that they depend on the accuracy of the empirical restraints. The empirical restraints are accurate for protein residues and nucleic acids, for which there are ample accurate experimental data. However, for cofactors, substrates and inhibitors, ISSN 2052-5206 much less information is available, making the restraints less certain. Even worse, for metal sites it is hard to construct an empirical potential (Hu & Ryde, 2011) and it depends strongly on all the ligands and the charge and spin state of the metal. Therefore, these parts of the crystal structures have a lower accuracy than the amino acid parts.
To overcome these problems, we have suggested that the empirical restraints can be replaced by quantum-mechanical (QM) calculations . This can be done for a small, but interesting, part of the structure (e.g. the active site) in the same way as in standard combined quantum-mechanical and molecular-mechanical (QM/MM) methods (Senn & Thiel, 2009;Ryde, 2016). This part will be called system 1 (or the QM system) in the following. This leads to the quantum-refinement energy function: Here, E QM1 is the QM energy of system 1 and we need to subtract the corresponding MM energy of system 1, E MM1 , to avoid double counting of energy terms. Finally, w MM is another weight factor that is necessary because the empirical restraints are normally in statistical units, whereas the QM energy is in energy units. Another problem with standard X-ray crystallography is that it normally cannot discern H atoms and that it is not possible to distinguish between isoelectronic groups. These problems apply in principle also to quantum refinement. However, the addition of protons or replacement of one group with an isoelectronic group leads to subtle changes in the surrounding structure, owing to changes in the net charge, the hydrogen-bond pattern or changes in the chemistry. By the use of QM calculations, accurate information regarding the chemical preferences is introduced into the refinement (such information could in principle also be incorporated into the empirical restraints, but it is seldom available or accurate enough, especially as the restraints normally exclude electrostatics, which provide the most sensitive information about the chemical structure). With quantum refinement, different structural interpretations can be compared to decide which fits the crystal structure best using standard crystallography quality measures, like electron-density difference maps and real-space Z scores (Tickle, 2012), and QM measures, like strain energy Ryde, 2002;Borbulevych et al., 2016). We have shown that quantum refinement can locally improve crystal structures (Ryde & Nilsson, 2003) and that it can distinguish the protonation state of metal-bound ligands (Nilsson & Ryde, 2004;Cao et al., 2017Cao et al., , 2018bCaldararu et al., 2018), the oxidation state of metal sites (Rulíšek & Ryde, 2006;Cao et al., 2019), detect photoreduction of metal ions (Nilsson & Ryde, 2004;Sö derhjelm & Ryde, 2006;Rulíšek & Ryde, 2006) and solve scientific problems regarding what is really seen in crystal structures (Sö derhjelm & Ryde, 2006;Cao et al., 2019). Several other groups have implemented this and similar approaches (Yu et al., 2005;Borbulevych et al., 2014;Hsiao et al., 2010;Zheng et al., 2017;Genoni et al., 2018;Fadel et al., 2015) and shown that it can be used to reduce errors in crystal structures and decide protonation and tautomeric states (Yu et al., 2006;Borbulevych et al., 2016Borbulevych et al., , 2021. Other approaches have also been developed to improve crystal structures with QM data (Genoni et al., 2018). In particular, QM calculations can be used to improve the calculated electron density of the model. In standard crystallography, it is assumed that the density is spherical around each atom, the independent-atom model (IAM) (Cromer, 1965). This is a poor approximation for H atoms, for which the density is displaced towards the bond partner, leading to bond lengths that are $0.1 Å too short. However, it can also be improved for other atoms in accurate crystal structures, giving information about chemical bonding and lone pairs. This can be done by using a multipole model (Hansen & Coppens, 1978), but it requires too many refined parameters for biomacromolecules. Therefore, various databases of multipolar parameters have been constructed from QM calculations or experimental structures (Jarzembska & Dominiak, 2012;Domagała et al., 2012;Dittrich et al., 2013). Even more accurate results can be obtained if the QM density is employed directly in the refinement, which is the aim of Hirshfeld atom refinement (HAR) (Jayatilaka & Dittrich, 2008;Capelli et al., 2014). This approach is also too demanding to apply to biomacromolecules, but a fragmentation approach has been developed (Bergmann et al., 2020) and a combination of HAR with a library of extremely localized molecular orbitals have been applied to small proteins like crambin (Malaspina et al., 2019).
We have recently observed that the results of quantum refinement can be improved if a QM system with a large net change is partially neutralized by including nearby residues with the opposite charge . Naturally, the time consumption of quantum-refinement calculations strongly depend on the size of the QM system. Therefore, it not optimal to add residues to the QM system only to compensate the charge. An alternative may be to include a continuumsolvent model. In this study, we test whether the results of quantum refinement can be improved by employing a COSMO continuum solvent in the QM calculations (Klamt & Schü ü rmann, 1993). We tested it for five crystal structures of Mo and V nitrogenase, particulate methane monooxygenase and acetylcholin esterase, i.e. systems with or without metals and with varying net charges. We investigate how the results depend on the dielectric constant employed for the solvent model and how the discriminating power of the quantum refinement is affected.

Quantum refinement
The quantum-refinement calculations were run with the ComQum-X software , which is an interface between the QM software Turbomole (Furche et al., 2014) and the crystallography and NMR system (CNS) software (Brü nger et al., 1998;Brunger, 2007). For V nitrogenase with an OH À or NH 2À ligand, there are two conformations in the active site  and we therefore used the ComQumX-2QM approach . The full protein was considered in all calculations, including all crystal water molecules. For the protein, we used the standard CNS force field (protein_rep.param, water_rep.param and ion.param). The empirical restraints for nonstandard residues were downloaded from the hetero compound information centre Uppsala (Kleywegt, 2007). For the w A factor (determining the relative weight between the crystallographic data and the empirical potential), we used the default value suggested by CNS (specified below). The w MM weight was set to 1 3 as in all our previous studies Cao et al., 2017). For the crystallographic target function, we used the standard maximum-likelihood function using amplitudes (mlf) in CNS (Pannu & Read, 1996;Adams et al., 1997). CNS does not support anisotropic atomic displacement parameters (ADPs), so only isotropic ADPs were used. After quantum refinement, ADP refinement was performed using phenix. refine (Afonine et al., 2012). The electron-density maps were generated using phenix.maps.

Protein setup
The quantum-refinement calculations of V nitrogenase with a putative N-derived reaction intermediate were based on the 6fea crystal structure at 1.2 Å resolution (Sippel et al., 2018). The FeV cluster was modelled by VFe 7 S 7 C(CO 3 )(homocitrate)(CH 3 S)(imidazole), where the two last groups are models of Cys-257 and His-423 (all residues are from the D subunit of the crystal structure). In addition, the putative N-derived ligand, as well as models of Gln-176, His-180 and Phe-362, were included in the QM calculations, as can be seen in Fig. 1, a total of 90 atoms. For the larger QM system, we also included the nearby Lys-83, Arg-339 and Lys-361 residues [modelled by CH 3 NH 3 + or CH 3 NHC(NH 2 ) 2 + ; shown as thin sticks in Fig. 1]. Two different interpretations of the bound ligand were tested: OH À or NH 2À . OH À was assumed to bind to the E 0 resting state with a formal oxidation state of V III Fe II 4 Fe III 3 (Benediktsson et al., 2018). For NH 2À , a twoelectron oxidized state was assumed, corresponding to the E 6 state . We have shown that the crystal structure contains a significant amount (14%) of the undissociated S2B ligand, which was included in all refinements using a separate QM calculation, modelled in the resting E 0 state, and Gln-176 in the nonrotated conformation, observed in the crystal structure of the resting state (Sippel & Einsle, 2017). We assumed a quartet spin state for the FeV cluster (Benediktsson et al., 2018;. The w A factor was 0.0794.
The quantum-refinement calculations for the resting state of V nitrogenase were based on the 5n6y crystal structure, obtained at a resolution of 1.35 Å (Sippel & Einsle, 2017). We used two sizes of the QM system also for these calculations. The small QM system included only the FeV cluster (from the A subunit of the protein; VFe 7 S 8 C), the bidentate ligand, homocitrate, as well as the side chain of Cys-257 (CH 3 S À ) and the imidazole ring of His-423. In the large QM system, we included also the side chains of Lys-83 and Lys-362 (modelled as CH 3 NH 3 + ), as well as the whole Arg-339 (except O, but including a -COCH 3 group from Pro-338) and the side chain of Thr-335. The positively charged Lys and Arg residues were included to compensate the high negative charge of the FeV cluster, whereas Thr-335 and the backbone of Arg-339 form hydrogen bonds to the bidentate ligand. We tested three different interpretations of the bidentate ligand: CO 3 2À , HCO 3 À and NO 3 À . In all cases, we employed the oxidationstate assignment V III Fe II 4 Fe III 3 , corresponding to the E 0 resting state, and we assumed a quartet spin state for the FeV cluster (Benediktsson et al., 2018;. The w A factor was 0.145.
The calculations of Mo nitrogenase were based on the 3u7q crystal structure at 1.00 Å resolution (Spatzal et al., 2011). We studied the FeMo cluster in the C subunit of the protein. The QM system was the FeMo cluster (MoFe 7 S 8 C), homocitrate, the imidazole ring from His-442 and the side chain of Cys-275. In the large QM system, we included also the side chains of Arg-96 and Arg-359 [modelled as CH 3 NHC(NH 2 ) 2 + ]. Two protonation states were tested for the homocitrate ligand. One (1Ha) had a proton on the alcohol oxygen (O7), shared with a carboxylate oxygen (O1). The other (2H) had an additional proton on the O2 carboxylate atom (Cao et al., 2017). We assumed the standard formal oxidation-state assignment for the E 0 resting state, Mo III Fe II 3 Fe III 4 (Bjornsson et al., 2014) and a quartet electronic state (Hoffman et al., 2014). The w A factor was 0.0793. The 1Ha protonation state was used for the homocitrate ligand in the quantum refinement calculations of V nitrogenase (Benediktsson & Bjornsson, 2020;. The calculations on particulate methane monooxygenase (pMMO) were based on the 3rgb crystal structure at a resolution of 2.8 Å (Smith et al., 2011). The QM system involved one or two Cu ions and His-33, His-137 and His139 from the E Figure 1 The QM systems used for V nitrogenase. The small QM model contained only atoms shown with thick sticks, whereas the large QM model included also the three residues shown as thin sticks. subunit of the structure (only the imidazole rings of the latter two, but NH 2 CH 2 CH 2 -imidazole for His-33, i.e. including the amino terminal group, which also coordinates to Cu). The structures were taken from our previous investigation (Cao et al., 2018a). The copper ions were studied in the closed-shell reduced state and the w A factor was 4.91.
For all five crystal structures, coordinates, occupancies, ADPs and structure-factor amplitudes were obtained from the Protein Data Bank (PDB), together with the space group, unit-cell parameters, resolution limits, R factors and the test set used for the evaluation of the R free factor.
The quality of the models was judged by the real-space difference-density Z score (RSZD), calculated by EDSTATS (part of the CCP4 package; Winn et al., 2011), which measures the local accuracy of the model (Tickle, 2012). The maximum of the absolute negative and positive RSZD value was calculated for all atoms in the QM systems (atoms outside the QM system were kept frozen according to the original crystal structure). RSZD is typically less than 3.0 in absolute terms for a good model.
For the nitrogenase models, the QM calculations were performed with the broken-symmetry (BS) approach (Lovell et al., 2001): each of the seven Fe ions was modelled in the high-spin state, with either a surplus of (four Fe ions) or (three Fe ions) spin. We employed the broken-symmetry BS7-235 state with a surplus of spin on Fe2, Fe3 and Fe5 for all calculations. This is the best BS state for the resting state of Mo nitrogenase and also for several other states (Lovell et al., 2001;Cao & Ryde, 2018;Cao et al., 2018b), and this state was also used in previous studies on V nitrogenase (Benediktsson & Bjornsson, 2020;Bergmann et al., 2021). This state was obtained using the fragment approach by Szilagyi & Winslow (2006) or by swapping the coordinates of the Fe ions (Greco et al., 2011).

Results and discussion
In this study, we investigate whether the results of quantum refinement can be improved by carrying out the QM calculations in a COSMO continuum solvent (Klamt & Schü ü rmann, 1993) and whether the solvent may allow calculations with a smaller but more highly charged QM system. We test the approach for five different crystal structures to investigate  Table 1 RSZD values and strain energies (ÁE str in kJ mol À1 ) for the FeV cluster with different sizes of the QM system and different interpretations of the unknown ligand, replacing S2B (OH À or NH 2À ), in quantum-refinement calculations without (" = 1) or with a COSMO continuum solvent with two values of the dielectric constant (").
The structure shows two alternative conformations, one with S2B intact (AC2, 14% occupancy) and one with it replaced by the unknown ligand (AC1, 86% occupancy). Gln-176 also rotates between the two conformations. 'Sum' is the sum of the RSZD scores for the six residues shown in the when it is applicable, what is a proper choice of the dielectric constant and whether it affects the discriminatory power of quantum refinement. The results for each crystal structure are described in separate sections.
3.1. OH À or NH 2À binding to V nitrogenase We first study the 6fea crystal structure of V nitrogenase (1.2 Å resolution), showing a small ligand binding to the FeV cluster (Sippel et al., 2018). This ligand was originally interpreted as a reaction intermediate, NH 2À or NH 2 À . However, both QM/MM and quantum-refinement studies showed that it is rather OH À (Benediktsson et al., 2018;. The ligand replaces one of the 2 -bridging sulfide ions, S2B, but we showed that it contains a significant amount of undissociated S2B ($14%). Therefore, the quantum-refinement calculations were performed with the ComQumX-2QM approach , performing two separate QM calculations for the two alternative conformations. However, most importantly for this study, we showed that the structure was improved significantly if three positively charged residues were included in the QM system, to compensate the large negative charge of the QM system.
Here we investigate whether we can get a similar improvement with the original QM system, but using a COSMO model instead. Thus, we compare the results obtained using either a small QM model with only the directly coordinated ligand and three residues around the ligand or a larger model in which three neutralizing groups are added to the QM system, as can be seen in Fig. 1. The net charge of the small QM system is À5 or À6, depending on the interpretation of the unknown ligand, whereas it is À2 or À3 for the large QM system. We compare results obtained with three different dielectric constants: " = 1 (i.e. no COSMO model), 4 (a protein-like environment) and 80 (a water-like environment).
The results are collected in Table 1. We first discuss the results obtained with the preferred OH À ligand (upper half of the table). It can be seen that in a vacuum, " = 1, the large model (with one Arg and two Lys residues) gives appreciably lower RSZD scores than the smaller QM system for all residues, except S2B. Therefore, the sum of the RSZD scores is reduced from 18.3 to 12.7. This can also be seen in the electron-density difference maps in Figs. 2(a) and 2(b). However, a similar improvement is found also when the COSMO model is used for the small system: the sum of the RSZD scores is reduced from 18.3 in a vacuum to 11.2 with " = 80. The effect comes from all residues, except S2B, but it is especially large for the two conformations of Gln-176 (Fig. 2c). In fact, the result with " = 80 is better than that obtained with the large model in a vacuum.
It can also be seen that the strain energies (i.e. the difference in QM energy of the QM system between structures optimized with or without the crystallographic data) of the two conformations are reduced, from 124 to 46 kJ mol À1 for the conformation with OH À (AC1) and from 10 to 8 kJ mol À1 for the S2B-bound conformation (AC2). The strain energies are always much larger for the large QM system than for the small one. This reflects that the strain energies strongly depend on the size of the QM system (in a larger QM system, there are more Quantum-refined structures of the FeV cluster with different sizes of the QM system and different interpretations of the bound ligand, replacing S2B: (a) OH À with the small QM system in a vacuum, (b) OH À with the large QM system in a vacuum, (c) OH À with the small QM system and " = 80, and (d) NH 2À with the small QM system and " = 80. The mF o ÀDF c difference maps are contoured at +3 (green) and À3 (red). All systems were refined with ComQumX-2QM and two conformations of the QM system, namely one (86%, cyan) with the unknown ligand and the other (14%, grey) with S2B still bound. atoms that can be strained) (Ryde, 2002). Therefore, strain energies obtained with different sizes of QM systems are not comparable.
The large QM system is also improved with the continuumsolvation model, but to a smaller extent: the sum of the RSZD scores decrease from 12.7 to 11.1, whereas the strain energies decrease from 208 to 64 kJ mol À1 for the OH-bound conformation and from 20 to 10 kJ mol À1 for the S2B-bound conformation.
Next, we checked that the discriminatory power was not compromised by the COSMO model, by including calculations also with the second best interpretation of the ligand, NH 2À . The results in the second part of Table 1 show that this ligand gives similar trends to the OH À ligand: both the RSZD scores and the strain energies decrease when going from a vacuum to " = 80, both for the small and the large QM systems. However, most importantly, it can be seen that the RSZD score of the unknown ligand is always higher for the NH 2À than for the OH À models, by 1-4. Likewise, the sum of the RSZD scores is always also larger for models with NH 2À than for the corresponding OH À models, by 5-10. This can also be seen from the difference maps in Figs. 2(c) and 2(d). The same applies for the strain energy for AC1, which is lower for OH À than for NH 2À by 2-24 kJ mol À1 , except for the small model with " = 80. In fact, the difference decreases with increasing ", reflecting that the strain energy decreases with increasing " for both ligands and both sizes of the QM system. The strain energy of AC2 is nearly the same for the two models (within 1 kJ mol À1 ), reflecting that AC2 is the same for the two models.
Finally, we performed a more detailed study of how the RSZD results depend on the dielectric constant by performing quantum-refinement calculations for the small-QM OH À model with " varying from 1 to 80 in steps of 1 and 5 in the intervals 1-20 and 20-80, respectively. The results are shown in Fig. 3(a).
It can be seen that the RSZD values for the two alternative conformations of His-180 and for OH À decrease somewhat from " = 1 to 2 or 4 and are then essentially constant, with fluctuations of 0-0.1 (note that EDSTATS reports the RSZD value with a single decimal). Gln-176 in AC1 shows the largest variation, with a large decrease in RSZD from 9.8 to 4.4. The decrease is monotonous until " = 13 and the lowest value is not attained until " = 75, although the variation is only 0.2 for " > 19. The RSZD value of the same residue in AC2 also decreases with ", but only from 6.0 to 5.1, and the fluctuations are 0.2 for " > 14. In contrast, the RSZD value of S2B (of AC2) shows a slight increase with increasing ", from 0.3 to $0.8, but the fluctuations are 0.3 even for " = 55-80. As a consequence, the sum of the six RSZD values decreases with increasing ", from 18.3 to 11.2. Thus, it seems clear that " = 4 is too small to give converged results, but essentially any value >20 can be used. The random fluctuations of 0.1-0.3 for the individual RSZD values seem to be caused not by variations in the geometries, but mainly by variations in the B factors obtained by Phenix after the rerefinement of the coordinates of the QM system. The corresponding strain energies are shown in Fig. 3(b). It can be seen that the strain energy of AC2 decreases slightly with increasing ", from 10 to 8 kJ mol À1 . However, for " > 5, it is stabilized, with fluctuations of < 2 kJ mol À1 . The strain energy of AC1 also first decreases with increasing ", from 124 to 42 kJ mol À1 . However, from " = 10, it increases again, reaching 51 kJ mol À1 at " = 55-70, after which it decreases again to 46 kJ mol À1 at " = 80. The fluctuations are similar to those of AC2, i.e. <2 kJ mol À1 .
Consequently, we can conclude that a COSMO model clearly improves the result of quantum refinement for systems with a large negative charge and can therefore be used to avoid the need of enlarging the QM system with neutralizing residues.

The bidentate ligand in V nitrogenase
In the second application, we also consider V nitrogenase, but concentrate instead on the nature of the bidentate ligand that bridges two of the Fe ions in the FeV cluster. From the original crystallographic data (5n6y at 1.35 Å resolution; Sippel & Einsle, 2017), it could not be decided whether the ligand is carbonate or nitrate. However, a recent QM/MM study showed that it is most likely carbonate (Benediktsson & Bjornsson, 2020) and we came to the same conclusion with quantum refinement (Bergmann et al., 2021). Here, we compare three different interpretations of the bidentate ligand, namely CO 3 2À , HCO 3 À or NO 3 À . As in the previous section, we compare the results obtained using either the small QM model with only the directly coordinated groups or a larger model in which three neutralizing groups and one hydrogen-bonding group are added to the QM system, as is shown in Fig. 4. We also test three different values of the dielectric constant, namely " = 1, 4 and 80.
The results are collected in Table 2. It can be seen that the RSZD score of the bidentate ligand ('Lig' in Table 2) is always lower for CO 3 2À than for the other two ligands with both system sizes and all values of " (by 0.7-2.8 for the small QM system and 0.2-0.7 for the large QM system). The same applies also for the sum of the RSZD scores (it is lower by 0.4-3.2; 'Sum' in Table 2). This can also be seen from the electrondensity difference maps in Fig. 5, showing larger volumes of positive density (green) around the bidentate ligand for HCO 3 À [part (c)] and NO 3 À [part (d)] than for CO 3 2À [parts (a) and (b)]. Thus, the COSMO model does not affect the discriminatory power of the quantum refinements and it does not change the conclusion that the bidentate model is CO 3 2À . However, the strain energies become appreciably smaller for the COSMO calculations, especially for the small QM system (it decreases by 53-73 kJ mol À1 between a vacuum and " = 4. They therefore also become more similar for the three different interpretations of the bidentate ligand. As an effect, the strain energy becomes slightly smaller for the HCO 3 À ligand than for CO 3 2À for the small QM system (by less than 1 kJ mol À1 ) and slightly smaller for the NO 3 À ligand than for CO 3 2À for the large QM system (by 2-3 kJ mol À1 ). This is partly an effect of the larger negative charge of the CO 3 2À ligand, which increases the strain energy (Ryde, 2002 Table 2 RSZD values and strain energies (ÁE str in kJ mol À1 ) for the FeV cluster with different interpretations of the bidentate ligand and different sizes of the QM region (cf. Fig. 4).
For all systems, calculations without (" = 1) or with a COSMO continuum solvent with two values of the dielectric constant (") were tested. Q is the charge of the QM system. 'Sum' is the sum of the RSZD scores for the five residues shown in the

Figure 5
Quantum-refined structures of the bidentate ligand in the FeV cluster of V nitrogenase (small QM model) with different interpretations of the ligand: (a) CO 3 2À in a vacuum, (b) CO 3 2À with " = 80, (c) HCO 3 À with " = 80 and (d) NO 3 À with " = 80. The mF o À DF c difference maps are contoured at +3 (green) and À3 (red).

Figure 4
The QM systems used for the FeV cluster with different interpretations of the bidentate ligand. The small QM model contained only atoms shown with thick sticks, whereas the large QM model included also the four residues shown as thin sticks.
Moreover, it can be seen that in most cases, the sum of the RSZD scores of the active site decreases when the continuumsolvent model is added. The effect is clearest for the small QM system, whereas for the large QM system, the results are less smooth. The individual residues show more varying trends, but for the bidentate ligand itself, the results improve systematically as " is increased for both the large and small QM systems with the preferred CO 3 2À ligand, but not always for the incorrect ligands.
However, the most important question is whether the small QM systems with a COSMO model gives better results than the large system without any COSMO model. The results in Table 2 show that this is the case for the CO 3 2À and HCO 3 À ligands, whereas for NO 3 À , the sum of the RSZD scores is better only for " = 80, which is acceptable, because this is not the correct ligand. Thus, we conclude that also for this system is it favourable to employ an implicit solvent model in the quantum-refinement calculations.

Protonation of homocitrate in Mo nitrogenase
Next, we studied the protonation state of homocitrate in Mo nitrogenase, based on the 3u7q crystal structure (1.0 Å resolution). Homocitrate is a bidentate ligand of the Mo ion in the catalytic FeMo cluster (Fig. 6). It contains one alcohol and three carboxylate groups. Previous QM/MM, QM and quantum-refinement studies have suggested that in the resting E 0 state, there is a proton on the alcohol group (O7 in Fig. 6), directed towards and almost shared with an O atom of one of the carboxylate groups (O1) (Benediktsson & Bjornsson, 2017;Cao et al., 2017). However, a structure with an additional proton on the other O atom of that carboxylate group (O2) is competitive (Cao et al., 2017). In this study, we have studied these two states (denoted 1Ha and 2H, respectively) with quantum refinement. As for V nitrogenase, we used either a minimal QM system with only the FeMo cluster, or a larger QM system extended by two nearby Arg residues (cf. Fig. 6). The net charge was À5 or À4 for the small QM systems and À3 or À2 for the large QM systems. For all states, we performed calculations in a vacuum or in the COSMO continuum solvent with a dielectric constant of 4 or 80.
The results are collected in Table 3. It can be seen that the behaviour is similar to that for the FeV cluster. The strain energies in general decrease with increasing " (with a single exception). The strain energies are always larger for the large QM systems. They are also normally larger for the 1Ha state than the 2H state, which reflects the larger negative charge of the 1Ha state.
Likewise, many of the RSZD scores decrease slightly with increasing ", but the variation is larger and varies with the considered group. For the homocitrate ligand, RSZD is lowest for " = 80 for all systems, except for the 1Ha state with the small QM system. The sum of the RSZD scores of the four moieties of the active site is lowest for " = 80 for two of the systems and for " = 4 for the other two.
However, the discrimination between the two protonation states is less clear. Our original study (Cao et al., 2017) was based on the small model in a vacuum, for which 1Ha gives a slightly lower RSZD value for homocitrate (3.1 compared to 3.4). The results in  Table 3 RSZD values, O1-O7 distances (Å ) and strain energies (ÁE str in kJ mol À1 ) for the FeMo cluster with different protonation states of the homocitrate ligand (HCA) and different sizes of the QM region (small or large QM system, cf. Fig. 6).
For all systems, calculations without (" = 1) or with a COSMO continuum solvent with two values of the dielectric constant (") were tested. Q is the charge of the various QM systems. 'Sum' is the sum of the RSZD scores for the four groups shown in the

Figure 6
The QM systems used for the FeMo cluster. The small QM model contained only atoms shown with thick sticks, whereas the large QM model included also the two residues shown in thin sticks.
any of the other calculations. Still, the preference of 1Ha was mainly based on the electron-density difference maps, which show a positive density between the O1 and O7 atoms, and a negative density on the other side of O1 [ Fig. 7(g)], indicating that the increase in the O1-O7 distance caused by the protonation on O2 (from 2.47 to 2.60 Å ; also listed in Table 3) is not supported by the crystallographic data. Figs. 7(a)-7(f) show that the positive density exists also for the 1Ha system, but at a lower level and with a smaller volume. The negative density is present only for 2H, but it is smaller for the large system and it decreases with ", reflecting that the O1-O7 distance decreases to 2.55 Å . Still, the difference maps around atoms O1, O2 and O7 are slightly better for 1Ha than for 2H even with the large system and " = 80. Therefore, we conclude that also for Mo nitrogenase, the results are improved by a continuum solvent, whereas the discriminatory power is somewhat decreased, but it was rather weak also without the continuum solvent. Quantum-refined structures of homocitrate in Mo nitrogenase: 1Ha with the small QM system in a vacuum (a), " = 4 (b) and " = 80 (c); 1Ha with the large QM system in a vacuum (d), " = 4 (e) and " = 80 (f); 2H with the small QM system in a vacuum (g), " = 4 (h) and " = 80 (i); 2H with the large QM system in a vacuum (j), " = 4 (k) and " = 80 (l). The mF o À DF c difference maps are contoured at +2.5 (green) and À2.5 (red).

pMMO
Our fourth test case is the Cu B site of particulate (i.e. membrane-bound) methane monooxygenase (pMMO). The nature of the active site of this enzyme has been much debated (Ross & Rosenzweig, 2017). Based on low-resolution (!2.6 Å ) crystal structures, the Cu B site was suggested to be the active site and it was modelled as a binuclear site, but with a distorted and strange geometry (Smith et al., 2011). However, based on quantum refinement, we suggested that this site is mononuclear rather than binuclear (Cao et al., 2018a). More recently, it has been suggested that the active site is rather the Cu C site, another mononuclear Cu site (Ross et al., 2019;Peng et al., 2021).
In this study, we have refined the Cu B site in the 2.8 Å 3rgb structure (Smith et al., 2011) in a vacuum and with COSMO using " = 4 or 80. This was done for both a mononuclear and a binuclear site. Both sites were studied in the reduced, Cu I , state, giving a net charge of +1 for the mononuclear site and +2 for the binuclear site. The results are collected in Table 4. It can be seen that continuum solvent has a small effect on the RSZD scores of the three His ligands and the Cu ion(s), 0.1-0.4, without any clear trends. On the other hand, the strain energies decrease with increasing ", somewhat more for the binuclear model than for the mononuclear model. Still, it is clear for all values of " that the mononuclear model gives lower RSZD scores and lower strain energies than the binuclear model, in agreement with our original study (Cao et al., 2018a). This is also reflected by the electron-density difference maps in Fig. 8, showing extensive negative features around the extra Cu ion.
Thus, we conclude that for this structure at 2.8 Å resolution with a rather small charge of the QM system, there is no gain of using the continuum solvent, at least not for the RSZD values (but the results are not deteriorated either and strain energies decrease with increasing ").

Protonation states in acetylcholine esterase
Finally, we tested the influence of the COSMO model on a system without metal ions for a study of the protonation state of the acetylcholine esterase active site. We used only one size of the QM system, involving 17 residues and three water molecules around the active-site Ser residue (Ser-203), which is modified by the covalent attachment of the nerve agent sarin (the net charge of the QM system is -2 e). We compared two different protonation states. In the first (P0), His-447 in the catalytic triad is doubly protonated (and therefore positively charged), whereas the two nearby residues Glu-202 and Glu-450 are deprotonated and negatively charged. In the second (P3; the numbering is taken from a study involving more states), the HE2 proton on His-447 is moved to Glu-202. The two states are shown in Fig. 9. In our original study (unpublished), we decided that P3 is the most likely state, although all quality measures did not point in the same direction. P0 was one of the competitive alternatives.  Table 4 RSZD values and strain energies (ÁE str in kJ mol À1 ) for the Cu B site in pMMO with one or two Cu ions.
For both models, calculations without (" = 1) or with a COSMO continuum solvent with two values of the dielectric constant (") were tested. 'Sum' is the sum of the RSZD scores for the four residues shown in the

Figure 8
Quantum-refined structures of the Cu B site in pMMO with (a) one or (b) two Cu ions. The mF o À DF c difference maps are contoured at +3 (green) and À3 (red).
As for the other cases, we have run quantum refinement with three different values of the dielectric constant, namely " = 1, 4 and 80. The results are presented in Table 5. It can be seen that the RSZD values of the individual residues vary little with increasing " (up to 0.3, except for Ser-203 in the P0 state). Still, the small differences add up so that the sum of the RSZD scores is actually lowest for " = 1 for both protonation states by 0.4-1.0. On the other hand, the strain energies mainly decrease with increasing " (except for P0 with " = 4).
Finally, we note that both the RSZD score and the strain energy point to P3 as the better protonation state, except for the strain energy in a vacuum. Likewise, the QM energy of the refined structures also point to P3 as the more stable state by 7-14 kJ mol À1 , except for " = 80, where P0 is 2 kJ mol À1 more stable. Thus, we can conclude that the continuum-solvation model has a relatively small effect on systems with the same net charge and differences in the protonation state of the organic molecules.

Conclusion
We have studied whether it is possible to improve the results of quantum refinement by performing the QM calculations in a continuum solvent. We have studied five crystal structures with different properties. For two different structures of V nitrogenase, we show that the continuum solvent strongly improves the results: both the RSZD scores and strain energies decrease when the continuum solvent is turned on. The best results are typically obtained with a high dielectric constant (" > 20). The improvement is largest with small QM systems. The reason for this is most likely the high negative charge of the small QM systems, namely À5 or À6. In particular, refinement of the small QM system in the continuum solvent gives results that are actually better than those obtained for the larger QM system in a vacuum, in which three positively charged residues have been added to reduce the large negative charge. We also show that the ability of the quantum-refinement calculations to discriminate between different interpretations of the structures is not affected by the continuum solvent. However, the difference in strain energies between different structural interpretations is typically reduced, because the strain energies become smaller in a continuum solvent.
For Mo nitrogenase, the results are similar: the strain energies still improve with increasing ", and mostly also the RSZD scores. However, the difference between the two protonation states (which was quite small already in a vacuum) becomes smaller.
For the other two crystal structures (particulate methane monooxygenase and acetylcholine esterase), the RSZD scores Quantum-refined structures of acetylcholine esterase phosphonylated by sarin in the (a) P0 and (b) P3 protonation states, using " = 80. The mF o À DF c difference maps are contoured at +3 (green) and À3 (red). The green arrows indicate the position of the moved proton. Table 5 RSZD values for all residues involved in the QM system for acetylcholine esterase, as well as the strain energies (ÁE str ) and the difference in QM energy of the P0 and P3 protonation states (ÁE QM1 , both in kJ mol À1 ). 'Sum' is the sum of the RSZD scores of all the residues shown in the table.

State
P0 P3 " = 1 " = 4 " = 80 " = 1 " = 4 " = 80 Tyr-119 1.5 1.5 1.5 1.6 1.6 1.5 Gly-120 show little variation, but the strain energies in general decrease slightly with increasing ". Thus, we conclude that a continuum solvent typically is favourable only for structures with a large charge of the QM system. The continuum solvent can then be used to avoid the need of enlarging the QM system. However, for structures with a low resolution (>2 Å ) or with a low charge of the QM system, there is no advantage of the continuum solvent.