Estimating local protein model quality: prospects for molecular replacement

Local error estimates can be used to improve the success of models in molecular replacement.


Introduction
The estimation of protein model quality has a long history in protein structure prediction, originating from methods that estimate the free energy of protein models (Hendlich et al., 1990;Jones et al., 1992;Lü thy et al., 1992). If the free energy of a protein can be accurately described, it should be possible to use this to find the minimum free energy and locate the native structure of the protein. However, the vast majority of energy functions describing the free energy have focused on identifying the native structure among a set of decoys (Park & Levitt, 1996). These methods do not necessarily show a good correlation with the relative quality of protein models, in particular for difficult homology modelling or ab initio cases.
In 2003, we developed the ProQ method, which had a different aim to previous methods . Instead of recognizing the native structure among a set of decoys, ProQ was developed to predict the quality of a model using machine learning and features that could be calculated from the model itself, such as different types of atom-atom contacts, residue-residue contacts, surface-exposure preference, agreement with predicted secondary structure and surface area. We used ProQ to rank models in CASP5 and it was the main reason why our prediction servers were ranked at the very top in terms of model quality .
ProQ was later extended to estimate the local quality of each residue in a protein model, and the quality of the entire model was estimated by simply summing up the quality for each residue (Wallner & Elofsson, 2006). This method was rather successful in CASP7 (Wallner & Elofsson, 2007) and CASP8 (Larsson et al., 2009), in which quality assessment had now become a separate prediction category.
In ProQ2, improved prediction was achieved by using evolutionary sequence profile weights and features averaged over the whole model, even though the prediction was local (Ray et al., 2012;Uziela & Wallner, 2016). ProQ2 error estimates encoded as B factors were shown to improve the success of molecular replacement (MR) (Bunkó czi et al., 2015). This was based on the idea that the estimation of local model quality could be translated into coordinate uncertainty and used to smear the atoms in the model over their range of possible positions (Read & Chavali, 2007), which was first implemented using ensemble consensus to estimate local errors (Pawlowski & Bujnicki, 2012).
Since the release of ProQ2, we have made considerable improvements in prediction accuracy. In ProQ3, we combined ProQ2 with two novel predictors based on centroid and allatom energy terms calculated using Rosetta (Leaver-Fay et al., 2011). Most recently, we developed ProQ2D and ProQ3D (Uziela et al., 2017), which are deep-learning versions of ProQ2 and ProQ3 optimized on a larger training set using new developments in machine learning. In terms of performance, we have gradually improved Pearson's correlation between predicted and actual quality from 0.60 for ProQ to 0.81 for ProQ2, 0.85 for ProQ2D and ProQ3, and finally 0.9 for ProQ3D calculated on data from CASP11 (Uziela et al., 2017).
Given the recent improvements in prediction accuracy in ProQ3D, we wanted to analyze how this improvement propagates to the ability to improve the quality of the models for MR.

Data set
The data set consisted of 431 target-template pairs for 229 molecular-replacement targets with an LLG of <100, using the template to calculate the LLG, and resolution between 0.8 and 3.1 Å (see the supporting information for a complete list). The pairs have an average sequence identity of 28% (with a range of 17-45%) calculated using the alignment constructed below. Models for the pairs were constructed by first generating hidden Markov models (HMMs) for the target and template sequences, respectively, using HHblits (Remmert et al., 2012) with two iterations against uniclust30_2018_08. The two HMMs for targets and template were then aligned using HHalign (Steinegger et al., 2019) with default settings. 3D models were constructed from the alignment using Modeller version 9.14 (Š ali & Blundell, 1993). In the default setting, Nand C-terminal regions unaligned with the template are trimmed from the model, but all other unaligned regions are kept.

Local error estimation
Local errors were estimated using ProQ2 (Ray et al., 2012) and ProQ3D (Uziela et al., 2017). Both programs predict the S score (Cristobal et al., 2001), a score between 0 and 1, where 0 is no quality and 1 is perfect quality. The score S i transforms the local distance deviation d i using the formula S i ( where d 0 is a parameter that monitors how fast the function goes to zero; here, d 0 = 3.0 Å was used, which makes the transform most sensitive to distances around 3 Å ; for example, the 0-6 Å range is mapped to [0.2-1], while all distances larger than 6 Å are mapped to [0-0.2]. The predicted local qualities S i were transformed to predicted local error estimates by solving the equation for d i : d i = d 0 (1/S i À 1) 1/2 . To restrict the range of d i , all d i > 15 were set to 15.

Molecular replacement
To estimate the usefulness of models for molecular replacement, the log-likelihood gain (LLG) measure from Phaser (McCoy et al., 2007) was used. The LLG measures how much better an atomistic model explains the measured X-ray data compared with a random model (Read, 2001). In the general case, calculating the LLG is time-consuming. However, for the purpose of this study we can utilize the fact that the target structures are available and can be used to place the models in roughly the optimal position by superimposing them on the target structures using phenix.superimpose_pdbs (Liebschner et al., 2019). This faster version of Phaser (McCoy et al., 2007) was used to calculate the LLG both without and with local error estimates from ProQ2 and ProQ3D.
To be able to compare different LLG values and their usefulness, an LLG of >50, corresponding to a 90% chance of success in MR (McCoy et al., 2017), was used as threshold to define models of good quality for MR.

Results and discussion
We wanted to compare the potential success in molecular replacement (MR) for the models in the data set (see Section 2) using ProQ2, ProQ3D and no error estimates. As outlined in the flowchart in Fig. 1, we first ran Phaser (McCoy et al., 2007) on the models without any error estimates to establish a baseline. We then used ProQ2 and ProQ3D to predict residuespecific error estimates, as illustrated in the top right panel in Fig. 1, and added these to the B-factor column of the model (see the model colored by predicted error in the bottom right panel in Fig. 1). Finally, Phaser was run again with the same model, but now with error estimates. Following this procedure, three LLG values were calculated for each of the 431 models in the data set: without error estimates, with ProQ2 error estimates and with ProQ3D error estimates, respectively.

Model quality in MR
The target sequences from all models have a relatively low sequence identity to the templates, with a majority (61%) below 30%; however, the produced models are still relatively accurate overall, with most GDT_TS (Zemla, 2003) values above 0.7, corresponding to roughly 70% correct residues (Fig. 2a). It can also be noted that at this sequence-identity level there is almost no correlation (0.06) between the sequence identity and the quality of the models. Next, we analyzed whether the quality of the models (GDT_TS) is important for the models to be useful in MR as measured by the LLG for the models without error estimates (Fig. 2b).

research papers
Indeed, models with high LLG are also of high quality, and almost all cases (LLG > 50) have GDT_TS > 0.7. However, not all high-quality models receive a high LLG. In fact, quite a few models with GDT_TS above 0.7 have an LLG of less than 50. Thus, it is not only the overall quality of the model that impacts on whether a model is of good quality for MR.
Both ProQ2 and ProQ3D predict global overall model quality based on its local error estimates. The correlation to the correct GDT_TS measure in this data set is 0.57 and 0.66 for ProQ2 and ProQ3D, respectively (Figs. 2c and 2d). As we know from previous experience, both ProQ2 and ProQ3D are very good at separating bad from good models, but not as good when it comes to ranking already high-quality models. In this case, both ProQ2 and ProQ3D are able to discriminate between low-quality and high-quality models, and almost all cases with LLG > 50 have a ProQ score above 0.5 (Figs. 2e and  2f ). In addition, the relation between ProQ2 and ProQ3D to LLG is very similar to the relation between GDT_TS and LLG (compare Figs. 2e and 2f with Fig. 2b). Thus, it should be possible to use a threshold on the ProQ score to predict whether a model is of good quality for MR.

MR with error estimates
Next, we calculated the LLG using models with error estimates from ProQ2 and ProQ3D (Fig. 3). Clearly, for the vast majority of the models ProQ2 and ProQ3D error estimates improve the LLG compared with no error estimates (Figs. 3a and 3b). ProQ3D improves 383/431 (88.9%) of the models, which is significantly larger than the 329/431 (76.3%) of the models that were improved by ProQ2 (Table 1).
We can also observe a clear shift in the LLG distribution towards higher LLG values when using error estimates (Figs. 3c and 3d). For ProQ3D the average LLG increases from hLLG noerror i = 35.8 to hLLG error i = 51.7. In terms of modelling there is a small advantage to pruning all unaligned regions from the search model when not using error estimates, hLLG noerror-pruned i = 36.5 (an increase of 0.7), and a small disadvantage when using error estimates, hLLG error-pruned i = 51.4 (a decrease of 0.3). In both cases, the advantage of using error estimates is clear. Overview of the workflow used. Input models superimposed on the target structures are used as input. Errors in the models are estimated using ProQ2 or ProQ3D; the errors are added to the B-factor column in the model. Models with both no error estimates and error estimates are used in MR calculations to estimate the LLG. Table 1 LLG improvements using error estimates for 431 models in the data set.
LLG increase is the fractional improvement in LLG when using error estimates, #targets ÁLLG>0 is the number of targets that improve when using error estimates and #targets LLG>50 is the number of targets that have an LLG of >50.

Figure 2
Global model quality and potential success in MR. (a) Sequence identity for the target-template sequences versus global model quality measured by GDT_TS, (b) GDT_TS against the LLG for a model without error estimates, (c) GDT_TS in relation to the predicted global quality by ProQ2, (d) GDT_TS in relation to the predicted global quality by ProQ3D, (e) ProQ2 against the LLG for a model without error estimates, ( f ) ProQ3D against the LLG for a model without error estimates.
In a previous study, we reported an average 25% increase in the LLG using ProQ2 error estimates compared with models using no error on models submitted to CASP10 (Bunkó czi et al., 2015). Here, the average improvement in the LLG using ProQ2 is 36.7% (Table 1); since there is no change in methodology between the two sets, this number indicates that this particular data set is slightly easier than the CASP10 data set. ProQ3D error estimates improve the average LLG by 52%, suggesting that the success in MR can be improved even further by using ProQ3D instead of ProQ2. Indeed, if we check how many models that have LLG values indicating a high chance of success (LLG > 50), we see that only 74/431 models without error estimates are successful, while 175/431 and 209/431 are successful using ProQ2 and ProQ3D, respectively; the difference between ProQ2 and ProQ3D is significant.

Prediction example
Finally, we conclude by demonstrating a successful prediction case. The target is a 206-amino-acid dihydrofolate reductase from Pneumocystis carinii solved using X-ray diffraction at 2.1 Å resolution (PDB entry 2fzh). The template is a 332-amino-acid dihydrofolate reductase from Bacillus anthracis solved using X-ray diffraction at 2.25 Å resolution (PDB entry 3e0b, chain A). The alignment between the target and template sequence is 30.9% identical and the quality of the model based on this alignment has a GDT_TS of 0.67. The predicted error by ProQ3D as well as the actual error (capped at 8 Å ) is shown in Fig. 4(a). The correlation between the predicted and actual error is 0.85. The model colored by the error with the corresponding template superimposed is shown in Fig. 4(b); some obvious bad loops that do not align well with the template are correctly identified as such, but then there are also some secondary-structure elements, such as the leftmost strand, which align well with the template but are correctly predicted as bad (data not shown). The model without error estimates received an LLG of 8.2 and this improved to 81.3 for the model with error estimates, clearly demonstrating the value of using error estimates.

Conclusion
We have demonstrated that the use of error estimates can increase the number of models useful for MR substantially. The most recent version of our model-quality assessment program ProQ3D is more accurate and significantly better than ProQ2. ProQ3D improved the LLG score by over 50%  on average, resulting in significantly more models of good quality for MR compared with not using error estimates. ProQ3D is available from http://proq3.bioinfo.se/ both as a server and as a standalone download.