research papers
The probabilistic estimate of the solvent content (Matthews probability) was first introduced in 2003. Given that the Matthews probability is based on prior information, revisiting the empirical foundation of this widely used solvent-content estimate is appropriate. The parameter set for the original Matthews probability distribution function employed in MATTPROB has been updated after ten years of rapid PDB growth. A new nonparametric kernel density estimator has been implemented to calculate the Matthews probabilities directly from empirical solvent-content data, thus avoiding the need to revise the multiple parameters of the original binned empirical fit function. The influence and dependency of other possible parameters determining the solvent content of protein crystals have been examined. Detailed analysis showed that resolution is the primary and dominating model parameter correlated with solvent content. Modifications of protein specific density for low molecular weight have no practical effect, and there is no correlation with oligomerization state. A weak, and in practice irrelevant, dependency on symmetry and molecular weight is present, but cannot be satisfactorily explained by simple linear or categorical models. The Bayesian argument that the observed resolution represents only a lower limit for the true diffraction potential of the crystal is maintained. The new kernel density estimator is implemented as the primary option in the MATTPROB web application at http://www.ruppweb.org/mattprob/.
Keywords: solvent content; protein crystals; Matthews coefficient; Matthews probability; kernel density estimator; Bayesian resolution limit.
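The idea summarized in the abstract, estimating Matthews probabilities directly from empirical data with a kernel density estimator, can be illustrated with a minimal sketch. This is not the MATTPROB implementation. The sample data are synthetic placeholders rather than PDB-derived values, and the Gaussian kernel, the Silverman bandwidth rule, and all function names are assumptions made for illustration. The sketch also ignores the resolution dependence discussed in the abstract. It uses the standard relations V_M = V_cell / (Z · N · MW) and solvent content ≈ 1 − 1.23 / V_M, and normalizes the density over the candidate numbers of molecules in the asymmetric unit.

```python
import math
import random
import statistics


def matthews_coefficient(cell_volume_A3, n_sym_ops, n_mol_asu, mol_weight_Da):
    """Matthews coefficient V_M in A^3/Da for n_mol_asu copies per asymmetric unit."""
    return cell_volume_A3 / (n_sym_ops * n_mol_asu * mol_weight_Da)


def solvent_content(vm):
    """Fractional solvent content from the common approximation 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / vm


def gaussian_kde(samples, bandwidth):
    """Return a 1D Gaussian kernel density estimate built from the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def matthews_probabilities(density, cell_volume_A3, n_sym_ops, mol_weight_Da,
                           max_copies=8):
    """Relative probability of each candidate copy number (hypothetical helper).

    Assumption: the probability of N copies is the density at V_M(N),
    normalized over all candidates with physically possible solvent content.
    """
    weights = {}
    for n in range(1, max_copies + 1):
        vm = matthews_coefficient(cell_volume_A3, n_sym_ops, n, mol_weight_Da)
        if vm <= 1.23:  # would imply zero or negative solvent content
            continue
        weights[n] = density(vm)
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Synthetic stand-in for empirical V_M values (NOT real PDB data).
random.seed(0)
vm_samples = [random.gauss(2.4, 0.4) for _ in range(500)]

# Silverman's rule of thumb for the bandwidth (an assumption of this sketch).
bandwidth = 1.06 * statistics.stdev(vm_samples) * len(vm_samples) ** (-0.2)
vm_density = gaussian_kde(vm_samples, bandwidth)

# Hypothetical crystal: 500,000 A^3 cell, 4 symmetry operators, 20 kDa protein.
for n, p in matthews_probabilities(vm_density, 500000.0, 4, 20000.0).items():
    vm = matthews_coefficient(500000.0, 4, n, 20000.0)
    print(f"N={n}  V_M={vm:.2f}  solvent={solvent_content(vm):.2f}  P={p:.3f}")
```

Because the density is evaluated directly on the data, updating the estimate after further database growth would only require replacing the sample set, which is the practical advantage over refitting a multi-parameter function that the abstract points to.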
Supporting information
Microsoft Word (DOCX) file https://doi.org/10.1107/S1399004714005550/dz5321sup1.docx
Microsoft Word (DOCX) file https://doi.org/10.1107/S1399004714005550/dz5321sup2.docx
Microsoft Word (DOCX) file https://doi.org/10.1107/S1399004714005550/dz5321sup3.docx