Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest
Publication in refereed journal

Times Cited
Web of Science: 31 (as at 24/11/2020)

Other information
Abstract
Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental to its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments supporting this hypothesis. In this study, we investigated to what extent training a scoring function with low-quality structural and binding data is detrimental to predictive performance. We found that low-quality data is not only non-detrimental but actually beneficial for the predictive performance of machine-learning scoring functions, although the improvement is smaller than that obtained from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting training to a smaller set of higher-quality data.
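The experiment described in the abstract (train on a high-quality set, progressively add lower-quality data, and compare how a machine-learning and a classical scoring function respond on a fixed test set) can be sketched as a learning-curve protocol. This is a minimal illustration under stated assumptions, not the authors' code: it uses synthetic one-dimensional data with a hypothetical nonlinear "affinity" function, models "data quality" purely as label noise, uses univariate least squares as a stand-in for a linear classical scoring function, and substitutes k-nearest-neighbour regression for random forest so that only the Python standard library is needed. The helper names (`synthetic`, `fit_linear`, `fit_knn`, `learning_curve`) are hypothetical, and the printed correlations say nothing about the paper's results.

```python
import math
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_linear(xs, ys):
    """Univariate least squares: a stand-in for a linear 'classical' scoring function."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbour regression: a simple nonparametric stand-in (NOT random forest)."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean([y for _, y in nearest])

    return predict


def learning_curve(fit, train_sets, test_x, test_y):
    """Train on each (nested) training set and score it on one fixed test set."""
    scores = []
    for xs, ys in train_sets:
        model = fit(xs, ys)
        scores.append(pearson([model(x) for x in test_x], test_y))
    return scores


def synthetic(n, noise, rng):
    """Hypothetical 1-D 'feature' with a nonlinear 'affinity'; noise models data quality."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * math.sin(x) + 0.3 * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    hq_x, hq_y = synthetic(100, 0.3, rng)   # smaller, cleaner ("high-quality") set
    lq_x, lq_y = synthetic(900, 1.0, rng)   # larger, noisier ("low-quality") set
    test_x, test_y = synthetic(200, 0.3, rng)
    # Nested training sets: high-quality only, then progressively add low-quality data.
    train_sets = [(hq_x + lq_x[:m], hq_y + lq_y[:m]) for m in (0, 300, 900)]
    for name, fit in (("linear", fit_linear), ("k-NN", fit_knn)):
        print(name, [round(r, 3) for r in learning_curve(fit, train_sets, test_x, test_y)])
```

Holding the test set fixed while the training sets are nested isolates the effect of adding data. Approaching the study's actual setting would require a real random forest (for example scikit-learn's `RandomForestRegressor`) and real protein-ligand descriptors in place of the synthetic feature.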
All Author(s) List: Li HJ, Leung KS, Wong MH, Ballester PJ
Journal name: Molecules
Volume Number: 20
Issue Number: 6
Publisher: MDPI AG
Pages: 10947-10962
Languages: English-United Kingdom
Keywords: binding affinity prediction; docking; machine-learning scoring functions
Web of Science Subject Categories: Chemistry; Chemistry, Organic

Last updated on 2020-11-25 at 00:05