Structured PREreview of The role of induced polarization in drug discovery applications

Published
DOI
10.5281/zenodo.15714491
License
CC BY 4.0
Does the introduction explain the objective of the research presented in the preprint?
Yes
The introduction establishes a comprehensive framework for understanding the research objective through five key elements. First, it establishes the fundamental context by explaining how induced polarization plays a pivotal role in drug discovery and molecular interactions, demonstrating why this phenomenon is crucial for ligand-protein binding through the redistribution of electronic clouds in response to electrostatic fields. Second, the authors identify a critical computational challenge: while quantum chemistry methods such as Density Functional Theory (DFT) can accurately calculate polarizability tensors, these approaches are computationally expensive and impractical for analyzing the large datasets of drug-like molecules typically encountered in pharmaceutical research. Third, the introduction presents machine learning as a promising solution, explaining how these methods can leverage molecular descriptors to predict polarizability tensors rapidly and at scale, thus complementing high-accuracy quantum chemistry techniques in large-scale drug discovery workflows. Fourth, the authors provide a clear and specific objective statement, outlining their plan to calculate polarizability tensors for thousands of molecules from the ChEMBL database targeting three specific proteins (Thrombin, Estrogen Receptor alpha, and Phosphodiesterase 5A), followed by the development of machine learning models to predict tensor eigenvalues from atomic hybridizations and inertia tensor eigenvalues. Finally, the introduction articulates the broader significance of this work by explaining how the same molecular features will be used to predict IC50 values, thereby demonstrating the importance of induced polarization in computer-aided drug discovery and potentially accelerating the identification of new therapeutic candidates.
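For readers unfamiliar with the shape descriptors mentioned above, the following is a minimal sketch of how inertia tensor eigenvalues can be obtained from atomic masses and coordinates. It is an illustration only, not the authors' implementation: the function names `inertia_tensor` and `principal_moments` and the two-atom test geometry are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def inertia_tensor(masses, coords):
    """Return the 3x3 inertia tensor about the center of mass.

    masses: sequence of N atomic masses (hypothetical input format).
    coords: sequence of N (x, y, z) positions.
    """
    m = np.asarray(masses, dtype=float)
    r = np.asarray(coords, dtype=float)
    # Shift coordinates so the center of mass sits at the origin.
    r = r - np.average(r, axis=0, weights=m)
    # Per-atom contribution: |r|^2 * I - r (outer) r, weighted by mass.
    r2 = np.sum(r**2, axis=1)
    per_atom = r2[:, None, None] * np.eye(3) - r[:, :, None] * r[:, None, :]
    return np.einsum("i,ijk->jk", m, per_atom)


def principal_moments(tensor):
    """Eigenvalues of a symmetric 3x3 tensor, in ascending order."""
    return np.linalg.eigvalsh(tensor)


# Hypothetical linear two-atom geometry along the x axis (unit masses).
moments = principal_moments(
    inertia_tensor([1.0, 1.0], [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
)
```

The same `principal_moments` helper applies to any symmetric 3x3 tensor, so it could equally be used on a computed polarizability tensor to obtain the eigenvalues that serve as prediction targets.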
Are the methods well-suited for this research?
Somewhat appropriate
1. Despite establishing a solid computational foundation with DFT calculations using the well-established B3LYP functional, the study is constrained by its choice of the 6-31G* basis set, which, while computationally efficient, may compromise the accuracy of polarizability calculations that would benefit from larger, more diffuse basis sets designed specifically for electronic property predictions.
2. Although the research demonstrates sophisticated machine learning implementation through robust neural network architectures with modern techniques like batch normalization and comprehensive K-fold cross-validation, it paradoxically relies on a relatively limited feature set of only 24-27 descriptors, potentially overlooking critical molecular properties that influence binding affinity and polarizability.
3. While the study employs exemplary validation practices including bootstrapped sampling and multiple model comparisons between neural networks and random forests, it undermines direct model comparison through inconsistent preprocessing: log transformations are applied to random forest inputs while raw IC50 values are used for neural networks. This methodological inconsistency complicates interpretation of relative model performance.
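To illustrate why the preprocessing mismatch in point 3 matters, the sketch below scores one and the same set of predictions on the raw IC50 scale and on the pIC50 scale (pIC50 = 9 - log10 of IC50 in nM). The IC50 values are made up for illustration and are not taken from the preprint; the helper names `r2_score` and `to_pic50` are likewise hypothetical.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def to_pic50(ic50_nm):
    """Convert an IC50 given in nM to pIC50 (negative log10 of molar IC50)."""
    return 9.0 - math.log10(ic50_nm)


# Made-up measured and predicted IC50 values in nM, spanning four decades.
ic50_true = [1.0, 10.0, 100.0, 1000.0, 10000.0]
ic50_pred = [2.0, 5.0, 200.0, 500.0, 9000.0]

# Same predictions, two scoring scales.
r2_raw = r2_score(ic50_true, ic50_pred)
r2_log = r2_score(
    [to_pic50(v) for v in ic50_true],
    [to_pic50(v) for v in ic50_pred],
)
print(f"R2 on raw IC50: {r2_raw:.3f}, R2 on pIC50: {r2_log:.3f}")
```

On the raw scale the score is dominated by the largest IC50 values, whereas the log scale weights each order of magnitude equally, so the two R² values differ even though the predictions are identical. A like-for-like comparison of the random forest and the neural network would therefore need both models trained and scored on the same target scale.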
Are the conclusions supported by the data?
Somewhat supported
1. While the study demonstrates strong predictive performance with R² values of 0.79-0.93 for IC50 prediction and 0.82-0.88 for polarizability prediction, the conclusions lack essential comparative analysis with existing molecular descriptors, alternative computational methods, or established benchmarks, making it impossible to assess whether the proposed approach represents a genuine advancement or merely reproduces existing capabilities.
2. Despite achieving good performance across three target proteins, the conclusions overgeneralize the findings by claiming broad applicability to drug discovery without adequately acknowledging the limited scope of testing only three specific proteins (Thrombin, Estrogen Receptor alpha, and Phosphodiesterase 5A), which undermines the robustness of claims about the universal importance of induced polarization in computer-aided drug discovery.
3. The conclusions present an overly optimistic interpretation by failing to discuss model limitations, potential failure cases, or circumstances where the hybridization-based approach might be inadequate, while also not addressing the mechanistic gap between demonstrating predictive correlation and establishing true causal understanding of induced polarization's role in molecular binding processes.
Are the data presentations, including visualizations, well-suited to represent the data?
Highly appropriate and clear
How clearly do the authors discuss, explain, and interpret their findings and potential next steps for the research?
Somewhat clearly
The authors explain and interpret their findings appropriately, but the discussion is not structured or synthesized enough to give clear guidance on potential next steps for the research.
Is the preprint likely to advance academic knowledge?
Highly likely
It presents a critical topic of study in a timely and relevant manner.
Would it benefit from language editing?
No
Would you recommend this preprint to others?
Yes, it’s of high quality
Is it ready for attention from an editor, publisher or broader audience?
Yes, after minor changes
The preprint needs a synthesis of the analysis and interpretation of findings with the literature, and a more structured presentation of its main recommendations.

Competing interests

The author declares that they have no competing interests.