PREreview of Deep Boosted Molecular Dynamics (DBMD): Accelerating molecular simulations with Gaussian boost potentials generated using probabilistic Bayesian deep neural network

by Christian Macdonald, James Fraser, Ilayda Alkislar, and Nicholas Freitas

Published: June 30, 2023
DOI: 10.5281/zenodo.8102835
License: CC BY 4.0

This review arose out of a course for graduate students in the life sciences at UCSF, “Peer Review in the Life Sciences,” which aims to introduce junior scientists to peer review in a critical yet constructive way. The students selected preprints to review, led discussions of them, drafted reviews, and revised them based on feedback from peers and instructors.

Summary:

A significant challenge in conventional molecular dynamics (MD) is computational cost, which prevents simulations from reaching the sampling necessary to observe biological processes with long timescales and/or high energy barriers. A variety of accelerated MD techniques exist, which modify the energy landscape to improve sampling. This manuscript describes a new method, Deep Boosted Molecular Dynamics (DBMD), that employs Bayesian neural networks to minimize the anharmonicity of boost potentials and improve the accuracy of reweighted free energies in biomolecular simulations.

Previous work developed Gaussian-accelerated MD (GaMD), which enables more rapid sampling by adding a “boost potential” that helps the simulation avoid being trapped in energy wells; free energies are then recovered by reweighting. GaMD allows accurate recovery of free energies when the boost potentials follow a Gaussian distribution, which can be difficult to ensure. This work extends GaMD by using machine learning to build boost potentials and iteratively reduce their measured divergence from a Gaussian, then applying energetic reweighting as in GaMD to recover free energies. The authors validate DBMD by comparing it with conventional MD with no acceleration, examining the free energy profiles of alanine dipeptide, chignolin, and three tetraloop-containing hairpin RNAs. Ultimately, acceleration methods with greater reweighting accuracy, like DBMD, could have a significant impact on fields of research where MD is used frequently, such as small-molecule drug design, protein design, and the study of protein dynamics. However, we were unsure from the manuscript how this method compares in practice with other methods for generating boost potentials.
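
For orientation, the GaMD scheme summarized above can be sketched in a few lines. This is a minimal illustration based on the published GaMD formulation as we understand it, not code from the manuscript. The boost is harmonic, ΔV = ½k(E − V)² when the potential V lies below a threshold E and zero otherwise, with k = k_0/(Vmax − Vmin). The reweighting factor ln⟨exp(βΔV)⟩ is approximated by a cumulant expansion to second order, which is exact when ΔV is Gaussian. The function names and the synthetic energies are hypothetical.

```python
import numpy as np


def gamd_boost(v, e_threshold, k0, v_min, v_max):
    """Harmonic boost: 0.5*k*(E - V)^2 where V < E, else 0, with k = k0/(Vmax - Vmin)."""
    v = np.asarray(v, dtype=float)
    k = k0 / (v_max - v_min)
    return np.where(v < e_threshold, 0.5 * k * (e_threshold - v) ** 2, 0.0)


def cumulant_log_weight(dv, beta):
    """Second-order cumulant approximation of ln<exp(beta*dV)> over a set of frames."""
    dv = np.asarray(dv, dtype=float)
    return beta * dv.mean() + 0.5 * beta**2 * dv.var()


# Synthetic potential energies (arbitrary values), for illustration only.
rng = np.random.default_rng(0)
v = rng.normal(loc=-50.0, scale=3.0, size=10_000)

# Assumption: threshold set to the maximum observed potential, with k0 = 1.
dv = gamd_boost(v, e_threshold=v.max(), k0=1.0, v_min=v.min(), v_max=v.max())

# Assumption: energies in kcal/mol, so kT is about 0.596 at roughly 300 K.
log_w = cumulant_log_weight(dv, beta=1.0 / 0.596)
```

In an actual free energy calculation the cumulant term would be evaluated separately for the frames in each bin of the reaction coordinate; the sketch applies it to all frames at once only to keep the example short.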

The major success of this paper is the introduction of a method with accuracy comparable to conventional MD and with reduced anharmonicity of the boost potentials required for acceleration with the GaMD method. The major weakness of this paper is the choice of comparisons. The authors compare DBMD to conventional MD in the text and figures. However, despite discussing GaMD as an alternative to their method, the authors do not compare it with DBMD. Because the authors do not benchmark DBMD against the existing accelerated method, GaMD, the practical improvements of DBMD are unclear. Further, the methods are not sufficiently described, making it difficult to follow and appreciate the advancements made to the molecular dynamics method.

Finally, while one of the readers is familiar with machine learning, we are unfamiliar with Bayesian neural networks and their benefits in this context over traditional neural network models, and so cannot comment on their specific implementation in this paper.

Major Points:

Figure 1 is difficult to interpret: This method builds upon the previously established Gaussian-accelerated MD. However, the new aspects of this method are not described clearly. We found Figure 1, which should establish for the reader what is being done in this paper, difficult to understand. It is unclear what function Figure 1c serves.
Utility and implementation of the machine learning model are unclear: The role of the machine learning model, and the justification for using it, are not clear. For example, we were unable to determine why the authors initialize their boost potentials randomly when first training the model. We were also unable to determine why multiple rounds of training were used, with each round’s output serving as the next round’s input. Even after closely reading the rest of the paper, we found it hard to determine what the desired outputs of the machine learning model were. For example, the section titled “Deep Learning of Potential Energies” begins with the sentence, “In DBMD, the probabilistic Bayesian neural network… was applied to minimize the anharmonicity of boost potentials ΔV.” This implied to us that the model would be used to generate boost potentials during the simulation. However, the next section seems to indicate that the model is only used to find values for k_0.
Comparing results to GaMD would demonstrate the value of the new method: In the discussion section, the authors state that they ran the same simulations using their method and the existing GaMD method, but did not share the data. While the data for simulations of alanine dipeptide, chignolin, and the RNA hairpins show improved simulation speed over conventional MD, GaMD would also accelerate systems of this size. As currently written, the paper does not show why a reader should choose this method over GaMD. Since the authors claim DBMD works better than previous methods, showing that DBMD is faster than conventional MD and more accurate than GaMD would convince readers of this.
Model systems may be too small to show DBMD utility: Accelerated MD is most useful for large simulation sizes. This paper used small systems to compare DBMD with conventional MD. We suggest that the authors specify the utility of this method for small proteins and macromolecules, based on their current data.
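
Since several of the points above turn on the anharmonicity of the boost potential, a rough sketch of how that quantity is commonly defined may help. In the GaMD literature, as we understand it, the anharmonicity γ is the difference between the maximum entropy for a given variance (that of a Gaussian, ½ ln(2πeσ²)) and the entropy of the observed ΔV distribution, so γ near zero indicates a near-Gaussian boost. The histogram estimator, the bin count, and the sample data below are our own assumptions, not the manuscript's procedure.

```python
import numpy as np


def anharmonicity(dv, bins=50):
    """Estimate gamma = S_max - S for samples of the boost potential dV."""
    dv = np.asarray(dv, dtype=float)
    # Histogram estimate of the probability density of dV.
    density, edges = np.histogram(dv, bins=bins, density=True)
    widths = np.diff(edges)
    occupied = density > 0  # skip empty bins to avoid log(0)
    # Differential entropy S = -integral of p * ln(p), as a sum over bins.
    entropy = -np.sum(density[occupied] * np.log(density[occupied]) * widths[occupied])
    # Entropy of a Gaussian with the same variance (the maximum possible).
    max_entropy = 0.5 * np.log(2.0 * np.pi * np.e * dv.var())
    return max_entropy - entropy


# Synthetic samples, for illustration only.
rng = np.random.default_rng(0)
gamma_gauss = anharmonicity(rng.normal(10.0, 2.0, size=100_000))
gamma_skewed = anharmonicity(rng.exponential(2.0, size=100_000))
```

Under these assumptions the Gaussian samples should give a value close to zero and the skewed samples a clearly larger one, which is the kind of comparison that would make a DBMD versus GaMD benchmark easy to read.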

Minor Points:

Small grammatical changes:
- The paper has some small grammatical issues that should be fixed before publication. For example, in the second paragraph of the intro: “…is an enhanced sampling that technique works…” and “…neural network to learn a priori CV…”
Duplicate panels in Figure 2:
- The energy graphs for the implicit solvent alanine dipeptide cMD/DBMD simulations in Figure 2 are identical, indicating they have been accidentally duplicated.
Relevance of DBMD parameters was difficult to interpret: For each experiment the authors show the resulting k, Vmin, Vmax and E parameters. Do these parameters have biological relevance, and should they be considered results to be reported in future DBMD works? These values may be clearer to readers if shown as a table or supplementary figure.
Stylistic elements:
- Stylistically, the figures could be improved with the addition of a color-blind friendly palette.
Some supplementary figures could be added to the main paper to add clarity: Figure S3, which shows the comparison of energy profiles in alanine dipeptide in DBMD and conventional MD, was convincing in demonstrating the accuracy of DBMD. This would fit well as a main figure.
Data for three hairpin RNAs can be abbreviated: The experiments using DBMD and conventional MD to find the energy profiles and folding states of the three hairpin RNAs were useful benchmarks for DBMD. However, the figures and text are repetitious. It may be advantageous to present these data as a table or move two of these experiments to the supplement.
