Virtual screening (VS) has emerged as a powerful method for rapidly screening vast compound libraries and narrowing them to a small pool of candidates that can be investigated in a time- and cost-efficient manner. While great strides have been made both in reducing computational time and in improving the biophysical modeling of ligand binding, the success rate of VS still varies between targets and limitations remain. Traditional VS methods make simplifying assumptions about empirical parameters, and consequently VS may not capture the intricacies of molecular recognition. These scoring-function simplifications lead to inaccuracies in predicting receptor-ligand poses and relative affinities. In this paper, the authors ask whether incorporating experimental electron densities (ED) of ligand-bound structures can correct for these simplifications in the scoring function and thereby improve the enrichment and diversity of compounds in VS. Starting from a pre-existing crystal structure of a receptor, the authors calculate ED intensities at grid points and incorporate those values into a modified scoring scheme, which they call ExptGMS. Incorporating ED of the binding site also introduces ED of solvent molecules and alternative side-chain conformations, providing information about pocket dynamics and accessibility that can be relevant to ligand binding. Such information might implicitly encode the displacement of water molecules, the rearrangement of side chains upon ligand binding, or other features. These aspects may be simplified (rigid structures) or omitted (solvent) in other VS programs.
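To make the grid-based idea concrete, a minimal sketch of such a scheme is shown below. This is purely illustrative: the function `ed_grid_score`, the nearest-grid-point lookup, and the synthetic map are our own assumptions for exposition and do not reproduce the actual ExptGMS functional form.

```python
import numpy as np

def ed_grid_score(ed_map, origin, spacing, atom_coords):
    """Toy grid-based ED score: sum the map value at the grid point
    nearest each ligand atom. A pose that overlaps density peaks
    scores higher. (Illustrative sketch only, not the ExptGMS
    scoring function.)"""
    score = 0.0
    for xyz in atom_coords:
        # Map Cartesian coordinates onto grid indices.
        idx = np.round((np.asarray(xyz) - origin) / spacing).astype(int)
        if np.all(idx >= 0) and np.all(idx < np.asarray(ed_map.shape)):
            score += ed_map[tuple(idx)]
    return score

# Tiny synthetic map: a 5x5x5 grid with a single density peak.
ed_map = np.zeros((5, 5, 5))
ed_map[2, 2, 2] = 1.0
origin, spacing = np.zeros(3), 1.0

on_peak = ed_grid_score(ed_map, origin, spacing, [(2.0, 2.0, 2.0)])
off_peak = ed_grid_score(ed_map, origin, spacing, [(0.0, 0.0, 0.0)])
```

An atom placed on the density peak contributes to the score, while one in empty density does not, capturing the basic intuition behind rewarding poses that occupy experimental density.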
To test the effectiveness of their approach, the authors benchmarked their method against existing ligand- or receptor-based VS software, showing a small improvement in the balance between positive-hit identification and hit diversity. They then optimized their algorithm by applying a low-resolution cutoff to the ED. High-resolution ED produces sharp, high-intensity grid points; because the authors' scoring function favors ligands that occupy ED peaks, this could bias against compounds that miss the exact centers of those peaks. The authors therefore also included low-pass-filtered data (maps calculated using only low-resolution reflections), which yield a smoother distribution of ED intensity that they argue accounts for more conformational variability. However, this must be balanced against using data of too low a resolution, which would lose information about how ligands fit features in the density volume.
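The effect of low-pass filtering on the score can be sketched with a one-dimensional toy model. Here we approximate map smoothing with a real-space Gaussian blur (an assumption for illustration; actual low-pass filtering is done in reciprocal space by truncating reflections), and the function `low_pass_1d` is our own hypothetical helper.

```python
import numpy as np

def low_pass_1d(density, sigma):
    """Smooth a 1-D density profile with a normalized Gaussian kernel,
    mimicking the broadening of peaks in a map rebuilt from only
    low-resolution reflections (real-space approximation)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()  # preserve total density
    return np.convolve(density, kernel, mode="same")

# A sharp density peak at grid point 10: an atom placed 2 grid points
# off-center sees zero density in the sharp map, but substantial
# density after smoothing.
sharp = np.zeros(21)
sharp[10] = 1.0
smooth = low_pass_1d(sharp, sigma=2.0)
```

In the sharp map, `sharp[12]` is zero, so a slightly off-center pose is heavily penalized; in the smoothed map, `smooth[12]` is nonzero, illustrating why lower-resolution maps are more forgiving of small positional errors while over-smoothing eventually washes out the features that discriminate good fits from bad.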
They then tested the method against the SARS-CoV-2 protease 3CLpro, which yielded several compounds with IC50 values in the low micromolar range.
The major strength of this paper is a protocol for incorporating the ED of bound ligands into the scoring function for specific receptor binding sites in VS. Currently, this procedure produces results comparable to existing VS software such as GlideSP, which has no ED term. Many existing docking algorithms work iteratively by gradually introducing more complex terms; they are aided by scoring functions that assign greater weight to molecules that can form empirical interactions with the receptor. In some cases, a ligand is missed either because the correct binding pose cannot be found or because the scoring function penalizes a lack of strong interactions. In these cases, the ED-based term could aid the scoring function. We find the inclusion of solvent ED in mapping out the binding site and aiding ligand placement particularly interesting from the perspective of developing VS docking tools that can take advantage of apo structures without bound ligands, which could expand the use of this method. The major weakness of this paper, as the authors themselves note, is that the method is limited to proteins with ligand-bound electron density, which restricts its utility for receptors with no known ligand-bound ED (although see the point above for a potential expansion of the domain of applicability). Given the current implementation of ExptGMS, we are curious whether the authors tried generating electron-density grids based on the solvent density in the binding site alone. Augmenting VS scoring functions with experimental ED may further improve docking scores by aiding molecule placement using existing binding data for ligands as well as solvent; however, the performance improvements currently offered by this method are modest.
In figure 2, the authors compare how well ExptGMS performs relative to other virtual docking programs by examining the number of top 10, 50, and 100 compounds that are detected and the diversity of those compounds. The 2D plot used does not sufficiently describe how differences between the datasets in the DUD-E database could affect these results. For example, did ExptGMS (and the other programs) perform much better with some datasets than with others? It would be helpful if the authors could show representative graphs from individual datasets to make the differences in performance clearer.
In figure 2, the authors benchmark ExptGMS against several VS software programs that are either ligand- or receptor-based. They note in figure 3 that their method can improve false-negative/positive hit detection. What were the overall false-positive/negative rates across the methods used? Do the authors see a relationship between the number of top 10, 50, and 100 compounds and the false-negative/positive rate? Does this depend on the choice of software (e.g., ligand- vs. receptor-based VS)? It would be helpful if the authors could further elaborate on the false-positive and false-negative hit-detection rates of their method versus the existing VS methods.
We feel the 3CLpro in vitro assay description is unclear as written. Is this a peptide-displacement assay? Further, what were the peptide construct and the choice of fluorophores? A fuller explanation of how the in vitro assay was constructed and performed would be helpful.
In Figure 5, the authors describe one possibility for why ExptGMS runs at different resolutions complement each other, based on solvent exposure in the pocket. How were the t-test results derived from the data shown in the box plots? We find it difficult to judge whether the red and blue boxes at each resolution differ in a statistically significant way.
In Table 1, the authors use a machine-learning model to demonstrate the usefulness of multi-resolution analysis. The table appears to show an advantage for the combination of GlideSP and multi-resolution ExptGMS. However, we are concerned about multiple-hypothesis testing; this concern also applies to Figure 5. The authors tested an increasing number of multi-resolution variants without presenting a principled statistical test. It would help us reach a deeper understanding of the multi-resolution analysis if the authors could provide their rationale and the corresponding tests.
Supplementary materials are mentioned but are not available under the bioRxiv posting.
Reviewed by CJ San Felipe, Hiroki Yamamura, and James Fraser (UCSF)
The authors declare that they have no competing interests.