PREreview of Identifying key residues in intrinsically disordered regions of proteins using machine learning

Published
DOI
10.5281/zenodo.7631234
License
CC BY 4.0

This work aims to address the limited success so far in predicting conserved amino acid residues in intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) of proteins. While primary sequence alignment can provide insight into the evolution of structured proteins, little can be extracted about IDP/Rs because of their low sequence similarity and their lack of structure and predicted function.

In this study, the authors attempt to address this knowledge gap by using machine learning to find highly conserved residues in human protein orthologs containing IDP/Rs that give rise to liquid-liquid phase separation (LLPS). The authors applied unsupervised contrastive ML to find the most highly conserved residues, which might indicate critical functional importance.

The authors found that cysteine and tryptophan residues overall were assigned the highest “attention paid” score by the ML algorithm while most other residues received broadly distributed attention scores indicating low importance to IDP/R function. This is consistent with previously reported experimental findings, which report that aromatic residues are critical to LLPS function.

This work is interesting, as there are few predictive tools that can provide insight into IDP/R function from primary sequence analysis. I can see its potential value not only to IDP/R researchers but also to the broader protein design/engineering community.

Major issues

  • The title is vague and would benefit from being more descriptive. I might suggest “Identifying key residues that drive LLPS in….”

  • It is not clear how the sequences were “padded.” Does this bias the model?

  • Concluding paragraph needs work. Reiterate major findings and reframe improvements as future directions. What’s next for the model? Other IDP/R functional predictions?

  • Can the resolution of Figure 1 be improved?

  • The color scheme in Figure 2 makes it difficult to read.

  • Figure 2 is confusing. What are the arrows indicating? This is stated in the legend but is not clear in the figure. The arrows also appear red, although the authors describe them as purple. Do the arrows highlight the group of amino acids with shared physicochemical properties, or are they meant to indicate individual residues? Please clarify.

    Why are there illustrative figures in panel B but not the other panels?

    What are the labels at the bottom of panel D indicating and why aren’t they in all panels?

    Can these questions be addressed simply in the figure legend?

  • Panel E in Figure 2 might work better as a separate figure entirely. Alternatively, Figure S5 could replace it in the main text, with panel E removed.

  • Combine the references in the Supplementary Material with the main text references.

Minor issues

  • Minor spelling errors: in Figure 1, “attension” should be “attention.”

Competing interests

The author declares that they have no competing interests.