PREreview of When can AlphaFold predict the oligomeric states of proteins?

Published
DOI
10.5281/zenodo.20126973
License
CC BY 4.0

This preprint was reviewed as part of the STB 612 course in the Department of Structural Biology at the University at Buffalo, led by Dr. Alex Vecchio and students on the PhD track.

Overview

In this study, the authors show that ipTM is the best of several metrics for predicting the oligomerization states of proteins with AlphaFold2 Multimer. Using an ipTM cutoff of 0.32, proteins that score below the cutoff for all multimeric states are strongly predicted to be monomers, while those that score above it are predicted to adopt the oligomeric state with the highest score. They further show that this predictive power is greatest for proteins with structures, or structures of homologous proteins, in the AlphaFold database, while predictions for those without similar proteins in the database are unreliable. Alongside predicting oligomeric states, the authors sought to reduce the computational cost of running local structure predictions by determining the number of recycles beyond which predicted confidence shows little to no further increase. The authors also compared AlphaFold2 Multimer with AlphaFold3 and found little variation between the two in predicted oligomers. Lastly, they applied a similar approach to a selected group of membrane proteins of particular interest to them. Among these, they found that some families of membrane proteins tended to have higher ipTM scores, which they tied to prediction confidence.
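The decision rule summarized above can be sketched in a few lines of Python. This is only our reading of the rule, not the authors' code: the function name, the input layout, the use of the mean ipTM per state, and the treatment of a score exactly at the cutoff are all assumptions made for illustration, and the scores in the usage lines are invented.

```python
from statistics import mean

# Cutoff reported in the preprint for AlphaFold2 Multimer.
IPTM_CUTOFF = 0.32


def predict_oligomeric_state(iptm_by_state, cutoff=IPTM_CUTOFF):
    """Return the predicted number of subunits (1 means monomer).

    iptm_by_state maps a subunit count (2, 3, ...) to the ipTM scores of
    the models predicted for that state. Summarizing each state by its
    mean score is an assumption of this sketch.
    """
    mean_scores = {n: mean(scores) for n, scores in iptm_by_state.items()}
    best_state = max(mean_scores, key=mean_scores.get)
    # If even the best multimeric state falls below the cutoff, call a monomer.
    # Treating a score equal to the cutoff as passing is also an assumption.
    if mean_scores[best_state] < cutoff:
        return 1
    return best_state


# Hypothetical scores, not taken from the preprint.
print(predict_oligomeric_state({2: [0.20, 0.25], 3: [0.61, 0.58], 4: [0.40, 0.45]}))  # 3
print(predict_oligomeric_state({2: [0.10, 0.15], 3: [0.20, 0.30]}))  # 1
```

Written this way, the rule makes clear that the call depends only on the single highest-scoring state, which is why overlapping score distributions between states matter for the confidence of the assignment.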

The ability to predict the oligomerization states of proteins in silico is important for understanding their function and developing drugs to treat disease. The visual representations of the data are appealing, and the chosen color schemes are contrasting and easily interpretable. However, the novelty and significance of the work are difficult to assess from the background given, which limits its impact; in its current state, it appears better suited as a methods or application paper. In addition, the use of the ipTM score is not strongly justified, and the cutoff diverges from the field standard. Based on these points, we offer the following suggestions to strengthen the presented work:

Major Comments

  1. The introduction needs additional background and references on the current state of the field so that the novelty of this work can be accurately assessed. For example, seminal papers such as the most recent paper detailing AlphaFold3 should be cited.

  2. The authors use an ipTM cutoff of 0.32 to predict oligomerization states, though the standard in the field is 0.8 to confirm oligomerization states. Are the authors proposing to change the standard of the field? What does an ipTM score of 0.32 mean in the context of the actual interface? Would an ipTM score of 0.32 accurately depict an experimentally determined interface?

  3. Further, the statistical significance of the ipTM scores based on the plots in Figure 1D is not convincing, with several proteins, such as Q9P0L9, appearing to have overlapping scores for different oligomerization states. The average for Q9GZP0 in the dimer state appears to be below the cutoff, but it is not listed as a monomer, which calls the cutoff of 0.32 into question.

  4. Notably, ipTM is a score that measures the predicted confidence of protein-protein interactions. More justification is needed as to how it can be applied to monomeric proteins. As used here, ipTM appears to assess AlphaFold’s ability to predict protein complexes rather than oligomerization state itself. Perhaps another score should be considered, for example pTM. In Figure 1B, the ipTM cutoff is applied to monomers, but monomers are unable to oligomerize. This should be addressed by the authors.

  5. For the variance in ipTM scores between predicted oligomers (Figures 1D, 6A, and the like), a statistical model should be used to validate that the differences seen between states are significant. This is especially important for proteins where the variance between ipTM scores for 2-, 3-, 4-, 5-, 6-, and 7-mers is quite low (Figure 6A, Q96RD6-HsPANX2). The authors should address how the relative values within predictions of a given protein might affect the confidence of the oligomeric prediction.

  6. The authors selected a list of 40 proteins at random in an attempt to sample broadly and demonstrate general applicability. However, the selection does not seem random, as many of these proteins are homologs with similar structures and the same oligomerization state (e.g. P02743 and P02741). Additionally, the selected proteins are primarily alpha-helical. The authors should justify why these proteins were chosen or how their dataset was curated, and if they wish to show broad applicability, they should consider using a more diverse set of proteins.

  7. For the selected proteins, the authors claim there is a known oligomerization state. This does not necessarily demonstrate the biological relevance of these states, as they were determined by crystallography. This should be clearly stated, as crystallization requires protein-protein interactions that are often not present under physiological conditions. If solution-based oligomerization states are known under physiological conditions, they should be highlighted.

  8. The authors may consider changing the title, as it poses a question which is not explicitly answered.
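
As one illustration of the kind of statistical check requested in Major Comment 5, a permutation test can ask whether the ipTM scores of a single protein differ between oligomeric states more than chance would explain. The sketch below is only one possible approach, not a method from the preprint: the function names, the choice of the between-group sum of squares as the test statistic, and the example scores are all assumptions.

```python
import random
from statistics import mean


def between_group_ss(groups):
    """Between-group sum of squares: spread of the group means around the grand mean."""
    grand = mean([v for g in groups for v in g])
    return sum(len(g) * (mean(g) - grand) ** 2 for g in groups)


def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Estimate a p-value for 'all states share one ipTM distribution'.

    groups holds one list of ipTM scores per oligomeric state. State labels
    are shuffled n_permutations times, and the p-value is the fraction of
    shuffles whose statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = between_group_ss(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance guards against floating-point ties.
        if between_group_ss(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical ipTM scores for dimer, trimer, and tetramer predictions.
scores = [
    [0.20, 0.22, 0.21, 0.19, 0.23],
    [0.60, 0.62, 0.61, 0.59, 0.63],
    [0.40, 0.42, 0.41, 0.39, 0.43],
]
print(permutation_p_value(scores))
```

A small p-value here would only indicate that the states differ somewhere; a follow-up pairwise comparison between the top two states would be needed to support assigning one of them over the other.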

Minor Comments

  1. Some of the box plots of ipTM scores have missing values (e.g. P53396, P35670). Please correct this.

  2. It might be useful to predict beyond 5-mers for Figure 1. For example, in the bottom row of Figure 1D, many of the predicted ipTM scores increase with oligomer size. Would this also be the case for 6-, 7-, or 8-mers? Perhaps the scores would peak at 5, which would further validate the metric. For broad applicability, stopping at pentamers is understandable. However, for proteins where ipTM appears to increase monotonically with oligomerization state (Q9NXV2, P02741, Q86SE8), showing higher-order oligomers would demonstrate that the score does not increase past the real oligomerization state.

  3. The full set of ipTM scores in Figure 1D and the protein images in Figure 2A, while helpful for visualization, take up a large amount of space and could be moved to the supplementary figures, similar to Figures S4 and S5. Additionally, the large number of structures is somewhat overwhelming and distracting. Reducing the number of structures shown, and shifting some to the supplement, would alleviate this and enhance readability.

  4. Figure 2E. The X and Y axes plot similar quantities (TM scores) but are scaled differently. Making the scales equivalent would enhance readability; as it stands, this figure is confusing and difficult to interpret.

  5. Figure 3A/B. A statistical model should be implemented here to determine the significance in the change of delta mean ipTM. This would also provide a more definitive “end” to the number of recycles needed before no statistical change is detected.

  6. Figure 4B/C. Why is only the bottom 30% being used here? Would examining the top 30% instead provide more insight because those predictions are expected to be of higher confidence? Additionally, all plotted curves between 0-90 follow a near identical pattern. Presumably the curves comparing >0 and >90 should be different. This is not well explained and should be clarified.

  7. In Figure 5G, showing the PDB structures of incorrectly predicted proteins next to the predictions, when possible, may help in evaluating what is going wrong with some predictions and the importance of the monomer prediction.

  8. Figure 6C encompasses only two proteins from each class. The reasoning for choosing these proteins should be elaborated, or more proteins from each class should be included in the predictions.

  9. The statement that monomeric pLDDT can be used to predict whether an oligomeric state is correctly assigned (“In this sample set, if the pLDDT value was > 90 (14 structures), then the correct oligomeric state was assigned. If the pLDDT value was < 50 (1 structure), then an incorrect oligomeric state was predicted.”) appears to contradict prior statements and Supplementary Figure S1, and the reported values lack convincing statistical power.

  10. Method: Recycling. It would be useful to state the computing infrastructure used and specify the computing time to show accessibility of the techniques and show the cost and benefit of increased recycles and structures.

Competing interests

The authors declare that they have no competing interests.

Use of Artificial Intelligence (AI)

The authors declare that they did not use generative AI to come up with new ideas for their review.
