PREreview of Effects of residue substitutions on the cellular abundance of proteins
- License: CC BY 4.0
SIGNIFICANCE:
While most MAVEs measure overall function (which is a complex integration of biochemical properties, including stability), VAMP-seq-type measurements more strongly isolate stability effects in a cellular context. This work seeks to create a simple model for predicting the effect of a mutation on the “abundance” score measured by VAMP-seq.
PUBLIC REVIEW:
Of course, there is always another layer of the onion: VAMP-seq measures contributions from isolated thermodynamic stability, stability conferred by binding partners (small molecules and proteins), the balance between synthesis and degradation (especially important for “degron” motifs), and more. Here, the authors’ goal is to create simple models that can act as a baseline, for two main reasons: 1) to tell when adding more information would help a global model; and 2) to detect when a residue or mutation has an unusual profile, indicative of an unbalanced contribution from one of the factors listed above. Accordingly, the authors state that this preprint is not intended to be a state-of-the-art method for variant effect prediction, but rather a step toward using static structural information to account for VAMP-seq effects. At its core, the method is a fairly traditional asymmetric substitution matrix (I was surprised not to see a comparison to BLOSUM in the manuscript), and the authors show that subdividing it by burial makes the model much more predictive. Despite having only six datasets, they show predictive power even when the matrices are built from fewer of them. Another success is rationalizing the VAMP-seq results in terms of the relevant oligomeric states.
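To make the kind of baseline being discussed concrete, here is a minimal sketch of a burial-stratified, asymmetric substitution matrix predictor. It assumes a hypothetical input of (wild type, mutant, rASA, score) tuples and a single hypothetical rASA cutoff of 0.2 separating buried from exposed residues. It illustrates the general idea only and is not the authors’ implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rASA cutoff separating buried from exposed residues;
# the preprint's own burial classes and thresholds may differ.
RASA_CUTOFF = 0.2


def burial_class(rasa):
    """Map relative accessible surface area to a coarse burial class."""
    return "buried" if rasa < RASA_CUTOFF else "exposed"


def fit_matrices(variants):
    """Average abundance scores per (burial class, wild type, mutant).

    `variants` is an iterable of (wt, mut, rasa, score) tuples. The result
    is asymmetric: the L->P entry is estimated separately from P->L.
    """
    buckets = defaultdict(list)
    for wt, mut, rasa, score in variants:
        buckets[(burial_class(rasa), wt, mut)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def predict(matrices, wt, mut, rasa, default=None):
    """Predict a variant's score by table lookup in its burial class."""
    return matrices.get((burial_class(rasa), wt, mut), default)


if __name__ == "__main__":
    train = [
        ("L", "P", 0.05, 0.25),  # buried L->P
        ("L", "P", 0.10, 0.75),  # buried L->P
        ("L", "P", 0.60, 0.90),  # exposed L->P
        ("P", "L", 0.08, 0.80),  # buried P->L
    ]
    m = fit_matrices(train)
    print(predict(m, "L", "P", 0.02))  # 0.5
    print(predict(m, "L", "P", 0.90))  # 0.9
    print(predict(m, "P", "L", 0.10))  # 0.8
```

The same L->P substitution receives different predictions in buried and exposed environments, and P->L is looked up independently of L->P. In this toy form, those are the two properties (burial stratification and asymmetry) that distinguish such a matrix from a symmetric, environment-free matrix like BLOSUM.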
SPECIFIC FEEDBACK:
Major points:
The authors spend a good amount of space discussing how the six datasets have different distributions of abundance scores. Having developed their model, is there more to say about why? Could something here be leveraged to design maximally informative experiments?
They compare against one more “sophisticated model”, Rosetta ddG, which should correlate more strongly with thermodynamic stability than with the other factors measured by VAMP-seq. However, the direct head-to-head comparison between their matrices and Rosetta ddG is underdeveloped. Could this comparison be used to dissect cases where thermodynamics does not contribute to specific substitution patterns, or to identify residues/regions that one method predicts better than the other? This would naturally dovetail into whether the two carry orthogonal information that could be leveraged to create better predictions.
a. Perhaps beyond the scope of this baseline method, but ThermoMPNN and the work from Gabe Rocklin’s group are other approaches worth considering, as they should correlate specifically with thermodynamic stability.
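As an illustration of what such a head-to-head analysis could include, the sketch below computes three quantities. The first is a Spearman correlation for each predictor. The second is a per-position comparison of rank errors, used to flag residues that one method predicts better. The third is the correlation between the two methods’ residuals, as a rough probe for orthogonal information. The function names, the input layout (parallel lists of positions, observed scores, and predictions), and the `margin` parameter are hypothetical choices, not the authors’ pipeline. The sketch also assumes that Rosetta ddG is positive for destabilizing substitutions.

```python
from collections import defaultdict
from statistics import fmean


def ranks(values):
    """Ranks scaled to [0, 1]; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2
        i = j + 1
    n = len(values)
    return [r / (n - 1) for r in out] if n > 1 else out


def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def head_to_head(positions, observed, matrix_pred, ddg_pred, margin=0.2):
    """Compare two predictors of abundance on a common rank scale."""
    obs = ranks(observed)
    mat = ranks(matrix_pred)
    # Rosetta ddG is positive for destabilizing mutations, so negate it
    # to put it on the same "higher = more abundant" scale.
    ddg = ranks([-x for x in ddg_pred])
    res_mat = [o - p for o, p in zip(obs, mat)]
    res_ddg = [o - p for o, p in zip(obs, ddg)]

    # Per-position mean absolute rank error for each method.
    errors = defaultdict(lambda: ([], []))
    for pos, rm, rd in zip(positions, res_mat, res_ddg):
        errors[pos][0].append(abs(rm))
        errors[pos][1].append(abs(rd))
    better = {}
    for pos, (e_mat, e_ddg) in errors.items():
        gap = fmean(e_ddg) - fmean(e_mat)
        if gap > margin:
            better[pos] = "matrix"
        elif gap < -margin:
            better[pos] = "ddG"

    return {
        # Pearson on ranks is the Spearman correlation.
        "spearman_matrix": pearson(obs, mat),
        "spearman_ddg": pearson(obs, ddg),
        # Weakly correlated residuals hint that the methods err on
        # different variants, i.e. carry partly orthogonal information.
        "residual_corr": pearson(res_mat, res_ddg),
        "better_by_position": better,
    }
```

Positions listed in `better_by_position` under "matrix" would be natural candidates for cases where factors other than thermodynamic stability dominate the abundance signal.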
I find myself drawn to the hints of a larger idea: that outliers from this model can help identify specific aspects of proteostasis. The discussion of S109 is great in this respect, but I can’t help feeling there is more to be mined from Figure S9, or from other analyses of outliers with higher-than-predicted abundance along linear or tertiary motifs.
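A simple starting point for mining such outliers along linear motifs might look like the sketch below. It averages residuals (observed minus predicted abundance) per position, then flags contiguous sequence windows that are consistently more abundant than predicted. The function names, window size, and threshold are all hypothetical. Tertiary motifs would need structural neighborhoods in place of sequence windows.

```python
from collections import defaultdict
from statistics import fmean


def mean_residual_by_position(variants):
    """Average (observed - predicted) abundance over substitutions at each position.

    `variants` is an iterable of (position, observed, predicted) tuples.
    """
    by_pos = defaultdict(list)
    for pos, observed, predicted in variants:
        by_pos[pos].append(observed - predicted)
    return {pos: fmean(vals) for pos, vals in by_pos.items()}


def flag_high_abundance_windows(residuals, window=5, threshold=0.3):
    """Start positions of contiguous sequence windows whose mean residual
    exceeds `threshold`, i.e. stretches more abundant than predicted.

    Windows that contain a position missing from `residuals` are skipped.
    """
    flagged = []
    for start in sorted(residuals):
        span = range(start, start + window)
        if all(p in residuals for p in span):
            if fmean([residuals[p] for p in span]) > threshold:
                flagged.append(start)
    return flagged
```

Overlapping flagged windows (for example, starts 2, 3 and 4) would then be merged into a single candidate motif before manual inspection.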
Minor points:
Why is a continuous measure of contact number (WCN) used here instead of a discrete count of neighboring residues? WCN values of residues in the core can be affected by distant residues: each contributes only a small amount, but not strictly zero, and if there are many of them the contributions add up.
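The concern can be illustrated with a toy calculation. It assumes the commonly used inverse-square form of WCN (the sum of 1/d² over all other residues) and a hypothetical 10 Å cutoff for the discrete count. The preprint’s exact WCN definition and choice of coordinates (e.g. Cα versus side-chain centroids) may differ.

```python
import math


def wcn(coords, i):
    """Weighted contact number of residue i: sum of 1/d^2 over all others."""
    return sum(
        1.0 / math.dist(coords[i], c) ** 2
        for j, c in enumerate(coords)
        if j != i
    )


def contact_count(coords, i, cutoff=10.0):
    """Discrete contact number: other residues within `cutoff` angstroms."""
    return sum(
        1
        for j, c in enumerate(coords)
        if j != i and math.dist(coords[i], c) < cutoff
    )


if __name__ == "__main__":
    # A residue at the origin with four neighbours at 5 angstroms.
    core = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (-5.0, 0.0, 0.0),
            (0.0, 5.0, 0.0), (0.0, -5.0, 0.0)]
    # Fifty hypothetical residues, all 30 angstroms away.
    far = [(30.0 * math.cos(k), 30.0 * math.sin(k), 0.0) for k in range(50)]
    print(contact_count(core, 0), contact_count(core + far, 0))  # 4 4
    print(round(wcn(core, 0), 3), round(wcn(core + far, 0), 3))  # 0.16 0.216
```

In this toy case, adding fifty residues 30 Å away leaves the discrete count unchanged but raises WCN by roughly a third.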
Typos in the SI figure captions, e.g. Figs. S8–S11: “All predictions were performed using using….”
We would appreciate a formal definition of the new substitution matrices conditioned on rASA/WCN values. This was unclear to us until we read the code, but we think the definition averages the substitution scores within the burial classes to which residues are assigned. If so, this could be stated straightforwardly in the Methods section using a Heaviside step function.
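For example, under our reading of the code (scores averaged within burial classes), the definition could be written as below. The notation is ours, not the authors’. D is the training set of variants v, each with a wild-type residue wt(v), a mutant residue mut(v), a burial value r_v (rASA or WCN), and an abundance score s_v. θ_0 < θ_1 < … < θ_K are the class boundaries, with θ_K chosen above the largest observed value. H is the Heaviside step function with the convention H(0) = 1, so each class is the half-open interval [θ_{k-1}, θ_k).

```latex
w_k(r) = H(r - \theta_{k-1})\,\bigl[1 - H(r - \theta_k)\bigr], \qquad k = 1, \dots, K

M_k(a \to b) = \frac{\sum_{v \in \mathcal{D}} \delta_{a,\,\mathrm{wt}(v)}\,\delta_{b,\,\mathrm{mut}(v)}\, w_k(r_v)\, s_v}
                    {\sum_{v \in \mathcal{D}} \delta_{a,\,\mathrm{wt}(v)}\,\delta_{b,\,\mathrm{mut}(v)}\, w_k(r_v)}

\hat{s}_v = \sum_{k=1}^{K} w_k(r_v)\, M_k\bigl(\mathrm{wt}(v) \to \mathrm{mut}(v)\bigr)
```

Exactly one w_k(r_v) equals 1 for each variant, so the prediction reduces to a lookup in the matrix of that variant’s burial class. The asymmetry of M_k follows directly from conditioning on the ordered pair (a, b).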
Competing interests
The authors declare that they have no competing interests.