PREreview of Heterogeneous folding landscapes and predetermined breaking points within a protein family

by Christian Macdonald and CJ San Felipe

Published: May 27, 2024
DOI: 10.5281/zenodo.11356867
License: CC BY 4.0

In this manuscript, the author conducts a comparative computational analysis of protein folding across a paralogous family in a single organism, the 16 small GTPases in Saccharomyces cerevisiae. The author presents a broad variety of interesting measurements and comparisons which are neatly integrated through a model where unfolding pathways are linked with degron display.

The major strength of this paper is in showing how homologous proteins with a shared fold can differ in their folding/unfolding pathways despite structural similarity. The author presents a few interesting hypotheses for how changes of the unfolding pathway might be linked to protein homeostasis. This leaves open several interesting future questions such as how sequence space tunes the folding pathways of proteins.

The measurements and comparisons are in general appropriate and convincing, with an important exception (major point 1). The major weakness of this paper is that many of the explanations and figures are unclear or confusing, which is made worse by the overall terseness of the text (major point 2). We found many portions of this paper difficult to follow, and the objectives of several sections and figures difficult to understand, which significantly hinders their impact. We suggest some possible ways to improve the clarity below.

We also feel that the distinction between structure and sequence-based analyses could be more usefully motivated, and the distinction between the two could be made clearer (major point 3). As the author states, “many open questions regarding protein dynamics cannot be systematically answered from the analysis of static protein sequences and structures alone,” and we also believe integrative analysis will be essential to answer the open questions of protein biology. This is an interesting broader point to this paper, but as written it is understated.

Overall, this is an interesting comparative study that is both a good example of the comprehensive examination of a single family and of interest to protein biologists with an evolutionary or comparative bent. Although in its current form some of that interest takes effort to extract, we believe that the clarity and argument can be improved fairly easily.

Reviewed by Christian B. Macdonald and CJ San Felipe

Major points

The author suggests a model where the order of unfolding is anticorrelated with the presence of degrons. This is an intriguing possibility, but we find the comparisons here insufficient to conclude this. Figure 5A convincingly shows a depletion of C-terminal degrons in C-terminal unfolders, and 5B is suggestive of this, although primarily through a weaker chaperone-mediated effect. These are not quite the appropriate comparisons, in our opinion. A comparison of C- vs N-terminal degron enrichment within each individual protein is the more direct measurement. For example, from the data in 5A, one could not rule out the case where the total degron content (rather than specifically the C-terminal degron content) is altered by the unfolding pathway.
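To make the suggested within-protein comparison concrete, here is a minimal sketch of what we have in mind. The field names (`n_term`, `c_term`, `unfolds_first`), the protein labels, and all numeric values are hypothetical placeholders chosen for illustration; none of them are taken from the manuscript, and the author's degron scoring may need a different normalization.

```python
from statistics import mean

def paired_bias_by_class(records):
    """Mean (C-terminal minus N-terminal) degron score per unfolding class.

    Taking the difference within each protein removes that protein's
    overall degron content from the comparison.
    """
    groups = {}
    for rec in records:
        diff = rec["c_term"] - rec["n_term"]
        groups.setdefault(rec["unfolds_first"], []).append(diff)
    return {cls: mean(diffs) for cls, diffs in groups.items()}

# Placeholder values for illustration only; NOT data from the manuscript.
# Total degron score (n_term + c_term) is held at 1.0 for every protein,
# so any difference between classes cannot come from total content.
toy_records = [
    {"protein": "P1", "n_term": 0.5, "c_term": 0.5, "unfolds_first": "N"},
    {"protein": "P2", "n_term": 0.4, "c_term": 0.6, "unfolds_first": "N"},
    {"protein": "P3", "n_term": 0.7, "c_term": 0.3, "unfolds_first": "C"},
    {"protein": "P4", "n_term": 0.8, "c_term": 0.2, "unfolds_first": "C"},
]

summary = paired_bias_by_class(toy_records)
for cls, bias in sorted(summary.items()):
    print(f"{cls}-terminal unfolders: mean C-minus-N degron score = {bias:+.2f}")
```

A clearly negative mean difference for C-terminal unfolders, together with a difference near zero or positive for N-terminal unfolders, would support the specific claim about C-terminal depletion more directly than the between-protein comparison in 5A.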

The manuscript suffers from a general terseness and complexity which cumulatively becomes a major issue with comprehension. The figures are too crowded, and within individual plots there are often multiple comparisons being made, and identifying the relevant ones requires close reading. While we find the arguments, once identified, reasonable, they could be made much simpler to grasp. We have identified many specific issues in the minor points below, but in general suggest simplification of the main figures, use of SI figures to contain secondary analyses, and making individual plots simpler to interpret.

The author presents a reasonable hypothesis that differences in simulated unfolding pathways are correlated with degron depletion; however, we believe the interplay between protein sequence and unfolding pathway is not completely clear from this work. This is presented as a motivation, but it is fairly understated, and foregrounding it might help. Also, the various metrics include sequence-based, structure-based, and mixed ones, and it is rarely clear whether a particular point is being made based on sequence or structure or both. Some framing could make this clearer and also indicate the utility of particular types of information for understanding folding. For example, in Figure 3C, the author classifies 5 major unfolding pathways, but does not dive deeper into exploring potential reasons for the distinctions. Are homologs with higher sequence similarity more likely to share unfolding pathways? Some of these homologs (YHR022C, YCR027C, and YOR089C) appear to show a 50/50 split between two unfolding pathways. Are there subregions within these proteins that are more closely related to homologs that unfold through one pathway predominantly?

Minor points

The figures, although attractive, are too crowded, which makes individual plots too small to grasp specific details.

The overall discussion of the results is overly terse with respect to methods. For example, lines 117-119: “I first determined the folding temperatures Tf of the proteins….” Learning how the folding temperature was determined requires reading the methods. The clarity would be substantially improved by including at least cursory statements of experimental approaches in the text.

Some quantities could be defined more thoroughly at first use, for example “frustration” at line 140. Although this is a somewhat commonly used term, including a brief definition of how it is measured and an expanded qualitative description would greatly improve the readability.

Similarly, the precise sense of “breakpoint” could be defined at line 176.

Proteins are given Y names in the figures, but sometimes referred to as either a Y name or a gene name in the text (SRP102, ARF1, etc). Indicating the gene names in the figures for ones that are specifically discussed might be helpful.

Lines 198-211: the use of structural region (e.g., a3 or a4/b5) to indicate both an overall unfolding pathway class (i.e., tertiary structure) and the folding of a specific region (i.e., secondary structure) is confusing. Perhaps using different cases for each (A3 vs a3) or Greek (α3 vs a3) could make this distinction clearer.

Figure 1C: the correlation and slopes should be quantified and displayed.

Figure 1C: the increased correlation of the aligned regions appears to mainly be driven by the decreased variance in contact numbers for that region, which would be expected from a more structurally conserved region.
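As a sketch of the kind of summary we are requesting for Figure 1C, the function below reports the least-squares slope, the Pearson correlation, and the standard deviation of each variable. Reporting the spreads next to r would let readers judge how far a change in correlation tracks a change in variance. The function name and the generic `xs`/`ys` inputs are our own placeholders, not the author's variables.

```python
from math import sqrt

def fit_summary(xs, ys):
    """Least-squares slope/intercept, Pearson r, and sample SDs of x and y."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables has zero variance")
    slope = sxy / sxx
    return {
        "slope": slope,
        "intercept": my - slope * mx,
        "r": sxy / sqrt(sxx * syy),
        "sd_x": sqrt(sxx / (n - 1)),
        "sd_y": sqrt(syy / (n - 1)),
    }

# Synthetic points lying exactly on y = 2x + 1, used only to check the arithmetic.
demo = fit_summary([1, 2, 3, 4], [3, 5, 7, 9])
print(demo)
```

Running the same summary separately on the full-length and aligned-region data would place the slopes, correlations, and spreads side by side.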

Figure 2A, D: although the background is shaded by secondary structure, a cartoon plot above would probably get this sense across better.

Figure 2G: the author states that the contact pattern divides the structure into two regions. Could these be indicated in this plot? A scale should also be included.

Figure 2H: the correlation and slopes should be quantified and displayed.

Figure 3A: the frustration vs. alignment score graph is unclear. What do the black and green lines correspond to? What do the vertical lines in the background correspond to?

Figure 3C: it is unclear how robust this classification is. A plot of the fraction of simulations unfolding via major vs. secondary classes, perhaps as SI, would be useful.

Figure 3D: the description of the Qf curves is confusing. Do these Qf curves represent the overall unfolding trajectories for a single protein, or are they an average over all homologs? Do the vertical lines correspond to the time that specific clusters break for different homologs?

Figure 3F: this is a very busy plot, and the important comparisons are not immediately apparent. Including some significance indicators for within-class and across-class AUCs would help, as well as providing a title for the class (alpha vs beta) above each plot.

Figure 3G: the author uses the contact cluster matrix to derive two distinct regions (N and C-terminus) that are more likely to unfold. We felt that the explanation of how this was derived from the matrix was unclear and could not see how the two distinct regions arise from the matrix.

Figure 4D: is the unit of unfolding intermediate time? This is not clear from the text. If so, perhaps a better comparison here would be the percentage of total unfolding time spent in the intermediate state rather than total time.
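The normalization we suggest for Figure 4D is simple arithmetic, sketched here under the assumption that both times are measured in the same simulation units. The function and argument names are hypothetical.

```python
def intermediate_percentage(intermediate_time, total_unfolding_time):
    """Percentage of the total unfolding time spent in the intermediate state."""
    if total_unfolding_time <= 0:
        raise ValueError("total unfolding time must be positive")
    if not 0 <= intermediate_time <= total_unfolding_time:
        raise ValueError("intermediate time must lie between 0 and the total")
    return 100.0 * intermediate_time / total_unfolding_time

# Two made-up trajectories with the same absolute intermediate time
# but different total unfolding times.
print(intermediate_percentage(25, 100))
print(intermediate_percentage(25, 50))
```

Expressed this way, proteins with long absolute intermediate times that simply unfold slowly overall would no longer stand out.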

Figure 4F: the bottom two plots share an axis (length termini), but one is X and the other Y. Plotting both on Y would probably make more sense. Also, the correlation and slope should be quantified and displayed, and the similarity of the correlations quantified somehow.

We found the use of degron “depletion” to be unclear. Is the author referring to the presence of degron signals at the termini or the exposure of the degron?

Figure 5A-B: does the region of each protein analyzed for degrons include the N- and C-termini, or just the aligned regions? The cartoon suggests the latter, but this should be explicitly stated.

Figure 5B, line 269: although the author mentions it having a weaker correlation in the text, a plot of N-terminal degron score vs. % N-unfolding (similar to 5A) should be included, at least in an SI.
