PREreview of Defining amino acid pairs as structural units suggests mutation sensitivity to adjacent residues

Published: August 10, 2023
DOI: 10.5281/zenodo.8237332
License: CC BY 4.0

Note: We reviewed an updated version of this preprint for a journal. We are posting our comments alongside the preprint in the hope that the authors will post the updated version we reviewed, since the journal does not restrict updating preprints during peer review.

Ashraya Ravikumar and James Fraser

Summary:

The traditional Ramachandran plot uses the ϕ and ψ torsion angles, about the N-C𝛼 and C𝛼-C bonds respectively, to represent aspects of the three-dimensional protein backbone structure in two dimensions. Some of the atoms involved in calculating the ϕ and ψ torsion angles of a residue come from the adjacent residues in the protein chain. In this work, the authors treat the ψ angle of residue i and the ϕ angle of residue i+1 as a single entity and analyze the distribution of these amino acid pairs. Their approach has two advantages: the torsion angle pair is fully contained within an amino acid pair, and the pairs are easy to represent in a familiar Ramachandran-like plot. The authors show that their cross peptide bond plot covers more area than the traditional ϕ,ψ plot and identifies certain structural elements that appear as “recurring outliers” in the traditional plot. They also show some differences in conformational preference between thermophilic and mesophilic proteins. There is an initial attempt at experimental validation, with small stability changes (measured by melting temperature) upon point mutations to amino acids more favored for that specific region of the cross plot; however, this validation is limited and would benefit from examples intended to be neutral and destabilizing. The major strength of the paper is a new concept that is simple yet powerful for identifying regions of conformational space that should be considered “valid” rather than outliers. In doing so, the method opens up considerable scope for future work and new ways of validating protein structures, refinement procedures, and structure predictions. The major recurring issue in the manuscript, however, is a lack of clarity and attention to detail, which can be improved in a future (third?) iteration of the manuscript. The major and minor points of concern are expanded below:
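To make the definition above concrete, here is a minimal sketch of how a cross peptide bond pair (ψ of residue i, ϕ of residue i+1) could be computed from backbone coordinates. It is not the authors' code. The function names `dihedral` and `cross_peptide_pairs` and the dictionary layout of each residue are hypothetical, and the sketch assumes the residues are consecutive in a single chain with no breaks.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def cross_peptide_pairs(residues):
    """Return one (psi_i, phi_i+1) tuple per consecutive residue pair.

    `residues` is a list of dicts with keys "N", "CA", "C" holding
    (x, y, z) coordinates. This layout is an assumption of the sketch.
    """
    pairs = []
    for cur, nxt in zip(residues, residues[1:]):
        # psi(i): N(i), CA(i), C(i), N(i+1)
        psi = dihedral(cur["N"], cur["CA"], cur["C"], nxt["N"])
        # phi(i+1): C(i), N(i+1), CA(i+1), C(i+1)
        phi = dihedral(cur["C"], nxt["N"], nxt["CA"], nxt["C"])
        pairs.append((psi, phi))
    return pairs
```

Every atom used for one pair comes from residues i and i+1. By contrast, the traditional ϕ,ψ pair of a single residue needs the C atom of the preceding residue and the N atom of the following one, so it spans three residues.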

Major points:

1. In Figure 1, the authors claim that “Standard secondary structures (such as left and right 𝛼-helices, 𝛽-strands and turns) are clearly recognizable” from the gray stick plots. While the 𝛼-helices are recognizable especially in clusters 12 and 19, it is not possible to identify 𝛽-strands and turns from these images. More amino acids may have to be added to the visualization on either side of the pair to make this clear.

2. Their analysis of the correlation of cross bond angles using MD simulations needs more detail and discussion. The text points to Figure 2 and Table S1 to show the correlation, but Figure 2 simply shows the five simulated structures along with the amino acid pairs and the cross peptide bond plot. It is unclear how to interpret the correlation between ϕ(k) and ψ(k+1) from this figure. Also, their choice of proteins for this analysis seems arbitrary. What do they mean by a “small” protein? What dataset of structures did they start with before randomly picking these five?

3. On what basis do the authors claim that clusters 9 and 10 shown in Figure 4 represent a transition into a helix? The ϕ(k+1),ψ(k+1) distribution is not indicative of being part of a helix.

4. The authors refer to Figure 5 in their discussion of cluster 15 representing the type II 𝛽 turn. The representatives of the cluster do appear in different structural contexts, but are they all indeed 𝛽 turns? Do they satisfy the other criteria for a 𝛽 turn: either the distance between the C𝛼 atoms of residues i and i+3, or a hydrogen bond between the carbonyl oxygen of residue i and the amide hydrogen of residue i+3? This information is needed to support the statement that type II 𝛽 turns are also common in random coil regions. Also, the Figure 5 caption says the representatives are from cluster 6, but we assume the authors mean cluster 15.

5. The authors have not stated which color scale is used to color the nodes in Figure 6. We assume that a warmer color means a higher probability of the cluster being occupied by thermophiles. The clusters that are part of the most prominent transitions are very close to each other. Many of the amino acid pairs from thermophiles that are classified into cluster 12 might fall into cluster 19 with a minor change in the ψ or ϕ angle. Are the AlphaFold-predicted structures accurate enough to distinguish between such close ψ,ϕ angles? These observations could also result from inherent biases in AlphaFold. Perhaps the authors could also analyze experimental structures of mesophiles and thermophiles to see if these trends hold.

6. The statement “Visual inspection of the most prominent cases suggests that the preferred clusters in thermophiles, presumably the more thermostable ones, are those which appear more ordered” is not well supported. If this statement is based purely on the cartoon representations of the two cluster transitions shown as insets in Figure 6, it is not convincing, especially for the 1 to 11 transition. Perhaps the authors could analyze the extent of disorder more systematically by comparing the preferred clusters of thermophiles and mesophiles and quantifying the difference in disorder, if any. The authors could also show specific examples from the transition matrix along with pictorial representations of the differences in angles/planes so that the reader can understand them better.

7. In the context-specific mutation analysis, the observed differences in ΔTm are very small and the raw DSF data are not shown in the supplemental figure. Are these changes significant enough to draw conclusions about the effect of these context-specific mutations on protein stability? Additional experiments are likely needed to place these ΔTm values in context. What is the typical ΔTm for a predicted neutral or deleterious mutation? Can anything be inferred from prior deep mutational scans of GFP or another protein to give this analysis more power?

8. Over the years, Ramachandran angle restraints have become part of structure refinement protocols which, when applied inappropriately, can lead to over-optimization of ϕ-ψ angles. In such cases, a simple Ramachandran validation will fail to identify issues in the structure, and a more global approach is needed, such as the Ramachandran Z-score (https://www.sciencedirect.c .... Given how the authors’ approach is designed, it could have potential for a similar application. They have shown in Figure 3 how outliers of the Ramachandran plot fall into acceptable regions in their cross-peptide bond plot. Along these lines, the authors should discuss global validation metrics derivable from their method (perhaps related to the normalized marginal distribution distance shown in Figure 7).
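As an illustration of the geometric check requested in major point 4, the following minimal sketch tests whether residues i to i+3 satisfy either 𝛽-turn criterion. The function name `is_beta_turn_candidate` is hypothetical. The 7 Å C𝛼(i) to C𝛼(i+3) cutoff is the commonly used value, and the 3.5 Å O(i) to N(i+3) heavy-atom distance is an assumed proxy for the hydrogen bond, since hydrogen positions are often absent from crystal structures. Both thresholds are assumptions of this sketch, not values taken from the manuscript.

```python
import math


def is_beta_turn_candidate(ca_i, ca_i3, o_i=None, n_i3=None,
                           ca_cutoff=7.0, hbond_cutoff=3.5):
    """Return True if either beta-turn criterion is met.

    ca_i, ca_i3: (x, y, z) of the C-alpha atoms of residues i and i+3.
    o_i, n_i3:   optional (x, y, z) of the carbonyl O of residue i and
                 the amide N of residue i+3 (heavy-atom H-bond proxy).
    Cutoffs are in angstroms and are assumptions of this sketch.
    """
    # Criterion 1: C-alpha(i) to C-alpha(i+3) distance below the cutoff.
    if math.dist(ca_i, ca_i3) < ca_cutoff:
        return True
    # Criterion 2: O(i)...N(i+3) distance consistent with a hydrogen bond.
    if o_i is not None and n_i3 is not None:
        return math.dist(o_i, n_i3) <= hbond_cutoff
    return False
```

Applying a check like this to every representative of cluster 15 would give the fraction that are 𝛽 turns by these criteria. A stricter definition would also exclude residues that are part of a helix, which this sketch does not do.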

Minor points:

1. In the introduction, the authors mention the disadvantage of the (ϕ,ψ)² method. But apart from mentioning the protein blocks (PB) method, they do not explain how their approach improves on it. It is important to place this work in the context of PBs, since PBs have been used for several structure-related applications. Another related work is TERMs (https://www.sciencedirect.c ... where the protein structure is broken down into smaller structural entities which are then used to assess model quality, sequence/structure compatibility, conformational transitions, etc. The authors could discuss the relationship between their work and TERMs.

2. The authors should add labels to panels within figures instead of addressing them as top, middle, etc.

3. What is the correct resolution cutoff used to extract structures from the PDB? The Results and Discussion section says 1.8 Å but the Methods section says 1.5 Å. Also, the authors need to provide the list of PDB structures used.

4. The authors have probably cited the wrong reference for the proteomes of mesophile/thermophile bacterial pairs in Materials and Methods.

5. Page 5, para 2, text says Figure 7 while it’s actually referring to Figure 6.

6. Page 5, para 3, Fig S3 should be changed to Fig S4.

7. Page 6, para 1, it is unclear whether the authors are referring to Fig S2 or Figure 7, since Figure S2 does not have the FH and YH distribution. In the same paragraph, the text refers to Figure 6 instead of Figure 7.

8. In Materials and Methods, page 7, under “Calculation of correlation coefficients between dihedral angles”, the figure reference should be Figure 2.
