Summary
This preprint investigates whether the SARS-CoV-2 spike (S) protein accessed new thermodynamically viable sequence space during variant evolution. The authors use multiple computational methods (position-specific scoring matrices, environment-specific substitution tables, FoldX, and Rosetta) to predict structural constraints across the wildtype and different viral variant S protein structures (Alpha, Delta, Omicron). They validate these predictions using temporal substitution frequency data from the GISAID database and develop a machine learning framework that combines phylogenetic, thermodynamic, and structural features to predict mutational viability.
The authors find that structural constraints have remained remarkably stable throughout variant evolution, with no long-term directional change that would indicate movement of the protein through sequence space. They report that signature mutations associated with variants of concern (VOCs) were structurally viable from the beginning of the pandemic and did not arise through changes in thermodynamic accessibility. Only one substitution, C336H, showed evidence of being enabled by structural changes in the Omicron variant, representing less than 1% of analyzed substitutions.
The authors conclude that, despite rapid phenotypic evolution, the SARS-CoV-2 S protein operates within strict structural constraints. This suggests that viral adaptation occurs by sampling novel combinations of already-viable mutations, not through a relaxation of structural constraints that enables previously forbidden mutations.
Strengths
· Contextualization. The authors clearly situate their work within the broader questions about saltatory evolution and chronic infection dynamics. They explicitly frame their research question: did VOCs arise because the S protein accessed new regions of thermodynamically permitted sequence space, or because it sampled novel combinations of already-viable mutations? They also connect their findings to broader questions about RNA virus evolvability, genomic fragility, and long-term structural constraints on viral adaptation.
· Scope and temporal granularity. While most prior work focused on the receptor-binding domain (RBD), the authors of this study analyzed the entire S protein across the wildtype and three distinct variant structures, with monthly model retraining from December 2020 to December 2022. This work provides valuable insight into how substitution predictions change over the critical period of VOC emergence (Alpha, Delta, early Omicron), achieving comprehensive coverage and impressive resolution for temporally tracking structural constraint dynamics.
· Machine learning strategy and implementation. The authors' integration of phylogenetic (PSSM), structural (ESST, local features), thermodynamic (FoldX, Rosetta), and immunological (epitopes) predictors represents a genuinely novel and comprehensive approach to constraint prediction, with the various features capturing complementary information on constraint mechanisms. They demonstrate a good awareness of overfitting risks and implement multiple layers of protection: recursive feature elimination, ensemble approach with 10 subsampled models, extensive cross-validation, and testing across different temporal windows and structural backgrounds (WT, Alpha, Delta, Omicron). The consistent but modest performance (ROC AUC 0.78-0.81) across all validation schemes suggests that the model captures genuine, context-independent biological constraints rather than noise or dataset artifacts.
Major Comments
· Circular validation framework. The authors assume substitution frequency reflects structural viability, validate their predictive tools against frequency-based labels, and then use these predictors to conclude that structural constraints (as measured by frequency) remained stable. This is circular. Frequency actually reflects many selective pressures (e.g., founder effects and lineage competition, transmission fitness, immune selection) beyond structural viability. When the researchers validate RBD predictions against Starr et al. (2022) deep mutational scanning data for protein expression, performance improves substantially (ROC AUC 0.93 vs 0.81), suggesting frequency-based labels are indeed noisy. The authors acknowledge this limitation: “substitution's frequency in the global database is not necessarily a true reflection of its viability in all cases” and mention “antagonistic pleiotropy,” “intra- vs inter-host fitness effects,” and “sequencing bias.” However, they don't adequately address that this undermines the validity of their central conclusions about structural constraint stability.
Recommendations: Compile available experimental measurements of the full S protein structural properties (protein expression levels, thermal stability, functional assays that isolate structure from fitness) from published literature and use those as their primary validation labels. Test if structural predictors explain frequency after controlling for known non-structural selective pressures. Where experimental and frequency data both exist, quantify their concordance to estimate how much non-structural signal contaminates frequency-based labels.
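The concordance check recommended above could be quantified with a rank correlation between frequency and an experimental readout. The following is a minimal sketch, assuming each substitution has one observed frequency and one experimental score (for example, an expression measurement); the function names and the toy values are hypothetical and are not taken from the preprint.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values: observed frequency vs. an experimental score
# for the same five substitutions.
frequency = [0.00001, 0.0002, 0.0, 0.03, 0.0005]
experimental_score = [-1.2, -0.1, -2.5, 0.1, -0.3]
print("Spearman concordance:", spearman(frequency, experimental_score))
```

A low correlation on the subset where both data types exist would indicate how much non-structural signal the frequency-based labels carry.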
· Binary classification approach. The authors classify substitutions as “viable” (high frequency) vs. “deleterious” (low frequency) using the mean frequency as a threshold. However, this cutoff appears arbitrary rather than biologically motivated, as the authors do not justify it. Many low-frequency substitutions may represent recent mutations, geographically restricted variants, or gradually evolving changes rather than structurally constrained substitutions. Moreover, 15% of substitutions flip categories between timepoints, suggesting the threshold captures noise rather than a genuine biological boundary.
Recommendations: Replace the arbitrary mean-based threshold with biologically motivated cutoffs derived from experimental studies of SARS-CoV-2 spike protein fitness effects (Starr et al., 2020). If no threshold can be explicitly justified by comparison with experimental data on the spike protein, consider a comprehensive threshold sensitivity analysis that tests multiple classification schemes, such as quartile-based thresholds or fixed frequency cutoffs, and consider continuous modeling approaches in place of binary classification.
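One way to frame the suggested sensitivity analysis is to measure how often labels flip between two timepoints under each candidate threshold. The sketch below is illustrative only: the threshold schemes, the fixed cutoff of 1e-4, and the toy frequencies are assumptions chosen for demonstration, not values from the preprint.

```python
from statistics import mean, median, quantiles


def label(freqs, threshold):
    """Label a substitution viable (True) if its frequency exceeds the threshold."""
    return [f > threshold for f in freqs]


def flip_rate(freqs_t1, freqs_t2, threshold_fn):
    """Fraction of substitutions whose binary label differs between two timepoints.

    threshold_fn maps a list of frequencies to a single cutoff, so the
    threshold is recomputed separately at each timepoint.
    """
    labels_t1 = label(freqs_t1, threshold_fn(freqs_t1))
    labels_t2 = label(freqs_t2, threshold_fn(freqs_t2))
    flips = sum(a != b for a, b in zip(labels_t1, labels_t2))
    return flips / len(freqs_t1)


# Candidate classification schemes (all hypothetical choices).
schemes = {
    "mean": mean,
    "median": median,
    "upper_quartile": lambda xs: quantiles(xs, n=4)[2],
    "fixed_1e-4": lambda xs: 1e-4,
}

# Hypothetical toy frequencies for the same eight substitutions at two timepoints.
freqs_t1 = [0.0, 0.00002, 0.00005, 0.0002, 0.001, 0.02, 0.3, 0.0]
freqs_t2 = [0.0, 0.00008, 0.00015, 0.0001, 0.004, 0.01, 0.6, 0.00003]

for name, threshold_fn in schemes.items():
    print(name, flip_rate(freqs_t1, freqs_t2, threshold_fn))
```

If the flip rate varies strongly across schemes, conclusions drawn from any single binary cutoff deserve caution.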
· Phylogenetic non-independence in frequency calculations. The SARS-CoV-2 sequences in the GISAID database form a phylogenetic tree where closely related sequences share mutations through common ancestry, not independent selection events. Therefore, treating them as independent observations inflates frequencies for mutations in highly-sampled lineages. A substitution might appear "high frequency" simply because its lineage was heavily sequenced, not because it's structurally viable. This biases labels used to train and validate the models.
Recommendations: Apply phylogenetic corrections such as phylogenetic independent contrasts or lineage-weighted frequencies before calculating substitution frequencies. Compare results using both raw frequencies and phylogenetically-corrected frequencies to assess the impact on conclusions.
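As a rough illustration of lineage weighting, the sketch below averages per-lineage frequencies so that each lineage counts once regardless of how heavily it was sequenced. This is a deliberately crude stand-in for a proper phylogenetic correction, and it is not phylogenetic independent contrasts; the lineage labels, substitution names, and counts are hypothetical.

```python
from collections import defaultdict


def raw_frequency(sequences, substitution):
    """Fraction of all sequences carrying the substitution."""
    hits = sum(substitution in subs for _, subs in sequences)
    return hits / len(sequences)


def lineage_weighted_frequency(sequences, substitution):
    """Mean of per-lineage frequencies, so every lineage has equal weight."""
    counts = defaultdict(lambda: [0, 0])  # lineage -> [hits, total]
    for lineage, subs in sequences:
        counts[lineage][1] += 1
        if substitution in subs:
            counts[lineage][0] += 1
    per_lineage = [hits / total for hits, total in counts.values()]
    return sum(per_lineage) / len(per_lineage)


# Hypothetical toy data: (lineage label, set of substitutions carried).
# Lineage "A" is sampled eight times more often than "B" or "C".
sequences = (
    [("A", {"sub_X", "sub_Y"})] * 8
    + [("B", {"sub_Y"})]
    + [("C", {"sub_Y"})]
)

print("raw:", raw_frequency(sequences, "sub_X"))
print("lineage-weighted:", lineage_weighted_frequency(sequences, "sub_X"))
```

In this toy case the substitution found only in the heavily sampled lineage looks common in the raw frequency but much less so once lineages are weighted equally, which is the bias described above.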
· Lack of multiple testing correction and statistical power analysis. The study design carries an extensive multiple testing problem, and no correction method was applied to mitigate it. The authors conduct monthly model retraining across 25 time points from December 2020 to December 2022 with predictions on ~23,000 substitutions (1273 positions × ~19 possible changes), across 4 protein structures. The authors also make some negative claims (e.g., “no temporal constraint changes”), but without a power analysis they cannot distinguish “no change detected” from “insufficient power to detect real changes.”
Recommendations: Implement multiple testing corrections using False Discovery Rate (FDR) or Bonferroni methods. Conduct formal statistical power analyses to determine the minimum effect size detectable given the sample sizes and analysis methods used. Calculate and report confidence intervals for the various temporal comparisons and explicitly acknowledge limitations in detecting small but potentially biologically meaningful changes.
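To make the difference between the two suggested corrections concrete, the following minimal sketch applies Bonferroni and the Benjamini-Hochberg step-up procedure to the same set of p-values. The p-values are hypothetical and the function names are illustrative; in practice an established statistics library would be preferable to a hand-written implementation.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level alpha.

    Finds the largest rank k such that the k-th smallest p-value is at most
    (k / m) * alpha, then rejects the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            reject[idx] = True
    return reject


# Hypothetical p-values from ten temporal comparisons.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

print("Bonferroni rejections:", sum(bonferroni(p_values)))
print("BH rejections:", sum(benjamini_hochberg(p_values)))
```

Bonferroni is the more conservative of the two, so with many comparisons it further reduces power; this is one reason the correction should be reported together with the power analysis.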
The author declares that they have no competing interests.
The author declares that they did not use generative AI to come up with new ideas for their review.