Structural semantic evolutionary distance (SSED) unifies the selection of cancer driver genes across macroevolution and tumorigenesis
- Posted
- Server: bioRxiv
- DOI: 10.64898/2025.12.17.694808
The non-random, site-specific enrichment of somatic mutations in cancer driver genes (CDGs) suggests that their emergence is governed by evolutionary constraints. However, quantitative methods for defining these constraints and the principles behind them remain underexplored. To address this, we introduce the Structural Semantic Evolutionary Distance (SSED), a metric that leverages the pretrained ESM-3 protein language model to quantify evolutionary divergence within a unified structural semantic space. Our analysis demonstrates that CDGs are subject to persistent structural semantic constraints across species, tolerating a significantly narrower range of structural semantic changes during evolution than non-CDGs. Crucially, clinically observed oncogenic mutations follow the same principle, favoring minimal structural perturbation as shaped by long-term gene evolution. Such mutations maintain core protein function while conferring a capacity for immune evasion, thereby driving clonal expansion. Guided by this “evolutionary constraint” framework, we predicted and experimentally validated a previously uncharacterized oncogenic mutation, KRAS R135L, in bronchial epithelial cells. Furthermore, clinical cohort analysis demonstrated that SSED acts as an independent predictor of response to immune checkpoint blockade, offering information orthogonal to tumor mutational burden (TMB). This study unifies the evolutionary principles governing CDGs across macroevolutionary and microevolutionary (tumorigenesis) timescales, elucidates the balance between structural adaptability, functional conservation, and immune pressure, and identifies a novel predictive biomarker for cancer immunotherapy.