This preprint profiles somatic and germline variation in largely HPV-negative head and neck/oral squamous cell carcinoma from South Asia with prominent chewing tobacco/areca nut exposure, using whole-exome sequencing (WES) across 103 patients aggregated from three South Asian cohorts (A–C) and compared to TCGA HNSCC (cohort D). The authors report recurrent alterations in canonical HNSCC drivers (e.g., TP53, CDKN2A, NOTCH1), nominate several potentially population/exposure-enriched hotspot mutations (e.g., TRIM48 p.I44T; POLQ p.A187T in high-TMB cases; MAP3K19 p.H1282Y; CDC20 p.R162Q), and identify recurrent copy-number changes including EGFR amplification and TP73-region deletion. They also propose a germline SDHA variant (p.S456L) as a potential South Asian susceptibility factor, motivated by its recurrence across cohorts and a trend toward younger diagnosis age among carriers.
Cohort heterogeneity: The study merges (i) the prospectively collected cohort A, which contains both FFPE and fresh biopsies, (ii) two external cohorts processed from raw FASTQ files (B/C), and (iii) TCGA “pre-annotated” MAF calls. It also mixes genome builds (hg38 for A/B/C vs hg19 for ICGC raw; plus liftover to run MutSig2CV). Together these choices create strong potential for batch effects and non-biological differences in mutation discovery rates and spectra. The concern is amplified by the authors’ own observation that FFPE samples have much higher mutation counts than fresh tissues, which is consistent with known FFPE-associated artifact risk. Even so, the analysis does not clearly document artifact-mitigation steps (e.g., deamination/OxoG filtering) beyond “standard” preprocessing.
The tumor mutation burden (TMB) definition is unclear and the estimate is possibly inflated: The manuscript reports a median somatic variant count per patient of ~1050 and a median TMB of ~21 mutations/Mb across cohorts A–C, with some patients having >14,000 mutations. It does not specify the exact callable/exome target size, the inclusion/exclusion rules (synonymous? indels? PASS-only?), or the artifact handling used to compute TMB. Given the stated FFPE inflation of mutations and the variable depths across cohorts, the central TMB estimate (and the downstream claims about “high TMB” subsets and the POLQ association) is difficult to interpret without a transparent, cohort-stratified TMB pipeline and QC.
Reporting clarity and reproducibility gaps: Key parameters are missing or not explicit (e.g., Mutect2 settings and filters; panel-of-normals usage; contamination estimation; tumor purity estimates; handling of low-depth samples; thresholds for hotspot calling beyond “≥10 reads” in places). The rationale for restricting CNVs in cohorts B/C to those overlapping cohort A (and the effect on sensitivity/specificity) should be justified more explicitly.
Presentation/terminology: Several passages conflate “head and neck cancer,” “HNSCC,” and “oral cancer” without consistently defining the included subsites. Cohort C is tongue-only, whereas cohorts A and B are buccal-predominant, and this difference in subsite may strongly shape mutational patterns.
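To make the FFPE concern above concrete, one simple diagnostic the authors could report is the share of low-allele-fraction SNVs that are C>T/G>A, split by tissue type, since cytosine deamination in FFPE material tends to produce exactly these substitutions at low variant allele fraction. The following is a minimal sketch only: the function name, the record fields (`tissue`, `ref`, `alt`, `vaf`), and the 10% VAF cutoff are assumptions chosen for illustration, not anything taken from the manuscript.

```python
from collections import defaultdict

# C>T and its reverse complement G>A are the substitutions
# associated with cytosine deamination.
DEAMINATION = {("C", "T"), ("G", "A")}


def low_vaf_ct_fraction(variants, max_vaf=0.10):
    """Per tissue type, fraction of low-VAF SNVs that are C>T or G>A.

    `variants` is an iterable of dicts with hypothetical keys
    'tissue', 'ref', 'alt', and 'vaf'. The default cutoff of 0.10
    is an illustrative assumption.
    """
    total = defaultdict(int)
    deaminated = defaultdict(int)
    for v in variants:
        if len(v["ref"]) != 1 or len(v["alt"]) != 1:
            continue  # skip indels and multi-nucleotide variants
        if v["vaf"] > max_vaf:
            continue  # keep only low-allele-fraction calls
        total[v["tissue"]] += 1
        if (v["ref"], v["alt"]) in DEAMINATION:
            deaminated[v["tissue"]] += 1
    return {tissue: deaminated[tissue] / total[tissue] for tissue in total}


# Toy records, invented purely to show the input shape.
example_variants = [
    {"tissue": "FFPE", "ref": "C", "alt": "T", "vaf": 0.04},
    {"tissue": "FFPE", "ref": "G", "alt": "A", "vaf": 0.06},
    {"tissue": "FFPE", "ref": "A", "alt": "C", "vaf": 0.08},
    {"tissue": "FFPE", "ref": "C", "alt": "T", "vaf": 0.45},   # above cutoff
    {"tissue": "fresh", "ref": "G", "alt": "A", "vaf": 0.05},
    {"tissue": "fresh", "ref": "T", "alt": "G", "vaf": 0.09},
    {"tissue": "fresh", "ref": "AT", "alt": "A", "vaf": 0.05},  # indel
]
fractions = low_vaf_ct_fraction(example_variants)
```

A markedly higher fraction in FFPE than in fresh samples would suggest that a substantial part of the excess mutation count is artifactual. It would not prove this on its own, because biological C>T processes also exist.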
The evidence is moderate for describing a South Asian, largely HPV-negative WES-based mutational landscape and confirming frequent alteration of canonical HNSCC genes (TP53, CDKN2A, NOTCH1) within the aggregated cohorts. The evidence is limited for claims of cohort-specific “novel drivers,” for POLQ p.A187T as a distinctive high-TMB marker, and for SDHA p.S456L as a susceptibility variant, because key confounders (batch/FFPE artifacts, cohort integration effects, lack of external control comparisons, limited replication/validation) are not fully resolved.
Tighten cohort harmonization: Re-run key analyses with a unified pipeline across A–C (and, if possible, reprocess TCGA raw BAM/FASTQ rather than using pre-annotated MAF), and report batch-aware sensitivity analyses (by cohort, site, stage, tissue type FFPE vs fresh, and sequencing depth).
Make TMB rigorous and interpretable: Define TMB precisely, report callable Mb per sample, provide cohort-stratified TMB distributions, and show how high-TMB cases were identified and whether they remain high under stricter filtering.
Temper clinical inferences: Present EGFR amplification and cetuximab-related statements as hypotheses, and add validation of EGFR CNV calls (e.g., FISH, qPCR, or SNP array) plus any available clinical correlations if claiming therapeutic relevance.
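As an illustration of the TMB recommendation above, the sketch below shows the kind of explicit definition that would make the reported values interpretable: the count of PASS variants in a stated set of classes, divided by each sample's callable megabases, then summarized per cohort. It is a sketch under stated assumptions, not the authors' pipeline. The function names, the record fields, and the choice of counted classes are hypothetical; the class labels follow the MAF `Variant_Classification` vocabulary.

```python
from collections import defaultdict
from statistics import median

# Assumed inclusion rule for illustration: non-synonymous coding
# classes only. The manuscript should state its own rule.
COUNTED_CLASSES = frozenset({
    "Missense_Mutation", "Nonsense_Mutation", "Splice_Site",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
})


def tmb_per_sample(variants, callable_mb, counted=COUNTED_CLASSES):
    """Return {sample: mutations per callable Mb}.

    `variants`: iterable of dicts with hypothetical keys 'sample',
    'filter', and 'classification'. `callable_mb`: {sample: Mb of
    sequence with adequate coverage}. Samples with no counted
    variants are reported as 0.0 rather than dropped.
    """
    counts = defaultdict(int)
    for v in variants:
        if v["filter"] == "PASS" and v["classification"] in counted:
            counts[v["sample"]] += 1
    return {s: counts[s] / mb for s, mb in callable_mb.items()}


def median_tmb_by_cohort(tmb, cohort_of):
    """Return {cohort: median TMB} given {sample: TMB} and {sample: cohort}."""
    grouped = defaultdict(list)
    for sample, value in tmb.items():
        grouped[cohort_of[sample]].append(value)
    return {cohort: median(values) for cohort, values in grouped.items()}


# Toy input, invented purely to show the expected shapes.
example_variants = [
    {"sample": "S1", "filter": "PASS", "classification": "Missense_Mutation"},
    {"sample": "S1", "filter": "PASS", "classification": "Nonsense_Mutation"},
    {"sample": "S1", "filter": "PASS", "classification": "Frame_Shift_Del"},
    {"sample": "S1", "filter": "PASS", "classification": "Silent"},
    {"sample": "S1", "filter": "weak_evidence", "classification": "Missense_Mutation"},
]
example_callable_mb = {"S1": 30.0, "S2": 25.0}
example_cohorts = {"S1": "A", "S2": "B"}

tmb = tmb_per_sample(example_variants, example_callable_mb)
by_cohort = median_tmb_by_cohort(tmb, example_cohorts)
```

Reporting the per-sample denominator alongside the numerator, and repeating the calculation under a stricter filter set, would show directly whether the high-TMB cases remain high.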
The authors declare that they have no competing interests.
The authors declare that they did not use generative AI to come up with new ideas for their review.