Multi-Omic Integration and Machine Learning Reveal Regulatory Networks Driving Breast Cancer Progression
- Server: Preprints.org
- DOI: 10.20944/preprints202512.0929.v1
Breast cancer progression from early to late stages involves complex molecular changes that traditional anatomic staging captures inadequately. Integrating microRNA (miRNA) and messenger RNA (mRNA) expression profiles through machine learning may help identify biological markers that distinguish progression states independently of tumor size and lymph node status. This study analyzed 1,081 primary breast cancer samples from The Cancer Genome Atlas (TCGA) with paired miRNA-Seq and RNA-Seq data, stratified into early-stage (Stage I-II, n=822) and late-stage (Stage III-IV, n=259) groups. Following sample-level log2-CPM normalization and variance-based feature selection retaining the 3,000 most variable features, models were evaluated with nested 5-fold cross-validation using stratified sampling, which preserved the 3.2:1 early-to-late class ratio in every fold. Nine machine learning algorithms were evaluated, and XGBoost was selected for final modeling after Bayesian hyperparameter optimization. The integrated miRNA-mRNA XGBoost classifier achieved a test-set accuracy of 79.8% (95% CI: 73.2-85.3%) and an AUC of 0.687 (95% CI: 0.622-0.748), outperforming single-platform mRNA-only (AUC 0.654) and miRNA-only (AUC 0.612) models. Top discriminative features included the miRNAs miR-21-5p, miR-155-5p, miR-200c-3p, and miR-145-5p, alongside the mRNA targets PIK3CA, CCND1, MYC, and ERBB2. Network analysis revealed three core regulatory modules: epithelial-mesenchymal transition controlled by the miR-200 family targeting ZEB1/ZEB2, metabolic reprogramming via the miR-155/HK2 axis enhancing glycolysis, and immune evasion through miR-34a/PD-L1 regulation. Differential expression analysis identified 15 significant miRNAs and 194 significant mRNAs distinguishing the progression groups. Hub miRNA analysis identified 15 miRNAs with extensive target networks, ranging from 97 to 516 targets each.
Multi-omic integration of miRNA and mRNA expression captures biological progression signatures beyond anatomic staging, with moderate (AUC 0.687) but consistent classification performance under nested cross-validation. The identified regulatory networks provide mechanistic insights into progression drivers and point to potential therapeutic vulnerabilities that may be applicable across diverse populations and resource settings.
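The preprocessing and validation steps summarized in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: `log2_cpm` adds a simple pseudocount of 1 before scaling (edgeR's `cpm(..., log=TRUE)` instead uses a library-size-scaled prior, so values differ slightly), feature variance is computed across whatever samples are passed in, and all helper names and seeds are hypothetical. Model fitting (XGBoost with Bayesian hyperparameter search) is omitted; only the stratified outer/inner split structure and the class-ratio arithmetic for the reported cohort sizes are shown.

```python
import math
import random
from statistics import pvariance


def log2_cpm(counts, pseudocount=1.0):
    """Sample-level log2 counts-per-million (hypothetical helper).

    counts: list of samples, each a list of raw read counts (one per feature).
    A pseudocount is added before scaling so zero counts stay finite.
    Assumes every sample has a nonzero library size.
    """
    normalized = []
    for sample in counts:
        library_size = sum(sample)
        normalized.append(
            [math.log2((c + pseudocount) / library_size * 1e6) for c in sample]
        )
    return normalized


def top_variance_features(matrix, k):
    """Indices of the k features with the highest variance across samples."""
    n_features = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def stratified_folds(indices, labels, k=5, seed=0):
    """Split `indices` into k folds that preserve the class ratio of `labels`."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets a near-equal share.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return [sorted(f) for f in folds]


def nested_splits(labels, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_folds) for nested cross-validation.

    Hyperparameters would be tuned on inner_folds, which are drawn only from
    outer_train; the tuned model is then scored once on outer_test.
    """
    all_idx = list(range(len(labels)))
    for test in stratified_folds(all_idx, labels, outer_k, seed):
        test_set = set(test)
        train = [i for i in all_idx if i not in test_set]
        inner = stratified_folds(train, labels, inner_k, seed + 1)
        yield train, test, inner


# Toy counts: feature 1 swings from 0 to 300 reads, so it has the largest
# variance after log2-CPM normalization.
toy_counts = [[10, 0, 500, 20], [12, 5, 50, 22], [9, 300, 45, 18]]
print(top_variance_features(log2_cpm(toy_counts), k=1))  # [1]

# Cohort sizes reported in the abstract: 0 = early stage (I-II), 1 = late stage (III-IV).
labels = [0] * 822 + [1] * 259
print(f"class ratio {822 / 259:.2f}:1")                    # class ratio 3.17:1
print(f"majority-class accuracy {822 / len(labels):.3f}")  # majority-class accuracy 0.760
for train, test, inner in nested_splits(labels):
    late_frac = sum(labels[i] for i in test) / len(test)
    print(len(train), len(test), round(late_frac, 3), [len(f) for f in inner])
```

The majority-class line gives a reference point: a classifier that always predicts early stage would reach about 76.0% accuracy on the full cohort's class mix (the exact test-set composition is not stated), which is why AUC is the more informative of the two reported metrics under a 3.2:1 imbalance. In a leakage-free variant, `top_variance_features` would be fitted on each outer training split only and then applied to the corresponding test split.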