SHAP‑Guided CpG Selection with Ensemble Learning for Epigenetic Age Prediction
- Posted
- Server: bioRxiv
- DOI: 10.64898/2026.02.20.707142
Abstract Epigenetic biomarkers offer critical insight into biological aging and disease risk, yet most deep learning models lack interpretability and generalize poorly across tissues. We present a reproducible pipeline for interpretable age classification that combines SHAP-guided CpG prioritization, enhancer and gene annotation, and stacked ensemble modeling. Across blood and brain samples (GSE41826, GSE40279), a subset of CpGs showed reproducible age-linked methylation changes. Comparative performance metrics, SHAP breakdowns, and CpG-level stability analyses support their potential as cross-tissue anchor sites. A multi-model ensemble combining XGBoost, MLP, TabTransformer→XGBoost, and LightGBM achieved an accuracy of 92.4% and a macro F1 of 92.3%. Biological support for these findings comes from motif scans, enrichment results, and Sankey diagrams that map CpG-to-gene relationships. Delta-based stacking improved prediction confidence in borderline age groups, notably raising middle-age recall by exploiting complementary model behavior. This work lays the groundwork for explainable epigenetic clocks that transcend tissue boundaries.
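The idea behind SHAP-guided CpG prioritization can be illustrated with a minimal sketch. It is not the authors' pipeline: it computes exact Shapley values by enumerating feature subsets for a toy linear model, whereas the abstract does not say which SHAP explainer or model was used for ranking. The CpG names, weights, beta values, and baseline below are hypothetical and chosen only so the result is easy to check by hand. Features are ranked by mean absolute Shapley value across samples, and the top-ranked ones are kept.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def rank_features(model, samples, baseline, names, top_k):
    """Rank features by mean |Shapley value| over samples and keep the top_k."""
    totals = [0.0] * len(names)
    for x in samples:
        for i, value in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(value)
    mean_abs = [t / len(samples) for t in totals]
    order = sorted(range(len(names)), key=lambda i: mean_abs[i], reverse=True)
    return [(names[i], mean_abs[i]) for i in order[:top_k]]


# Hypothetical toy setup: three CpG sites and a linear scoring model.
names = ["cpg_a", "cpg_b", "cpg_c"]
weights = [2.0, -0.5, 0.0]


def toy_model(beta_values):
    return sum(w * b for w, b in zip(weights, beta_values))


samples = [[0.8, 0.2, 0.5], [0.4, 0.6, 0.1]]  # illustrative methylation beta values
baseline = [0.5, 0.5, 0.5]                    # illustrative reference profile

selected = rank_features(toy_model, samples, baseline, names, top_k=2)
print(selected)
```

For a linear model, each Shapley value reduces to the weight times the feature's deviation from the baseline, which makes this toy case easy to verify. In practice, exact enumeration is infeasible for methylation arrays with hundreds of thousands of CpGs, so an approximate or model-specific explainer would be needed.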