Write a PREreview

SHAP‑Guided CpG Selection with Ensemble Learning for Epigenetic Age Prediction

Posted
Server
bioRxiv
DOI
10.64898/2026.02.20.707142

Abstract

Epigenetic biomarkers offer critical insight into biological aging and disease risk, yet most deep learning models lack interpretability and generalization across tissues. We present a reproducible pipeline for interpretable age classification using SHAP-guided CpG prioritization, enhancer and gene annotation, and stacked ensemble modeling. Across both blood and brain samples (GSE41826, GSE40279), certain CpGs showed reproducible age-linked methylation changes. Comparative performance metrics, SHAP breakdowns, and CpG-level stability analyses support the potential of these CpGs as cross-tissue anchor sites. A multi-model ensemble combining XGBoost, MLP, TabTransformer→XGBoost, and LightGBM achieved an accuracy of 92.4% and a macro F1 of 92.3%. Biological support for these findings comes from motif scans, enrichment results, and visual mapping of CpG-to-gene relationships using Sankey diagrams. Delta-based stacking improved prediction confidence in borderline age groups, notably boosting middle-age recall through complementary model behavior. This work lays the groundwork for explainable epigenetic clocks that transcend tissue boundaries.
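For readers weighing the two headline metrics, the sketch below shows how accuracy and macro F1 are conventionally computed for a multi-class age classifier. It is not the authors' evaluation code. The class labels ("young", "middle", "old") and the toy predictions are hypothetical, chosen only to show that a missed middle-age sample lowers macro F1 more than it lowers accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: one of two middle-age samples is misclassified as old.
y_true = ["young"] * 4 + ["middle"] * 2 + ["old"] * 4
y_pred = ["young"] * 4 + ["middle", "old"] + ["old"] * 4

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

Because macro F1 averages per-class scores without weighting by class size, a single error in a small class such as the middle-age group is more visible in macro F1 than in overall accuracy, which is why the two numbers are usually reported together.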

You can write a PREreview of SHAP‑Guided CpG Selection with Ensemble Learning for Epigenetic Age Prediction. A PREreview is a review of a preprint and can vary from a few sentences to a lengthy report, similar to a journal-organized peer-review report.
