

Beyond PAM50: Unsupervised Discovery of Anomalous Subgroups in Breast Cancer

Posted
Server: bioRxiv
DOI: 10.1101/2025.05.11.653361

Breast invasive carcinoma (BRCA) exhibits molecular heterogeneity that classifiers such as PAM50 do not fully capture. I applied an ensemble of four unsupervised anomaly detection algorithms (Isolation Forest, One-Class SVM, Local Outlier Factor, and Autoencoder) to expression profiles of ∼13,400 genes across 1,218 TCGA-BRCA RNA-seq samples, identifying 41 High-Concordance Anomalies (HCAs) that were consistently flagged by three or more methods. HCAs showed marked downregulation of ∼1,750 genes, strongly enriched for immune-related pathways such as T-cell activation and cytokine signaling, indicating an “immune-cold” phenotype. In contrast, ∼160 upregulated genes were associated with metal ion response, metabolism, and developmental programs. Over half of the HCAs were labeled PAM50_Unknown. Within the Basal-like subtype, a subset of HCAs (HCA-Basal, n=7) exhibited even stronger immune suppression, with 499 additional immune genes downregulated, defining an “ultra-immune-cold” variant. Upregulated genes in HCA-Basal lacked coherent pathway enrichment. HCA-Basal cases (n=5 with survival data) showed a trend toward poorer prognosis, although the difference was not statistically significant. These findings reveal a distinct, immune-suppressed BRCA subgroup that current classifiers often miss, with potential relevance for risk assessment and treatment.
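
The concordance rule in the abstract (a sample counts as an HCA when three or more of the four detectors flag it) can be illustrated with a minimal sketch. This shows only the voting step, not the detectors themselves, and it is not the author's code: the function name `high_concordance`, the `min_votes` parameter, and the toy flags for six made-up samples are all assumptions introduced for illustration.

```python
from typing import Dict, List


def high_concordance(flags: Dict[str, List[bool]], min_votes: int = 3) -> List[int]:
    """Return indices of samples flagged as anomalous by at least `min_votes` methods.

    `flags` maps a method name to one boolean per sample (True = flagged).
    Hypothetical helper, not taken from the preprint.
    """
    lengths = {len(method_flags) for method_flags in flags.values()}
    if len(lengths) != 1:
        raise ValueError("every method must provide flags for the same samples")
    n_samples = lengths.pop()

    # Count, for each sample, how many methods flagged it.
    votes = [
        sum(method_flags[i] for method_flags in flags.values())
        for i in range(n_samples)
    ]
    return [i for i, count in enumerate(votes) if count >= min_votes]


# Toy flags for six hypothetical samples (illustrative values, not real data).
toy_flags = {
    "isolation_forest":     [True, False, True,  False, False, True],
    "one_class_svm":        [True, False, False, False, True,  True],
    "local_outlier_factor": [True, False, True,  False, False, False],
    "autoencoder":          [False, False, True, False, False, True],
}

hca_indices = high_concordance(toy_flags)
print(hca_indices)
```

Requiring agreement among several detectors trades sensitivity for specificity: a sample that only one method considers unusual (index 4 in the toy data) is excluded, while samples that most methods agree on are retained.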
