PREreview of TissueFormer: Extending single-cell foundation models to predict population-level phenotypes

by Bivas Nag, James Fraser, Jane E. Oberhauser and Brian Cheung

Published: May 22, 2026
DOI: 10.5281/zenodo.20334841
License: CC BY 4.0

The preprint "TissueFormer: Extending single-cell foundation models to predict population-level phenotypes" presents TissueFormer. It builds on the Geneformer architecture to process groups of ranked single-cell gene expression profiles simultaneously, rather than one cell at a time. This enables whole-sample classification based on the collective composition and transcriptional signatures of the cells in a sample. The authors apply this approach in two settings. The first is predicting cortical region of origin from mouse brain spatial transcriptomics, including under neurodevelopmental perturbation. The second is predicting COVID-19 severity from peripheral blood scRNA-seq data in three independent cohorts.
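To make the input format concrete for readers less familiar with Geneformer, the sketch below loosely follows Geneformer's published rank value encoding. Each cell's counts are divided by the cell's total, then by the gene's median nonzero expression across a reference corpus, and the genes are ranked in descending order. The gene medians and the 2,048-token cap are illustrative assumptions. The grouping step (`build_group`, which returns a list of per-cell token lists for a model to attend across) is our own simplification, not the authors' implementation.

```python
from typing import Dict, List, Sequence


def rank_encode(counts: Dict[str, float], gene_medians: Dict[str, float],
                max_len: int = 2048) -> List[str]:
    """Return gene tokens for one cell, ordered by normalized expression (highest first).

    Counts are scaled by the cell's total, then by each gene's median nonzero
    expression across a reference corpus, so genes that are high in every cell
    (e.g. housekeeping genes) are pushed down the ranking. Genes without a
    corpus median are skipped. max_len is an assumed context limit.
    """
    total = sum(counts.values())
    if total == 0:
        return []
    scores = {
        gene: (c / total) / gene_medians[gene]
        for gene, c in counts.items()
        if c > 0 and gene in gene_medians
    }
    # Sort by descending score; break ties by gene name for determinism.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return ranked[:max_len]


def build_group(cells: Sequence[Dict[str, float]], gene_medians: Dict[str, float],
                max_len: int = 2048) -> List[List[str]]:
    """Hypothetical: encode every cell in a sample into one set-valued model input."""
    return [rank_encode(cell, gene_medians, max_len) for cell in cells]


medians = {"Actb": 10.0, "Slc17a7": 0.5, "Gad1": 0.5}  # hypothetical corpus medians
sample = [{"Actb": 50, "Slc17a7": 5}, {"Actb": 40, "Gad1": 8}]
print(build_group(sample, medians))  # -> [['Slc17a7', 'Actb'], ['Gad1', 'Actb']]
```

In both cells, Actb ranks below the marker gene despite its higher raw count, which is the intended effect of the median normalization.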

Overall, this preprint presents a transformer-based neural network that demonstrates the power of jointly modeling the transcriptional profiles of groups of cells, rather than modeling cells individually, for sample classification. TissueFormer achieves three main results:

- It outperforms existing deep learning methods for regional classification (aggregation/pseudobulk and summary statistics-based approaches).
- It predicts cortical maps that capture inter-individual variability in brain areas that established transcriptomic tissue atlases overlook.
- It identifies compositional signatures of COVID-19 severity.

Population-aware single-cell analysis is increasingly important for tissue annotation and for characterizing changes in cell type composition and transcriptional state in disease. TissueFormer is a promising tool that is well validated on retrospective cases where the “ground truth” is known. However, the discussion would benefit from a more concrete explanation of its potential prospective use cases. We hope the comments below help strengthen the manuscript.

Major points:

TissueFormer’s compatibility with existing tools and atlases is not fully established. A major potential use is to ground conventional (non-spatially registered) scRNA-seq data on an established atlas. This would reduce costs and increase sample size while approaching “spatially resolved” quality. The authors demonstrate TissueFormer with the Allen Brain Atlas cortical reference. However, they do not discuss which other reference atlases or labeled datasets could serve as training data for new applications. Examples include the BRAIN Initiative Cell Census Network (BICCN) and cell type atlases from other organisms and tissue types. A more explicit discussion of which reference datasets and atlases TissueFormer could best be combined with to generate meaningful biological insights would clarify its practical value. Researchers who use TissueFormer may want to draw on large public datasets to inform downstream experiments. On our initial read-through, we did not find this perspective clearly addressed. The discussion should explicitly identify the types of studies in which TissueFormer would be most useful and clarify its role in the analysis pipeline. This would help connect the tool with its most likely prospective users.

Related to this, TissueFormer as presented requires large quantities of labeled training data for each new prediction task. This may limit its applicability for researchers in data-limited contexts. Figure 2 supplement 1e begins to address this by varying training set size, but the analysis could be extended into practical guidance. For instance, the authors could report the minimum dataset size at which TissueFormer is expected to outperform simpler compositional tools (logistic regression on pseudobulk or cell type composition) in a given application. This is task-dependent, but a worked example would make the benefits of TissueFormer tangible. It could cover the concrete costs of collecting the data (e.g., the cost of spatial transcriptomics of the mouse brain) and the interpolated savings at a threshold level of performance. Together with an explicit discussion of which reference atlases and labeled datasets are suitable training sources, this would help prospective users assess whether TissueFormer is the right tool for their dataset.
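As a minimal sketch of the guidance we have in mind, the hypothetical helpers below take learning curves for TissueFormer and a simpler baseline. A learning curve here means accuracy at each training-set size, as in Figure 2 supplement 1e. The first helper reports the smallest evaluated size at which TissueFormer wins by a chosen margin. The second linearly interpolates the size each method needs to reach a target accuracy. The accuracies and per-sample cost are placeholders, not values from the preprint. Linear interpolation between evaluated sizes is an assumption, and real curves would need confidence intervals.

```python
from typing import Optional, Sequence


def min_size_to_outperform(sizes: Sequence[int], model_acc: Sequence[float],
                           baseline_acc: Sequence[float],
                           margin: float = 0.0) -> Optional[int]:
    """Smallest evaluated training-set size at which the model beats the baseline by > margin.

    Assumes sizes are sorted ascending and all three sequences are aligned.
    Returns None if the model never clears the margin.
    """
    for n, m, b in zip(sizes, model_acc, baseline_acc):
        if m - b > margin:
            return n
    return None


def size_for_target(sizes: Sequence[int], acc: Sequence[float],
                    target: float) -> Optional[float]:
    """Linearly interpolate the training-set size needed to reach a target accuracy."""
    if acc[0] >= target:
        return float(sizes[0])
    for (n0, a0), (n1, a1) in zip(zip(sizes, acc), zip(sizes[1:], acc[1:])):
        if a0 < target <= a1:
            return n0 + (target - a0) * (n1 - n0) / (a1 - a0)
    return None  # target not reached within the evaluated range


COST_PER_SAMPLE = 1.0  # placeholder; substitute the real per-sample assay cost
sizes = [10, 20, 40, 80]
model_acc = [0.60, 0.72, 0.85, 0.90]     # placeholder TissueFormer curve
baseline_acc = [0.65, 0.70, 0.75, 0.80]  # placeholder pseudobulk baseline curve

print(min_size_to_outperform(sizes, model_acc, baseline_acc, margin=0.05))
n_model = size_for_target(sizes, model_acc, 0.80)
n_base = size_for_target(sizes, baseline_acc, 0.80)
print((n_base - n_model) * COST_PER_SAMPLE)  # samples saved at the 0.80 threshold, times cost
```

The same two functions could be applied to the pre-trained versus randomly initialized curves to locate where pre-training stops helping.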

TissueFormer has the potential to enable broad, novel biological discovery, and this potential deserves more discussion. As presented, TissueFormer is framed primarily as a sample classifier. By contrast, the referenced benchmark papers position similar tools as instruments for biological discovery rather than just labeling. The scAGG authors point out that a classifier trained to distinguish Alzheimer's-positive from Alzheimer's-negative samples can identify the underlying cell type orderings and gene expression trends that correspond with disease severity. Similarly, CellCnn uses "a representation learning approach to detect rare cell subsets associated with disease." We feel the discussion currently underplays this dimension, and an expanded discussion drawing out these parallels would help. One concrete way to demonstrate this would be to build on the existing "Leave-one-cell-type-out" analysis. For example, the authors could extend it to identify cell type subsets or transcriptional programs that drive prediction within a given label, or report which cells the model attends to most strongly across samples. This would shift the framing from "TissueFormer assigns a label" to "TissueFormer can be interrogated to reveal which cellular features underlie that label." We feel this is where its real value lies.
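A minimal version of the leave-one-cell-type-out logic might look like the sketch below. An optional `only_label` argument restricts the analysis to samples of one label, which is the per-label extension we suggest. Here `predict` is a hypothetical stand-in for a trained TissueFormer: any function that maps a list of `(cell_type, features)` pairs to a label. The toy predictor in the usage lines is purely illustrative.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Cell = Tuple[str, object]          # (cell_type, features); features are opaque here
Sample = Tuple[List[Cell], str]    # (cells in the sample, true label)


def accuracy(predict: Callable[[List[Cell]], str], samples: Sequence[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(cells) == label for cells, label in samples) / len(samples)


def leave_one_type_out(predict: Callable[[List[Cell]], str], samples: Sequence[Sample],
                       only_label: Optional[str] = None) -> Dict[str, float]:
    """Accuracy drop when each cell type is removed from every sample.

    If only_label is given, only samples with that true label are evaluated,
    showing which cell types support that particular label.
    """
    if only_label is not None:
        samples = [s for s in samples if s[1] == only_label]
    if not samples:
        raise ValueError("no samples to evaluate")
    base = accuracy(predict, samples)
    types = sorted({t for cells, _ in samples for t, _ in cells})
    drops = {}
    for t in types:
        ablated = [([c for c in cells if c[0] != t], label) for cells, label in samples]
        drops[t] = base - accuracy(predict, ablated)
    return drops


def toy_predict(cells: List[Cell]) -> str:
    """Illustrative only: calls a sample "x" when it contains at least two type-A cells."""
    return "x" if sum(t == "A" for t, _ in cells) >= 2 else "y"


samples = [([("A", None)] * 3 + [("B", None)], "x"),
           ([("A", None)] + [("B", None)] * 3, "y")]
print(leave_one_type_out(toy_predict, samples))                  # -> {'A': 0.5, 'B': 0.0}
print(leave_one_type_out(toy_predict, samples, only_label="x"))  # -> {'A': 1.0, 'B': 0.0}
```

Cell types whose removal costs the most accuracy are candidates for the deeper interrogation described above. Attention weights would be a complementary, model-internal readout.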

We also invite the authors to speculate on how TissueFormer might be experimentally transformative by broadening what researchers can accomplish at lower cost. For instance, could TissueFormer enable hypothesis-generating brain region mapping from single-cell data in place of more expensive spatial transcriptomics? Could it allow greater throughput in phenotypic screening experiments, or even change how we approach exploratory studies? A worked example, perhaps in the mouse enucleation context, would provide a concrete financial calculation demonstrating TissueFormer’s advantage.

The applicability of TissueFormer to non-spatial single-cell data versus spatial transcriptomic data deserves clearer treatment. The authors demonstrate TissueFormer on both modalities: BARseq for the brain and droplet-based scRNA-seq for COVID-19. However, they do not explicitly discuss how the model's behavior, group construction strategy, or expected performance differs between the two. For instance, the brain task uses spatially defined cortical columns, while the COVID-19 task uses per-donor cell pools. Group construction is a core differentiator from existing work. Guidance on how users should construct groups for new applications, and on where the model is most appropriate for spatial versus non-spatial data, would therefore make the method accessible to a broader audience.
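The hypothetical helpers below illustrate the distinction between the two grouping regimes. The first bins cells by a spatial coordinate, a crude one-dimensional stand-in for the cortical columns used in the brain task. The second draws fixed-size random pools of cells within each donor, as in the COVID-19 task. Bin width, pool size, and number of pools are assumptions, not the authors' settings.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def spatial_groups(xs: Sequence[float], bin_width: float) -> Dict[int, List[int]]:
    """Group cell indices into strips along one coordinate (stand-in for cortical columns)."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, x in enumerate(xs):
        groups[int(x // bin_width)].append(i)
    return dict(groups)


def donor_groups(donors: Sequence[Hashable], group_size: int, n_groups: int,
                 seed: int = 0) -> Dict[Hashable, List[List[int]]]:
    """Draw n_groups random pools of cell indices per donor.

    Cells are sampled without replacement within a pool. Pools from the same
    donor may overlap, and a donor with fewer cells than group_size yields
    smaller pools.
    """
    rng = random.Random(seed)
    by_donor: Dict[Hashable, List[int]] = defaultdict(list)
    for i, d in enumerate(donors):
        by_donor[d].append(i)
    pools = {}
    for d, idx in by_donor.items():
        k = min(group_size, len(idx))
        pools[d] = [rng.sample(idx, k) for _ in range(n_groups)]
    return pools


print(spatial_groups([0.1, 0.4, 1.2, 1.9, 2.5], bin_width=1.0))
print(donor_groups(["d1", "d1", "d1", "d2", "d2"], group_size=2, n_groups=3))
```

The guidance could state one practical consequence explicitly. When several pools are drawn from the same donor, all of them must fall on the same side of a train/test split; otherwise the evaluation leaks donor identity.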

Reframing of the pre-training ablation test. The authors compare pre-trained TissueFormer to a non-pre-trained, “randomly initialized” TissueFormer model and find that pre-training provides an advantage only when the input dataset is small. They note, “Thus, this spatial transcriptomic dataset [Chen et al., 2024] is large enough that transfer learning from Murine Geneformer offers no advantage.” We encourage the authors to present this candid test as a strength of their model. Mapping TissueFormer accuracy across sample sizes with and without pre-training could indicate, for different use cases, the minimum sample size and training dataset size needed for confident model prediction.

Similarly, the manuscript would benefit from a more detailed characterization of the sensitivity and resolution of TissueFormer’s predictions. The enucleation experiment in Figure 4 demonstrates that TissueFormer can detect gross differences in visual cortex area between control and enucleated mice. However, it remains unclear to what degree TissueFormer can detect more subtle phenotypes. The authors’ characterization of mild versus severe COVID-19 touches on this. They should also discuss which effect sizes are reliably detectable, with what precision, and how both scale with sample size.
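The standard normal-approximation power calculation below offers one way to frame this. It gives the smallest standardized difference (Cohen's d) that a two-sided, two-sample comparison can detect at a given significance level and power. An example would be a difference in visual cortex area between control and enucleated mice, expressed in units of between-animal standard deviation. The calculation shows that the detectable effect shrinks only as 1/sqrt(n). The function name and defaults are our own. For very small groups, the normal approximation is optimistic and a t-based calculation would be more accurate.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n_per_group: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest Cohen's d detectable by a two-sided, two-sample test (normal approximation).

    d = (z_{1 - alpha/2} + z_{power}) * sqrt(2 / n_per_group)
    """
    z = NormalDist()  # standard normal
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(2 / n_per_group)


for n in (4, 10, 20, 40):
    print(n, round(min_detectable_effect(n), 2))
```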

Minor points:

TissueFormer is positioned as an extension of single-cell foundation models to population-level prediction, yet its relationship to other foundation models is discussed only briefly. The authors discuss the strengths of TissueFormer over foundation models such as scGPT (Wang et al., 2025) and graph-based models such as HEIST (Madhu et al., 2025) and CI-FM (You et al., 2025). The discussion should also evaluate more explicitly TissueFormer’s unique niche relative to two other classes of tools. The first is supervised compositional classifiers (e.g., CellCnn; Arvaniti et al., 2017). The second is self-supervised spatial foundation models (e.g., NicheFormer, Tejada-Lapuerta et al., 2025; SWOT, Wang et al., 2025). This would clarify the extent of TissueFormer’s applicability and which biological questions each class of tool is best suited to answer.

Recommended extensions of the cortical mapping approach in Figures 3 and 4. The authors validate a reduced visual cortex size in enucleated mice. As a baseline for this, it would be useful to compare TissueFormer's predictions in the control (non-enucleated) mice with an established reference atlas. This would establish that the region predictions in unperturbed brains agree with the standard atlas before deviations in the enucleated condition are interpreted as biological signal. Without this baseline, it is harder to separate a real effect of enucleation from model-specific variability. Figure 3 already shows that TissueFormer's predictions can diverge from CCF labels even in normal brains. Relatedly, it would be interesting to test, or at least speculate on, what happens with more animals. If TissueFormer's cortical mappings were repeated on data from many more mice (e.g., n = 20 or more), would the mapped region boundaries converge toward the atlas average? This would help clarify how much of the inter-animal variability in Figure 3 reflects true biological differences between individuals rather than statistical noise from the small sample size (n = 4). If boundaries converge to the atlas with more animals, the current inter-animal differences may reflect sampling variability. If they remain distinct, this would strengthen the claim that TissueFormer captures real anatomical variation that registration erases.
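One way to separate biological variation from estimation noise is the one-way random-effects intraclass correlation, ICC(1), sketched below. This requires several replicate boundary estimates per animal, for example from bootstrap resamples of cells or from multiple sections. We assume such replicates can be generated; we have not confirmed they exist in the authors' data. Values near 1 indicate that between-animal differences dominate replicate noise. Values near or below 0 indicate that the apparent differences are mostly noise.

```python
from statistics import fmean
from typing import Sequence


def icc1(groups: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1) for a balanced design.

    groups[i] holds k replicate boundary estimates for animal i (a hypothetical
    input; the replicates could come from bootstrapping cells or repeated sections).
    """
    a = len(groups)
    k = len(groups[0]) if groups else 0
    if a < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 animals, each with the same number (>= 2) of replicates")
    means = [fmean(g) for g in groups]
    grand = fmean(means)  # equals the overall mean in a balanced design
    # Between-animal and within-animal mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in means) / (a - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (a * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("all values identical; ICC is undefined")
    return (msb - msw) / denom


# Hypothetical boundary positions (arbitrary units), three replicates per animal.
print(icc1([[1.0, 1.1, 0.9], [2.0, 2.1, 1.9], [3.0, 2.9, 3.1], [1.5, 1.6, 1.4]]))
```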

It is unclear whether the authors accounted for differences in sequencing methods between the publicly sourced COVID-19 datasets. The combined-cohort analysis in Figure 5a pools data from three studies that used different 10x chemistries (10x 5' v1, 10x 5' v2, and 10x 3' v3). These chemistries differ in mRNA capture, sensitivity, and gene detection. The individual-cohort results are unaffected, since chemistry is constant within each study. The combined-cohort result, however, could in principle be influenced by chemistry-specific signatures that correlate with cohort-level differences in severity composition. A supplemental analysis showing that combined-cohort performance is driven by biological signal rather than by batch effects from sequencing technology would strengthen the generalizability claim.
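A simple first check, sketched below, is the donor-level association between chemistry and severity, measured with Cramér's V. A value of 0 means no association; 1 means severity is fully determined by chemistry. The donor table is a hypothetical placeholder. Because each chemistry corresponds to a single study here, this check cannot distinguish chemistry from other cohort effects. A leave-one-cohort-out evaluation would be a natural complement.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence, Tuple


def cramers_v(pairs: Sequence[Tuple[Hashable, Hashable]]) -> float:
    """Cramér's V between two categorical variables given as (row, column) pairs."""
    n = len(pairs)
    rows = Counter(r for r, _ in pairs)
    cols = Counter(c for _, c in pairs)
    cells = Counter(pairs)  # observed counts; missing combinations count as 0
    chi2 = 0.0
    for r, nr in rows.items():
        for c, nc in cols.items():
            expected = nr * nc / n
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    if k == 0:
        raise ValueError("each variable needs at least two levels")
    return sqrt(chi2 / (n * k))


# Hypothetical donor table: (chemistry, severity).
donors = [("10x 5' v1", "severe"), ("10x 5' v1", "mild"),
          ("10x 5' v2", "severe"), ("10x 5' v2", "mild"),
          ("10x 3' v3", "mild"), ("10x 3' v3", "mild")]
print(round(cramers_v(donors), 3))  # -> 0.5
```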

Competing interests

The authors declare that they have no competing interests.

Use of Artificial Intelligence (AI)

The authors declare that they did not use generative AI to come up with new ideas for their review.
