PREreview of Deep generative models generate mRNA sequences with enhanced translation capacity and stability

by Steelblue Lemur

Published: April 7, 2026
DOI: 10.5281/zenodo.19461898
License: CC BY 4.0

In this work, the authors present transformer-based generative models for mRNA coding sequences (GEMORNA-CDS) and untranslated regions (GEMORNA-UTR). They demonstrate that GEMORNA produces mRNA sequences that are superior to a number of benchmarks across several metrics for both linear and circular RNAs.

The authors conclude that GEMORNA has an unprecedented ability to design mRNA sequences and outperforms all existing methods. This conclusion should be tempered. GEMORNA does demonstrate improvement relative to fixed sequences when used as part of a sampling and multi-step selection pipeline, but for several reasons this is arguably not a clear method-level comparison with the tested alternatives.

Benchmarks are not comparable. The authors make several claims for GEMORNA’s superiority based on comparisons to alternative models. These comparisons have major issues that undermine their value.

LinearDesign was not designed for this task. LinearDesign, previously published by the authors, encodes mRNA through a deterministic finite-state automaton to generate optimized sequences. It was not designed or tested with modified mRNA, while the experimental validation set here consists of m1Ψ-modified mRNA. It is therefore not a reasonable benchmark in this setting.
BNT162b2 was not optimized for the exact task it is being tested on. The BNT162b2 design reflects a broad set of constraints, including process and clinical requirements that are not tested in this work. Manufacturing considerations (transcription efficiency, storage degradation kinetics, batch reproducibility, LNP encapsulation efficiency, etc.) weigh heavily on commercial designs but are not relevant for in vitro or preliminary model systems. As a result, BNT162b2 is of limited value as a baseline.

GEMORNA is not compared on equal terms with alternative methods. The authors compare GEMORNA-CDS and GEMORNA-UTR sequences to natural and other computational designs (Fig 2F-G, 3E-G). In each case, they compare multiple GEMORNA designs to a single competing design. The GEMORNA designs show a distribution of luciferase activities that extends both above and below the benchmarks. An appropriate comparison would include multiple sequences from each method rather than a single sequence from each competitor. As it stands, the comparison shows only that the distribution of GEMORNA sequences extends beyond one example sequence from a particular method. This does not address whether GEMORNA is uniquely superior to alternative methods.

Given the many-to-one comparisons, multiple comparison correction should be performed, but from the text it does not appear to have been. This renders statistical claims questionable.
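To illustrate the kind of correction meant here, the following is a minimal sketch of a Holm step-down adjustment, one common choice for many-to-one comparisons. It is not the authors' analysis: the p-values are hypothetical placeholders, and the function name `holm_adjust` is introduced only for this example.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values: four designs, each tested against one benchmark.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = holm_adjust(raw_p)
significant = [p < 0.05 for p in adj_p]
```

With these placeholder values, three comparisons fall below 0.05 before adjustment but only one does afterwards, which is why the absence of a stated correction matters for the statistical claims.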

Naturalness is a problematic metric. The authors define a “naturalness” metric that is essentially a model-assigned likelihood. This name suggests a relationship to natural sequences rather than the model. This is exemplified by “natural” sequences having a lower mean “naturalness” than GEMORNA-generated ones. This risks becoming a circular argument.

Minor issues

Averaging r values is invalid. In Figure 2E, the authors show the correlations between different metrics and several experimental datasets and claim that naturalness, which GEMORNA increases, is most correlated with expression. They summarize these correlations by averaging the r values for each metric across all datasets. This is statistically unsound because the sampling distribution of r is bounded and skewed, so a plain arithmetic mean is biased. The r values should have been Fisher z-transformed before averaging and the result transformed back.
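As a minimal sketch of the procedure suggested above, the correlations can be mapped to z = atanh(r), averaged (optionally weighted by n - 3, the inverse variance of z for a dataset of n points), and mapped back with tanh. The r values below are hypothetical placeholders, not values from the manuscript, and `fisher_mean_r` is a name chosen only for this example.

```python
import math


def fisher_mean_r(rs, ns=None):
    """Average Pearson correlations through the Fisher z-transform.

    rs: correlation coefficients, each strictly between -1 and 1.
    ns: optional sample sizes (each > 3); if given, z values are
        weighted by n - 3, otherwise all datasets get equal weight.
    """
    zs = [math.atanh(r) for r in rs]
    weights = [1.0] * len(rs) if ns is None else [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)


# Hypothetical correlations of one metric across three datasets.
rs = [0.2, 0.5, 0.9]
naive_mean = sum(rs) / len(rs)
fisher_mean = fisher_mean_r(rs)
```

For these placeholder values the two summaries differ noticeably, and the gap grows as individual correlations approach 1 or -1, so a ranking of metrics by mean r can change depending on which summary is used.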

Several plots use truncated axes with linear scales. Figures 4D, 4E, and 4G plot data on a linear y-axis that does not begin at zero. This has the effect of visually exaggerating differences between samples. The truncation should have been indicated on the axis.

Identical data is displayed multiple times. Figure 4H contains a plot of the time course of IgG titer produced by mRNA from different designs. Figure 4G shows the IgG titer at one time point from 4H. The choice of time point is not motivated. This is used to show a statistical difference between two values, but the truncation issue mentioned above is also involved here.

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they did not use generative AI to come up with new ideas for their review.
