PREreview of RAPTOR-GEN: RApid PosTeriOR GENerator for Bayesian Learning in Biomanufacturing

Published
DOI: 10.5281/zenodo.18741398
License: CC BY 4.0

Peer Review: RAPTOR-GEN: Rapid Posterior Generator for Bayesian Learning in Biomanufacturing

Manuscript ID: arXiv:2509.20753v2
Reviewer: Lauretta Ojo-Imoukhuede, Senior Reviewer, Bioprocess Systems Engineering
Journal Target (Assumed): Operations Research, INFORMS Journal on Computing, Biotechnology and Bioengineering, or Journal of Machine Learning Research

1. Summary Assessment

This manuscript presents RAPTOR-GEN, a mechanism-informed Bayesian learning framework for parameter inference in stochastic reaction networks (SRNs) that underpin biopharmaceutical manufacturing processes. The framework integrates two principal components: a Bayesian-updating pKG-LNA metamodel that applies linear noise approximation (LNA) to SDE-based mechanistic models for tractable likelihood derivation, and an LD-LNA posterior sampling method inspired by Langevin diffusion that avoids step-size discretisation of the underlying SDE. The paper is technically ambitious, mathematically rigorous, and addresses a genuinely important problem — Bayesian inference under sparse, heterogeneous, partially observed data in high-stochasticity bioprocess systems.

The theoretical contributions are substantive: the paper proves finite-sample convergence bounds (Theorem 4, Corollary 2), establishes a connection to the Bernstein–von Mises theorem (Corollary 1), and provides strong and weak convergence guarantees for the algorithmic implementation (Theorems 5 and 6). The empirical validation on enzyme kinetics and prokaryotic autoregulation gene networks demonstrates credible superiority over the unadjusted Langevin algorithm (ULA) and comparative advantages over ABC-SMC.

However, the manuscript exhibits several weaknesses that temper its publication readiness. The empirical evaluation is limited to two synthetic benchmark problems; no real experimental biomanufacturing data are used. The practical scalability of the framework to high-dimensional parameter spaces (beyond Nθ = 8) is insufficiently characterised. The writing is dense and occasionally imprecise in linking theoretical guarantees to the practical scenarios of interest. The regulatory and industry-translation framing, while present in the introduction and conclusion, is not substantiated by the experimental design.

2. Major Concerns

M1. Absence of Real Experimental Data. All numerical experiments use synthetic data generated by the Gillespie algorithm — the same stochastic simulation engine embedded in the theoretical framework. This constitutes a circularity risk: the model is validated against data it effectively generated. The manuscript makes strong claims about applicability to "real-world biomanufacturing processes," yet presents no evidence from actual cell culture, fermentation, or gene therapy manufacturing data. At minimum, one application to published experimental bioprocess data (e.g., CHO cell culture, iPSC expansion) is required to substantiate these claims credibly.
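For context, the Gillespie algorithm at issue is the exact stochastic simulation algorithm (SSA). A minimal sketch for a toy birth-death network (a reviewer illustration, not the authors' implementation) makes the circularity point concrete: data generated this way obey exactly the SRN dynamics that the inference framework assumes.

```python
import random

def gillespie_birth_death(k_birth, k_death, x0, t_end, seed=0):
    """Exact SSA for a birth-death process:
    0 -> X at rate k_birth, X -> 0 at rate k_death * x.
    Returns lists of event times and copy numbers."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while t < t_end:
        a_birth = k_birth
        a_death = k_death * x
        a_total = a_birth + a_death
        if a_total == 0:
            break
        # Exponentially distributed waiting time to the next event.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose the firing reaction proportionally to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states
```

Validating only against such simulator output demonstrates internal consistency, not robustness to the model misspecification that real assay data would introduce.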

M2. Scalability to Realistic Parameter Dimensions. The largest experimental case involves Nθ = 8 parameters (Section 7.2). Industrial bioprocess models (e.g., metabolic flux models, multi-scale kinetic models) routinely involve tens to hundreds of parameters. The finite-sample bound in Theorem 4 scales as Nθ^(9/2) · λ_max^(3/2), which implies rapid degradation of approximation quality with increasing dimensionality. This scaling behaviour is concerning and is only cursorily acknowledged in the concluding remarks. The paper should provide either empirical evidence of performance at higher Nθ or a more transparent discussion of the dimensional limitations.

M3. Prior Sensitivity and Prior Selection Justification. Throughout the experiments, uniform priors are used without justification or sensitivity analysis. In biomanufacturing Bayesian inference, prior specification is a critical methodological choice, particularly under sparse data regimes (H = 4, 8 observations). The manuscript does not investigate how inference quality varies with prior choice, nor does it provide guidance on principled prior elicitation from domain knowledge — an important practical omission given the paper's stated industrial motivation.
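The kind of sensitivity check requested here can be illustrated on a conjugate toy model (hypothetical numbers; the paper's benchmarks would require the full sampler). With only H = 4 observations, the posterior mean can shift markedly between a vague and a tight prior, which is exactly why the uniform-prior choice needs justification.

```python
import math

def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Conjugate Normal-Normal update: posterior mean and variance
    for an unknown mean with known observation noise."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

# Sparse-data regime analogous to H = 4 observations (toy values).
data = [4.2, 3.8, 4.5, 4.1]
priors = {"vague": (0.0, 100.0), "moderate": (0.0, 4.0), "tight": (0.0, 0.25)}
for name, (m0, v0) in priors.items():
    m, v = normal_posterior(m0, v0, data, noise_var=1.0)
    print(f"{name:8s} posterior mean = {m:.3f}, sd = {math.sqrt(v):.3f}")
```

An analogous sweep over prior families for θ in the enzyme-kinetics and gene-network cases would directly address this concern.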

M4. The LNA Approximation Error Accumulation. Section 4.3 acknowledges that LNA error accumulates over time and proposes sequential Bayesian updating as a mitigation. However, the magnitude of this accumulation as a function of SRN nonlinearity and observation interval length is not quantified empirically. Figure 4 shows that discontinuities arise in the inferred trajectories from the sequential updates. The paper does not characterise when these discontinuities are pathological versus benign, nor does it provide convergence diagnostics for the sequential updating procedure.

M5. Comparison Baseline Insufficiency. The primary algorithmic comparison is against ULA, a relatively weak baseline. The paper briefly compares against ABC-SMC (Appendix EC.8.2) but does not compare against other likelihood-based Bayesian methods that are competitive in this setting, including: (i) particle MCMC (Andrieu et al. 2010), (ii) NUTS/HMC-based posterior samplers, or (iii) variational inference approaches adapted to SDE systems. The absence of these comparisons limits the ability to assess RAPTOR-GEN's standing relative to the state of the art.

M6. Gaussian Measurement Error Assumption. The observation model (Equation 4) assumes additive Gaussian measurement noise with a diagonal covariance structure. Biomanufacturing assays (e.g., flow cytometry, HPLC) frequently exhibit non-Gaussian, heteroscedastic, and even count-distributed measurement errors. The paper does not discuss the sensitivity of the framework to violations of this assumption, nor propose extensions to non-Gaussian likelihoods.
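The contrast at issue can be sketched as follows: a Gaussian log-likelihood in the spirit of Equation 4 alongside a Poisson alternative for count-valued assays (both functions are reviewer illustrations, not from the manuscript).

```python
import math

def gaussian_loglik(y, mu, sigma):
    """Additive Gaussian observation model with constant noise sigma,
    in the spirit of the manuscript's Equation 4."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (yi - mui)**2 / (2 * sigma**2)
               for yi, mui in zip(y, mu))

def poisson_loglik(y, mu):
    """Count-distributed alternative for assays reporting event counts
    (e.g. flow cytometry); mean mu driven by the latent state."""
    return sum(yi * math.log(mui) - mui - math.lgamma(yi + 1)
               for yi, mui in zip(y, mu))
```

Because the LD-LNA machinery depends on the score of the log-likelihood, the authors should at least indicate whether such a swap preserves the tractability of their gradient computations.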

3. Minor Concerns

m1. The paper claims in Section 6 that "the two-stage structure of Algorithm 2 offers substantial gains in computational efficiency...further enhancements and systematic theoretical investigation are planned for future research." This constitutes an acknowledged incompleteness in the theoretical treatment of the two-stage variant. A convergence analysis of Algorithm 2, analogous to Theorems 5–6 for Algorithm 1, should be provided, or the gap explicitly bounded.

m2. The notation is occasionally inconsistent. The symbol Ω is used both as a bioreactor volume parameter and, implicitly, as a system-size scaling term. This dual use is standard, but the two roles are not clearly distinguished when LNA is generalised to LD in Section 5.2, where the analogy is approximate rather than exact.

m3. Figures 4 and 5 include 95% prediction intervals based on 1000 inferred trajectories, but the posterior predictive distribution used to generate these intervals is not described precisely in the main text. The relationship between posterior samples {θ^(b)} and the trajectory predictions warrants explicit clarification.
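One plausible reading, which the authors should confirm or correct, is that the bands are pointwise percentiles over forward simulations from posterior draws. The function below is a reviewer sketch under that assumption; `simulate` stands in for the authors' forward model.

```python
import random

def predictive_band(theta_samples, simulate, t_grid, level=0.95, seed=1):
    """Propagate posterior draws {theta^(b)} through a forward model
    and take pointwise percentiles of the simulated trajectories.
    One possible construction of the 95% bands in Figures 4-5."""
    rng = random.Random(seed)
    trajs = [simulate(theta, t_grid, rng) for theta in theta_samples]
    alpha = (1.0 - level) / 2.0
    lo, hi = [], []
    for j in range(len(t_grid)):
        col = sorted(tr[j] for tr in trajs)
        lo.append(col[int(alpha * (len(col) - 1))])
        hi.append(col[int((1.0 - alpha) * (len(col) - 1))])
    return lo, hi
```

Whether the forward simulations use the LNA metamodel or the exact SSA also matters here and should be stated.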

m4. Table 2 reports ULA "Solved Number" as low as 27/100 macro-replications for H = 16, c = 1, implying a 73% divergence rate. The paper attributes this to step-size sensitivity, which is a valid point, but the comparison may be unfair: with appropriate step-size selection (e.g., via dual averaging or adaptive MALA), ULA performance would improve substantially. The experimental design should either use adaptive step-size selection for ULA or justify its exclusion.
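For concreteness, an adaptively tuned baseline of the kind suggested could look like the following sketch: MALA with a Robbins-Monro step-size adaptation toward the roughly 0.574 acceptance rate that is optimal for Langevin proposals (a reviewer illustration, not the paper's algorithm).

```python
import math, random

def adaptive_mala(grad_logp, logp, x0, n_iter=5000,
                  target_accept=0.574, step0=0.1, seed=0):
    """Metropolis-adjusted Langevin algorithm (1-D) with step size
    adapted toward a target acceptance rate."""
    rng = random.Random(seed)
    x, step = x0, step0
    samples = []
    for i in range(1, n_iter + 1):
        g = grad_logp(x)
        prop = x + 0.5 * step**2 * g + step * rng.gauss(0.0, 1.0)
        gp = grad_logp(prop)
        # Metropolis-Hastings correction removes discretisation bias.
        log_q_fwd = -((prop - x - 0.5 * step**2 * g)**2) / (2 * step**2)
        log_q_bwd = -((x - prop - 0.5 * step**2 * gp)**2) / (2 * step**2)
        log_alpha = logp(prop) - logp(x) + log_q_bwd - log_q_fwd
        accept = math.log(rng.random()) < log_alpha
        if accept:
            x = prop
        # Robbins-Monro update with decaying gain 1/i.
        step *= math.exp(((1.0 if accept else 0.0) - target_accept) / i)
        samples.append(x)
    return samples, step
```

A comparison against a baseline tuned this way would make the 27/100 figure in Table 2 far more informative.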

m5. The discussion section is brief relative to the technical depth of the paper. Notably, the interpretability claims for pKG-LNA over black-box Gaussian processes (Section 4.2) are asserted but not empirically demonstrated. A comparative experiment illustrating interpretability advantages would strengthen this claim.

m6. The ABC-SMC comparison is relegated to an appendix (EC.8.2), yet represents the most relevant likelihood-free baseline. This comparison should be elevated to the main text.

m7. Several equations in Section 5.3 use notation that is introduced or clarified only in the electronic companion (EC). For a self-contained manuscript, the exponential family parameterisation (used in Theorem 4) should be more fully introduced in the main text.

m8. Citation of Xu and Xie (2024) — described as "preliminary work" — requires careful handling to ensure the current manuscript's contributions are clearly distinguished from the conference paper, particularly regarding the Bayesian updating pKG-LNA metamodel described in Section 4.3.

4. Methodology Critique

The methodological architecture of RAPTOR-GEN is technically sound and internally consistent. The LNA-based metamodel is a well-understood approximation in the systems biology literature (Fearnhead et al. 2014; Golightly and Wilkinson 2005), and the application to sequential Bayesian updating on pKG is a meaningful extension. The generalisation of LNA to the Langevin diffusion process (Section 5.2) is the most novel methodological contribution: replacing step-size discretisation of LD with an ODE-based characterisation of its stationary distribution is conceptually elegant and practically motivated.

However, several methodological concerns arise:

Reproducibility: The mathematical development is specified in enough detail to be reproduced. However, the experimental setup lacks certain details needed for full reproducibility: the specific ODE solver used in Algorithms 1 and 2, numerical precision settings, hardware specifications, and random seed protocols are not reported.

Assumption Verification: Assumptions 1–4 are regularity conditions imposed on the log-posterior and its derivatives. The paper does not verify these assumptions for the specific SRN models studied (enzyme kinetics, prokaryotic autoregulation), beyond asserting their plausibility. Given that these models involve nonlinear Michaelis-Menten and mass-action kinetics, the Lipschitz conditions (Assumption 4) and smoothness conditions (Assumption 2) should be verified or at least discussed in the context of these specific systems.

Unit System Size Assumption: The paper sets Ω := 1 throughout all experiments, corresponding to a unit-volume bioreactor. The theoretical convergence of LNA is asymptotic in Ω → ∞, and the authors acknowledge (Remark 3) that the bound from Grunberg and Del Vecchio (2023) applies to large Ω. Using Ω = 1 means the theoretical guarantees are not directly applicable to the experimental regime. The empirical robustness at Ω = 1, while demonstrated, requires more careful theoretical discussion.

Two-Stage Algorithm Convergence: The convergence of Algorithm 2 (two-stage) is not rigorously analysed. The claim that it achieves equivalent performance to Algorithm 1 is supported only empirically (Table 1) without proof.

5. Data Presentation Evaluation

The manuscript's figures are generally informative and professionally rendered. Figure 2 (framework architecture) provides a useful high-level overview. Figures 3, 4, and 5 effectively illustrate trajectory inference results.

Tables 1, 2, and 3 are appropriately structured with confidence intervals and comparable computational metrics. The use of wall-clock time as a computational measure is practical but hardware-dependent; reporting CPU time or floating-point operations per posterior sample would improve reproducibility.
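Reporting both measures is cheap; a minimal sketch (illustrative only, standard-library timers):

```python
import time

def timed(fn, *args):
    """Report both wall-clock and CPU time for one call.
    process_time excludes sleep and is less sensitive to
    background machine load than perf_counter."""
    w0, c0 = time.perf_counter(), time.process_time()
    out = fn(*args)
    wall = time.perf_counter() - w0
    cpu = time.process_time() - c0
    return out, wall, cpu
```

Per-posterior-sample CPU time, alongside the existing wall-clock figures, would let readers compare across hardware.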

The "Solved Number" metric in Table 2 is a useful innovation but requires more explicit definition in the table caption — specifically, what constitutes a "solved" versus "unsolved" run and whether unsolved runs are excluded from RMSE calculations (they are, per footnote (ii), which warrants discussion of selection bias).

The violin plots in Figure EC.2 are appropriate for posterior distribution visualisation, though their placement in the appendix reduces their visibility given their informational value.

One significant data presentation concern: the manuscript reports RMSE of σ as high as 16.45 ± 0.77 (Table 1, H = 4) when the true value is σ = 4, i.e. an estimate over four times the true value (a relative error of roughly 310%). While the paper frames this as acceptable given extreme data sparsity, no discussion is provided of whether such uncertainty in measurement-error estimation has downstream consequences for inference of the mechanistic parameter θ3.

6. Contribution and Novelty Assessment

The manuscript makes contributions across three levels:

Substantial: The generalisation of LNA to Langevin diffusion for posterior sampling (LD-LNA) is genuinely novel and theoretically well-developed. The finite-sample Wasserstein bound in Theorem 4 and its explicit dependence on data sparsity structure (Corollary 2) are meaningful advances over existing BvM-based analysis that typically assumes dense data.

Moderate: The sequential Bayesian updating pKG-LNA metamodel extends existing LNA-based inference (Fearnhead et al. 2014) to heterogeneous, partially observed, multi-scale data. The modular pKG formulation is well-motivated but builds substantially on prior work from the same research group (Zheng et al. 2024; Wang et al. 2024c).

Incremental: The practical algorithm development (Algorithms 1 and 2) is largely an engineering contribution, applying standard ODE discretisation schemes to the theoretical framework. The two-stage decomposition is elegant but not conceptually surprising once the framework is established.

The paper's claimed connection to generative AI (Section 2) via Langevin diffusion is noted but superficial — the connection to DDPM/score-based models is mentioned only to differentiate RAPTOR-GEN, and this framing adds minimal value to the core contribution.

Practical Framework Value: The biomanufacturing application domain is clinically and industrially important, and the paper's framing around digital twin development for flexible bioprocess control is timely. However, without real data validation, the practical relevance remains aspirational.

7. Publication Suitability Scores

Dimension                        Score (1–10)
Originality                      7
Methodological rigour            7
Practical relevance              5
Clarity of writing               6
Overall publication readiness    6

8. Editorial Recommendation

Major Revisions

The manuscript presents a technically sound and theoretically innovative framework with clear potential for high-impact publication. However, the absence of real experimental data validation, insufficient scalability characterisation, limited comparison baselines, and underdeveloped practical relevance discussion require substantial revision before acceptance. The theoretical apparatus is strong; the empirical and applicability cases need commensurate strengthening.

9. Revision Roadmap

Mandatory Revisions (Prior to Resubmission):

  1. Real Data Validation: Include at least one case study using publicly available or proprietary experimental biomanufacturing data (e.g., published CHO cell culture datasets, iPSC expansion data). Quantify inference performance against known or partially known ground truth.

  2. Scalability Analysis: Conduct empirical experiments with Nθ ≥ 20 and Nθ ≥ 50 parameters. Report how RMSE, λ_max, and computational cost scale with dimensionality. Discuss the regime in which LD-LNA degrades and the MALA corrector (EC.7) becomes necessary.

  3. Prior Sensitivity Analysis: Conduct sensitivity experiments across at least three prior specifications for each benchmark case. Provide practical guidance on prior elicitation from domain knowledge.

  4. Expanded Baseline Comparisons: Include comparison against at least one of: particle MCMC, NUTS/HMC, or a variational inference baseline. Move the ABC-SMC comparison to the main text.

  5. Convergence Analysis for Algorithm 2: Either prove convergence guarantees for the two-stage algorithm analogous to Theorems 5–6, or provide a formal bound on the gap between Algorithm 1 and Algorithm 2 outputs.

  6. LNA Approximation Quality Characterisation: Provide a systematic analysis of how LNA approximation error depends on observation interval length and system nonlinearity. Identify operating conditions under which sequential updating is sufficient versus insufficient.

Recommended Revisions:

  1. Assumption Verification: Verify Assumptions 1–4 analytically or numerically for the SRN models studied.

  2. Non-Gaussian Observation Model: Discuss the framework's extension or sensitivity to non-Gaussian measurement error distributions.

  3. Clarify Trajectory Prediction Methodology: Add a precise description of how posterior samples are propagated to trajectory predictions in Figures 4 and 5.

  4. Distinguish from Preliminary Conference Paper: Add a formal statement explicitly delineating contributions beyond Xu and Xie (2024).

  5. Improve Writing Clarity: Sections 5.2–5.3 in particular benefit from additional expository text between mathematical derivations. The Introduction should more precisely state what is proven versus what is demonstrated empirically.

  6. Hardware/Reproducibility Details: Report computational hardware, ODE solver specifications, and random seed protocols to enable reproducibility.

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they did not use generative AI to come up with new ideas for their review.