Manuscript Title: Accelerated Medicines Development using a Digital Formulator and a Self-Driving Tableting DataFactory
Manuscript ID: 2503_17411v1 (Preprint)
Review Type: Double-blind peer review simulation for high-impact pharmaceutical sciences journal
This manuscript presents an ambitious and technically sophisticated integrated platform for accelerated pharmaceutical tablet formulation and process development, combining a Digital Formulator — built on a hybrid system of data-driven and mechanistic predictive models — with a Self-Driving Tableting DataFactory employing physics-informed and multi-output Bayesian optimisation. The authors demonstrate the platform across nine case studies spanning six APIs and multiple drug loadings, achieving in-specification tablet development in under six hours using less than five grams of API.
The work represents a substantive and timely contribution to pharmaceutical digital transformation, addressing a well-articulated bottleneck in CMC development. The integration of crystallographic particle informatics descriptors from the Cambridge Structural Database into predictive models for tablet properties is a notable methodological innovation. The closed-loop DMTA workflow, real-time XR-enabled quality monitoring, and documented system benchmarking collectively distinguish this work from prior literature on predictive tablet formulation.
However, the manuscript exhibits several significant methodological limitations that require resolution before publication. These include an insufficiently characterised training dataset and leave-API-out validation strategy, absence of dissolution or disintegration data as quality attributes, limited discussion of regulatory pathway implications and GMP incompatibility, constrained excipient space and fixed lubricant/disintegrant assumptions that may limit generalisability, and incomplete statistical treatment in several key comparisons. The writing is generally clear and technically proficient, though some sections would benefit from structural reorganisation and tighter argumentation. Subject to major revisions, this manuscript has strong potential for publication in journals such as the Journal of Pharmaceutical Sciences, International Journal of Pharmaceutics, or Digital Discovery.
MC1 — Dissolution and Bioavailability Quality Attributes Are Absent
The platform's optimisation objectives are restricted to porosity, tensile strength, and flowability. These are manufacturability surrogates, not therapeutic performance indicators. The absence of dissolution rate, disintegration time, or drug release profiling as either constraints or outputs represents a fundamental gap in the demonstrated capability of the platform. For a system positioned as a material-to-product solution aligned with Quality by Design (QbD) and Quality by Digital Design (QbDD) principles, the omission of biopharmaceutical performance attributes is difficult to justify. The authors do briefly acknowledge the potential impact of MCC on dissolution, but do not integrate this into the optimisation framework. This must be addressed substantively, either by including dissolution data for validation formulations or by explicitly scoping the platform's applicability as limited to manufacturability screening with a clear argument for why this is sufficient in the stated use contexts.
MC2 — Dataset Characterisation and Model Generalisability Are Insufficiently Established
The training dataset of 113 tablet formulations (653 data points) is drawn from historical experimental records involving seven APIs, placebo blends, and varying drug loadings. However, the manuscript does not adequately report the distribution of APIs, excipient combinations, compaction pressures, or drug loadings within the training set. The leave-API-out validation strategy uses only two APIs (SP and GR) as the holdout set, representing a narrow assessment of the model's capacity to generalise to structurally and physicochemically diverse unseen APIs. The manuscript does not report the chemical or mechanical diversity within the training set, nor does it discuss whether the test APIs represent interpolation or extrapolation within the feature space. This is a critical methodological weakness given that the platform's primary value proposition is prediction for novel, previously unseen APIs. The authors should provide a more rigorous characterisation of the training data distribution and employ additional validation APIs, or at minimum, discuss the known limitations of generalisation based on dataset composition.
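To make this request concrete, one low-cost analysis would compare nearest-neighbour distances in standardised descriptor space between the holdout APIs and the training APIs. A minimal sketch, with placeholder matrices standing in for the CSD-derived descriptors (none of the values below are drawn from the manuscript):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

# Hypothetical descriptor matrices: rows = APIs, columns = CSD-derived descriptors.
X_train = np.random.rand(7, 12)   # placeholder for the training APIs
X_test = np.random.rand(2, 12)    # placeholder for the holdout APIs (SP, GR)

# Standardise on the training distribution only, to avoid leakage.
scaler = StandardScaler().fit(X_train)
Z_train, Z_test = scaler.transform(X_train), scaler.transform(X_test)

# Distance from each test API to its nearest training API, compared against
# the distribution of within-training nearest-neighbour distances.
nn = NearestNeighbors(n_neighbors=2).fit(Z_train)
d_train, _ = nn.kneighbors(Z_train)      # column 1 skips the self-distance
d_test, _ = NearestNeighbors(n_neighbors=1).fit(Z_train).kneighbors(Z_test)

print("within-training NN distances:", d_train[:, 1])
print("test-to-training NN distances:", d_test[:, 0])
# Test distances well beyond the training distribution indicate extrapolation.
```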
MC3 — The Excipient Space Is Severely Constrained and May Limit Practical Utility
The in-silico optimisation is restricted to six excipients across three functional categories, with lubricant (MgSt at 3.5%) and disintegrant (CCS at 1%) held constant. The justification for this simplification — that it "reflects standard industry practice" — is inadequate without a systematic sensitivity analysis demonstrating that lubricant and disintegrant concentrations do not materially influence the optimisation outcome. Lubricant type and concentration are well-established determinants of tablet hardness, dissolution, and manufacturing robustness. The absence of alternative lubricants and variable disintegrant concentrations in the optimisation space significantly constrains the platform's scope. This limitation must be more rigorously quantified and its implications for platform generalisability explicitly acknowledged.
MC4 — Two Case Studies (DM 20% and GR 20%) Fail to Meet Flowability Criteria
The authors report that measured FFC values for DM (20% w/w) and GR (20% w/w) formulations fail to meet the required flowability threshold (FFC > 4). While this is disclosed in the text, the manuscript does not adequately investigate the root cause of these failures or explore remedial strategies within the platform framework. Two failed cases in nine represent a failure rate of approximately 22%, which is non-trivial. The explanation offered — measurement uncertainty and conservative model assumptions — requires more rigorous quantitative support. The authors should provide a structured root-cause analysis and discuss whether this failure pattern suggests systematic limitations in model extrapolation to specific API classes.
MC5 — Regulatory Pathway and GMP Readiness Are Inadequately Addressed
The manuscript explicitly acknowledges that the Tableting DataFactory is not GMP-compliant but positions the system for use in early-phase clinical trials, dose titration, and personalised medicine. This juxtaposition requires substantially more regulatory analysis. ICH Q8(R2), Q9, and Q10 frameworks, which underpin the QbDD concept cited in the manuscript, impose specific requirements on design space definition, process validation, and risk management that are not discussed. The authors' claim of alignment with QbDD principles should be substantiated with reference to specific regulatory expectations, and the barriers to translating the described platform into a regulated environment must be examined with greater precision.
mn1 — Blend Homogeneity Assessment Is Qualitative
Hotelling's T² analysis is deployed for blend homogeneity monitoring, but the current implementation is qualitative rather than quantitative. Two outlier iterations (3 and 36) are identified but not investigated or linked to downstream tablet quality outcomes. A quantitative content uniformity acceptance criterion aligned with ICH Q6A or USP <905> would substantially strengthen this section.
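To illustrate the kind of quantitative criterion intended, USP <905> defines the acceptance value AV = |M − X̄| + ks, with a first-stage limit of L1 = 15.0 (n = 10 units, k = 2.4). A minimal sketch for case 1 of the chapter (target content T ≤ 101.5% of label claim); the example contents are illustrative only:

```python
import statistics

def usp905_acceptance_value(contents_pct, k=2.4, L1=15.0):
    """First-stage USP <905> acceptance value (n = 10 units, case 1: T <= 101.5%).

    contents_pct: individual unit contents as % of label claim.
    Returns (AV, passes_stage_1).
    """
    xbar = statistics.mean(contents_pct)
    s = statistics.stdev(contents_pct)
    # Reference value M per USP <905>, case 1:
    if xbar < 98.5:
        m = 98.5
    elif xbar > 101.5:
        m = 101.5
    else:
        m = xbar
    av = abs(m - xbar) + k * s
    return av, av <= L1

# Illustrative contents for 10 units (% of label claim).
av, ok = usp905_acceptance_value([99.1, 100.4, 98.7, 101.2, 99.8,
                                  100.9, 98.9, 100.1, 99.5, 100.6])
print(f"AV = {av:.2f}, stage 1 pass: {ok}")
```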
mn2 — Inconsistency in Figure Labelling
Figure S9 references panels labelled "(a) (a)" rather than "(a) (b)", suggesting an editorial error in the Supporting Information. Additionally, Figure S12 contains a y-axis label describing tablet porosity as "mg" rather than the dimensionless unit (-), which is a factual error requiring correction.
mn3 — The Term "Quality by Digital Design (QbDD)" Requires Citation and Definition
QbDD is employed as a conceptual framework throughout the manuscript but is not formally defined or attributed to a primary reference. If this is an established framework in the literature, it should be cited; if the authors are proposing it, they should state this explicitly.
mn4 — API Abbreviations Are Not Consistently Defined
Abbreviations SP, AS, DM, GR, IM, and MH are introduced in the Methods section but used throughout the main text prior to their formal definition, creating ambiguity for readers approaching the article linearly.
mn5 — Table 2 Comparison Methodology Is Not Sufficiently Rigorous
The comparison between the Self-Driving Tableting DataFactory and "state-of-the-art conventional methods" in Table 2 would benefit from citation of primary sources for each conventional metric and a discussion of measurement equivalence. Some cells are marked "Not applicable," which, while honest, reduces the analytical value of the comparison.
mn6 — The XR Section Lacks Quantitative Usability or Outcome Data
Section 5 on extended reality integration is largely descriptive. No quantitative data are provided on error detection rates, decision latency, or operator performance improvements attributable to AR/MR. The contribution of this component to quality outcomes should be substantiated or framed explicitly as a qualitative proof-of-concept with appropriate hedging.
mn7 — Cross-reference Errors Exist in the Manuscript
Section 4.1 references "Figure 4 in Section 3," but the figure described appears to be Figure 4 in Section 2. The Methods section references equation numbers (Eq. 3 and Eq. 4) that correspond to Kawakita and Ryshkewitch-Duckworth models but are numbered inconsistently relative to Equation 1 in Section 7.2.2, suggesting equation numbering was not harmonised across sections.
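For the authors' convenience when harmonising the equation numbering, the standard literature forms of the two models (notation may differ from that used in the manuscript) are:

```latex
% Kawakita: degree of volume reduction C at applied pressure P,
% with material constants a and b; commonly fitted in linearised form.
C = \frac{V_0 - V}{V_0} = \frac{abP}{1 + bP},
\qquad
\frac{P}{C} = \frac{1}{ab} + \frac{P}{a}

% Ryshkewitch-Duckworth: tensile strength sigma decays exponentially
% with porosity epsilon from the zero-porosity strength sigma_0.
\sigma = \sigma_0 \, e^{-k\varepsilon}
```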
The methodological architecture of this manuscript is ambitious and, in large part, technically sound, but several aspects require more rigorous treatment.
Hybrid Modelling System: The two-stage hybrid modelling approach — mixture models feeding into process models — is conceptually well-founded and builds on an existing published framework by the same group. The use of an ensemble of 20 DNNs to estimate predictive uncertainty is appropriate and well-motivated. However, the architecture description (two hidden layers, 128 units per layer, ReLU activations) suggests a relatively shallow network, and no hyperparameter optimisation procedure or architecture search is reported. It is unclear whether this architecture was selected empirically, by grid search, or by domain knowledge. The manuscript should report the hyperparameter selection methodology and any associated sensitivity analysis.
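For reference, an ensemble of this kind is straightforward to reproduce and audit. A minimal sketch matching the reported architecture (20 members, two hidden layers of 128 ReLU units), with placeholder data standing in for the authors' 653-point dataset:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((653, 30))   # placeholder: 653 data points, 30 features
y = rng.random(653)         # placeholder target (e.g., porosity)

# Ensemble of 20 MLPs matching the reported architecture; disagreement
# across independently initialised members is the uncertainty estimate.
ensemble = [
    MLPRegressor(hidden_layer_sizes=(128, 128), activation="relu",
                 max_iter=2000, random_state=seed).fit(X, y)
    for seed in range(20)
]
preds = np.stack([m.predict(X) for m in ensemble])
mean, std = preds.mean(axis=0), preds.std(axis=0)   # prediction and uncertainty
```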
Leave-API-Out Validation: The leave-API-out approach is the correct methodology for assessing generalisation to novel APIs, but its implementation is underpowered. With only two holdout APIs (SP and GR), one of which (SP) constitutes a substantial fraction of the training data at lower drug loadings and one of which (GR) failed FFC validation, the assessment of generalisation capability is limited. The test R² values (0.95 for porosity, 0.90 for tensile strength, with CSD descriptors) are encouraging but should be interpreted cautiously given this limited test set composition.
PIBO Convergence and Termination Criteria: The physics-informed Bayesian optimisation convergence criterion — less than 20% change in tuning parameters across two consecutive iterations — is a user-defined heuristic. The sensitivity of final predictions to this threshold is not evaluated. It is plausible that premature termination could yield suboptimal calibration in cases with non-monotonic convergence behaviour. The authors should present a sensitivity analysis or at minimum show that the convergence criterion produces stable results across the case studies.
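Such an analysis could be performed retrospectively on the logged trajectories at no experimental cost. A minimal sketch of the stopping rule as this reviewer understands it (relative change in every tuning parameter below a threshold for two consecutive iterations), swept over candidate thresholds on placeholder data:

```python
import numpy as np

def stop_iteration(history, threshold=0.20):
    """Return the first iteration index satisfying the stopping rule, or None.

    history: array of shape (n_iterations, n_params) of tuning-parameter values.
    Rule (as described): relative change in every parameter below `threshold`
    for two consecutive iterations.
    """
    h = np.asarray(history, dtype=float)
    rel = np.abs(np.diff(h, axis=0)) / (np.abs(h[:-1]) + 1e-12)
    small = (rel < threshold).all(axis=1)
    for i in range(1, len(small)):
        if small[i - 1] and small[i]:
            return i + 1
    return None

# Retrospective threshold sweep over a logged trajectory (placeholder data).
traj = np.cumsum(np.random.default_rng(1).normal(0, 0.05, (12, 4)), axis=0) + 1.0
for thr in (0.05, 0.10, 0.20, 0.30):
    print(f"threshold {thr:.2f}: stop at iteration {stop_iteration(traj, thr)}")
```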
MOBO Sample Efficiency: The MOBO framework requires 40 total experiments (15 LHS initial DoE plus 25 BO iterations) to characterise a three-dimensional input space with three outputs. This is a substantial experimental burden relative to a classical Design of Experiments approach for the same dimensionality. The authors should benchmark this against a full factorial or central composite design at the same experimental budget, particularly given their stated emphasis on resource efficiency.
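To make the suggested benchmark concrete: for three factors, a central composite design comprises 2^3 factorial points, 2 × 3 axial points, and a handful of centre replicates, i.e. roughly 20 runs versus the 40 reported. A minimal sketch of the run-count comparison and of a 15-point LHS initialisation using SciPy's quasi-Monte Carlo module (factor bounds are illustrative assumptions, not the manuscript's):

```python
from scipy.stats import qmc

k = 3                                    # number of factors (3-D input space)
ccd_runs = 2**k + 2 * k + 6              # factorial + axial + 6 centre points
print(f"CCD runs: {ccd_runs} vs reported MOBO budget: 15 + 25 = 40")

# 15-point Latin hypercube initial design over illustrative factor bounds.
sampler = qmc.LatinHypercube(d=k, seed=0)
design = qmc.scale(sampler.random(n=15),
                   l_bounds=[50.0, 10.0, 6.0],
                   u_bounds=[300.0, 100.0, 14.0])
print(design.round(2))
```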
NIR-based Blend Homogeneity: The NIR homogeneity method relies on Hotelling's T² monitoring using a PCA model trained on blend spectra. The method is qualitative in its current implementation — no quantitative content uniformity prediction is performed in real-time, and outlier events are not acted upon in the described workflow. The content uniformity validation study in Section 3.7.1 is methodologically sound and the PLS model performance (R² = 0.92 for DS-corrected spectra) is acceptable, but this work is presented in Supporting Information rather than the main text. Given that content uniformity is a critical quality attribute in pharmaceutical manufacturing, this should be elevated to the main manuscript.
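For reference, a quantitative variant of the monitoring step would score incoming spectra against an F-distribution control limit rather than flagging outliers qualitatively. A minimal sketch on placeholder spectra (not a reconstruction of the authors' exact preprocessing):

```python
import numpy as np
from scipy.stats import f as f_dist
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
calib = rng.normal(size=(60, 200))   # placeholder: 60 calibration spectra
new = rng.normal(size=(5, 200))      # placeholder: incoming blend spectra

k = 3                                 # retained principal components
pca = PCA(n_components=k).fit(calib)

def hotelling_t2(spectra):
    # T² = sum of squared scores weighted by the component variances.
    scores = pca.transform(spectra)
    return np.sum(scores**2 / pca.explained_variance_, axis=1)

# 95% control limit for PCA-based T² with n calibration samples.
n = calib.shape[0]
ucl = k * (n - 1) * (n + 1) / (n * (n - k)) * f_dist.ppf(0.95, k, n - k)

flags = hotelling_t2(new) > ucl       # True = out-of-control blend spectrum
print(flags)
```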
Reproducibility: The experimental platform involves bespoke 3D-printed components, custom robotic programming, and proprietary communication protocols. While individual instruments are commercially available, the integrated system as described would be difficult to reproduce without substantial technical infrastructure. The manuscript would benefit from a data availability and platform reproducibility statement, and potentially from release of the SCU code framework or sufficient documentation to enable replication.
The manuscript's figures are generally well-constructed and informationally dense. Figure 1 provides an effective high-level overview of the integrated workflow. Figure 3's scatter plots with overlaid standard deviation colour bars are an effective means of communicating prediction uncertainty. Figure 4's heatmap representation of optimised excipient compositions across APIs is intuitive and interpretively valuable.
However, several data presentation issues require attention. Figure 7 presents normalised tuning parameter trajectories for all nine case studies within four subplots, resulting in considerable visual complexity and overlapping traces that are difficult to distinguish. Disaggregating these across multiple panels or presenting summary statistics (mean convergence profile with confidence interval) would improve readability. Figure 8's heatmaps are small and the axis annotations are rendered at a scale that may be illegible in print.
Table 1 effectively presents prediction errors with directional colour coding, but the legend for the colour scale could be more precisely defined. The distinction between Prediction Error 1 and Prediction Error 2 is well-conceived and represents a genuinely useful framing of sequential model refinement.
The Supporting Information is extensive and well-organised, and the inclusion of calibrated versus initial compressibility and compactability profiles (Figure S16) provides important context for interpreting the PIBO performance. The decision to relegate content uniformity validation data (Figure S13) to Supporting Information should be reconsidered.
The absence of time-resolved dissolution profiles, disintegration data, or any biopharmaceutical performance data constitutes the most significant gap in the data presented.
The novelty and contribution of this work are substantive and merit recognition on multiple dimensions. The integration of crystallographic and particle informatics descriptors from the Cambridge Structural Database as inputs to tablet property prediction represents a meaningful methodological advance that, to the reviewer's knowledge, has not been systematically demonstrated in this context. The full-stack integration of in-silico formulation optimisation with a closed-loop, autonomous physical manufacturing and testing system — as demonstrated across multiple APIs — advances the self-driving laboratory concept into a domain where it has been largely aspirational.
The platform's performance metrics are compelling: sub-six-hour formulation-to-tablet development timelines with less than five grams of API substantially reduce the resource burden of early CMC development. The PIBO framework's convergence in five to six iterations demonstrates genuine sample efficiency in process optimisation.
The XR integration, while qualitatively novel in its pharmaceutical manufacturing application, is the weakest component in terms of demonstrated contribution, as no quantitative benefit to quality outcomes is documented.
The contribution is assessed as moderate-to-substantial for the pharmaceutical sciences and digital manufacturing fields. The platform represents an integrated system-level advance rather than a single isolated methodological innovation, and its practical relevance to industry is clear. Some components — particularly the hybrid modelling approach — build directly on prior published work from the same group, and the incremental contribution of individual subsystems over the prior art should be more precisely delineated.
Scores (1–10)
Originality: 8
Methodological Rigor: 6
Practical Relevance: 9
Clarity of Writing: 7
Overall Publication Readiness: 6
Recommendation: Major Revisions
The manuscript presents work of genuine scientific significance and practical relevance, and the core experimental programme is well-executed. However, the absence of dissolution data as a quality attribute, insufficient validation dataset characterisation, unresolved failure cases, limited regulatory analysis, and several methodological gaps require substantive revision before the manuscript can be recommended for publication. The authors are encouraged to address the concerns enumerated below systematically. Upon satisfactory revision, the manuscript is likely to be suitable for publication in a high-impact pharmaceutical sciences or pharmaceutical engineering journal.
The following structured revisions are recommended, listed in approximate order of priority:
Priority 1 — Biopharmaceutical Quality Attributes: Include dissolution or disintegration data for at least a representative subset of the validated formulations. If dissolution testing is outside the current platform scope, provide a clearly reasoned scientific justification and explicitly scope the platform's utility as a manufacturability screening tool, with a forward-looking statement on integration of dissolution prediction.
Priority 2 — Training Dataset Characterisation: Provide a comprehensive characterisation of the training dataset including API identity distribution, drug loading range distribution, excipient space coverage, and compaction pressure distribution. Report chemical diversity metrics or fingerprint-based similarity analyses to contextualise the generalisation assessment.
Priority 3 — Expanded Leave-API-Out Validation: If additional API data are accessible, expand the holdout validation set. If not, provide a principled analysis of the structural and physicochemical distance between training APIs and test APIs using appropriate descriptors, to contextualise the generalisability claims.
Priority 4 — Root Cause Analysis for DM and GR Failure Cases: Provide a structured root cause analysis for the two FFC validation failures. Assess whether these reflect model extrapolation limitations, measurement artefacts, or formulation design constraints, and discuss implications for platform reliability.
Priority 5 — Lubricant and Disintegrant Sensitivity Analysis: Conduct and report a sensitivity analysis evaluating the impact of lubricant concentration and type (and disintegrant concentration) on porosity, tensile strength, and flowability predictions; a minimal in-silico sketch is given after this list. This is necessary to justify the simplification of holding these parameters constant.
Priority 6 — Regulatory Framework Alignment: Expand the regulatory implications section to address specifically ICH Q8(R2) design space requirements, Q9 risk management considerations, and the pathway from proof-of-concept to regulated early-phase manufacturing. Clarify what additional development would be required to bring the platform into a GMP-aligned operating model.
Priority 7 — PIBO Convergence Sensitivity: Report a sensitivity analysis of the PIBO convergence threshold (20% change criterion), demonstrating that final predictions are robust to the choice of this hyperparameter.
Priority 8 — DNN Architecture Justification: Report the methodology used to select the DNN architecture (layer count, unit count, activation function) and whether hyperparameter optimisation was performed.
Priority 9 — Content Uniformity to Main Text: Elevate the content uniformity validation study (Section 3.7.1 of Supporting Information) to the main manuscript, as this constitutes a critical quality attribute validation and should not be relegated to supplementary material.
Priority 10 — Quantitative XR Evaluation: Either provide quantitative outcome data (error detection rates, decision latency, operator performance) for the XR section or reframe this component explicitly as a qualitative proof-of-concept demonstration, with appropriate hedging language that avoids overstating its demonstrated contribution.
Priority 11 — Editorial and Formatting Corrections: Correct the figure label error in Figure S9 (duplicate "(a)" panel labels), the unit error in Figure S12 (porosity axis labelled "mg"), equation numbering inconsistencies, API abbreviation placement, and cross-reference errors in Section 4.1.
Priority 12 — QbDD Formal Definition: Define "Quality by Digital Design" explicitly in the manuscript and provide appropriate primary citations, or assert that this is a novel framing introduced by the authors.
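Regarding Priority 5, the requested sensitivity analysis could be performed entirely in silico against the trained predictive models before committing any experiments. A minimal sketch using Sobol indices via SALib; the variable names, bounds, and surrogate model below are illustrative assumptions only:

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 2,
    "names": ["MgSt_pct", "CCS_pct"],     # lubricant and disintegrant levels
    "bounds": [[0.25, 5.0], [0.5, 5.0]],  # illustrative ranges, % w/w
}

params = saltelli.sample(problem, 1024)

def surrogate(x):
    # Placeholder standing in for a trained tensile-strength predictor.
    return 2.0 - 0.15 * x[:, 0] - 0.02 * x[:, 1]

Y = surrogate(params)
Si = sobol.analyze(problem, Y)
print(Si["S1"], Si["ST"])   # first-order and total-order Sobol indices
```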
The authors declare that they have no competing interests.
The authors declare that they did not use generative AI to come up with new ideas for their review.