- Does the introduction explain the objective of the research presented in the preprint?
- Yes
- The introduction clearly states the three prediction targets (LOS, in-hospital mortality, 60-day mortality), justifies the clinical need, situates the work relative to prior literature, and specifies the N3C dataset.
- Are the methods well-suited for this research?
- Neither appropriate nor inappropriate
- The overall framework (retrospective cohort, multiple ML models, SMOTE, TRIPOD reporting) is reasonable. However, three choices deviate meaningfully from best practice: remdesivir is included as a predictor without verifying its temporal ordering relative to the prediction time, decision thresholds are not optimized, and there is no temporal validation.
- Are the conclusions supported by the data?
- Somewhat supported
- The mortality findings and the SMOTE tradeoff conclusion are well supported, and the LOS conclusion is also credible. However, the claim that these models could inform pandemic preparedness is difficult to support, because the data are pooled across five years and the models lack external validation.
- Are the data presentations, including visualizations, well-suited to represent the data?
- Highly appropriate and clear
- The ROC curves, SHAP beeswarm plots, and tables are appropriate choices for this type of ML study and are generally readable.
- How clearly do the authors discuss, explain, and interpret their findings and potential next steps for the research?
- Somewhat clearly
- The discussion is one of the stronger sections: the authors engage honestly with the SMOTE tradeoff, the LOS null finding, and the remdesivir confounding issue. However, the temporal pooling limitation is acknowledged but not fully explored, and the discussion does not translate AUROC values into clinical terms (for example, what an AUROC of 0.72 means at the bedside).
- Is the preprint likely to advance academic knowledge?
- Somewhat likely
- The empirical SMOTE tradeoff finding and the honest documentation of the structured EHR data ceiling are genuine contributions to clinical ML methodology. The N3C cohort characterization by remdesivir exposure also adds value for future causal inference work.
- Would it benefit from language editing?
- No
- The writing is clear, precise, and well-organized throughout.
- Would you recommend this preprint to others?
- Yes, but it needs to be improved
- The infrastructure, transparency, and methodological honesty make it worth reading, but the remdesivir temporal-ordering issue and the temporal pooling concern need to be addressed before the conclusions can be fully trusted.
- Is it ready for attention from an editor, publisher or broader audience?
- No, it needs a major revision
- A sensitivity analysis excluding remdesivir, some form of temporal validation, and subgroup performance reporting by race/ethnicity are needed before this is ready for publication. These are addressable revisions, not fatal flaws, but they are major enough that the current version should not go forward as it is.
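To make the threshold-optimization and bedside-translation points above concrete, here is a minimal sketch of what such an analysis could report. It is not the authors' pipeline: the scores and labels are invented toy data, the function names are hypothetical, and Youden's J is only one possible threshold criterion (a cost-weighted criterion may suit the clinical setting better).

```python
# Illustrative sketch only: toy data, hypothetical helper names.

def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_at(scores, labels, threshold):
    """Confusion counts when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_threshold(scores, labels):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1.
    Assumes both classes are present in `labels`."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_at(scores, labels, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def bedside_summary(scores, labels, threshold):
    """Sensitivity, specificity, PPV, and NPV at a fixed threshold.
    PPV/NPV are None when no case falls on that side of the threshold."""
    tp, fp, tn, fn = confusion_at(scores, labels, threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        "npv": tn / (tn + fn) if (tn + fn) else None,
    }


# Invented predicted risks and outcomes (1 = died, 0 = survived).
toy_scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90]
toy_labels = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1]

chosen = youden_threshold(toy_scores, toy_labels)
summary = bedside_summary(toy_scores, toy_labels, chosen)
print(f"AUROC={auroc(toy_scores, toy_labels):.3f}, threshold={chosen}, {summary}")
```

Reporting sensitivity, specificity, PPV, and NPV at a stated threshold, ideally on a temporally held-out test set and within each race/ethnicity subgroup, would let readers judge how many flagged patients actually experience the outcome, which a bare AUROC cannot convey. PPV and NPV also depend on outcome prevalence, so they should be reported for the cohort in which the model would be used.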
Competing interests
The author declares that they have no competing interests.
Use of Artificial Intelligence (AI)
The author declares that they did not use generative AI to come up with new ideas for their review.