Short Summary
In this preprint, Iles et al. investigate whether the hatching status of blastocysts affects the information content of spent blastocyst media (SBM) as measured by MALDI-ToF mass spectrometry and used in machine-learning (ML) models to predict implantation. They retrospectively analyzed 570 day-5 embryo culture media samples (283 hatched, 287 unhatched) with known implantation outcomes. Using their custom spectral-pattern ML pipeline (EvA-3), the authors report that hatched embryos contributed 44 discriminative mass-spectral features (versus 12 in unhatched) to the implantation-prediction model. The ML model trained on hatched-embryo data achieved a substantially higher area under the ROC curve than the unhatched model. In other words, hatched embryos yielded richer, more informative metabolomic profiles, leading to better discrimination of eventual implanting versus non-implanting embryos. The authors conclude that mass-spectral profiling of SBM is more powerful when embryos are allowed (or assisted) to hatch, and they propose that transferring hatched blastocysts could improve noninvasive embryo selection accuracy. This work extends prior efforts using MALDI-ToF secretome profiling for embryo selection by highlighting hatching as a critical pre-transfer variable.
The study’s strengths include a relatively large cohort of prospectively collected clinical samples from a single IVF center (final N=570), and a clear focus on a novel question: how procedural variables like hatching status influence SBM metabolomic signatures. The analytical pipeline (MALDI-ToF with Savitzky–Golay smoothing, peak extraction, and the EvA-3 scoring algorithm) is described in detail. The approach is timely given the field’s interest in noninvasive embryo selection methods.
Major Issues
Data selection and transparency of modeling. It is unclear why only 286 of the 570 eligible samples (210 hatched, 76 unhatched) were used to build the classification models. The manuscript states that 570 media samples met inclusion criteria, but Table 1 shows only 210 hatched and 76 unhatched samples in the predictive models. The fate of the remaining samples (approximately 284 total) is not explained. Were they held out for validation, filtered by quality, or excluded for other reasons? The absence of a clear description of data splitting (training vs. test or cross-validation) raises concerns about potential selection bias and overfitting. In particular, no independent test set or external validation is reported, so the high AUC values may reflect optimistic training‐set performance. The authors should clarify the workflow (whether they used cross-validation or withheld a test cohort) and report any validation procedure. The current description (“classification models were generated using EvA-3”) implies model training on all available data without held-out testing, which limits confidence in generalizability.
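For illustration only, a minimal sketch of what a held-out evaluation could look like, assuming a scikit-learn-style workflow; X and y are placeholders for the peak-feature matrix and implantation labels, and a generic classifier stands in for EvA-3:

    # Withhold a stratified test set so the reported AUC reflects samples
    # never seen during model building (placeholder data and model).
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"Held-out test AUC: {test_auc:.2f}")

Reporting performance in this form (or from nested cross-validation) would make the generalizability claim verifiable.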
Machine learning methodology and statistical evaluation. The EvA-3 algorithm is not a standard method (it is an evolutionary algorithm previously used by these authors). As described, EvA-3 selects peaks via iterative Wilcoxon tests and enrichment metrics. However, the manuscript does not benchmark this approach against simpler models (logistic regression, random forest) or report uncertainty measures (confidence intervals or p-values) for AUC differences. The reported AUCs should be tested for statistical significance given the limited sample sizes. No measures of variability (95% CI) are provided, and it is not stated how the ROC curves were constructed. Likewise, the chosen probability thresholds (60% for hatched and 30% for unhatched, each reported to give >95% discrimination) appear to be fitted on the same data and may therefore be over-optimistic.
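To make the benchmarking suggestion concrete, a hedged sketch follows (again with placeholder X and y); the point is only that simple baselines can be run under identical cross-validation for comparison:

    # Benchmark standard classifiers with identical stratified cross-validation
    # so the EvA-3 AUCs can be put in context (placeholder feature matrix X,
    # implantation labels y).
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    baselines = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
    }
    for name, clf in baselines.items():
        aucs = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
        print(f"{name}: mean AUC {aucs.mean():.2f} (SD {aucs.std():.2f})")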
Biological interpretation and potential confounders. The core biological claim is that hatching (breaking the zona pellucida) releases new molecules into the media, enriching the secretome. This is plausible and supported by prior work (detection of larger proteins only after hatching). However, an alternative explanation is that embryos which hatch early (by day 5) may simply be more developmentally advanced or inherently more viable. In this dataset, it is not reported whether hatched embryos had different baseline implantation rates than unhatched. (Table 1 suggests ~46% implanted in the hatched group and ~49% in the unhatched group, which is counterintuitive if hatched embryos are assumed “better”; this discrepancy deserves discussion.) If hatching correlates with blastocyst expansion or grade, then the observed spectral differences might partly reflect developmental stage or quality differences, not just ZP permeability. The manuscript would benefit from an analysis of potential confounders: were patient characteristics, culture times, or blastocyst grades similar between groups? Also, “assisted hatching” (AH) is mentioned in the Introduction but not clarified for this study. It should be specified whether these embryos underwent mechanical or laser AH, or whether hatching was purely spontaneous. Assisted hatching protocols could themselves alter the secretome and, if used non-randomly (e.g., for older patients), could bias results.
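On the baseline-rate point, a minimal check the authors could report is a contingency test of implantation versus hatching status. The counts below are only illustrative values back-calculated from the approximate percentages in Table 1 and should be replaced with the actual numbers:

    # Test whether baseline implantation rates differ between hatched and
    # unhatched groups (illustrative counts only, inferred from ~46% of 210
    # and ~49% of 76; use the true Table 1 counts).
    from scipy.stats import chi2_contingency

    table = [[97, 113],   # hatched: implanted, not implanted
             [37, 39]]    # unhatched: implanted, not implanted
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, p = {p:.3f}")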
Claims about clinical superiority vs. standard methods. The Discussion states that both hatched and unhatched models “show a much higher discrimination… compared to the currently used embryo quality grading” and suggests these ML predictions could supersede current IVF practice. However, no data are shown directly comparing the SBM models to standard morphology scores (Gardner grading or “BQS”). The text even remarks that a 60% probability threshold “supports applicability… superior to currently used” criteria. This seems speculative: unless the authors provide head-to-head metrics (AUC or implantation rates) for standard grading in their cohort, such claims go beyond the presented evidence.
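If morphology grades are available for the same embryos, a head-to-head comparison is straightforward. The sketch below uses a paired bootstrap of the AUC difference; y, sbm_score, and grade_score are hypothetical NumPy arrays for the outcomes, the SBM model score, and a numeric morphology grade:

    # Paired bootstrap of the AUC difference between the SBM model and a
    # standard morphology grade, evaluated on the same embryos
    # (y, sbm_score, grade_score are hypothetical NumPy arrays).
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    deltas = []
    n = len(y)
    for _ in range(2000):
        idx = rng.integers(0, n, n)
        if len(np.unique(y[idx])) < 2:   # resample must contain both classes
            continue
        deltas.append(roc_auc_score(y[idx], sbm_score[idx])
                      - roc_auc_score(y[idx], grade_score[idx]))
    lo, hi = np.percentile(deltas, [2.5, 97.5])
    print(f"AUC difference 95% CI: [{lo:.2f}, {hi:.2f}]")

A confidence interval for the difference that excludes zero would substantiate the superiority claim; without such a comparison it remains speculative.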
Reproducibility and conflicts of interest. The paper lacks any statement about data or code availability. The analysis relies on custom Python code and a proprietary EvA-3 algorithm. As noted, EvA-3 is a specialized “evolutionary” ML pipeline developed by Bioenhancer Systems. For reproducibility, it is essential to know whether the spectral-processing code or EvA-3 implementation can be accessed (the manuscript cites only published descriptions, not a repository). Additionally, two authors have commercial ties to Embryomic Ltd. (the CSO and a medical advisor) and hold related patents. This conflict of interest is disclosed in the preprint, but it underscores the need for transparent data sharing and independent validation.
Minor Issues
Clarity and editorial details. The manuscript would benefit from careful editing to improve readability. There are a few grammatical errors (e.g., “A number of extracted spectral features… were significantly higher (a total of 44 features, compared to 12 of unhatched embryo profiles) for the predictive model”; the sentence is confusingly punctuated). Phrases like “morphometry/ morphometrics have been applied” (Introduction) are slightly awkward. The abstract also contains punctuation slips, including a missing space after a period (“embryos.A number”). Consistent terminology should be used: the text alternates between “unhatched” and “un-hatched” (Figure 3 caption). In Figure 3 itself, the caption is clear, but the color labels (“peak intensity” vs “peak presence”) could be made more distinct for colorblind readers (one bar set is red, one is blue). The Table 1 labels “Nº pattern features” and “Nº samples” might confuse some readers; consider writing out “Number” for clarity.
Methodological details. A few protocol details could be elaborated. The criterion for spectral quality (“minimum of seven peaks with S/N >5”) is reasonable, but how many samples were rejected by this filter? Also, the justification for choosing m/z 379 as the alignment reference (the empirical “prevalence” of that peak) is given, but raw spectra or example quality plots would help assess consistency. In the Data Modeling section (2.4), the definition of the EvA-3 score and the logistic calibration is terse. It would aid readers to include the actual logistic equation or to clarify the “L, K, C” parameters in the supplementary material (these are mentioned but not shown). Finally, the authors mention exclusion of multiple pregnancies and spontaneous terminations; please state explicitly how many cases were excluded for these reasons, to verify the final N=570.
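For reference, one common three-parameter form consistent with the “L, K, C” naming is sketched below; this parameterization is an assumption on my part and should be confirmed against (or replaced by) the authors' actual definition:

    # Assumed form of a three-parameter logistic calibration of a raw score s:
    #   P(implantation) = L / (1 + exp(-K * (s - C)))
    # where L is the upper asymptote, K the steepness, and C the midpoint.
    # This is a guess at the parameterization, not the authors' stated equation.
    import numpy as np

    def logistic_calibration(s, L, K, C):
        """Map a raw EvA-3-style score s to a calibrated probability."""
        return L / (1.0 + np.exp(-K * (s - C)))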
Statistical reporting. The performance metrics (sensitivity, specificity, AUC) are given in Table 1 but without confidence intervals. Even though the primary focus is AUC, it is customary to report 95% CIs for AUC or to perform statistical tests for differences; reporting p-values or CIs would strengthen any claims of “significant” differences. The logistic calibration curves (Figure 4) are referenced but not shown in the manuscript PDF; please ensure that all figures are included and referenced correctly. Relatedly, when stating fold-changes of peak intensities (up to three orders of magnitude), clarify whether these are log10 units; the log scale on the left axis of Figure 3 implies so, but explicit labeling would help.
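As an example of how little extra work this requires, binomial (Wilson) intervals for sensitivity and specificity can be computed directly from the confusion counts; the counts below are placeholders, not values from Table 1:

    # Wilson 95% CIs for sensitivity and specificity from confusion counts
    # (placeholder counts; substitute the actual Table 1 numbers).
    from statsmodels.stats.proportion import proportion_confint

    tp, fn = 80, 20    # true positives, false negatives (placeholders)
    tn, fp = 70, 30    # true negatives, false positives (placeholders)
    sens_ci = proportion_confint(tp, tp + fn, alpha=0.05, method="wilson")
    spec_ci = proportion_confint(tn, tn + fp, alpha=0.05, method="wilson")
    print(f"sensitivity 95% CI: {sens_ci}, specificity 95% CI: {spec_ci}")

A bootstrap analogous to the one sketched under the clinical-superiority point above would give the corresponding interval for the AUC itself.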
Figure presentation. As a minor point, Figure 3 is informative but somewhat cluttered. The log-scale y-axis labels (10^3, 10^2, 10^1, etc.) could be formatted consistently (e.g., uniform scientific notation). In the text, it may help to refer to “Figure 3a” and “3b” (hatched vs. unhatched panels) for clarity, although the caption already distinguishes them. The captions should be self-contained; in particular, explain what “controls” means in “comparison to controls” (presumably non-implanted embryos).
The author declares that they have no competing interests.