This review is the result of a virtual, collaborative live review discussion organized and hosted by PREreview and JMIR Publications on July 25, 2025. The discussion was joined by 20 people: 2 facilitators from the PREreview Team, 1 member of the JMIR Publications team, and 17 live review participants. The authors of this review have dedicated additional asynchronous time over the course of two weeks to help compose this final report using the notes from the Live Review. We thank all participants who contributed to the discussion and made it possible for us to provide feedback on this preprint.
The analysis of sperm motility is crucial for evaluating male fertility, but traditional methods fall short in capturing the dynamic complexity of sperm motion. In this study, Athanasia Sergounioti et al. utilize dynamic time warping (DTW), phase-space and fatigue metrics, and recurrence quantification analysis (RQA) to develop an unbiased framework for identifying latent patterns in sperm motility, aiming to move beyond traditional metrics.
The study classifies human sperm motility by analyzing raw temporal dynamics using unsupervised clustering, particularly DTW. Analysis of 1,176 sperm tracks (mean silhouette score 0.861) revealed two distinct phenotypes: structured and chaotic-like. Chaotic-like tracks showed higher spectral entropy (4.45 vs. 2.58), fractal index (0.434 vs. 0.079), and Lyapunov approximation (0.131 vs. 0.009; p < 0.001), with a more negative VSL slope, indicating increased fatigue. Recurrence rate did not differ between clusters.
Strengths of the study include its novel, unstructured analytical framework that captures full dynamic variability. However, limitations include the absence of a clearly stated research question, outdated references, a lack of demographic information about the sperm donors, and the failure to test the direct impact of the identified motility phenotypes on fertility, a question that may be beyond the scope of the study but that would be important to at least discuss. Despite this, the compelling data support the conclusion that temporal sperm motility signals hold rich, underutilized information. The study offers a significant advancement in sperm dynamics analysis, with potential applications in fertility diagnostics and broader biomedical research. The approach may enhance understanding of motility heterogeneity and provide new tools for assessing sperm endurance and energy status beyond conventional metrics.
General Concern
Adding “Study Limitations” and “Future Directions/Next Steps” sections would help readers interpret the findings and understand how they will shape future research.
Concerns with Analysis and Data Visualization
The text states that the fractal index of chaotic-like tracks was higher than that of structured-like tracks. However, according to the box and whiskers plots found in Fig. 3A, it seems that the median fractal index of structured-like tracks is roughly 0.4, while that of chaotic-like tracks is 0.1. Kindly clarify the interpretation of data presented in this figure and make changes in the text where appropriate.
Authors should consider removing data outliers for the sake of data veracity, data normalization, and the readability of the figures. If one were to perform a Robust Outlier (ROUT) test, upper and lower extremes would be eliminated. This would aid in the visualization, readability, and accuracy of many figures.
The data in Figure 5 could be represented more clearly with a different type of plot than a box-and-whisker plot. Given that most VSL values hover around 0, it is difficult to discern the upper and lower quartiles. A bar chart, a heat map, or some other form of data visualization might help clarify the data distribution.
While the data appear to be robust, the absence of both positive and negative controls complicates the differentiation between noise/artifacts and accurate results.
With regard to binary partitioning/binary clustering, the authors should consider exploring alternatives. This approach can oversimplify the heterogeneity of sperm motility and potentially mask biologically meaningful sub-phenotypes.
The translation from in silico modeling to real-world in vitro/in vivo applications is somewhat lacking. For instance, setting the cluster number to N=2, which implies only two distinct motility phenotypes, provides an overly binary view of sperm motility. This oversimplifies natural variation and could inadvertently exclude relevant spermatozoa from the scope of the study.
The reviewers noticed that many references are quite outdated. While it’s possible that the field has seen little recent progress, it seems unlikely that no more recent relevant references are available to cite.
Concerns with techniques/analyses
Setting cluster number "n = 2 a priori" assumes that there are only two distinct sperm motility phenotypes. This may potentially oversimplify the true natural variation or heterogeneity in the data. Kindly justify the choice of n = 2. Authors may consider using exploratory analyses such as the elbow method, silhouette scores, gap statistics, or dendrogram inspection to strengthen the analyses.
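As a rough illustration of the exploratory check suggested above, the sketch below compares mean silhouette scores across candidate cluster counts. It assumes scikit-learn (which the preprint lists among its libraries) and uses k-means on synthetic two-dimensional features as a stand-in. The function name `silhouette_by_k` and the toy data are hypothetical and are not the authors' pipeline or data. For DTW-based clustering, the same comparison would be made on the DTW distance matrix instead.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(features, k_values=range(2, 7), seed=0):
    """Return {k: mean silhouette score} for each candidate cluster count."""
    scores = {}
    for k in k_values:
        # Cluster with k groups, then score how well separated the groups are.
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)
        scores[k] = float(silhouette_score(features, labels))
    return scores


# Synthetic stand-in for per-track features (NOT the preprint's data):
# three well-separated groups of 60 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
features = np.vstack([c + rng.normal(scale=0.5, size=(60, 2)) for c in centers])

scores = silhouette_by_k(features)
best_k = max(scores, key=scores.get)
print(scores)
print("cluster count with the highest mean silhouette:", best_k)
```

Reporting the full table of scores, rather than only the score at n = 2, would let readers judge whether two clusters is the best-supported choice or merely an acceptable one.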
The analyses also appear to lack biological ground truth labels or external control groups, such as a comparison between fertile and subfertile sperm samples. Without external validation, it is hard to assess whether the "chaotic" group truly represents a difference in functionality or pathological motility. If possible, add labeled samples based on fertility outcomes or standard CASA classifications. The authors may also consider benchmarking the clustering results against external annotations or using validated videos labeled by experts.
It is unclear whether appropriate statistical tests, e.g., a t-test or Mann-Whitney U test, were used to compare trajectory features such as entropy and Lyapunov exponents. There was also no mention of correcting for multiple comparisons. Please clarify which statistical tests were used. If multiple comparisons were made, apply and report the corrections.
Restricting the analysis to only two clusters may mask meaningful biological sub-phenotypes and create an artificial division in sperm motility. For future studies, it may be possible to explore unsupervised techniques that allow for variable cluster structures, such as DBSCAN or Gaussian Mixture Models, or to consider soft clustering, e.g., fuzzy c-means, to better capture the motility spectrum.
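To make the multiple-comparison request concrete, here is a minimal sketch of a Holm-Bonferroni adjustment applied to a set of raw p-values, one per compared feature. Holm is only one reasonable choice of correction. The function name `holm_adjust` and the p-values below are hypothetical placeholders, not results from the preprint.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values for a dict {feature_name: raw_p}."""
    # Sort from smallest to largest raw p-value.
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    adjusted = {}
    running_max = 0.0
    for rank, (name, p) in enumerate(ordered):
        # The i-th smallest p-value is multiplied by (m - i), capped at 1.
        candidate = min(1.0, (m - rank) * p)
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[name] = running_max
    return adjusted


# Hypothetical raw p-values, for illustration only.
raw = {
    "spectral_entropy": 0.001,
    "fractal_index": 0.02,
    "lyapunov": 0.04,
    "recurrence_rate": 0.30,
}
adjusted = holm_adjust(raw)
for name, p_adj in adjusted.items():
    print(f"{name}: raw={raw[name]:.3f}, holm={p_adj:.3f}")
```

Reporting both the raw and the adjusted values, together with the name of the test that produced them, would address this point.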
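One way to explore this, sketched here under stated assumptions, is to fit Gaussian mixture models with different numbers of components, keep the one with the lowest Bayesian information criterion (BIC), and inspect the soft membership probabilities. The example assumes scikit-learn and synthetic two-dimensional features. The function name `fit_gmm_by_bic` and the data are hypothetical, and BIC is only one of several possible selection criteria.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_gmm_by_bic(features, max_components=5, seed=0):
    """Fit GMMs with 1..max_components components; return the lowest-BIC model."""
    best_model, best_bic = None, np.inf
    for n in range(1, max_components + 1):
        model = GaussianMixture(n_components=n, random_state=seed).fit(features)
        bic = model.bic(features)
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model


# Synthetic stand-in for per-track features (NOT the preprint's data).
rng = np.random.default_rng(1)
features = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.6, size=(80, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.6, size=(80, 2)),
    rng.normal(loc=[0.0, 5.0], scale=0.6, size=(80, 2)),
])

model = fit_gmm_by_bic(features)
# Each row holds one track's probability of belonging to each component.
membership = model.predict_proba(features)
print("selected number of components:", model.n_components)
print("membership of first track:", np.round(membership[0], 3))
```

Unlike a hard two-way split, the membership matrix shows which tracks sit between phenotypes, which may be where biologically interesting intermediate behavior lies.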
The dataset seems to lack clear experimental controls, such as untreated normozoospermic samples, thus limiting the ability to best interpret the findings. Please include clearly defined control groups in order to provide a reference for interpreting the identified phenotypes.
The implication that sperm motility can be fully captured by two types may not align with existing biological literature, which recognizes a spectrum of motility behaviors. It may help to discuss how the two observed types of motility relate to known categories (e.g., progressive, non-progressive, hyperactivated), and acknowledge the limitations of binary classification.
Details for the Reproducibility of the Study
The authors assert that the results are reproducible, stating in Section 2.4 ("Technical Reproducibility and Controls") that “results were fully reproducible from plain .py files”, and they report the libraries used in the analysis (i.e., Tslearn, Scikit-learn, Shap, umap-learn). However, the current level of detail is insufficient to enable full replication of the study.
While the broad pipeline is described, many critical methodological components are missing, particularly around parameter choices, preprocessing steps, feature computation algorithms, and statistical testing. The authors should consider providing a complete breakdown of each step of the pipeline in either the main text or a supplementary file. For example, they could state how entropy and Lyapunov exponents were calculated and report the parameter values used, with justifications.
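To show why such details matter, the sketch below gives one possible definition of spectral entropy, using NumPy only. It is not the authors' implementation, which the preprint does not specify. The choices it exposes (mean removal, dropping the zero-frequency bin, logarithm base, optional normalization) are assumptions, and each of them changes the reported value.

```python
import numpy as np


def spectral_entropy(signal, base=2.0, normalize=False):
    """Shannon entropy of the normalized power spectrum of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()           # remove the constant offset
    power = np.abs(np.fft.rfft(signal)) ** 2  # one-sided power spectrum
    power = power[1:]                         # drop the zero-frequency bin
    total = power.sum()
    if total == 0:
        return 0.0                            # flat signal: no spectral content
    prob = power / total                      # treat the spectrum as a distribution
    prob = prob[prob > 0]                     # skip empty bins (log of 0 is undefined)
    entropy = -np.sum(prob * np.log(prob)) / np.log(base)
    if normalize:
        # Divide by the maximum possible entropy for this number of bins.
        entropy /= np.log(len(power)) / np.log(base)
    return float(entropy)


# Toy signals for illustration only: a pure sine and white noise.
t = np.arange(256) / 256.0
sine = np.sin(2 * np.pi * 8 * t)
noise = np.random.default_rng(0).normal(size=256)

print("sine :", spectral_entropy(sine))
print("noise:", spectral_entropy(noise))
print("noise (normalized):", spectral_entropy(noise, normalize=True))
```

Because the same signal yields different numbers under different bases or normalizations, values such as those reported for the two clusters can only be compared across studies if these choices are stated.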
Figures and Tables
Figure 2: "Warm colors indicate smaller DTW distances". The term "warm colors" may be ambiguous for non-experts. Please specify the exact colors used, e.g., red, orange, yellow.
While the visuals are informative, the axis labels, legends, and terminology could be made more reader-friendly.
Conclusions and limitations discussed
Although the data are compelling and appear to support the conclusions, the authors have not connected their findings to actual fertility outcomes or clinical relevance. It’s possible the study aims to explore motility patterns purely from a computational perspective. If so, clearly state that this is a methods paper to avoid misinterpretation. If any fertility-related inference is intended, even preliminarily, the authors should link their findings to established biomarkers such as sperm viability, DNA fragmentation, or pregnancy outcomes. If these issues are to be addressed in future studies, the authors should acknowledge them as limitations and discuss how future research could validate the motility phenotypes with biological data.
There is little information about the demographic characteristics of the sperm donors. Key information such as full age ranges, recording times, and health status is missing, making it difficult to support the generalized conclusions of this study. If this information is available, include it, or state why it was not possible to access it.
We thank the authors of the preprint for posting their work openly for feedback. We also thank all participants of the Live Review call for their time and for engaging in the lively discussion that generated this review.
Daniela Saderi was a facilitator of this call and one of the organizers. No other competing interests were declared by the reviewers.