Structured PREreview of STRmie-HD enables interruption-aware HTT repeat genotyping and somatic mosaicism profiling across sequencing platforms

by Hansini Upadhyay

Published: May 16, 2026
DOI: 10.5281/zenodo.20241463
License: CC BY 4.0

Does the introduction explain the objective of the research presented in the preprint?: Yes; The introduction explains the biological and clinical complexity of Huntington’s disease (HD), highlights limitations in current sequencing and computational methods, identifies a specific gap (the lack of tools that simultaneously handle repeat size, somatic mosaicism, and interruption variants), and explicitly states the proposed solution, STRmie-HD, and its purpose.
Are the methods well-suited for this research?: Highly appropriate; The methods are well aligned with the stated research objective and reflect strong methodological rigor. They directly address the identified gap (simultaneous detection of repeat length, interruption variants, and somatic mosaicism). The per-read parsing approach is appropriate for capturing heterogeneity and mosaicism. The use of a regular expression–based, alignment-free strategy is well justified given the limitations of reference-based methods for repeat expansions. The inclusion of quantitative indices (EI, II) strengthens downstream interpretability. The ONT-specific handling demonstrates awareness of platform-specific limitations and best practices. The framework is adaptable (ROI filtering, customizable parameters), which enhances robustness. Overall, the methods are thoughtfully designed, technically sound, and clearly tailored to the biological and computational challenges outlined in the introduction.
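To make the idea of regular expression–based, alignment-free, per-read parsing concrete for readers, here is a minimal sketch. It is not STRmie-HD's implementation: the pattern, the function names (`longest_cag_run`, `repeat_histogram`), and the toy reads are all assumptions made for illustration. It only counts the longest uninterrupted CAG run in each read and tallies reads per repeat length.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not the tool's actual regex):
# an uninterrupted run of at least two CAG units.
CAG_RUN = re.compile(r"(?:CAG){2,}")


def longest_cag_run(read: str) -> int:
    """Return the number of CAG units in the longest uninterrupted run, or 0."""
    runs = CAG_RUN.findall(read.upper())
    return max((len(run) // 3 for run in runs), default=0)


def repeat_histogram(reads):
    """Count how many reads support each repeat length (reads with no run are skipped)."""
    return Counter(n for n in map(longest_cag_run, reads) if n > 0)


# Toy reads, invented for this example: a short flank, a CAG tract,
# then a CAA CAG CCG CCA-style downstream sequence.
reads = [
    "TTC" + "CAG" * 5 + "CAACAGCCGCCA",
    "TTC" + "CAG" * 5 + "CAACAGCCGCCA",
    "TTC" + "CAG" * 7 + "CAACAGCCGCCA",
]
hist = repeat_histogram(reads)
```

Because every read is parsed independently, the resulting histogram keeps minority repeat lengths visible, which is the property that makes a per-read approach suitable for profiling mosaicism.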
Are the conclusions supported by the data?: Highly supported; The conclusions are well supported by the data presented in the manuscript. The authors provide extensive benchmarking across four datasets (Illumina, PacBio, ONT, and synthetic), demonstrating consistent performance of STRmie-HD across sequencing platforms. Quantitative metrics such as MAE, RMSE, and correlation coefficients directly support the claims of high accuracy and robustness. The conclusions about superior or comparable performance relative to other tools are backed by explicit comparative results (e.g., ScaleHD, TRGT, RepeatDetector). Claims regarding interruption variant detection are supported by orthogonally validated samples, quantitative read-level percentages, and clear evidence of improved detection over existing tools. Biological conclusions (e.g., higher somatic expansion in brain vs. blood) are supported by statistical analysis (Kruskal–Wallis test with significant p-values). The discussion appropriately includes limitations and caveats (e.g., dependence on sequencing platform and preprocessing), avoiding overstatement. Overall, the authors do not overreach: their conclusions align closely with the empirical results and are framed appropriately within the scope of the study.
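For readers less familiar with the accuracy metrics named above, the following sketch shows how MAE (mean absolute error) and RMSE (root mean squared error) are computed from called versus reference repeat lengths. The numbers are invented for illustration and are not taken from the preprint.

```python
import math


def mae(called, truth):
    """Mean absolute error between called and reference repeat lengths."""
    return sum(abs(c - t) for c, t in zip(called, truth)) / len(truth)


def rmse(called, truth):
    """Root mean squared error; penalizes large miscalls more than MAE does."""
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(called, truth)) / len(truth))


# Made-up reference and called CAG counts for four hypothetical samples.
truth = [17, 42, 45, 50]
called = [17, 43, 45, 48]

mae_value = mae(called, truth)    # errors are 0, 1, 0, 2
rmse_value = rmse(called, truth)  # squared errors are 0, 1, 0, 4
```

Reporting both is informative because RMSE exceeds MAE whenever errors are unevenly distributed, so a gap between the two points to a few large miscalls rather than uniform small ones.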
Are the data presentations, including visualizations, well-suited to represent the data?: Somewhat appropriate and clear; The manuscript uses a variety of appropriate visualizations: histograms for repeat distributions, scatter plots for correlation with ground truth, bar plots and tables for performance metrics, and boxplots for biological comparisons (e.g., EI across tissues). The figures are aligned with the type of data. The inclusion of quantitative tables (MAE, RMSE, CI) improves clarity and supports interpretation. Providing raw histograms and outputs as supplementary material supports transparency and reproducibility. Limitations: some visualizations (especially histograms and multi-tool comparisons) may be dense or harder to interpret without domain expertise, and are not fully optimized for quick interpretation by broader audiences. Accessibility (for example simplified summaries, clearer legends, or visual consistency across figures) could be improved. Heavy reliance on supplementary materials for full interpretation slightly reduces immediate clarity.
How clearly do the authors discuss, explain, and interpret their findings and potential next steps for the research?: Somewhat clearly; The authors provide a clear, logically structured discussion that (1) restates the unmet need, (2) interprets the benchmark results across platforms, (3) highlights what is novel about STRmie-HD (especially interruption-aware, single-read quantification), and (4) outlines practical implications and extensions.
Is the preprint likely to advance academic knowledge?: Highly likely; The preprint makes meaningful and substantive contributions.
Would it benefit from language editing?: No
Would you recommend this preprint to others?: Yes, it’s of high quality
Is it ready for attention from an editor, publisher or broader audience?: Yes, after minor changes; The manuscript does not require major rewriting, but professional polishing (clarity, conciseness, flow) would significantly improve readability and impact.

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they did not use generative AI to come up with new ideas for their review.
