PREreview of Sputum Respiratory Pathogen Genomic Surveillance: A Practical Approach for Long-Read Metagenomic Sequencing

by Syed Adnan Haider

Published: May 27, 2026
DOI: 10.5281/zenodo.20417815
License: CC BY 4.0

Major Issues

The sample size is too small to support the strength of the claims. The pretreatment comparison appears to rely on two sputum samples, and the optimized workflow is reported across only six cases. This is appropriate for a pilot optimization study, but it is not enough to support claims of “robust performance,” “diagnostic accuracy,” or readiness for routine clinical or surveillance use. The authors should frame the study as a proof of concept and substantially soften claims about scalability and clinical implementation.
Diagnostic validation is insufficient. The manuscript reports pathogen detection by mNGS, but it is unclear how each detected organism was confirmed. Orthogonal validation by multiplex PCR, targeted PCR, culture, qPCR, or reference sequencing is needed, especially for strain-level calls and bacterial detections from sputum. Without this, it is difficult to distinguish true infection, colonization, contamination, or database misclassification.
The study lacks any assessment of sensitivity, specificity, or limit of detection. For a diagnostic or surveillance workflow, the authors should report analytical sensitivity using spiked controls or a dilution series, specificity using negative sputum controls, reproducibility across replicates, and concordance with standard clinical testing. The current data demonstrate feasibility, not diagnostic performance.
“Unbiased detection” should be qualified. The workflow includes DNase treatment, SMART-9N amplification, 30 PCR cycles, host filtering, assembly, and database-based classification. Each step introduces bias. SMART-9N amplification may distort relative abundance and genome coverage, while DNase and filtration can differentially affect bacteria, DNA viruses, RNA viruses, and free nucleic acid. The authors should describe the workflow as broad-range rather than fully unbiased.
Clinical interpretation of sputum organisms needs caution. Sputum contains oral flora and colonizing organisms, so detections such as Haemophilus parainfluenzae, Pseudomonas spp., and Metamycoplasma salivarium do not necessarily represent causative pathogens. The manuscript should include clinical metadata, comparator diagnostic results, bacterial load thresholds, or explicit criteria for interpreting pathogen relevance.
Strain-level claims need stronger evidence. The manuscript claims strain- or genotype-level resolution for all six cases, including rhinovirus C1/C42 and human rhinovirus NAT001. The authors should specify the classification thresholds, genome breadth, sequence identity, coverage uniformity, and reference database versions, and state whether assemblies were phylogenetically validated. Kraken2 classification alone is usually not sufficient for confident strain-level reporting.
Data and code availability are missing or unclear. I did not see a clear data availability statement for raw reads, assemblies, reference databases, or scripts. For a methods paper, reproducibility is central. The authors should deposit sequencing data, provide accession numbers, share command-line parameters, and include processed tables for host read fraction, microbial read counts, coverage, and detected taxa.
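To make the requested performance assessment concrete, here is a minimal sketch of how sensitivity, specificity, and a limit of detection could be computed from a spiked dilution series and negative sputum controls. It assumes each sample/target pair has a known true status, and it defines the limit of detection as the lowest concentration reached, walking down from the top of the series, before the replicate hit rate first falls below 95%. That cutoff, the function names, and all input values are hypothetical and are not taken from the manuscript.

```python
def sensitivity_specificity(calls):
    """Compute (sensitivity, specificity) from (truly_present, detected) boolean pairs."""
    tp = fn = fp = tn = 0
    for truly_present, detected in calls:
        if truly_present:
            if detected:
                tp += 1
            else:
                fn += 1
        elif detected:
            fp += 1
        else:
            tn += 1
    # Return NaN rather than dividing by zero when a class is absent.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def limit_of_detection(replicate_hits, min_hit_rate=0.95):
    """Walk a dilution series from highest to lowest concentration and return the
    lowest concentration reached before the hit rate first drops below min_hit_rate.

    replicate_hits maps concentration -> list of per-replicate detection booleans.
    Returns None if even the highest concentration fails.
    """
    lod = None
    for conc in sorted(replicate_hits, reverse=True):
        hits = replicate_hits[conc]
        if not hits or sum(hits) / len(hits) < min_hit_rate:
            break
        lod = conc
    return lod


if __name__ == "__main__":
    # Hypothetical toy inputs for illustration only; not data from the study.
    calls = ([(True, True)] * 18 + [(True, False)] * 2
             + [(False, False)] * 9 + [(False, True)])
    sens, spec = sensitivity_specificity(calls)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")

    # Hypothetical spiked dilution series (arbitrary units), 20 replicates each.
    series = {
        1e4: [True] * 20,
        1e3: [True] * 20,
        1e2: [True] * 19 + [False],
        1e1: [True] * 12 + [False] * 8,
    }
    print("limit of detection:", limit_of_detection(series))
```

With only a handful of samples, these proportions would carry wide confidence intervals, which is one reason a larger validation set matters.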
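Explicit reporting criteria for strain- or genotype-level calls could take a form like the sketch below, which withholds a genotype label unless every metric named in the strain-level comment passes a stated threshold. The field names and threshold values (90% breadth, 98% identity, a depth coefficient of variation of at most 1.0) are hypothetical placeholders, not values from the manuscript or from any established standard. The authors would need to justify their own.

```python
from dataclasses import dataclass


@dataclass
class GenotypeCall:
    taxon: str             # reported strain/genotype label
    breadth: float         # fraction of reference positions covered (0-1)
    identity: float        # consensus-vs-reference nucleotide identity (0-1)
    depth_cv: float        # coefficient of variation of per-position depth (uniformity)
    database_version: str  # reference database release used for classification


# Hypothetical thresholds for illustration only; they would need empirical justification.
MIN_BREADTH = 0.90
MIN_IDENTITY = 0.98
MAX_DEPTH_CV = 1.0


def reportable_at_genotype_level(call: GenotypeCall) -> tuple[bool, list[str]]:
    """Return (passes_all_criteria, list_of_failure_reasons)."""
    reasons = []
    if call.breadth < MIN_BREADTH:
        reasons.append(f"breadth {call.breadth:.2f} < {MIN_BREADTH}")
    if call.identity < MIN_IDENTITY:
        reasons.append(f"identity {call.identity:.3f} < {MIN_IDENTITY}")
    if call.depth_cv > MAX_DEPTH_CV:
        reasons.append(f"depth CV {call.depth_cv:.2f} > {MAX_DEPTH_CV}")
    if not call.database_version:
        reasons.append("reference database version not reported")
    return not reasons, reasons


if __name__ == "__main__":
    # Hypothetical example: high identity but incomplete, uneven coverage.
    call = GenotypeCall("rhinovirus C (example)", breadth=0.62, identity=0.991,
                        depth_cv=1.8, database_version="")
    ok, reasons = reportable_at_genotype_level(call)
    print("report genotype" if ok else "withhold genotype: " + "; ".join(reasons))
```

A threshold check of this kind complements, but does not replace, phylogenetic placement of the assembled genome against typed reference sequences.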

Minor Issues

Please report the exact number of samples used in each experiment more clearly in the Results and Methods.
The Results use language such as “significantly improved,” but no statistical testing is presented. Use descriptive language or provide statistical analysis.
Clarify whether replicates were technical aliquots, independent extractions, sequencing replicates, or separate patients.
Figure 2 labels are small and difficult to read. Enlarging labels and adding a table of read proportions would help.
Figure 4 uses different y-axis scales across pathogens. This is acceptable, but the caption should state explicitly that sequencing depth cannot be compared directly across panels.
Report genome breadth (percent coverage) alongside depth. High depth in some regions does not necessarily indicate near-complete genome recovery.
Clarify whether DNase I was applied before or after extraction for each workflow, and how RNA viruses, DNA viruses, and bacterial cells are expected to be affected.
The Methods refer to “two host depletion strategies” but list four conditions: an untreated control, DNase I, filtration plus DNase I, and adaptive sampling. Please reword for clarity.
The 72-hour sequencing duration may limit rapid diagnostic use. Please report when actionable pathogen calls became available during sequencing.

Use consistent taxonomic and clinical naming for Mycoplasma/Mycoplasmoides pneumoniae.
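The distinction between depth and breadth raised in the comment on genome coverage can be shown with a minimal sketch. It assumes a per-position depth profile for one reference genome that includes zero-depth positions (for example, the depth column of `samtools depth -a` output). The function name and the toy profile are hypothetical.

```python
def coverage_summary(depths, min_depth=1):
    """Return (breadth, mean_depth) for a per-position depth profile.

    breadth    = fraction of reference positions with depth >= min_depth
    mean_depth = average depth over all positions, including zero-depth ones
    """
    if not depths:
        raise ValueError("depth profile is empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths), sum(depths) / len(depths)


if __name__ == "__main__":
    # Hypothetical 10 kb genome: 2 kb covered at 500x, the remaining 8 kb uncovered.
    profile = [500] * 2_000 + [0] * 8_000
    breadth, mean_depth = coverage_summary(profile)
    # Mean depth is 100x, yet only 20% of the genome is covered.
    print(f"breadth={breadth:.0%}, mean depth={mean_depth:.0f}x")
```

Reporting breadth at a higher `min_depth` (for example, positions covered by at least 10 reads) would also indicate how much of each genome supports a reliable consensus.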
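One simple way to report when actionable calls became available is the elapsed run time at which each reported taxon first accumulated a minimum number of classified reads. The sketch below assumes per-read elapsed times and per-read taxon assignments have already been joined, for example from read start times recorded by the sequencing software and per-read classifier output. The 10-read default threshold, the function name, and the example records are hypothetical.

```python
def time_to_call(reads, taxon, min_reads=10):
    """Return the elapsed run time (hours) at which `taxon` reached `min_reads`
    classified reads, or None if it never did.

    reads: iterable of (hours_since_run_start, assigned_taxon) pairs in any order.
    """
    count = 0
    # Process reads in the order they were sequenced.
    for hours, assigned in sorted(reads, key=lambda r: r[0]):
        if assigned == taxon:
            count += 1
            if count >= min_reads:
                return hours
    return None


if __name__ == "__main__":
    # Hypothetical per-read records; "unclassified" marks reads with no assignment.
    reads = [(0.4, "Rhinovirus C"), (0.1, "Rhinovirus C"), (2.5, "unclassified"),
             (1.2, "Rhinovirus C"), (3.0, "Haemophilus parainfluenzae")]
    print(time_to_call(reads, "Rhinovirus C", min_reads=3))  # prints 1.2
```

Reporting this value per taxon and per sample, alongside the total run length, would show how much of the 72-hour run is actually needed before a call can be made.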

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they did not use generative AI to come up with new ideas for their review.
