PREreview of Pathogenwatch: A public health platform for rapid interpretation of pathogen genomics

Published: May 16, 2026
DOI: 10.5281/zenodo.20237041
License: CC BY 4.0

Summary

This manuscript describes Pathogenwatch as a public-health genomic surveillance platform for rapid interpretation of bacterial, viral, and fungal genomes. The platform integrates species identification, MLST/cgMLST, lineage or variant assignment, AMR and virulence marker detection, contextual comparison with public genomes, and interactive visualization. The authors report broad global adoption, with over 14,000 registered users across 165 countries, and substantial 2025 usage. They also present example evaluations using SARS-CoV-2 lineage assignment and a global Staphylococcus aureus ST239 dataset.

This work moves the field forward by describing an important piece of practical genomic surveillance infrastructure. The manuscript is especially relevant for laboratories and public health teams that need interpretable outputs without maintaining complex local bioinformatics systems.

Major Issues

The evidence presented does not fully support the breadth of the platform claims. The manuscript describes many supported organisms and workflows, but validation is shown mainly through SARS-CoV-2 lineage concordance and an S. aureus ST239 example. Please add systematic benchmarking across representative bacterial, viral, and fungal workflows, including speciation, cgMLST, AMR prediction, virulence detection, and contextual clustering.
The SARS-CoV-2 benchmark is useful but small. Complete concordance across 16 VOC/VOI and 39 non-VOC/VOI genomes (55 in total) does not by itself demonstrate robust platform-wide performance. A larger and more diverse benchmark, including low-quality genomes, mixed-quality assemblies, and recent lineages, would strengthen this section.
Reproducibility needs clearer treatment. Because Pathogenwatch is continuously deployed and uses updated containers, databases, and nomenclatures, results may change over time. Please provide workflow/container versions, database versions, benchmark input accessions, output snapshots, and commit hashes for the analyses shown.
AMR and virulence prediction claims should be more carefully validated. The manuscript describes multiple AMR tools and drug-class summaries, but does not present systematic comparison with phenotypic AST, curated truth sets, or published AMR benchmarks. Please clarify which outputs are surveillance-supportive versus clinically actionable.
The hclink/context search approach needs more explanation for public health interpretation. Single-linkage clustering can be sensitive to threshold choice and chaining. Please provide organism-specific guidance, default thresholds, sensitivity analyses, and warnings about overinterpreting genetic relatedness as direct transmission.
Public reference curation is central to the platform, but the manuscript gives limited detail in the main text. Please clarify inclusion/exclusion criteria, deduplication, metadata requirements, quality filters, update frequency, and how geographic or temporal sampling biases affect contextual interpretation.
The manuscript would benefit from a dedicated limitations section. Important limitations include dependence on public database quality, uneven global sampling, changing nomenclatures, assembly quality variation, privacy constraints, AMR prediction uncertainty, and the risk of users overinterpreting automated outputs.
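
To make the point about the size of the SARS-CoV-2 benchmark concrete, the short sketch below computes how much statistical assurance perfect concordance on a small sample can give. It is an illustration by the reviewer, not an analysis from the manuscript. It assumes the benchmark genomes are independent draws from the population of interest, which is unlikely to hold exactly, and it uses the one-sided exact (Clopper-Pearson) bound for the all-correct case. The function name is hypothetical.

```python
def exact_lower_bound_all_correct(n: int, alpha: float = 0.05) -> float:
    """One-sided exact (Clopper-Pearson) lower confidence bound on the true
    concordance rate when all n of n benchmark genomes agree.

    With zero disagreements the bound is the p that solves p**n = alpha.
    Assumption: the n genomes are independent samples.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return alpha ** (1.0 / n)


# Sample sizes as reported for the SARS-CoV-2 benchmark (16 + 39 = 55).
for label, n in [("VOC/VOI", 16), ("non-VOC/VOI", 39), ("combined", 55)]:
    bound = exact_lower_bound_all_correct(n)
    print(f"{label}: n={n}, one-sided 95% lower bound = {bound:.3f}")
```

For the combined set of 55 genomes the bound is about 0.947, so under these assumptions a true discordance rate of up to roughly 5% would still be compatible with the reported perfect agreement. The bound is looser for each subset taken alone.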

Minor Issues

Reconcile genome/reference counts: the abstract mentions over 875,000 curated public bacterial genomes, while the results mention over 1,759,554 bacterial genomes.
Clarify species coverage: the abstract states MLST for more than 37 bacterial species, while the introduction mentions MLST assignment for over a hundred species.
Add a code availability statement with links to the relevant repositories, not only references embedded in the bibliography.
Table 1 is useful but dense. Consider separating bacteria, viruses, and fungi or adding a clearer legend for tool abbreviations.
The phrase “replicates analysis results of complex bioinformatics pipelines” is vague. Specify which pipelines and what comparison metrics were used.
Please define “Pathogenwatch Local” more clearly, including installation requirements, supported workflows, and whether outputs are identical to cloud deployment.
Some wording and formatting need polishing, e.g. “2.4 million of pathogen genomes,” missing punctuation around “Klebsiella… Neisseria,” and occasional encoding artifacts in the PDF text.

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they did not use generative AI to come up with new ideas for their review.
