Requested PREreview

PREreview of Paired plus-minus sequencing is an ultra-high throughput and accurate method for dual strand sequencing of DNA molecules

Published
DOI
10.5281/zenodo.20046396
License
CC0 1.0

“Paired plus-minus sequencing is an ultra-high throughput and accurate method for dual strand sequencing of DNA molecules”

Short summary of the research and contribution to the field

This preprint introduces paired plus-minus sequencing (ppmSeq), an ultra-high-throughput duplex sequencing approach designed to improve accurate detection of very low-frequency single-nucleotide variants (SNVs). The central innovation is that both strands of a double-stranded DNA molecule are partitioned and clonally amplified on sequencing beads through emulsion PCR, allowing both strands to contribute to a single sequencing read. This design aims to overcome a major limitation of conventional duplex sequencing: low duplex recovery and the need for extensive over-sequencing.

The authors benchmark ppmSeq against existing duplex sequencing methods and report substantially higher duplex recovery, low residual SNV error rates in genomic DNA and cfDNA, and potential utility for high-sensitivity ctDNA detection in cancer monitoring. The work is important because accurate detection of rare variants is central to liquid biopsy, minimal residual disease detection, somatic mosaicism, early cancer detection, and other applications where technical sequencing errors can easily be mistaken for biological signal.

Overall, this study moves the field forward by proposing a potentially more scalable, cost-efficient, and high-fidelity duplex sequencing strategy that may expand the practical use of error-corrected whole-genome sequencing in clinical and translational genomics.

Positive feedback / strengths

  1. Important technical problem addressed. The study targets a major bottleneck in ultra-accurate sequencing: distinguishing true low-frequency SNVs from sequencing, PCR, and DNA-damage artifacts. This is highly relevant for cfDNA, ctDNA, somatic mosaicism, and tumor-informed or tumor-naive cancer detection.

  2. Potentially meaningful improvement in duplex recovery. The authors report duplex recovery of approximately 44%, compared with roughly 5–11% for existing leading duplex technologies. If validated broadly, a gain of this size could meaningfully reduce over-sequencing requirements and improve the feasibility of error-corrected WGS.

  3. Broad input range and sample-type relevance. Testing across genomic DNA and cfDNA, including low DNA input ranges, increases the relevance of the method for real clinical and translational samples where input quantity is often limited.

  4. Clinically meaningful application area. Demonstrating ctDNA detection at very low allele fractions, including tumor-informed and tumor-naive analyses, makes the study more impactful than a purely technical benchmarking paper.

  5. Strong conceptual bridge between sequencing chemistry and clinical genomics. The work connects a molecular sequencing innovation to practical problems in cancer monitoring, mutation-signature detection, and high-fidelity whole-genome sequencing.

Major issues

1. Benchmarking against existing duplex technologies needs full transparency

The comparison to existing duplex sequencing methods is central to the paper’s impact. The authors should provide enough methodological detail to ensure the benchmarking is fair and reproducible.

Suggested additions:

  • List the comparator duplex methods used.

  • Clarify whether comparator methods were performed in-house or based on published datasets.

  • Provide library input amounts, sequencing depth, capture/recovery metrics, duplicate rates, usable read fractions, and cost-per-informative-base comparisons.

  • Explain whether all methods were compared using the same sample type, DNA quality, sequencing platform, and analysis thresholds.

Without these details, the reported improvement in duplex recovery may be difficult for readers to interpret.

2. More detail is needed on the ppmSeq molecular workflow

The core innovation depends on how both DNA strands are partitioned, clonally amplified, and represented in a single sequencing read. Readers would benefit from a clearer workflow-level explanation.

Suggested additions:

  • Include a detailed schematic of the molecular workflow.

  • Explain how the plus and minus strands are linked during bead-based amplification, and whether that linkage is physical, bioinformatic, or both.

  • Clarify how strand imbalance, incomplete strand recovery, or bead-level amplification bias is handled.

  • Describe failure modes: single-strand recovery, mixed molecules, bead multiplets, chimeric amplification, or strand dropout.

This will make the technology easier to evaluate and reproduce.

3. Error-rate claims should include confidence intervals and context-specific breakdowns

The reported residual SNV error rates are impressive, but the manuscript should clarify how these were calculated and how stable they are across sequence contexts and sample types.

Suggested additions:

  • Provide confidence intervals around error-rate estimates.

  • Break down errors by substitution class, trinucleotide context, GC content, genomic region, and read position.

  • Report whether error rates differ between gDNA and cfDNA, low-input and high-input samples, or different library-preparation protocols.

  • Clarify how oxidative damage, deamination, end-repair artifacts, and PCR errors were modeled or filtered.

This is especially important because very low error rates can be influenced by denominator size, filtering strategy, and callable-genome definitions.
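
To illustrate the denominator effect, the sketch below computes a Wilson score interval for a residual error rate. It is a reviewer's illustration, not the authors' method: the function name `wilson_interval` and all counts are hypothetical, and an exact (Clopper-Pearson) or Poisson interval would be an equally reasonable choice.

```python
import math

def wilson_interval(errors: int, bases: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion errors / bases (z = 1.96 for ~95%)."""
    if bases <= 0:
        raise ValueError("bases must be positive")
    p = errors / bases
    denom = 1 + z**2 / bases
    centre = (p + z**2 / (2 * bases)) / denom
    half = z * math.sqrt(p * (1 - p) / bases + z**2 / (4 * bases**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical counts: the same point estimate (5e-9) from two denominators.
small = wilson_interval(errors=5, bases=10**9)
large = wilson_interval(errors=50, bases=10**10)
print(small, large)
```

With the same point estimate, the interval from the smaller denominator is several times wider, which is why the callable-base count behind each reported error rate matters.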

4. The clinical ctDNA analysis needs clearer cohort and validation details

The ctDNA application is exciting, but the preprint should better separate technical feasibility from clinical performance.

Suggested additions:

  • Provide cohort size, cancer types, treatment status, disease stage, sample timing, and plasma input amounts.

  • Clarify whether tumor-informed analyses used matched tumor sequencing, matched normal sequencing, or both.

  • Report sensitivity, specificity, positive predictive value, and negative predictive value where applicable.

  • Include orthogonal comparison to existing ctDNA methods, imaging, clinical progression, or known tumor burden.

  • Clarify how clonal hematopoiesis variants were removed, especially in plasma cfDNA.

Because the method is proposed for disease monitoring, clinical validation details are essential.
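
As a concrete reference for the requested metrics, the following minimal sketch derives them from a 2×2 table of ctDNA calls against a clinical reference standard. The function name `diagnostic_metrics` and the counts are hypothetical and are not taken from the preprint.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a margin of the table is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # detected among truly positive
        "specificity": ratio(tn, tn + fp),  # negative among truly negative
        "ppv": ratio(tp, tp + fp),          # truly positive among positive calls
        "npv": ratio(tn, tn + fn),          # truly negative among negative calls
    }

# Hypothetical cohort of 40 plasma samples.
print(diagnostic_metrics(tp=18, fp=1, tn=19, fn=2))
```

PPV and NPV depend on disease prevalence in the cohort, so reporting them alongside the cohort composition would make them easier to interpret.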

5. Tumor-naive ctDNA detection needs careful control for confounders

The tumor-naive analysis using trinucleotide mutation signatures is conceptually powerful, but mutation-signature approaches may be affected by age, clonal hematopoiesis, smoking exposure, treatment history, inflammation, and background somatic mosaicism.

Suggested additions:

  • Include matched white blood cell sequencing where possible.

  • Describe how clonal hematopoiesis and non-tumor somatic mutations were filtered.

  • Clarify whether tumor-naive signals were tested in non-cancer controls and disease controls.

  • Provide performance metrics by cancer type and mutation burden.

  • Discuss whether treatment-related signatures, such as platinum exposure, could create ambiguity in interpretation.

This would help readers understand how specific the tumor-naive signal is for cancer-derived cfDNA.

6. Quantitative claims should be separated from detection claims

The abstract suggests that ppmSeq can detect ctDNA at very low concentrations and that the ctDNA signal correlates with imaging-based disease metrics. This is promising, but use as a quantitative assay requires additional validation.

Suggested additions:

  • Clarify whether ppmSeq is intended as a quantitative assay or primarily a detection method.

  • Report linearity, precision, and reproducibility across allele fractions.

  • Include replicate testing at low allele frequencies.

  • Show intra-run, inter-run, and inter-operator variability.

  • Define the lower limit of detection and lower limit of quantification separately, if applicable.

This distinction is important for clinical monitoring applications.
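
One way to see why detection and quantification limits differ is a simple sampling model, sketched below. It assumes, purely for illustration, that mutant duplex molecules across tumor-informed sites follow a Poisson distribution with mean equal to sites × duplex depth × allele fraction, and it ignores background errors. The function name and all parameter values are hypothetical.

```python
import math

def detection_probability(n_sites: int, duplex_depth: float,
                          allele_fraction: float,
                          min_mutant_molecules: int = 1) -> float:
    """P(observing at least `min_mutant_molecules`) under a Poisson sampling model."""
    lam = n_sites * duplex_depth * allele_fraction  # expected mutant molecules
    p_below = sum(math.exp(-lam) * lam**i / math.factorial(i)
                  for i in range(min_mutant_molecules))
    return 1.0 - p_below

# Hypothetical assay: 10,000 informative sites, 30x duplex depth, allele fraction 1e-5.
print(detection_probability(10_000, 30, 1e-5))                          # at least 1 molecule
print(detection_probability(10_000, 30, 1e-5, min_mutant_molecules=3))  # at least 3 molecules
```

Under these assumptions about three mutant molecules are expected, so a single supporting molecule is seen in roughly 95% of replicates, while the count itself varies widely between replicates. A sample can therefore sit above a detection limit and still be below any reasonable quantification limit.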

7. Scalability and cost-efficiency claims need supporting data

The title and abstract emphasize ultra-high throughput and cost efficiency. These claims would be stronger with practical implementation data.

Suggested additions:

  • Provide estimated cost per sample, cost per genome, and cost per error-corrected base.

  • Report hands-on time, turnaround time, batch size, sequencing depth requirements, and computational requirements.

  • Compare ppmSeq resource requirements with standard duplex sequencing and conventional WGS.

  • Discuss whether the method requires specialized equipment, reagents, or proprietary analysis software.

This would help laboratories assess real-world adoption feasibility.
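
The arithmetic behind such a comparison can be sketched as follows. This assumes that duplex recovery is the fraction of raw sequenced bases that yield duplex-informative bases, which is one possible reading and should be confirmed against the authors' definition. The function name is hypothetical; the recovery values are the approximate figures quoted in this review.

```python
def raw_bases_per_duplex_base(duplex_recovery: float) -> float:
    """Raw sequencing needed per duplex-informative base, assuming
    recovery = duplex bases / raw bases (an assumption, see lead-in)."""
    if not 0 < duplex_recovery <= 1:
        raise ValueError("duplex_recovery must be in (0, 1]")
    return 1.0 / duplex_recovery

for label, recovery in [("ppmSeq (reported)", 0.44),
                        ("comparator, upper", 0.11),
                        ("comparator, lower", 0.05)]:
    print(f"{label}: {raw_bases_per_duplex_base(recovery):.1f} raw bases per duplex base")
```

Under this reading, the reported recovery values imply roughly 4-fold to 9-fold less raw sequencing per duplex base. An actual cost comparison would also need per-base sequencing cost and library-preparation cost for each platform.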

Minor issues

  1. Define ppmSeq clearly at first use. The term “paired plus-minus sequencing” should be explained in simple workflow language early in the manuscript.

  2. Clarify terminology around duplex yield. The manuscript should define exactly how “duplex recovery” or “duplex yield” is calculated and whether it refers to original molecules, read pairs, callable molecules, or consensus molecules.

  3. Add an end-to-end overview figure. A single figure showing DNA input → strand partitioning → bead amplification → sequencing → consensus/error correction → variant calling would be very helpful.

  4. Add a table comparing ppmSeq with existing methods. A concise table comparing input requirement, duplex recovery, sequencing depth, error rate, cost, turnaround time, and limitations would strengthen the manuscript.

  5. Clarify performance across DNA input amounts. The abstract mentions 1.8–98 ng input. It would help to show whether duplex recovery, uniformity, and error rates remain stable across this full range.

  6. Clarify whether indels and structural variants are supported. The abstract focuses mainly on SNVs. The authors should state whether ppmSeq is currently optimized only for SNVs or whether it can also support indels, copy-number changes, rearrangements, or fragmentomic features.

  7. Describe bioinformatics pipeline availability. If custom software is required, the authors should indicate whether code, parameters, test datasets, and documentation will be made publicly available.

  8. Include limitations explicitly. A clear limitations section should discuss bead-based amplification artifacts, cfDNA fragmentation constraints, high-depth sequencing requirements, tumor-type variability, and potential barriers to clinical implementation.

  9. Clarify regulatory/clinical status. If the technology is currently research-use only, the manuscript should state that clearly and avoid overextending clinical claims.

  10. Improve the explanation of the dideoxy end-repair protocol. The protocol associated with the lowest gDNA error rate should be described clearly enough for readers to judge whether that performance is broadly achievable or protocol-specific.

Overall assessment

This is a technically innovative and potentially important preprint. The proposed ppmSeq method addresses a major challenge in high-fidelity sequencing: improving duplex recovery while maintaining very low residual error rates. The potential applications in ctDNA monitoring, tumor-naive cancer detection, and somatic mutation discovery are highly relevant to clinical genomics and precision oncology.

The strongest aspects of the work are the high reported duplex recovery, the low error rates, and the attempt to connect sequencing chemistry improvements to real clinical use cases. The main areas needing clarification are benchmarking fairness, workflow transparency, reproducibility, ctDNA cohort details, tumor-naive specificity, and practical implementation metrics such as cost, turnaround time, and computational requirements.

With these additions, the manuscript would provide a stronger and more actionable foundation for laboratories and researchers evaluating ppmSeq for ultra-accurate sequencing and clinical/translational genomics applications.

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they used generative AI to come up with new ideas for their review.