Requested PREreview

PREreview of Design and validation of a clinical whole genome sequencing-based assay for patient screening in a large healthcare system

Published
DOI
10.5281/zenodo.20046969
License
CC0 1.0

Short summary of the research and contribution to the field

This preprint describes the development and validation of Geno4ME, a clinical PCR-free whole genome sequencing (WGS)-based laboratory-developed procedure (LDP) designed for population screening of actionable heritable disease genes and pharmacogenomics (PGx). The assay was validated using blood, saliva, and reference specimens, with a clinical reporting scope of 78 genes associated with actionable genomic conditions and PGx-related variants. The authors report validation against outside commercial reference laboratory methods, additional reference materials, and subsequent deployment in more than 2,000 patients as part of the Geno4ME clinical implementation study.

The work is important because WGS has the potential to move beyond one-time targeted panel testing and become a reusable genomic health record that can be reanalyzed as clinical knowledge expands. The study contributes to the field by showing that a scalable WGS-based screening workflow can be analytically validated for small variants, CNVs, selected PGx variants, and saliva/blood sample equivalence in a large healthcare system setting. This is highly relevant for the growing use of population genomic screening and preventive genomic medicine.

Positive feedback / strengths

  1. Clinically important implementation question. The manuscript addresses a major practical question: whether WGS can be validated and operationalized as a primary screening platform for actionable genomic conditions and PGx in a healthcare system. This is timely and relevant as population genomics moves from research programs into clinical laboratories.

  2. Strong clinical laboratory workflow detail. The authors provide useful operational details, including DNA extraction, PCR-free library preparation, NovaSeq 6000 sequencing, target depth, PhiX QC, routine germline QC samples, DRAGEN analysis, CNV/SV calling, and IGV review. These details make the workflow easier for clinical laboratories to evaluate.

  3. Multiple validation layers. The study uses several validation approaches, including comparison to patient EHR variants, outside reference laboratory testing, WES comparison samples, CNV reference materials, PGx reference samples, and blood/saliva paired specimens. The validation-group table is helpful because it clearly separates each validation purpose.

  4. Practical blood/saliva comparison. The 60-patient matched blood/saliva comparison is valuable because saliva collection may be more scalable for population screening than phlebotomy. The reported 100% concordance between matched blood and saliva samples for small variants and PGx diplotypes supports the feasibility of flexible sample collection.

  5. Manual curation remains part of the workflow. I appreciate that the authors do not rely solely on automated classification. The manuscript describes ACE-based initial classification followed by manual curation, literature review, and IGV inspection for selected variants, which is appropriate for clinical genomic interpretation.

Major issues

1. The “100% sensitivity/specificity” claims should be interpreted more cautiously because the positive variant counts are limited

The Geno4ME LDP showed 100% concordance against the outside reference method in the WGS validation cohort, but this was based on only 17 positive gene/variant findings against a very large number of negative gene comparisons. The confidence interval for sensitivity is correspondingly wide (reported as 80.5%–100%), reflecting the small number of positive P/LP findings.

Suggested improvement: The authors should emphasize the distinction between excellent observed concordance and the uncertainty created by limited positive variant counts. It would strengthen the manuscript to include more positive reference samples, especially for clinically important genes and variant types that are harder to detect.
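The reported interval follows directly from an exact (Clopper-Pearson) binomial calculation. A minimal sketch, assuming 17 of 17 positives detected at 95% confidence, reproduces the reported lower bound and shows roughly how many concordant positives would be needed to push it above 95%:

```python
import math

def cp_lower_all_detected(n, alpha=0.05):
    """Clopper-Pearson exact lower bound for sensitivity when all n
    true positives are detected (x = n): lower = (alpha/2)^(1/n)."""
    return (alpha / 2) ** (1.0 / n)

print(f"n=17: 95% CI lower bound = {cp_lower_all_detected(17):.1%}")
# -> 80.5%, matching the interval reported in the manuscript

# Smallest n (all detected) whose lower bound reaches 95%:
n_needed = math.ceil(math.log(0.025) / math.log(0.95))
print(f"concordant positives needed for a >=95% lower bound: {n_needed}")
# -> 72
```

This back-of-the-envelope calculation illustrates why additional positive reference samples would materially tighten the sensitivity claim: roughly 72 concordant positives, all detected, are needed before the exact lower bound clears 95%.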

2. CNV validation is promising but limited by the number and diversity of CNV reference samples

The manuscript reports 100% concordance for seven NIBSC CNV reference samples involving MLH1 and MSH2 CNVs. This is encouraging, but it is a relatively narrow CNV validation set for a WGS screening assay intended to cover many genes and conditions.

Suggested improvement: The authors should clarify the intended clinical scope of CNV reporting and consider adding more diverse CNV examples, including single-exon deletions/duplications, multi-exon events, whole-gene events, complex rearrangements, and CNVs across genes beyond MLH1/MSH2. If additional CNV data are not available, the manuscript should more explicitly state the limits of the current CNV validation.

3. PGx scope should be clarified, especially regarding the number of genes and excluded complex PGx loci

The abstract states that the clinical deliverable includes four PGx genes, while the methods describe five PGx genes/gene-drug elements, including rs12777823 in the CYP2C cluster. This inconsistency may confuse readers.

Suggested improvement: The authors should harmonize the abstract, methods, tables, and discussion so the PGx scope is completely clear. They should also clearly explain why key complex PGx regions such as CYP2D6 were not included, since the discussion acknowledges challenges with CYP2D6 and other polymorphic or repetitive regions.

4. The reliance on reported P/LP variants from outside reference laboratories may limit evaluation of VUS and classification discordance

For the outside reference method comparison, only variants reported as pathogenic or likely pathogenic by the outside provider were treated as true positives because not all VUS were reported by the outside laboratory. This is understandable operationally, but it limits evaluation of VUS concordance and broader variant-classification performance.

Suggested improvement: The authors should discuss this as a limitation more explicitly. If possible, they should provide additional analysis of classification discordance, especially for variants that Geno4ME initially classified differently from manual curation or from outside laboratory interpretation.

5. The ACE automated classification workflow needs clearer performance framing

The manuscript reports that ACE initially evaluated a large number of variants and that manual curation reclassified some variants, including 250 VUS to benign and 8 VUS to pathogenic. This is an important part of the clinical workflow, but readers need more clarity on how much human curation is required and where automation is most reliable or risky.

Suggested improvement: The authors should add more detail on:

  • the number of variants requiring manual review per patient

  • average curator time per case

  • whether curation was performed by one or more reviewers

  • whether classification was blinded to reference results

  • how disagreements were resolved

  • whether ACE performance differed by gene or variant class

This would help laboratories understand implementation burden and risk.

6. The VAF threshold of 10% should be justified for a clinical germline screening assay

The methods state that variants with VAF below 10% were excluded from further analysis. This may be appropriate for germline screening, but low-level variants can sometimes reflect mosaicism, clonal hematopoiesis, sample contamination, or technical artifacts. The TP53 discrepancy discussed in the manuscript highlights the importance of interpreting low-VAF findings carefully.

Suggested improvement: The authors should explain why 10% was chosen and how variants below this threshold are handled operationally. It would also be helpful to clarify whether low-VAF findings in cancer-predisposition genes such as TP53 are reviewed, flagged, or excluded automatically.
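To make the operational question concrete, a hypothetical triage sketch shows how a laboratory might route low-VAF calls to manual review rather than excluding them outright. The gene list and routing categories here are illustrative assumptions, not the authors' pipeline; only the 10% cutoff comes from the manuscript:

```python
# Assumed review list for illustration only (not from the manuscript)
REVIEW_GENES = {"TP53", "BRCA1", "BRCA2"}
VAF_THRESHOLD = 0.10  # exclusion cutoff stated in the methods

def triage(variant):
    """Return 'report-pipeline', 'manual-review', or 'exclude' for a
    variant dict with 'gene' and 'vaf' keys (hypothetical schema)."""
    if variant["vaf"] >= VAF_THRESHOLD:
        return "report-pipeline"
    if variant["gene"] in REVIEW_GENES:
        # Low-VAF hit in a cancer-predisposition gene: possible
        # mosaicism or clonal hematopoiesis -> flag, do not drop.
        return "manual-review"
    return "exclude"

print(triage({"gene": "TP53", "vaf": 0.07}))  # manual-review
print(triage({"gene": "MLH1", "vaf": 0.05}))  # exclude
```

Stating explicitly which of these paths the Geno4ME workflow takes for a low-VAF TP53 call would resolve the ambiguity the discrepancy example raises.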

7. Generalizability to diverse populations should be strengthened

The validation cohort is described as having an average age of 56.5 years, with 157 participants self-reporting female sex and 32 reporting male sex. However, population genomic screening performance may vary by ancestry, population allele frequency representation, medically underserved groups, and sample collection context.

Suggested improvement: The authors should provide ancestry/ethnicity distribution if available and discuss how well the validation cohort represents the broader population intended for implementation. This is particularly important for variant classification, VUS rates, and equitable population screening.

8. The deployed >2,000-patient implementation is important but under-described in the validation-focused abstract

The abstract states that the deployed LDP was used to sequence more than 2,000 patients as part of Geno4ME. This is highly valuable, but readers would benefit from more operational outcomes from that implementation phase.

Suggested improvement: The authors should include more implementation metrics, such as:

  • sample failure rate

  • redraw/recollection rate

  • average turnaround time

  • proportion of saliva vs blood samples

  • proportion with reportable findings

  • VUS burden

  • number requiring manual review

  • repeat sequencing rate

  • clinical follow-up workflow

These details would make the manuscript more actionable for healthcare systems considering similar WGS-based screening.

Minor issues

  1. Clarify whether the assay is “screening” or “diagnostic.” The manuscript uses population screening language, but it also describes a clinical LDP. A short clarification of intended use would help readers distinguish screening, diagnostic confirmation, and return-of-results workflows.

  2. Define LDP early and consistently. “Laboratory-developed procedure” should be clearly defined at first use, especially for readers more familiar with “laboratory-developed test” terminology.

  3. Resolve PGx terminology. The manuscript should consistently state whether the PGx deliverable is 4 genes, 5 genes, 4 genes plus one additional variant, or 7 gene-drug pairs.

  4. Clarify the gene list update process. Since WGS enables future expansion, it would be helpful to describe how new genes will be added, validated, interpreted, and governed.

  5. Add more discussion on reanalysis. The introduction emphasizes the value of WGS as a lifetime genomic health record. The manuscript would be strengthened by a clearer plan for reanalysis frequency, recontact, patient consent, and report updates.

  6. Clarify saliva QC thresholds. The manuscript should include minimum DNA quantity/quality, microbial-read threshold, sample failure criteria, and whether saliva samples had lower coverage or more QC failures than blood.

  7. Expand limitations of short-read WGS. The discussion appropriately notes CYP2D6, pseudogene regions, repetitive regions, and complex SVs as challenges. This could be expanded into a clearer limitations subsection.

  8. Report turnaround time and cost if available. Since scalability is a major clinical claim, turnaround time, cost per sample, sequencing batch size, and interpretation workload would be useful.

  9. Clarify confirmation policy. The workflow figure indicates verification of P/LP findings and/or PGx diplotypes with outside reference labs or orthogonal testing. The manuscript should clearly state which findings require confirmation before clinical return.

  10. Improve table readability for non-specialist clinical readers. The validation tables are useful, but a simplified summary table separating SNV/indel, CNV, PGx, blood/saliva, and WES comparison performance would improve readability.

Overall assessment

This is a strong and clinically relevant preprint describing the design and validation of a WGS-based LDP for population genomic screening in a large healthcare system. The manuscript’s main strengths are its practical implementation focus, use of PCR-free WGS, multi-layered validation design, blood/saliva comparison, integration of automated and manual variant interpretation, and early deployment in a large patient cohort.

The most important areas for improvement are clearer framing of performance claims given limited positive variant counts, broader CNV validation, consistent PGx terminology, deeper discussion of automated classification and manual curation burden, and more implementation metrics from the >2,000-patient deployment. With these additions, the manuscript would provide a stronger, more transparent, and more actionable model for healthcare systems considering WGS-based genomic screening.

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they used generative AI to come up with new ideas for their review.