
PREreview of The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation

Published
DOI
10.5281/zenodo.15776765
License
CC BY 4.0

Suggestions for the manuscript “The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation”: Below, we outline key strengths of the study as well as areas for improvement, intended to support refinement of the manuscript and broaden the impact of the work.

Strengths

Innovative Multi-Agent System Design

  • Novel orchestration of collaboration between multiple LLM agents with distinct roles (e.g., scientific critic, ML expert), guided by a human-in-the-loop.

  • Modular and customizable architecture — tasks can be assigned to different agents or LLMs; tools like ESM, AlphaFold-Multimer, and Rosetta can be swapped.

Automated and Efficient Workflow

  • Pipeline automates team formation, tool selection, role assignment, code generation, and iterative improvement steps.

  • Demonstrates low-lift, low-cost screening of trillions of nanobody sequences in minutes, accelerating discovery.

  • Faster iteration via parallel meetings (5–10 mins) followed by human adjudication to select optimal outputs.

Biological Validation

  • Two promising nanobody candidates targeting new SARS-CoV-2 variants (JN.1 and KP.3) identified and experimentally validated via wet-lab expression and ELISA.

  • Strong hybrid validation: AlphaFold-Multimer and Rosetta scores matched prior high-accuracy benchmarks for RBD-antibody binding.

Transparency and Reproducibility

  • All steps documented; easy to install and run in Jupyter notebooks.

  • GitHub repo well-organized and allows re-running experiments or customizing prompts.

Areas for Improvement

Validation and Evaluation Gaps

  • Only one use case (SARS-CoV-2); additional domains are needed to demonstrate generalizability.

  • Results are not benchmarked against simpler alternatives (e.g., using ChatGPT alone, or running the same prompts without the agent structure).

  • No quantitative assessment of how much each metric (e.g., ESM, Rosetta) contributed to the final ranking; the weighting scheme is not validated (a possible ablation analysis is sketched below).
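
To illustrate the kind of validation we have in mind, the Python sketch below drops each metric in turn from an equally weighted combined score and measures how much the final ranking shifts. The metric names, score distributions, and weights are invented placeholders for illustration, not values taken from the manuscript.

```python
# Illustrative ablation of a weighted candidate ranking; all scores and
# weights here are invented placeholders, not values from the manuscript.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 100

# Stand-in per-candidate metrics, sign-normalized so that higher is better.
metrics = {
    "esm_log_likelihood": rng.normal(0.0, 1.0, n),
    "af_multimer_interface_plddt": rng.uniform(60.0, 90.0, n),
    "rosetta_binding_energy": -rng.uniform(20.0, 60.0, n),  # negated: lower dG is better
}

def combined_score(weights):
    """Weighted sum of z-scored metrics for every candidate."""
    total = np.zeros(n)
    for name, w in weights.items():
        x = metrics[name]
        total += w * (x - x.mean()) / x.std()
    return total

full = combined_score({name: 1.0 for name in metrics})

# Drop one metric at a time and check how strongly the final ranking changes.
for dropped in metrics:
    weights = {name: (0.0 if name == dropped else 1.0) for name in metrics}
    rho, _ = spearmanr(full, combined_score(weights))
    print(f"without {dropped}: Spearman rho vs. full ranking = {rho:.2f}")
```

Reporting something like these rank correlations, or an equivalent sensitivity analysis, would show how much each tool actually drives candidate selection.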

Multi-Agent Design Transparency

  • Lacks detailed description of how the multi-agent architecture was built or optimized.

  • Decisions such as the number of parallel meetings (N=3) or the agenda structure are not justified; how they were optimized is unclear.

  • Would benefit from defining how agents are “given” tool access; the Appendix suggests tools may have been run externally.

Output Quality and Reliability

  • Although agents wrote 99% of the words, it is unclear how many were useful; some output could be excessive verbosity or hallucination.

  • The human had to correct errors, including basic file I/O issues and tool-execution bugs, which suggests limited agent autonomy.

  • The scientific critic, while helpful, could introduce misleading conclusions without oversight.

Usability and Performance Concerns

  • Agents cannot access real-time data (e.g., newly emerging SARS-CoV-2 variants).

  • The pipeline is not fully automated; it requires step-by-step human prompting (e.g., between the ESM → AlphaFold-Multimer → Rosetta stages).

  • Variability in output is not well quantified; how consistent is the pipeline when re-run? (A simple consistency check is sketched below.)
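
One way to report this, sketched below in Python, would be the average overlap of top-ranked candidates across repeated runs with the same agenda. The run_pipeline() stub and the candidate IDs are placeholders we invented for illustration, not part of the authors' code.

```python
# Illustrative re-run consistency check; run_pipeline() is a placeholder stub
# standing in for one full execution of the workflow.
import random
from itertools import combinations

CANDIDATES = [f"nanobody_{i:03d}" for i in range(200)]

def run_pipeline(seed):
    """Stub for one full pipeline run; returns a ranked list of candidate IDs."""
    ranked = CANDIDATES[:]
    random.Random(seed).shuffle(ranked)
    return ranked

def top_k_overlap(run_a, run_b, k=24):
    """Jaccard overlap between the top-k candidates of two runs."""
    a, b = set(run_a[:k]), set(run_b[:k])
    return len(a & b) / len(a | b)

runs = [run_pipeline(seed) for seed in range(5)]
overlaps = [top_k_overlap(a, b) for a, b in combinations(runs, 2)]
print(f"mean pairwise top-24 overlap across {len(runs)} runs: "
      f"{sum(overlaps) / len(overlaps):.2f}")
```

Even a small number of repeated runs with this kind of summary statistic would let readers judge how reproducible the final candidate list is.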

Presentation and Accessibility

  • Figures could use larger text and more interpretation.

  • Would benefit from a walkthrough video showing pipeline from start to finish.

  • Nanobody biology and software tool functions could be better explained for broader audiences.

Scientific Rigor and Detail

  • The Methods section lacks detail, especially on how the LLMs interface with the software tools (e.g., was AlphaFold-Multimer run within the agent framework or externally?).

  • Evaluation could be strengthened by adding in vitro neutralization assays or cell-based functional screens beyond ELISA.

  • The LLM-designed candidates could be compared against outputs from traditional bioinformatics pipelines to assess the true value added.

Competing interests

The authors declare that they have no competing interests.