PREreview of The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation
- Published
- DOI
- 10.5281/zenodo.15776765
- License
- CC BY 4.0
Suggestions for the manuscript “The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation”: Below, we outline key strengths of the study as well as areas for improvement, intended to support refinement of the manuscript and to broaden the impact of the work.
Strengths
Innovative Multi-Agent System Design
Novel orchestration of collaboration between multiple LLM agents with distinct roles (e.g., scientific critic, ML expert), guided by a human-in-the-loop.
Modular and customizable architecture: tasks can be assigned to different agents or LLMs, and tools such as ESM, AlphaFold-Multimer, and Rosetta can be swapped in or out (see the configuration sketch below).
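To picture the kind of modularity we mean, here is a minimal, purely illustrative sketch; the class, role, and model names are hypothetical and are not taken from the authors' code.

```python
# Illustrative sketch of a modular agent configuration (hypothetical names, not
# the authors' implementation): each role is bound to a swappable LLM backend
# and an optional set of external tools.
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str                                       # e.g. "Scientific Critic"
    llm_backend: str                                # any chat-capable model could be slotted in
    tools: list[str] = field(default_factory=list)  # e.g. ["ESM", "Rosetta"]

team = [
    Agent("Principal Investigator", "gpt-4o"),
    Agent("Machine Learning Specialist", "gpt-4o", tools=["ESM"]),
    Agent("Computational Biologist", "gpt-4o", tools=["AlphaFold-Multimer", "Rosetta"]),
    Agent("Scientific Critic", "gpt-4o"),
]
# Swapping a tool or an LLM backend is then a one-line change to this configuration.
```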
Automated and Efficient Workflow
Pipeline automates team formation, tool selection, role assignment, code generation, and iterative improvement steps.
Demonstrates low-lift, low-cost screening of trillions of nanobody sequences in minutes, accelerating discovery.
Faster iteration via parallel meetings (5–10 minutes each) followed by human adjudication to select the best outputs (illustrated in the sketch below).
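This parallel-meeting-plus-adjudication pattern could be pictured roughly as follows; run_meeting is a placeholder for a full multi-agent discussion, not the paper's actual API.

```python
# Illustrative sketch of N parallel meetings followed by human adjudication;
# run_meeting is a placeholder for driving one multi-agent LLM discussion.
from concurrent.futures import ThreadPoolExecutor

def run_meeting(agenda: str, seed: int) -> str:
    # In the real system this would run a conversation between the agents.
    return f"summary of meeting {seed} on: {agenda}"

def run_parallel_meetings(agenda: str, n: int = 3) -> list[str]:
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda seed: run_meeting(agenda, seed), range(n)))

summaries = run_parallel_meetings("Choose machine learning tools for nanobody design", n=3)
# A human (or a follow-up merge meeting) then reviews `summaries` and selects
# or combines the best answer.
```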
Biological Validation
Two promising nanobody candidates targeting new SARS-CoV-2 variants (JN.1 and KP.3) identified and experimentally validated via wet-lab expression and ELISA.
Strong hybrid validation: AlphaFold-Multimer and Rosetta scores matched prior high-accuracy benchmarks for RBD-antibody binding.
Transparency and Reproducibility
All steps documented; easy to install and run in Jupyter notebooks.
GitHub repo well-organized and allows re-running experiments or customizing prompts.
Areas for Improvement
Validation and Evaluation Gaps
Only one use case (SARS-CoV-2); need additional domains to demonstrate generalizability.
Results not benchmarked against simpler alternatives (e.g., using ChatGPT alone or running the same prompts without the agent structure).
No quantitative assessment of how much each metric (e.g., ESM, Rosetta) contributed to the final ranking; the weighting scheme is not validated (one possible ablation is sketched below).
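As an example of what such a validation could look like, the contribution of each metric could be probed with a simple leave-one-metric-out ablation. The sketch below uses made-up candidate scores and weights purely for illustration; it is not taken from the manuscript.

```python
# Leave-one-metric-out ablation on an illustrative weighted ranking.
# Scores and weights are invented example values, not the paper's data.
candidates = {
    "Nb-1": {"esm": 0.8, "alphafold": 0.6, "rosetta": 0.7},
    "Nb-2": {"esm": 0.5, "alphafold": 0.9, "rosetta": 0.4},
    "Nb-3": {"esm": 0.7, "alphafold": 0.7, "rosetta": 0.9},
}
weights = {"esm": 0.3, "alphafold": 0.4, "rosetta": 0.3}  # illustrative weights only

def rank(cands, wts):
    def score(name):
        return sum(wts.get(metric, 0.0) * value for metric, value in cands[name].items())
    return sorted(cands, key=score, reverse=True)

full_ranking = rank(candidates, weights)
for metric in weights:
    ablated = rank(candidates, {k: w for k, w in weights.items() if k != metric})
    overlap = len(set(full_ranking[:2]) & set(ablated[:2])) / 2  # top-2 agreement
    print(f"dropping {metric}: top-2 overlap with full ranking = {overlap:.0%}")
```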
Multi-Agent Design Transparency
Lacks detailed description of how the multi-agent architecture was built or optimized.
Decisions such as the number of parallel meetings (N=3) or the agenda structure are not justified; how they were optimized is unclear.
Would benefit from defining how agents are “given” tool access; the Appendix suggests tools may have been run externally (a hypothetical illustration of what this detail could look like is sketched below).
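By “given tool access” we mean something like the dispatch pattern below. Everything here (script name, registry, function signatures) is hypothetical; it is exactly the kind of detail the Methods or Appendix could spell out.

```python
# Hypothetical illustration of agent tool access: the orchestrator wraps each
# external tool as a callable and dispatches agent requests to it. Script and
# function names are placeholders, not the authors' actual interface.
import subprocess

def run_esm_scoring(fasta_path: str) -> str:
    # Placeholder: invoke whatever script or API actually scores mutations with ESM.
    result = subprocess.run(
        ["python", "score_with_esm.py", fasta_path],  # hypothetical script name
        capture_output=True, text=True, check=True,
    )
    return result.stdout

TOOL_REGISTRY = {"esm_scoring": run_esm_scoring}

def dispatch(tool_name: str, *args: str) -> str:
    """Called when an agent requests a tool; fails loudly if the tool is unknown."""
    return TOOL_REGISTRY[tool_name](*args)
```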
Output Quality and Reliability
Although agents wrote 99% of the words, it is unclear how many were actually useful; some output may be excessive verbosity or hallucination.
The human had to correct errors, including basic file I/O issues and tool execution bugs, which suggests limited agent autonomy.
The Scientific Critic agent, while helpful, could introduce misleading conclusions without oversight.
Usability and Performance Concerns
Agents cannot access real-time data (e.g., newly emerging COVID variants).
The pipeline is not fully automated; it requires step-by-step human prompting (e.g., between the ESM → AlphaFold → Rosetta stages).
Variability in output is not well quantified; how consistent is the pipeline when re-run? (A sketch of an automated driver and repeatability check is given below.)
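Both points could be addressed with a thin driver that chains the stages without manual prompting and is re-run several times to measure consistency. In the sketch below the stage functions are random stand-ins, not the real tools, so it only illustrates the shape of such a check.

```python
# Sketch of a fully automated ESM -> AlphaFold-Multimer -> Rosetta driver plus a
# repeatability check. Stage functions are random stand-ins, not real tool calls.
import random
from collections import Counter

def esm_stage(seqs):        # rank candidate mutations, keep the top 10
    return sorted(seqs, key=lambda _: random.random())[:10]

def alphafold_stage(seqs):  # predict nanobody-RBD complexes, return (seq, confidence)
    return [(s, random.random()) for s in seqs]

def rosetta_stage(pairs):   # score interfaces, return the single best candidate
    return max(pairs, key=lambda _: random.random())[0]

def run_pipeline(seqs):
    return rosetta_stage(alphafold_stage(esm_stage(seqs)))

candidates = [f"NB{i}" for i in range(100)]     # stand-in for mutated nanobody sequences
winners = Counter(run_pipeline(candidates) for _ in range(20))
print(winners.most_common(3))                   # how often the same candidate wins across 20 re-runs
```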
Presentation and Accessibility
Figures could use larger text and more interpretation.
Would benefit from a walkthrough video showing pipeline from start to finish.
Nanobody biology and software tool functions could be better explained for broader audiences.
Scientific Rigor and Detail
The Methods section lacks detail, especially around how the LLMs interface with external software (e.g., was AlphaFold run by the agents themselves, or externally by a human?).
Evaluation could be strengthened by adding in vitro neutralization assays or cell-based functional screens beyond ELISA.
Could compare LLM-designed structures against traditional bioinformatics pipelines to assess the true value added.
Competing interests
The authors declare that they have no competing interests.