
PREreviews of Citation Hallucination Determines Success: An Empirical Comparison of Six Medical AI Research Systems

1 PREreview

  1. PREreview by Matt Spick

    Shi et al. have written an interesting and timely piece on the reliability of large language models (LLMs) in producing medical research manuscripts. They introduce MedResearchBench, a benchmarking tool to assess the reliability of LLM outputs, and report on how different LLMs perform on their…
