Write a PREreview

Towards Evaluating the Diagnostic Ability of LLMs

Server: Preprints.org
DOI: 10.20944/preprints202409.0688.v3

On average, one in ten patients dies because of a diagnostic error, and medical errors are the third-largest cause of death in the US. While LLMs have been proposed to help doctors with diagnoses, no research results have been published comparing the diagnostic ability of many popular LLMs on an openly accessible real-patient cohort. In this study, we compare LLMs from Google, OpenAI, Meta, Mistral, Cohere, and Anthropic using our previously published evaluation methodology and explore improving their accuracy with RAG.

You can write a PREreview of Towards Evaluating the Diagnostic Ability of LLMs. A PREreview is a review of a preprint and can vary from a few sentences to a lengthy report, similar to a journal-organized peer-review report.

Before you start

We will ask you to log in with your ORCID iD. If you don’t have an iD, you can create one.

What is an ORCID iD?

An ORCID iD is a unique identifier that distinguishes you from everyone with the same or similar name.
