Integrating Agentic AI to Automate ICD-10 Medical Coding

Posted
Server: Preprints.org
DOI: 10.20944/preprints202512.2138.v1

Automating ICD-10 coding from discharge summaries remains demanding because coders must analyze clinical narratives and justify each decision. This study compares three automation patterns: PLM-ICD as a standalone deep learning system that emits 15 codes per case; LLM-only generation with full autonomy; and a hybrid approach in which PLM-ICD drafts candidates and an agentic LLM filter accepts or rejects each one. All strategies were evaluated on 19,801 MIMIC-IV summaries using four LLMs ranging from compact (Qwen2.5-3B, Llama-3.2-3B, Phi-4-mini) to large scale (Sonnet-4.5). Precision was the primary metric because coders still supply any missing diagnoses. PLM-ICD alone reached 55.8% precision while always surfacing 15 suggestions. LLM-only generation lagged severely (1.5–34.6% precision) and produced inconsistent output sizes. The agentic filter delivered the best trade-off: compact LLMs reviewed the 15 candidates, discarded those with weak evidence, and returned 2–8 high-confidence codes. Llama-3.2-3B, for example, improved from 1.5% precision as a generator to 55.1% as a verifier while trimming false positives by 73%. These results show that positioning LLMs as quality controllers, rather than as primary generators, yields reliable support for clinical coding teams. Formal recall/F1 reporting remains future work for fully autonomous implementations.
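
The hybrid pattern described in the abstract (a drafting model proposes candidates, a verifier accepts or rejects each one, and precision is measured on what survives) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `filter_candidates`, `precision`, and `toy_verifier` are hypothetical, the codes are made-up toy data, and the verifier is a simple lookup standing in for an LLM call that would read the discharge summary. Returning 0.0 for an empty prediction list is also an assumed convention.

```python
from typing import Callable, Iterable, List, Set


def filter_candidates(candidates: Iterable[str],
                      verifier: Callable[[str], bool]) -> List[str]:
    """Keep only the candidate codes the verifier accepts, preserving order."""
    return [code for code in candidates if verifier(code)]


def precision(predicted: Iterable[str], gold: Set[str]) -> float:
    """Fraction of predicted codes that appear in the gold set.

    Assumption: an empty prediction list scores 0.0.
    """
    predicted = list(predicted)
    if not predicted:
        return 0.0
    hits = sum(1 for code in predicted if code in gold)
    return hits / len(predicted)


# Toy example (hypothetical data, shortened to 6 drafted codes for readability).
gold = {"I10", "E11.9", "N17.9"}
drafted = ["I10", "E11.9", "Z79.4", "J18.9", "N17.9", "K21.9"]

# Stand-in for the agentic filter: a real verifier would prompt an LLM with
# the summary text and the candidate code, then parse an accept/reject answer.
supported_by_note = {"I10", "E11.9", "N17.9", "Z79.4"}


def toy_verifier(code: str) -> bool:
    return code in supported_by_note


kept = filter_candidates(drafted, toy_verifier)
precision_before = precision(drafted, gold)
precision_after = precision(kept, gold)

print(kept)
print(precision_before, precision_after)
```

In this toy case the filter removes two of the three false positives, so the list shrinks and precision rises. Because a filter can only drop drafted codes, it cannot recover a diagnosis the drafting model missed, which is consistent with the abstract's note that coders still supply missing diagnoses.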
