
Survey and Benchmarking of Large Language Models for RTL Code Generation: Techniques and Open Challenges

Server: Preprints.org
DOI: 10.20944/preprints202509.1681.v1

Large language models (LLMs) are emerging as powerful tools for hardware design, with recent work exploring their ability to generate register-transfer level (RTL) code directly from natural-language specifications. This paper provides a survey and evaluation of LLM-based RTL generation. We review twenty-six published efforts, covering techniques such as fine-tuning, reinforcement learning, retrieval-augmented prompting, and multi-agent orchestration, and we analyze their contributions across eight methodological dimensions, including debugging support, post-RTL metrics, and benchmark development. Building on this review, we experimentally evaluate frontier commercial models (GPT-4.1, GPT-4.1-mini, and Claude Sonnet 4) on the VerilogEval and RTLLM benchmarks under both single-shot and lightweight agentic settings. Results show that these models achieve pass rates of up to 89.74% on VerilogEval and 96.08% on RTLLM, matching or exceeding prior domain-specific pipelines without specialized fine-tuning. Detailed failure analysis reveals systematic error modes, including FSM mis-sequencing, handshake drift, misuse of blocking vs. non-blocking assignments, and state-space oversimplification. Finally, we outline a forward-looking research roadmap toward natural-language-to-SoC design, emphasizing controlled specification schemas, open benchmarks and flows, PPA-in-the-loop feedback, and modular assurance frameworks. Together, this work provides both a critical synthesis of recent advances and a baseline evaluation of frontier LLMs, highlighting opportunities and challenges on the path toward AI-native electronic design automation.
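To make one of these recurring error modes concrete, the sketch below is an illustrative Verilog fragment (constructed for this summary, not drawn from the paper's benchmark data) showing the misuse of blocking vs. non-blocking assignments: blocking assignments (=) in a clocked always block update immediately, collapsing an intended two-stage pipeline into a single register stage, whereas non-blocking assignments (<=) sample all right-hand sides before any update and preserve both stages.

    // Illustrative two-stage pipeline (hypothetical example; the module
    // and signal names are invented for this sketch).
    module two_stage_pipe (
        input  wire clk,
        input  wire d,
        output reg  q
    );
        reg stage1;

        // Buggy variant: blocking assignments in a clocked block. stage1
        // updates immediately, so q sees the new value in the same cycle
        // and the intended two-cycle latency collapses to one.
        //
        //   always @(posedge clk) begin
        //       stage1 = d;
        //       q      = stage1;
        //   end

        // Correct variant: non-blocking assignments sample all right-hand
        // sides before any register updates, preserving both stages.
        always @(posedge clk) begin
            stage1 <= d;
            q      <= stage1;
        end
    endmodule

Code of this shape can pass casual simulation on simple stimuli yet synthesize to the wrong pipeline depth, which is consistent with the paper's characterization of blocking vs. non-blocking misuse as a systematic rather than random failure mode.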
