PREreviews of “Text-to-SQL for Enterprise Data Analytics”


Text-to-SQL for Enterprise Data Analytics

by Albert Chen, Manas Bundele, Gaurav Ahlawat, Patrick Stetz, Zhitao Wang, Qiang Fei, Donghoon Jung, Audrey Chu, Bharadwaj Jayaraman, Ayushi Panth, Yatin Arora, Sourav Jain, Renjith Varma, Alexey Ilin, Iuliia Melnychuk, Chelsea Chueh, Joyan Sil, and Xiaofeng Wang

Posted: July 18, 2025
Server: arXiv
DOI: 10.48550/arxiv.2507.14372

Abstract

The introduction of large language models has brought rapid progress on Text-to-SQL benchmarks, but it is not yet easy to build a working enterprise solution. In this paper, we present insights from building an internal chatbot that enables LinkedIn's product managers, engineers, and operations teams to self-serve data insights from a large, dynamic data lake. Our approach features three components. First, we construct a knowledge graph that captures up-to-date semantics by indexing database metadata, historical query logs, wikis, and code. We apply clustering to identify relevant tables for each team or product area. Second, we build a Text-to-SQL agent that retrieves and ranks context from the knowledge graph, writes a query, and automatically corrects hallucinations and syntax errors. Third, we build an interactive chatbot that supports various user intents, from data discovery to query writing to debugging, and displays responses in rich UI elements to encourage follow-up chats. Our chatbot has over 300 weekly users. Expert review shows that 53% of its responses are correct or close to correct on an internal benchmark set. Through ablation studies, we identify the most important knowledge graph and modeling components, offering a practical path for developing enterprise Text-to-SQL solutions.


1 PREreview


PREreview by Rupesh Ghosh


Summary
The paper describes how LinkedIn developers built an enterprise Text-to-SQL system that lets users run self-service analytics on the company's expanding data lake. The system combines three components, a knowledge graph, a Text-to-SQL agent, and an interactive chatbot, to perform…

