PREreview of Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence

Published
DOI: 10.5281/zenodo.17992918
License: CC BY 4.0

This paper tackles a critical and often under-addressed challenge in self-service Business Intelligence: the data preparation phase that precedes dashboarding and analysis. While modern BI tools have significantly lowered the barrier for visualization, data transformation and table joining remain major pain points for non-technical users. By analyzing approximately 2,000 real-world BI projects, the authors provide strong empirical motivation for addressing data preparation as a first-class problem in BI workflows.

The key contribution of the paper is Auto-Prep, a system that holistically predicts both transformation and join steps by modeling their interdependence. Its graph-based approach, inspired by Steiner-tree algorithms and backed by provable quality guarantees, distinguishes this work from prior heuristic-based or language-model-driven methods. The reported results, which show Auto-Prep correctly predicting over 70% of preparation steps and outperforming both existing algorithms and GPT-4, demonstrate the effectiveness of the proposed approach.
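For readers unfamiliar with the Steiner-tree framing mentioned above, the following is a minimal sketch of the general idea, not the authors' algorithm. It assumes tables are graph nodes, candidate joins are edges weighted by a hypothetical cost (lower means a more plausible join), and the goal is to connect a required set of tables as cheaply as possible. It uses a simplified form of the textbook metric-closure heuristic; the full heuristic adds a final cleanup pass to remove any cycles, which is omitted here. All table names and costs are invented for illustration.

```python
import heapq
from itertools import combinations


def dijkstra(graph, source):
    """Return (distance, predecessor) maps from source in an undirected weighted graph."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, prev


def approximate_steiner_tree(graph, terminals):
    """Connect the terminals via an MST over their pairwise shortest paths."""
    terminals = list(terminals)
    paths = {t: dijkstra(graph, t) for t in terminals}

    # Candidate links between terminals, cheapest first (Kruskal's order).
    closure = sorted(
        (paths[a][0][b], a, b)
        for a, b in combinations(terminals, 2)
        if b in paths[a][0]
    )
    parent = {t: t for t in terminals}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree_edges = set()
    for _, a, b in closure:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # already connected
        parent[root_a] = root_b
        # Expand the shortest path from a to b back into original join edges.
        prev = paths[a][1]
        node = b
        while node != a:
            tree_edges.add(frozenset((node, prev[node])))
            node = prev[node]
    return tree_edges


def build_graph(edges):
    """Build an undirected adjacency map from (table, table, cost) triples."""
    graph = {}
    for left, right, cost in edges:
        graph.setdefault(left, {})[right] = cost
        graph.setdefault(right, {})[left] = cost
    return graph


# Hypothetical join candidates; the costs are made up for this example.
join_graph = build_graph([
    ("customers", "orders", 0.1),
    ("orders", "order_items", 0.2),
    ("order_items", "products", 0.1),
    ("customers", "regions", 0.3),
    ("orders", "products", 0.9),  # an implausible direct join
])

tree = approximate_steiner_tree(join_graph, ["customers", "products"])
for edge in sorted(tuple(sorted(e)) for e in tree):
    print(edge)
```

In this toy example the sketch routes the join between `customers` and `products` through `orders` and `order_items` rather than through the costly direct `orders`-`products` edge. How Auto-Prep assigns edge weights, incorporates transformation steps, and obtains its quality guarantees is described in the paper itself and is not reproduced here.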

One limitation is that the system focuses on predicting preparation steps rather than executing or validating them within end-to-end BI pipelines. Additionally, the evaluation is based on public BI projects, and it remains unclear how well the approach generalizes to enterprise environments with stricter governance and greater schema complexity.

Overall, this paper makes a strong contribution to BI research by addressing a core bottleneck in self-service analytics and proposing a principled, data-driven solution with clear practical relevance.

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they did not use generative AI to come up with new ideas for their review.
