PREreview of Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence

by Rupesh Ghosh

Published: December 14, 2025
DOI: 10.5281/zenodo.17925096
License: CC BY 4.0

Summary

The paper addresses a persistent problem in self-service Business Intelligence (BI): preparing data for analysis. Drawing on a study of 2000 real BI projects, the authors show that data transformations and table joins are closely interdependent and should therefore be predicted together. They present Auto-Prep, a system that predicts all preparation steps holistically using a graph-based method inspired by Steiner trees.

Contribution

The framework unifies data preparation prediction in a single model that handles transformations and joins together rather than treating them independently. The proposed algorithm comes with theoretical guarantees and, in the authors' evaluation, achieves higher prediction accuracy than the existing methods and large language models it is compared against.

Relevance

The findings matter because data preparation remains the most time-consuming and error-prone stage of BI work, even in organizations that use modern self-service platforms. The work also has practical value because it studies BI workflows drawn from real business settings.

Approach

The methodology is sound and well motivated. A large-scale empirical study on real project data supports the method, and the graph-based model is well suited to representing structural dependencies in BI workflows.

Strengths

Use of a large, real-world BI project dataset

Clear identification of an overlooked problem in self-service BI

Strong empirical results and theoretical guarantees

Meaningful comparison against existing algorithms and LLMs

Limitations

The evaluation focuses on historical workflows and prediction accuracy; integration into live BI tools and user-centric evaluation would further strengthen practical impact.

Overall assessment

The paper presents a solid approach that advances the automation of BI data preparation. It combines theoretical rigor with practical relevance and is well suited for peer-reviewed publication.

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they did not use generative AI to come up with new ideas for their review.
