Predicting Incident Recurrence in Real-World IT Operations
- Posted
- Server: Zenodo
- DOI: 10.5281/zenodo.17954627
This paper presents an empirical study on predicting incident recurrence in real-world IT operations environments using Artificial Intelligence for IT Operations (AIOps).
The study is based on 7,123 operational incidents collected over a period of 7.9 months (April–December 2025) from a heterogeneous production-like monitoring environment, including Linux servers, Windows systems, and network devices. Incidents were collected via an industry-standard monitoring platform and analyzed using time-series forecasting models, with a primary focus on Facebook Prophet.
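As an illustration of the kind of preprocessing such a forecasting setup typically needs, the sketch below aggregates raw incident timestamps into a daily count series. It is a minimal sketch under stated assumptions, not the study's actual pipeline: the abstract does not say how incidents were aggregated, so the daily granularity, the function name `daily_incident_counts`, and the sample timestamps are all hypothetical. The only detail taken from Prophet itself is that it expects a date column named `ds` and a value column named `y`.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_incident_counts(timestamps):
    """Aggregate incident timestamps into one row per calendar day.

    Days without incidents are emitted with a count of 0, so that sparse
    periods stay visible in the series instead of silently disappearing.
    """
    if not timestamps:
        return []
    counts = Counter(ts.date() for ts in timestamps)
    day, last = min(counts), max(counts)
    rows = []
    while day <= last:
        # "ds" and "y" follow Prophet's expected column names.
        rows.append({"ds": day.isoformat(), "y": counts.get(day, 0)})
        day += timedelta(days=1)
    return rows


# Hypothetical incident timestamps, for illustration only.
sample = [
    datetime(2025, 4, 14, 9, 30),
    datetime(2025, 4, 14, 17, 5),
    datetime(2025, 4, 16, 2, 45),
]
rows = daily_incident_counts(sample)
for row in rows:
    print(row)
```

Rows in this shape can be loaded into a pandas DataFrame and passed to a forecasting model. Filling empty days with zeros is a design choice that makes data sparsity explicit, which matters when many days have no incidents at all.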
The objective of this work is not to present optimized or idealized results, but to provide an honest assessment of the challenges involved in applying predictive models to real operational data. Initial results show a prediction accuracy of approximately 28.3%, highlighting limitations related to data sparsity, seasonality, concept drift, and the lack of contextual operational features.
Beyond quantitative results, this study discusses practical implications for AIOps, including the impact of false positives in production environments, the trade-off between sensitivity and specificity, and the importance of explainability and human-in-the-loop feedback mechanisms.
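The trade-off between sensitivity and specificity can be made concrete with a small threshold sweep. The following is a minimal sketch, assuming the predictor emits a recurrence score between 0 and 1 per incident; the scores, labels, thresholds, and function names are hypothetical and are not drawn from the study's data.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def sweep_thresholds(scores, labels, thresholds):
    """Evaluate sensitivity and specificity at each alerting threshold."""
    results = {}
    for threshold in thresholds:
        tp = fp = tn = fn = 0
        for score, label in zip(scores, labels):
            predicted = score >= threshold
            if predicted and label:
                tp += 1
            elif predicted and not label:
                fp += 1  # false alarm raised to the operator
            elif not predicted and label:
                fn += 1  # missed recurrence
            else:
                tn += 1
        results[threshold] = sensitivity_specificity(tp, fp, tn, fn)
    return results


# Hypothetical recurrence scores and ground-truth labels (1 = recurred).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 1, 0]
results = sweep_thresholds(scores, labels, [0.2, 0.5, 0.7])
for threshold, (sens, spec) in results.items():
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data, lowering the threshold catches every recurrence but raises more false alarms, while raising it reduces false alarms at the cost of missed recurrences.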
This work is part of an ongoing applied research project (TheMonitoring.AI) that aims to bridge the gap between academic AIOps research and real-world operational constraints. An updated version of this paper may be submitted to arXiv or extended for journal or conference submission.