Write a PREreview

A Machine Learning Framework for Team Success Classification in Professional Football: A Pilot Study Using Premier League Performance Data

Published
Server: Preprints.org
DOI: 10.20944/preprints202605.1076.v1

In the era of data-driven decision-making, the pursuit of competitive excellence in professional football has evolved beyond instinct and tradition. This research asks what makes a football team successful, adopting a team-centric machine learning approach grounded in performance analytics. Using a comprehensive dataset of Premier League player statistics from 1992 to 2019, the study aims to develop predictive models that identify the key performance indicators (KPIs) driving team success over time. Chapter I establishes the research background, problem statement, and objectives, emphasizing the growing relevance of artificial intelligence in modern football analysis. Chapter II presents a critical review of existing literature on sports analytics and machine learning, highlighting methodological gaps in explainable, team-focused success modelling. Chapter III details a structured methodology based on the CRISP-DM framework, encompassing data preprocessing, feature engineering, performance tier formulation, feature selection strategies, and supervised learning model development. Three supervised classification models (Logistic Regression, Random Forest, and Gradient Boosting) were implemented and evaluated using Accuracy, F1-Score, ROC-AUC, and confusion matrices. Ensemble learning techniques, including voting and stacking, were further explored to enhance predictive robustness. Model stability was assessed through 5-fold stratified cross-validation, and paired t-tests on cross-validated F1-scores indicated no statistically significant performance differences between models (p > 0.05). Gradient Boosting demonstrated consistently strong performance (mean F1-score ≈ 1.00), low variance across folds, and superior interpretability, supporting its selection as the primary base learner within the final ensemble framework.
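The paired t-test on per-fold F1-scores mentioned above can be illustrated with a minimal, standard-library sketch. The fold scores below are hypothetical values chosen for illustration, not results from the preprint, and the function name `paired_t_statistic` is an assumption of this example rather than anything the authors describe.

```python
import math
from statistics import mean, stdev

# Hypothetical per-fold F1 scores for two models from the same 5 folds
# (illustrative values only, not taken from the preprint).
f1_model_a = [0.95, 0.97, 0.96, 0.98, 0.94]
f1_model_b = [0.94, 0.98, 0.95, 0.97, 0.95]


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on matched scores."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Two-sided critical value of Student's t at alpha = 0.05 with 4 degrees
# of freedom (5 folds - 1).
T_CRIT_DF4 = 2.776

t_stat, df = paired_t_statistic(f1_model_a, f1_model_b)
significant = abs(t_stat) > T_CRIT_DF4
print(f"t = {t_stat:.3f}, df = {df}, significant at 0.05: {significant}")
```

With only five folds the test has four degrees of freedom, so a fairly large and consistent gap between models is needed to reach significance. If two models score identically on every fold, the statistic is undefined, which the sketch reports as an error.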
To address model transparency, SHAP (SHapley Additive exPlanations) was applied at both team and player levels, enabling granular interpretation of feature contributions to success predictions. The findings reveal that attacking efficiency, defensive stability, and disciplinary control consistently influence successful team outcomes. Beyond predictive accuracy, the study proposes practical decision-support extensions, such as performance tiering, highlighting the real-world applicability of the framework. This project ultimately aims not only to predict success but also to uncover why certain teams win, offering insights that could inform coaching, scouting, and strategy. The outcome is a step forward in applying AI to help the beautiful game evolve further.
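The idea behind SHAP attributions can be sketched by computing exact Shapley values for a toy scoring function. This is not the SHAP library and not the authors' model: the function `toy_score`, its weights, and the three feature names are hypothetical stand-ins loosely named after the factors in the abstract. Brute-force enumeration like this is only feasible for a handful of features.

```python
from itertools import permutations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    for order in permutations(features):
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            # Marginal contribution of f when added to the current coalition.
            totals[f] += value_fn(with_f) - value_fn(coalition)
            coalition = with_f
    n_orders = factorial(len(features))
    return {f: total / n_orders for f, total in totals.items()}


def toy_score(present):
    """Hypothetical 'success score' for a set of present features."""
    score = 0.0
    if "attack" in present:
        score += 3.0
    if "defence" in present:
        score += 2.0
    if "discipline" in present:
        score += 1.0
    # Hypothetical interaction: attack and defence together add a bonus.
    if "attack" in present and "defence" in present:
        score += 1.0
    return score


phi = shapley_values(toy_score, ["attack", "defence", "discipline"])
print(phi)  # {'attack': 3.5, 'defence': 2.5, 'discipline': 1.0}
```

The interaction bonus is split evenly between the two features that produce it, and the attributions sum to the full score of 7.0. That additivity property is what lets per-feature contributions be read as shares of a prediction.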

You can write a PREreview of A Machine Learning Framework for Team Success Classification in Professional Football: A Pilot Study Using Premier League Performance Data. A PREreview is a review of a preprint and can range from a few sentences to a lengthy report, similar to a peer-review report organized by a journal.

Before you start

We will ask you to log in with your ORCID iD. If you don't have an iD, you can create one.

What is an ORCID iD?

An ORCID iD is a unique identifier that distinguishes you from others with the same or a similar name.
