
Write a PREreview

Early Detection of Diabetes With Different Machine Learning Approach

Posted
Server: OSF
DOI: 10.17605/osf.io/zwdsv

Early detection of diabetes is critical for effective management and prevention of complications. This study uses the DiaBD dataset to develop a machine learning approach for predicting diabetes status, drawing on clinical data from approximately 5,288 individuals retained after quality control. Key features include age, gender, vital signs (e.g., pulse rate, blood pressure), glucose levels, anthropometric measures (e.g., height, weight), and family history of diabetes and hypertension. The dataset presented two major challenges: class imbalance, with substantially fewer diabetic than non-diabetic cases, and data anomalies such as implausible numeric values (e.g., extreme glucose readings). Preprocessing included anomaly detection and stratified sampling to preserve class proportions during model training and evaluation. We evaluated multiple classification models, including Linear Discriminant Analysis (LDA), Random Forests, Gradient Boosting, and Artificial Neural Networks (ANN), using stratified cross-validation and an independent test set. Despite the imbalance, our best-performing model achieved an ROC-AUC of 0.85, indicating moderate-to-strong predictive capability. Feature importance analysis consistently identified glucose levels and weight as the most influential predictors. These findings underscore the potential of machine learning for diabetes risk stratification, while emphasizing the importance of addressing class imbalance and validating models on more representative datasets.
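
For readers less familiar with the two evaluation ideas the abstract relies on, stratified sampling and ROC-AUC, the following is a minimal Python sketch. It is not the authors' code: the helper names `stratified_split` and `roc_auc` are hypothetical, the labels are synthetic (10% positive) rather than taken from DiaBD, and the 20% test fraction is an assumption made only for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test so each class keeps the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within the class, then cut
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def roc_auc(labels, scores):
    """ROC-AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of samples, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic, imbalanced toy labels (assumption: 10% positive); not DiaBD data.
labels = [1] * 100 + [0] * 900
train_idx, test_idx = stratified_split(labels)
test_labels = [labels[i] for i in test_idx]
```

Because the split is made within each class, the test set keeps the same 10% positive rate as the full toy sample, which is the property the abstract refers to when it mentions preserving class proportions.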

You can write a PREreview of Early Detection of Diabetes With Different Machine Learning Approach. A PREreview is a review of a preprint and can vary from a few sentences to a lengthy report, similar to a journal-organized peer-review report.
