
PREreview of Exploring AI’s Potential in Papilledema Diagnosis to Support Dermatological Treatment Decisions in Rural Healthcare

DOI: 10.5281/zenodo.16949097
License: CC BY 4.0


Brief summary of the study

This study evaluates the potential of AI to detect papilledema from fundus photographs in the context of dermatological treatment decisions in rural healthcare. The authors compared a fine-tuned ResNet CNN and GPT-4 against two human ophthalmologists using 1,389 fundus images. The ResNet model achieved the highest performance, with 99.49% accuracy and 100% specificity, surpassing both human experts (95.96% accuracy) and GPT-4 (85.86% accuracy). These findings highlight AI’s promise for supporting early papilledema detection, particularly in resource-limited settings where specialist access is scarce.

The study situates its contribution within existing literature on AI in ophthalmology and dermatology, extending it by focusing on drug-induced papilledema risk in dermatology patients. The authors conclude that while AI models, especially ResNet, show strong diagnostic potential, validation on diverse real-world datasets remains necessary. The most interesting aspect is the demonstration that AI can outperform specialists in a critical, vision-threatening condition, offering tangible benefits for healthcare equity in underserved areas.

Major comments

Comments on the methods employed and the discussion

  • The test set included only papilledema and normal cases; pseudo-papilledema, a clinically important mimic, was excluded. This reduces clinical realism. It would be better to state this limitation clearly in the abstract and discussion and, if feasible, to include pseudo-papilledema in future test sets to reflect real-world diagnostic challenges.

  • GPT-4’s training exposure to ophthalmic images is unknown, which limits the interpretability of its results. Please clarify in the methods that GPT-4 was treated as a “black-box comparator,” and emphasize in the discussion that its lower performance should not discredit LLMs broadly but rather highlights the need for domain-specific fine-tuning.

  • While the dataset is well described, it is not openly available, and the source code is not shared. This limits reproducibility and independent validation. Please provide a code repository (e.g., GitHub, Zenodo) with preprocessing scripts and model training details. If full data sharing is not possible due to ethics, consider providing a de-identified subset or synthetic dataset for benchmarking.

Minor comments

Comments on interpretation of the results, presentation of the data/figures

  • The confusion matrices could be clearer, and the fundus images are of relatively low resolution. It would be better to add explicit “true positive/false negative” labels to the confusion matrices (a minimal labeling sketch follows this list) and to provide higher-resolution fundus images.

  • Economic considerations and rural healthcare integration of AI are mentioned only briefly. Add a short paragraph on cost-effectiveness, the feasibility of smartphone fundus cameras, and workflow integration in rural care.

  • While limitations are discussed (e.g., exclusion of pseudo-papilledema, single dataset), they are not prominently linked back to clinical practice. Reframe them in terms of clinical impact (e.g., how the exclusion of pseudo-papilledema may overestimate accuracy).
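To make the confusion-matrix suggestion concrete, here is a minimal Python sketch using scikit-learn and matplotlib. The labels, counts, and class order are illustrative assumptions, not the authors’ data:

```python
# Minimal sketch: explicitly label confusion-matrix quadrants.
# Labels and counts below are hypothetical.
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_true = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = papilledema, 0 = normal (illustrative)
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

# labels=[1, 0] puts papilledema in the first row/column.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
disp = ConfusionMatrixDisplay(cm, display_labels=["Papilledema", "Normal"])
disp.plot(cmap="Blues")

# Annotate each quadrant with its role so readers need not infer it
# from the axis order.
roles = [["TP", "FN"], ["FP", "TN"]]
for i in range(2):
    for j in range(2):
        disp.ax_.text(j, i + 0.3, roles[i][j], ha="center", color="dimgray")
plt.show()
```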

Conflicts of interest of reviewers

None declared

Data and code availability

The data are not openly available; the authors note they are accessible upon request due to ethical restrictions. No source-code link was provided, which reduces reproducibility.

Ethical clearance and approval

Ethical approval was reported, and the certificate number was clearly stated.

Comments by section

Title

The title, “Exploring AI’s Potential in Papilledema Diagnosis to Support Dermatological Treatment Decisions in Rural Healthcare,” appropriately reflects the study’s scope. It is specific and highlights both the technical and clinical dimensions.

Abstract

The abstract clearly states the research question (comparing AI models and human ophthalmologists for papilledema detection in dermatology-related care). It also outlines the approach (ResNet CNN vs GPT-4 vs humans) and the key findings (ResNet outperforming both). However, the abstract would benefit from a brief contextual sentence linking papilledema more explicitly to dermatological drug risks (currently implied but not emphasized).

Introduction

The introduction summarized the research problem well: papilledema as a vision-threatening condition, its relevance to dermatology (drug-induced intracranial hypertension), and the lack of access to ophthalmologists in rural settings. The research question was also situated within AI’s growing role in ophthalmology, and the authors referenced relevant, recent literature.

Materials and methods

  • The dataset (1,389 fundus images) was clearly described, along with the preprocessing steps (contrast normalization, cropping, and resizing). The training methods (ResNet fine-tuning with discriminative learning rates and the one-cycle policy) were appropriate for the limited dataset; a minimal training sketch follows this list. A GPT-4 evaluation was also included for comparison.

  • The statistical methods (sensitivity, specificity, PPV, NPV, accuracy, Cohen’s Kappa, and two-sample proportion tests) were appropriate and correctly reported; a metrics sketch also follows this list.

  • Results were consistent across the text and tables. However, explicit “true positive/false negative” labels in the confusion matrices would improve clarity.
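For context on the training recipe described above, here is a minimal sketch of ResNet fine-tuning with the one-cycle policy and discriminative learning rates, using the fastai library (which popularized both techniques). The folder layout, architecture choice (resnet50), and hyperparameters are assumptions for illustration; the paper’s actual configuration may differ:

```python
# Minimal sketch of ResNet fine-tuning with the one-cycle policy and
# discriminative learning rates (fastai). Paths, architecture, and
# hyperparameters are illustrative assumptions.
from fastai.vision.all import ImageDataLoaders, Resize, accuracy, resnet50, vision_learner

# Hypothetical layout: fundus_images/papilledema/*.jpg, fundus_images/normal/*.jpg
dls = ImageDataLoaders.from_folder(
    "fundus_images", valid_pct=0.2, seed=42,
    item_tfms=Resize(224),          # cropping/resizing as in the methods
)

learn = vision_learner(dls, resnet50, metrics=accuracy)  # pretrained backbone, frozen
learn.fit_one_cycle(3)              # one-cycle policy: train the new head first

learn.unfreeze()
# Discriminative learning rates: small for the early pretrained layers,
# larger for the later layers and the head.
learn.fit_one_cycle(5, lr_max=slice(1e-5, 1e-3))
```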
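Likewise, a minimal sketch of how the reported metrics can be computed from binary confusion-matrix counts; scikit-learn supplies Cohen’s Kappa and statsmodels the two-sample proportion z-test. All counts are hypothetical, not the study’s results:

```python
# Minimal sketch of the reported metrics from binary confusion-matrix
# counts; all numbers are hypothetical.
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.proportion import proportions_ztest

tp, fn, fp, tn = 95, 5, 0, 98        # hypothetical counts

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                 # positive predictive value
npv = tn / (tn + fn)                 # negative predictive value
accuracy = (tp + tn) / (tp + fn + fp + tn)

# Cohen's Kappa is computed from the paired label vectors.
y_true = [1] * (tp + fn) + [0] * (fp + tn)
y_pred = [1] * tp + [0] * fn + [1] * fp + [0] * tn
kappa = cohen_kappa_score(y_true, y_pred)

# Two-sample proportion test, e.g., model vs. human accuracy over the
# same number of test images (counts again hypothetical).
z, p = proportions_ztest(count=[193, 190], nobs=[198, 198])
print(sensitivity, specificity, ppv, npv, accuracy, kappa, z, p)
```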

Discussion & conclusions

The discussion appropriately concludes that ResNet outperformed both the human experts and GPT-4, with strong supporting evidence (accuracy and Kappa values). However, the authors could further expand on the cost-effectiveness and implementation challenges of AI in rural healthcare.

Competing interests

The authors declared no competing interests.
