
A Resource-Efficient Approach to Text-Conditional Chest X-ray Generation Using Latent Diffusion Models

Published
Server
Preprints.org
DOI
10.20944/preprints202506.1783.v1

Background: Current medical image generation models typically require substantial computational resources, creating practical barriers for many research institutions. Recent diffusion models achieve notable results but demand multiple high-end GPUs and large datasets, limiting accessibility and reproducibility in medical AI research.

Methods: We present a resource-efficient latent diffusion model for text-conditional chest X-ray generation, trained on a single NVIDIA RTX 4060 GPU using the Indiana University Chest X-ray dataset (3,301 frontal images). Our architecture combines a Variational Autoencoder (VAE) with 3.25M parameters and 8 latent channels, a U-Net denoising network with 39.66M parameters incorporating cross-attention mechanisms, and a BioBERT text encoder fine-tuned with parameter-efficient methods (593K trainable from 108.9M total parameters). We employ optimization strategies including gradient checkpointing, mixed precision training, and gradient accumulation to enable training within 8GB VRAM constraints.

Results: The model achieves a validation loss of 0.0221 after 387 epochs of diffusion training, with the VAE converging at epoch 67. Inference time averages 663ms per 256×256 image on the RTX 4060, enabling real-time generation. Total training time was approximately 96 hours, compared to 552+ hours reported for comparable multi-GPU models. The system successfully generates anatomically plausible chest X-rays conditioned on clinical text descriptions, including various pathological findings.

Conclusions: Our work demonstrates that effective medical image generation does not require massive computational resources. By achieving functional results with a single consumer GPU and limited data, we provide a practical pathway for medical AI research in resource-constrained settings. All code, model weights, and training configurations are publicly available at https://github.com/priyam-choksi/cxr-diffusion to facilitate reproducibility and further research.
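Among the memory optimizations the abstract lists, gradient accumulation is the one that makes a small-VRAM GPU emulate a large effective batch: several micro-batch gradients are computed in sequence and averaged before a single weight update. The idea can be sketched framework-free in plain Python on a one-parameter least-squares model (the batch sizes below are illustrative, not the paper's actual training configuration):

```python
# Gradient accumulation sketch: averaging gradients from equal-sized
# micro-batches reproduces the full-batch gradient exactly, so a GPU
# that can only fit a micro-batch still takes full-batch update steps.

def mse_grad(w, xs, ys):
    """d/dw of mean((w*x - y)^2) over the given batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5  # illustrative parameter value
xs = [0.1 * i for i in range(8)]
ys = [0.3 * x + 0.05 for x in xs]

# Full-batch gradient (what a large-VRAM GPU could compute in one pass).
full = mse_grad(w, xs, ys)

# Accumulated gradient: 4 micro-batches of 2, averaged before the update.
steps = 4
accum = 0.0
for k in range(steps):
    accum += mse_grad(w, xs[2 * k:2 * k + 2], ys[2 * k:2 * k + 2])
accum /= steps

# Equal-sized micro-batches make the average match the full-batch gradient.
print(abs(full - accum) < 1e-9)
```

In a deep-learning framework the same pattern appears as calling the backward pass on each micro-batch with the loss scaled by the number of accumulation steps, and invoking the optimizer step only once per accumulation cycle.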
