Diabetic retinopathy (DR) remains a leading cause of preventable blindness worldwide, and the markedly different rates of disease progression among individuals hinder the personalised design of screening intervals. Most current deep learning systems for DR assessment require tens of thousands of pixel-level lesion annotations, a demand that is impractical for many clinical centres.

We conducted a retrospective diagnostic-accuracy study using the publicly released Davis dataset, which contains 9,939 non-mydriatic colour fundus photographs collected annually in four posterior-pole fields per eye from 2,740 patients examined at Jichi Medical University between May 2011 and June 2015. Images were categorised according to the International Clinical DR scale as no diabetic retinopathy (NDR), simple diabetic retinopathy (SDR), pre-proliferative diabetic retinopathy (PPDR) or proliferative diabetic retinopathy (PDR). An extremely small training set of 30 unlabelled normal images was used to fine-tune a Vision-Language Large Model (VLLM) that couples a frozen CLIP ViT-L/14 visual encoder with a Stable Diffusion v1.5 multimodal decoder through few-shot prompt engineering. The remaining photographs formed an independent test set containing 776 NDR, 256 SDR, 102 PPDR and 106 PDR images. Training was performed in PyTorch at a resolution of 512 × 512 pixels, and performance was reported as the area under the receiver operating characteristic curve (AUROC).

For DR detection, our model achieved an AUROC of 0.952 (95% CI 0.941-0.962), outperforming PaDiM (0.812, 95% CI 0.802-0.823), RegAD (0.836, 95% CI 0.825-0.847) and WinCLIP (0.902, 95% CI 0.891-0.912). For four-class severity grading, our model reached an AUROC of 0.835 (95% CI 0.825-0.844), again exceeding PaDiM (0.686, 95% CI 0.675-0.696), RegAD (0.709, 95% CI 0.699-0.720) and WinCLIP (0.742, 95% CI 0.732-0.751).
These results demonstrate that a high-performing VLLM can be trained using the proposed approach and an extremely small dataset without any manual annotations, greatly improving clinical data utilisation and diagnostic efficiency, while also providing a scalable zero-annotation pathway for DR screening in resource-limited settings.
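The abstract does not publish implementation details, so the following is only a minimal, hypothetical sketch of the two evaluation steps it describes: scoring a test image against a small bank of normal reference embeddings (here by cosine distance, a common choice in embedding-based anomaly detection, though not necessarily the authors' scoring rule) and computing AUROC from the resulting scores. The function names and synthetic embeddings are ours, not from the study.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the Mann-Whitney rank statistic: the fraction of
    (positive, negative) pairs in which the positive scores higher,
    with ties counted as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def anomaly_score(embedding, normal_bank):
    """Score an image embedding as 1 minus its maximum cosine
    similarity to any embedding in the bank of normal references."""
    e = embedding / np.linalg.norm(embedding)
    bank = normal_bank / np.linalg.norm(normal_bank, axis=1, keepdims=True)
    return 1.0 - float(np.max(bank @ e))

# Toy usage with random stand-in embeddings (30 "normal" references,
# as in the abstract's training set size).
rng = np.random.default_rng(0)
normal_bank = rng.normal(size=(30, 768))
test_embedding = rng.normal(size=768)
score = anomaly_score(test_embedding, normal_bank)
```

The pairwise-rank formulation of AUROC is exact and threshold-free, which matches how the abstract reports performance; in practice one would obtain the embeddings from the frozen CLIP ViT-L/14 encoder rather than from random vectors.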
Published in: Abstract Book of ICPHMS2025 & ICPBS2025
Page(s): 44
Licence: This is an Open Access abstract, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright © The Author(s), 2025. Published by Science Publishing Group
Keywords: Vision–Language Large Model, Diabetic Retinopathy Detection, Disease Severity Grading, Medical Image Analysis