xini666666/OLIVES_Dataset

Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link: https://hf-mirror.com/datasets/xini666666/OLIVES_Dataset
---
license: mit
size_categories:
- 10K<n<100K
pretty_name: 'OLIVES Dataset: Ophthalmic Labels for Investigating Visual Eye Semantics'
tags:
- medical
dataset_info:
- config_name: biomarker_detection
  features:
  - name: Image
    dtype: image
  - name: B1
    dtype: float64
  - name: B2
    dtype: float64
  - name: B3
    dtype: float64
  - name: B4
    dtype: float64
  - name: B5
    dtype: float64
  - name: B6
    dtype: float64
  - name: BCVA
    dtype: float64
  - name: CST
    dtype: float64
  - name: Eye_ID
    dtype: float64
  - name: Patient_ID
    dtype: float64
  splits:
  - name: train
    num_bytes: 15852565958.136
    num_examples: 78822
  - name: test
    num_bytes: 968486633.741
    num_examples: 3871
  download_size: 15923453393
  dataset_size: 16821052591.876999
- config_name: disease_classification
  features:
  - name: Image
    dtype: image
  - name: Scan (n/49)
    dtype: float64
  - name: Atrophy / thinning of retinal layers
    dtype: float64
  - name: Disruption of EZ
    dtype: float64
  - name: DRIL
    dtype: float64
  - name: IR hemorrhages
    dtype: float64
  - name: IR HRF
    dtype: float64
  - name: Partially attached vitreous face
    dtype: float64
  - name: Fully attached vitreous face
    dtype: float64
  - name: Preretinal tissue/hemorrhage
    dtype: float64
  - name: Vitreous debris
    dtype: float64
  - name: VMT
    dtype: float64
  - name: DRT/ME
    dtype: float64
  - name: Fluid (IRF)
    dtype: float64
  - name: Fluid (SRF)
    dtype: float64
  - name: Disruption of RPE
    dtype: float64
  - name: PED (serous)
    dtype: float64
  - name: SHRM
    dtype: float64
  - name: Eye_ID
    dtype: float64
  - name: BCVA
    dtype: float64
  - name: CST
    dtype: float64
  - name: Patient_ID
    dtype: int64
  - name: Disease Label
    dtype: float64
  splits:
  - name: train
    num_bytes: 15860241253.136
    num_examples: 78822
  download_size: 15061835755
  dataset_size: 15860241253.136
configs:
- config_name: biomarker_detection
  data_files:
  - split: train
    path: biomarker_detection/train-*
  - split: test
    path: biomarker_detection/test-*
- config_name: disease_classification
  data_files:
  - split: train
    path: disease_classification/train-*
---

# OLIVES_Dataset

## Abstract

Clinical diagnosis of the eye is performed over multifarious data modalities including scalar clinical labels, vectorized biomarkers, two-dimensional fundus images, and three-dimensional Optical Coherence Tomography (OCT) scans. While the clinical labels, fundus images and OCT scans are instrumental measurements, the vectorized biomarkers are interpreted attributes from the other measurements. Clinical practitioners use all these data modalities for diagnosing and treating eye diseases like Diabetic Retinopathy (DR) or Diabetic Macular Edema (DME). Enabling usage of machine learning algorithms within the ophthalmic medical domain requires research into the relationships and interactions between these relevant data modalities. Existing datasets are limited in that: ($i$) they view the problem as disease prediction without assessing biomarkers, and ($ii$) they do not consider the explicit relationship among all four data modalities over the treatment period. In this paper, we introduce the Ophthalmic Labels for Investigating Visual Eye Semantics (OLIVES) dataset that addresses the above limitations. This is the first OCT and fundus dataset that includes clinical labels, biomarker labels, and time-series patient treatment information from associated clinical trials. The dataset consists of $1268$ fundus eye images each with $49$ OCT scans, and $16$ biomarkers, along with $3$ clinical labels and a disease diagnosis of DR or DME. In total, there are 96 eyes' data averaged over a period of at least two years with each eye treated for an average of 66 weeks and 7 injections. OLIVES dataset has advantages in other fields of machine learning research including self-supervised learning as it provides alternate augmentation schemes that are medically grounded.

## Subsets

There are 2 subsets included in this dataset:

- Disease Classification (`disease_classification`)
- Biomarker Detection (`biomarker_detection`)

Disease classification provides the full dataset while the biomarker detection subset provides a curated train-test split.

### Disease Classification

This subset contains information on 78,000+ OCT scans collected over a series of patient visits. In terms of labels, there are:

- `Image`: An image of the OCT scan
- `BCVA`: Best Corrected Visual Acuity
- `CST`: Central Subfield Thickness
- `Patient_ID`: A value to help distinguish different patients
- `Disease Label`: A value of `0` for DR (Diabetic Retinopathy) and `1` for DME (Diabetic Macular Edema)

This information can be used to classify the disease. In addition, the first and last visit of each patient include extra biomarker information. This is captured by 16 mostly-boolean biomarker labels, listed below together with the scan and eye identifiers:

- `Scan (n/49)`: The scan number out of the 49 scans taken for each patient
- `Atrophy / thinning of retinal layers`
- `Disruption of EZ`: Disruption of Ellipsoid Zone
- `DRIL`: Disruption of Retinal Inner Layers
- `IR hemorrhages`: Intraretinal hemorrhages
- `IR HRF`: Intraretinal Hyperreflective Foci
- `Partially attached vitreous face`
- `Fully attached vitreous face`
- `Preretinal tissue/hemorrhage`
- `Vitreous debris`
- `VMT`: Vitreomacular Traction
- `DRT/ME`: Diffuse Retinal Thickening or Macular Edema
- `Fluid (IRF)`: Intraretinal Fluid
- `Fluid (SRF)`: Subretinal Fluid
- `Disruption of RPE`: Disruption of Retinal Pigment Epithelium
- `PED (serous)`: Pigment Epithelial Detachment
- `SHRM`: Subretinal Hyperreflective Material
- `Eye_ID`: A value to help distinguish different eye scans

### Biomarker Detection

This subset was used for the 2023 Video and Image Processing (VIP) Cup, a challenge hosted by the IEEE Signal Processing Society. The goal is to detect 6 biomarkers given image and clinical label data.
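
The task framing above (image and clinical labels in, 6 biomarkers out) can be sketched as a small helper that splits one record into inputs and targets. This is only an illustration: the column names come from the `biomarker_detection` features in the dataset metadata, but treating `BCVA` and `CST` as the clinical inputs, mapping missing values to `None`, and the helper name `split_example` are assumptions, not part of the official challenge code. It expects plain Python records (the default format), not the `torch` format.

```python
import math

# Column names taken from the dataset card metadata.
BIOMARKER_COLUMNS = ["B1", "B2", "B3", "B4", "B5", "B6"]
# Assumption: BCVA and CST are the clinical inputs for this task.
CLINICAL_COLUMNS = ["BCVA", "CST"]


def _is_missing(value):
    """True for None or NaN, which float64 columns may use for gaps."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def split_example(example):
    """Split one record into (image, clinical inputs, biomarker targets).

    Hypothetical helper: missing entries are returned as None so the
    caller can decide whether to drop or impute them.
    """
    image = example["Image"]
    clinical = [
        None if _is_missing(example[name]) else float(example[name])
        for name in CLINICAL_COLUMNS
    ]
    targets = [
        None if _is_missing(example[name]) else int(example[name])
        for name in BIOMARKER_COLUMNS
    ]
    return image, clinical, targets
```

Returning `None` rather than raising keeps the decision about incomplete rows with the caller, which seems safer given the missing clinical values reported for some patients.
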
For additional information regarding the challenge, the metrics, and the train-test data splits, please visit: https://alregib.ece.gatech.edu/competitions/2023-vip-cup/

The 6 biomarkers and their associated interpretations:

- **B1 (Intraretinal Hyperreflective Foci (IRHRF)):** were indicated as present with the appearance of intraretinal, highly reflective spots, which correspond pathologically to microaneurysms or hard exudates, with or without shadowing of the more posterior retinal layers.
- **B2 (A Partially Attached Vitreous Face (PAVF)):** was indicated as present with evidence of perifoveal detachment of the vitreous from the internal limiting membrane (ILM) with a macular attachment point within a 3-mm radius of the fovea.
- **B3 (A Fully Attached Vitreous Face (FAVF)):** was indicated as present with no evidence of perifoveal or macular detachment from the ILM.
- **B4 (Intraretinal Fluid (IRF)):** was indicated as present when intraretinal hyporeflective areas or cysts had a minimum fluid height of 20 µm.
- **B5 (Diffuse Retinal Thickening or Diabetic Macular Edema (DRT/ME)):** was indicated as present when there was increased retinal thickness of 50 µm above the otherwise flat retina surface with associated reduced reflectivity in the intraretinal tissues.
- **B6 (Vitreous Debris (VD)):** was indicated as present with evidence of hyperreflective foci in the vitreous or shadowing of the retinal layers in the absence of an intraretinal hemorrhage.

The 6 biomarkers are chosen to have balanced train-test splits.

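
One way to inspect the balance mentioned above is to compute, for each of `B1` to `B6`, the fraction of annotated rows in which the biomarker is present, and then compare the train and test values. The sketch below is a minimal illustration under stated assumptions: `positive_rates` is a hypothetical helper, a value of `1` is taken to mean "present", and NaN or `None` entries are skipped as unannotated.

```python
import math

# Column names taken from the dataset card metadata.
BIOMARKER_COLUMNS = ["B1", "B2", "B3", "B4", "B5", "B6"]


def positive_rates(rows, columns=BIOMARKER_COLUMNS):
    """Return {column: share of annotated rows where the value equals 1}.

    Rows are dict-like records; missing (None/NaN) entries are ignored.
    A column with no annotated rows maps to NaN.
    """
    positives = {name: 0 for name in columns}
    annotated = {name: 0 for name in columns}
    for row in rows:
        for name in columns:
            value = row.get(name)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                continue
            annotated[name] += 1
            positives[name] += int(value == 1)
    return {
        name: positives[name] / annotated[name] if annotated[name] else float("nan")
        for name in columns
    }


# Possible usage with the Hugging Face `datasets` library (not run here).
# Selecting only the label columns avoids decoding every image:
#
#   from datasets import load_dataset
#   train = load_dataset('gOLIVES/OLIVES_Dataset', 'biomarker_detection', split='train')
#   print(positive_rates(train.select_columns(BIOMARKER_COLUMNS)))
```

Running the same function on the `test` split and comparing the two dictionaries gives a quick view of how closely the per-biomarker rates match across splits.
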
## Data Download

Sample code to download the disease classification dataset:

```python
from datasets import load_dataset
from torch.utils.data import DataLoader

olives = load_dataset('gOLIVES/OLIVES_Dataset', 'disease_classification', split='train')

# Convert into a format usable by PyTorch
olives = olives.with_format("torch")

dataloader = DataLoader(olives, batch_size=4)
for batch in dataloader:
    print(batch)

# Example to get the VMT biomarker of the first image in the dataset.
print(olives[0]['VMT'])
```

## Known Issues

- Patient ID #79 has missing `BCVA` and `CST` values for most visits. The first and last visits are the exception, since the biomarker information is present for them.
- Certain visits for patients seem to have the exact same scans as a previous visit. For instance, Patient ID #61 has identical images in W8 and their next visit in W12.

## Links

**Associated Website**: https://alregib.ece.gatech.edu/olives-dataset/

**More work from our Lab**: https://alregib.ece.gatech.edu/

**Paper**:

1. OLIVES Dataset: https://arxiv.org/pdf/2209.11195
2. VIP Cup Paper: https://arxiv.org/pdf/2408.11170

## Citations

If you find the work useful, please include the following citations in your work:

> @inproceedings{prabhushankarolives2022,\
> title={OLIVES Dataset: Ophthalmic Labels for Investigating Visual Eye Semantics},\
> author={Prabhushankar, Mohit and Kokilepersaud, Kiran and Logan, Yash-yee and Trejo Corona, Stephanie and AlRegib, Ghassan and Wykoff, Charles},\
> booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 2 (NeurIPS Datasets and Benchmarks 2022)},\
> year={2022}\
> }
>
> @article{alregib2024ophthalmic,\
> title={Ophthalmic Biomarker Detection: Highlights From the IEEE Video and Image Processing Cup 2023 Student Competition [SP Competitions]},\
> author={AlRegib, Ghassan and Prabhushankar, Mohit and Kokilepersaud, Kiran and Chowdhury, Prithwijit and Fowler, Zoe and Corona, Stephanie Trejo and Thomaz, Lucas A and Majumdar, Angshul},\
> journal={IEEE Signal Processing Magazine},\
> volume={41},\
> number={4},\
> pages={96--104},\
> year={2024},\
> publisher={IEEE}\
> }

Provider: xini666666