Synthetic Dermatology Dataset for Racial Bias Mitigation in Tumor Classification
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/bb844rcjx5
Description:
The Synthetic Dermatology Dataset is a two-part collection, created through generative image translation, that is designed to support the development of fair and robust skin disease classification models across the full spectrum of Fitzpatrick Skin Types (FST). It was created for the data augmentation study in our paper, which explores using synthetic data to mitigate racial bias in AI-assisted dermatologic diagnosis.
Dataset Composition (Total 6,000 Images)
- The dataset consists of two complementary subsets, both derived from the same 3,000 manually filtered clinical images:
3,000 Original Images (Simulated FST I–II):
- Sourced from the publicly available SD-260 dataset (http://doi.org/10.1109/TNNLS.2019.2917524)
- Manually filtered by a board-certified dermatologist to include only high-quality clinical photographs visually consistent with light skin tones (FST I–II).
3,000 Synthetic Images (Simulated FST V–VI):
- Generated as synthetic counterparts of the original 3,000 images, translated to simulate darker skin tones (FST V–VI).
- The translation was performed using the high-performing I²SB-based translation model (https://doi.org/10.48550/arXiv.2302.05872).
Skin Lesion Content
- The images are organized for binary classification of tumorous skin diseases (Benign vs. Malignant).
- The Benign category includes 2,004 samples, consisting of Nevus (NV) with 926 samples and Seborrheic Keratosis (SK) with 1,078 samples.
- The Malignant category includes 996 samples, composed of Basal Cell Carcinoma (BCC) with 395 samples, Squamous Cell Carcinoma (SCC) with 289 samples, and Malignant Melanoma (MM) with 312 samples.
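The per-class counts above can be checked against the stated totals with a short script. The sketch below is illustrative only: the dictionary layout and the function names are hypothetical, and the inverse-frequency weighting is one common way to handle the roughly 2:1 benign-to-malignant imbalance, not necessarily the approach used in the paper. Because each synthetic image is a counterpart of one original image, the same counts are assumed to apply to each of the two subsets.

```python
# Per-class sample counts as stated in the dataset description
# (counts per 3,000-image subset; the dict layout is hypothetical).
COUNTS = {
    "benign": {"NV": 926, "SK": 1078},
    "malignant": {"BCC": 395, "SCC": 289, "MM": 312},
}


def category_totals(counts):
    """Sum the per-diagnosis counts within each binary category."""
    return {category: sum(classes.values()) for category, classes in counts.items()}


def inverse_frequency_weights(totals):
    """Weight each category by grand_total / (n_categories * category_count).

    This is an assumed weighting scheme for illustration, not the
    authors' documented training setup.
    """
    grand_total = sum(totals.values())
    n_categories = len(totals)
    return {c: grand_total / (n_categories * k) for c, k in totals.items()}


totals = category_totals(COUNTS)
weights = inverse_frequency_weights(totals)

print("Category totals:", totals)
print("Subset size:", sum(totals.values()))
print("Class weights:", {c: round(w, 3) for c, w in weights.items()})
```

With these weights, the minority Malignant category contributes as much to a weighted loss as the Benign category, which may be worth considering when training a binary classifier on either subset.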
Created:
2025-10-23



