
Synthetic Dermatology Dataset for Racial Bias Mitigation in Tumor Classification

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/bb844rcjx5
Description:
The Synthetic Dermatology Dataset is a two-part collection, created through generative image translation, designed to facilitate the development of fair and robust skin disease classification models across the full spectrum of Fitzpatrick Skin Types (FST). It was created for the data augmentation study in our paper, which explores the use of this synthetic data approach to mitigate racial bias in AI-assisted dermatologic diagnosis.

Dataset Composition (6,000 images in total)
The dataset consists of two complementary subsets, derived from 3,000 manually filtered clinical images.

3,000 Original Images (Simulated FST I–II):
- Sourced from the publicly available SD-260 dataset (http://doi.org/10.1109/TNNLS.2019.2917524).
- Manually filtered by a board-certified dermatologist to include only high-quality clinical photographs visually consistent with light skin tones (FST I–II).

3,000 Synthetic Images (Simulated FST V–VI):
- Generated as synthetic counterparts of the original 3,000 images, translated to simulate darker skin tones (FST V–VI).
- The translation was performed using the high-performing I²SB-based translation model (https://doi.org/10.48550/arXiv.2302.05872).

Skin Lesion Content
- The images are organized for binary classification of tumorous skin diseases (Benign vs. Malignant).
- The Benign category includes 2,004 samples: Nevus (NV) with 926 samples and Seborrheic Keratosis (SK) with 1,078 samples.
- The Malignant category includes 996 samples: Basal Cell Carcinoma (BCC) with 395 samples, Squamous Cell Carcinoma (SCC) with 289 samples, and Malignant Melanoma (MM) with 312 samples.
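
The class counts above can be sanity-checked with a short script. The sketch below is illustrative only: it confirms that the stated per-class counts add up to 3,000 (which suggests they describe one subset, with the synthetic subset mirroring it), and derives inverse-frequency weights for the roughly 2:1 Benign/Malignant imbalance. The weighting scheme and the names `COUNTS` and `summarize` are assumptions made for this example; the description does not say how, or whether, the authors handled class imbalance.

```python
# Per-class sample counts as stated in the dataset description.
COUNTS = {
    "benign": {"NV": 926, "SK": 1078},
    "malignant": {"BCC": 395, "SCC": 289, "MM": 312},
}


def summarize(counts):
    """Return per-label totals, the overall total, and inverse-frequency weights."""
    per_label = {label: sum(classes.values()) for label, classes in counts.items()}
    total = sum(per_label.values())
    # Assumption: "balanced" weighting, total / (n_labels * n_samples_in_label).
    # This is a common heuristic, not a method taken from the dataset description.
    weights = {label: total / (len(per_label) * n) for label, n in per_label.items()}
    return per_label, total, weights


per_label, total, weights = summarize(COUNTS)
print(per_label)   # totals for the benign and malignant labels
print(total)       # size of one subset
print(weights)     # the rarer malignant label receives the larger weight
```

With these counts, each label total matches the figure given in the description (2,004 benign, 996 malignant), and doubling the subset total gives the 6,000 images reported for the full dataset.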
Created:
2025-10-23