
lukemelas/synthetic-derm

Hugging Face | Updated 2023-10-02 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/lukemelas/synthetic-derm
Resource description:
## Augmenting medical image classifiers with synthetic data from latent diffusion models

Luke W. Sagers*, James A. Diao*, Luke Melas-Kyriazi*, Matthew Groh, Vijaytha Muralidharan, Zhuo Ran Cai, Jesutofunmi A. Omiye, Pranav Rajpurkar, Adewole S. Adamson, Veronica Rotemberg, Roxana Daneshjou, and Arjun K. Manrai

**Abstract:** While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drug Administration (FDA), many studies have shown inconsistent generalization or bias, particularly for underrepresented populations. Some have proposed that generative AI could reduce the need for real data, but its utility in model development remains unclear. Skin disease serves as a useful case study in synthetic image generation due to the diversity of disease appearances, particularly across the protected attribute of skin tone. Here we show that latent diffusion models can scalably generate images of skin disease and that augmenting model training with these data improves performance in data-limited settings. These performance gains saturate at synthetic-to-real image ratios above 10:1 and are substantially smaller than the gains obtained from adding real images. We further conducted a human reader study on the synthetic generations, revealing a correlation between physician-assessed photorealism and improvements in model performance. We release a new dataset of 458,920 synthetic images produced using several generation strategies. Our results suggest that synthetic data could serve as a force-multiplier for model development, but the collection of diverse real-world data remains the most important step to improve medical AI algorithms.

### Data

This is the data repository.

### Code

The code and all instructions are available [here](https://github.com/manrai/synthetic-derm).
Provided by:
lukemelas
Summary of original information

Dataset overview

Dataset name

Augmenting medical image classifiers with synthetic data from latent diffusion models

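Dataset description

The dataset contains 458,920 synthetic images of skin disease generated with latent diffusion models. It was released to show how synthetic data can augment the training of medical image classifiers, particularly in data-limited settings. The study found that training with these synthetic images improves model performance, but the gains saturate once the synthetic-to-real image ratio exceeds 10:1 and are smaller than the gains from adding real images.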

Dataset contents

  • Number of synthetic skin disease images: 458,920
  • Generation strategies: several

Intended use

Augmenting the training of medical image classifiers, particularly in data-limited settings.
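As a rough illustration of this use, the sketch below assembles a training list at a chosen synthetic-to-real ratio, such as the 10:1 point beyond which the abstract reports saturating gains. It is not the authors' pipeline: the function name `mix_real_and_synthetic`, the file names, and the sampling scheme (uniform, without replacement) are all assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple


def mix_real_and_synthetic(
    real: Sequence[str],
    synthetic: Sequence[str],
    ratio: float,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return a shuffled list with about `ratio` synthetic items per real item."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    rng = random.Random(seed)
    # Sample without replacement, so the request is capped at the pool size.
    n_synth = min(len(synthetic), round(ratio * len(real)))
    chosen = rng.sample(list(synthetic), n_synth)
    # Tag each path with its origin so later steps can tell the two apart.
    mixed = [(path, "real") for path in real]
    mixed += [(path, "synthetic") for path in chosen]
    rng.shuffle(mixed)
    return mixed


# Hypothetical file names, used only to exercise the function.
real_paths = [f"real_{i}.png" for i in range(5)]
synthetic_paths = [f"synth_{i}.png" for i in range(100)]

train = mix_real_and_synthetic(real_paths, synthetic_paths, ratio=10)
n_synthetic = sum(1 for _, source in train if source == "synthetic")
print(len(train), n_synthetic)  # 55 50
```

Keeping the origin tag next to each path makes it straightforward to evaluate on real images only, or to compare several ratios against the same real subset.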

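Related research

The results indicate that synthetic data can serve as an augmentation tool for model development, but collecting diverse real-world data remains the key step for improving the performance of medical AI algorithms.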
