
"Dermatology Image and Text Dataset for AI-Powered Diagnosis and RAG-Based Medical Support"

Source: DataCite Commons. Updated: 2025-05-01. Indexed: 2025-05-17.
Download link:
https://ieee-dataport.org/documents/professor-x-data-set
Description:
This dataset has been compiled and derived from publicly available dermatological image collections, including the ISIC 2018 Skin Lesion Dataset and the Atlas Dermatology archive. It comprises 49,100 high-resolution, anonymized images categorized into 32 classes: 31 dermatological diseases and an additional “Unknown” class to improve real-world generalization. Each image is labeled based on expert classification standards and curated for deep learning applications.

In addition to visual data, the dataset integrates a text corpus composed of medical literature related to each disease class. These documents have been segmented into smaller text chunks and transformed into semantic vector representations using OpenAI embeddings. This dual structure enables both image-based disease classification and Retrieval-Augmented Generation (RAG)-based contextual medical support, allowing for reproducible research in multimodal AI-driven diagnostics.

This dataset is intended for non-commercial academic use and follows appropriate ethical guidelines. It supports research in medical computer vision, explainable AI, and hybrid decision support systems.
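The chunk-then-embed-then-retrieve flow described above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the function names (`chunk_text`, `toy_embed`, `retrieve`), the chunk size, and the overlap are hypothetical, and `toy_embed` is a simple bag-of-words stand-in, not the OpenAI embeddings the dataset reports using. The listing does not state the actual chunking parameters or embedding model.

```python
import math

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,;:()\"'").lower() for w in text.split())
    return [t for t in tokens if t]

def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into word-based chunks; sizes here are illustrative only."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

def build_vocab(chunks):
    """Assign each distinct token in the corpus a vector index."""
    vocab = {}
    for chunk in chunks:
        for token in tokenize(chunk):
            vocab.setdefault(token, len(vocab))
    return vocab

def toy_embed(text, vocab):
    """Stand-in embedding (assumption): L2-normalized bag-of-words counts."""
    vec = [0.0] * len(vocab)
    for token in tokenize(text):
        idx = vocab.get(token)
        if idx is not None:  # tokens unseen in the corpus are ignored
            vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def retrieve(query, chunks, vocab, k=2):
    """Return the k chunks most similar to the query as (score, chunk) pairs."""
    q = toy_embed(query, vocab)
    scored = []
    for chunk in chunks:
        v = toy_embed(chunk, vocab)
        # Both vectors are unit length, so the dot product is cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, v)), chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical two-chunk corpus, one chunk per disease class.
corpus = [
    "Melanoma is a malignant tumour of melanocytes.",
    "Psoriasis causes scaly plaques on the skin.",
]
vocab = build_vocab(corpus)
for score, chunk in retrieve("melanoma tumour", corpus, vocab):
    print(round(score, 3), chunk)
```

In a RAG setting, the retrieved chunks would be passed to a language model as context alongside the classifier's predicted disease class. A real pipeline would replace `toy_embed` with a learned embedding model and the linear scan with a vector index.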

Provider:
IEEE DataPort
Created:
2025-05-01
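Summary
Background overview
This multimodal dataset contains 49,100 dermatological images with accompanying medical literature, organized into 32 classes (31 skin diseases plus an "Unknown" class). It supports research on AI-driven dermatological diagnosis and RAG-based medical support. The data is derived from publicly available dermatological image collections, has been labeled to expert classification standards and preprocessed, and is intended for non-commercial academic use.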
The summary above was collected and generated by 遇见数据集.