General-Medical-AI/GMAI-Reasoning10K

Name: General-Medical-AI/GMAI-Reasoning10K
Creator: General-Medical-AI
Published: 2025-07-21 11:31:05
License: No description available

Source: Hugging Face. Updated: 2025-07-21. Indexed: 2025-04-12.

Download link:

https://hf-mirror.com/datasets/General-Medical-AI/GMAI-Reasoning10K

Description:

GMAI-Reasoning10K is a medical image reasoning dataset of 10,000 curated samples. The data was drawn from 95 medical datasets published through sources such as Kaggle, GrandChallenge, and Open-Release, and covers 12 imaging modalities, including X-ray, CT, and MRI. Preprocessing followed the standardization methods of SAMed-20M: for 3D data (CT/MRI), individual slices were extracted and pixel values were normalized to the 0-255 range; for video data, key frames were extracted at 2 frames per second. Key metadata, including background information, imaging modality, and labels, was extracted from each source dataset, and GPT was used to build informative prompts that generate multiple-choice questions with a single correct answer. Strict quality control and rejection sampling were then used to discard samples that failed predefined criteria (for example, relevant annotations or correct labels).
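The preprocessing steps described above can be illustrated with a minimal sketch. It assumes simple per-slice min-max scaling and evenly spaced frame sampling; the exact SAMed-20M procedure is not specified here, and the function names `normalize_to_uint8`, `volume_to_slices`, and `keyframe_indices` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def normalize_to_uint8(slice_2d):
    """Min-max scale one 2D slice to the 0-255 range (assumed scheme)."""
    arr = np.asarray(slice_2d, dtype=np.float64)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # A constant slice has no contrast; map it to all zeros.
        return np.zeros(arr.shape, dtype=np.uint8)
    scaled = (arr - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


def volume_to_slices(volume, axis=0):
    """Split a 3D volume (e.g. CT/MRI) into normalized 2D slices along `axis`."""
    vol = np.asarray(volume)
    return [
        normalize_to_uint8(np.take(vol, i, axis=axis))
        for i in range(vol.shape[axis])
    ]


def keyframe_indices(num_frames, native_fps, target_fps=2.0):
    """Return frame indices that sample a video at roughly `target_fps`."""
    if num_frames <= 0:
        return []
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never step by less than one frame, even if target_fps > native_fps.
    step = max(native_fps / target_fps, 1.0)
    count = int((num_frames - 1) // step) + 1
    return [int(k * step) for k in range(count)]


if __name__ == "__main__":
    # Synthetic volume standing in for a CT scan with 2 slices of 3x4 pixels.
    volume = np.arange(24).reshape(2, 3, 4) - 10
    slices = volume_to_slices(volume)
    print(len(slices), slices[0].dtype, slices[0].min(), slices[0].max())
    # A 2-second clip recorded at 30 fps, sampled at 2 fps.
    print(keyframe_indices(60, 30))
```

Per-slice scaling is only one option; a fixed intensity window applied to the whole volume would keep brightness comparable across slices, and which of the two the dataset authors used is not stated.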

Provider:

General-Medical-AI
