OctoMed/II-Medical-SFT
Source: Hugging Face | Updated: 2026-04-10 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/OctoMed/II-Medical-SFT
Resource description:
---
dataset_info:
  features:
  - name: model
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: responses
    sequence: string
  splits:
  - name: train
    num_bytes: 24648065088
    num_examples: 2197741
  download_size: 11378805765
  dataset_size: 24648065088
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# II-Medical - Medical Reasoning SFT Dataset
## Description
This dataset contains medical reasoning data for supervised fine-tuning. Each question is paired with detailed reasoning responses that demonstrate step-by-step medical thinking. We gratefully build on the original data source at https://huggingface.co/datasets/Intelligent-Internet/II-Medical-Reasoning-SFT. We make only minor modifications and filter the examples so that model reasoning is enclosed in `<think>...</think>` tokens.
## Data Fields
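- `model`: A string field declared in the dataset schema (presumably the name of the model that produced the responses)
- `question`: The medical question
- `answer`: The final answer (extracted from the reasoning)
- `responses`: A sequence of detailed reasoning responses with chain-of-thought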
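Because the Description says model reasoning is wrapped in `<think>...</think>` tokens, a small helper can separate the reasoning from whatever text follows it. This is a minimal sketch, not part of the dataset's tooling: the function name `split_reasoning` is hypothetical, and it assumes each response contains at most one `<think>` block with the remaining text placed after the closing tag.

```python
import re

# Non-greedy match so that only the first <think>...</think> block is captured;
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str):
    """Split one response string into (reasoning, remaining_text).

    Returns (None, response) when no <think>...</think> block is present.
    Assumption: the text after </think> is the model's final output.
    """
    match = THINK_RE.search(response)
    if match is None:
        return None, response.strip()
    reasoning = match.group(1).strip()
    remaining = response[match.end():].strip()
    return reasoning, remaining
```

Applied to one element of `responses`, this gives the chain-of-thought and the trailing text as two separate strings, which can be convenient when inspecting or re-templating examples.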
## Splits
- `train`: Training data with reasoning responses (2,197,741 examples)
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("OctoMed/II-Medical-SFT")
```
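The metadata above lists a download size of 11378805765 bytes (roughly 11.4 GB), so it may be convenient to look at a few rows before fetching everything. The following is a minimal sketch using the `streaming=True` option of `load_dataset`. The helper name `preview_examples` and its `load` parameter are hypothetical, added only so the loader can be swapped out (for example in offline tests).

```python
from itertools import islice


def preview_examples(n=3, load=None):
    """Return the first n training examples without a full download.

    `load` is a hypothetical injection point; by default the real
    `datasets.load_dataset` is used.
    """
    if load is None:
        # Imported lazily so the helper can be defined without `datasets` installed.
        from datasets import load_dataset as load
    # streaming=True yields examples lazily instead of downloading all shards.
    stream = load("OctoMed/II-Medical-SFT", split="train", streaming=True)
    return list(islice(stream, n))
```

Calling `preview_examples(2)` requires network access and the `datasets` package. Each returned item should be a dict with the `model`, `question`, `answer`, and `responses` fields declared in the schema.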
## Citation
If you find our work helpful, please consider citing it:
```
@article{ossowski2025octomed,
title={OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning},
author={Ossowski, Timothy and Zhang, Sheng and Liu, Qianchu and Qin, Guanghui and Tan, Reuben and Naumann, Tristan and Hu, Junjie and Poon, Hoifung},
journal={arXiv preprint arXiv:2511.23269},
year={2025}
}
```
Provider: OctoMed



