OctoMed/II-Medical-SFT

Name: OctoMed/II-Medical-SFT
Creator: OctoMed
Published: 2026-04-10 04:54:03
License: No description available

Hugging Face | Updated: 2026-04-10 | Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/OctoMed/II-Medical-SFT

Resource summary:

---
dataset_info:
  features:
  - name: model
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: responses
    sequence: string
  splits:
  - name: train
    num_bytes: 24648065088
    num_examples: 2197741
  download_size: 11378805765
  dataset_size: 24648065088
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# II-Medical - Medical Reasoning SFT Dataset

## Description

This dataset contains medical reasoning data for supervised fine-tuning (SFT). It pairs medical questions with detailed reasoning responses that demonstrate step-by-step medical thinking.

We build on the original data source available at https://huggingface.co/datasets/Intelligent-Internet/II-Medical-Reasoning-SFT and gratefully acknowledge its authors. We modify and filter it slightly so that model reasoning is wrapped in `<think>...</think>` tokens.

## Data Fields

- `question`: The medical question
- `answer`: The final answer (extracted from the reasoning)
- `responses`: Detailed reasoning responses with chain-of-thought

## Splits

- `train`: Training data with reasoning responses

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("OctoMed/II-Medical-SFT")
```

## Citation

If you find our work helpful, please cite us:

```
@article{ossowski2025octomed,
  title={OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning},
  author={Ossowski, Timothy and Zhang, Sheng and Liu, Qianchu and Qin, Guanghui and Tan, Reuben and Naumann, Tristan and Hu, Junjie and Poon, Hoifung},
  journal={arXiv preprint arXiv:2511.23269},
  year={2025}
}
```
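Each record's `responses` field is a sequence of strings, and model reasoning inside each response is wrapped in `<think>...</think>` tokens. The sketch below shows one possible way to separate that reasoning from the final text and to expand a record into one training pair per response. It assumes each response contains at most one `<think>...</think>` span followed by the final answer text; the helper names `split_response` and `to_sft_pairs`, the `prompt`/`completion` keys, and the sample record are hypothetical and are not part of the dataset card or the `datasets` library.

```python
import re
from typing import Dict, List, Tuple

# Matches the first <think>...</think> span; DOTALL lets reasoning span newlines.
# Assumption: each response wraps its reasoning in at most one such span.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_response(response: str) -> Tuple[str, str]:
    """Split one response into (reasoning, final_text).

    Text after the closing </think> tag is treated as the final answer text.
    If no <think> span is found, reasoning is "" and the whole response
    (stripped) is returned as final_text.
    """
    match = THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    final_text = response[match.end():].strip()
    return reasoning, final_text


def to_sft_pairs(record: Dict) -> List[Dict[str, str]]:
    """Expand one record into one prompt/completion pair per response.

    The "prompt"/"completion" keys are hypothetical; rename them to match
    whatever format your fine-tuning framework expects.
    """
    return [
        {"prompt": record["question"], "completion": response}
        for response in record["responses"]
    ]


if __name__ == "__main__":
    # Hypothetical record shaped like the dataset schema (model, question,
    # answer, responses); the values are illustrative, not real rows.
    record = {
        "model": "example-model",
        "question": "Deficiency of which vitamin causes scurvy?",
        "answer": "Vitamin C",
        "responses": [
            "<think>Scurvy reflects impaired collagen synthesis.</think>\nVitamin C",
        ],
    }
    for pair in to_sft_pairs(record):
        reasoning, final_text = split_response(pair["completion"])
        print("reasoning:", reasoning)
        print("final:", final_text)
```

Keeping the full response, including the `<think>` span, as the completion preserves the chain-of-thought as a training target; `split_response` is only needed when the final text alone is wanted, for example to compare it against `answer`. To inspect real rows without first downloading roughly 11.4 GB of data files, `load_dataset("OctoMed/II-Medical-SFT", split="train", streaming=True)` returns an iterable dataset whose records can be passed to these helpers.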

Provider:

OctoMed
