
OctoMed/II-Medical-SFT

Hugging Face · Updated: 2026-04-10 · Added: 2026-03-29
Download link:
https://hf-mirror.com/datasets/OctoMed/II-Medical-SFT
Resource summary:
---
dataset_info:
  features:
  - name: model
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: responses
    sequence: string
  splits:
  - name: train
    num_bytes: 24648065088
    num_examples: 2197741
  download_size: 11378805765
  dataset_size: 24648065088
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# II-Medical - Medical Reasoning SFT Dataset

## Description

This dataset contains medical reasoning data for supervised fine-tuning (SFT). It pairs medical questions with detailed reasoning responses that demonstrate step-by-step medical thinking.

We gratefully build on the original data source, available at https://huggingface.co/datasets/Intelligent-Internet/II-Medical-Reasoning-SFT. We modify it only slightly, filtering examples so that the model reasoning is surrounded by `<think>...</think>` tokens.

## Data Fields

- `question`: The medical question
- `answer`: The final answer (extracted from the reasoning)
- `responses`: Detailed reasoning responses with chain-of-thought

## Splits

- `train`: Training data with reasoning responses (2,197,741 examples)

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("OctoMed/II-Medical-SFT")
```

## Citation

If you find our work helpful, please cite:

```
@article{ossowski2025octomed,
  title={OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning},
  author={Ossowski, Timothy and Zhang, Sheng and Liu, Qianchu and Qin, Guanghui and Tan, Reuben and Naumann, Tristan and Hu, Junjie and Poon, Hoifung},
  journal={arXiv preprint arXiv:2511.23269},
  year={2025}
}
```
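The card states that model reasoning in `responses` is surrounded by `<think>...</think>` tokens. The sketch below, which is not part of the dataset or its tooling, shows one way to separate that reasoning from the text that follows it. It assumes a well-formed response holds exactly one `<think>` block; the `split_think` helper and the sample record are hypothetical, with the record merely shaped like the card's `model`/`question`/`answer`/`responses` features.

```python
import re
from typing import Optional, Tuple

# Matches one <think>...</think> span; DOTALL lets the reasoning cross newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(response: str) -> Optional[Tuple[str, str]]:
    """Split a response into (reasoning, final_text).

    Hypothetical helper: assumes a well-formed response contains exactly one
    <think>...</think> block followed by the user-facing text. Returns None
    when that assumption does not hold.
    """
    matches = THINK_RE.findall(response)
    if len(matches) != 1:
        return None
    reasoning = matches[0].strip()
    # Everything after the closing tag is treated as the final response text.
    final_text = response.split("</think>", 1)[1].strip()
    return reasoning, final_text


# Hypothetical record shaped like the card's features; the values are made up.
record = {
    "model": "example-model",
    "question": "Which vitamin deficiency causes scurvy?",
    "answer": "Vitamin C",
    "responses": [
        "<think>Scurvy results from impaired collagen synthesis, which requires "
        "ascorbic acid.</think>The answer is vitamin C.",
        "No reasoning tags here.",
    ],
}

parsed = [split_think(r) for r in record["responses"]]
for item in parsed:
    print(item)
```

Because the full download is roughly 11.4 GB, passing `streaming=True` (with `split="train"`) to `load_dataset` and applying a helper like this record by record may be preferable to materializing the whole split; this is a suggestion, not a recommendation from the dataset authors.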

Provided by:
OctoMed