
Medical-Reasoning-SFT-Baichuan-M3-235B

ModelScope community; updated 2026-05-21; indexed 2026-05-03
Download link:
https://modelscope.cn/datasets/OpenMed/Medical-Reasoning-SFT-Baichuan-M3-235B
Resource description:
# Medical-Reasoning-SFT-Baichuan-M3-235B

A large-scale medical reasoning dataset generated using [baichuan-inc/Baichuan-M3-235B](https://huggingface.co/baichuan-inc/Baichuan-M3-235B), containing over 124,000 samples with detailed chain-of-thought reasoning for medical and healthcare questions.

**Baichuan-M3-235B is ranked #1 on HealthBench Total leaderboard and achieves state-of-the-art performance on medical reasoning benchmarks.**

## Dataset Overview

| Metric | Value |
|--------|-------|
| **Model** | baichuan-inc/Baichuan-M3-235B |
| **Total Samples** | 124,520 |
| **Samples with Reasoning** | 124,520 (100%) |
| **Estimated Tokens** | ~255 Million |
| **Content Tokens** | ~160 Million |
| **Reasoning Tokens** | ~95 Million |
| **Language** | English |

## Why Baichuan-M3-235B?

Baichuan-M3-235B is a purpose-built medical AI model with exceptional health evaluation results:

### HealthBench Performance

- **#1 on HealthBench Total Leaderboard** - Top-ranked model globally
- **HealthBench-Hard: 44.4%** - A 28-point improvement over M2, surpassing GPT-5.2
- **Industry-lowest hallucination rate: 3.5%** - Achieved through innovative Fact-Aware RL training

### Clinical Benchmarks

- **SCAN-Bench: First Place** - Across all three dimensions:
  - Clinical Inquiry
  - Lab Testing
  - Final Diagnosis
- **SPAR Algorithm** - Segmented Pipeline Reinforcement Learning specifically designed for clinical decision-making

### Model Architecture

- **Parameters**: 235B
- **Base**: Qwen3-235B-A22B
- **License**: Apache 2.0

## Schema

Each sample follows the conversational messages format with reasoning content:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a medical expert...",
      "reasoning_content": null
    },
    {
      "role": "user",
      "content": "What are the symptoms of diabetes?",
      "reasoning_content": null
    },
    {
      "role": "assistant",
      "content": "The main symptoms of diabetes include...",
      "reasoning_content": "Let me think through this systematically. Diabetes affects blood sugar regulation, so I should consider symptoms related to hyperglycemia..."
    }
  ]
}
```

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `messages` | list | Array of message objects in the conversation |
| `messages[].role` | string | One of "system", "user", or "assistant" |
| `messages[].content` | string | The main message content |
| `messages[].reasoning_content` | string or null | Chain-of-thought reasoning (assistant messages only) |

## Usage

### Loading with Datasets Library

```python
from datasets import load_dataset

dataset = load_dataset("OpenMed/Medical-Reasoning-SFT-Baichuan-M3-235B")
```

### Accessing Samples

```python
# Get a sample
sample = dataset['train'][0]

# Access messages
for msg in sample['messages']:
    print(f"Role: {msg['role']}")
    print(f"Content: {msg['content'][:100]}...")
    if msg['reasoning_content']:
        print(f"Reasoning: {msg['reasoning_content'][:100]}...")
```

### Filtering by Reasoning

```python
# Get samples with reasoning content
samples_with_reasoning = dataset['train'].filter(
    lambda x: x['messages'][-1]['reasoning_content'] is not None
)
```

## Intended Use

This dataset is designed for:

- **Fine-tuning medical reasoning models**: Train LLMs to provide detailed, step-by-step medical reasoning
- **Chain-of-thought training**: Develop models that show their thinking process
- **Medical QA systems**: Build question-answering systems for healthcare applications
- **Research**: Study reasoning patterns in medical domain AI

## Limitations and Considerations

- This dataset is generated by an AI model and should not be used as a substitute for professional medical advice
- Responses may contain inaccuracies and should be validated by medical professionals
- Not intended for clinical decision-making without expert review
- The reasoning traces reflect the model's approach, not necessarily optimal clinical reasoning

## Citation

If you use this dataset, please cite:

```bibtex
@dataset{medical_reasoning_sft_baichuan_m3_235b,
  title={Medical-Reasoning-SFT-Baichuan-M3-235B},
  author={OpenMed},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/OpenMed/Medical-Reasoning-SFT-Baichuan-M3-235B}
}
```

## License

Apache 2.0
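
As a supplement to the Schema and Usage sections above, the following is a minimal sketch of how a sample could be checked against the documented fields and flattened into a single training string. It is not part of the dataset's tooling: the function names `validate_sample` and `render_sample`, the `[role]` section markers, and the `<think>`/`</think>` wrapper around the reasoning are all assumptions made for illustration, and the example record is the one shown in the Schema section rather than real data. It runs offline and needs no download.

```python
from typing import Any

# Roles listed in the Fields table of the dataset card.
ALLOWED_ROLES = {"system", "user", "assistant"}

# The example record from the Schema section (not an actual dataset row).
EXAMPLE = {
    "messages": [
        {"role": "system", "content": "You are a medical expert...",
         "reasoning_content": None},
        {"role": "user", "content": "What are the symptoms of diabetes?",
         "reasoning_content": None},
        {"role": "assistant",
         "content": "The main symptoms of diabetes include...",
         "reasoning_content": "Let me think through this systematically. "
                              "Diabetes affects blood sugar regulation..."},
    ]
}


def validate_sample(sample: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means none were found."""
    messages = sample.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems: list[str] = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"messages[{i}] must be an object")
            continue
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"messages[{i}].role is {role!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"messages[{i}].content must be a string")
        reasoning = msg.get("reasoning_content")
        if reasoning is not None and not isinstance(reasoning, str):
            problems.append(f"messages[{i}].reasoning_content must be a string or null")
        # The card states reasoning appears on assistant messages only.
        if reasoning is not None and role != "assistant":
            problems.append(f"messages[{i}] has reasoning_content on a non-assistant role")
    return problems


def render_sample(sample: dict[str, Any],
                  think_open: str = "<think>",
                  think_close: str = "</think>") -> str:
    """Flatten a sample into one string (hypothetical format, for illustration)."""
    parts = []
    for msg in sample["messages"]:
        body = msg["content"]
        if msg["role"] == "assistant" and msg.get("reasoning_content"):
            # Place the reasoning trace before the final answer.
            body = f"{think_open}{msg['reasoning_content']}{think_close}\n{body}"
        parts.append(f"[{msg['role']}]\n{body}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(validate_sample(EXAMPLE))
    print(render_sample(EXAMPLE))
```

In practice the rendering step would more likely be delegated to the chat template of the model being fine-tuned; the fixed markers here only make the relationship between `content` and `reasoning_content` visible.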

Provider: maas
Created: 2026-02-05