Medical-Reasoning-SFT-Baichuan-M3-235B
Source: ModelScope Community (updated 2026-05-21; indexed 2026-05-03)
Download link:
https://modelscope.cn/datasets/OpenMed/Medical-Reasoning-SFT-Baichuan-M3-235B
# Medical-Reasoning-SFT-Baichuan-M3-235B
A large-scale medical reasoning dataset generated using [baichuan-inc/Baichuan-M3-235B](https://huggingface.co/baichuan-inc/Baichuan-M3-235B), containing over 124,000 samples with detailed chain-of-thought reasoning for medical and healthcare questions.
**Baichuan-M3-235B is ranked #1 on the HealthBench Total leaderboard and achieves state-of-the-art performance on medical reasoning benchmarks.**
## Dataset Overview
| Metric | Value |
|--------|-------|
| **Model** | baichuan-inc/Baichuan-M3-235B |
| **Total Samples** | 124,520 |
| **Samples with Reasoning** | 124,520 (100%) |
| **Estimated Tokens** | ~255 Million |
| **Content Tokens** | ~160 Million |
| **Reasoning Tokens** | ~95 Million |
| **Language** | English |
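The overview figures are internally consistent: content plus reasoning tokens (~160M + ~95M) gives the ~255M estimate. The short sketch below reuses only the rounded table values to derive per-sample averages (roughly 2,050 tokens per sample, about 760 of them reasoning). Treat the results as approximations; the card does not state which tokenizer produced the counts.

```python
# Figures copied from the Dataset Overview table (rounded values from the card).
TOTAL_SAMPLES = 124_520
CONTENT_TOKENS = 160_000_000    # ~160 Million
REASONING_TOKENS = 95_000_000   # ~95 Million


def token_budget(samples: int, content: int, reasoning: int) -> dict:
    """Return aggregate and per-sample token estimates from the table values."""
    total = content + reasoning
    return {
        "total_tokens": total,
        "reasoning_share": reasoning / total,
        "avg_tokens_per_sample": total / samples,
        "avg_reasoning_per_sample": reasoning / samples,
    }


budget = token_budget(TOTAL_SAMPLES, CONTENT_TOKENS, REASONING_TOKENS)
for key, value in budget.items():
    print(f"{key}: {value:,.2f}")
```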
## Why Baichuan-M3-235B?
Baichuan-M3-235B is a purpose-built medical model with strong results on health evaluations:
### HealthBench Performance
- **#1 on the HealthBench Total leaderboard** - Top-ranked model globally
- **HealthBench-Hard: 44.4%** - 28 percentage points higher than Baichuan-M2, surpassing GPT-5.2
- **Industry-lowest hallucination rate: 3.5%** - Achieved through innovative Fact-Aware RL training
### Clinical Benchmarks
- **SCAN-Bench: First Place** - Across all three dimensions:
  - Clinical Inquiry
  - Lab Testing
  - Final Diagnosis
- **SPAR Algorithm** - Segmented Pipeline Reinforcement Learning specifically designed for clinical decision-making
### Model Architecture
- **Parameters**: 235B
- **Base**: Qwen3-235B-A22B
- **License**: Apache 2.0
## Schema
Each sample follows the conversational messages format with reasoning content:
```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a medical expert...",
      "reasoning_content": null
    },
    {
      "role": "user",
      "content": "What are the symptoms of diabetes?",
      "reasoning_content": null
    },
    {
      "role": "assistant",
      "content": "The main symptoms of diabetes include...",
      "reasoning_content": "Let me think through this systematically. Diabetes affects blood sugar regulation, so I should consider symptoms related to hyperglycemia..."
    }
  ]
}
```
### Fields
| Field | Type | Description |
|-------|------|-------------|
| `messages` | list | Array of message objects in the conversation |
| `messages[].role` | string | One of "system", "user", or "assistant" |
| `messages[].content` | string | The main message content |
| `messages[].reasoning_content` | string or null | Chain-of-thought reasoning; populated on assistant messages, `null` otherwise |
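A minimal validator for the fields above can serve as a sanity check before fine-tuning. `validate_sample` and `VALID_ROLES` are hypothetical names. Treating "assistant messages only" as a hard rule (non-assistant messages must carry `null` reasoning) is an assumption drawn from the table, not a documented guarantee.

```python
from typing import Any

# Roles listed in the Fields table.
VALID_ROLES = {"system", "user", "assistant"}


def validate_sample(sample: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the sample looks valid."""
    messages = sample.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems: list[str] = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"messages[{i}] is not an object")
            continue
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"messages[{i}].role is {role!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"messages[{i}].content is not a string")
        reasoning = msg.get("reasoning_content")
        if reasoning is not None and not isinstance(reasoning, str):
            problems.append(f"messages[{i}].reasoning_content is not a string or null")
        # Assumption: only assistant messages may carry reasoning.
        if role != "assistant" and reasoning is not None:
            problems.append(f"messages[{i}] is a {role} message with reasoning_content")
    return problems
```

Run it over a split (for example `[validate_sample(s) for s in dataset["train"]]`) and inspect any non-empty results.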
## Usage
### Loading with Datasets Library
```python
from datasets import load_dataset
dataset = load_dataset("OpenMed/Medical-Reasoning-SFT-Baichuan-M3-235B")
```
### Accessing Samples
```python
# Get the first training sample (assumes `dataset` was loaded as shown above)
sample = dataset['train'][0]
# Print each message, truncated to 100 characters
for msg in sample['messages']:
    print(f"Role: {msg['role']}")
    print(f"Content: {msg['content'][:100]}...")
    if msg['reasoning_content']:
        print(f"Reasoning: {msg['reasoning_content'][:100]}...")
```
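To work with question/answer pairs directly, a sample can be split into its parts. This sketch assumes the single-turn system/user/assistant layout shown in the schema. `Turn` and `split_sample` are hypothetical helpers, and multi-turn samples (if any exist) would need different handling.

```python
from typing import Any, NamedTuple, Optional


class Turn(NamedTuple):
    """Hypothetical container for one single-turn sample."""
    system: Optional[str]
    question: str
    reasoning: Optional[str]
    answer: str


def split_sample(sample: dict[str, Any]) -> Turn:
    """Extract the system prompt, last user question, and final assistant reply.

    Raises StopIteration if the sample has no user or no assistant message.
    """
    messages = sample["messages"]
    system = next((m["content"] for m in messages if m["role"] == "system"), None)
    question = next(m["content"] for m in reversed(messages) if m["role"] == "user")
    final = next(m for m in reversed(messages) if m["role"] == "assistant")
    return Turn(system, question, final.get("reasoning_content"), final["content"])
```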
### Filtering by Reasoning
```python
# Get samples with reasoning content
samples_with_reasoning = dataset['train'].filter(
    lambda x: x['messages'][-1]['reasoning_content'] is not None
)
```
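Because the overview reports reasoning on 100% of samples, the filter above should keep every row. A slightly stricter predicate, sketched here as the hypothetical `has_reasoning`, also rejects blank strings and rows whose last message is not an assistant turn. It is a plain function, so it can be passed to `Dataset.filter` or unit-tested on dictionaries.

```python
from typing import Any


def has_reasoning(example: dict[str, Any]) -> bool:
    """True if the final message is an assistant turn with non-blank reasoning."""
    messages = example.get("messages") or []
    if not messages:
        return False
    last = messages[-1]
    if last.get("role") != "assistant":
        return False
    reasoning = last.get("reasoning_content")
    return isinstance(reasoning, str) and reasoning.strip() != ""


# Usage (requires the dataset loaded as in "Loading with Datasets Library"):
# samples_with_reasoning = dataset["train"].filter(has_reasoning)
```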
## Intended Use
This dataset is designed for:
- **Fine-tuning medical reasoning models**: Train LLMs to provide detailed, step-by-step medical reasoning
- **Chain-of-thought training**: Develop models that show their thinking process
- **Medical QA systems**: Build question-answering systems for healthcare applications
- **Research**: Study reasoning patterns in medical domain AI
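For chain-of-thought fine-tuning, `reasoning_content` usually has to be merged into the assistant text in whatever format the target model expects. The sketch below wraps it in `<think>` and `</think>` tags, a convention used by reasoning models such as Qwen3. The tag strings and the helper name `to_training_messages` are assumptions; adjust them to your model's chat template.

```python
from typing import Any


def to_training_messages(
    sample: dict[str, Any],
    think_open: str = "<think>",    # hypothetical default; match your chat template
    think_close: str = "</think>",  # hypothetical default; match your chat template
) -> list[dict[str, str]]:
    """Fold each assistant message's reasoning into its content between think tags."""
    out: list[dict[str, str]] = []
    for msg in sample["messages"]:
        content = msg["content"]
        reasoning = msg.get("reasoning_content")
        if msg["role"] == "assistant" and reasoning:
            content = f"{think_open}\n{reasoning}\n{think_close}\n\n{content}"
        out.append({"role": msg["role"], "content": content})
    return out
```

The returned list uses plain `role`/`content` messages, so it can then be rendered with a tokenizer's `apply_chat_template` from `transformers` if your pipeline uses one.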
## Limitations and Considerations
- This dataset is generated by an AI model and should not be used as a substitute for professional medical advice
- Responses may contain inaccuracies and should be validated by medical professionals
- Not intended for clinical decision-making without expert review
- The reasoning traces reflect the model's approach, not necessarily optimal clinical reasoning
## Citation
If you use this dataset, please cite:
```bibtex
@dataset{medical_reasoning_sft_baichuan_m3_235b,
title={Medical-Reasoning-SFT-Baichuan-M3-235B},
author={OpenMed},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/datasets/OpenMed/Medical-Reasoning-SFT-Baichuan-M3-235B}
}
```
## License
Apache 2.0
Provider:
maas
Created:
2026-02-05



