Medical Evidence DPO Dataset
Source: ModelScope community · Updated 2026-05-23 · Indexed 2026-05-03
Download link:
https://modelscope.cn/datasets/modelzhang/medical_evidence_DPO
Resource overview:
# Medical Evidence DPO Dataset
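## Introduction
Medical Evidence DPO Dataset is a Chinese-language preference alignment dataset for the medical domain, designed for Direct Preference Optimization (DPO) training. It contains medical question-answer triples (question, high-quality answer, low-quality answer) that can be used to fine-tune language models and improve the quality of their medical answers.
## Dataset Composition
| Dataset Name | Split | Number of Samples | Description |
|--------------|-------|-------------------|-------------|
| medical_evidence_dpo | train | ~1400 | DPO training data |
## Data Format
Each record contains three fields:
- **prompt** (string): Medical question or query
- **chosen** (string): High-quality answer (the preferred response)
- **rejected** (string): Low-quality answer (the negative sample)
### Sample Data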
```json
{
  "prompt": "在阿尔茨海默病与溃疡性结肠炎患者中,PPARG 和 NOS2 作为共同基因,是否通过调控巨噬细胞和小胶质细胞极化参与疾病的发生发展?",
  "chosen": "从目前的人类与动物实验证据来看,PPARG 和 NOS2 很有可能作为共同炎症枢纽基因,通过调控巨噬细胞/小胶质细胞的极化状态参与阿尔茨海默病和溃疡性结肠炎的发生发展...",
  "rejected": "这是一个非常具体且专业的问题,涉及到两种疾病的共同机制。根据现有的生物医学研究,我们可以进行一个基于科学逻辑的推理和分析..."
}
```
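The three-field layout described above can be checked before training. The following is a minimal sketch, assuming every record is a flat JSON object with string values; `validate_record` and `REQUIRED_FIELDS` are hypothetical helper names, not part of the dataset or any library.

```python
import json

# Field names taken from the data format section; the helper itself is illustrative.
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def validate_record(record):
    """Return a list of problems found in one preference record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str):
            problems.append(f"{field}: expected a string, got {type(value).__name__}")
        elif not value.strip():
            problems.append(f"{field}: empty string")
    # A pair with identical answers carries no preference signal.
    if not problems and record["chosen"] == record["rejected"]:
        problems.append("chosen and rejected are identical")
    return problems


sample = json.loads('{"prompt": "Q", "chosen": "good answer", "rejected": "weak answer"}')
print(validate_record(sample))
```

Running the check over the whole split and dropping records with a non-empty problem list is one way to guard against malformed pairs.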
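## Topic Coverage
The dataset covers several medical topic areas, including:
- **Clinical guidelines and treatment management**: Evidence-based treatment of conditions such as cirrhosis, Kawasaki disease, and organ transplantation
- **Disease mechanisms and pathophysiology**: Tumor drug-resistance mechanisms, neurodegenerative disease mechanisms
- **Drug therapy and pharmacology**: Mechanisms of action of immunosuppressants and chemotherapy drugs
- **Medical diagnosis and laboratory testing**: Biomarker interpretation, imaging analysis
- **Medical research methods**: Clinical study design, interpretation of meta-analyses
## Intended Uses
This dataset is primarily intended for:
1. **DPO (Direct Preference Optimization)**: Training language models for preference alignment
2. **RLHF training**: Serving as preference feedback data for reinforcement learning from human feedback
3. **Fine-tuning medical question-answering models**: Improving a model's answer quality in the medical domain
4. **Medical knowledge evaluation**: Assessing a model's medical knowledge and reasoning ability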
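For the evaluation use listed above, one possible metric is pairwise preference accuracy: the share of records where a model scores the `chosen` answer above the `rejected` one. The sketch below is an assumption about how such an evaluation could be set up, not a procedure described by the dataset authors. The `score` callable is hypothetical (in practice it might be a model's log-likelihood of the answer given the prompt), and `toy_score` is a placeholder used only so the example runs.

```python
def preference_accuracy(pairs, score):
    """Fraction of pairs where score(prompt, chosen) > score(prompt, rejected).

    Ties count as misses. `score` is any callable (prompt, answer) -> float.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no preference pairs supplied")
    wins = sum(
        1
        for pair in pairs
        if score(pair["prompt"], pair["chosen"]) > score(pair["prompt"], pair["rejected"])
    )
    return wins / len(pairs)


def toy_score(prompt, answer):
    # Placeholder scorer: counts distinct words shared with the prompt.
    # It stands in for a real model score and has no medical meaning.
    return len(set(prompt.split()) & set(answer.split()))


demo_pairs = [
    {"prompt": "what treats condition X", "chosen": "drug A treats condition X", "rejected": "not sure"},
    {"prompt": "is test Y reliable", "chosen": "test Y is reliable when repeated", "rejected": "maybe"},
]
print(preference_accuracy(demo_pairs, toy_score))
```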
## Usage
### Loading with Hugging Face Datasets
```python
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("path/to/medical_evidence_DPO.py", name="medical_evidence_dpo")
# Inspect the first record
print(dataset["train"][0])
```
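The dataset card also names a file, `filter_dpo_dataset.jsonl`. As an alternative to the loading script, a JSON Lines file can be read with the standard library alone. This is a minimal sketch, assuming the file holds one JSON object per line with the three fields described earlier; `read_jsonl` is a hypothetical helper, and the demo writes its own small file so the example runs without the real data.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"{path}:{line_number}: invalid JSON ({error.msg})") from error


# Demo on a throwaway file; point read_jsonl at the downloaded file instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.jsonl"
    demo_path.write_text(
        '{"prompt": "Q1", "chosen": "A", "rejected": "B"}\n'
        '{"prompt": "Q2", "chosen": "C", "rejected": "D"}\n',
        encoding="utf-8",
    )
    records = list(read_jsonl(demo_path))
    print(len(records), records[0]["prompt"])
```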
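### DPO Training Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Prepare the data: DPOTrainer expects prompt, chosen, and rejected columns
def format_dpo_sample(sample):
    return {
        "prompt": sample["prompt"],
        "chosen": sample["chosen"],
        "rejected": sample["rejected"],
    }

# `dataset` is the object returned by load_dataset in the previous step
train_dataset = dataset["train"].map(format_dpo_sample)

# DPO training. Recent TRL releases take beta through DPOConfig and the
# tokenizer as processing_class; older releases accepted tokenizer= and
# beta= directly on DPOTrainer. The output_dir value is a placeholder.
training_args = DPOConfig(output_dir="dpo-output", beta=0.1)
dpo_trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
dpo_trainer.train()
```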
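To make the role of `beta=0.1` concrete, the per-pair DPO objective can be computed by hand. The sketch below follows the standard DPO loss, `-log(sigmoid(beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))))`, on summed log-probabilities. The function name and the numeric inputs are illustrative assumptions, not values from this dataset or from TRL's internals.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A policy identical to the reference gives a zero margin, so the loss is log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))

# Raising the chosen answer's likelihood relative to the reference lowers the loss.
print(dpo_loss(-10.0, -20.0, -12.0, -15.0))
```

A larger `beta` scales the margin, so the same shift in log-probabilities moves the loss further; that is the sense in which `beta` controls how strongly the policy is pushed away from the reference model.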
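## Data Source
This dataset is built from evidence-based medical question-and-answer content written by medical experts. Each question was professionally screened, and the answers draw on the latest clinical guidelines and medical literature.
## License
This dataset is released under the MIT License.
## Citation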
```bibtex
@misc{medical_evidence_dpo,
title = {Medical Evidence DPO Dataset},
author = {Medical Evidence Team},
year = {2026},
url = {https://modelscope.cn/datasets/modelzhang/medical_evidence_DPO}
}
```
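## Notes
1. Medical advice generated by a model is for reference only and is no substitute for professional medical diagnosis.
2. Make sure you understand the dataset's content and limitations before using it.
# DPO Agent Training Dataset
filter_dpo_dataset.jsonl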
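Provider:
maas
Created:
2026-01-24
Collected summary
Dataset introduction

Background and challenges
Background overview
The Medical Evidence DPO Dataset is a Chinese preference alignment dataset designed for the medical domain. It contains about 1,400 training samples, each consisting of a medical question, a high-quality answer, and a low-quality answer. Built from evidence-based content written by medical experts, it covers clinical guidelines, disease mechanisms, and related topics, and is suited to DPO training, RLHF, and optimizing medical question-answering models.
The above content was collected and summarized by 遇见数据集.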



