II-Medical-RL
ModelScope community · Updated 2025-12-05 · Indexed 2025-07-05
Download link:
https://modelscope.cn/datasets/Intelligent-Internet/II-Medical-RL
Description:
# Overview
The MedReason-RL dataset is a refined version of the original [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason), curated for training models with reinforcement learning (RL) to enhance their reasoning abilities. Its authors describe it as the best dataset for improving model reasoning through RL training.
# Source
This dataset is derived from the original MedReason dataset, which focuses on medical reasoning tasks. However, the original dataset contained significant overlap with benchmarking datasets, necessitating decontamination.
# Data Decontamination
To ensure the integrity and reliability of the dataset for RL training, a rigorous two-step decontamination process was applied:
## 8-grams Decontamination
We followed the open-r1 methodology to identify and remove samples that share any 8-gram sequence with the evaluation datasets.
This step ensures that the dataset does not contain sequences that could bias evaluation results.
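The idea behind n-gram decontamination can be sketched in a few lines. This is a minimal illustration, not the open-r1 implementation: the function names (`ngrams`, `ngram_decontaminate`), the whitespace tokenization, and the lowercasing are all assumptions made for the example.

```python
def ngrams(text, n=8):
    """Return the set of word-level n-grams in `text` (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    # Texts shorter than n tokens yield an empty set and can never match.
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_decontaminate(samples, benchmark_texts, n=8):
    """Keep only the samples that share no n-gram with any benchmark text."""
    # Build one index of every n-gram that appears in the evaluation data.
    benchmark_index = set()
    for text in benchmark_texts:
        benchmark_index |= ngrams(text, n)
    # A sample is dropped as soon as one of its n-grams is in the index.
    return [s for s in samples if ngrams(s, n).isdisjoint(benchmark_index)]


if __name__ == "__main__":
    # Hypothetical data, invented for this example.
    benchmark = [
        "Question: a 45-year-old man presents with chest pain radiating "
        "to the left arm. What is the diagnosis?"
    ]
    samples = [
        "A 45-year-old man presents with chest pain radiating to the left arm and jaw",
        "Which vitamin deficiency is the classic cause of scurvy in adults",
    ]
    for kept in ngram_decontaminate(samples, benchmark):
        print(kept)
```

Exact n-gram matching is cheap and precise, but it misses paraphrased or lightly edited copies, which is why a second, fuzzy pass is useful.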
## Fuzzy Decontamination
We then applied the s1k method with a stringent 80% similarity threshold to remove any remaining near-duplicate or highly similar samples.
This additional step keeps overlap with evaluation datasets to a minimum, preserving the dataset's purity.
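A threshold-based fuzzy filter can be sketched as follows. The similarity measure used by the s1k method is not specified here, so this example substitutes Python's standard-library `difflib.SequenceMatcher` as a stand-in; the function names and the sample data are hypothetical, and only the 0.8 threshold comes from the description above.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1] (stand-in metric, not the s1k one)."""
    # autojunk=False disables the popularity heuristic applied to long sequences.
    return SequenceMatcher(None, a.lower(), b.lower(), autojunk=False).ratio()


def fuzzy_decontaminate(samples, benchmark_texts, threshold=0.8):
    """Drop every sample whose similarity to any benchmark text reaches `threshold`."""
    kept = []
    for sample in samples:
        if all(similarity(sample, bench) < threshold for bench in benchmark_texts):
            kept.append(sample)
    return kept


if __name__ == "__main__":
    # Hypothetical data, invented for this example.
    benchmark = ["What is the first line treatment for anaphylaxis"]
    samples = [
        "What is the first-line treatment for anaphylaxis?",  # near-duplicate
        "Describe the blood supply of the femoral head.",     # unrelated
    ]
    for kept in fuzzy_decontaminate(samples, benchmark):
        print(kept)
```

The pairwise comparison is quadratic in the number of texts, so for large corpora a practical pipeline would typically narrow the candidate pairs first (for example with hashing or an index) before computing similarities.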
# Conclusion
The decontamination process ensures that the MedReason-RL dataset is free from contamination by benchmarking datasets, making it a reliable and high-quality resource for training RL models to improve reasoning abilities in medical contexts.
Provider:
maas
Created:
2025-07-04



