Multi-subject-RLVR
Source: ModelScope community · Updated 2025-12-04 · Indexed 2025-04-05
Download link:
https://modelscope.cn/datasets/AI-ModelScope/Multi-subject-RLVR
Description:
Multi-subject data for the paper "Expanding RL with Verifiable Rewards Across Diverse Domains".
We use ExamQA (Yu et al., 2021), a multi-subject multiple-choice QA dataset.
Originally written in Chinese, ExamQA covers at least 48 first-level subjects.
We remove the distractors and convert each instance into a free-form QA pair.
This dataset consists of 638k college-level instances, with both questions and objective answers written by domain experts for examination purposes.
We also use GPT-4o-mini to translate questions and options into English.
For evaluation, we randomly sample 6,000 questions from ExamQA as the test set, while the remaining questions are used as the training pool.
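
The conversion and split described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the record fields (`question`, `options`, `answer_key`), the helper names, and the seed are all assumptions made for the example, and the toy data stands in for ExamQA.

```python
import random


def to_free_form(item):
    """Drop the distractors: keep only the question and the text of the correct option.

    The field names used here are hypothetical, not the dataset's actual schema.
    """
    return {"question": item["question"], "answer": item["options"][item["answer_key"]]}


def split_pool(items, test_size, seed=0):
    """Randomly hold out `test_size` items as the test set; the rest form the training pool."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


# Toy stand-in for multiple-choice records.
raw = [
    {"question": f"Q{i}", "options": {"A": f"right-{i}", "B": f"wrong-{i}"}, "answer_key": "A"}
    for i in range(10)
]

qa_pairs = [to_free_form(item) for item in raw]
train_pool, test_set = split_pool(qa_pairs, test_size=3)
```

With the real data, `test_size` would be 6,000 and the remaining instances would make up the training pool.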
Since subject labels are not provided for each QA pair, we use GPT-4o-mini to classify them into one of 48 subjects or mark them as unclassified if uncertain.
Excluding unclassified instances (15.8% of the test data), the most frequent subjects include basic medicine, law, economics, management, civil engineering, mathematics, computer science and technology, psychology, and chemistry.
For ease of analysis, we further categorize these subjects into four broad fields (STEM, social sciences, humanities, and applied sciences).
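
As a rough sketch of how subject frequencies and field totals might be tallied once each QA pair has a subject label: the `FIELD_OF` mapping below is purely illustrative (the card does not list which subject falls under which field), and the label strings and function names are assumptions for the example.

```python
from collections import Counter

# Illustrative subject-to-field mapping; NOT the mapping used in the paper.
FIELD_OF = {
    "mathematics": "STEM",
    "chemistry": "STEM",
    "law": "social sciences",
    "economics": "social sciences",
    "civil engineering": "applied sciences",
}


def subject_shares(labels):
    """Share of each subject among classified instances, most frequent first."""
    classified = [s for s in labels if s != "unclassified"]
    if not classified:
        return {}
    counts = Counter(classified)
    return {s: c / len(classified) for s, c in counts.most_common()}


def field_counts(labels):
    """Count instances per broad field, skipping labels missing from the mapping."""
    return Counter(FIELD_OF[s] for s in labels if s in FIELD_OF)


labels = ["law", "law", "mathematics", "unclassified", "chemistry"]
shares = subject_shares(labels)
fields = field_counts(labels)
```

Unclassified instances are dropped before computing shares, mirroring how the subject ranking above excludes them.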
## Citation
```bibtex
@article{su2025expanding,
title={Expanding RL with Verifiable Rewards Across Diverse Domains},
author={Su, Yi and Yu, Dian and Song, Linfeng and Li, Juntao and Mi, Haitao and Tu, Zhaopeng and Zhang, Min and Yu, Dong},
journal={arXiv preprint arXiv:2503.23829},
year={2025}
}
```
Provider:
maas
Created:
2025-04-02



