five

RLVR-GSM-MATH-IF-Mixed-Constraints

收藏
魔搭社区2026-01-06 更新2024-11-30 收录
下载链接:
https://modelscope.cn/datasets/allenai/RLVR-GSM-MATH-IF-Mixed-Constraints
下载链接
链接失效反馈
官方服务:
资源简介:
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/> # GSM/MATH/IF Data - RLVR Formatted *Note that this collection is licensed under ODC-BY-1.0 license; different licenses apply to subsets of the data.* This dataset contains data formatted for use with [open-instruct](https://github.com/allenai/open-instruct) - specifically reinforcement learning with verifiable rewards. It was used to train the final Tulu 3 models with RL, and contains the following subsets: - **GSM8k** (7,473 samples): The [GSM8k train set](https://huggingface.co/datasets/openai/gsm8k) formatted for use with RLVR and open-instruct. MIT License. - **MATH** (7,500 samples): The [MATH train set](https://github.com/hendrycks/math) formatted for use with RLVR and open-instruct. MIT License. - **IF Prompts** (14,973 samples): Prompts with verifiable constraints generated by sampling from the [Tulu 2 SFT mixture](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture) and randomly adding constraints from [IFEval](https://github.com/Rohan2002/IFEval). ODC-BY license. Part of the Tulu 3 release, for which you can see models [here](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5) and datasets [here](https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372). ## Dataset Structure Each example in the dataset contains the standard instruction-tuning data points as follow: - messages (list): inputs used to prompt the model (after chat template formatting). - ground_truth (str): the answer for the given sample. - dataset (str): For GSM8k and MATH, the answer to the question. For IF prompts, the arguments to be passed to the verifying function, as a json blob. - constraint_type (str): the constraint present in the prompt. - constraint (str): the constraint described in plain english. ## Citation ``` @misc{lambert2024tulu3pushingfrontiers, title={Tulu 3: Pushing Frontiers in Open Language Model Post-Training}, author={Nathan Lambert and Jacob Morrison and Valentina Pyatkin and Shengyi Huang and Hamish Ivison and Faeze Brahman and Lester James V. Miranda and Alisa Liu and Nouha Dziri and Shane Lyu and Yuling Gu and Saumya Malik and Victoria Graf and Jena D. Hwang and Jiangjiang Yang and Ronan Le Bras and Oyvind Tafjord and Chris Wilhelm and Luca Soldaini and Noah A. Smith and Yizhong Wang and Pradeep Dasigi and Hannaneh Hajishirzi}, year={2024}, eprint={2411.15124}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2411.15124}, } ```

<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/> # GSM/MATH/IF 数据——RLVR 格式化版本 *请注意,本数据集整体采用ODC-BY-1.0许可证授权,数据集各子集可能适用不同的许可协议。* 本数据集为适配[open-instruct](https://github.com/allenai/open-instruct)框架(即**可验证奖励强化学习(Reinforcement Learning with Verifiable Rewards, RLVR)**)而格式化制作,用于通过强化学习训练最终的Tulu 3模型,包含以下子集: - **GSM8K(7473条样本)**:为适配RLVR与open-instruct框架格式化后的[GSM8K训练集](https://huggingface.co/datasets/openai/gsm8k),采用MIT许可证。 - **MATH(7500条样本)**:为适配RLVR与open-instruct框架格式化后的[MATH训练集](https://github.com/hendrycks/math),采用MIT许可证。 - **IF提示(14973条样本)**:通过从[Tulu 2 SFT混合数据集](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture)中采样,并随机添加来自[IFEval](https://github.com/Rohan2002/IFEval)的约束条件生成的带有可验证约束的提示文本,采用ODC-BY许可证。 本数据集属于Tulu 3发布套件,相关模型可查阅[此处](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5),数据集集合可查阅[此处](https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372)。 ## 数据集结构 数据集中每条样本均包含标准的指令微调数据字段,具体如下: - `messages`(列表类型):经对话模板格式化后用于提示模型的输入序列。 - `ground_truth`(字符串类型):对应样本的标准答案。 - `dataset`(字符串类型):对于GSM8K与MATH子集,该字段存储问题的答案;对于IF提示子集,该字段为待传入验证函数的参数,以JSON字符串形式存储。 - `constraint_type`(字符串类型):提示文本中包含的约束类型。 - `constraint`(字符串类型):以自然语言描述的约束内容。 ## 引用 @misc{lambert2024tulu3pushingfrontiers, title={Tulu 3:开拓开源大语言模型后训练前沿}, author={Nathan Lambert and Jacob Morrison and Valentina Pyatkin and Shengyi Huang and Hamish Ivison and Faeze Brahman and Lester James V. Miranda and Alisa Liu and Nouha Dziri and Shane Lyu and Yuling Gu and Saumya Malik and Victoria Graf and Jena D. Hwang and Jiangjiang Yang and Ronan Le Bras and Oyvind Tafjord and Chris Wilhelm and Luca Soldaini and Noah A. Smith and Yizhong Wang and Pradeep Dasigi and Hannaneh Hajishirzi}, year={2024}, eprint={2411.15124}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2411.15124}, }
提供机构:
maas
创建时间:
2025-05-27
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作