RLVR-GSM
收藏魔搭社区2026-04-28 更新2024-11-30 收录
下载链接:
https://modelscope.cn/datasets/LLM-Research/RLVR-GSM
下载链接
链接失效反馈官方服务:
资源简介:
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
# GSM8k Data - RLVR Formatted
This dataset contains the GSM8k dataset formatted for use with [open-instruct](https://github.com/allenai/open-instruct) - specifically reinforcement learning with verifiable rewards.
Part of the Tulu 3 release, for which you can see models [here](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5) and datasets [here](https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372).
## Dataset Structure
Each example in the dataset contains the standard instruction-tuning data points as follow:
- messages (list): inputs used to prompt the model (after chat template formatting).
- ground_truth (str): the answer for the given sample.
- dataset (str): the name of the dataset, which determines which verifiable function is used.
# GSM8K数据集——RLVR格式化版本
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
本数据集为适配[open-instruct](https://github.com/allenai/open-instruct)框架的格式化版本GSM8K数据集,专为带可验证奖励的强化学习(Reinforcement Learning with Verifiable Rewards,RLVR)场景设计。
本数据集为Tulu 3发布套件的组成部分,您可通过以下链接查看该套件对应的模型[此处](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5)与数据集[此处](https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372)。
## 数据集结构
数据集中的每个样本均包含标准的指令微调数据点,格式如下:
- messages(列表):用于提示模型的输入(已完成对话模板格式化)
- ground_truth(字符串):对应样本的标准答案
- dataset(字符串):数据集名称,用于确定所使用的可验证奖励函数
提供机构:
maas
创建时间:
2024-11-23



