RLVR-GSM

Name: RLVR-GSM
Creator: maas
Published: 2026-04-28 16:18:04
License: 暂无描述

魔搭社区2026-04-28 更新2024-11-30 收录

下载链接：

https://modelscope.cn/datasets/LLM-Research/RLVR-GSM

下载链接

链接失效反馈

官方服务：

资源简介：

<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/> # GSM8k Data - RLVR Formatted This dataset contains the GSM8k dataset formatted for use with [open-instruct](https://github.com/allenai/open-instruct) - specifically reinforcement learning with verifiable rewards. Part of the Tulu 3 release, for which you can see models [here](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5) and datasets [here](https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372). ## Dataset Structure Each example in the dataset contains the standard instruction-tuning data points as follow: - messages (list): inputs used to prompt the model (after chat template formatting). - ground_truth (str): the answer for the given sample. - dataset (str): the name of the dataset, which determines which verifiable function is used.

# GSM8K数据集——RLVR格式化版本 <img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/> 本数据集为适配[open-instruct](https://github.com/allenai/open-instruct)框架的格式化版本GSM8K数据集，专为带可验证奖励的强化学习（Reinforcement Learning with Verifiable Rewards，RLVR）场景设计。本数据集为Tulu 3发布套件的组成部分，您可通过以下链接查看该套件对应的模型[此处](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5)与数据集[此处](https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372)。 ## 数据集结构数据集中的每个样本均包含标准的指令微调数据点，格式如下： - messages（列表）：用于提示模型的输入（已完成对话模板格式化） - ground_truth（字符串）：对应样本的标准答案 - dataset（字符串）：数据集名称，用于确定所使用的可验证奖励函数

提供机构：

maas

创建时间：

2024-11-23

5,000+

优质数据集

54 个

任务类型

进入经典数据集