RLVR-IFeval
收藏魔搭社区2026-04-30 更新2024-11-30 收录
下载链接:
https://modelscope.cn/datasets/LLM-Research/RLVR-IFeval
下载链接
链接失效反馈官方服务:
资源简介:
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
# IF Data - RLVR Formatted
This dataset contains instruction following data formatted for use with [open-instruct](https://github.com/allenai/open-instruct) - specifically reinforcement learning with verifiable rewards.
Prompts with verifiable constraints generated by sampling from the [Tulu 2 SFT mixture](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture) and randomly adding constraints from [IFEval](https://github.com/Rohan2002/IFEval).
Part of the Tulu 3 release, for which you can see models [here](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5) and datasets [here](https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372).
## Dataset Structure
Each example in the dataset contains the standard instruction-tuning data points as follow:
- messages (list): inputs used to prompt the model (after chat template formatting).
- ground_truth (str): the arguments to be passed to the verifying function, as a json blob.
- dataset (str): the dataset the sample belongs to.
- constraint_type (str): the constraint present in the prompt.
- constraint (str): the constraint described in plain english.
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3横幅" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
# IF数据集——RLVR(Reinforcement Learning with Verifiable Rewards)格式化版本
本数据集包含适配[open-instruct](https://github.com/allenai/open-instruct)的指令跟随数据,专门用于带有可验证奖励的强化学习场景。
本数据集的带可验证约束的提示词,通过从[Tulu 2 SFT混合数据集(Tulu 2 SFT mixture)](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture)中采样,并随机添加来自[IFEval](https://github.com/Rohan2002/IFEval)的约束生成而来。
本数据集属于Tulu 3发布套件的一部分,您可通过[此处](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5)查看其配套模型,通过[此处](https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372)查看其配套数据集。
## 数据集结构
数据集中的每个样本均包含标准的指令微调数据点,具体格式如下:
- messages(列表类型):用于提示模型的输入内容(已完成对话模板格式化)。
- ground_truth(字符串类型):需传入验证函数的参数,为JSON blob格式的字符串。
- dataset(字符串类型):该样本所属的数据集。
- constraint_type(字符串类型):提示词中包含的约束类型。
- constraint(字符串类型):以自然语言描述的约束内容。
提供机构:
maas
创建时间:
2024-11-23



