guru-RL-92k-extra-info-compressed

Name: guru-RL-92k-extra-info-compressed
Creator: maas
Published: 2025-12-05 16:39:04
License: No description provided

ModelScope Community; updated 2025-12-05; indexed 2025-06-21

Download link:

https://modelscope.cn/datasets/LLM360/guru-RL-92k-extra-info-compressed

Resource overview:

# Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

## Note for this extra-info-compressed data version!

The dataset provided in this repository is specifically intended for use with the latest release of VeRL ([v0.4.0](https://github.com/volcengine/verl/releases/tag/v0.4.0)). Because VeRL's `rl_dataset.py` processes datasets as `datasets.Dataset`, it is essential that **the structure of all Parquet files remains fully consistent.** This repository is designed to meet that requirement.

In this repo, the structure of all Parquet files across diverse tasks has been unified by nesting all task-specific keys under the `extra_info` field. Additionally, both the `extra_info` and `reward_model` fields store compressed JSON-formatted strings so that the entire dataset can be stored efficiently within Parquet files.

A practitioner's guide to choosing a Guru dataset version:

1. If you use the [Reasoning360 repo](https://github.com/LLM360/Reasoning360) (a fork of VeRL) directly, use [guru-RL-92k](https://huggingface.co/datasets/LLM360/guru-RL-92k).
2. If you use the official [VeRL](https://github.com/volcengine/verl?tab=readme-ov-file), use this [guru-RL-92k-extra-info-compressed](https://huggingface.co/datasets/LLM360/guru-RL-92k-extra-info-compressed).

The reward computations (provided by [llm-reasoners](https://github.com/maitrix-org/llm-reasoners)) involve decompressing and deserializing the compressed info, which makes them slightly slower than with the original Guru dataset.

## Dataset Description

**Guru** is a curated six-domain dataset for training large language models (LLMs) for complex reasoning with reinforcement learning (RL). The dataset contains 91.9K high-quality samples spanning six diverse reasoning-intensive domains, processed through a comprehensive five-stage curation pipeline to ensure both domain diversity and reward verifiability.
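
As a rough illustration of the decompress-then-deserialize step mentioned in the note above, the sketch below round-trips a dictionary through a compressed JSON string. This card does not state the actual compression scheme or the exact nesting of the fields, so zlib plus base64 is an assumption made purely for illustration, as are the helper names `compress_json` and `decompress_json` and every value in the sample row. Check the llm-reasoners reward code for the real format before relying on any of it.

```python
import base64
import json
import zlib


def compress_json(obj) -> str:
    """Serialize obj to JSON, zlib-compress it, and base64-encode the bytes.

    ASSUMPTION: zlib + base64 is a stand-in; the dataset's real scheme may differ.
    """
    raw = json.dumps(obj, ensure_ascii=False).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decompress_json(blob: str):
    """Invert compress_json: base64-decode, zlib-decompress, then parse the JSON."""
    raw = zlib.decompress(base64.b64decode(blob))
    return json.loads(raw.decode("utf-8"))


# Hypothetical row in the unified layout: task-specific keys live under
# extra_info, and both extra_info and reward_model are stored as strings.
row = {
    "data_source": "math__example",  # hypothetical identifier
    "extra_info": compress_json({"index": 7, "split": "train"}),
    "reward_model": compress_json({"ground_truth": "42"}),
}

# A reward function would decode both fields before scoring a response.
extra_info = decompress_json(row["extra_info"])
reward_model = decompress_json(row["reward_model"])
print(extra_info["index"], reward_model["ground_truth"])
```

Because every row carries the same string-typed columns regardless of task, all Parquet files share one schema; the cost is that each reward call has to decode the strings first.
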
### Dataset Summary

Guru addresses the critical need for robust cross-domain reasoning capabilities in LLMs by providing a carefully balanced collection of problems across **math, coding, science, logic, simulation, and tabular reasoning**. Each sample has been filtered for quality and equipped with automated verification mechanisms, making it well suited to RL applications.

### Key Features

- **Cross-Domain Coverage**: Six reasoning domains for LLM reasoning research and skill development
- **Quality Assurance**: Five-stage curation pipeline with deduplication and heuristic filtering
- **RL-Ready**: Domain-specific reward functions for reliable evaluation
- **Difficulty Calibration**: Samples filtered to maintain appropriate challenge levels

### Data Structure

The dataset is stored in Parquet format for efficient access and processing. Each sample contains at least the following fields:

1. **data_source**
   - Type: String
   - Description: Identifier indicating the origin dataset and domain, used to map each sample to its reward function
2. **prompt**
   - Type: List of message objects
   - Contains:
     - content: The actual text content
     - role: Message role (e.g., "user")
3. **ability**
   - Type: String
   - Description: The primary reasoning skill tested
4. **apply_chat_template**
   - Type: Boolean
   - Description: Flag for chat formatting
5. **qwen2.5_7b_pass_rate**
   - Type: Float
   - Description: Pass rate with the Qwen 2.5-7B model
6. **qwen3_30b_pass_rate**
   - Type: Float
   - Description: Pass rate with the Qwen 3-30B model
7. **extra_info**
   - Type: Dictionary
   - Description: Supplementary information for reward computation
   - Note: The detailed structure varies by task
8. **reward_model**
   - Type: Dictionary
   - Contains:
     - ground_truth: Compressed answer/verification data
   - Note: The detailed structure varies by task

### Domains and Statistics

| Domain | Datasets Included | Final Sample Count | Key Focus Areas |
|--------|------------------|-------------------|-----------------|
| **Math** | OR1, DAPO, DeepScaler | 54.4K | Competition problems, symbolic reasoning |
| **Code** | LeetCode, TACO-Verified, PrimeIntellect, LiveCodeBench | 18.1K | Programming challenges, algorithm design |
| **Science** | WebInstruct-Verified | 3.6K | University/PhD-level physics, chemistry, biology |
| **Logic** | ARC-AGI, BARC, Custom puzzles | 6.3K | Symbolic reasoning, constraint satisfaction |
| **Simulation** | Code I/O (PyEdu) | 3.7K | Code behavior prediction without execution |
| **Table** | HiTab, MultiHierTT | 6.1K | Single and multi-table reasoning |

**Total Samples**: 91.9K (filtered from 684.3K raw samples)

### Dataset Sources

| Domain | Dataset | Source |
|--------|---------|--------|
| **Math** | OR1 | [Skywork-OR1 (2025)](https://github.com/SkyworkAI/Skywork-O1-Open) |
| | DAPO | [DAPO Dataset](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k) |
| | DeepScaler | [DeepScaleR Dataset](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset) |
| **Code** | LeetCode | [LeetCode Dataset](https://huggingface.co/datasets/greengerong/leetcode) |
| | TACO-Verified | [TACO Dataset](https://huggingface.co/datasets/BAAI/TACO) |
| | PrimeIntellect | [PrimeIntellect Dataset](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-1) |
| | LiveCodeBench (history) | [LiveCodeBench](https://github.com/LiveCodeBench/LiveCodeBench) |
| **Science** | WebInstruct-Verified | [WebInstruct Dataset](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified) |
| **Logic** | Zebra Puzzle | - |
| | Ordering Puzzle | - |
| | Graph Puzzle | - |
| | ARC-AGI-1/2 | [ARC-AGI Dataset](https://arcprize.org/arc-agi) |
| | BARC | [BARC Dataset](https://huggingface.co/barc0) |
| **Simulation** | Code I/O (PyEdu) | [CodeIO-PyEdu Dataset](https://huggingface.co/datasets/hkust-nlp/CodeIO-PyEdu-Reasoning) |
| **Table** | HiTab | [HiTab Dataset](https://github.com/microsoft/HiTab) |
| | MultiHierTT | [MultiHierTT Dataset](https://github.com/psunlpgroup/MultiHiertt) |

## Citation

If you find this dataset helpful in your research, please consider citing:

```bibtex
@misc{cheng2025revisiting,
  title   = {Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective},
  author  = {Zhoujun Cheng and Shibo Hao and Tianyang Liu and Fan Zhou and Yutao Xie and Feng Yao and Yuexin Bian and Yonghao Zhuang and Nilabjo Dey and Yuheng Zha and Yi Gu and Kun Zhou and Yuqi Wang and Yuan Li and Richard Fan and Jianshu She and Chengqian Gao and Abulhair Saparov and Haonan Li and Taylor W. Killian and Mikhail Yurochkin and Zhengzhong Liu and Eric P. Xing and Zhiting Hu},
  journal = {arXiv preprint arXiv:2506.14965},
  year    = {2025},
  doi     = {10.48550/arXiv.2506.14965},
  url     = {https://arxiv.org/abs/2506.14965}
}
```

*This dataset card follows the Hugging Face dataset card template and provides comprehensive information about the Guru dataset structure, creation process, and intended use cases.*
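
The two pass-rate fields listed under Data Structure can serve as a per-sample difficulty signal. The sketch below shows one way a user might keep only samples inside a chosen pass-rate band; the helper `in_difficulty_band`, the 0.1 and 0.9 thresholds, and the sample rows are all hypothetical, and this is not a description of the filtering the authors applied during curation.

```python
def in_difficulty_band(sample: dict, key: str, low: float, high: float) -> bool:
    """Return True when the sample's pass rate under `key` lies in [low, high].

    Samples that lack the field (or store None) are treated as out of band.
    """
    rate = sample.get(key)
    return rate is not None and low <= rate <= high


# Hypothetical rows carrying only the fields this sketch needs.
samples = [
    {"data_source": "math__a", "qwen2.5_7b_pass_rate": 0.0, "qwen3_30b_pass_rate": 0.25},
    {"data_source": "codegen__b", "qwen2.5_7b_pass_rate": 0.5, "qwen3_30b_pass_rate": 0.75},
    {"data_source": "table__c", "qwen2.5_7b_pass_rate": 1.0, "qwen3_30b_pass_rate": 1.0},
]

# Drop problems the smaller model never solves or always solves
# (thresholds are illustrative assumptions, not values from the card).
kept = [s for s in samples if in_difficulty_band(s, "qwen2.5_7b_pass_rate", 0.1, 0.9)]
kept_sources = [s["data_source"] for s in kept]
print(kept_sources)  # ['codegen__b']
```

Which model's pass rate to filter on, and how wide a band to keep, depends on the policy being trained; the values here only demonstrate the mechanics.
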


Provider:

maas

Created:

2025-06-19
