Julian2002/RLVR-Math-16k

Name: Julian2002/RLVR-Math-16k
Creator: Julian2002
Published: 2026-03-23 14:45:24
License: No description available

Hugging Face | Updated 2026-03-23 | Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/Julian2002/RLVR-Math-16k


Resource description:

---
language:
- en
task_categories:
- text-generation
tags:
- math
- reasoning
- rlhf
- rlvr
- grpo
size_categories:
- 10K<n<100K
---

# RLVR-Math-16k

A curated math reasoning dataset for **RLVR (Reinforcement Learning with Verifiable Rewards)** training.

## Dataset Summary

| Split | Samples |
|-------|--------:|
| train | 16,384 |
| test | 842 |
| **Total** | **17,226** |

## Source Datasets

### train

| Source | Samples |
|--------|--------:|
| hiyouga/math12k | 10,476 |
| nlile/NuminaMath-1.5-RL-Verifiable/amc_aime | 3,075 |
| nlile/NuminaMath-1.5-RL-Verifiable/olympiads | 2,833 |

### test

| Source | Samples |
|--------|--------:|
| hiyouga/math12k | 500 |
| math-ai/minervamath | 272 |
| math-ai/amc23 | 40 |
| math-ai/aime25 | 30 |

### Training Sources

- [hiyouga/math12k](https://huggingface.co/datasets/hiyouga/math12k): MATH competition problems (converted from OpenAI PRM800K)
- [nlile/NuminaMath-1.5-RL-Verifiable](https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable): AMC/AIME and Olympiad competition problems

### Test Sources

- [hiyouga/math12k](https://huggingface.co/datasets/hiyouga/math12k): MATH500
- [math-ai/minervamath](https://huggingface.co/datasets/math-ai/minervamath): Minerva Math
- [math-ai/aime25](https://huggingface.co/datasets/math-ai/aime25): AIME 2025
- [math-ai/amc23](https://huggingface.co/datasets/math-ai/amc23): AMC 2023

## Data Format

Each sample follows the verl-compatible chat format:

```json
{
  "data_source": "source_dataset_id",
  "prompt": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "math problem text"}
  ],
  "ability": "math",
  "reward_model": {"style": "rule", "ground_truth": "answer"},
  "extra_info": {"split": "train/test", "index": 0}
}
```

## Preprocessing

**Training data filters:**

- Source filter: only competition-level problems (olympiads, amc_aime)
- Length filter: problem <= 2000 chars, solution <= 3000 chars
- Test set deduplication: removed overlapping problems with all test benchmarks
- Stratified sampling by source category
- Answer parsability: verified via [math-verify](https://github.com/huggingface/Math-Verify) to ensure reliable reward signals

**Test data:** standard benchmarks used as-is (no filtering applied).

## Intended Use

This dataset is designed for RLVR math reasoning training (e.g., DAPO, REINFORCE++) with rule-based reward verification.
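
The record layout and the length filter described in the card can be illustrated with a small, standard-library-only sketch. The field names and the 2000/3000 character limits come from the card; the function names (`is_valid_sample`, `passes_length_filter`, `exact_match_reward`) and the sample problem are hypothetical, and the exact-match reward is a deliberately simplified stand-in for a rule-based verifier, not the checker used to build or train on this dataset.

```python
from typing import Any

# Top-level fields shown in the card's "Data Format" example.
REQUIRED_KEYS = {"data_source", "prompt", "ability", "reward_model", "extra_info"}


def is_valid_sample(sample: dict[str, Any]) -> bool:
    """Return True if a record has the fields shown in the card's format example."""
    if not REQUIRED_KEYS.issubset(sample):
        return False
    prompt = sample["prompt"]
    if not isinstance(prompt, list) or not prompt:
        return False
    for message in prompt:
        if not isinstance(message, dict) or "role" not in message or "content" not in message:
            return False
    reward = sample["reward_model"]
    return (
        isinstance(reward, dict)
        and reward.get("style") == "rule"
        and "ground_truth" in reward
    )


def passes_length_filter(
    problem: str,
    solution: str,
    max_problem_chars: int = 2000,   # limit stated in the card
    max_solution_chars: int = 3000,  # limit stated in the card
) -> bool:
    """Apply the card's length filter: problem <= 2000 chars, solution <= 3000 chars."""
    return len(problem) <= max_problem_chars and len(solution) <= max_solution_chars


def exact_match_reward(model_answer: str, sample: dict[str, Any]) -> float:
    """Assumed, simplified rule-based reward: 1.0 on exact match after stripping whitespace."""
    truth = str(sample["reward_model"]["ground_truth"]).strip()
    return 1.0 if model_answer.strip() == truth else 0.0


# Hypothetical record; the system prompt is elided in the card and stays elided here.
example = {
    "data_source": "hiyouga/math12k",
    "prompt": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "What is 2 + 2?"},
    ],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "4"},
    "extra_info": {"split": "train", "index": 0},
}

print(is_valid_sample(example))
print(exact_match_reward(" 4 ", example))
```

Exact string matching treats equivalent forms such as `1/2` and `0.5` as different answers, which is presumably why the card relies on a math-aware parser for the parsability check; a real training setup would likely swap in such a verifier for `exact_match_reward`.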

Provider:

Julian2002
