
Julian2002/RLVR-Math-16k

Source: Hugging Face. Updated 2026-03-23. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Julian2002/RLVR-Math-16k
Description:
---
language:
- en
task_categories:
- text-generation
tags:
- math
- reasoning
- rlhf
- rlvr
- grpo
size_categories:
- 10K<n<100K
---

# RLVR-Math-16k

A curated math reasoning dataset for **RLVR (Reinforcement Learning with Verifiable Rewards)** training.

## Dataset Summary

| Split | Samples |
|-------|--------:|
| train | 16,384 |
| test | 842 |
| **Total** | **17,226** |

## Source Datasets

### train

| Source | Samples |
|--------|--------:|
| hiyouga/math12k | 10,476 |
| nlile/NuminaMath-1.5-RL-Verifiable/amc_aime | 3,075 |
| nlile/NuminaMath-1.5-RL-Verifiable/olympiads | 2,833 |

### test

| Source | Samples |
|--------|--------:|
| hiyouga/math12k | 500 |
| math-ai/minervamath | 272 |
| math-ai/amc23 | 40 |
| math-ai/aime25 | 30 |

### Training Sources

- [hiyouga/math12k](https://huggingface.co/datasets/hiyouga/math12k): MATH competition problems (converted from OpenAI PRM800K)
- [nlile/NuminaMath-1.5-RL-Verifiable](https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable): AMC/AIME and Olympiad competition problems

### Test Sources

- [hiyouga/math12k](https://huggingface.co/datasets/hiyouga/math12k): MATH500
- [math-ai/minervamath](https://huggingface.co/datasets/math-ai/minervamath): Minerva Math
- [math-ai/aime25](https://huggingface.co/datasets/math-ai/aime25): AIME 2025
- [math-ai/amc23](https://huggingface.co/datasets/math-ai/amc23): AMC 2023

## Data Format

Each sample follows the verl-compatible chat format:

```json
{
  "data_source": "source_dataset_id",
  "prompt": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "math problem text"}
  ],
  "ability": "math",
  "reward_model": {"style": "rule", "ground_truth": "answer"},
  "extra_info": {"split": "train/test", "index": 0}
}
```

## Preprocessing

**Training data filters:**

- Source filter: only competition-level problems (olympiads, amc_aime)
- Length filter: problem <= 2000 chars, solution <= 3000 chars
- Test set deduplication: removed overlapping problems with all test benchmarks
- Stratified sampling by source category
- Answer parsability: verified via [math-verify](https://github.com/huggingface/Math-Verify) to ensure reliable reward signals

**Test data:** standard benchmarks used as-is (no filtering applied).

## Intended Use

This dataset is designed for RLVR math reasoning training (e.g., DAPO, REINFORCE++) with rule-based reward verification.
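
As a rough illustration of the record layout shown under Data Format, the sketch below checks that a record has the five top-level keys and the nested fields from that JSON example. The function name `validate_sample` and the sample problem text are hypothetical and are not part of the dataset or of verl. The checks only mirror what the JSON example shows and may be stricter or looser than what a real training pipeline expects.

```python
from typing import Any

# Top-level keys taken from the JSON example in the Data Format section.
REQUIRED_KEYS = {"data_source", "prompt", "ability", "reward_model", "extra_info"}


def validate_sample(sample: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    missing = REQUIRED_KEYS - sample.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems: list[str] = []

    # "prompt" is a list of chat messages, each with a role and content.
    prompt = sample["prompt"]
    if not isinstance(prompt, list) or not prompt:
        problems.append("prompt must be a non-empty list of messages")
    else:
        for i, msg in enumerate(prompt):
            if not isinstance(msg, dict) or not {"role", "content"} <= msg.keys():
                problems.append(f"prompt[{i}] needs 'role' and 'content'")

    # "reward_model" carries the reference answer used by the rule-based reward.
    reward = sample["reward_model"]
    if not isinstance(reward, dict) or "ground_truth" not in reward:
        problems.append("reward_model must contain 'ground_truth'")

    return problems


# Hypothetical record, written only to exercise the checker.
example = {
    "data_source": "hiyouga/math12k",
    "prompt": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "What is 1 + 1?"},
    ],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "2"},
    "extra_info": {"split": "train", "index": 0},
}
print(validate_sample(example))
```

A check like this can be run over a few rows after loading the dataset to catch schema drift before starting a training job.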
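
The length filter and test set deduplication listed under Preprocessing could be sketched roughly as follows. This is not the author's pipeline: the field names `problem` and `solution`, the helper names, and the matching rule (exact match after lowercasing and collapsing whitespace) are all assumptions made for illustration, since the card does not say how overlap was detected. Only the two character limits come from the card.

```python
import re

MAX_PROBLEM_CHARS = 2000   # from the card: problem <= 2000 chars
MAX_SOLUTION_CHARS = 3000  # from the card: solution <= 3000 chars


def passes_length_filter(problem: str, solution: str) -> bool:
    """Keep a row only if both texts are within the stated limits."""
    return len(problem) <= MAX_PROBLEM_CHARS and len(solution) <= MAX_SOLUTION_CHARS


def normalize(text: str) -> str:
    """Assumed normalization: trim, lowercase, collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def filter_train(rows: list[dict], test_problems: list[str]) -> list[dict]:
    """Drop rows that are too long or whose problem text appears in the test set."""
    test_keys = {normalize(p) for p in test_problems}
    kept = []
    for row in rows:
        if not passes_length_filter(row["problem"], row["solution"]):
            continue  # fails the length filter
        if normalize(row["problem"]) in test_keys:
            continue  # overlaps with a test benchmark problem
        kept.append(row)
    return kept


# Hypothetical rows, written only to exercise the filters.
rows = [
    {"problem": "Compute 2+2.", "solution": "4"},
    {"problem": "x" * 2001, "solution": "too long"},
    {"problem": "  compute   2+3. ", "solution": "5"},
]
kept = filter_train(rows, test_problems=["Compute 2+3."])
print(len(kept))
```

Exact matching after normalization only catches near-verbatim duplicates. Paraphrased or reformatted problems would need a fuzzier comparison, which is outside this sketch.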
Provider:
Julian2002