OpenDataArena/ODA-Math-460k

Source: Hugging Face | Updated: 2026-01-21 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/OpenDataArena/ODA-Math-460k
Description:
ODA-Math-460k is a large-scale math reasoning dataset curated from top-performing open mathematics corpora (selected via the OpenDataArena leaderboard) and refined through deduplication, benchmark decontamination, LLM-based filtering, and verifier-backed response distillation. It targets a learnable but challenging difficulty band: non-trivial for smaller models yet solvable by stronger reasoning models. After the selection and verification pipeline, the dataset contains ~460K problems in the format Problem → Step-by-step solution (reasoning trace) → Final answer. Its goal is to efficiently improve mathematical reasoning and competition-style problem solving through high-quality, validated solutions.
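The card does not publish its pipeline code. As a rough illustration only, the following Python sketch shows what three of the listed steps can look like: exact deduplication, n-gram benchmark decontamination, and a verifier-style check of each solution's final answer, plus a difficulty-band test. The record fields (`problem`, `solution`, `answer`), the 8-gram overlap rule, the `\boxed{}` answer convention, and the pass-rate thresholds are all hypothetical assumptions, not details from the dataset. The LLM-based filtering step is omitted.

```python
from __future__ import annotations

import hashlib
import re
from dataclasses import dataclass


# Hypothetical record layout mirroring "Problem -> Step-by-step solution -> Final answer".
# The real dataset's column names may differ.
@dataclass
class MathRecord:
    problem: str
    solution: str
    answer: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Word n-grams of the normalized text (empty if the text has fewer than n words)."""
    tokens = normalize(text).split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def extract_boxed(solution: str) -> str | None:
    """Return the content of the last \\boxed{...} (assumed answer convention, no nested braces)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1].strip() if matches else None


def verify(record: MathRecord) -> bool:
    """Stand-in verifier: the solution's boxed answer must match the reference answer."""
    boxed = extract_boxed(record.solution)
    return boxed is not None and normalize(boxed) == normalize(record.answer)


def in_difficulty_band(small_pass_rate: float, strong_pass_rate: float,
                       small_max: float = 0.5, strong_min: float = 0.5) -> bool:
    """Hypothetical band: mostly failed by a small model, mostly solved by a strong one."""
    return small_pass_rate < small_max and strong_pass_rate >= strong_min


def curate(records: list[MathRecord], benchmark_problems: list[str], n: int = 8) -> list[MathRecord]:
    """Deduplicate, drop benchmark-overlapping problems, and keep only verified solutions."""
    bench_grams: set[tuple[str, ...]] = set()
    for p in benchmark_problems:
        bench_grams |= ngrams(p, n)

    seen: set[str] = set()
    kept: list[MathRecord] = []
    for r in records:
        key = hashlib.sha256(normalize(r.problem).encode("utf-8")).hexdigest()
        if key in seen:
            continue  # exact duplicate after normalization
        seen.add(key)
        if ngrams(r.problem, n) & bench_grams:
            continue  # shares an n-gram with a benchmark problem
        if not verify(r):
            continue  # final answer does not check out
        kept.append(r)
    return kept
```

Exact hashing catches only verbatim repeats after normalization; production pipelines typically add fuzzy matching (for example MinHash) and a real answer-equivalence checker instead of string comparison.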
Provider:
OpenDataArena