Sellopale/OpenMathReasoning
Source: Hugging Face | Updated: 2025-12-18 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/Sellopale/OpenMathReasoning
Description:
OpenMathReasoning is a large-scale math reasoning dataset for training large language models (LLMs). It contains 306K unique mathematical problems sourced from AoPS forums, together with 3.2M long chain-of-thought (CoT) solutions, 1.7M long tool-integrated reasoning (TIR) solutions, and 566K GenSelect samples, each of which selects the most promising solution from many candidates. It also includes an additional 193K problems from AoPS forums that come without solutions. Problems were preprocessed with Qwen2.5-32B-Instruct, and solutions were generated with DeepSeek-R1 and QwQ-32B. This dataset was the foundation of our winning submission to the AIMO-2 Kaggle competition.
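The composition figures above can be turned into a quick sanity check, and the same sketch shows one way the data might be streamed. This is a minimal illustration, not an official loader. The counts are the rounded figures from the description. The split names (`cot`, `tir`, `genselect`), the helper names, and the use of `https://hf-mirror.com` as an `HF_ENDPOINT` value are assumptions that should be checked against the repository before use.

```python
import os

# Rounded counts copied from the description; real row counts will differ slightly.
# The keys double as ASSUMED split names and are not confirmed by this listing.
COUNTS = {
    "cot": 3_200_000,       # long chain-of-thought solutions
    "tir": 1_700_000,       # tool-integrated reasoning solutions
    "genselect": 566_000,   # solution-selection samples
}
UNIQUE_PROBLEMS = 306_000       # problems that have generated solutions
ADDITIONAL_PROBLEMS = 193_000   # problems only, no solutions


def total_samples() -> int:
    """Approximate number of generated samples across CoT, TIR and GenSelect."""
    return sum(COUNTS.values())


def avg_cot_per_problem() -> float:
    """Approximate average number of CoT solutions per unique problem."""
    return COUNTS["cot"] / UNIQUE_PROBLEMS


def stream_split(split: str = "cot",
                 repo_id: str = "Sellopale/OpenMathReasoning",
                 endpoint: str = "https://hf-mirror.com"):
    """Hypothetical helper: stream one split without downloading it in full.

    Requires the third-party `datasets` package and network access.
    """
    # HF_ENDPOINT is read when huggingface_hub is first imported, so this
    # only takes effect if it runs before that import happens.
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    from datasets import load_dataset  # imported lazily on purpose
    return load_dataset(repo_id, split=split, streaming=True)


if __name__ == "__main__":
    print("approx. generated samples:", total_samples())
    print("approx. CoT solutions per problem:", round(avg_cot_per_problem(), 1))
```

Streaming is used here because millions of long solutions can be expensive to download in one go; calling `next(iter(stream_split()))` would fetch a single record for inspection, assuming the split name is valid.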
Provider:
Sellopale



