NuminaMath-CoT
Dataset Card for NuminaMath CoT
Dataset Description
Dataset Summary
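The dataset contains approximately 860,000 math problems, with each solution formatted in a Chain of Thought (CoT) manner. Its sources range from Chinese high school math exercises to US and international mathematics olympiad competition problems. The data were collected primarily from online exam paper PDFs and mathematics discussion forums. The processing steps are (a) OCR from the original PDFs, (b) segmentation into problem-solution pairs, (c) translation into English, (d) realignment to produce a CoT reasoning format, and (e) final answer formatting.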
Source Breakdown
| Source | Number of Samples |
|---|---|
| aops_forum | 30201 |
| amc_aime | 4072 |
| cn_k12 | 276591 |
| gsm8k | 7345 |
| math | 7478 |
| olympiads | 150581 |
| orca_math | 153334 |
| synthetic_amc | 62111 |
| synthetic_math | 167895 |
| Total | 859608 |
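As a quick sanity check on the table above, the per-source counts can be summed and compared against the reported total, and each source's share of the dataset computed. The following is a minimal sketch that uses only the numbers from the table; the names `SOURCE_COUNTS`, `REPORTED_TOTAL`, and `source_shares` are illustrative and not part of the dataset or any library.

```python
# Per-source sample counts copied from the source breakdown table.
SOURCE_COUNTS = {
    "aops_forum": 30201,
    "amc_aime": 4072,
    "cn_k12": 276591,
    "gsm8k": 7345,
    "math": 7478,
    "olympiads": 150581,
    "orca_math": 153334,
    "synthetic_amc": 62111,
    "synthetic_math": 167895,
}

# Total as reported in the last row of the table.
REPORTED_TOTAL = 859608


def source_shares(counts):
    """Return each source's share of all samples (in percent), largest first."""
    total = sum(counts.values())
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


computed_total = sum(SOURCE_COUNTS.values())
shares = source_shares(SOURCE_COUNTS)

if __name__ == "__main__":
    print(f"computed total: {computed_total} (reported: {REPORTED_TOTAL})")
    for name, pct in shares.items():
        print(f"{name:15s} {pct:6.2f}%")
```

The computed sum agrees with the reported total, and the ordering shows that `cn_k12` is the single largest source, followed by `synthetic_math`.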
Licensing Information
The dataset is available under the Creative Commons NonCommercial (CC BY-NC 4.0) license.
Citation Information
@misc{numina_math_datasets,
  author = {Jia LI and Edward Beeching and Lewis Tunstall and Ben Lipkin and Roman Soletskyi and Shengyi Costa Huang and Kashif Rasul and Longhui Yu and Albert Jiang and Ziju Shen and Zihan Qin and Bin Dong and Li Zhou and Yann Fleureau and Guillaume Lample and Stanislas Polu},
  title = {NuminaMath},
  year = {2024},
  publisher = {Numina},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/AI-MO/NuminaMath-CoT}}
}




