NuminaMath-CoT
Source: ModelScope community. Updated 2026-05-16; indexed 2024-08-31.
Download link: https://modelscope.cn/datasets/OmniData/NuminaMath-CoT
# Dataset Card for NuminaMath CoT
## Dataset Description
- **Homepage:** https://projectnumina.ai
- **Repository:** https://github.com/project-numina/aimo-progress-prize
- **Paper:** https://github.com/project-numina/aimo-progress-prize/blob/main/report/numina_dataset.pdf
- **Leaderboard:**
- **Point of Contact:** [Jia Li](mailto:jia@projectnumina.ai)
### Dataset Summary
The dataset contains approximately 860k math problems, each paired with a solution formatted in a Chain of Thought (CoT) manner. The sources range from Chinese high school math exercises to US and international mathematics olympiad competition problems. The data were primarily collected from online exam paper PDFs and mathematics discussion forums. The processing steps include (a) OCR of the original PDFs, (b) segmentation into problem-solution pairs, (c) translation into English, (d) realignment to produce a CoT reasoning format, and (e) final answer formatting.
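As a minimal sketch of how the records might be read, the snippet below assumes the Hugging Face `datasets` library is installed and that the dataset is published under the ID `AI-MO/NuminaMath-CoT` (taken from the citation URL). The column names `source`, `problem`, and `solution`, the `train` split name, and the helper names `count_by_source` and `load_train_split` are assumptions made for illustration; this card does not state them.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_source(records: Iterable[Mapping[str, str]]) -> Counter:
    """Count records per value of the (assumed) `source` column."""
    return Counter(record["source"] for record in records)


def load_train_split():
    """Load the train split from the Hugging Face Hub (needs network access)."""
    # Imported lazily so count_by_source works without `datasets` installed.
    from datasets import load_dataset

    # Dataset ID comes from the citation URL; the split name is an assumption.
    return load_dataset("AI-MO/NuminaMath-CoT", split="train")


if __name__ == "__main__":
    # Tiny in-memory stand-in for real rows; the contents are made up.
    sample = [
        {"source": "cn_k12", "problem": "Solve x + 1 = 2.", "solution": "x = 1."},
        {"source": "gsm8k", "problem": "What is 2 + 3?", "solution": "2 + 3 = 5."},
        {"source": "cn_k12", "problem": "Compute 3 * 4.", "solution": "3 * 4 = 12."},
    ]
    print(count_by_source(sample))
```

Calling `count_by_source(load_train_split())` would tally the real rows by source, provided the assumed column name matches the published schema.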
### Source breakdown
| Source | Number of Samples |
| --- | --- |
| aops_forum | 30201 |
| amc_aime | 4072 |
| cn_k12 | 276591 |
| gsm8k | 7345 |
| math | 7478 |
| olympiads | 150581 |
| orca_math | 153334 |
| synthetic_amc | 62111 |
| synthetic_math | 167895 |
| **Total** | **859608** |
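To put the table in proportion, the following sketch recomputes the total and each source's share of the dataset. The counts are copied from the table above; the names `SOURCE_COUNTS` and `source_shares` are illustrative only.

```python
# Sample counts copied from the source breakdown table.
SOURCE_COUNTS = {
    "aops_forum": 30201,
    "amc_aime": 4072,
    "cn_k12": 276591,
    "gsm8k": 7345,
    "math": 7478,
    "olympiads": 150581,
    "orca_math": 153334,
    "synthetic_amc": 62111,
    "synthetic_math": 167895,
}


def source_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each source's fraction of the total sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print(f"total: {sum(SOURCE_COUNTS.values())}")
    shares = source_shares(SOURCE_COUNTS)
    # List sources from largest to smallest share.
    for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
        print(f"{name:<15} {share:6.1%}")
```

The per-source counts sum to the stated total of 859608, and `cn_k12` is the largest single source.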
### Licensing Information
The dataset is available under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
### Citation Information
```
@misc{numina_math_datasets,
author = {Jia LI and Edward Beeching and Lewis Tunstall and Ben Lipkin and Roman Soletskyi and Shengyi Costa Huang and Kashif Rasul and Longhui Yu and Albert Jiang and Ziju Shen and Zihan Qin and Bin Dong and Li Zhou and Yann Fleureau and Guillaume Lample and Stanislas Polu},
title = {NuminaMath},
year = {2024},
publisher = {Numina},
journal = {Hugging Face repository},
howpublished = {\url{https://huggingface.co/AI-MO/NuminaMath-CoT}}
}
```
Provider: maas
Created: 2024-07-22