abhishekalfredtoppo/gsm8k
Hugging Face · Updated 2026-04-03 · Indexed 2026-04-12
Download link: https://hf-mirror.com/datasets/abhishekalfredtoppo/gsm8k
Resource description:
---
annotations_creators:
- crowdsourced
language_creators:
- crowdsourced
language:
- en
license:
- mit
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-generation
task_ids: []
paperswithcode_id: gsm8k
pretty_name: Grade School Math 8K
tags:
- math-word-problems
dataset_info:
- config_name: main
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 3963202
    num_examples: 7473
  - name: test
    num_bytes: 713732
    num_examples: 1319
  download_size: 2725633
  dataset_size: 4676934
- config_name: socratic
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 5198108
    num_examples: 7473
  - name: test
    num_bytes: 936859
    num_examples: 1319
  download_size: 3164254
  dataset_size: 6134967
configs:
- config_name: main
  data_files:
  - split: train
    path: main/train-*
  - split: test
    path: main/test-*
- config_name: socratic
  data_files:
  - split: train
    path: socratic/train-*
  - split: test
    path: socratic/test-*
---
# Dataset Card for GSM8K
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
## Dataset Description
- **Homepage:** https://openai.com/blog/grade-school-math/
- **Repository:** https://github.com/openai/grade-school-math
- **Paper:** https://arxiv.org/abs/2110.14168
- **Leaderboard:** [Needs More Information]
- **Point of Contact:** [Needs More Information]
### Dataset Summary
GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
- These problems take between 2 and 8 steps to solve.
- Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the final answer.
- A bright middle school student should be able to solve every problem: from the paper, "Problems require no concepts beyond the level of early Algebra, and the vast majority of problems can be solved without explicitly defining a variable."
- Solutions are provided in natural language, as opposed to pure math expressions. From the paper: "We believe this is the most generally useful data format, and we expect it to shed light on the properties of large language models’ internal monologues."
### Supported Tasks and Leaderboards
This dataset is commonly used to evaluate the multi-step arithmetic reasoning of language models.
It has been used in many benchmarks, including the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
### Languages
The text in the dataset is in English. The associated BCP-47 code is `en`.
## Dataset Structure
### Data Instances
For the `main` configuration, each instance contains a string for the grade-school level math question and a string for the corresponding answer with multiple steps of reasoning and calculator annotations (explained [here](https://github.com/openai/grade-school-math#calculation-annotations)).
```python
{
'question': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?',
'answer': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72',
}
```
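The `answer` format shown above can be taken apart with a few string operations. The following is a minimal sketch, not part of the dataset's tooling: the helper names are hypothetical, and removing commas from the final answer is an assumption about how thousands separators might appear.

```python
import re

# Calculator annotations look like <<48/2=24>>: an expression, "=", then its result.
ANNOTATION = re.compile(r"<<([^<>=]+)=([^<>]+)>>")

SAMPLE = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)


def extract_final_answer(answer: str) -> str:
    """Return the text after the last '####' marker (hypothetical helper)."""
    _, marker, tail = answer.rpartition("####")
    if not marker:
        raise ValueError("no '####' marker found in answer")
    # Assumption: commas, if present, are thousands separators and can be dropped.
    return tail.strip().replace(",", "")


def calculator_annotations(answer: str) -> list[tuple[str, str]]:
    """Return (expression, result) pairs for every <<...>> annotation."""
    return ANNOTATION.findall(answer)


def strip_annotations(answer: str) -> str:
    """Remove the <<...>> annotations, leaving the natural-language solution."""
    return re.sub(r"<<[^<>]*>>", "", answer)


if __name__ == "__main__":
    print(extract_final_answer(SAMPLE))
    print(calculator_annotations(SAMPLE))
    print(strip_annotations(SAMPLE))
```

Keeping the three helpers separate lets a caller compare only final answers, or inspect the intermediate calculations, without re-parsing the string.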
For the `socratic` configuration, each instance contains a string for a grade-school level math question, a string for the corresponding answer with multiple steps of reasoning, calculator annotations (explained [here](https://github.com/openai/grade-school-math#calculation-annotations)), and *Socratic sub-questions*.
```python
{
'question': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?',
'answer': 'How many clips did Natalia sell in May? ** Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nHow many clips did Natalia sell altogether in April and May? ** Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72',
}
```
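In the `socratic` example above, each reasoning line pairs a sub-question with its step, separated by ` ** `. A minimal sketch of splitting such an answer into pairs follows. The function name is hypothetical, and the handling of lines without a separator is an assumption, since the card only shows lines that contain one.

```python
SOCRATIC_SAMPLE = (
    "How many clips did Natalia sell in May? ** "
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "How many clips did Natalia sell altogether in April and May? ** "
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)


def split_socratic(answer: str) -> tuple[list[tuple[str, str]], str]:
    """Split a socratic answer into (sub-question, step) pairs and the final answer."""
    body, marker, final = answer.rpartition("####")
    if not marker:
        raise ValueError("no '####' marker found in answer")
    pairs = []
    for line in body.strip().splitlines():
        question, sep, step = line.partition(" ** ")
        if sep:
            pairs.append((question.strip(), step.strip()))
        else:
            # Assumption: a line without ' ** ' is a step with no sub-question.
            pairs.append(("", line.strip()))
    return pairs, final.strip()


if __name__ == "__main__":
    steps, final_answer = split_socratic(SOCRATIC_SAMPLE)
    for sub_question, step in steps:
        print(sub_question, "->", step)
    print("final:", final_answer)
```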
### Data Fields
The data fields are the same across the `main` and `socratic` configurations and their individual splits.
- `question`: The question string of a grade school math problem.
- `answer`: The full solution string for the `question`. It contains multiple steps of reasoning with calculator annotations and the final numeric solution.
### Data Splits
| name     | train | test |
|----------|------:|-----:|
| main     |  7473 | 1319 |
| socratic |  7473 | 1319 |
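The card's metadata lists, for both configurations, a `train` split of 7,473 examples and a `test` split of 1,319. As a hedged sketch, the splits could be loaded with the Hugging Face `datasets` library and checked against those counts. The default repository id is an assumption taken from this page's title, and the helper names are hypothetical.

```python
# Split sizes as stated in this card's metadata.
EXPECTED_SIZES = {"train": 7473, "test": 1319}


def load_gsm8k(config: str = "main", repo_id: str = "abhishekalfredtoppo/gsm8k"):
    """Load one configuration ('main' or 'socratic'); requires network access."""
    # Imported lazily so the rest of this sketch runs without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, config)


def split_sizes_match(sizes: dict[str, int]) -> bool:
    """Check observed split sizes against the counts stated in the card."""
    return sizes == EXPECTED_SIZES


if __name__ == "__main__":
    dataset = load_gsm8k("main")
    observed = {split: dataset[split].num_rows for split in dataset}
    print(observed, split_sizes_match(observed))
```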
## Dataset Creation
### Curation Rationale
[Needs More Information]
### Source Data
#### Initial Data Collection and Normalization
From the paper, appendix A:
> We initially collected a starting set of a thousand problems and natural language solutions by hiring freelance contractors on Upwork (upwork.com). We then worked with Surge AI (surgehq.ai), an NLP data labeling platform, to scale up our data collection. After collecting the full dataset, we asked workers to re-solve all problems, with no workers re-solving problems they originally wrote. We checked whether their final answers agreed with the original solutions, and any problems that produced disagreements were either repaired or discarded. We then performed another round of agreement checks on a smaller subset of problems, finding that 1.7% of problems still produce disagreements among contractors. We estimate this to be the fraction of problems that contain breaking errors or ambiguities. It is possible that a larger percentage of problems contain subtle errors.
#### Who are the source language producers?
[Needs More Information]
### Annotations
#### Annotation process
[Needs More Information]
#### Who are the annotators?
Surge AI (surgehq.ai)
### Personal and Sensitive Information
[Needs More Information]
## Considerations for Using the Data
### Social Impact of Dataset
[Needs More Information]
### Discussion of Biases
[Needs More Information]
### Other Known Limitations
[Needs More Information]
## Additional Information
### Dataset Curators
[Needs More Information]
### Licensing Information
The GSM8K dataset is licensed under the [MIT License](https://opensource.org/licenses/MIT).
### Citation Information
```bibtex
@article{cobbe2021gsm8k,
title={Training Verifiers to Solve Math Word Problems},
author={Cobbe, Karl and Kosaraju, Vineet and Bavarian, Mohammad and Chen, Mark and Jun, Heewoo and Kaiser, Lukasz and Plappert, Matthias and Tworek, Jerry and Hilton, Jacob and Nakano, Reiichiro and Hesse, Christopher and Schulman, John},
journal={arXiv preprint arXiv:2110.14168},
year={2021}
}
```
### Contributions
Thanks to [@jon-tow](https://github.com/jon-tow) for adding this dataset.
Provided by: abhishekalfredtoppo



