juletxara/mgsm

Name: juletxara/mgsm
Creator: juletxara
Published: 2023-05-09 16:46:31
License: No description available

Hugging Face · updated 2023-05-09 · indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/juletxara/mgsm


Resource description:

---
annotations_creators:
- found
language_creators:
- found
- expert-generated
language:
- en
- es
- fr
- de
- ru
- zh
- ja
- th
- sw
- bn
- te
license:
- cc-by-sa-4.0
multilinguality:
- multilingual
size_categories:
- 1K<n<10K
source_datasets:
- extended|gsm8k
task_categories:
- text2text-generation
task_ids: []
paperswithcode_id: multi-task-language-understanding-on-mgsm
pretty_name: Multilingual Grade School Math Benchmark (MGSM)
tags:
- math-word-problems
dataset_info:
- config_name: en
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: answer_number
    dtype: int32
  - name: equation_solution
    dtype: string
  splits:
  - name: train
    num_bytes: 3963202
    num_examples: 8
  - name: test
    num_bytes: 713732
    num_examples: 250
  download_size: 4915944
  dataset_size: 4676934
- config_name: es
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: answer_number
    dtype: int32
  - name: equation_solution
    dtype: string
  splits:
  - name: train
    num_bytes: 3963202
    num_examples: 8
  - name: test
    num_bytes: 713732
    num_examples: 250
  download_size: 4915944
  dataset_size: 4676934
---

# Dataset Card for MGSM

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)

## Dataset Description

- **Homepage:** https://openai.com/blog/grade-school-math/
- **Repository:** https://github.com/openai/grade-school-math
- **Paper:** https://arxiv.org/abs/2110.14168
- **Leaderboard:** [Needs More Information]
- **Point of Contact:** [Needs More Information]

### Dataset Summary

Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems, proposed in the paper [Language models are multilingual chain-of-thought reasoners](http://arxiv.org/abs/2210.03057).

The same 250 problems from [GSM8K](https://arxiv.org/abs/2110.14168) are each translated by human annotators into 10 languages. The 10 languages are:
- Spanish
- French
- German
- Russian
- Chinese
- Japanese
- Thai
- Swahili
- Bengali
- Telugu

GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade-school math word problems. It was created to support question answering on basic mathematical problems that require multi-step reasoning.

You can find the inputs and targets for each of the ten languages (and English) as `.tsv` files. We also include few-shot exemplars, manually translated into each language, in `exemplars.py`.

### Supported Tasks and Leaderboards

[Needs More Information]

### Languages

The same 250 problems from [GSM8K](https://arxiv.org/abs/2110.14168) are each translated by human annotators into 10 languages. The 10 languages are:
- Spanish
- French
- German
- Russian
- Chinese
- Japanese
- Thai
- Swahili
- Bengali
- Telugu

## Dataset Structure

### Data Instances

Each instance in the train split contains:
- a string for the grade-school level math question
- a string for the corresponding answer with chain-of-thought steps
- the numeric solution to the question
- the equation solution to the question

```python
{
    'question': 'Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?',
    'answer': 'Step-by-Step Answer: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.',
    'answer_number': 11,
    'equation_solution': '5 + 6 = 11.',
}
```

Each instance in the test split contains:
- a string for the grade-school level math question
- the numeric solution to the question

```python
{
    'question': "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
    'answer': None,
    'answer_number': 18,
    'equation_solution': None,
}
```

### Data Fields

The data fields are the same across the `train` and `test` splits.

- question: The question string of a grade-school math problem.
- answer: The full solution string for the `question`. It contains multiple steps of reasoning with calculator annotations and the final numeric solution. Populated only in the `train` split.
- answer_number: The numeric solution to the `question`.
- equation_solution: The equation solution to the `question`. Populated only in the `train` split.

In the `test` split, `answer` and `equation_solution` are `None`; only `question` and `answer_number` are filled in.

### Data Splits

- The train split includes 8 few-shot exemplars, manually translated into each language.
- The test split includes the same 250 GSM8K problems, translated by human annotators into 10 languages.
| name | train | test |
|------|------:|-----:|
| en   | 8 | 250 |
| es   | 8 | 250 |
| fr   | 8 | 250 |
| de   | 8 | 250 |
| ru   | 8 | 250 |
| zh   | 8 | 250 |
| ja   | 8 | 250 |
| th   | 8 | 250 |
| sw   | 8 | 250 |
| bn   | 8 | 250 |
| te   | 8 | 250 |

## Dataset Creation

### Curation Rationale

[Needs More Information]

### Source Data

#### Initial Data Collection and Normalization

From the GSM8K paper (Cobbe et al., 2021):

> We initially collected a starting set of a thousand problems and natural language solutions by hiring freelance contractors on Upwork (upwork.com). We then worked with Surge AI (surgehq.ai), an NLP data labeling platform, to scale up our data collection. After collecting the full dataset, we asked workers to re-solve all problems, with no workers re-solving problems they originally wrote. We checked whether their final answers agreed with the original solutions, and any problems that produced disagreements were either repaired or discarded. We then performed another round of agreement checks on a smaller subset of problems, finding that 1.7% of problems still produce disagreements among contractors. We estimate this to be the fraction of problems that contain breaking errors or ambiguities. It is possible that a larger percentage of problems contain subtle errors.

#### Who are the source language producers?

[Needs More Information]

### Annotations

#### Annotation process

[Needs More Information]

#### Who are the annotators?

Surge AI (surgehq.ai)

### Personal and Sensitive Information

[Needs More Information]

## Considerations for Using the Data

### Social Impact of Dataset

[Needs More Information]

### Discussion of Biases

[Needs More Information]

### Other Known Limitations

[Needs More Information]

## Additional Information

### Dataset Curators

[Needs More Information]

### Licensing Information

The GSM8K dataset is licensed under the [MIT License](https://opensource.org/licenses/MIT).
### Citation Information

```bibtex
@article{cobbe2021gsm8k,
  title={Training Verifiers to Solve Math Word Problems},
  author={Cobbe, Karl and Kosaraju, Vineet and Bavarian, Mohammad and Chen, Mark and Jun, Heewoo and Kaiser, Lukasz and Plappert, Matthias and Tworek, Jerry and Hilton, Jacob and Nakano, Reiichiro and Hesse, Christopher and Schulman, John},
  journal={arXiv preprint arXiv:2110.14168},
  year={2021}
}

@misc{shi2022language,
  title={Language Models are Multilingual Chain-of-Thought Reasoners},
  author={Freda Shi and Mirac Suzgun and Markus Freitag and Xuezhi Wang and Suraj Srivats and Soroush Vosoughi and Hyung Won Chung and Yi Tay and Sebastian Ruder and Denny Zhou and Dipanjan Das and Jason Wei},
  year={2022},
  eprint={2210.03057},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

### Contributions

Thanks to [@juletx](https://github.com/juletx) for adding this dataset.


Provider:

juletxara

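Summary of Original Information

Dataset Overview

Name: Multilingual Grade School Math Benchmark (MGSM)

Description: MGSM is a multilingual benchmark of 250 grade-school math problems taken from the GSM8K dataset and translated by human annotators into 10 languages.

Languages: English, Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, Telugu

License: cc-by-sa-4.0

Multilinguality: multilingual

Size: 1K<n<10K

Source dataset: extended from GSM8K

Task category: text-to-text generation

Tags: math word problems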

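Dataset Structure

Configurations:

en: English
es: Spanish
fr: French
de: German
ru: Russian
zh: Chinese
ja: Japanese
th: Thai
sw: Swahili
bn: Bengali
te: Telugu

Features:

question: string, the grade-school math question
answer: string, the full step-by-step solution (None in the test split)
answer_number: integer, the numeric solution
equation_solution: string, the equation solution (None in the test split)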
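The feature list above can be mirrored as a typed record when handling rows in Python. This is a minimal sketch, assuming rows arrive as plain dicts keyed by the four field names listed; the class name `MGSMRecord` and the `has_worked_solution` helper are hypothetical illustrations, not part of the dataset or of any library.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class MGSMRecord:
    """One MGSM row; field names and types follow the feature list (hypothetical class)."""

    question: str
    answer: Optional[str]             # step-by-step solution; None in the test split
    answer_number: int                # numeric solution (int32 in the card's schema)
    equation_solution: Optional[str]  # e.g. "5 + 6 = 11."; None in the test split

    @classmethod
    def from_dict(cls, row: Mapping[str, Any]) -> "MGSMRecord":
        """Build a record from a plain dict; missing optional fields become None."""
        return cls(
            question=row["question"],
            answer=row.get("answer"),
            answer_number=int(row["answer_number"]),
            equation_solution=row.get("equation_solution"),
        )

    @property
    def has_worked_solution(self) -> bool:
        """True for few-shot exemplars (train split), False for test items."""
        return self.answer is not None and self.equation_solution is not None


# The two example rows shown in the English dataset card.
train_rec = MGSMRecord.from_dict({
    "question": "Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
                "Each can has 3 tennis balls. How many tennis balls does he have now?",
    "answer": "Step-by-Step Answer: Roger started with 5 balls. 2 cans of 3 tennis balls "
              "each is 6 tennis balls. 5 + 6 = 11. The answer is 11.",
    "answer_number": 11,
    "equation_solution": "5 + 6 = 11.",
})
test_rec = MGSMRecord.from_dict({
    "question": "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning "
                "and bakes muffins for her friends every day with four. She sells the remainder "
                "at the farmers' market daily for $2 per fresh duck egg. How much in dollars "
                "does she make every day at the farmers' market?",
    "answer": None,
    "answer_number": 18,
    "equation_solution": None,
})
print(train_rec.has_worked_solution, test_rec.has_worked_solution)  # prints "True False"
```

Making the optional fields explicit keeps the train/test asymmetry visible in the type, so code that needs worked solutions can check `has_worked_solution` instead of failing on `None`.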

Data splits:

train: 8 few-shot exemplars per language
test: 250 problems per language

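Dataset Creation

Annotators (original GSM8K problems): Surge AI (surgehq.ai)

Source data collection: the initial GSM8K data was collected through Upwork and then scaled up through the Surge AI platform.

License: MIT License (applies to the GSM8K dataset)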


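Collected Summary

Dataset Introduction

Construction

MGSM was built from GSM8K: human annotators translated 250 of its math problems into 10 languages (Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, and Telugu), so the same problem set can be compared across languages. The dataset also includes 8 few-shot exemplars per language, likewise translated by hand.

Features

MGSM's main features are its multilingual coverage and human translation. It covers 10 languages plus the English original, and every language contains the same 250 human-translated GSM8K problems. The 8 training exemplars per language also include step-by-step solutions and equation solutions that illustrate multi-step reasoning; the 250 test problems provide only the question and its numeric answer.

Usage

MGSM is mainly used for text generation and multi-task language understanding, in particular to evaluate multi-step mathematical reasoning across languages. The 8-example train split serves as few-shot exemplars rather than training data, and the 250-example test split is used for evaluation by comparing a model's final numeric answer with answer_number.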
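Scoring on the test split reduces to comparing a model's final number with `answer_number`. The sketch below assumes the final answer is the last number the model writes (the exemplar answers end with "The answer is 11."); `extract_last_number` and `accuracy` are hypothetical helpers, not the paper's evaluation code. The regex treats commas as thousands separators, so decimal commas (as written in e.g. Spanish, French, or German) are not handled.

```python
import re
from typing import Optional, Sequence

# Integers or decimals, optionally with comma thousands separators,
# e.g. "18", "-3", "1,234", "2.5". A sentence-final "." is not consumed.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_last_number(text: str) -> Optional[float]:
    """Return the last number in a model output, or None if it contains no digits.

    Assumption: the final answer is the last number written.
    """
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    # Drop thousands separators before converting (ambiguous for decimal commas).
    return float(matches[-1].replace(",", ""))


def accuracy(outputs: Sequence[str], gold: Sequence[int]) -> float:
    """Fraction of outputs whose last number equals the gold answer_number."""
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold must have the same length")
    if not gold:
        return 0.0
    correct = 0
    for output, answer in zip(outputs, gold):
        predicted = extract_last_number(output)
        if predicted is not None and predicted == answer:
            correct += 1
    return correct / len(gold)


outputs = [
    "She has 16 - 3 - 4 = 9 eggs left, and 9 * 2 = 18. The answer is 18.",
    "The answer is 17.",
]
print(accuracy(outputs, [18, 18]))  # prints 0.5
```

In practice the gold values would come from the `answer_number` column of a test config, for example via Hugging Face `datasets.load_dataset("juletxara/mgsm", "en", split="test")`, assuming that loading path works with your installed `datasets` version.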

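Background and Challenges

Background Overview

MGSM (Multilingual Grade School Math Benchmark) is a multilingual benchmark of grade-school math problems, introduced by Shi et al. (2022) in "Language Models are Multilingual Chain-of-Thought Reasoners". It builds on GSM8K, which was created by OpenAI (Cobbe et al., 2021). Its central research question is how to evaluate, and ultimately improve, the mathematical reasoning of language models across languages. MGSM extends 250 GSM8K problems to 10 languages through human translation: Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, and Telugu. It targets math problems that require multi-step reasoning and gives multilingual NLP research a shared benchmark, which is relevant to educational and cross-lingual applications.

Current Challenges

Building MGSM involves several challenges. First, translations must be accurate and consistent: languages differ considerably in grammar and expression, and each translated problem must preserve the original's logic and numbers. Second, annotation quality matters, especially for multi-step solutions, where every step must be logically correct and complete. Third, coverage is limited: although MGSM spans 10 languages, whether results on it hold across more languages and cultural contexts remains an open question. Finally, usage and evaluation conventions still need to be specified more precisely so that results are reliable and comparable across settings.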

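Common Use Cases

Typical Use Case

The typical use of MGSM is solving grade-school math problems in multiple languages. Because it provides the same problems in every language, with worked solutions for the few-shot exemplars, it supports few-shot prompting and evaluation of models' mathematical reasoning in each language.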
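Few-shot use of MGSM pairs the 8 translated train exemplars with each test question. The sketch below shows one way to assemble such a chain-of-thought prompt, assuming the English formatting of the card's train example, where the question and answer strings already carry "Question:" and "Step-by-Step Answer:" prefixes. The function name `build_few_shot_prompt` and its prefix defaults are assumptions, not the paper's exact prompt template; for other languages the prefixes should be replaced with that language's wording.

```python
from typing import Mapping, Sequence


def build_few_shot_prompt(
    exemplars: Sequence[Mapping[str, str]],
    question: str,
    question_prefix: str = "Question: ",
    answer_prefix: str = "Step-by-Step Answer:",
) -> str:
    """Join worked exemplars, then the target question and an answer cue.

    Assumes each exemplar's "question" and "answer" strings already include
    their own prefixes, as in the English train example.
    """
    blocks = [f"{ex['question']}\n{ex['answer']}" for ex in exemplars]
    # Test-split questions have no prefix, so add one; the model continues after the cue.
    blocks.append(f"{question_prefix}{question}\n{answer_prefix}")
    return "\n\n".join(blocks)


# One exemplar from the card; a full prompt would use all 8 train rows of a config.
exemplar = {
    "question": "Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
                "Each can has 3 tennis balls. How many tennis balls does he have now?",
    "answer": "Step-by-Step Answer: Roger started with 5 balls. 2 cans of 3 tennis balls "
              "each is 6 tennis balls. 5 + 6 = 11. The answer is 11.",
}
prompt = build_few_shot_prompt(
    [exemplar],
    "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and "
    "bakes muffins for her friends every day with four. She sells the remainder at the "
    "farmers' market daily for $2 per fresh duck egg. How much in dollars does she make "
    "every day at the farmers' market?",
)
print(prompt)
```

Ending the prompt with the answer cue invites the model to write its own step-by-step solution, whose last number can then be compared with `answer_number`.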

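Academic Problems Addressed

MGSM addresses the research problem of solving math problems in multilingual settings. By providing the same math problems across languages, it helps researchers evaluate and improve models' mathematical reasoning in each language, advancing multilingual natural language processing.

Derived Work

Work derived from MGSM centers on developing and evaluating multilingual math problem-solving models; studies based on the dataset have proposed such models and tested them across the benchmark's languages.

The above content was collected and summarized by 遇见数据集.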
