pharaouk/math-orca-arch

Source: Hugging Face. Updated 2024-03-04; indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/pharaouk/math-orca-arch
Overview:
This dataset contains approximately 200,000 grade school math word problems. All answers were generated with Azure GPT-4 Turbo. The dataset is intended to strengthen the mathematical abilities of language models and is suited to math problem-solving tasks. It was curated by Microsoft, is in English, and is released under the MIT License. Each record has two fields, question and answer, and the dataset contains only a training split. It was created by using Azure GPT-4 Turbo to expand a set of problems and generate the corresponding answers.
Provided by:
pharaouk
Summary of Original Information

Dataset Overview

Dataset Description

  • Dataset name: orca-math-word-problems-200k
  • Creator: Microsoft
  • Language: English
  • License: MIT
  • Task category: question answering
  • Tags: math
  • Dataset size: 100K<n<1M

Dataset Structure

Data Instances

Each data instance contains a question and its corresponding answer. For example:

{"question": "In a highly contested election having multiple candidates, Mr. Jackson, one of the losing candidates, received 3,485,782 votes, which accounted for precisely 38.7 percent of all votes. To have achieved a victory, he would have needed to secure at least 51 percent of all votes. Approximately, what percent of the remaining unsecured votes would Mr. Jackson have needed to accumulate to reach this victory threshold?", "answer": "First, let's find out the total number of votes cast in the election. Since Mr. Jackson received 38.7% of all votes, and that amounted to 3,485,782 votes, we can set up the following equation to find the total number of votes (T):

0.387 * T = 3,485,782

Now, solve for T:

T = 3,485,782 / 0.387
T ≈ 9,000,467 votes (total number of votes cast)

To win, Mr. Jackson would have needed 51% of the total votes. Let's calculate that amount:

0.51 * T = 0.51 * 9,000,467
0.51 * T ≈ 4,590,238 votes needed to win

Now, let's find out how many more votes Mr. Jackson needed to reach this winning threshold:

Votes needed to win - Votes Mr. Jackson received = Additional votes needed
4,590,238 - 3,485,782 = 1,104,456 additional votes needed

Now, let's find out what percentage of the remaining unsecured votes this number represents. The remaining unsecured votes are the votes that were not for Mr. Jackson, which is 100% - 38.7% = 61.3% of the total votes.

61.3% of the total votes is the remaining unsecured votes:

0.613 * T = 0.613 * 9,000,467
0.613 * T ≈ 5,514,686 votes were unsecured

Now, we'll calculate the percentage of these unsecured votes that the additional votes needed represent:

(Additional votes needed / Unsecured votes) * 100 = Percentage of unsecured votes needed
(1,104,456 / 5,514,686) * 100 ≈ 20.03%

So, Mr. Jackson would have needed approximately 20.03% of the remaining unsecured votes to reach the victory threshold of 51%."}
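
As a quick sanity check on the sample above, the requested percentage can be computed directly from the two percentages, without estimating the total vote count. The following is a minimal sketch; the function name `share_of_remaining_needed` is hypothetical and is not part of the dataset or its tooling. Computed this way, the result is about 20.07%, slightly different from the 20.03% in the sample answer, whose intermediate vote totals are approximate. This suggests that generated answers may be worth verifying independently.

```python
def share_of_remaining_needed(current_pct: float, target_pct: float) -> float:
    """Percent of the not-yet-secured votes needed to reach target_pct.

    Hypothetical helper for illustration only; both arguments are
    percentages of the total vote.
    """
    if not 0 <= current_pct < 100:
        raise ValueError("current_pct must be in [0, 100)")
    additional = target_pct - current_pct  # extra share of all votes needed
    remaining = 100.0 - current_pct        # share of all votes not yet secured
    return additional / remaining * 100.0


# Values from the sample question: 38.7% received, 51% needed to win.
result = share_of_remaining_needed(38.7, 51.0)
print(f"{result:.2f}%")  # prints 20.07%
```

Working with the ratio (51 - 38.7) / (100 - 38.7) avoids the rounding that accumulates when the total number of votes is estimated first.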

Data Fields

  • question: a string containing the problem.
  • answer: a string containing the corresponding answer.
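
The two-field layout above can be expressed as a small typed record. This is only a sketch: the names `MathRecord` and `validate_record` are hypothetical, the example record is made up for illustration and is not taken from the dataset, and the requirement that both strings be non-empty is an assumption rather than something the card states.

```python
from typing import Any, Mapping, TypedDict


class MathRecord(TypedDict):
    """One row of the dataset: a word problem and its generated answer."""
    question: str
    answer: str


def validate_record(raw: Mapping[str, Any]) -> MathRecord:
    """Check that both fields are present and are non-empty strings.

    The non-empty check is an assumption made for this sketch.
    """
    for field in ("question", "answer"):
        value = raw.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"field {field!r} must be a non-empty string")
    return MathRecord(question=raw["question"], answer=raw["answer"])


# Illustrative record only; not an actual row from the dataset.
example = validate_record({"question": "What is 2 + 3?", "answer": "2 + 3 = 5"})
print(example["question"])
```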

Data Splits

The dataset consists of a single training split:

  • train: 200,035 rows

The DatasetDict structure of the dataset is as follows:

DatasetDict({
    train: Dataset({
        features: ['question', 'answer'],
        num_rows: 200035
    })
})
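
A structure like this can be inspected with the Hugging Face `datasets` library. The sketch below assumes that the repository id `pharaouk/math-orca-arch`, taken from the download link on this page, resolves on the Hub; the helper names `load_math_orca` and `split_sizes` are hypothetical. The `datasets` import sits inside the loader so that the size helper can be tried without the library or network access.

```python
from typing import Mapping, Sized


def load_math_orca(repo_id: str = "pharaouk/math-orca-arch"):
    """Download all splits from the Hugging Face Hub (needs network access).

    The default repo_id is an assumption based on the download link.
    """
    from datasets import load_dataset  # third-party: pip install datasets
    return load_dataset(repo_id)


def split_sizes(dataset_dict: Mapping[str, Sized]) -> dict[str, int]:
    """Map each split name to its number of rows."""
    return {name: len(split) for name, split in dataset_dict.items()}


# Offline stand-in with two placeholder rows, used only to show the call shape.
sizes = split_sizes({"train": [{"question": "q1", "answer": "a1"},
                               {"question": "q2", "answer": "a2"}]})
print(sizes)  # prints {'train': 2}
```

With network access, `split_sizes(load_math_orca())` would be expected to report a single `train` split; the card lists 200,035 rows for it.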

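Dataset Creation

Source Data

Data Collection and Processing

For details, see the paper Orca-Math: Unlocking the potential of SLMs in Grade School Math.

Source Data Producers

Microsoft

Annotation Process

Azure GPT-4 Turbo was used to expand the question set and generate the answers.

Personal and Sensitive Information

The dataset does not contain personal, sensitive, or private information.

Bias, Risks, and Limitations

The dataset contains only math word problems in English.

Citation

If your research uses this dataset, please cite it as follows: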

@misc{mitra2024orcamath,
  title={Orca-Math: Unlocking the potential of SLMs in Grade School Math},
  author={Arindam Mitra and Hamed Khanpour and Corby Rosset and Ahmed Awadallah},
  year={2024},
  eprint={2402.14830},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Dataset Card Contact

Arindam Mitra
