pharaouk/math-orca-arch
Dataset Overview
Dataset Description
- Dataset name: orca-math-word-problems-200k
- Creator: Microsoft
- Language: English
- License: MIT
- Task category: Question answering
- Tags: math
- Dataset size: 100K<n<1M
Dataset Structure
Data Instances
Each data instance consists of a question and its corresponding answer. For example:
{"question": "In a highly contested election having multiple candidates, Mr. Jackson, one of the losing candidates, received 3,485,782 votes, which accounted for precisely 38.7 percent of all votes. To have achieved a victory, he would have needed to secure at least 51 percent of all votes. Approximately, what percent of the remaining unsecured votes would Mr. Jackson have needed to accumulate to reach this victory threshold?", "answer": "First, let's find out the total number of votes cast in the election. Since Mr. Jackson received 38.7% of all votes, and that amounted to 3,485,782 votes, we can set up the following equation to find the total number of votes (T):
0.387 * T = 3,485,782
Now, solve for T:
T = 3,485,782 / 0.387
T ≈ 9,000,467 votes (total number of votes cast)
To win, Mr. Jackson would have needed 51% of the total votes. Let's calculate that amount:
0.51 * T = 0.51 * 9,000,467
0.51 * T ≈ 4,590,238 votes needed to win
Now, let's find out how many more votes Mr. Jackson needed to reach this winning threshold:
Votes needed to win - Votes Mr. Jackson received = Additional votes needed
4,590,238 - 3,485,782 = 1,104,456 additional votes needed
Now, let's find out what percentage of the remaining unsecured votes this number represents. The remaining unsecured votes are the votes that were not for Mr. Jackson, which is 100% - 38.7% = 61.3% of the total votes.
61.3% of the total votes is the remaining unsecured votes:
0.613 * T = 0.613 * 9,000,467
0.613 * T ≈ 5,514,686 votes were unsecured
Now, we'll calculate the percentage of these unsecured votes that the additional votes needed represent:
(Additional votes needed / Unsecured votes) * 100 = Percentage of unsecured votes needed
(1,104,456 / 5,514,686) * 100 ≈ 20.03%
So, Mr. Jackson would have needed approximately 20.03% of the remaining unsecured votes to reach the victory threshold of 51%."}
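The quantity asked for in this instance depends only on the two percentages, not on the vote total, so it can be cross-checked without the intermediate steps. A minimal sketch, assuming a hypothetical helper named `share_of_remaining_needed` that is not part of the dataset or its tooling:

```python
def share_of_remaining_needed(current_pct: float, target_pct: float) -> float:
    """Percent of the votes not yet held that must be won to reach target_pct.

    Hypothetical helper for illustration only. Both arguments are
    percentages of the total vote; the total itself cancels out.
    """
    if not 0 <= current_pct < 100:
        raise ValueError("current_pct must be in [0, 100)")
    additional = target_pct - current_pct   # extra share of the total needed
    remaining = 100.0 - current_pct         # share of the total not yet held
    return additional / remaining * 100.0


# Values taken from the example question: 38.7% held, 51% needed to win.
print(f"{share_of_remaining_needed(38.7, 51.0):.2f}%")
```

Comparing the printed value with the figure stated in the answer is a quick way to spot arithmetic drift in a generated solution.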
Data Fields
- question: a string containing the question.
- answer: a string containing the corresponding answer.
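The two-field schema can be written down as a typed record with a small validity check. This is only a sketch: the names `OrcaMathRecord` and `is_valid_record` are hypothetical, and the non-empty requirement is an assumption rather than something the dataset card states.

```python
from typing import Any, TypedDict


class OrcaMathRecord(TypedDict):
    """Shape of one data instance (hypothetical type name)."""
    question: str
    answer: str


def is_valid_record(record: Any) -> bool:
    """Return True if record has exactly the two string fields, both non-empty."""
    if not isinstance(record, dict):
        return False
    if set(record) != {"question", "answer"}:
        return False
    # Assumption: an empty or whitespace-only field is treated as invalid.
    return all(
        isinstance(record[key], str) and bool(record[key].strip())
        for key in ("question", "answer")
    )


sample: OrcaMathRecord = {"question": "What is 2 + 3?", "answer": "2 + 3 = 5"}
print(is_valid_record(sample))
```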
Data Splits
The dataset has a single training split:
train: 200,035 rows
The DatasetDict structure of the dataset is:
DatasetDict({
    train: Dataset({
        features: ['question', 'answer'],
        num_rows: 200035
    })
})
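A loaded copy can be checked against the structure above. The sketch below assumes the Hugging Face `datasets` library is installed and that the repository id from the title line (`pharaouk/math-orca-arch`) is loadable as-is. Both are assumptions, and the helper names are hypothetical.

```python
EXPECTED_COLUMNS = ["question", "answer"]
EXPECTED_TRAIN_ROWS = 200_035


def matches_card(columns, num_rows) -> bool:
    """Check column names and row count against the values stated in the card."""
    return list(columns) == EXPECTED_COLUMNS and num_rows == EXPECTED_TRAIN_ROWS


def load_train_split(repo_id: str = "pharaouk/math-orca-arch"):
    """Download the train split (needs network access and the `datasets` package).

    The default repo_id is an assumption taken from the card title.
    """
    from datasets import load_dataset  # imported lazily; optional dependency
    return load_dataset(repo_id, split="train")


# Usage (not executed here because it downloads data):
#   train = load_train_split()
#   print(matches_card(train.column_names, train.num_rows))
```

Keeping the download in its own function means the check itself can be reused on any object that exposes column names and a row count.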
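Dataset Creation
Source Data
Data Collection and Processing
For details, see the paper "Orca-Math: Unlocking the potential of SLMs in Grade School Math".
Source Data Producers
Microsoft
Annotation Process
Azure GPT-4 Turbo was used to expand the question set and to generate the answers.
Personal and Sensitive Information
The dataset contains no personal, sensitive, or private information.
Bias, Risks, and Limitations
The dataset contains only math word problems written in English.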
Citation
If you use this dataset in your research, please cite it as follows:
@misc{mitra2024orcamath,
  title={Orca-Math: Unlocking the potential of SLMs in Grade School Math},
  author={Arindam Mitra and Hamed Khanpour and Corby Rosset and Ahmed Awadallah},
  year={2024},
  eprint={2402.14830},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}



