hails/agieval-sat-math
Source: Hugging Face. Updated 2024-01-26; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/hails/agieval-sat-math
Resource description:
---
dataset_info:
  features:
  - name: query
    dtype: string
  - name: choices
    sequence: string
  - name: gold
    sequence: int64
  splits:
  - name: test
    num_bytes: 110388
    num_examples: 220
  download_size: 57020
  dataset_size: 110388
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---
# Dataset Card for "agieval-sat-math"
Dataset taken from https://github.com/microsoft/AGIEval and processed as in that repo, following dmayhem93/agieval-* datasets on the HF hub.
This dataset contains the contents of the SAT-Math subtask of AGIEval, as accessed in https://github.com/ruixiangcui/AGIEval/commit/5c77d073fda993f1652eaae3cf5d04cc5fd21d40 .
Citation:
```
@misc{zhong2023agieval,
title={AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models},
author={Wanjun Zhong and Ruixiang Cui and Yiduo Guo and Yaobo Liang and Shuai Lu and Yanlin Wang and Amin Saied and Weizhu Chen and Nan Duan},
year={2023},
eprint={2304.06364},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
Please make sure to cite all the individual datasets in your paper when you use them. We provide the relevant citation information below:
```
@inproceedings{ling-etal-2017-program,
title = "Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems",
author = "Ling, Wang and
Yogatama, Dani and
Dyer, Chris and
Blunsom, Phil",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2017",
address = "Vancouver, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/P17-1015",
doi = "10.18653/v1/P17-1015",
pages = "158--167",
abstract = "Solving algebraic word problems requires executing a series of arithmetic operations{---}a program{---}to obtain a final answer. However, since programs can be arbitrarily complicated, inducing them directly from question-answer pairs is a formidable challenge. To make this task more feasible, we solve these problems by generating answer rationales, sequences of natural language and human-readable mathematical expressions that derive the final answer through a series of small steps. Although rationales do not explicitly specify programs, they provide a scaffolding for their structure via intermediate milestones. To evaluate our approach, we have created a new 100,000-sample dataset of questions, answers and rationales. Experimental results show that indirect supervision of program learning via answer rationales is a promising strategy for inducing arithmetic programs.",
}
@inproceedings{hendrycksmath2021,
title={Measuring Mathematical Problem Solving With the MATH Dataset},
author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},
journal={NeurIPS},
year={2021}
}
@inproceedings{Liu2020LogiQAAC,
title={LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning},
author={Jian Liu and Leyang Cui and Hanmeng Liu and Dandan Huang and Yile Wang and Yue Zhang},
booktitle={International Joint Conference on Artificial Intelligence},
year={2020}
}
@inproceedings{zhong2019jec,
title={JEC-QA: A Legal-Domain Question Answering Dataset},
author={Zhong, Haoxi and Xiao, Chaojun and Tu, Cunchao and Zhang, Tianyang and Liu, Zhiyuan and Sun, Maosong},
booktitle={Proceedings of AAAI},
year={2020},
}
@article{Wang2021FromLT,
title={From LSAT: The Progress and Challenges of Complex Reasoning},
author={Siyuan Wang and Zhongkun Liu and Wanjun Zhong and Ming Zhou and Zhongyu Wei and Zhumin Chen and Nan Duan},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
year={2021},
volume={30},
pages={2201-2216}
}
```
Provider:
hails
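Original information summary
Dataset overview
Dataset information
- Features:
  - query: string
  - choices: sequence of strings
  - gold: sequence of integers
- Splits:
  - test: 220 examples, 110388 bytes in total
- Download size: 57020 bytes
- Dataset size: 110388 bytes
Configuration
- Config name: default
- Data files:
  - test: path data/test-*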
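Collected summary
Dataset introduction

Construction
The hails/agieval-sat-math dataset is built from the SAT-Math subtask of AGIEval, taken from https://github.com/microsoft/AGIEval and processed as in that repository. Each record has three fields: query (a string), choices (a sequence of strings), and gold (a sequence of integers holding the correct answer). After processing, the dataset consists of a single test split with 220 examples and a size of 110388 bytes.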
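The three-field schema described above can be checked with a small validator. This is a minimal sketch, not part of the dataset's tooling: the function name `is_valid_record` and the sample record are made up for illustration, and reading `gold` as zero-based indices into `choices` is an assumption that the card does not state explicitly.

```python
from typing import Any, Mapping


def is_valid_record(record: Mapping[str, Any]) -> bool:
    """Check one record against the schema listed in the card:
    query (string), choices (sequence of strings), gold (sequence of integers)."""
    query = record.get("query")
    choices = record.get("choices")
    gold = record.get("gold")
    if not isinstance(query, str):
        return False
    if not isinstance(choices, list) or not all(isinstance(c, str) for c in choices):
        return False
    if not isinstance(gold, list):
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(isinstance(g, int) and not isinstance(g, bool) for g in gold):
        return False
    # Assumption: each gold value is a zero-based index into choices.
    return all(0 <= g < len(choices) for g in gold)


# Made-up record for illustration only; it is not taken from the dataset.
example = {
    "query": "If 2x = 6, what is the value of x?",
    "choices": ["1", "2", "3", "4"],
    "gold": [2],
}
print(is_valid_record(example))
```

If the index interpretation of `gold` turns out to be wrong for this dataset, only the last line of the function needs to change.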
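Characteristics
hails/agieval-sat-math is a multiple-choice benchmark of SAT math questions. Each record provides the question, its answer choices, and the gold answer; the schema listed in the card has no field for step-by-step rationales. The dataset is used to evaluate how well foundation models solve math problems. Its construction follows the processing of the original AGIEval repository, which keeps it consistent with the other agieval-* datasets.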
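Usage
To use hails/agieval-sat-math, first download it with the Hugging Face library, then load the test split, the only split provided, to evaluate a model. The dataset is suited to natural language processing and machine learning research, particularly on mathematical problem solving. When citing the dataset, follow the citation format given in the dataset card.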
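The download-then-evaluate workflow described above might look like the following sketch. It assumes the Hugging Face `datasets` package is installed and that `gold` holds zero-based indices into `choices`; the helper names `load_test_split` and `accuracy`, the `predict` callback, and the sample records are hypothetical.

```python
from typing import Any, Callable, Iterable, Mapping, Sequence


def load_test_split():
    """Download and return the test split (the only split in the card)."""
    # Imported lazily so the scoring helper below works without the library.
    from datasets import load_dataset

    return load_dataset("hails/agieval-sat-math", split="test")


def accuracy(
    records: Iterable[Mapping[str, Any]],
    predict: Callable[[str, Sequence[str]], int],
) -> float:
    """Fraction of records for which the predicted choice index is in gold."""
    total = 0
    correct = 0
    for record in records:
        total += 1
        # Assumption: gold lists the zero-based indices of the correct choices.
        if predict(record["query"], record["choices"]) in record["gold"]:
            correct += 1
    return correct / total if total else 0.0


# Made-up records for illustration only; they are not taken from the dataset.
sample = [
    {"query": "q1", "choices": ["a", "b"], "gold": [0]},
    {"query": "q2", "choices": ["a", "b"], "gold": [1]},
]


def always_first(query: str, choices: Sequence[str]) -> int:
    """Trivial baseline that always picks the first choice."""
    return 0


print(accuracy(sample, always_first))  # 0.5
```

To score a real model, replace `always_first` with a function that calls the model and pass `load_test_split()` as `records`.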
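Background and challenges
Background overview
AGIEval was proposed by Wanjun Zhong and colleagues in 2023 to meet the need for human-centric evaluation of foundation models. This dataset packages the SAT-Math subtask of AGIEval as a standardized benchmark for mathematical problem solving. Its core research question is how to measure a model's performance on math problems accurately, a question relevant to natural language processing work on math question answering.
Current challenges
Building the dataset raised several challenges. The first is ensuring that the math questions and their answers are accurate and well-formed. The second is designing an evaluation mechanism suited to problems whose solutions usually require multi-step logical reasoning. The third arises in practice: users must work out how to apply the dataset effectively, and because it ships only a test split of 220 examples, it fits evaluation better than training.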
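Common scenarios
Classic usage scenario
At the intersection of natural language processing and mathematics education, hails/agieval-sat-math is used to evaluate how well foundation models solve math problems. It is particularly suited to testing systems that must understand natural language and mathematical expressions in order to answer SAT math questions.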
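One way to run such an evaluation is to show the model the question with lettered options and map its reply back to a choice index. The sketch below is an illustrative assumption, not the method used by AGIEval: the prompt wording, the helper names `build_prompt` and `parse_choice`, and the crude first-capital-letter parsing rule are all hypothetical. If the choice strings already carry their own labels, the added letters should be skipped.

```python
import re
from typing import Optional, Sequence

LETTERS = "ABCDEFGHIJ"


def build_prompt(query: str, choices: Sequence[str]) -> str:
    """Render a question and its choices as a lettered multiple-choice prompt."""
    lines = [query]
    for letter, choice in zip(LETTERS, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def parse_choice(reply: str, num_choices: int) -> Optional[int]:
    """Return the index of the first standalone capital option letter in reply.

    This is a crude heuristic: a reply such as "A possible answer is C"
    would be read as A. Returns None when no valid letter is found.
    """
    if num_choices <= 0:
        return None
    valid = LETTERS[:num_choices]
    match = re.search(rf"\b([{valid}])\b", reply)
    return LETTERS.index(match.group(1)) if match else None


print(build_prompt("If 2x = 6, what is x?", ["1", "2", "3", "4"]))
print(parse_choice("The answer is C.", 4))  # 2
```

The parsed index can then be compared with the record's `gold` field, under the assumption that `gold` stores zero-based choice indices.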
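Academic problems addressed
The dataset addresses the problem of how to build and evaluate models that emulate the human process of solving math problems. It gives researchers a standardized test set, so that different models can be compared under the same evaluation criteria.
Derived related work
Work building on this dataset includes improvements to foundation models, studies of math problem-solving strategies, and explorations of educational technology that combine natural language processing with symbolic reasoning. Such work supports the development of intelligent educational software.
The content above was collected and summarized by 遇见数据集.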



