
DocMath-Eval

ModelScope Community. Updated: 2026-05-02. Indexed: 2025-02-01.
Download link:
https://modelscope.cn/datasets/yale-nlp/DocMath-Eval
Description:
## DocMath-Eval

[**🌐 Homepage**](https://docmath-eval.github.io/) | [**🤗 Dataset**](https://huggingface.co/datasets/yale-nlp/DocMath-Eval) | [**📖 arXiv**](https://arxiv.org/abs/2311.09805) | [**GitHub**](https://github.com/yale-nlp/DocMath-Eval)

The data for the paper [DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents](https://arxiv.org/abs/2311.09805).

**DocMath-Eval** is a comprehensive benchmark focused on numerical reasoning within specialized domains. It requires the model to comprehend long and specialized documents and perform numerical reasoning to answer the given question.

<p align="center">
<img src="figures/overview.png" width="100%">
</p>

## DocMath-Eval Dataset

All data examples are divided into four subsets:

- **simpshort**, which is reannotated from [TAT-QA](https://aclanthology.org/2021.acl-long.254/) and [FinQA](https://aclanthology.org/2021.emnlp-main.300/), requires simple numerical reasoning over a short document with one table;
- **simplong**, which is reannotated from [MultiHiertt](https://aclanthology.org/2022.acl-long.454/), requires simple numerical reasoning over a long document with multiple tables;
- **compshort**, which is reannotated from [TAT-HQA](https://aclanthology.org/2022.acl-long.5/), requires complex numerical reasoning over a short document with one table;
- **complong**, which is annotated from scratch by our team, requires complex numerical reasoning over a long document with multiple tables.

For each subset, we provide the *testmini* and *test* splits.

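The four subsets and two splits combine into eight named partitions. The sketch below enumerates them, assuming every partition follows the `<subset>-<split>` key pattern. Only `complong-testmini` actually appears in this README, so the other seven keys are an extrapolation, and the helper names `split_keys` and `labeled_keys` are hypothetical.

```python
from itertools import product

# Subset and split names as listed in this README.
SUBSETS = ("simpshort", "simplong", "compshort", "complong")
SPLITS = ("testmini", "test")


def split_keys():
    """Build every '<subset>-<split>' key (assumed naming pattern)."""
    return [f"{subset}-{split}" for subset, split in product(SUBSETS, SPLITS)]


def labeled_keys():
    """Keys of the testmini partitions, where solutions are not hidden."""
    return [key for key in split_keys() if key.endswith("-testmini")]


if __name__ == "__main__":
    for key in split_keys():
        print(key)
```

Since `python_solution` and `ground_truth` are hidden for the test set, `labeled_keys()` picks out the partitions that are suitable for local scoring.
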
You can download the dataset with the following code:

```python
from datasets import load_dataset

dataset = load_dataset("yale-nlp/DocMath-Eval")

# print the first example on the complong testmini set
print(dataset["complong-testmini"][0])
```

The dataset is provided in JSON format and contains the following attributes:

```
{
    "question_id": [string] The question id
    "source": [string] The original source of the example (for simpshort, simplong, and compshort sets)
    "original_question_id": [string] The original question id (for simpshort, simplong, and compshort sets)
    "question": [string] The question text
    "paragraphs": [list] List of paragraphs and tables within the document
    "table_evidence": [list] List of indices in 'paragraphs' that are used as table evidence for the question
    "paragraph_evidence": [list] List of indices in 'paragraphs' that are used as text evidence for the question
    "python_solution": [string] Python-format and executable solution. This feature is hidden for the test set
    "ground_truth": [float] Executed result of 'python_solution'. This feature is hidden for the test set
}
```

## Contact

For any issues or questions, please email Yilun Zhao (yilun.zhao@yale.edu).

## Citation

If you use the **DocMath-Eval** benchmark in your work, please cite the paper:

```
@misc{zhao2024docmatheval,
      title={DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents},
      author={Yilun Zhao and Yitao Long and Hongjun Liu and Ryo Kamoi and Linyong Nan and Lyuhao Chen and Yixin Liu and Xiangru Tang and Rui Zhang and Arman Cohan},
      year={2024},
      eprint={2311.09805},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2311.09805},
}
```
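
As a supplement to the attribute list in the DocMath-Eval Dataset section, here is a minimal sketch of how the evidence indices and `ground_truth` might be used. The record is a made-up toy example that only mirrors the documented schema. The helper names `gather_evidence` and `is_correct` are hypothetical, and the 1% relative tolerance is an assumption rather than the benchmark's official scoring rule.

```python
import math

# Toy record mirroring the documented attributes; the values are invented
# for illustration and do not come from the dataset.
example = {
    "question_id": "toy-0",
    "question": "What is the total revenue across 2022 and 2023?",
    "paragraphs": [
        "The company operates in two segments.",
        "| year | revenue |\n| 2022 | 10.0 |\n| 2023 | 12.5 |",
        "Revenue is reported in millions of dollars.",
    ],
    "table_evidence": [1],
    "paragraph_evidence": [2],
    "ground_truth": 22.5,
}


def gather_evidence(record):
    """Return the entries of 'paragraphs' referenced by either evidence list."""
    indices = sorted(set(record["table_evidence"]) | set(record["paragraph_evidence"]))
    return [record["paragraphs"][i] for i in indices]


def is_correct(prediction, ground_truth, rel_tol=1e-2):
    """Compare floats with a relative tolerance (assumed, not the official metric)."""
    return math.isclose(prediction, ground_truth, rel_tol=rel_tol, abs_tol=1e-9)


if __name__ == "__main__":
    for passage in gather_evidence(example):
        print(passage)
    print(is_correct(10.0 + 12.5, example["ground_truth"]))
```

Comparing with a tolerance rather than `==` avoids penalizing predictions that differ from the float `ground_truth` only by rounding.
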

Provider:
maas
Created:
2025-01-29