DocMath-Eval

Name: DocMath-Eval
Creator: maas
Published: 2026-05-02 09:09:42
License: No description provided

ModelScope community: updated 2026-05-02, added 2025-02-01

Download link:

https://modelscope.cn/datasets/yale-nlp/DocMath-Eval

Resource description:

## DocMath-Eval

[**🌐 Homepage**](https://docmath-eval.github.io/) | [**🤗 Dataset**](https://huggingface.co/datasets/yale-nlp/DocMath-Eval) | [**📖 arXiv**](https://arxiv.org/abs/2311.09805) | [**GitHub**](https://github.com/yale-nlp/DocMath-Eval)

This is the data for the paper [DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents](https://arxiv.org/abs/2311.09805).

**DocMath-Eval** is a comprehensive benchmark focused on numerical reasoning within specialized domains. It requires a model to comprehend long, specialized documents and perform numerical reasoning to answer the given question.

<p align="center">
<img src="figures/overview.png" width="100%">
</p>

## DocMath-Eval Dataset

All data examples are divided into four subsets:

- **simpshort**, reannotated from [TAT-QA](https://aclanthology.org/2021.acl-long.254/) and [FinQA](https://aclanthology.org/2021.emnlp-main.300/), requires simple numerical reasoning over a short document with one table;
- **simplong**, reannotated from [MultiHiertt](https://aclanthology.org/2022.acl-long.454/), requires simple numerical reasoning over a long document with multiple tables;
- **compshort**, reannotated from [TAT-HQA](https://aclanthology.org/2022.acl-long.5/), requires complex numerical reasoning over a short document with one table;
- **complong**, annotated from scratch by our team, requires complex numerical reasoning over a long document with multiple tables.

For each subset, we provide *testmini* and *test* splits.
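The four subsets differ along two axes: reasoning complexity (simple vs. complex) and document length (short with one table vs. long with multiple tables). The sketch below encodes that taxonomy so you can pick subsets by property. The dictionary layout and the helper names `subsets_where` and `split_key` are hypothetical. The `<subset>-<split>` key format is copied from the `complong-testmini` key in the loading example and is an assumption for the other seven combinations. If your copy of the dataset names its splits differently, adjust `split_key`.

```python
# Illustrative summary of the four DocMath-Eval subsets, transcribed from the
# list above. The layout and helper names are hypothetical, not part of the dataset.
SUBSETS = {
    "simpshort": {"sources": ("TAT-QA", "FinQA"), "reasoning": "simple",
                  "document": "short", "tables": "one"},
    "simplong": {"sources": ("MultiHiertt",), "reasoning": "simple",
                 "document": "long", "tables": "multiple"},
    "compshort": {"sources": ("TAT-HQA",), "reasoning": "complex",
                  "document": "short", "tables": "one"},
    # complong has no source dataset: it was annotated from scratch.
    "complong": {"sources": (), "reasoning": "complex",
                 "document": "long", "tables": "multiple"},
}
SPLITS = ("testmini", "test")


def subsets_where(**criteria: str) -> list[str]:
    """Return subset names whose properties match every keyword criterion."""
    return [
        name
        for name, props in SUBSETS.items()
        if all(props.get(key) == value for key, value in criteria.items())
    ]


def split_key(subset: str, split: str) -> str:
    """Build a key in the `<subset>-<split>` style of the README's loading example."""
    if subset not in SUBSETS or split not in SPLITS:
        raise ValueError(f"unknown subset/split: {subset!r}/{split!r}")
    return f"{subset}-{split}"


print(subsets_where(document="long"))  # ['simplong', 'complong']
print([split_key(s, "testmini") for s in subsets_where(reasoning="complex")])
# ['compshort-testmini', 'complong-testmini']
```

With the dataset loaded as in the download example, you could then index `dataset[split_key(name, "testmini")]` for each selected subset, provided the loaded object exposes keys in that form.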
You can download this dataset with the following command:

```python
from datasets import load_dataset

dataset = load_dataset("yale-nlp/DocMath-Eval")

# print the first example in the complong testmini set
print(dataset["complong-testmini"][0])
```

The dataset is provided in JSON format and contains the following attributes:

```
{
    "question_id": [string] The question id
    "source": [string] The original source of the example (simpshort, simplong, and compshort subsets only)
    "original_question_id": [string] The original question id (simpshort, simplong, and compshort subsets only)
    "question": [string] The question text
    "paragraphs": [list] List of paragraphs and tables within the document
    "table_evidence": [list] List of indices in 'paragraphs' that are used as table evidence for the question
    "paragraph_evidence": [list] List of indices in 'paragraphs' that are used as text evidence for the question
    "python_solution": [string] Executable Python solution. Hidden in the test set
    "ground_truth": [float] Result of executing 'python_solution'. Hidden in the test set
}
```

## Contact

For any issues or questions, please email Yilun Zhao (yilun.zhao@yale.edu).

## Citation

If you use the **DocMath-Eval** benchmark in your work, please cite the paper:

```
@misc{zhao2024docmatheval,
      title={DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents},
      author={Yilun Zhao and Yitao Long and Hongjun Liu and Ryo Kamoi and Linyong Nan and Lyuhao Chen and Yixin Liu and Xiangru Tang and Rui Zhang and Arman Cohan},
      year={2024},
      eprint={2311.09805},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2311.09805},
}
```
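The example attributes described above fit together as follows: the `table_evidence` and `paragraph_evidence` lists are indices into `paragraphs`, and `ground_truth` is the executed result of `python_solution`. Both of those fields are hidden in the test set, so the sketch below only applies to *testmini* records. It is a minimal sketch built on stated assumptions, not the benchmark's evaluation code:

- `python_solution` is assumed to define a zero-argument function named `solution` that returns the answer. The entry-point name is a parameter because it is a guess.
- The relative tolerance in `matches` is a hypothetical stand-in for whatever answer matching the official scorer uses.
- The `toy` record is invented for illustration and does not come from the dataset.

```python
import math
from typing import Any


def resolve_evidence(example: dict[str, Any]) -> dict[str, list[Any]]:
    """Return the `paragraphs` entries referenced by the two evidence index lists."""
    paragraphs = example["paragraphs"]
    return {
        "tables": [paragraphs[i] for i in example.get("table_evidence", [])],
        "text": [paragraphs[i] for i in example.get("paragraph_evidence", [])],
    }


def run_solution(code: str, entry_point: str = "solution") -> float:
    """Execute a `python_solution` string and return its numeric result.

    ASSUMPTION: the code defines a zero-argument function called `entry_point`
    that returns the answer. exec() runs arbitrary code, so use trusted input only.
    """
    namespace: dict[str, Any] = {}
    exec(code, namespace)
    return float(namespace[entry_point]())


def matches(prediction: float, ground_truth: float, rel_tol: float = 1e-2) -> bool:
    """Hypothetical relative-tolerance check, not the benchmark's official scorer."""
    return math.isclose(prediction, ground_truth, rel_tol=rel_tol)


# Invented toy record (not taken from the dataset) using the documented fields.
toy = {
    "question_id": "toy-0",
    "question": "By how much did revenue grow from 2019 to 2020?",
    "paragraphs": [
        "Revenue grew on strong demand.",
        "| Year | Revenue |\n| 2019 | 100 |\n| 2020 | 125 |",
    ],
    "table_evidence": [1],
    "paragraph_evidence": [0],
    "python_solution": (
        "def solution():\n"
        "    revenue_2019 = 100\n"
        "    revenue_2020 = 125\n"
        "    return revenue_2020 - revenue_2019\n"
    ),
    "ground_truth": 25.0,
}

evidence = resolve_evidence(toy)
answer = run_solution(toy["python_solution"])
print(evidence["tables"][0].splitlines()[0])  # | Year | Revenue |
print(answer, matches(answer, toy["ground_truth"]))  # 25.0 True
```

A tolerance-based comparison is used here because a float computed by a model may differ slightly from `ground_truth` through rounding. The official repository may normalize answers differently, for example for percentages.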

Provider:

maas

Created:

2025-01-29
