DocMath-Eval
ModelScope Community | Updated 2026-05-02 | Indexed 2025-02-01
Download link:
https://modelscope.cn/datasets/yale-nlp/DocMath-Eval
Description:
# DocMath-Eval
[**🌐 Homepage**](https://docmath-eval.github.io/) | [**🤗 Dataset**](https://huggingface.co/datasets/yale-nlp/DocMath-Eval) | [**📖 arXiv**](https://arxiv.org/abs/2311.09805) | [**GitHub**](https://github.com/yale-nlp/DocMath-Eval)
This dataset accompanies the paper [DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents](https://arxiv.org/abs/2311.09805).
**DocMath-Eval** is a comprehensive benchmark focused on numerical reasoning within specialized domains. It requires models to comprehend long, specialized documents and perform numerical reasoning over them to answer a given question.
<p align="center">
<img src="figures/overview.png" width="100%">
</p>
## DocMath-Eval Dataset
The examples are divided into four subsets:
- **simpshort**, reannotated from [TAT-QA](https://aclanthology.org/2021.acl-long.254/) and [FinQA](https://aclanthology.org/2021.emnlp-main.300/), requires simple numerical reasoning over a short document containing a single table;
- **simplong**, reannotated from [MultiHiertt](https://aclanthology.org/2022.acl-long.454/), requires simple numerical reasoning over a long document containing multiple tables;
- **compshort**, reannotated from [TAT-HQA](https://aclanthology.org/2022.acl-long.5/), requires complex numerical reasoning over a short document containing a single table;
- **complong**, annotated from scratch by our team, requires complex numerical reasoning over a long document containing multiple tables.
For each subset, we provide the *testmini* and *test* splits.
You can load the dataset with the following code:
```python
from datasets import load_dataset
dataset = load_dataset("yale-nlp/DocMath-Eval")
# print the first example on the complong testmini set
print(dataset["complong-testmini"][0])
```
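The split key in the example above joins a subset name and a split name with a hyphen (`complong-testmini`). Assuming the other subsets follow the same `<subset>-<split>` pattern (only `complong-testmini` is shown here, so check `dataset.keys()` after loading), a minimal sketch for collecting every *testmini* split might look like this. The helper names `split_names` and `load_testmini` are hypothetical.

```python
# Assumption: split keys follow the "<subset>-<split>" pattern seen in
# dataset["complong-testmini"]; verify against dataset.keys() after loading.
SUBSETS = ("simpshort", "simplong", "compshort", "complong")
SPLITS = ("testmini", "test")


def split_names(split: str = "testmini") -> list[str]:
    """Build the assumed split keys, e.g. 'simpshort-testmini'."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return [f"{subset}-{split}" for subset in SUBSETS]


def load_testmini() -> dict:
    """Load the dataset and keep only the testmini splits that actually exist."""
    from datasets import load_dataset  # imported lazily; needs the `datasets` package

    dataset = load_dataset("yale-nlp/DocMath-Eval")
    # DatasetDict is a dict subclass, so membership tests skip any missing keys.
    return {name: dataset[name] for name in split_names("testmini") if name in dataset}
```

Calling `load_testmini()` downloads the data and returns a dictionary keyed by split name. Filtering with `if name in dataset` keeps the sketch from failing if the actual key naming differs from the assumed pattern.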
The dataset is provided in JSON format and contains the following attributes:
```
{
"question_id": [string] The question id
"source": [string] The original source of the example (only for the simpshort, simplong, and compshort subsets)
"original_question_id": [string] The original question id (only for the simpshort, simplong, and compshort subsets)
"question": [string] The question text
"paragraphs": [list] List of paragraphs and tables within the document
"table_evidence": [list] List of indices in 'paragraphs' that are used as table evidence for the question
"paragraph_evidence": [list] List of indices in 'paragraphs' that are used as text evidence for the question
"python_solution": [string] Executable Python solution. Hidden in the test split
"ground_truth": [float] Result of executing 'python_solution'. Hidden in the test split
}
```
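Because `table_evidence` and `paragraph_evidence` store indices into `paragraphs`, you can gather the supporting evidence for a question by indexing into that list. The sketch below also runs `python_solution` and compares its result with `ground_truth`, both of which are available only in the *testmini* split. It rests on several assumptions: the indices are integers, `python_solution` defines a function named `solution()` that returns the answer, the record is made up for illustration, and the 1% relative tolerance is our own choice rather than the official DocMath-Eval scoring. Because `exec` runs code without a sandbox, use it only on trusted input.

```python
import math


def gather_evidence(example: dict) -> dict:
    """Return the table and text evidence referenced by an example.

    Assumes the evidence fields hold integer indices into example["paragraphs"].
    """
    paragraphs = example["paragraphs"]
    return {
        "tables": [paragraphs[i] for i in example.get("table_evidence", [])],
        "text": [paragraphs[i] for i in example.get("paragraph_evidence", [])],
    }


def run_solution(code: str) -> float:
    """Execute solution code and call the solution() function it defines.

    Assumption: the code defines `solution()`; adjust if the real format differs.
    Warning: exec runs arbitrary code, so only use it on trusted input.
    """
    namespace: dict = {}
    exec(code, namespace)
    return float(namespace["solution"]())


def is_correct(prediction: float, truth: float, rel_tol: float = 0.01) -> bool:
    """Hypothetical relative-tolerance check; not the official evaluation metric."""
    return math.isclose(prediction, truth, rel_tol=rel_tol)


# A made-up record with the same fields as the schema above.
example = {
    "question": "What is the total revenue across both years?",
    "paragraphs": [
        "Revenue grew steadily.",
        "| Year | Revenue |\n| 2022 | 120.0 |\n| 2023 | 150.0 |",
    ],
    "table_evidence": [1],
    "paragraph_evidence": [0],
    "python_solution": (
        "def solution():\n"
        "    revenue_2022 = 120.0\n"
        "    revenue_2023 = 150.0\n"
        "    return revenue_2022 + revenue_2023\n"
    ),
    "ground_truth": 270.0,
}

evidence = gather_evidence(example)
answer = run_solution(example["python_solution"])
print(answer, is_correct(answer, example["ground_truth"]))  # 270.0 True
```

Comparing with a tolerance rather than `==` leaves room for rounding differences in model outputs. Adapt the check to whatever answer-matching rules your evaluation requires.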
## Contact
For any issues or questions, please email Yilun Zhao (yilun.zhao@yale.edu).
## Citation
If you use the **DocMath-Eval** benchmark in your work, please cite the paper:
```
@misc{zhao2024docmatheval,
title={DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents},
author={Yilun Zhao and Yitao Long and Hongjun Liu and Ryo Kamoi and Linyong Nan and Lyuhao Chen and Yixin Liu and Xiangru Tang and Rui Zhang and Arman Cohan},
year={2024},
eprint={2311.09805},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2311.09805},
}
```
Provider:
maas
Created:
2025-01-29



