ENTRANT: A Large Financial Dataset for Table Understanding

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/10667087
Description:
Tabular data is a way to structure, organize and present information conveniently and effectively. Real-world tables present data in two dimensions by arranging cells in matrices that summarize information and facilitate side-by-side comparisons. Recent research efforts aim to train large models with machine learning methods to understand structured tables, a process that enables knowledge transfer to various downstream tasks. Model pre-training, though, requires large tabular datasets, formatted to reflect cell and table properties and characteristics. This paper presents a financial dataset, called ENTRANT, that comprises millions of tables. The tables are transformed to reflect cell attributes, as well as positional and hierarchical information. Hence, they facilitate, among other things, pre-training tasks for table understanding with deep learning methods. The dataset provides table and cell information along with the corresponding metadata in a machine-readable JSON format. Furthermore, we have automated all data processing and curation in a free and open-access project. Moreover, we have technically validated the dataset through unit tests with high code coverage. Finally, we demonstrate the use of the dataset in a pre-training task of a state-of-the-art model, which is also used for downstream cell classification.
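The description says each table is stored as machine-readable JSON carrying cell attributes, positional information and metadata. As a rough illustration of how such a record might be consumed, here is a minimal Python sketch. Every field name in it (`metadata`, `n_rows`, `n_cols`, `cells`, `row`, `col`, `text`, `is_header`) and the sample record itself are hypothetical; they are not taken from the published ENTRANT schema, which should be checked in the Zenodo record before writing a real loader.

```python
import json

# Hypothetical record: the field names below are illustrative assumptions,
# not the published ENTRANT schema.
SAMPLE = """
{
  "metadata": {"source": "example-filing", "n_rows": 2, "n_cols": 2},
  "cells": [
    {"row": 0, "col": 0, "text": "Item", "is_header": true},
    {"row": 0, "col": 1, "text": "2023", "is_header": true},
    {"row": 1, "col": 0, "text": "Revenue", "is_header": false},
    {"row": 1, "col": 1, "text": "1,250", "is_header": false}
  ]
}
"""


def to_grid(table):
    """Arrange cell texts into a row-major 2D list using their positions."""
    meta = table["metadata"]
    grid = [["" for _ in range(meta["n_cols"])] for _ in range(meta["n_rows"])]
    for cell in table["cells"]:
        grid[cell["row"]][cell["col"]] = cell["text"]
    return grid


def header_texts(table):
    """Collect the text of every cell flagged as a header."""
    return [c["text"] for c in table["cells"] if c["is_header"]]


table = json.loads(SAMPLE)
grid = to_grid(table)
headers = header_texts(table)

# Print the reconstructed table one row per line.
for row in grid:
    print(" | ".join(row))
```

Keeping position and attributes on each cell, rather than storing only a flat matrix of strings, is what would let a pre-training pipeline recover both the two-dimensional layout and per-cell labels from the same record.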
Created:
2024-08-13