FATURA
Source: arXiv | Updated: 2023-11-20 | Indexed: 2024-06-21
Download link:
https://zenodo.org/record/8261508
Resource overview:
FATURA is a multi-layout invoice image dataset created by the Digital Research Center. It comprises 10,000 invoice images spread across 50 distinct layouts, and it is described as the largest publicly accessible invoice document image dataset at the time of its release. The dataset is designed to address the diversity and complexity of invoice analysis: layout information is extracted from real-world invoice templates, and the text content is randomly generated to increase variety. Its intended applications span finance, healthcare, legal, and administrative work, where it aims to improve document analysis and understanding models by providing high-quality training data.
Provided by:
Digital Research Center
Created:
2023-11-20



