KVP10k
Source: arXiv. Updated: 2024-05-01. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2405.00505v1
Resource description:
KVP10k is a dataset developed jointly by IBM Research Israel and IBM Research Zurich for key-value pair (KVP) extraction from business documents. It contains 10,707 richly annotated images and is designed to address the challenge of extracting key-value pairs that are not predefined. The data were gathered through extensive web crawling and from public document sources, giving the collection diverse, real-world origins. Intended applications include automated data entry, rapid information retrieval, and decision support, with the aim of improving enterprise operational efficiency and competitiveness.
Provided by:
IBM Research Israel, University of Haifa Campus, Mount Carmel, Haifa 3498825, Israel; IBM Research Zurich, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
Created:
2024-05-01



