Open-Tables, ICT-TD

Source: arXiv, 2023-11-09 · Updated: 2024-06-21
Download link:
http://ieee-dataport.org/documents/table-detection-dataset-visually-rich-documents
Resource overview:
The Open-Tables dataset was compiled by researchers at the School of Electrical Engineering and Computer Science, University of Ottawa, by merging and cleaning several public datasets with high-quality annotations, including ICDAR2013, ICDAR2017, ICDAR2019, Marmot, and TNCR. The goal is a larger dataset with consistent annotations, so that model performance can be evaluated more reliably. The ICT-TD dataset targets the ICT domain: 175,682 PDF documents were collected, and 5,000 of them were randomly sampled and manually annotated so that the dataset captures the complex table structures and content characteristic of this field. Both datasets are intended to address the shortcomings of existing datasets in complexity and domain applicability, in particular the need to evaluate models in cross-domain settings.
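The ICT-TD description mentions drawing 5,000 documents at random from 175,682 collected PDFs. The overview does not say how that draw was made, so the sketch below only illustrates one common way to do it: a reproducible uniform sample without replacement using Python's `random.Random.sample`. The helper name `sample_for_annotation`, the file names, and the seed are all hypothetical.

```python
import random


def sample_for_annotation(pdf_paths, k=5000, seed=0):
    """Draw k distinct documents uniformly at random (without replacement).

    Hypothetical helper: the seed and the use of random.Random.sample are
    illustrative assumptions, not the ICT-TD authors' documented procedure.
    """
    if k > len(pdf_paths):
        raise ValueError("k cannot exceed the number of available documents")
    rng = random.Random(seed)  # local RNG so the same seed gives the same draw
    return rng.sample(pdf_paths, k)


# Hypothetical corpus standing in for the 175,682 collected PDF files.
corpus = [f"doc_{i:06d}.pdf" for i in range(175_682)]
subset = sample_for_annotation(corpus, k=5000, seed=42)
print(len(subset), len(set(subset)))  # 5000 5000
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the sample reproducible without affecting any other code that relies on the global random state.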
Provided by:
School of Electrical Engineering and Computer Science, University of Ottawa
Created:
2023-05-04