
PubTables-1M: Detection Dataset

Source: datasetninja.com, indexed 2025-03-25
Download link:
https://datasetninja.com/pubtables-1m
Overview:
This is the detection portion of Microsoft's PubTables-1M dataset, which is designed to address limitations in table structure inference and extraction from unstructured documents. It comprises nearly one million tables extracted from scientific articles and supports multiple input modalities. It also includes detailed header and location information for table structures, which makes it usable with a range of modeling approaches. The dataset is reported to improve training performance and to provide a more reliable estimate of model performance during evaluation for table structure recognition.

Provider:
datasetninja.com