PubTables-1M: Detection Dataset
Source: datasetninja.com (indexed 2025-03-25)
Download link:
https://datasetninja.com/pubtables-1m
Description:
This is the detection part of Microsoft's PubTables-1M dataset, which is designed to address limitations in table structure inference and extraction from unstructured documents. It comprises nearly one million tables extracted from scientific articles and supports multiple input modalities. Crucially, it includes detailed header and location information for table structures, which makes it useful for a wide range of modeling approaches. Beyond quantifying improvements in training performance, the dataset provides a more reliable estimate of model performance when evaluating table structure recognition.
Provided by:
datasetninja.com



