ds4sd/DocLayNet-v1.2
Source: Hugging Face. Updated 2025-02-10; indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/ds4sd/DocLayNet-v1.2
Resource description:
DocLayNet v1.2 is a large human-annotated dataset for document layout segmentation. It provides bounding-box annotations for 11 distinct class labels on 80,863 unique pages drawn from six document categories: financial reports, scientific articles, laws and regulations, government tenders, manuals, and patents. The dataset is notable for its human annotation, layout variability, detailed label set, redundant annotations, and predefined dataset splits.
Provided by:
ds4sd
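
As a rough illustration of how the dataset described above might be used, the sketch below loads one split with the Hugging Face `datasets` library and tallies bounding-box annotations per class label. Only the repository ID `ds4sd/DocLayNet-v1.2` comes from this listing. The split name `validation` and the column name `category_id` are assumptions made for illustration and should be checked against the dataset card. The toy records at the end are made-up stand-ins, not real dataset content.

```python
from collections import Counter

# Repository ID taken from the listing above.
DATASET_ID = "ds4sd/DocLayNet-v1.2"


def load_split(split="validation"):
    """Load one predefined split via the Hugging Face `datasets` library.

    Needs `pip install datasets` and network access. The split name is an
    assumption; check the dataset card for the names actually published.
    """
    from datasets import load_dataset  # imported lazily: optional dependency

    return load_dataset(DATASET_ID, split=split)


def count_labels(pages, label_field="category_id"):
    """Tally bounding-box annotations per class label.

    `label_field` is a hypothetical column name assumed to hold one label
    per bounding box on the page; adjust it to the real schema.
    """
    counts = Counter()
    for page in pages:
        counts.update(page[label_field])
    return counts


# Toy records standing in for real pages (not actual dataset content).
toy_pages = [
    {"category_id": [10, 10, 9]},
    {"category_id": [10, 7]},
]
print(count_labels(toy_pages))
```

With real data, `count_labels(load_split())` would give a quick view of how the 11 class labels are distributed within a split, provided the assumed column name matches the published schema.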



