adopd/adopd2024
Source: Hugging Face. Updated: 2025-07-13. Indexed: 2025-04-26.
Download link:
https://hf-mirror.com/datasets/adopd/adopd2024
Description:
ADOPD is a large-scale dataset for document image understanding. It contains 120,000 images covering English, Chinese, Japanese, and other languages. It introduces a novel data-driven framework for document taxonomy discovery that combines large-scale pretrained models with a human-in-the-loop refinement process. The dataset supports four core tasks: segmenting entity regions in documents, detecting and grouping OCR text blocks, predicting document-level semantic tags, and generating abstracted captions.
Provider:
adopd



