Column Region Detection Dataset
ModelScope Community, updated 2025-12-04, indexed 2024-11-16
Download link:
https://modelscope.cn/datasets/irhawks/column-det
Resource description:
The column region detection task takes the print block (type area) of a full-page document image as its input and analyzes how the content inside the print block is divided into columns. Newspapers, academic papers, magazines, and web pages frequently use multi-column layouts to present content compactly, and headings, illustrations, and captions may be interleaved with multi-column elements. Such layouts make better use of page space, but they also require more precise annotation criteria so that column boundaries are not ambiguously defined. In principle, column region detection can also be applied to headers, footers, and marginal notes.
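The page does not say how the column annotations were produced or which detection model is intended. Purely as a point of reference, a classic non-learned baseline for splitting a print block into columns is the vertical projection profile: count the ink pixels in each pixel column and treat sufficiently wide ink-free runs as gutters. The sketch below assumes a binarized print block given as a list of rows (1 = ink, 0 = background); the function name `split_columns` and the `min_gap` threshold are hypothetical and not part of the dataset.

```python
from typing import List, Tuple


def split_columns(binary: List[List[int]], min_gap: int = 3) -> List[Tuple[int, int]]:
    """Return [x_start, x_end) intervals of text columns in a binarized print block.

    Hypothetical baseline: binary[y][x] is 1 for ink and 0 for background.
    A gutter is any run of at least `min_gap` consecutive ink-free pixel columns;
    shorter empty runs (e.g. spaces between words) stay inside a column.
    """
    if not binary:
        return []
    width = len(binary[0])
    # Vertical projection profile: number of ink pixels in each pixel column.
    profile = [sum(row[x] for row in binary) for x in range(width)]

    columns: List[Tuple[int, int]] = []
    start = None  # x where the current column began, or None between columns
    gap = 0       # length of the current ink-free run inside a column
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Close the column at the first empty pixel column of the gutter.
                columns.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Trim any trailing empty pixel columns shorter than min_gap.
        columns.append((start, width - gap))
    return columns
```

For a two-column block such as `[[1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1]]`, the function reports columns `(0, 4)` and `(7, 11)`. A projection profile breaks down exactly where the description above anticipates difficulty: a heading or figure that spans several columns fills the gutter, which is one reason learned detectors and precise annotation rules are needed.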
Provider:
maas
Created:
2024-11-05
Dataset Introduction

Background and Challenges
Background Overview
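This dataset focuses on column detection within the main text region of document images: it analyzes multi-column layouts and identifies the individual single-column regions. The task usually runs in parallel with page structure analysis and helps determine the reading order of a document. It applies to multi-column document types such as newspapers and academic papers.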
The above content was collected and summarized by 遇见数据集.
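The background overview notes that detected column regions help determine reading order. The sketch below shows one simple way that could work. It is not taken from the dataset and it makes several assumptions: each region is an axis-aligned box `(x0, y0, x1, y1)` with y increasing downward; there is one box per column per horizontal section; and any box at least `span_ratio` times the print-block width (a hypothetical threshold) is treated as a full-width element, such as a heading or figure, that starts a new section. Within a section, columns are read left to right. The names `reading_order` and `span_ratio` are illustrative only.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward


def reading_order(boxes: List[Box], block_width: float, span_ratio: float = 0.6) -> List[Box]:
    """Order column regions for reading (hypothetical heuristic, Manhattan layouts only)."""
    threshold = span_ratio * block_width
    # Full-width elements split the page into horizontal sections, top to bottom.
    spanning = sorted((b for b in boxes if b[2] - b[0] >= threshold), key=lambda b: b[1])
    columns = [b for b in boxes if b[2] - b[0] < threshold]

    def section_index(box: Box) -> int:
        # Section i lies below the first i spanning elements.
        return sum(1 for s in spanning if s[1] <= box[1])

    ordered: List[Box] = []
    for i in range(len(spanning) + 1):
        if i > 0:
            ordered.append(spanning[i - 1])
        section = [b for b in columns if section_index(b) == i]
        # Assumes one box per column per section, so left-to-right order suffices.
        ordered.extend(sorted(section, key=lambda b: (b[0], b[1])))
    return ordered
```

Given a title, two columns, a full-width figure, and two more columns in shuffled order, the function returns them as title, left column, right column, figure, left column, right column. Real pages with nested or irregular layouts would need a more careful method, such as a recursive XY-cut or a learned reading-order model.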



