Formula Block Detection Dataset
ModelScope Community (魔搭社区). Updated 2025-11-27; indexed 2024-11-16.
Download link:
https://modelscope.cn/datasets/irhawks/math-block-det
Resource description:
The formula block detection task groups a mathematical formula together with its equation number. Depending on the document, the number may sit to the left or to the right of the formula, and some formulas are numbered with something other than Arabic numerals. The main purpose of formula block detection is to let later processing associate each formula with its number, which enables higher-quality reconstruction of the document content. A formula block boundary can be defined in two ways. The first is the physical boundary, which only needs to cover the region occupied by the formula and its number text. The second aligns the block with its column, so that the block is exactly as wide as the column that contains it. The current dataset and models use the second (column-aligned) definition.
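To make the two boundary definitions concrete, here is a minimal sketch in Python. It assumes boxes are axis-aligned page coordinates given as `(x0, y0)` top-left and `(x1, y1)` bottom-right, and that the formula box, the number box and the column box are already known. The `Box` type and the functions `physical_block` and `column_aligned_block` are hypothetical illustrations, not part of the dataset or any released tooling. The source only states that the column-aligned block matches the column width; taking its vertical extent from the physical box is an assumption here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box: (x0, y0) is top-left, (x1, y1) is bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def physical_block(formula: Box, number: Optional[Box]) -> Box:
    """Physical boundary: the tightest box covering the formula and its number.

    Works whether the number sits to the left or right of the formula;
    an unnumbered formula is its own block.
    """
    if number is None:
        return formula
    return Box(
        min(formula.x0, number.x0),
        min(formula.y0, number.y0),
        max(formula.x1, number.x1),
        max(formula.y1, number.y1),
    )


def column_aligned_block(formula: Box, number: Optional[Box], column: Box) -> Box:
    """Column-aligned boundary: spans the full column width.

    Assumption: the vertical extent is taken from the physical block.
    """
    tight = physical_block(formula, number)
    return Box(column.x0, tight.y0, column.x1, tight.y1)


if __name__ == "__main__":
    column = Box(50, 0, 300, 800)
    formula = Box(100, 200, 220, 240)
    number = Box(270, 210, 290, 230)  # e.g. "(3)" to the right of the formula
    print(physical_block(formula, number))        # Box(x0=100, y0=200, x1=290, y1=240)
    print(column_aligned_block(formula, number, column))  # Box(x0=50, y0=200, x1=300, y1=240)
```

One practical upside of the column-aligned form is that it does not depend on which side the number falls: the horizontal extent always comes from the column, so left-numbered and right-numbered formulas yield blocks of the same width.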
Provided by:
maas
Created:
2024-11-05
Compiled summary
Dataset introduction

Background and challenges
Background overview
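This dataset targets formula block detection in document images: locating the region that holds a mathematical formula together with its equation number, in support of high-quality document reconstruction. It uses the column-aligned boundary definition and is related to other document analysis tasks such as page structure analysis and layout structure analysis.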
The above summary was collected and generated by 遇见数据集.



