Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping - Validation split
Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-28.
Download link:
https://radar.kit.edu/radar/en/dataset/GPweoqzKRRZGOAEE
Description:
Numerous business workflows involve printed forms, such as invoices or receipts, which are often digitized manually so that the data can be stored and searched persistently. As hardware scanners are costly and inflexible, smartphones are increasingly used for digitization. Here, processing algorithms need to cope with prevailing environmental factors, such as shadows or crumples. Current state-of-the-art approaches learn supervised image dewarping models from pairs of raw images and rectification meshes. The available results show promising predictive accuracy for dewarping, but the remaining errors still lead to sub-optimal information retrieval. In this paper, we explore the potential of improving dewarping models using additional, structured information in the form of invoice templates. We provide two core contributions: (1) a novel dataset, referred to as Inv3D, comprising synthetic and real-world high-resolution invoice images with structural templates, rectification meshes, and multiple per-pixel supervision signals, and (2) a novel image dewarping algorithm, which extends the state-of-the-art approach GeoTr to leverage structural templates using attention. Our extensive evaluation includes an implementation of DewarpNet and shows that exploiting structured templates can improve the performance of image dewarping. We report superior performance for the proposed algorithm on our new benchmark for all metrics, including a 26.1% improvement in local distortion. We have made our new dataset and all code publicly available at https://felixhertlein.github.io/inv3d.
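As an illustration of how a rectification mesh can be used to flatten an image, the following minimal sketch resamples a tiny grayscale image through a backward map. It rests on assumptions that the description above does not confirm: the map is taken to store, for each output pixel, normalized `(y, x)` source coordinates in [0, 1], and the function name `unwarp` and the toy data are hypothetical. It is not the Inv3D file format or the authors' implementation.

```python
def unwarp(image, backward_map):
    """Resample `image` through a backward map (assumed convention).

    image: 2D list of gray values, all rows the same length.
    backward_map: 2D list of (y, x) pairs in [0, 1]; entry [i][j] says
    where output pixel (i, j) is read from in the input image.
    """
    height, width = len(image), len(image[0])
    output = []
    for map_row in backward_map:
        out_row = []
        for y_norm, x_norm in map_row:
            # Scale normalized coordinates to pixel positions.
            y = y_norm * (height - 1)
            x = x_norm * (width - 1)
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
            wy, wx = y - y0, x - x0
            # Bilinear interpolation between the four neighbouring pixels.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            out_row.append(top * (1 - wy) + bottom * wy)
        output.append(out_row)
    return output


# Toy example (hypothetical data): a map that mirrors the image horizontally.
warped = [[0, 10, 20], [30, 40, 50]]
mirror_map = [[(i / 1, 1 - j / 2) for j in range(3)] for i in range(2)]
flat = unwarp(warped, mirror_map)
```

In practice a dense remapping routine from an image library would normally replace the explicit loops; the loop form is used here only to make the sampling step visible.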
Created:
2024-01-31



