Tibetan Document Layout Analysis Training and Test Dataset

Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2024-03-05

Download link:

https://www.nbsdc.cn/general/dataDetail?id=64f0826abb16e06dfdc78c6d&type=1

Resource description:

Original images of Tibetan documents were either scanned with a laser scanner or collected from existing scans and photographs. Layout information was then annotated on the original images with an open-source annotation tool. In total, 6,000 images were scanned or collected and annotated for Tibetan document layout analysis. Each sample consists of the original image as a .jpg file and an .xml file that records the positions of the text blocks. The dataset is complete.
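Because each sample is described as one .jpg image plus one .xml annotation file, a quick consistency check after download is to pair the files by name. The sketch below is illustrative only. It assumes the two files of a sample share the same file stem, and `read_boxes` further assumes a Pascal VOC-style layout (`object` / `bndbox` / `xmin` ... `ymax`), which the listing does not confirm. The function names are hypothetical and not part of the dataset.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def pair_samples(root):
    """Match each .jpg with the .xml that shares its file stem.

    ASSUMPTION: image and annotation of one sample have the same stem.
    Returns (paired, unpaired): a dict stem -> (image_path, xml_path)
    and a sorted list of stems that are missing one of the two files.
    """
    root = Path(root)
    images = {p.stem: p for p in root.rglob("*.jpg")}
    labels = {p.stem: p for p in root.rglob("*.xml")}
    common = sorted(images.keys() & labels.keys())
    paired = {stem: (images[stem], labels[stem]) for stem in common}
    unpaired = sorted(images.keys() ^ labels.keys())
    return paired, unpaired


def read_boxes(xml_path):
    """Read text-block boxes from one annotation file.

    ASSUMPTION: Pascal VOC-style XML; the real schema may differ.
    Returns a list of (label, xmin, ymin, xmax, ymax) tuples.
    """
    boxes = []
    for obj in ET.parse(xml_path).getroot().iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue
        values = [bndbox.findtext(k) for k in ("xmin", "ymin", "xmax", "ymax")]
        if None in values:
            continue  # skip incomplete boxes instead of guessing
        label = obj.findtext("name", default="")
        boxes.append((label, *(int(float(v)) for v in values)))
    return boxes
```

Calling `pair_samples` on the extracted dataset directory should, if the download matches the description above, yield 6,000 pairs and an empty `unpaired` list. If `read_boxes` returns nothing for files that clearly contain annotations, the schema assumption is wrong and the tag names need to be adapted after inspecting one .xml file.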

Collected summary

Dataset introduction

Background and challenges

Background overview

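This dataset is a training and test resource for Tibetan document layout analysis. It contains 6,000 scanned or collected original images of Tibetan documents, each paired with an XML file that annotates text-block positions, for a total size of 1.34 GB. It is intended to support research in computer applications and artificial intelligence, in particular the digitization of ancient Tibetan books. The data is complete and comes from a project of the National Key R&D Program of China.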

The summary above was collected and generated by 遇见数据集.