Tibetan Standard-Script Text Recognition Training and Test Dataset

Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2024-03-05

Download link:

https://www.nbsdc.cn/general/dataDetail?id=64f08269bb16e06dfdc78c69&type=1

Resource description:

Raw images of woodblock-printed Tibetan ancient texts were obtained by laser scanning or collected from existing scans and photographs. Open-source annotation tools were used to mark the position of each text block on the raw images and to transcribe the text at each position. In total, 9,400 images of Tibetan woodblock-printed texts were scanned, collected, and annotated. Each sample consists of three files: the original image (.jpg), the corresponding text (.txt), and the text block position annotations (.xml). The data is complete.
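Because each sample is described as a .jpg, .txt, and .xml triple, a quick completeness check after download can be sketched as below. This is a minimal sketch, not part of the dataset's tooling: it assumes the three files of a sample share the same path and file name apart from the extension, which the description does not state, and the function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# The three file types the description lists for every sample.
REQUIRED_EXTS = {".jpg", ".txt", ".xml"}


def group_samples(root):
    """Map each sample key (relative path without extension) to the extensions found."""
    root = Path(root)
    found = defaultdict(set)
    for path in root.rglob("*"):
        ext = path.suffix.lower()
        if path.is_file() and ext in REQUIRED_EXTS:
            # Assumption: files of one sample differ only in their extension.
            key = path.relative_to(root).with_suffix("").as_posix()
            found[key].add(ext)
    return dict(found)


def find_incomplete(root):
    """Return {sample key: sorted missing extensions} for samples lacking a file."""
    return {
        key: sorted(REQUIRED_EXTS - exts)
        for key, exts in group_samples(root).items()
        if exts != REQUIRED_EXTS
    }
```

Calling `find_incomplete` on the extracted dataset directory returns an empty dictionary when every sample has all three files; if the naming assumption holds, `len(group_samples(...))` can also be compared against the stated count of 9,400.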

Collected summary

Dataset introduction

Background and challenges

Background overview

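The dataset contains 9,400 scanned images of woodblock-printed Tibetan ancient texts. Each image is accompanied by a text data file and position annotation information, and the total data volume is 18.49 GB. The dataset is suited to research on and testing of Tibetan standard-script text recognition, and it supports related research in computer applications and artificial intelligence.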

The content above was collected and summarized by 遇见数据集.