Tibetan Modern U-chen Print (TMUP) 0.1: Training Data for a Transkribus HTR Model for Modern Tibetan Printed Texts
Source: DataCite Commons; updated 2024-10-23, indexed 2024-07-13
Download link:
https://repository.crossasia.org/receive/crossasia_mods_00000352
Description:
Tibetan Modern U-chen Print 0.1 (TMUP 0.1) is the first Transkribus model for printed Tibetan-language publications in Uchen (དབུ་ཅན་ dbu can) script (Model ID 60669). It was trained mainly on texts published in the PRC between the 1950s and the 1980s. The repository contains 522 pages from 20 documents as JPG images, together with their transcriptions in Transkribus PAGE XML. The training set consists of 470 pages; the validation set consists of 52 pages (10%), selected automatically. No base model was used. The model was developed by Franz Xaver Erhard (Leipzig University) and Xiaoying 笑影 (Leipzig University) for the Divergent Discourses project (DFG/AHRC).
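Each page image in the repository is paired with a PAGE XML transcription. The following is a minimal sketch, assuming the common PAGE layout in which every TextLine element carries a TextEquiv/Unicode child, that shows how the line transcriptions could be read with Python's standard library, and how a roughly 10% validation hold-out of the size described above could be drawn. The function names `line_transcriptions` and `split_pages` are hypothetical. The seeded random split only illustrates the 470/52 proportion; it is not the procedure Transkribus uses to select validation pages.

```python
import random
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def line_transcriptions(page_xml) -> list[str]:
    """Return line-level transcriptions from one PAGE XML document (str or bytes).

    Assumption: the text sits in TextLine > TextEquiv > Unicode. The namespace
    is ignored, so files using different PAGE schema versions parse the same way.
    """
    root = ET.fromstring(page_xml)
    lines = []
    for elem in root.iter():
        if local(elem.tag) != "TextLine":
            continue
        # Use only the TextEquiv that is a direct child of the line,
        # not the word-level TextEquiv elements nested inside Word.
        for child in elem:
            if local(child.tag) == "TextEquiv":
                for unicode_elem in child:
                    if local(unicode_elem.tag) == "Unicode":
                        lines.append(unicode_elem.text or "")
                break
    return lines


def split_pages(page_ids, val_fraction=0.1, seed=0):
    """Hypothetical random hold-out: return (train_ids, val_ids)."""
    ids = list(page_ids)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]
```

With 522 pages, `split_pages(range(522))` holds out round(52.2) = 52 pages and keeps 470 for training, matching the sizes reported for TMUP 0.1. To read the repository's files, each `.xml` file's bytes (for example via `pathlib.Path.read_bytes()`) can be passed straight to `line_transcriptions`.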
Provider:
Fachinformationsdienst (FID) Asien
Created:
2024-03-15



