PACTib - PArsed Corpus of Tibetan (11th-21st c.)

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/12104249

Description:

The PArsed Corpus of Tibetan (PACTib) contains more than 5,000 historical Tibetan texts (over 82 million words) spanning more than 10 centuries. The original texts come from the Buddhist Digital Resource Center (BDRC) and have been automatically enriched with linguistic annotation: segmentation (tokenisation), part-of-speech tags, and constituency parses.

Files in this deposit:

- a csv file giving an overview of all texts, with metadata linking file IDs and date ranges
- segmented and POS-tagged txt files (produced with the ACTib segmenter and tagger)
- parsed txt files (produced with the ACTib parser, forthcoming)

Note that only the dated files are part of this collection.

More information about the corpus can be found in:

Meelen, M., & Roux, É. (2020). Meta-dating the PArsed Corpus of Tibetan (PACTib). In Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories (pp. 31-42).
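
Since the metadata csv links file IDs to date ranges, one plausible use is selecting the texts that fall within a period of interest. The sketch below is illustrative only: the column names `file_id`, `date_from`, and `date_to`, the helper `texts_in_range`, and the sample rows are all assumptions, not taken from the deposit, so check the actual csv header before adapting it.

```python
import csv
import io


def texts_in_range(csv_text, start_year, end_year):
    """Return file IDs whose date range overlaps [start_year, end_year].

    Assumes hypothetical columns 'file_id', 'date_from', 'date_to';
    the real PACTib csv may name or format these differently.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    matches = []
    for row in reader:
        lo, hi = int(row["date_from"]), int(row["date_to"])
        # Two closed intervals overlap when each starts before the other ends.
        if lo <= end_year and hi >= start_year:
            matches.append(row["file_id"])
    return matches


# Made-up sample rows, for illustration only.
sample = """file_id,date_from,date_to
text_A,1050,1100
text_B,1400,1500
text_C,1950,2000
"""

print(texts_in_range(sample, 1000, 1450))
```

With a real download, the same function could be fed the contents of the overview csv read from disk, and the returned IDs used to pick out the matching segmented or parsed txt files.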

Created:

2024-06-18
