PACTib - PArsed Corpus of Tibetan (11th-21st c.)
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/12104249
Description:
This PArsed Corpus of Tibetan (PACTib) contains more than 5,000 historical Tibetan texts (over 82 million words) spanning more than 10 centuries. The original texts come from the Buddhist Digital Resource Center (BDRC). They have been automatically enriched with linguistic annotation: segmentation (tokenisation), part-of-speech (POS) tags, and constituency parses. The files in this deposit are:
- a CSV file giving an overview of all texts, with metadata linking file IDs to date ranges
- segmented and POS-tagged TXT files (produced with the ACTib segmenter and tagger)
- parsed TXT files (produced with the ACTib parser, forthcoming)
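The description does not specify the exact layout of the CSV or the tagged TXT files. The following is a minimal sketch that assumes the metadata CSV has columns named `id`, `date_start` and `date_end` (hypothetical names; check the real header). It also assumes each tagged line holds whitespace-separated `token/TAG` pairs (a hypothetical format). It shows how the metadata could select texts whose date range overlaps a period of interest and how POS tags in a tagged line could be counted.

```python
import csv
import io
from collections import Counter


def texts_in_range(csv_file, start_year, end_year):
    """Return IDs of texts whose date range overlaps [start_year, end_year].

    Assumes hypothetical columns 'id', 'date_start' and 'date_end' holding years.
    """
    ids = []
    for row in csv.DictReader(csv_file):
        lo, hi = int(row["date_start"]), int(row["date_end"])
        # Two ranges overlap if each starts before the other ends.
        if lo <= end_year and hi >= start_year:
            ids.append(row["id"])
    return ids


def pos_counts(tagged_line, sep="/"):
    """Count POS tags in a line of whitespace-separated token/TAG pairs.

    The token/TAG format is an assumption, not the documented ACTib format.
    """
    counts = Counter()
    for item in tagged_line.split():
        # Split on the last separator so only the tag follows it.
        token, _, tag = item.rpartition(sep)
        if token:  # skip items that contain no separator
            counts[tag] += 1
    return counts


if __name__ == "__main__":
    # Toy metadata standing in for the real CSV (hypothetical IDs and dates).
    meta = io.StringIO("id,date_start,date_end\nT1,1000,1100\nT2,1500,1600\n")
    print(texts_in_range(meta, 1050, 1550))
    print(pos_counts("a/n b/v c/n"))
```

Overlap, rather than containment, is used because many historical texts can only be dated to a range. A stricter filter would require both `date_start` and `date_end` to fall inside the period.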
Note that only the dated files are part of this collection. More information about the corpus can be found in:
Meelen, M., & Roux, É. (2020). Meta-dating the PArsed Corpus of Tibetan (PACTib). In Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories (pp. 31-42).
Created: 2024-06-18



