
The Annotated Corpus of Classical Tibetan (ACTib), Part I - Segmented version, based on the BDRC digitised text collection, tagged with the Memory-Based Tagger from TiMBL.

Source: Mendeley Data. Updated: 2024-03-27. Indexed: 2024-06-27.
Download link:
https://zenodo.org/record/823707
Description:
This corpus is a part-of-speech tagged version of the following collection: Wallman, Jeff, Rowinski, Zach, Ngawang Trinley, Tomlinson, Chris, & Keutzer, Kurt. (2017). Collection of Tibetan etexts compiled by the Buddhist Digital Resource Center [Data set]. Zenodo. http://doi.org/10.5281/zenodo.821218. The tagger was trained on Hill, Nathan W., & Garrett, Edward. (2017). A part-of-speech (POS) tagged corpus of Classical Tibetan [Data set]. Zenodo. http://doi.org/10.5281/zenodo.574878. Tagging was done with the Memory-Based Tagger (MBT), available at https://languagemachines.github.io/mbt/. Please note that the files have not been post-processed or manually corrected. A small number of files in the KarmaDelek directory were annotated even though their original XML input was already corrupted.
Created:
2023-06-28