
Pre-trained Language Model Based Tibetan Text Classification Method

Science Data Bank. Updated 2021-12-09. Indexed 2026-04-23.
Download link:
https://www.scidb.cn/en/detail?dataSetId=4bd1ee7b3a47401b8c9708d177397d82
Description:
Tibetan text classification is a basic task in Tibetan natural language processing. The current mainstream approach to text classification is to fine-tune a large-scale pre-trained language model. However, Tibetan lacks open-source large-scale text corpora and pre-trained language models. To address these problems, this paper crawls a large-scale Tibetan text dataset and trains a Tibetan pre-trained language model (bert-base-tibetan) on that corpus. Experiments with a variety of neural-network text classification models built on this model show that the pre-trained language model significantly improves Tibetan text classification performance (F1 increases by 9.3% on average), which verifies the value of the Tibetan pre-trained language model for the Tibetan text classification task and other related Tibetan natural language processing tasks.
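The description reports its gain as an average F1 improvement but does not say how F1 is averaged across classes, or whether 9.3% is an absolute or a relative gain. As a minimal sketch, assuming macro-averaging (the unweighted mean of per-class F1), the comparison between a baseline classifier and one built on a pre-trained model could be computed as follows. The class names and predictions are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores.

    Assumption: macro-averaging is used here for illustration only;
    the source does not state which averaging scheme it reports.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and model outputs, for illustration only.
gold = ["politics", "sports", "sports", "culture", "politics", "culture"]
baseline = ["politics", "sports", "culture", "culture", "sports", "culture"]
with_plm = ["politics", "sports", "sports", "culture", "politics", "sports"]

f1_baseline = macro_f1(gold, baseline)
f1_with_plm = macro_f1(gold, with_plm)
absolute_gain = f1_with_plm - f1_baseline  # difference in F1 points
relative_gain = absolute_gain / f1_baseline  # gain as a fraction of baseline
```

Reporting both `absolute_gain` and `relative_gain` makes explicit which reading of an "increase of X%" is meant, since the two can differ substantially when the baseline F1 is low.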
Provided by:
Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences; Congjun Long
Created:
2021-12-08