Pre-trained Language Model Based Tibetan Text Classification Method

Name: Pre-trained Language Model Based Tibetan Text Classification Method
Creator: Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences; Congjun Long
Published: 2021-12-09
License: No description available

Source: Science Data Bank. Updated: 2021-12-09. Indexed: 2026-04-23

Download link:

https://www.scidb.cn/en/detail?dataSetId=4bd1ee7b3a47401b8c9708d177397d82

Resource description:

Tibetan text classification is a basic task in Tibetan natural language processing. The current mainstream approach to text classification is to fine-tune a large-scale pre-trained language model. However, Tibetan lacks open-source large-scale text corpora and pre-trained language models. To address these problems, this paper crawls a large-scale Tibetan text dataset and trains a Tibetan pre-trained language model (bert-base-tibetan) on that corpus. Experiments with a variety of neural-network-based text classification models built on this model show that the pre-trained language model significantly improves Tibetan text classification performance (F1 increases by 9.3% on average). This verifies the value of the Tibetan pre-trained language model for Tibetan text classification and for other related Tibetan natural language processing tasks.
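
The description reports an average F1 gain but does not say how F1 is averaged over classes or over classifiers. The sketch below shows one plausible reading, assuming macro-averaged F1 per classifier and an absolute mean difference across classifiers. The function names, class labels, and predictions are all hypothetical and invented for illustration; none of them come from the dataset or the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_f1_gain(baseline_scores, pretrained_scores):
    """Average absolute F1 difference over paired classifier runs (assumed pairing)."""
    gains = [p - b for b, p in zip(baseline_scores, pretrained_scores)]
    return sum(gains) / len(gains)


# Toy labels, invented for illustration only.
y_true = ["news", "sports", "news", "culture"]
y_baseline = ["news", "news", "news", "culture"]
y_pretrained = ["news", "sports", "news", "culture"]

baseline_f1 = macro_f1(y_true, y_baseline)
pretrained_f1 = macro_f1(y_true, y_pretrained)
gain = mean_f1_gain([baseline_f1], [pretrained_f1])
```

With real experiments, each list passed to mean_f1_gain would hold one score per classifier architecture, in the same order for the baseline and the pre-trained variant.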

Provided by:

Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences; Congjun Long

Created:

2021-12-08
