Gesar Epic Entity Extraction Method Incorporating BERT Syllable Embedding
Science Data Bank · Updated 2021-12-09 · Indexed 2026-04-23
Download link:
https://www.scidb.cn/en/detail?dataSetId=c2e789b696554702a340aeec5e70cb5c
Description:
In deep-learning-based named entity recognition for the Gesar epic, vectorized representation is a central step. Traditional syllable vector representations, however, are too homogeneous, which leads to suboptimal performance on the downstream task. To address this problem, this paper proposes a BERT-BiLSTM-CRF method that takes Tibetan syllables as the basic unit. As a multi-layer representation learning model, BERT enhances the semantic representation of Tibetan syllables and generates syllable vectors dynamically from contextual features, which allows Gesar epic named entities to be identified more accurately. Experiments show that the method works well on the Gesar Classic corpus of the Four Descending Histories: accuracy, recall, and F-value were 98.56%, 98.67%, and 98.11%, respectively. The experiments also found a line-to-entity ratio of 3:1 for the Gesar epic, involving an equal number of complete lines and entities of the epic. This indicates that the Gesar epic is a very rich source of named entities in Tibetan text and suggests that named entities are part of the epic's appeal.
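To make "Tibetan syllables as the basic unit" concrete, the following is a minimal sketch of syllable-level segmentation and BIO labelling, the kind of input a sequence tagger such as BERT-BiLSTM-CRF would consume. It is not the authors' preprocessing code. It assumes that syllables are delimited by the tsheg (U+0F0B) and the shad (U+0F0D); the function names `split_syllables` and `bio_tags`, the `PER` label, and the example phrase are hypothetical and chosen only for illustration.

```python
import re

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark
SHAD = "\u0f0d"   # Tibetan phrase-final mark


def split_syllables(text):
    """Split Tibetan text into syllables on tsheg, shad, and whitespace."""
    parts = re.split(f"[{TSHEG}{SHAD}\\s]+", text)
    return [p for p in parts if p]


def bio_tags(syllables, entity_spans):
    """Assign one BIO tag per syllable.

    entity_spans is a list of (start, end_exclusive, label) tuples
    expressed in syllable indices (a hypothetical annotation format).
    """
    tags = ["O"] * len(syllables)
    for start, end, label in entity_spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Illustrative phrase "ge sar rgyal po" (King Gesar), written with escapes
# so the delimiters are unambiguous: ge + tsheg + sar + tsheg + rgyal + tsheg + po + shad
phrase = (
    "\u0f42\u0f7a\u0f0b"
    "\u0f66\u0f62\u0f0b"
    "\u0f62\u0f92\u0fb1\u0f63\u0f0b"
    "\u0f54\u0f7c\u0f0d"
)

syllables = split_syllables(phrase)
# Mark the first two syllables ("ge sar") as a person entity.
tags = bio_tags(syllables, [(0, 2, "PER")])

for syllable, tag in zip(syllables, tags):
    print(syllable, tag)
```

Each syllable and tag pair would then be mapped to a contextual vector by the encoder, so the same syllable can receive different vectors in different lines of the epic.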
Provided by:
HUANKE You; HUAQUE Cairang; DANZHENG Ji
Created:
2021-12-08



