TibNER:Tibetan Named Entity Recognition Dataset

Name: TibNER:Tibetan Named Entity Recognition Dataset
Creator: Science Data Bank
Published: 2025-04-27 22:19:16
License: No description available

DataCite Commons; updated 2025-04-27; indexed 2025-04-16

Download link:

https://www.scidb.cn/detail?dataSetId=0cb7427b4933474f817fc028cc038af0

Resource description:

Structured linguistic resources are an important foundation for natural language processing. Because open-source datasets are lacking, research on Tibetan named entity recognition has progressed slowly and produced relatively few results. To address this, this paper semi-automatically constructs a Tibetan named entity recognition dataset (TibNER) using an entity dictionary, and the automatic annotations are manually proofread to ensure dataset quality. TibNER contains 20,096 sentences with an average length of 44.2069 syllables per sentence. The annotated entities cover three types (person names, place names, and organization names), totaling 43,678 entities. To validate the dataset, this paper compares three mainstream sequence labeling models, which reach a highest F1 score of 80.60%. This work offers data-construction experience for other low-resource languages and provides a data foundation for research such as Tibetan named entity recognition.
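The description says TibNER was pre-annotated automatically from an entity dictionary and then proofread by hand, but it does not specify the matching procedure or the tag scheme. As an illustration only, the Python sketch below assumes forward maximum matching over pre-split syllables and BIO tags with hypothetical labels PER, LOC and ORG; it is not the authors' documented method. Tibetan text would first need to be split into syllables (for example on the tsheg mark, U+0F0B); the example uses romanized (Wylie) placeholder syllables and a hypothetical two-entry dictionary.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical entity dictionary: maps a tuple of syllables to an entity type.
# The labels PER/LOC/ORG are assumptions, not TibNER's documented tag set.
EntityDict = Dict[Tuple[str, ...], str]


def dictionary_bio_tags(syllables: Sequence[str], entity_dict: EntityDict) -> List[str]:
    """Draft BIO tags for a syllable sequence by forward maximum matching.

    At each position, the longest dictionary entry starting there is tagged;
    syllables not covered by any entry get "O". The result is only a draft
    that would still need manual proofreading.
    """
    max_len = max((len(key) for key in entity_dict), default=0)
    tags = ["O"] * len(syllables)
    i = 0
    while i < len(syllables):
        match_len, match_type = 0, None
        # Try the longest candidate span first, shrinking until a match is found.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            entity_type = entity_dict.get(tuple(syllables[i:i + n]))
            if entity_type is not None:
                match_len, match_type = n, entity_type
                break
        if match_type is None:
            i += 1  # no entry starts here; leave this syllable as "O"
            continue
        tags[i] = f"B-{match_type}"
        for j in range(i + 1, i + match_len):
            tags[j] = f"I-{match_type}"
        i += match_len  # skip past the matched entity
    return tags


# Usage with hypothetical romanized syllables: "bkra shis" (a person name)
# and "lha sa" (Lhasa) are the only dictionary entries.
entity_dict = {("bkra", "shis"): "PER", ("lha", "sa"): "LOC"}
sentence = ["bkra", "shis", "lha", "sa", "la", "phyin"]
print(dictionary_bio_tags(sentence, entity_dict))
# ['B-PER', 'I-PER', 'B-LOC', 'I-LOC', 'O', 'O']
```

Preferring the longest match keeps a multi-syllable name from being tagged as a shorter entry that happens to share its first syllable; ambiguities that such a heuristic cannot resolve are one reason a manual proofreading pass is still needed.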

Providing institution:

Science Data Bank

Created:

2024-05-24