Chinese-Tibetan bilingual named entity recognition dataset of Tibetan traditional festivals
Source: Science Data Bank | Updated: 2022-09-26 | Indexed: 2026-04-23
Download link:
https://www.scidb.cn/detail?dataSetId=fbf48909bf8c41d4bf1dd8c86baf0816
Description:
For the Chinese data, this study used festival names as search terms to crawl news from the People's Daily website as the initial data. It then manually reviewed the results and removed items that mention a festival name but are unrelated to the festival. The data were collected on April 22, 2022.

For the Tibetan data, this study used the Yunzang (ཡོངས་འཛིན) search engine (https://www.yongzin.com), developed by the Tibetan Information Technology Research Center of Hainan Tibetan Autonomous Prefecture, Qinghai Province. As with the Chinese data, the festival names listed in Table 1 served as search terms. To improve the precision of the results, however, crawling was restricted to a whitelist of five websites:
People's Daily Online, Tibetan edition (http://tibet.people.com.cn/)
China Tibet Online (http://tb.tibet.cn/)
Voice of China Tibet (http://www.vtibet.cn/)
Zangdi Yangguang (Tibetan Sunshine) (http://ti.zangdiyg.com/)
China Tibet News (http://tb.chinatibetnews.com/)
All Tibetan news texts in the corpus therefore come from these sites.

Eight students annotated the news data. Four were native speakers of Tibetan and four were Han students whose native language is Chinese. To ensure annotation quality, all annotators received unified training before formal annotation began. Annotation was performed with YEDDA [31]. The annotated entity types are Tibetan festivals themselves, festival-related events, festival-related articles (objects), and festival-related places.
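The whitelist step for the Tibetan data can be illustrated with a small URL filter. This is a minimal sketch that assumes search results arrive as a list of URLs. The function names `is_whitelisted` and `filter_results` are hypothetical. Exact host matching is a choice made here, so subdomains of the listed sites are rejected. The host `ti.zangdiyg.com` is an assumed spelling reconstructed from the source listing.

```python
from urllib.parse import urlparse

# Hosts of the five whitelisted Tibetan news sites named in the description.
# "ti.zangdiyg.com" is an assumed spelling of the fifth site's host.
TIBETAN_WHITELIST = {
    "tibet.people.com.cn",
    "tb.tibet.cn",
    "www.vtibet.cn",
    "ti.zangdiyg.com",
    "tb.chinatibetnews.com",
}


def is_whitelisted(url: str) -> bool:
    """Return True if the URL's host is exactly one of the whitelisted hosts.

    urlparse(...).hostname is already lower-cased, and is None for
    strings without a network location, hence the fallback to "".
    """
    host = urlparse(url).hostname or ""
    return host in TIBETAN_WHITELIST


def filter_results(urls):
    """Keep only search-result URLs that point into the whitelist, in order."""
    return [u for u in urls if is_whitelisted(u)]


if __name__ == "__main__":
    results = [
        "http://tb.tibet.cn/a.html",
        "https://www.yongzin.com/search?q=x",
        "http://tibet.people.com.cn/n1/b.html",
    ]
    print(filter_results(results))
```

Matching on the parsed host rather than on a substring avoids accepting URLs such as `http://example.com/tb.tibet.cn`, which merely mention a whitelisted domain in their path.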
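If the released files keep YEDDA's inline markup, they can be converted to character-level BIO tags, a common input format for Chinese NER models. This is a hedged sketch that assumes the markup form `[@entity text#Label*]`. The function `yedda_to_bio` and the label `FES` are hypothetical placeholders, not the dataset's actual tag set. Character-level tagging is a choice made here. It suits Chinese, but for Tibetan it may be more appropriate to tokenize on the tsheg (་) syllable delimiter.

```python
import re
from typing import List, Tuple

# Assumed YEDDA inline markup: "[@entity text#Label*]".
SPAN_RE = re.compile(r"\[@(.+?)#([^*\]]+)\*\]")


def yedda_to_bio(line: str) -> List[Tuple[str, str]]:
    """Convert one YEDDA-annotated line into (character, BIO tag) pairs."""
    pairs: List[Tuple[str, str]] = []
    pos = 0
    for m in SPAN_RE.finditer(line):
        # Unannotated text before the entity is tagged O, one character at a time.
        for ch in line[pos:m.start()]:
            pairs.append((ch, "O"))
        text, label = m.group(1), m.group(2)
        # First character of the entity gets B-, the rest get I-.
        for i, ch in enumerate(text):
            pairs.append((ch, ("B-" if i == 0 else "I-") + label))
        pos = m.end()
    # Trailing unannotated text after the last entity.
    for ch in line[pos:]:
        pairs.append((ch, "O"))
    return pairs


if __name__ == "__main__":
    # "FES" is a hypothetical label for a festival entity.
    for ch, tag in yedda_to_bio("过[@藏历新年#FES*]了"):
        print(ch, tag)
```

Verify the real file format before relying on this. YEDDA can also export sequence-labelled files directly, which would make a hand-written parser unnecessary.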
Provided by:
SparklingDeng
Created:
2022-09-03



