
KazNERD

arXiv 2022-04-07 · Updated 2024-06-21
Download link:
https://github.com/IS2AI/KazNERD
Summary:

KazNERD is a Kazakh named entity recognition (NER) dataset developed by the Institute of Smart Systems and Artificial Intelligence (ISSAI) at Nazarbayev University. It contains 112,702 sentences drawn from television news text, with 136,333 annotated named entities spanning 25 entity classes. The sentences were annotated manually by two native Kazakh speakers following the IOB2 tagging scheme. The dataset is intended mainly for automatic text understanding and machine translation, and it addresses the scarcity of NER resources for Kazakh.
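In the IOB2 scheme, the first token of each entity is tagged `B-<class>`, any following tokens of the same entity are tagged `I-<class>`, and tokens outside any entity are tagged `O`. The minimal Python sketch below decodes such a tag sequence into entity spans. The function `iob2_to_spans`, the example sentence, its tags, and the class names `GPE` and `ORGANISATION` are illustrative assumptions, not material copied from KazNERD or its tooling. The sketch also assumes lenient handling of a stray `I-` tag (one not preceded by a tag of the same class), treating it as the start of a new entity.

```python
from typing import List, Optional, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode IOB2 tags into (class, start, end) spans; end is exclusive.

    Hypothetical helper for illustration, not part of KazNERD.
    """
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None  # class of the entity currently open, if any
    start = 0
    for i, tag in enumerate(tags):
        if tag == "O":
            # Outside any entity: close the open entity, if there is one.
            if label is not None:
                spans.append((label, start, i))
            label = None
        elif tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # B- always starts a new entity; a stray I- is treated leniently
            # as a new entity too (an assumption, strict IOB2 would reject it).
            if label is not None:
                spans.append((label, start, i))
            label, start = tag[2:], i
        elif tag.startswith("I-"):
            continue  # I- of the same class extends the open entity
        else:
            raise ValueError(f"not an IOB2 tag: {tag!r}")
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical sentence: "Nazarbayev University is located in the city of Nur-Sultan".
tokens = ["Нұр-Сұлтан", "қаласында", "Назарбаев", "Университеті", "орналасқан"]
tags = ["B-GPE", "O", "B-ORGANISATION", "I-ORGANISATION", "O"]
spans = iob2_to_spans(tags)
print([(cls, " ".join(tokens[s:e])) for cls, s, e in spans])
# [('GPE', 'Нұр-Сұлтан'), ('ORGANISATION', 'Назарбаев Университеті')]
```

Returning token offsets rather than strings keeps the decoder independent of how the tokens are stored, so the same spans can be compared directly against predicted spans when computing entity-level precision and recall.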
Provided by:
Institute of Smart Systems and Artificial Intelligence (ISSAI), Nazarbayev University
Created:
2021-11-26