
TLUNIFIED-NER

Source: arXiv (2023-11-13); updated 2024-06-21
Download link:
https://huggingface.co/datasets/ljvmiranda921/tlunified-ner
Description:

TLUNIFIED-NER is a named entity recognition (NER) dataset for the Tagalog language, developed by researchers based in the Philippines. It contains approximately 7.8k documents drawn from news reports and annotated with three entity types: person, organization, and location. Three native Tagalog speakers annotated the data iteratively to ensure high annotation quality. The dataset aims to fill a gap in Philippine language resources, where NER data in particular has been scarce. It is intended for natural language processing and information extraction, especially tasks that require extracting structured information from text.
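NER datasets of this kind are commonly distributed as token sequences with per-token tags, and the Hub copy can presumably be loaded with the Hugging Face `datasets` library (`load_dataset`) using the repository id from the download link. The sketch below is a minimal, offline illustration of turning such tags back into entity spans. It assumes (not confirmed by this page) that the labels follow the IOB2 scheme over the three entity types, with the hypothetical label order in `LABELS`; check the dataset card for the actual field names and label ordering.

```python
from typing import List, Optional, Tuple

# ASSUMPTION: IOB2 label inventory for the three entity types (PER, ORG, LOC).
# The real dataset may order these differently; verify against the dataset card.
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]


def iob2_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2-tagged tokens into (entity_type, entity_text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def close_entity() -> None:
        # Emit the entity being built, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A B- tag, or an I- tag that does not continue the open entity,
            # starts a new entity.
            close_entity()
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-"):
            # I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O" closes any open entity.
            close_entity()
            current_type = None
            current_tokens = []
    close_entity()
    return spans


# Hypothetical example sentence (not taken from the dataset) with integer tag ids.
tokens = ["Ipinanganak", "si", "Jose", "Rizal", "sa", "Calamba", "."]
tag_ids = [0, 0, 1, 2, 0, 5, 0]
tags = [LABELS[i] for i in tag_ids]
print(iob2_to_spans(tokens, tags))
# [('PER', 'Jose Rizal'), ('LOC', 'Calamba')]
```

Mapping integer ids through a label list first keeps the decoder independent of how a particular copy of the dataset encodes its tags.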
Provider: Philippines
Created: 2023-11-13