yoruba_gv_ner
Source: huggingface.co, indexed 2025-03-22
Download link:
https://huggingface.co/datasets/ajesujoba/yoruba_gv_ner
Description:
The Yoruba GV NER dataset is a labeled dataset for named entity recognition in Yoruba. The texts were obtained from
Yoruba Global Voices news articles (https://yo.globalvoices.org/). We concentrate on
four types of named entities: persons [PER], locations [LOC], organizations [ORG], and dates and times [DATE].
The Yoruba GV NER data files contain two columns separated by a tab ('\t'). Each word is on its own line, and
sentences are separated by an empty line, i.e. the CoNLL format. The first item on each line is a word; the second
is its named entity tag. The tags follow the BIO tagging scheme. A tag of the form I-TYPE means that the word is
inside a phrase of type TYPE. For every multi-word expression such as 'New York', the first word gets the tag B-TYPE
and the subsequent words get I-TYPE. A word tagged O is not part of any named entity.
For more details, see https://www.aclweb.org/anthology/2020.lrec-1.335/
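As a minimal sketch, assuming the files are UTF-8 text in exactly the two-column, tab-separated layout described above, the Python code below groups word/tag lines into sentences (splitting on empty lines) and decodes the BIO tags into entity spans. The function names `read_conll` and `bio_to_spans` are hypothetical helpers, not part of the dataset or of any library, and the sample sentence is a hand-written toy input, not taken from the dataset. Treating a stray I-TYPE (one that does not continue a span of the same type) as the start of a new span is a lenient choice assumed here; the dataset description does not specify it.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # list of (word, tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group tab-separated 'word<TAB>tag' lines into sentences split on empty lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # An empty line closes the current sentence (consecutive blanks are ignored).
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumes exactly two tab-separated columns, as described above.
        word, tag = line.split("\t")
        current.append((word, tag.strip()))
    if current:
        # The final sentence may not be followed by an empty line.
        sentences.append(current)
    return sentences


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode BIO tags into (TYPE, start, end) spans with an exclusive end index."""
    spans: List[Tuple[str, int, int]] = []
    start, etype = 0, None  # etype is None when no span is open
    for i, tag in enumerate(tags):
        if tag == "O":
            # O ends any open span.
            if etype is not None:
                spans.append((etype, start, i))
            etype = None
        elif tag.startswith(("B-", "I-")):
            label = tag[2:]
            # B- always starts a new span; I- of a different type is
            # (by assumption) also treated as a new span.
            if tag.startswith("B-") or label != etype:
                if etype is not None:
                    spans.append((etype, start, i))
                start, etype = i, label
            # Otherwise: I-TYPE continuing the open span of the same TYPE.
        else:
            raise ValueError(f"unexpected tag: {tag!r}")
    if etype is not None:
        spans.append((etype, start, len(tags)))
    return spans


if __name__ == "__main__":
    # Toy input for illustration only, not a sample from the dataset.
    sample = "Ade\tB-PER\nlives\tO\nin\tO\nNew\tB-LOC\nYork\tI-LOC\n\n"
    for sentence in read_conll(sample.splitlines()):
        words = [w for w, _ in sentence]
        tags = [t for _, t in sentence]
        for etype, s, e in bio_to_spans(tags):
            print(etype, " ".join(words[s:e]))  # prints "PER Ade", then "LOC New York"
```

To read an actual data file, an open file handle can be passed directly, e.g. `with open(path, encoding="utf-8") as f: sentences = read_conll(f)`, where `path` is a local copy of one of the data files (the file names are not listed on this page).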
Provided by:
huggingface.co



