Llamacha/ner_quechua_iic

Name: Llamacha/ner_quechua_iic
Creator: Llamacha
Published: 2022-10-02 14:19:29
License: Apache-2.0

Source: Hugging Face. Updated 2022-10-02; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/Llamacha/ner_quechua_iic


Resource description:

---
annotations_creators:
- crowdsourced
language:
- qu
license:
- apache-2.0
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- token-classification
task_ids:
- named-entity-recognition
---

# Dataset Card for WikiANN

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Paper:** The original datasets come from "Introducing QuBERT: A Large Monolingual Corpus and BERT Model for Southern Quechua" [paper](https://aclanthology.org/2022.deeplo-1.1.pdf) by Rodolfo Zevallos et al. (2022).
- **Point of Contact:** [Rodolfo Zevallos](mailto:rodolfojoel.zevallos@upf.edu)

### Dataset Summary

NER_Quechua_IIC is a named entity recognition dataset consisting of dictionary texts provided by the Peruvian Ministry of Education, annotated with LOC (location), PER (person) and ORG (organization) tags in the IOB2 format.

### Supported Tasks and Leaderboards

- `named-entity-recognition`: The dataset can be used to train a model for named entity recognition in Quechua languages.
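To make the IOB2 tagging scheme mentioned in the summary concrete, here is a minimal sketch of how IOB2 tags can be decoded into entity spans. It assumes the conventional label strings (`B-PER`, `I-PER`, `B-LOC`, `I-LOC`, `B-ORG`, `I-ORG`, `O`); the card does not list the dataset's actual field names or label encoding, so the function name `iob2_to_spans` and the example tokens below are hypothetical and not drawn from the dataset.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags to (entity_type, start, end) spans; end is exclusive.

    Hypothetical helper for illustration; not part of the dataset or any library.
    """
    spans: List[Tuple[str, int, int]] = []
    start = None
    current = None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if its type matches.
        continues = tag.startswith("I-") and current == tag[2:]
        if current is not None and not continues:
            # Any other tag (O, B-*, or a mismatched I-*) closes the open span.
            spans.append((current, start, i))
            start, current = None, None
        if tag.startswith("B-"):
            # In IOB2, every entity begins with a B- tag.
            start, current = i, tag[2:]
        # A stray I- tag with no matching open span is ignored here.
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


# Illustrative tokens and tags (assumed, not taken from NER_Quechua_IIC).
tokens = ["Ana", "Quispe", "lives", "in", "Lima"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
for entity_type, begin, end in iob2_to_spans(tags):
    print(entity_type, " ".join(tokens[begin:end]))
```

Ignoring stray `I-` tags is one possible design choice; some evaluation tools instead treat a stray `I-` tag as the start of a new entity, so the decoding should match whatever convention the evaluation uses.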

Provided by:

Llamacha

Summary of original information

Dataset overview

Name: NER_Quechua_IIC (the dataset card itself is titled "WikiANN")
Language: Quechua (qu)
License: Apache-2.0
Dataset size: fewer than 1,000 records
Source data: original
Task category: token classification
Task ID: named entity recognition

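Dataset details

Dataset summary: NER_Quechua_IIC is a named entity recognition dataset consisting of dictionary texts provided by the Peruvian Ministry of Education, annotated with LOC (location), PER (person) and ORG (organization) tags in the IOB2 format.
Supported tasks and leaderboards:
- Named entity recognition: the dataset can be used to train named entity recognition models for Quechua.
Data instances: not provided.
Data fields: not provided.
Data splits: not provided.
Dataset creation:
- Source data: dictionary texts from the Peruvian Ministry of Education.
- Annotations: crowdsourced.
- Personal and sensitive information: not provided.
Considerations for using the data:
- Social impact of dataset: not provided.
- Discussion of biases: not provided.
- Other known limitations: not provided.
Additional information:
- Dataset curators: not provided.
- Licensing information: Apache-2.0.
- Citation information: not provided.
- Contributions: not provided.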
