community-datasets/caner
Source: Hugging Face | Updated: 2024-01-16 | Indexed: 2024-06-15
Download link:
https://hf-mirror.com/datasets/community-datasets/caner
Resource summary:
---
annotations_creators:
- expert-generated
language_creators:
- expert-generated
language:
- ar
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- token-classification
task_ids:
- named-entity-recognition
pretty_name: CANER
dataset_info:
  features:
  - name: token
    dtype: string
  - name: ner_tag
    dtype:
      class_label:
        names:
          '0': Allah
          '1': Book
          '2': Clan
          '3': Crime
          '4': Date
          '5': Day
          '6': Hell
          '7': Loc
          '8': Meas
          '9': Mon
          '10': Month
          '11': NatOb
          '12': Number
          '13': O
          '14': Org
          '15': Para
          '16': Pers
          '17': Prophet
          '18': Rlig
          '19': Sect
          '20': Time
  splits:
  - name: train
    num_bytes: 5095617
    num_examples: 258240
  download_size: 1459014
  dataset_size: 5095617
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for CANER
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:**
- **Repository:** [Classical-Arabic-Named-Entity-Recognition-Corpus](https://github.com/RamziSalah)
- **Paper:** [Researchgate](https://www.researchgate.net/publication/330075080_BUILDING_THE_CLASSICAL_ARABIC_NAMED_ENTITY_RECOGNITION_CORPUS_CANERCORPUS)
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
The Classical Arabic Named Entity Recognition corpus (CANERCorpus) is a corpus of Classical Arabic text annotated with named-entity tags, built to support research on recognizing named entities in Arabic. This release stores it as 258,240 token-level examples.
### Supported Tasks and Leaderboards
- Named Entity Recognition
### Languages
Classical Arabic
## Dataset Structure
### Data Instances
An example from the dataset:
```
{'ner_tag': 1, 'token': 'الجامع'}
```
Here, the `ner_tag` value 1 stands for the `Book` class.
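The integer in `ner_tag` is an index into the class list under Data Fields. The following is a minimal sketch of decoding a row by hand. It assumes rows are plain Python dicts shaped like the example above, and `CANER_LABELS`, `ID2LABEL`, `LABEL2ID` and `decode` are hypothetical names. When the data is loaded with the Hugging Face `datasets` library, the same mapping is normally available from the `ClassLabel` feature (`features["ner_tag"].names` or `.int2str()`).

```python
# Class names for `ner_tag`, in index order, copied from this card.
CANER_LABELS = [
    "Allah", "Book", "Clan", "Crime", "Date", "Day", "Hell",
    "Loc", "Meas", "Mon", "Month", "NatOb", "Number", "O",
    "Org", "Para", "Pers", "Prophet", "Rlig", "Sect", "Time",
]

# Hypothetical lookup tables: integer id -> name, and name -> integer id.
ID2LABEL = dict(enumerate(CANER_LABELS))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode(example: dict) -> tuple[str, str]:
    """Return (token, label name) for one row of the dataset."""
    return example["token"], ID2LABEL[example["ner_tag"]]


if __name__ == "__main__":
    row = {"ner_tag": 1, "token": "الجامع"}
    print(decode(row))      # ('الجامع', 'Book')
    print(LABEL2ID["O"])    # 13, not 0
```

Note that the outside tag `O` sits at index 13 rather than 0. This matters for any code that assumes label 0 means "no entity".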
### Data Fields
- `token`: a single token (string); each row of the dataset holds one token
- `ner_tag`: the named-entity class of that token, stored as an integer class label
The integer values of `ner_tag` map to the following class names, in index order (0 to 20):
```
"Allah",
"Book",
"Clan",
"Crime",
"Date",
"Day",
"Hell",
"Loc",
"Meas",
"Mon",
"Month",
"NatOb",
"Number",
"O",
"Org",
"Para",
"Pers",
"Prophet",
"Rlig",
"Sect",
"Time"
```
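The class names carry no `B-`/`I-` prefixes, and the card does not say how multi-token entities are delimited. One possible heuristic is to treat each run of consecutive tokens with the same non-`O` label as one entity. The sketch below follows that approach and assumes rows appear in running-text order, which the card does not state. `group_spans` is a hypothetical helper, and by construction it cannot separate two adjacent entities of the same type.

```python
from itertools import groupby


def group_spans(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Merge runs of consecutive tokens that share the same non-"O" label.

    Heuristic only: CANER labels have no B-/I- prefixes, so two adjacent
    entities of the same type come out as one span.
    Returns (label, space-joined text) pairs in input order.
    """
    spans = []
    # groupby only groups *consecutive* items with equal keys.
    for label, run in groupby(tagged, key=lambda pair: pair[1]):
        if label == "O":
            continue  # "O" marks tokens outside any entity
        tokens = [token for token, _ in run]
        spans.append((label, " ".join(tokens)))
    return spans


if __name__ == "__main__":
    # Hypothetical toy input with placeholder tokens, not taken from the corpus.
    toy = [("w1", "O"), ("w2", "Pers"), ("w3", "Pers"), ("w4", "O"), ("w5", "Loc")]
    print(group_spans(toy))  # [('Pers', 'w2 w3'), ('Loc', 'w5')]
```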
### Data Splits
A single `train` split of 258,240 examples; there are no validation or test splits.
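Because there is no held-out data, users have to create their own evaluation split. Each row is a single token with no sentence or document id, so a random shuffle would scatter neighbouring tokens from the same passage across train and validation. The sketch below performs a contiguous hold-out. It assumes rows are stored in running-text order, which the card does not confirm, and `contiguous_split` and the 10% fraction are illustrative choices. With the `datasets` library, `Dataset.train_test_split(test_size=0.1, shuffle=False)` produces a similar unshuffled tail split.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def contiguous_split(rows: Sequence[T], valid_fraction: float = 0.1) -> tuple[list[T], list[T]]:
    """Hold out the last `valid_fraction` of rows as a validation set.

    Rows are not shuffled, so neighbouring tokens stay on the same side
    of the split (assumes rows are stored in running-text order).
    """
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be strictly between 0 and 1")
    n_valid = round(len(rows) * valid_fraction)
    cut = len(rows) - n_valid
    return list(rows[:cut]), list(rows[cut:])


if __name__ == "__main__":
    # Stand-in rows: one integer per token, matching the train split size.
    train, valid = contiguous_split(list(range(258_240)), valid_fraction=0.1)
    print(len(train), len(valid))  # 232416 25824
```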
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
Ramzi Salah and Lailatul Qadri Zakaria
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
[More Information Needed]
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
@article{article,
author = {Salah, Ramzi and Zakaria, Lailatul},
year = {2018},
month = {12},
pages = {},
title = {BUILDING THE CLASSICAL ARABIC NAMED ENTITY RECOGNITION CORPUS (CANERCORPUS)},
volume = {96},
journal = {Journal of Theoretical and Applied Information Technology}
}
### Contributions
Thanks to [@KMFODA](https://github.com/KMFODA) for adding this dataset.
Provider:
community-datasets