community-datasets/caner

Name: community-datasets/caner
Creator: community-datasets
Published: 2024-01-16 13:38:20
License: No description available

Source: Hugging Face. Updated: 2024-01-16. Indexed: 2024-06-15.

Download link:

https://hf-mirror.com/datasets/community-datasets/caner


Resource description:

---
annotations_creators:
- expert-generated
language_creators:
- expert-generated
language:
- ar
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- token-classification
task_ids:
- named-entity-recognition
pretty_name: CANER
dataset_info:
  features:
  - name: token
    dtype: string
  - name: ner_tag
    dtype:
      class_label:
        names:
          '0': Allah
          '1': Book
          '2': Clan
          '3': Crime
          '4': Date
          '5': Day
          '6': Hell
          '7': Loc
          '8': Meas
          '9': Mon
          '10': Month
          '11': NatOb
          '12': Number
          '13': O
          '14': Org
          '15': Para
          '16': Pers
          '17': Prophet
          '18': Rlig
          '19': Sect
          '20': Time
  splits:
  - name: train
    num_bytes: 5095617
    num_examples: 258240
  download_size: 1459014
  dataset_size: 5095617
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for CANER

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:**
- **Repository:** [Classical-Arabic-Named-Entity-Recognition-Corpus](https://github.com/RamziSalah)
- **Paper:** [Researchgate](https://www.researchgate.net/publication/330075080_BUILDING_THE_CLASSICAL_ARABIC_NAMED_ENTITY_RECOGNITION_CORPUS_CANERCORPUS)
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

The Classical Arabic Named Entity Recognition corpus is a new corpus of tagged data that can be useful for handling the issues in recognition of Arabic named entities.

### Supported Tasks and Leaderboards

- Named Entity Recognition

### Languages

Classical Arabic

## Dataset Structure

### Data Instances

An example from the dataset:

```
{'ner_tag': 1, 'token': 'الجامع'}
```

Where 1 stands for "Book"

### Data Fields

- `id`: id of the sample
- `token`: the tokens of the example text
- `ner_tag`: the NER tags of each token

The NER tags correspond to this list:

```
"Allah", "Book", "Clan", "Crime", "Date", "Day", "Hell", "Loc", "Meas", "Mon", "Month", "NatOb", "Number", "O", "Org", "Para", "Pers", "Prophet", "Rlig", "Sect", "Time"
```

### Data Splits

Training splits only

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

Ramzi Salah and Lailatul Qadri Zakaria

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

[More Information Needed]

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

@article{article,
  author = {Salah, Ramzi and Zakaria, Lailatul},
  year = {2018},
  month = {12},
  pages = {},
  title = {BUILDING THE CLASSICAL ARABIC NAMED ENTITY RECOGNITION CORPUS (CANERCORPUS)},
  volume = {96},
  journal = {Journal of Theoretical and Applied Information Technology}
}

### Contributions

Thanks to [@KMFODA](https://github.com/KMFODA) for adding this dataset.

Provider:

community-datasets

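Summary of the original information

Dataset Card for CANER

Dataset Description

Dataset Summary

The Classical Arabic Named Entity Recognition corpus (CANER) is a new corpus of tagged data that can be useful for handling the issues in recognition of Arabic named entities.

Supported Tasks and Leaderboards

Named Entity Recognition

Languages

Classical Arabic

Dataset Structure

Data Instances

An example from the dataset:

{'ner_tag': 1, 'token': 'الجامع'}

Where 1 stands for "Book"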
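The integer in `ner_tag` is an index into the label list given in the card metadata. A minimal sketch of that lookup in plain Python follows; the helper names `decode_tag` and `encode_tag` are hypothetical and not part of the dataset or any library.

```python
# Label names in index order, copied from the class_label list in the card.
NER_LABELS = [
    "Allah", "Book", "Clan", "Crime", "Date", "Day", "Hell", "Loc", "Meas",
    "Mon", "Month", "NatOb", "Number", "O", "Org", "Para", "Pers", "Prophet",
    "Rlig", "Sect", "Time",
]


def decode_tag(tag_id: int) -> str:
    """Return the label name for an integer ner_tag (hypothetical helper)."""
    if not 0 <= tag_id < len(NER_LABELS):
        raise ValueError(f"ner_tag {tag_id} is outside 0..{len(NER_LABELS) - 1}")
    return NER_LABELS[tag_id]


def encode_tag(label: str) -> int:
    """Return the integer id for a label name (hypothetical helper)."""
    return NER_LABELS.index(label)


# The instance shown in the card.
example = {"ner_tag": 1, "token": "الجامع"}
decoded = {"token": example["token"], "label": decode_tag(example["ner_tag"])}
print(decoded["label"])  # Book
```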

Data Fields

id: id of the sample
token: the tokens of the example text
ner_tag: the NER tag of each token

The NER tags correspond to this list:

"Allah", "Book", "Clan", "Crime", "Date", "Day", "Hell", "Loc", "Meas", "Mon", "Month", "NatOb", "Number", "O", "Org", "Para", "Pers", "Prophet", "Rlig", "Sect", "Time"
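The label list carries no B-/I- prefixes, and each row holds a single token with a single tag. One possible way to recover multi-token entities is to merge runs of adjacent rows that share the same non-"O" tag. The sketch below does that under a loud assumption: adjacent tokens with the same tag belong to one entity, so two neighbouring entities of the same type would be merged. The function `group_entities` and the placeholder tokens `w1` to `w4` are hypothetical and are not taken from the corpus.

```python
from itertools import groupby

# Label names in index order, copied from the card.
NER_LABELS = [
    "Allah", "Book", "Clan", "Crime", "Date", "Day", "Hell", "Loc", "Meas",
    "Mon", "Month", "NatOb", "Number", "O", "Org", "Para", "Pers", "Prophet",
    "Rlig", "Sect", "Time",
]
OUTSIDE = NER_LABELS.index("O")  # id of the "no entity" tag


def group_entities(rows):
    """Merge runs of adjacent rows sharing a non-"O" tag into (label, tokens) spans.

    ASSUMPTION: adjacent tokens with the same tag form one entity.
    """
    spans = []
    for tag_id, run in groupby(rows, key=lambda row: row["ner_tag"]):
        if tag_id == OUTSIDE:
            continue  # skip tokens that are not part of any entity
        spans.append((NER_LABELS[tag_id], [row["token"] for row in run]))
    return spans


# Placeholder rows for illustration only, not real corpus data.
rows = [
    {"token": "w1", "ner_tag": 16},
    {"token": "w2", "ner_tag": 16},
    {"token": "w3", "ner_tag": 13},
    {"token": "w4", "ner_tag": 7},
]
spans = group_entities(rows)
print(spans)  # [('Pers', ['w1', 'w2']), ('Loc', ['w4'])]
```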

Data Splits

Training split only
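Since only a train split is published, loading the data comes down to requesting that one split. The sketch below assumes the Hugging Face `datasets` package is installed and that the repository id from this page is reachable; the wrapper `load_caner_train` and the helper `check_row_count` are hypothetical names. The expected row count is the `num_examples` value from the card metadata.

```python
REPO_ID = "community-datasets/caner"
EXPECTED_TRAIN_ROWS = 258_240  # num_examples reported in the card metadata


def load_caner_train():
    """Download and return the train split.

    Needs `pip install datasets` and network access; the import is done
    lazily so this module can be loaded without the package.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train")


def check_row_count(n_rows: int) -> bool:
    """Compare a loaded row count with the figure in the card metadata."""
    return n_rows == EXPECTED_TRAIN_ROWS


if __name__ == "__main__":
    train = load_caner_train()
    print(train.features["ner_tag"].names)  # label names from the ClassLabel feature
    print(check_row_count(len(train)))
```

Anyone who needs evaluation data would have to carve it out of this split themselves, for example with `Dataset.train_test_split`; how to split a token-per-row corpus without cutting through sentences is left open here.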

Dataset Creation

Annotators

Ramzi Salah and Lailatul Qadri Zakaria

Additional Information

Citation Information

@article{article, author = {Salah, Ramzi and Zakaria, Lailatul}, year = {2018}, month = {12}, pages = {}, title = {BUILDING THE CLASSICAL ARABIC NAMED ENTITY RECOGNITION CORPUS (CANERCORPUS)}, volume = {96}, journal = {Journal of Theoretical and Applied Information Technology} }

Contributions

Thanks to @KMFODA for adding this dataset.
