HausaNLP/Naija-Stopwords

Name: HausaNLP/Naija-Stopwords
Creator: HausaNLP
Published: 2023-06-18 15:38:04
License: cc-by-nc-sa-4.0

Source: Hugging Face. Updated 2023-06-18; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/HausaNLP/Naija-Stopwords

Resource description:

---
license: cc-by-nc-sa-4.0
tags:
- sentiment analysis, Twitter, tweets
- stopwords
multilinguality:
- monolingual
- multilingual
language:
- hau
- ibo
- pcm
- yor
pretty_name: NaijaStopwords
---

# Naija-Stopwords

Naija-Stopwords is part of the [Naija-Senti](https://huggingface.co/datasets/HausaNLP/NaijaSenti-Twitter) project. It is a list of stopwords collected from the four most widely spoken languages in Nigeria: Hausa, Igbo, Nigerian-Pidgin, and Yorùbá.

---

## Dataset Description

- **Homepage:** https://github.com/hausanlp/NaijaSenti/tree/main/data/stopwords
- **Repository:** [GitHub](https://github.com/hausanlp/NaijaSenti/tree/main/data/stopwords)
- **Paper:** [NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis](https://aclanthology.org/2022.lrec-1.63/)
- **Leaderboard:** N/A
- **Point of Contact:** [Shamsuddeen Hassan Muhammad](mailto:shamsuddeen2004@gmail.com)

### Languages

The four most widely spoken Nigerian languages:

* Hausa (hau)
* Igbo (ibo)
* Nigerian Pidgin (pcm)
* Yoruba (yor)

## Dataset Structure

### Data Instances

A list of stopword instances for each of the four languages.

```
{
  "word": "string"
}
```

### How to use it

```python
from datasets import load_dataset

# You can load a specific language (e.g., Hausa). This downloads the train, validation, and test sets.
ds = load_dataset("HausaNLP/Naija-Stopwords", "hau")
```

## Additional Information

### Dataset Curators

* Shamsuddeen Hassan Muhammad
* Idris Abdulmumin
* Ibrahim Said Ahmad
* Bello Shehu Bello

### Licensing Information

The Naija-Stopwords dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.

### Citation Information

```
@inproceedings{muhammad-etal-2022-naijasenti,
    title = "{N}aija{S}enti: A {N}igerian {T}witter Sentiment Corpus for Multilingual Sentiment Analysis",
    author = "Muhammad, Shamsuddeen Hassan and
      Adelani, David Ifeoluwa and
      Ruder, Sebastian and
      Ahmad, Ibrahim Sa{'}id and
      Abdulmumin, Idris and
      Bello, Bello Shehu and
      Choudhury, Monojit and
      Emezue, Chris Chinenye and
      Abdullahi, Saheed Salahudeen and
      Aremu, Anuoluwapo and
      Jorge, Al{\'\i}pio and
      Brazdil, Pavel",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.63",
    pages = "590--602",
}
```

### Contributions

> This work was carried out with support from Lacuna Fund, an initiative co-founded by The Rockefeller Foundation, Google.org, and Canada’s International Development Research Centre. The views expressed herein do not necessarily represent those of Lacuna Fund, its Steering Committee, its funders, or Meridian Institute.

Provider:

HausaNLP

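Summary of Original Information

Dataset Overview

Dataset Name

Naija-Stopwords

Dataset Description

Naija-Stopwords is part of the Naija-Senti project. It collects stopwords from the four most widely spoken languages in Nigeria: Hausa, Igbo, Nigerian-Pidgin, and Yorùbá.

Dataset Details

License: cc-by-nc-sa-4.0
Tags: sentiment analysis, Twitter, tweets, stopwords
Multilinguality: monolingual, multilingual
Languages: hau, ibo, pcm, yor
Pretty name: NaijaStopwords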

Dataset Structure

Data instances: a list of stopwords for each of the four languages.
Data format (JSON): { "word": "string" }
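
Since each record carries a single "word" field, one natural use is to collect the words into a set and filter tokens against it. The following is a minimal sketch under that assumption; the helper names `build_stopword_set` and `remove_stopwords`, the whitespace tokenization, and the three sample records are illustrative placeholders, not part of the dataset or its documentation.

```python
from typing import Iterable, List, Mapping, Set


def build_stopword_set(records: Iterable[Mapping[str, str]]) -> Set[str]:
    """Collect the "word" field of each record into a lowercase set."""
    return {rec["word"].strip().lower() for rec in records if rec.get("word")}


def remove_stopwords(text: str, stopwords: Set[str]) -> List[str]:
    """Split on whitespace and drop tokens that appear in the stopword set."""
    return [tok for tok in text.split() if tok.lower() not in stopwords]


# Placeholder records in the documented {"word": "string"} shape.
# These are illustrative values, NOT entries copied from the dataset.
sample_records = [{"word": "da"}, {"word": "a"}, {"word": "ne"}]

stopwords = build_stopword_set(sample_records)
filtered = remove_stopwords("Ina da littafi a gida", stopwords)
print(filtered)
```

Whitespace splitting is the simplest possible tokenizer; tweets usually need punctuation and hashtag handling first, so a real pipeline would likely swap in its own tokenizer before the lookup.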

How to Use

```python
from datasets import load_dataset

# Load the dataset for a specific language (e.g., Hausa)
ds = load_dataset("HausaNLP/Naija-Stopwords", "hau")
```
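
The same call can be repeated for each of the four language codes. The sketch below is one possible way to do that, not an official recipe: `collect_words` and `load_all_stopwords` are hypothetical helpers, and the code assumes the loader returns a mapping of split names to iterables of `{"word": ...}` rows, which is how `load_dataset` behaves when no `split` argument is given. The loader is passed in as a parameter so the logic can be tried without network access.

```python
from typing import Any, Callable, Dict, Set

# Configuration names taken from the language list of the dataset card.
LANGUAGE_CODES = ("hau", "ibo", "pcm", "yor")


def collect_words(dataset_dict: Any) -> Set[str]:
    """Gather the "word" field from every row of every split."""
    words: Set[str] = set()
    for split in dataset_dict.values():
        for row in split:
            words.add(row["word"])
    return words


def load_all_stopwords(
    loader: Callable[[str, str], Any],
    repo_id: str = "HausaNLP/Naija-Stopwords",
) -> Dict[str, Set[str]]:
    """Call loader(repo_id, code) per language and return code -> word set."""
    return {code: collect_words(loader(repo_id, code)) for code in LANGUAGE_CODES}
```

With the `datasets` package installed and network access available, `load_all_stopwords(load_dataset)` should return a dictionary keyed by language code; this depends on the repository exposing one configuration per code, as the usage example suggests.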

Dataset Languages

Hausa (hau)
Igbo (ibo)
Nigerian Pidgin (pcm)
Yoruba (yor)

License Information

The Naija-Stopwords dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.

Citation Information

```bibtex
@inproceedings{muhammad-etal-2022-naijasenti,
    title = "{N}aija{S}enti: A {N}igerian {T}witter Sentiment Corpus for Multilingual Sentiment Analysis",
    author = "Muhammad, Shamsuddeen Hassan and
      Adelani, David Ifeoluwa and
      Ruder, Sebastian and
      Ahmad, Ibrahim Sa{'}id and
      Abdulmumin, Idris and
      Bello, Bello Shehu and
      Choudhury, Monojit and
      Emezue, Chris Chinenye and
      Abdullahi, Saheed Salahudeen and
      Aremu, Anuoluwapo and
      Jorge, Al{\'\i}pio and
      Brazdil, Pavel",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.63",
    pages = "590--602",
}
```
