UdS-LSV/hausa_voa_topics
Hugging Face. Updated 2024-08-08, indexed 2024-06-15.
Download link:
https://hf-mirror.com/datasets/UdS-LSV/hausa_voa_topics
Resource description:
---
annotations_creators:
- expert-generated
language_creators:
- found
language:
- ha
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- topic-classification
pretty_name: Hausa Voa News Topic Classification Dataset (HausaVoaTopics)
dataset_info:
  features:
  - name: news_title
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': Africa
          '1': Health
          '2': Nigeria
          '3': Politics
          '4': World
  splits:
  - name: train
    num_bytes: 144928
    num_examples: 2045
  - name: validation
    num_bytes: 20561
    num_examples: 290
  - name: test
    num_bytes: 41191
    num_examples: 582
  download_size: 124578
  dataset_size: 206680
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
# Dataset Card for Hausa VOA News Topic Classification dataset (hausa_voa_topics)
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** -
- **Repository:** https://github.com/uds-lsv/transfer-distant-transformer-african
- **Paper:** https://www.aclweb.org/anthology/2020.emnlp-main.204/
- **Leaderboard:** -
- **Point of Contact:** Michael A. Hedderich and David Adelani, {mhedderich, didelani} (at) lsv.uni-saarland.de
### Dataset Summary
A topic classification dataset of Hausa news headlines, similar to AG News. The headlines were collected from [VOA Hausa](https://www.voahausa.com/).
### Supported Tasks and Leaderboards
- `topic-classification`: Assign one of five topic labels to a Hausa news headline. No leaderboard is listed.
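The metadata gives the task as `topic-classification`. Each example is a headline (`news_title`) with one of five integer labels. The sketch below shows how `(news_title, label)` pairs could feed a simple classifier. It is a multinomial Naive Bayes baseline with add-one smoothing, written in plain Python. This is a swapped-in textbook baseline, not the method from the linked paper. The whitespace tokenizer, the class name `NaiveBayesTopicClassifier`, and the toy strings are all hypothetical and were made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercased whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


class NaiveBayesTopicClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, titles: list[str], labels: list[int]) -> "NaiveBayesTopicClassifier":
        self.class_counts = Counter(labels)       # label -> number of headlines
        self.word_counts = defaultdict(Counter)   # label -> token frequencies
        for title, label in zip(titles, labels):
            self.word_counts[label].update(tokenize(title))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict(self, title: str) -> int:
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of log P(token | label); unseen tokens are skipped
            score = math.log(n_docs / self.total_docs)
            for tok in tokenize(title):
                if tok in self.vocab:
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hypothetical training data (not from the dataset); labels use the card's IDs
# (1 = Health, 3 = Politics).
titles = ["zabe shugaban kasa", "zabe majalisa", "asibiti cutar", "cutar zazzabi"]
labels = [3, 3, 1, 1]
clf = NaiveBayesTopicClassifier().fit(titles, labels)
print(clf.predict("zabe gwamna"))    # 3
print(clf.predict("cutar asibiti"))  # 1
```

On the real splits you would fit on `train`, tune on `validation`, and report accuracy on `test`. A baseline like this is only a reference point for stronger models.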
### Languages
Hausa (ISO 639-1: ha)
## Dataset Structure
### Data Instances
An instance consists of a news title sentence and the corresponding topic label.
### Data Fields
- `news_title`: The news headline (string).
- `label`: The topic of the news headline, stored as an integer class ID: 0 = Africa, 1 = Health, 2 = Nigeria, 3 = Politics, 4 = World.
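The field description names the topics in a different order from their integer IDs. According to the card's `class_label` metadata, the IDs are assigned alphabetically: 0 Africa, 1 Health, 2 Nigeria, 3 Politics, 4 World. Below is a minimal plain-Python sketch of that mapping and of one instance's shape. `LABEL_NAMES`, `Instance`, and `label_name` are hypothetical helpers, not part of any library. With the Hugging Face `datasets` library, the same names should also be available from the split's `features["label"]`, which is a `ClassLabel`.

```python
from typing import TypedDict

# Class IDs as declared in the card's `class_label` metadata (alphabetical order).
LABEL_NAMES = ["Africa", "Health", "Nigeria", "Politics", "World"]

ID2LABEL = dict(enumerate(LABEL_NAMES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


class Instance(TypedDict):
    """Shape of one example: a headline and its integer topic ID."""
    news_title: str
    label: int


def label_name(example: Instance) -> str:
    """Return the topic name for an example's integer label."""
    return ID2LABEL[example["label"]]


# Hypothetical example; the headline is a placeholder, not taken from the dataset.
example: Instance = {"news_title": "<headline text>", "label": 2}
print(label_name(example))  # Nigeria
print(LABEL2ID["World"])    # 4
```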
### Data Splits
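| Split | Examples | Size (bytes) |
|---|---:|---:|
| train | 2,045 | 144,928 |
| validation | 290 | 20,561 |
| test | 582 | 41,191 |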
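The `dataset_info` metadata lists the split sizes (2,045, 290 and 582 examples) and their byte counts. The short plain-Python sketch below totals these figures and computes each split's share of the examples. It also checks that the per-split byte counts add up to `dataset_size`. The constants are copied from the metadata, and the helper name `split_shares` is hypothetical.

```python
# Split sizes and byte counts as listed in the card's `dataset_info` metadata.
SPLITS = {
    "train": {"num_examples": 2045, "num_bytes": 144928},
    "validation": {"num_examples": 290, "num_bytes": 20561},
    "test": {"num_examples": 582, "num_bytes": 41191},
}
DATASET_SIZE = 206680  # `dataset_size` field


def split_shares(splits: dict) -> dict:
    """Return each split's fraction of the total number of examples."""
    total = sum(s["num_examples"] for s in splits.values())
    return {name: s["num_examples"] / total for name, s in splits.items()}


total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

print(total_examples)               # 2917
print(total_bytes == DATASET_SIZE)  # True
for name, share in split_shares(SPLITS).items():
    print(f"{name}: {share:.1%}")   # train: 70.1%, validation: 9.9%, test: 20.0%
```

This works out to roughly a 70/10/20 train/validation/test split. The total of 2,917 examples is consistent with the `1K<n<10K` size category.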
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
The license is listed as `unknown` in the dataset metadata.
### Citation Information
[More Information Needed]
### Contributions
Thanks to [@michael-aloys](https://github.com/michael-aloys) for adding this dataset.
A topic classification dataset of Hausa news headlines, similar to AG News. The headlines were collected from VOA Hausa. Each example pairs a news title with one of five topic labels: Nigeria, Africa, World, Health or Politics. The data is split into training (2,045 examples), validation (290) and test (582) sets. The headline text was found (collected from VOA Hausa), and the topic labels are expert-generated.
Provided by:
UdS-LSV
Summary of original information
Hausa Voa News Topic Classification Dataset (HausaVoaTopics)
Dataset overview
Basic information
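- Dataset name: Hausa Voa News Topic Classification Dataset (HausaVoaTopics)
- Language: Hausa
- License: unknown
- Multilinguality: monolingual
- Dataset size: 1K<n<10K
- Source data: original
- Task category: text classification
- Task ID: topic classification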
Dataset structure
Features
- news_title: the news headline; data type: string.
- label: the topic of the news headline; data type: class label, with the following classes:
  - 0: Africa
  - 1: Health
  - 2: Nigeria
  - 3: Politics
  - 4: World
Data splits
- train: training set, 2045 instances, 144932 bytes.
- validation: validation set, 290 instances, 20565 bytes.
- test: test set, 582 instances, 41195 bytes.
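Dataset creation
Data source
- The news headlines were collected from VOA Hausa.
Annotations
- Annotation creators: expert-generated
Dataset size
- Download size: 195824 bytes
- Dataset size: 206692 bytes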