
UdS-LSV/hausa_voa_topics

Hugging Face | Updated 2024-08-08 | Indexed 2024-06-15
Download link:
https://hf-mirror.com/datasets/UdS-LSV/hausa_voa_topics
Resource description:
---
annotations_creators:
- expert-generated
language_creators:
- found
language:
- ha
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- topic-classification
pretty_name: Hausa Voa News Topic Classification Dataset (HausaVoaTopics)
dataset_info:
  features:
  - name: news_title
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': Africa
          '1': Health
          '2': Nigeria
          '3': Politics
          '4': World
  splits:
  - name: train
    num_bytes: 144928
    num_examples: 2045
  - name: validation
    num_bytes: 20561
    num_examples: 290
  - name: test
    num_bytes: 41191
    num_examples: 582
  download_size: 124578
  dataset_size: 206680
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---

# Dataset Card for Hausa VOA News Topic Classification dataset (hausa_voa_topics)

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** -
- **Repository:** https://github.com/uds-lsv/transfer-distant-transformer-african
- **Paper:** https://www.aclweb.org/anthology/2020.emnlp-main.204/
- **Leaderboard:** -
- **Point of Contact:** Michael A. Hedderich and David Adelani {mhedderich, didelani} (at) lsv.uni-saarland.de

### Dataset Summary

A news headline topic classification dataset, similar to AG-news, for Hausa. The news headlines were collected from [VOA Hausa](https://www.voahausa.com/).

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

Hausa (ISO 639-1: ha)

## Dataset Structure

### Data Instances

An instance consists of a news title sentence and the corresponding topic label.

### Data Fields

- `news_title`: A news title
- `label`: The label describing the topic of the news title. Can be one of the following classes: Nigeria, Africa, World, Health or Politics.

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

Thanks to [@michael-aloys](https://github.com/michael-aloys) for adding this dataset.
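The card does not show how to read the data, so the following is a minimal sketch of one way to do it, assuming the Hugging Face `datasets` library is installed and the Hub is reachable. The helper names `load_hausa_voa_topics` and `show_first_example` are hypothetical and introduced only for illustration; the repository id and the field names `news_title` and `label` come from the card.

```python
# Repository id as it appears in this listing.
DATASET_ID = "UdS-LSV/hausa_voa_topics"


def load_hausa_voa_topics(split=None):
    """Load the dataset from the Hugging Face Hub (hypothetical helper).

    Requires network access and the `datasets` package. With split=None,
    all available splits are returned together.
    """
    # Imported lazily so this file can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def show_first_example():
    """Print the first training headline with its topic name (hypothetical helper)."""
    train = load_hausa_voa_topics(split="train")
    example = train[0]
    # `label` is stored as an integer; the ClassLabel feature maps it to a name.
    label_name = train.features["label"].int2str(example["label"])
    print(example["news_title"], "->", label_name)
```

Calling `show_first_example()` downloads the data on first use, so it is left uncalled here.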

A news headline topic classification dataset for Hausa, similar to AG-news. The headlines were collected from VOA Hausa. Each instance pairs a news title with one of five topic labels: Nigeria, Africa, World, Health, or Politics. The dataset is split into training (2,045 instances), validation (290 instances), and test (582 instances) sets. The language of the dataset is Hausa; the annotations are expert-generated, and the text is original data from VOA Hausa.
Provider:
UdS-LSV
Summary of original information

Hausa Voa News Topic Classification Dataset (HausaVoaTopics)

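Dataset overview

Basic information

  • Dataset name: Hausa Voa News Topic Classification Dataset (HausaVoaTopics)
  • Language: Hausa
  • License: unknown
  • Multilinguality: monolingual
  • Dataset size: 1K<n<10K
  • Source data: original
  • Task category: text classification
  • Task ID: topic classification

Dataset structure

Features

  • news_title: the news title; data type: string.
  • label: the label describing the topic of the news title; data type: class label, with the following classes:
    • 0: Africa
    • 1: Health
    • 2: Nigeria
    • 3: Politics
    • 4: World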
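The id-to-name mapping above can be written down directly when working with raw integer labels. This is a small sketch; the names `LABEL_NAMES`, `ID2LABEL`, `LABEL2ID`, and `decode_label` are illustrative and not part of the dataset.

```python
# Class names in id order, as listed in the label feature above.
LABEL_NAMES = ["Africa", "Health", "Nigeria", "Politics", "World"]

# Lookup tables in both directions.
ID2LABEL = dict(enumerate(LABEL_NAMES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_label(label_id: int) -> str:
    """Return the topic name for an integer label id (raises KeyError if unknown)."""
    return ID2LABEL[label_id]
```

For example, `decode_label(2)` returns `"Nigeria"` and `LABEL2ID["World"]` is `4`.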

Data splits

  • train: training set, 2045 instances, 144932 bytes.
  • validation: validation set, 290 instances, 20565 bytes.
  • test: test set, 582 instances, 41195 bytes.
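To make the relative split sizes concrete, the instance counts listed above can be turned into proportions. This is a short sketch that uses only the numbers from this listing; the variable names are illustrative.

```python
# Instance counts per split, taken from the list above.
SPLIT_SIZES = {"train": 2045, "validation": 290, "test": 582}

# Total number of headlines across all splits.
TOTAL = sum(SPLIT_SIZES.values())

# Fraction of the whole dataset held by each split.
SPLIT_FRACTIONS = {name: count / TOTAL for name, count in SPLIT_SIZES.items()}

for name, fraction in SPLIT_FRACTIONS.items():
    print(f"{name}: {SPLIT_SIZES[name]} instances ({fraction:.1%})")
```

The counts sum to 2917 headlines, which works out to roughly a 70/10/20 split across train, validation, and test.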

Dataset creation

Data source

Annotations

  • Annotation creators: expert-generated

Dataset size

  • Download size: 195824 bytes
  • Dataset size: 206692 bytes