ghananlpcommunity/ghanaian-news-sentences
Source: Hugging Face | Updated 2026-02-25 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/ghananlpcommunity/ghanaian-news-sentences
Description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 676715008.0
    num_examples: 3860295
  download_size: 432387303
  dataset_size: 676715008.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Ghanaian News Sentences
A corpus of ~3.86 million sentences extracted from Ghanaian online news sources.
It is intended for language model training and fine-tuning on Ghanaian English,
the variety of English commonly used in Ghanaian media, which includes local
expressions, names, places, and topics specific to Ghana.
## Intended Use
- Continued pre-training or fine-tuning of language models on Ghanaian English
- Improving ASR (automatic speech recognition) transcription correction for Ghanaian news audio
- Text normalisation and error correction for Ghanaian news text
- Language modelling research focused on African English varieties
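Because the dataset ships only a train split, fine-tuning runs usually need a held-out set carved out of it. The following is a minimal sketch of one way to do that deterministically, by hashing each sentence. The helper name `assign_split`, the 1% holdout default, and the second sample sentence are illustrative assumptions, not part of the dataset or its tooling.

```python
import hashlib


def assign_split(text: str, holdout_percent: int = 1) -> str:
    """Deterministically map a sentence to 'train' or 'validation'.

    Hypothetical helper: the same text always lands in the same split,
    so the assignment is stable across runs and machines.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "validation" if bucket < holdout_percent else "train"


# Sample rows in the dataset's row format; the second sentence is made up.
rows = [
    {"text": "The Ghana Revenue Authority has announced new tax guidelines for small businesses."},
    {"text": "Parliament resumes sitting next week."},
]

splits = {"train": [], "validation": []}
for row in rows:
    splits[assign_split(row["text"])].append(row["text"])
```

Hashing the text, rather than sampling at random, keeps identical sentences in the same split, which avoids leaking a sentence from training into validation.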
## Dataset Details
| Property | Value |
|---|---|
| Language | English (Ghanaian variety) |
| Domain | News |
| Split | train only |
| Rows | 3,860,295 |
| Size (uncompressed) | ~677 MB |
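As a rough sanity check, the figures in the card metadata can be combined to estimate the average row size and how much smaller the download is than the uncompressed data. This is only a sketch: the byte count may include storage overhead beyond the raw text, so the per-row figure is an approximation rather than an exact sentence length.

```python
# Values copied from the card metadata above.
NUM_BYTES = 676_715_008      # dataset_size / num_bytes (uncompressed)
NUM_EXAMPLES = 3_860_295     # num_examples (rows)
DOWNLOAD_SIZE = 432_387_303  # download_size (bytes transferred)

# Average uncompressed bytes per row (approximate sentence size).
avg_bytes_per_row = NUM_BYTES / NUM_EXAMPLES

# Fraction of the uncompressed size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"average bytes per row: {avg_bytes_per_row:.1f}")
print(f"download size / dataset size: {download_ratio:.2f}")
```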
## Format
Each row contains a single `text` field with one sentence:
```json
{"text": "The Ghana Revenue Authority has announced new tax guidelines for small businesses."}
```
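A minimal sketch of reading rows in this format follows. It assumes the Hugging Face `datasets` library is installed and network access is available for the streaming call; `iter_sentences` and `stream_dataset` are hypothetical helper names introduced here for illustration, and the second sample row is made up to show blank-row handling.

```python
from typing import Iterable, Iterator


def iter_sentences(rows: Iterable[dict]) -> Iterator[str]:
    """Yield the stripped `text` value of each row, skipping empty rows."""
    for row in rows:
        text = row.get("text")
        if isinstance(text, str) and text.strip():
            yield text.strip()


def stream_dataset():
    """Stream the train split from the Hub without downloading it all first.

    Requires `pip install datasets` and network access.
    """
    # Imported lazily so iter_sentences works even without the library.
    from datasets import load_dataset

    return load_dataset(
        "ghananlpcommunity/ghanaian-news-sentences",
        split="train",
        streaming=True,
    )


# In-memory rows in the same shape as the dataset, for a quick offline check.
sample_rows = [
    {"text": "The Ghana Revenue Authority has announced new tax guidelines for small businesses."},
    {"text": "   "},  # made-up blank row, skipped by iter_sentences
]
sentences = list(iter_sentences(sample_rows))

# Against the real data (not run here):
#   from itertools import islice
#   for sentence in islice(iter_sentences(stream_dataset()), 5):
#       print(sentence)
```

Streaming avoids materialising the full download on disk before iteration starts, which can be convenient for a quick inspection of a few rows.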
## Source
Sentences were extracted and deduplicated from Ghanaian online news articles.
## Community
This dataset is maintained by the [Ghana NLP Community](https://huggingface.co/ghananlpcommunity).
Contributions, feedback, and collaborations are welcome.
Provider:
ghananlpcommunity



