ghananlpcommunity/ghanaian-news-sentences

Source: Hugging Face. Updated 2026-02-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/ghananlpcommunity/ghanaian-news-sentences
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 676715008.0
    num_examples: 3860295
  download_size: 432387303
  dataset_size: 676715008.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Ghanaian News Sentences

A corpus of ~3.86 million sentences extracted from Ghanaian online news sources. Intended for language model training and fine-tuning on Ghanaian English, the variety of English commonly used in Ghanaian media, which includes local expressions, names, places, and topics specific to Ghana.

## Intended Use

- Continued pre-training or fine-tuning of language models on Ghanaian English
- Improving ASR (automatic speech recognition) transcription correction for Ghanaian news audio
- Text normalisation and error correction for Ghanaian news text
- Language modelling research focused on African English varieties

## Dataset Details

| Property | Value |
|---|---|
| Language | English (Ghanaian variety) |
| Domain | News |
| Split | train only |
| Rows | 3,860,295 |
| Size (uncompressed) | ~677 MB |

## Format

Each row contains a single `text` field with one sentence:

```json
{"text": "The Ghana Revenue Authority has announced new tax guidelines for small businesses."}
```

## Source

Sentences were extracted and deduplicated from Ghanaian online news articles.

## Community

This dataset is maintained by the [Ghana NLP Community](https://huggingface.co/ghananlpcommunity). Contributions, feedback, and collaborations are welcome.
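
The row format described in this card (one JSON object per row with a single `text` field) can be consumed roughly as in the sketch below. It is a minimal illustration and not an official loader: the helper names `parse_rows`, `stream_sentences`, and `mean_bytes_per_sentence` are hypothetical, and `stream_sentences` assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID shown in this listing.

```python
import json
from itertools import islice
from typing import Iterable, Iterator

# Repository ID as it appears in this listing.
REPO_ID = "ghananlpcommunity/ghanaian-news-sentences"


def parse_rows(lines: Iterable[str]) -> Iterator[str]:
    """Yield the `text` field of each JSON row, skipping blank or empty rows."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        text = row.get("text")
        if isinstance(text, str) and text:
            yield text


def stream_sentences(limit: int) -> Iterator[str]:
    """Stream up to `limit` sentences from the Hub without a full download.

    Assumption: the `datasets` package is installed and the Hub is reachable.
    The import is kept local so the rest of the module works without it.
    """
    from datasets import load_dataset

    dataset = load_dataset(REPO_ID, split="train", streaming=True)
    for row in islice(dataset, limit):
        yield row["text"]


def mean_bytes_per_sentence(num_bytes: float, num_examples: int) -> float:
    """Average uncompressed size of one row, from the card's metadata."""
    return num_bytes / num_examples


if __name__ == "__main__":
    sample = [
        '{"text": "The Ghana Revenue Authority has announced new tax '
        'guidelines for small businesses."}',
    ]
    for sentence in parse_rows(sample):
        print(sentence)
    # Figures taken from the dataset_info block of the card.
    print(mean_bytes_per_sentence(676715008.0, 3860295))
```

Streaming is used here because the train split is roughly 677 MB uncompressed; for repeated experiments, a regular (non-streaming) load that caches the data locally may be more convenient.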
Provider:
ghananlpcommunity