dsfsi/daily-news-dikgang
Source: Hugging Face. Updated 2023-10-26; indexed 2024-03-04.
Download link: https://hf-mirror.com/datasets/dsfsi/daily-news-dikgang
---
license: cc-by-sa-4.0
task_categories:
- text-classification
language:
- tn
size_categories:
- 1K<n<10K
---
# Daily News Dikgang
[arXiv:2310.09141](https://arxiv.org/abs/2310.09141)
Give Feedback 📑: [DSFSI Resource Feedback Form](https://docs.google.com/forms/d/e/1FAIpQLSf7S36dyAUPx2egmXbFpnTBuzoRulhL5Elu-N1eoMhaO7v10w/formResponse)
## About dataset
The dataset contains annotated, categorised news data from Dikgang - Daily News [https://dailynews.gov.bw/news-list/srccategory/10](https://dailynews.gov.bw/news-list/srccategory/10). The text is in Setswana.
See the [Data Statement](DataStatementPuoBERTaDailyNewsDikgang.pdf) for full details.
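A minimal sketch of how the dataset might be loaded and inspected, assuming the third-party `datasets` package is installed and the repository loads without extra configuration. The helper names `load_dikgang` and `label_distribution` are hypothetical, and the split name `"train"` and column name `"label"` in the usage comment are assumptions; print the loaded object to see the actual splits and columns.

```python
from collections import Counter

DATASET_ID = "dsfsi/daily-news-dikgang"  # repository id from this card


def load_dikgang():
    """Download the dataset from the Hugging Face Hub.

    Assumes the third-party `datasets` package is installed and that the
    repository can be loaded without extra configuration.
    """
    from datasets import load_dataset  # imported lazily: optional dependency

    return load_dataset(DATASET_ID)


def label_distribution(records, label_key):
    """Count how many records carry each category label.

    `label_key` is a placeholder: check the loaded dataset's column names
    for the real label field before calling this.
    """
    return Counter(record[label_key] for record in records)


# Example usage (requires network access); "train" and "label" are assumptions:
#   splits = load_dikgang()
#   print(splits)  # shows the actual split and column names
#   print(label_distribution(splits["train"], "label"))
```

Checking the label distribution first is a quick way to spot class imbalance before training a text classifier on the data.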
Disclaimer
-------
This dataset contains machine-readable data extracted from online news articles, from [https://dailynews.gov.bw/news-list/srccategory/10](https://dailynews.gov.bw/news-list/srccategory/10), provided by the Botswana Government. While efforts were made to ensure the accuracy and completeness of this data, there may be errors or discrepancies between the original publications and this dataset. No warranties, guarantees or representations are given in relation to the information contained in the dataset. The members of the Data Science for Societal Impact Research Group bear no responsibility and/or liability for any such errors or discrepancies in this dataset. The Botswana Government bears no responsibility and/or liability for any such errors or discrepancies in this dataset. It is recommended that users verify all information contained herein before making decisions based upon this information.
Authors
-------
- Vukosi Marivate - [@vukosi](https://twitter.com/vukosi)
- Valencia Wagner
Citation
--------
BibTeX reference
```
@inproceedings{marivate2023puoberta,
title = {PuoBERTa: Training and evaluation of a curated language model for Setswana},
author = {Vukosi Marivate and Moseli Mots'Oehli and Valencia Wagner and Richard Lastrucci and Isheanesu Dzingirai},
year = {2023},
booktitle= {SACAIR 2023 (To Appear)},
keywords = {NLP},
preprint_url = {https://arxiv.org/abs/2310.09141},
dataset_url = {https://github.com/dsfsi/PuoBERTa},
software_url = {https://huggingface.co/dsfsi/PuoBERTa}
}
```
Licences
-------
The News Categorisation dataset is licensed under CC-BY-SA-4.0. The monolingual data have different licenses, depending on the license of each source news website.
* License for Data - [CC-BY-SA-4.0](LICENSE.data.md)
Provider: dsfsi



