
dsfsi/daily-news-dikgang

Hugging Face | Updated 2023-10-26 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/dsfsi/daily-news-dikgang
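The dataset can also be loaded programmatically. Below is a minimal Python sketch. It assumes that the Hugging Face `datasets` library is installed. It also assumes that hf-mirror.com serves the Hub API when the `HF_ENDPOINT` environment variable points at it; this page does not state that. The helper names `dataset_page_url` and `load_daily_news` are hypothetical. Split and column names are not documented here, so the sketch does not assume any.

```python
import os

REPO_ID = "dsfsi/daily-news-dikgang"
MIRROR = "https://hf-mirror.com"  # assumption: mirror exposes the Hub API


def dataset_page_url(repo_id: str, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL on a Hugging Face-compatible endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_daily_news(use_mirror: bool = True):
    """Download the dataset with `datasets.load_dataset` (network access required)."""
    if use_mirror:
        # huggingface_hub reads HF_ENDPOINT when it is first imported,
        # so the variable must be set before `datasets` is imported.
        # setdefault keeps any endpoint the user has already configured.
        os.environ.setdefault("HF_ENDPOINT", MIRROR)
    # Imported lazily so this module loads even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID)
```

Calling `load_daily_news()` requires network access. Printing the returned object shows which splits and columns are actually available. If `datasets` was already imported earlier in the same process, setting `HF_ENDPOINT` at this point has no effect.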
Resource description:
---
license: cc-by-sa-4.0
task_categories:
- text-classification
language:
- tn
size_categories:
- 1K<n<10K
---

# Daily News Dikgang

[![arXiv](https://img.shields.io/badge/arXiv-2310.09141-b31b1b.svg)](https://arxiv.org/abs/2310.09141)

Give Feedback 📑: [DSFSI Resource Feedback Form](https://docs.google.com/forms/d/e/1FAIpQLSf7S36dyAUPx2egmXbFpnTBuzoRulhL5Elu-N1eoMhaO7v10w/formResponse)

## About dataset

The dataset contains annotated, categorised news data from Dikgang - Daily News ([https://dailynews.gov.bw/news-list/srccategory/10](https://dailynews.gov.bw/news-list/srccategory/10)). The data is in Setswana. See the [Data Statement](DataStatementPuoBERTaDailyNewsDikgang.pdf) for full details.

Disclaimer
-------

This dataset contains machine-readable data extracted from online news articles published at [https://dailynews.gov.bw/news-list/srccategory/10](https://dailynews.gov.bw/news-list/srccategory/10) by the Botswana Government. While efforts were made to ensure the accuracy and completeness of this data, there may be errors or discrepancies between the original publications and this dataset. No warranties, guarantees or representations are given in relation to the information contained in the dataset. The members of the Data Science for Societal Impact Research Group bear no responsibility and/or liability for any such errors or discrepancies in this dataset. The Botswana Government bears no responsibility and/or liability for any such errors or discrepancies in this dataset. Users are advised to verify all information contained herein before making decisions based on it.

Authors
-------

- Vukosi Marivate - [@vukosi](https://twitter.com/vukosi)
- Valencia Wagner

Citation
--------

BibTeX reference:

```
@inproceedings{marivate2023puoberta,
  title = {PuoBERTa: Training and evaluation of a curated language model for Setswana},
  author = {Vukosi Marivate and Moseli Mots'Oehli and Valencia Wagner and Richard Lastrucci and Isheanesu Dzingirai},
  year = {2023},
  booktitle = {SACAIR 2023 (To Appear)},
  keywords = {NLP},
  preprint_url = {https://arxiv.org/abs/2310.09141},
  dataset_url = {https://github.com/dsfsi/PuoBERTa},
  software_url = {https://huggingface.co/dsfsi/PuoBERTa}
}
```

Licences
-------

The News Categorisation dataset is licensed under CC-BY-SA-4.0. The monolingual data have different licenses, depending on the license of each news website.

* License for Data - [CC-BY-SA-4.0](LICENSE.data.md)
Provided by:
dsfsi
Summary of original information

Daily News Dikgang dataset overview

Basic information

  • License: CC-BY-SA-4.0
  • Task category: Text classification
  • Language: Setswana (tn)
  • Dataset size: 1K<n<10K
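The size field uses Hugging Face's bucketed `size_categories` labels rather than an exact count. "1K<n<10K" means the dataset has between one thousand and ten thousand instances. Below is a minimal Python sketch of that mapping. It assumes that the bucket list matches Hugging Face's convention. It also treats each bucket as half-open (exactly 1,000 falls in `1K<n<10K`), a boundary rule that the labels themselves leave ambiguous. `size_category` is a hypothetical helper.

```python
# Assumed list of Hugging Face size_categories labels, keyed by exclusive upper bound.
_BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
    (10_000_000_000, "1B<n<10B"),
    (100_000_000_000, "10B<n<100B"),
    (1_000_000_000_000, "100B<n<1T"),
]


def size_category(n: int) -> str:
    """Map an instance count to a size label (hypothetical helper, half-open buckets)."""
    if n < 0:
        raise ValueError("instance count must be non-negative")
    for upper, label in _BUCKETS:
        if n < upper:  # first bucket whose exclusive upper bound exceeds n
            return label
    return "n>1T"
```

For example, `size_category(5_000)` returns `"1K<n<10K"`, the label reported for this dataset.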

Dataset description

Authors

  • Vukosi Marivate - @vukosi
  • Valencia Wagner

Citation

@inproceedings{marivate2023puoberta,
  title = {PuoBERTa: Training and evaluation of a curated language model for Setswana},
  author = {Vukosi Marivate and Moseli Mots'Oehli and Valencia Wagner and Richard Lastrucci and Isheanesu Dzingirai},
  year = {2023},
  booktitle = {SACAIR 2023 (To Appear)},
  keywords = {NLP},
  preprint_url = {https://arxiv.org/abs/2310.09141},
  dataset_url = {https://github.com/dsfsi/PuoBERTa},
  software_url = {https://huggingface.co/dsfsi/PuoBERTa}
}

License

  • Dataset license: CC-BY-SA-4.0
  • Monolingual data license: varies according to each news website's license