CLEAR-Global/kanuri-books-corpus
Source: Hugging Face · Updated: 2023-10-26 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/CLEAR-Global/kanuri-books-corpus
Resource summary:
This dataset contains 10,281 randomized sentences (90,706 words) collected from books written by Kanuri authors, including Dr. Baba Kura Alkali Gazali, Lawan Dalama, Kaka Gana Abba, and Lawan Hassan. The sentences remain copyrighted to the original authors, while the compiled corpus is licensed under Attribution 4.0 International (CC BY 4.0). The corpus is intended for the development of open-source language technology. Use must comply with the terms prohibiting harmful use, and CLEAR Global and the authors must be acknowledged, including when citing the corpus.
Provided by:
CLEAR-Global
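Summary of original metadata
Dataset overview
Basic information
- License: CC BY 4.0
- Language: Kanuri (kr)
- Size category: 10K<n<100K
Content description
- Number of sentences: 10,281
- Number of words: 90,706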
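The card reports sentence and word totals but does not say how words were counted. The sketch below shows one way such statistics could be computed over a list of sentences, assuming whitespace tokenization and one sentence per non-blank string; `corpus_stats` is a hypothetical helper, and its counts may not match the card's official figures. The sentences themselves could be obtained from the download link above (for example with the Hugging Face `datasets` library), but the split and column names are not given here.

```python
from typing import Iterable


def corpus_stats(sentences: Iterable[str]) -> dict:
    """Count sentences and words in a list of sentence strings.

    Assumption (hypothetical helper): a "word" is any whitespace-separated
    token. The card does not state how its 90,706-word total was computed.
    """
    n_sentences = 0
    n_words = 0
    for sentence in sentences:
        text = sentence.strip()
        if not text:
            continue  # skip blank entries so they are not counted as sentences
        n_sentences += 1
        n_words += len(text.split())
    # Guard against division by zero on an empty input
    avg = n_words / n_sentences if n_sentences else 0.0
    return {
        "sentences": n_sentences,
        "words": n_words,
        "avg_words_per_sentence": avg,
    }


# Toy input: placeholder strings, not real corpus data
sample = ["one two three", "four five", "", "six"]
print(corpus_stats(sample))
```

Applied to the card's own totals, the average works out to 90,706 / 10,281 ≈ 8.82 words per sentence.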
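Copyright and usage
- Sentence copyright: held by the original authors
- Dataset license: Attribution 4.0 International (CC BY 4.0)
- Usage conditions: harmful use is prohibited; users must credit CLEAR Global and the original authors
Citation information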
- Reference: Alp Öktem, Muhannad Albayk Jaam, Eric DeLuca, Grace Tang. "Gamayun – Language Technology for Humanitarian Response." In: 2020 IEEE Global Humanitarian Technology Conference (GHTC), October 29-November 1, 2020, Virtual.



