Imsidag-community/kabyle-corpus-hca
Source: Hugging Face. Updated 2025-10-17. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/Imsidag-community/kabyle-corpus-hca
Description:
The Kabyle Paragraph Corpus (HCA Algeria) consists of 188,620 Kabyle sentences extracted from open-access PDFs published on HCA Algeria, the institutional repository of the Haut Commissariat à l'Amazighité (Algeria), which is also the source of the documents. The corpus is only partially cleaned. Cleaning is done with a local tool, which will be made publicly available soon. The corpus is split into training, validation, and test sets containing 169,758, 9,431, and 9,431 records respectively.
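
As a quick sanity check, the split sizes stated in the description can be verified against the stated total, and the dataset can be fetched from the Hugging Face Hub. The sketch below is only illustrative: the split names (`train`, `validation`, `test`), the helper names, and the assumption that the repository loads with default settings through the `datasets` package are not confirmed by this listing.

```python
# Split sizes and total as stated in the description.
# The split names used as keys are an assumption, not taken from the listing.
SPLIT_SIZES = {"train": 169_758, "validation": 9_431, "test": 9_431}
TOTAL = 188_620


def split_fractions(sizes):
    """Return each split's share of the total record count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_corpus(repo_id="Imsidag-community/kabyle-corpus-hca"):
    """Load the corpus from the Hugging Face Hub (needs network access).

    Hypothetical helper: assumes the `datasets` package is installed and
    that the repository loads with default settings.
    """
    # Imported lazily so the arithmetic checks above run without the package.
    from datasets import load_dataset

    return load_dataset(repo_id)


if __name__ == "__main__":
    # The three splits should account for every record in the corpus.
    assert sum(SPLIT_SIZES.values()) == TOTAL
    for name, fraction in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {fraction:.1%}")
```

The stated sizes add up to 188,620 and correspond to a 90% / 5% / 5% split. Because the corpus is only partially cleaned, it may be worth inspecting a sample of records after loading before using them for training.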
Provider:
Imsidag-community



