MBZUAI-Paris/Darija-SFT-Mixture
Source: Hugging Face | Updated: 2025-05-02 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/MBZUAI-Paris/Darija-SFT-Mixture
Resource summary:
Darija-SFT-Mixture is a supervised fine-tuning (SFT) data mixture built for Moroccan Arabic (Darija). It contains several types of instructions, including translation, sentiment analysis, and question answering. It was assembled by consolidating existing Darija resources, creating new datasets, and translating English instructions, and it was used to train the Atlas-Chat-2B and Atlas-Chat-9B models. Its sources include DODa-10k, MADAR, and NLLB-Seed, among others, and it is released under the ODC-BY license.
Provided by:
MBZUAI-Paris
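Because the mixture pools several sources, a quick first step is to load it and see how many examples each source contributes. The sketch below is a minimal illustration, not an official loader. It assumes the Hugging Face `datasets` library is installed, that the repository exposes a `train` split, and that each row has a column naming its origin, called `dataset` here. That column name is a hypothetical placeholder, so check the real schema printed by `print(ds)` or shown on the dataset card. Loading needs network access and, depending on the repository's access settings, possibly a Hugging Face login.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_source(records: Iterable[Mapping], source_key: str = "dataset") -> Counter:
    """Tally how many examples each source contributes to the mixture.

    ASSUMPTION: each record has a column naming its origin. "dataset" is a
    hypothetical default; pass the real column name from the dataset card.
    Records missing the column are counted under "<unknown>".
    """
    return Counter(str(r.get(source_key, "<unknown>")) for r in records)


def load_mixture(split: str = "train"):
    """Load the mixture with the Hugging Face `datasets` library (network required).

    ASSUMPTION: a split named "train" exists in the repository.
    """
    # Imported lazily so count_by_source works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("MBZUAI-Paris/Darija-SFT-Mixture", split=split)


if __name__ == "__main__":
    ds = load_mixture()
    print(ds)  # shows the actual column names and row count
    if "dataset" in ds.column_names:
        # Iterating a datasets.Dataset yields one dict per row.
        for source, n in count_by_source(ds).most_common():
            print(f"{source}: {n}")
```

If you reach the Hub through the hf-mirror.com mirror linked above, `huggingface_hub` (which `datasets` uses for downloads) reads the endpoint from the `HF_ENDPOINT` environment variable, so set it before running the script.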



