MBZUAI-Paris/Darija-SFT-Mixture

Source: Hugging Face. Updated: 2025-05-02. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/MBZUAI-Paris/Darija-SFT-Mixture
Description:
Darija-SFT-Mixture is an instruction dataset mixture for Darija (Moroccan Arabic). It covers several instruction types, including translation, sentiment analysis, and question answering. The mixture was built by consolidating existing Darija resources, creating new datasets, and translating English instructions, and it was used to train the Atlas-Chat-2B and Atlas-Chat-9B models. Its sources include DODa-10k, MADAR, and NLLB-Seed, among others, and it is released under the ODC-BY license.
Provider:
MBZUAI-Paris