M2Lingual
Source: arXiv; indexed 2025-09-30
Download link:
https://huggingface.co/datasets/ServiceNow-AI/M2Lingual
Resource overview:
M2Lingual is a fully synthetic, multilingual, multi-turn instruction fine-tuning (IFT) dataset covering a diverse set of languages and tasks. It is built using a novel taxonomy and guided generation code based on the Evol taxonomy. The dataset contains 182,000 IFT pairs in total, spanning 70 languages and more than 17 natural language processing (NLP) tasks. Its intended use is instruction fine-tuning.
Provided by:
ServiceNow AI



