laihuiyuan/mCoT-MATH
Source: Hugging Face. Updated 2024-06-05. Indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/laihuiyuan/mCoT-MATH
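As a minimal sketch of how the dataset might be fetched, the snippet below builds the mirror URL from the repository ID and streams a few rows with the Hugging Face `datasets` library. The helper names (`dataset_url`, `stream_first_rows`) are hypothetical, and the split name `"train"` is an assumption that this page does not confirm.

```python
REPO_ID = "laihuiyuan/mCoT-MATH"           # repository ID from the page title
MIRROR_ENDPOINT = "https://hf-mirror.com"  # host taken from the download link


def dataset_url(endpoint: str = MIRROR_ENDPOINT, repo_id: str = REPO_ID) -> str:
    """Build the dataset page URL for a given endpoint (hypothetical helper)."""
    return f"{endpoint}/datasets/{repo_id}"


def stream_first_rows(n: int = 3, split: str = "train"):
    """Stream the first n rows without downloading the whole dataset.

    Assumptions: the `datasets` package is installed (pip install datasets)
    and a split named "train" exists; neither is confirmed by this page.
    """
    from datasets import load_dataset  # imported lazily so the URL helper has no dependency

    ds = load_dataset(REPO_ID, split=split, streaming=True)
    return list(ds.take(n))


if __name__ == "__main__":
    print(dataset_url())
```

Streaming is used here because the dataset is described as holding roughly 6.3 million samples, so inspecting a handful of rows first avoids a full download.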
Resource summary:
mCoT is a multilingual mathematical reasoning dataset built on MetaMathQA and MathInstruct. It contains approximately 6.3 million samples and covers 11 languages: Swahili, Bengali, Telugu, Thai, Japanese, Chinese, Russian, Spanish, French, German, and English. The dataset was used to train a 7B-parameter multilingual mathematical reasoning model, which is reported to perform well across these languages.
Provided by:
laihuiyuan
Summary of the original information
mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models
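Dataset overview
Introduction
Based on MetaMathQA and MathInstruct, we compiled mCoT-MATH using machine translation. It is the first large-scale multilingual math chain-of-thought (CoT) reasoning dataset, containing around 6.3 million samples across 11 diverse languages.
Dataset size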
| Language | SW | BN | TE | TH | JA | ZH | RU | ES | FR | DE | EN | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mCoT-MATH | ~580K | ~580K | ~580K | ~580K | ~580K | ~580K | ~580K | ~580K | ~580K | ~580K | ~580K | ~6.3M |
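A quick arithmetic check of the table can be sketched as follows. It multiplies the rounded per-language figure by the number of languages; the exact per-language counts are not given here, so the result is only an approximation. The rounded figures give about 6.38M, slightly above the stated ~6.3M total, which suggests the per-language counts sit a little below 580K.

```python
# Language codes exactly as listed in the table header.
LANGUAGES = ["SW", "BN", "TE", "TH", "JA", "ZH", "RU", "ES", "FR", "DE", "EN"]

# Rounded per-language sample count from the table ("~580K"); an approximation.
PER_LANGUAGE_APPROX = 580_000

# Upper-end estimate of the total implied by the rounded figures.
approx_total = len(LANGUAGES) * PER_LANGUAGE_APPROX

print(f"{len(LANGUAGES)} languages x ~{PER_LANGUAGE_APPROX:,} = ~{approx_total:,}")
```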
Citation
If you use any content from this repository, please cite our paper:
@inproceedings{lai-etal-2024-mcot,
    title = "mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models",
    author = "Lai, Huiyuan and Nissim, Malvina",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics",
    month = aug,
    address = "Bangkok, Thailand",
    year = "2024",
    publisher = "Association for Computational Linguistics"
}