MMMLU
ModelScope Community | Updated 2026-05-12 | Indexed 2024-09-28
Download link:
https://modelscope.cn/datasets/AI-ModelScope/MMMLU
# Multilingual Massive Multitask Language Understanding (MMMLU)
MMLU is a widely recognized benchmark of the general knowledge acquired by AI models. It spans 57 subjects, ranging from elementary-level knowledge to advanced professional topics such as law, physics, history, and computer science.
We translated the MMLU test set into 14 languages using professional human translators. Relying on human translators for this evaluation increases confidence in the accuracy of the translations, especially for low-resource languages such as Yoruba. We are publishing the professional human translations together with the code we use to run the evaluations.
This effort reflects our commitment to improving the multilingual capabilities of AI models so that they perform accurately across languages, including those spoken by underrepresented communities. By prioritizing high-quality translations, we aim to make AI technology more inclusive and effective for users worldwide.
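The evaluation code itself lives in the OpenAI simple-evals repository linked under Sources. As a rough, non-authoritative illustration of how a translated item can be scored, the sketch below formats one multiple-choice question as a prompt, extracts a letter from a model response, and computes accuracy. It assumes each row exposes `Question`, `A`, `B`, `C`, `D`, and `Answer` fields (check the column names in the files you actually download) and that the model is told to finish with a line like `Answer: B`. The prompt template, the regex, and the helper names are hypothetical and are not the simple-evals implementation.

```python
import re
from typing import Iterable, Mapping, Optional

# Hypothetical prompt template; not the exact wording used by simple-evals.
PROMPT_TEMPLATE = (
    "Answer the following multiple choice question. "
    "The last line of your response should be 'Answer: X', where X is one of A, B, C, D.\n\n"
    "{Question}\n\nA) {A}\nB) {B}\nC) {C}\nD) {D}"
)

# Assumption: responses use the English word "Answer" and an ASCII colon.
# Localized responses (e.g. a full-width colon) would need extra patterns.
ANSWER_RE = re.compile(r"Answer\s*:\s*([ABCD])")


def format_prompt(row: Mapping[str, str]) -> str:
    """Build a prompt from one row; extra keys such as 'Answer' are ignored by str.format."""
    return PROMPT_TEMPLATE.format(**row)


def extract_answer(response: str) -> Optional[str]:
    """Return the letter from the last 'Answer: X' match, or None if there is none."""
    matches = ANSWER_RE.findall(response)
    return matches[-1] if matches else None


def accuracy(rows: Iterable[Mapping[str, str]], responses: Iterable[str]) -> float:
    """Fraction of rows whose extracted letter equals the gold 'Answer' field."""
    pairs = list(zip(rows, responses))
    if not pairs:
        return 0.0
    correct = sum(extract_answer(resp) == row["Answer"] for row, resp in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    # Toy row for illustration only; it is not taken from the dataset.
    row = {"Question": "2 + 2 = ?", "A": "3", "B": "4", "C": "5", "D": "6", "Answer": "B"}
    print(format_prompt(row))
    print(accuracy([row], ["Some reasoning...\nAnswer: B"]))  # 1.0
```

Taking the last match rather than the first is a deliberate choice in this sketch: models often mention several options while reasoning before committing to a final answer.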
## Locales
MMMLU contains the MMLU test set translated into the following locales:
* AR_XY (Arabic)
* BN_BD (Bengali)
* DE_DE (German)
* ES_LA (Spanish)
* FR_FR (French)
* HI_IN (Hindi)
* ID_ID (Indonesian)
* IT_IT (Italian)
* JA_JP (Japanese)
* KO_KR (Korean)
* PT_BR (Brazilian Portuguese)
* SW_KE (Swahili)
* YO_NG (Yoruba)
* ZH_CN (Simplified Chinese)
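For scripts that loop over the translations, the locale list above can be kept as a plain mapping. The sketch below stores the 14 codes with their language names, splits a code into its language and region parts, and combines per-locale accuracies into an unweighted mean. The dictionary name, the helper functions, and the choice of an unweighted mean across locales are illustrative assumptions, not a reporting format prescribed by the dataset authors.

```python
from statistics import mean
from typing import Dict, Tuple

# Locale codes and language names, copied from the list above.
MMMLU_LOCALES: Dict[str, str] = {
    "AR_XY": "Arabic",
    "BN_BD": "Bengali",
    "DE_DE": "German",
    "ES_LA": "Spanish",
    "FR_FR": "French",
    "HI_IN": "Hindi",
    "ID_ID": "Indonesian",
    "IT_IT": "Italian",
    "JA_JP": "Japanese",
    "KO_KR": "Korean",
    "PT_BR": "Brazilian Portuguese",
    "SW_KE": "Swahili",
    "YO_NG": "Yoruba",
    "ZH_CN": "Simplified Chinese",
}


def split_locale(code: str) -> Tuple[str, str]:
    """Split a code like 'ZH_CN' into ('zh', 'CN'): lowercase language, uppercase region."""
    language, region = code.split("_")
    return language.lower(), region.upper()


def macro_average(scores: Dict[str, float]) -> float:
    """Unweighted mean of per-locale accuracies; rejects codes not in MMMLU_LOCALES."""
    unknown = set(scores) - set(MMMLU_LOCALES)
    if unknown:
        raise ValueError(f"Unknown locale codes: {sorted(unknown)}")
    return mean(scores.values())
```

An unweighted mean gives Yoruba the same weight as German, which keeps weaknesses on low-resource languages visible in the summary number; per-locale scores are still worth reporting alongside it.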
## Sources
Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D., & Steinhardt, J. (2021). [*Measuring Massive Multitask Language Understanding*](https://arxiv.org/abs/2009.03300).
[OpenAI Simple Evals GitHub Repository](https://github.com/openai/simple-evals)
Provided by:
maas
Created:
2024-09-24



