dy-jumlu
Source: ModelScope community (魔搭社区). Updated 2025-12-03; indexed 2025-09-27.
Download link:
https://modelscope.cn/datasets/aicxoct/dy-jumlu
Description:
# Multilingual Massive Multitask Language Understanding (JUMLU)
MMLU is a widely recognized benchmark of the general knowledge attained by AI models. It spans 57 categories, from elementary-level knowledge to advanced professional subjects such as law, physics, history, and computer science.
We translated the MMLU’s jumlu set into 14 languages using professional human translators. Relying on human translators for this evaluation increases confidence in the accuracy of the translations, especially for low-resource languages like Yoruba. We are publishing the professional human translations and the code we use to run the evaluations.
This effort reflects our commitment to improving the multilingual capabilities of AI models, ensuring they perform accurately across languages, particularly for underrepresented communities. By prioritizing high-quality translations, we aim to make AI technology more inclusive and effective for users worldwide.
## Locales
JUMLU contains the MMLU jumlu set translated into the following locales:
* DE_DE (German)
* ES_LA (Latin American Spanish)
* JA_JP (Japanese)
* ZH_CN (Simplified Chinese)
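
Because the benchmark is multiple-choice, results are usually reported as accuracy, and for a multilingual set it is natural to break that down by locale. The following is a minimal sketch of such a breakdown, not the evaluation code referenced in this card. The record keys (`locale`, `answer`, `prediction`) and the sample records are hypothetical and chosen only for illustration; the actual column names in the dataset files may differ.

```python
from collections import defaultdict

# Locale codes listed in this card.
LOCALES = ("DE_DE", "ES_LA", "JA_JP", "ZH_CN")


def accuracy_by_locale(records):
    """Return {locale: fraction correct} for multiple-choice records.

    Each record is a dict with hypothetical keys (assumed, not taken
    from the dataset schema):
      'locale'     - one of LOCALES
      'answer'     - gold answer letter, e.g. 'A'..'D'
      'prediction' - the model's answer letter
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        locale = rec["locale"]
        if locale not in LOCALES:
            raise ValueError(f"unknown locale: {locale}")
        total[locale] += 1
        # Compare letters case-insensitively, ignoring stray whitespace.
        if rec["prediction"].strip().upper() == rec["answer"].strip().upper():
            correct[locale] += 1
    return {loc: correct[loc] / total[loc] for loc in total}


# Made-up records for illustration only; these are not real model results.
sample = [
    {"locale": "DE_DE", "answer": "A", "prediction": "A"},
    {"locale": "DE_DE", "answer": "C", "prediction": "B"},
    {"locale": "JA_JP", "answer": "D", "prediction": "d"},
]
scores = accuracy_by_locale(sample)
```

Reporting one score per locale, rather than a single pooled number, keeps a weak result in one language from being hidden by strong results in the others.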
## Sources
Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D., & Steinhardt, J. (2021). [*Measuring Massive Multitask Language Understanding*](https://arxiv.org/abs/2009.03300).
[OpenAI Simple Evals GitHub Repository](https://github.com/openai/simple-evals)
Provider:
maas
Created:
2025-09-20



