
lq-jumlu

ModelScope community · Updated 2025-09-14 · Indexed 2025-08-30
Download link:
https://modelscope.cn/datasets/aicxoct/lq-jumlu
Resource description:
# Multilingual Massive Multitask Language Understanding (JUMLU)

The MMLU is a widely recognized benchmark of general knowledge attained by AI models. It covers a broad range of topics from 57 different categories, covering elementary-level knowledge up to advanced professional subjects like law, physics, history, and computer science.

We translated the MMLU’s jumlu set into 14 languages using professional human translators. Relying on human translators for this evaluation increases confidence in the accuracy of the translations, especially for low-resource languages like Yoruba. We are publishing the professional human translations and the code we use to run the evaluations.

This effort reflects our commitment to improving the multilingual capabilities of AI models, ensuring they perform accurately across languages, particularly for underrepresented communities. By prioritizing high-quality translations, we aim to make AI technology more inclusive and effective for users worldwide.

## Locales

JUMLU contains the MMLU jumlu set translated into the following locales:

* DE_DE (German)
* ES_LA (Spanish)
* JA_JP (Japanese)
* ZH_CN (Simplified Chinese)

## Sources

Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D., & Steinhardt, J. (2021). [*Measuring Massive Multitask Language Understanding*](https://arxiv.org/abs/2009.03300).

[OpenAI Simple Evals GitHub Repository](https://github.com/openai/simple-evals)
Provider: maas
Created: 2025-08-26