m-ArenaHard-v2.0
ModelScope Community, updated 2025-12-05, indexed 2025-08-23
Download link:
https://modelscope.cn/datasets/CohereLabs/m-ArenaHard-v2.0
Resource description:
## Dataset Card for m-ArenaHard-v2.0
This dataset is used in the paper [When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs](https://huggingface.co/papers/2506.20544).
### Dataset Details
The m-ArenaHard-v2.0 dataset is a multilingual LLM evaluation set. It is built on the LMarena (formerly LMSYS) [arena-hard-auto-v2.0](https://github.com/lmarena/arena-hard-auto/tree/main/data/arena-hard-v2.0) test dataset.
The original test set (750 prompts) was filtered to English-only prompts using the *papluca/xlm-roberta-base-language-detection* model, leaving 498 prompts.
These prompts were then translated into 22 languages with an in-house state-of-the-art translation model. Together with the 498 English originals, this yields a test set of 11,454 multilingual prompts (498 prompts × 23 languages).
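A minimal sketch of the English-only filtering step, assuming the language detector returns an ISO 639-1 label such as `"en"` for each prompt. The detector is passed in as a plain callable (the hypothetical `detect` parameter) so the sketch runs without downloading a model; it illustrates the idea and is not the authors' actual filtering code.

```python
from typing import Callable, Iterable, List


def filter_english(prompts: Iterable[str], detect: Callable[[str], str]) -> List[str]:
    """Keep only prompts whose detected language label is "en".

    `detect` is a hypothetical callable mapping a prompt to a language label;
    in practice it would wrap a language-identification classifier.
    """
    return [p for p in prompts if detect(p) == "en"]


if __name__ == "__main__":
    # Toy detector for illustration only: treats pure-ASCII text as English.
    def toy_detect(text: str) -> str:
        return "en" if text.isascii() else "other"

    sample = ["Write a sorting function.", "写一个排序函数。", "Explain TCP handshakes."]
    print(filter_english(sample, toy_detect))
    # prints ['Write a sorting function.', 'Explain TCP handshakes.']
```

To use the detector named above instead of the toy one, one option (assuming `transformers` is installed and that the model's labels are language codes) is `clf = pipeline("text-classification", model="papluca/xlm-roberta-base-language-detection")` with `detect = lambda text: clf(text, truncation=True)[0]["label"]`.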
The dataset covers the following 23 languages:
- Arabic (ar)
- Chinese (zh)
- Czech (cs)
- Dutch (nl)
- English (en)
- French (fr)
- German (de)
- Greek (el)
- Hebrew (he)
- Hindi (hi)
- Indonesian (id)
- Italian (it)
- Japanese (ja)
- Korean (ko)
- Persian (fa)
- Polish (pl)
- Portuguese (pt)
- Romanian (ro)
- Russian (ru)
- Spanish (es)
- Turkish (tr)
- Ukrainian (uk)
- Vietnamese (vi)
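The total size follows from the list above: each of the 498 English prompts appears once in English and once in each of the 22 translations, so 498 × 23 = 11,454. The snippet below records the codes as a Python list, which can be handy for iterating over subsets; that the dataset's config names equal these codes is an assumption based on the `"en"` config used in the loading example.

```python
# Language codes as listed in this card; assumed (not confirmed) to match
# the dataset's config names, as the "en" config in the loading example suggests.
LANGUAGES = [
    "ar", "zh", "cs", "nl", "en", "fr", "de", "el", "he", "hi", "id", "it",
    "ja", "ko", "fa", "pl", "pt", "ro", "ru", "es", "tr", "uk", "vi",
]

PROMPTS_PER_LANGUAGE = 498  # English prompts kept after language filtering

# The original English prompts plus 22 translations of each
total = PROMPTS_PER_LANGUAGE * len(LANGUAGES)
print(len(LANGUAGES), total)  # prints: 23 11454
```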
## Load with Datasets
To load this dataset with the Hugging Face `datasets` library, first install or upgrade it:
`pip install datasets --upgrade`
Then load a single language subset (English in this example):
```python
from datasets import load_dataset

# Load the English config; its prompts are in the "test" split
dataset = load_dataset("CohereLabs/m-ArenaHard-v2.0", "en")
```
To load the entire dataset instead, concatenate the `test` split of every language subset:
```python
from datasets import load_dataset, concatenate_datasets, get_dataset_config_names

moniker = "CohereLabs/m-ArenaHard-v2.0"
# One config per language; each config's prompts live in its "test" split
subsets = get_dataset_config_names(moniker)
dataset = concatenate_datasets([load_dataset(moniker, subset)["test"] for subset in subsets])
```
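After concatenating, a quick sanity check is to count rows per language. This is a minimal sketch that works on any iterable of language codes (for example the `language` column of the concatenated `dataset`); the expected value of 498 per language comes from this card, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_language(languages: Iterable[str]) -> Counter:
    """Count how many rows carry each language code."""
    return Counter(languages)


def uneven_languages(counts: Counter, expected: int = 498) -> Dict[str, int]:
    """Return the languages whose row count differs from `expected`."""
    return {lang: n for lang, n in counts.items() if n != expected}
```

For example, `uneven_languages(count_by_language(dataset["language"]))` should return an empty dict if every subset holds the 498 prompts described above.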
## Dataset Structure
An instance of the data from the Korean subset looks as follows:
```python
{
'question_id': 'efe7a57ed35d476b',
'category': 'hard_prompt',
'subcategory': 'coding',
'prompt': '깊게 숨을 들이쉬고 단계별로 생각해 보세요. 사용자의 데이터 기밀성을 보장하기 위해 사용자가 전체 하드 드라이브를 암호화할 수 있도록 돕는 프로그램을 설계하고 작성하세요.',
'language': 'ko'
}
```
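The record shape in the Korean example can be written down as a typed structure, which helps when passing rows between functions. This is a hypothetical type (`ArenaHardRecord`) and validator mirroring the fields shown; it is not part of the dataset or the `datasets` library.

```python
from typing import Any, Mapping, TypedDict, get_type_hints


class ArenaHardRecord(TypedDict):
    # Hypothetical type mirroring the fields in the example row
    question_id: str
    category: str
    subcategory: str
    prompt: str
    language: str


# Field names taken from the TypedDict definition above
REQUIRED_FIELDS = tuple(get_type_hints(ArenaHardRecord))


def is_valid_record(row: Mapping[str, Any]) -> bool:
    """True if every expected field is present and holds a string."""
    return all(isinstance(row.get(field), str) for field in REQUIRED_FIELDS)
```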
## Dataset Fields
The following are the fields in the dataset:
- question_id: an ID that identifies a prompt and is shared by all language versions of that prompt
- category: prompt category from original dataset
- subcategory: finer-grained prompt category from original dataset
- prompt: text of the prompt (question or instruction)
- language: language of the prompt
All language subsets of the dataset share the same fields as above.
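Because `question_id` is shared across languages, rows can be regrouped into parallel prompts, one entry per language. A minimal sketch with a hypothetical helper name; it accepts any iterable of row mappings, such as the concatenated `dataset` (iterating a `datasets.Dataset` yields one dict per row).

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def group_parallel_prompts(rows: Iterable[Mapping[str, str]]) -> Dict[str, Dict[str, str]]:
    """Map each question_id to a {language: prompt} dict."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for row in rows:
        grouped[row["question_id"]][row["language"]] = row["prompt"]
    # Convert to a plain dict so missing IDs raise KeyError instead of being created
    return dict(grouped)
```

With the full dataset loaded, `group_parallel_prompts(dataset)["efe7a57ed35d476b"]` would then hold that prompt in every available language.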
## Authorship
- Publishing Organization: Cohere Labs
- Industry Type: Not-for-profit - Tech
- Contact Details: https://cohere.com/research
## Licensing Information
This dataset can be used for any purpose, whether academic or commercial, under the terms of the Apache 2.0 License.
## Citation
```bibtex
@misc{khairi2025lifegivessamplesbenefits,
title={When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs},
author={Ammar Khairi and Daniel D'souza and Ye Shen and Julia Kreutzer and Sara Hooker},
year={2025},
eprint={2506.20544},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.20544},
}
```
## Disclaimer
The translation into 22 languages is performed with an in-house state-of-the-art translation model.
Provider: maas
Created: 2025-08-01