m-ArenaHard-v2.0

Name: m-ArenaHard-v2.0
Creator: maas
Published: 2025-12-05 16:44:05
License: No description provided

ModelScope community. Updated 2025-12-05; indexed 2025-08-23.

Download link:

https://modelscope.cn/datasets/CohereLabs/m-ArenaHard-v2.0

Resource summary:

## Dataset Card for m-ArenaHard-v2.0

This dataset is used in the paper [When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs](https://huggingface.co/papers/2506.20544).

### Dataset Details

The m-ArenaHard-v2.0 dataset is a multilingual LLM evaluation set. It is built on the LMArena (formerly LMSYS) [arena-hard-auto-v2.0](https://github.com/lmarena/arena-hard-auto/tree/main/data/arena-hard-v2.0) test dataset. The original dataset (750 prompts) was filtered to English-only prompts using the *papluca/xlm-roberta-base-language-detection* model, leaving 498 prompts. These filtered prompts were then translated into 22 languages with an in-house state-of-the-art translation model, giving a total test set of 11,454 multilingual prompts (498 prompts × 23 languages, including English).

The 23 languages included in this dataset are:

- Arabic (ar)
- Chinese (zh)
- Czech (cs)
- Dutch (nl)
- English (en)
- French (fr)
- German (de)
- Greek (el)
- Hebrew (he)
- Hindi (hi)
- Indonesian (id)
- Italian (it)
- Japanese (ja)
- Korean (ko)
- Persian (fa)
- Polish (pl)
- Portuguese (pt)
- Romanian (ro)
- Russian (ru)
- Spanish (es)
- Turkish (tr)
- Ukrainian (uk)
- Vietnamese (vi)

## Load with Datasets

To load this dataset with Datasets, install the library with `pip install datasets --upgrade` and then use the following code:

```python
from datasets import load_dataset

# Load a single language subset; the config name is the language code (here English)
dataset = load_dataset("CohereLabs/m-ArenaHard-v2.0", "en")
```

If you would instead like to load the entire dataset, you can concatenate all language subsets using the following code:

```python
from datasets import load_dataset, concatenate_datasets, get_dataset_config_names

moniker = "CohereLabs/m-ArenaHard-v2.0"

# Each config is one language subset; take its "test" split and concatenate them all
dataset = concatenate_datasets(
    [load_dataset(moniker, subset)["test"] for subset in get_dataset_config_names(moniker)]
)
```

## Dataset Structure

An instance of the data from the Korean subset looks as follows:

```python
{
    'question_id': 'efe7a57ed35d476b',
    'category': 'hard_prompt',
    'subcategory': 'coding',
    # Approximate English: "Take a deep breath and think step by step. Design and write a
    # program that helps users encrypt their entire hard drive to ensure the confidentiality of their data."
    'prompt': '깊게 숨을 들이쉬고 단계별로 생각해 보세요. 사용자의 데이터 기밀성을 보장하기 위해 사용자가 전체 하드 드라이브를 암호화할 수 있도록 돕는 프로그램을 설계하고 작성하세요.',
    'language': 'ko'
}
```

## Dataset Fields

The following are the fields in the dataset:

- question_id: a unique ID for the example (the same across languages)
- category: prompt category from the original dataset
- subcategory: finer-grained prompt category from the original dataset
- prompt: text of the prompt (question or instruction)
- language: language of the prompt

All language subsets of the dataset share the same fields as above.

## Authorship

- Publishing Organization: Cohere Labs
- Industry Type: Not-for-profit - Tech
- Contact Details: https://cohere.com/research

## Licensing Information

This dataset can be used for any purpose, whether academic or commercial, under the terms of the Apache 2.0 License.

## Citation

```
@misc{khairi2025lifegivessamplesbenefits,
  title={When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs},
  author={Ammar Khairi and Daniel D'souza and Ye Shen and Julia Kreutzer and Sara Hooker},
  year={2025},
  eprint={2506.20544},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.20544},
}
```

## Disclaimer

The translations into 22 languages were produced with an in-house state-of-the-art translation model.
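## Aligning Prompts Across Languages

Because `question_id` is identical across language subsets, the concatenated dataset can be regrouped into parallel sets (one per source prompt) and checked against the counts stated in this card: 498 prompts in each of 23 languages, 11,454 in total. The following is a minimal sketch under stated assumptions, not part of the `datasets` library or Cohere's tooling. `group_by_question`, `incomplete_questions`, and the toy rows are hypothetical, and the sketch assumes each row is a dict carrying the fields listed under Dataset Fields (iterating a `datasets.Dataset` yields such dicts).

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Language codes as listed in this card (23 in total, including English).
LANGUAGES = [
    "ar", "zh", "cs", "nl", "en", "fr", "de", "el", "he", "hi", "id", "it",
    "ja", "ko", "fa", "pl", "pt", "ro", "ru", "es", "tr", "uk", "vi",
]
PROMPTS_PER_LANGUAGE = 498  # English prompts kept after language filtering
EXPECTED_TOTAL = PROMPTS_PER_LANGUAGE * len(LANGUAGES)  # 498 * 23 = 11,454


def group_by_question(rows: Iterable[dict]) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: map each question_id to {language code: prompt}."""
    parallel: Dict[str, Dict[str, str]] = defaultdict(dict)
    for row in rows:
        parallel[row["question_id"]][row["language"]] = row["prompt"]
    return dict(parallel)


def incomplete_questions(
    parallel: Dict[str, Dict[str, str]], languages: Iterable[str] = LANGUAGES
) -> List[str]:
    """Hypothetical helper: question_ids lacking a prompt in any expected language."""
    expected = set(languages)
    # difference() accepts any iterable; iterating a dict yields its keys
    return sorted(qid for qid, by_lang in parallel.items() if expected.difference(by_lang))


# Toy rows for illustration only; with real data you might pass the
# concatenated `dataset` from the loading snippet instead.
toy_rows = [
    {"question_id": "q1", "language": "en", "prompt": "Write a haiku."},
    {"question_id": "q1", "language": "de", "prompt": "Schreibe ein Haiku."},
]
parallel = group_by_question(toy_rows)
print(parallel["q1"]["de"])  # Schreibe ein Haiku.
print(incomplete_questions(parallel, ["en", "de"]))  # []
print(incomplete_questions(parallel))  # ['q1'] (21 languages missing)
```

If the hosted data matches the counts stated in this card, `len(dataset) == EXPECTED_TOTAL` should hold and `incomplete_questions(group_by_question(dataset))` should return an empty list; any IDs it does return are the source prompts worth inspecting.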

Provided by:

maas

Created:

2025-08-01