m-ArenaHard-v2.0

Source: ModelScope community. Updated 2025-12-05; indexed 2025-08-23.
Download link: https://modelscope.cn/datasets/CohereLabs/m-ArenaHard-v2.0
## Dataset Card for m-ArenaHard-v2.0

This dataset is used in the paper [When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs](https://huggingface.co/papers/2506.20544).

### Dataset Details

The m-ArenaHard-v2.0 dataset is a multilingual LLM evaluation set. It is built on the LMarena (formerly LMSYS) [arena-hard-auto-v2.0](https://github.com/lmarena/arena-hard-auto/tree/main/data/arena-hard-v2.0) test dataset. That dataset (containing 750 prompts) was filtered to English-only prompts using the *papluca/xlm-roberta-base-language-detection* model, resulting in 498 prompts. These filtered prompts were then translated into 22 languages using an in-house state-of-the-art translation model, resulting in a total test set of 11,454 multilingual prompts (498 prompts in each of 23 languages, counting English).

The 23 languages included in this dataset are:

- Arabic (ar)
- Chinese (zh)
- Czech (cs)
- Dutch (nl)
- English (en)
- French (fr)
- German (de)
- Greek (el)
- Hebrew (he)
- Hindi (hi)
- Indonesian (id)
- Italian (it)
- Japanese (ja)
- Korean (ko)
- Persian (fa)
- Polish (pl)
- Portuguese (pt)
- Romanian (ro)
- Russian (ru)
- Spanish (es)
- Turkish (tr)
- Ukrainian (uk)
- Vietnamese (vi)

## Load with Datasets

To load this dataset with Datasets, first install the library with `pip install datasets --upgrade`, then use the following code:

```python
from datasets import load_dataset

# Load a single language subset (here: English).
dataset = load_dataset("CohereLabs/m-ArenaHard-v2.0", "en")
```

If you would instead like to load the entire dataset, you can concatenate the language subsets using the following code:

```python
from datasets import load_dataset, concatenate_datasets, get_dataset_config_names

moniker = 'CohereLabs/m-ArenaHard-v2.0'

# Each config name is one language subset; take its 'test' split and concatenate.
dataset = concatenate_datasets(
    [load_dataset(moniker, subset)['test'] for subset in get_dataset_config_names(moniker)]
)
```

## Dataset Structure

An instance of the data from the Korean subset looks as follows:

```python
{
    'question_id': 'efe7a57ed35d476b',
    'category': 'hard_prompt',
    'subcategory': 'coding',
    'prompt': '깊게 숨을 들이쉬고 단계별로 생각해 보세요. 사용자의 데이터 기밀성을 보장하기 위해 사용자가 전체 하드 드라이브를 암호화할 수 있도록 돕는 프로그램을 설계하고 작성하세요.',
    'language': 'ko'
}
```

## Dataset Fields

The following are the fields in the dataset:

- question_id: a unique ID for the example (this is the same across languages)
- category: prompt category from the original dataset
- subcategory: finer-grained prompt category from the original dataset
- prompt: text of the prompt (question or instruction)
- language: language of the prompt

All language subsets of the dataset share the fields above.

## Authorship

- Publishing Organization: Cohere Labs
- Industry Type: Not-for-profit - Tech
- Contact Details: https://cohere.com/research

## Licensing Information

This dataset can be used for any purpose, whether academic or commercial, under the terms of the Apache 2.0 License.

## Citation

```
@misc{khairi2025lifegivessamplesbenefits,
  title={When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs},
  author={Ammar Khairi and Daniel D'souza and Ye Shen and Julia Kreutzer and Sara Hooker},
  year={2025},
  eprint={2506.20544},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.20544},
}
```

## Disclaimer

The translation into 22 languages is performed with an in-house state-of-the-art translation model.
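
## Example: Aligning Prompts Across Languages

Because `question_id` is documented as identical across language subsets, parallel prompts can be grouped by that field. The following is a minimal sketch of one way to do this, not part of the original dataset card. The sample rows, their IDs and prompt texts, and the helper names `align_by_question_id` and `missing_languages` are all hypothetical and exist only for illustration; the sketch uses plain Python dictionaries so it runs without downloading anything.

```python
from collections import defaultdict

# Sanity check on the sizes stated in the card: 498 prompts x 23 languages.
EXPECTED_TOTAL = 498 * 23  # 11,454

# Hypothetical sample rows using the documented field names. The IDs and
# prompt texts are invented for illustration and are not from the dataset.
rows = [
    {"question_id": "q1", "category": "hard_prompt", "subcategory": "coding",
     "prompt": "Write a function that sorts a list.", "language": "en"},
    {"question_id": "q1", "category": "hard_prompt", "subcategory": "coding",
     "prompt": "Écrivez une fonction qui trie une liste.", "language": "fr"},
    {"question_id": "q2", "category": "hard_prompt", "subcategory": "coding",
     "prompt": "Explain what a hash table is.", "language": "en"},
]


def align_by_question_id(rows):
    """Group prompts by question_id, mapping language code -> prompt text."""
    aligned = defaultdict(dict)
    for row in rows:
        aligned[row["question_id"]][row["language"]] = row["prompt"]
    return dict(aligned)


def missing_languages(aligned, expected):
    """Return, per question_id, the expected language codes that are absent."""
    expected = set(expected)
    gaps = {}
    for question_id, by_language in aligned.items():
        absent = expected - set(by_language)
        if absent:
            gaps[question_id] = sorted(absent)
    return gaps


aligned = align_by_question_id(rows)
gaps = missing_languages(aligned, ["en", "fr"])

print(aligned["q1"]["fr"])  # the French counterpart of question q1
print(gaps)                 # questions lacking one of the expected languages
```

With the real data, `rows` could be replaced by the concatenated dataset from the loading section, assuming iteration over it yields one dictionary per example with the fields listed under Dataset Fields.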

Provider: maas
Created: 2025-08-01
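Background overview
m-ArenaHard-v2.0 is a multilingual LLM evaluation dataset containing 11,454 prompts in 23 languages, translated from LMarena's English prompts. It is intended for evaluating multilingual large language models and is released under the Apache 2.0 license, which permits both academic and commercial use.
This summary was collected and generated by 遇见数据集.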