m-ArenaHard

Name: m-ArenaHard
Creator: maas
Published: 2026-01-06 16:18:38
License: No description provided

ModelScope community; updated 2026-01-06; indexed 2024-12-21

Download link:

https://modelscope.cn/datasets/CohereForAI/m-ArenaHard

Resource description:

## Dataset Card for m-ArenaHard

### Dataset Details

The m-ArenaHard dataset is a multilingual LLM evaluation set. It was created by translating the prompts of the originally English-only LMarena (formerly LMSYS) arena-hard-auto-v0.1 test dataset into 22 languages using Google Translate API v3. The original English-only prompts were created by Li et al. (2024) and consist of 500 challenging user queries sourced from Chatbot Arena. The authors show that these prompts can be used to perform automatic LLM judge evaluations, which exhibit a high correlation with Chatbot Arena rankings.

The 23 languages included in this dataset (English plus the 22 translations):

- Arabic (ar)
- Chinese (zh)
- Czech (cs)
- Dutch (nl)
- English (en)
- French (fr)
- German (de)
- Greek (el)
- Hebrew (he)
- Hindi (hi)
- Indonesian (id)
- Italian (it)
- Japanese (ja)
- Korean (ko)
- Persian (fa)
- Polish (pl)
- Portuguese (pt)
- Romanian (ro)
- Russian (ru)
- Spanish (es)
- Turkish (tr)
- Ukrainian (uk)
- Vietnamese (vi)

### Load with Datasets

To load this dataset with Datasets, install the library with `pip install datasets --upgrade` and then use the following code:

```python
from datasets import load_dataset

dataset = load_dataset("CohereLabs/m_ArenaHard", "en")
```

The code above loads only the English subset of the dataset. You can load another subset by passing a different supported language code, or the entire dataset by leaving that argument blank.
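To work with several languages at once, the single-subset call above can be wrapped in a small helper. The following is a minimal sketch, not part of the original card: `LANGUAGE_CODES`, `load_subsets`, and the injectable `loader` parameter are hypothetical names introduced for illustration, and the repository ID is copied verbatim from the card's own snippet.

```python
# The 23 language codes listed in the card.
LANGUAGE_CODES = (
    "ar", "zh", "cs", "nl", "en", "fr", "de", "el", "he", "hi", "id", "it",
    "ja", "ko", "fa", "pl", "pt", "ro", "ru", "es", "tr", "uk", "vi",
)


def load_subsets(codes, repo_id="CohereLabs/m_ArenaHard", loader=None):
    """Load one subset per language code and return a {code: dataset} dict.

    `loader` is a hypothetical hook (assumption): any callable taking
    (repo_id, code). It defaults to `datasets.load_dataset`.
    """
    unknown = [code for code in codes if code not in LANGUAGE_CODES]
    if unknown:
        raise ValueError(f"Unsupported language codes: {unknown}")
    if loader is None:
        # Imported lazily so the helper can be imported without the library.
        from datasets import load_dataset as loader
    # One call per language, mirroring load_dataset(repo_id, "en") in the card.
    return {code: loader(repo_id, code) for code in codes}
```

For example, `load_subsets(["en", "fa"])` would request the English and Persian subsets; passing a custom `loader` lets the helper be exercised without network access.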
### Dataset Structure

An instance of the data from the Persian subset looks as follows:

```python
{'question_id': '328c149ed45a41c0b9d6f14659e63599',
 'cluster': 'Acrobat PDF Management Tips',
 'category': 'arena-hard-v0.1',
 'prompt': 'چگونه نوار ابزار را در یک قطعه اضافه کنیم؟'
}
```

### Dataset Fields

The following are the fields in the dataset:

- question_id: a unique ID for the example
- cluster: the topic of the example
- category: the original dataset where the example is from
- prompt: text of the prompt (question or instruction)

All language subsets of the dataset share the same fields as above.

### Authorship

- Publishing Organization: [Cohere Labs](https://cohere.com/research)
- Industry Type: Not-for-profit - Tech
- Contact Details: https://cohere.com/research/aya

### Licensing Information

This dataset can be used for any purpose, whether academic or commercial, under the terms of the Apache 2.0 License.

### Citation

```bibtex
@misc{dang2024ayaexpansecombiningresearch,
      title={Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier},
      author={John Dang and Shivalika Singh and Daniel D'souza and Arash Ahmadian and Alejandro Salamanca and Madeline Smith and Aidan Peppin and Sungjin Hong and Manoj Govindassamy and Terrence Zhao and Sandra Kublik and Meor Amer and Viraat Aryabumi and Jon Ander Campos and Yi-Chern Tan and Tom Kocmi and Florian Strub and Nathan Grinsztajn and Yannis Flet-Berliac and Acyr Locatelli and Hangyu Lin and Dwarak Talupuru and Bharat Venkitesh and David Cairuz and Bowen Yang and Tim Chung and Wei-Yin Ko and Sylvie Shang Shi and Amir Shukayev and Sammie Bae and Aleksandra Piktus and Roman Castagné and Felipe Cruz-Salinas and Eddie Kim and Lucas Crawhall-Stein and Adrien Morisot and Sudip Roy and Phil Blunsom and Ivan Zhang and Aidan Gomez and Nick Frosst and Marzieh Fadaee and Beyza Ermis and Ahmet Üstün and Sara Hooker},
      year={2024},
      eprint={2412.04261},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.04261},
}
```
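Returning to the four fields listed under Dataset Fields, a quick sanity check can confirm that each record carries all of them before prompts are grouped by topic. This is a minimal sketch under stated assumptions: `EXPECTED_FIELDS`, `check_record`, and `group_by_cluster` are hypothetical helper names, and the sample record is the Persian instance shown in the card.

```python
from collections import defaultdict

# Field names as documented in the card.
EXPECTED_FIELDS = {"question_id", "cluster", "category", "prompt"}

# The Persian-subset instance reproduced from the card.
SAMPLE_RECORD = {
    "question_id": "328c149ed45a41c0b9d6f14659e63599",
    "cluster": "Acrobat PDF Management Tips",
    "category": "arena-hard-v0.1",
    "prompt": "چگونه نوار ابزار را در یک قطعه اضافه کنیم؟",
}


def check_record(record):
    """Raise KeyError if any documented field is missing from the record."""
    missing = EXPECTED_FIELDS - set(record)
    if missing:
        raise KeyError(f"Record is missing fields: {sorted(missing)}")
    return record


def group_by_cluster(records):
    """Map each cluster (topic) to the question_ids of its records."""
    groups = defaultdict(list)
    for record in records:
        check_record(record)
        groups[record["cluster"]].append(record["question_id"])
    return dict(groups)
```

Calling `group_by_cluster([SAMPLE_RECORD])` returns a dictionary with one key, `'Acrobat PDF Management Tips'`, mapped to a list holding that record's `question_id`.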


Provider:

maas

Created:

2024-12-15
