zhihz0535/X-AlpacaEval

Name: zhihz0535/X-AlpacaEval
Creator: zhihz0535
Published: 2024-01-27 21:18:19
License: CC BY-NC 4.0

Source: Hugging Face | Updated: 2024-01-27 | Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/zhihz0535/X-AlpacaEval

Resource description:

---
license: cc-by-nc-4.0
configs:
- config_name: default
  data_files:
  - split: english
    path: english.json
  - split: chinese
    path: chinese.json
  - split: korean
    path: korean.json
  - split: italian
    path: italian.json
  - split: spanish
    path: spanish.json
task_categories:
- text-generation
- conversational
language:
- en
- zh
- ko
- it
- es
size_categories:
- 1K<n<10K
---

# X-AlpacaEval

[**🤗 Paper**](https://huggingface.co/papers/2311.08711) | [**📖 arXiv**](https://arxiv.org/abs/2311.08711)

### Dataset Description

X-AlpacaEval is an evaluation benchmark for multilingual instruction-tuned large language models (LLMs). It consists of open-ended instructions in 5 languages (English, Chinese, Korean, Italian and Spanish). It is described in the paper [PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning](https://arxiv.org/abs/2311.08711).

The instructions in this benchmark are translated from the original English version of [AlpacaEval](https://huggingface.co/datasets/tatsu-lab/alpaca_eval). Translations were completed by professional translators who are native speakers of the target languages.

The data is intended to be used as evaluation data for instruction-tuned LLMs. Generate responses to X-AlpacaEval instructions with your LLM, and use human, GPT-4, or other LLM judges to evaluate the quality or preference of the responses. For GPT-4 evaluation, refer to the implementations in the original [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval) or [MT-bench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge).

- **Languages:** English, Chinese, Korean, Italian, Spanish
- **License:** CC BY-NC 4.0

## Uses

Use as input instructions to evaluate instruction-tuned LLMs.

### Out-of-Scope Use

- Evaluate foundation LLMs (pre-trained LLMs) without instruction tuning
- Evaluate non-generative (non-autoregressive) models

## Dataset Structure

Each example is composed of 3 fields:

- id: a numeric ID of the example. Examples in different languages with the same ID are translations of each other.
- dataset: AlpacaEval was originally collected from 5 distinct test sets. This field identifies the example's original source.
- instruction: The instruction to the LLM.

## Citation

If you find the data useful, please kindly cite our paper:

```
@article{zhang2023plug,
  title={PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning},
  author={Zhang, Zhihan and Lee, Dong-Ho and Fang, Yuwei and Yu, Wenhao and Jia, Mengzhao and Jiang, Meng and Barbieri, Francesco},
  journal={arXiv preprint arXiv:2311.08711},
  year={2023}
}
```

Provided by:

zhihz0535

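Summary of Original Information

X-AlpacaEval Dataset Overview

Dataset Description

X-AlpacaEval is an evaluation benchmark for multilingual instruction-tuned large language models (LLMs), consisting of open-ended instructions in five languages (English, Chinese, Korean, Italian, and Spanish).

Languages

English
Chinese
Korean
Italian
Spanish

License

CC BY-NC 4.0

Uses

Input instructions for evaluating instruction-tuned LLMs.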
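
The intended workflow (run a model over the instructions, then hand the responses to a judge) might be sketched as follows. This is a minimal illustration, not the authors' evaluation code: `build_model_outputs`, `echo_model`, and the sample records are hypothetical, and the `output` and `generator` keys are an assumption modeled on the JSON layout commonly used with the original AlpacaEval tooling.

```python
import json
from typing import Callable, Iterable


def build_model_outputs(
    examples: Iterable[dict],
    generate: Callable[[str], str],
    generator_name: str,
) -> list[dict]:
    """Run a model over benchmark instructions and collect its responses."""
    outputs = []
    for example in examples:
        outputs.append(
            {
                # Carry the three benchmark fields through unchanged.
                "id": example["id"],
                "dataset": example["dataset"],
                "instruction": example["instruction"],
                # Assumed keys: the response text and a label for the model.
                "output": generate(example["instruction"]),
                "generator": generator_name,
            }
        )
    return outputs


def echo_model(instruction: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return f"(response to) {instruction}"


# Illustrative records in the benchmark's three-field layout (not real data).
sample = [
    {"id": 0, "dataset": "example_source", "instruction": "Name three primary colors."},
    {"id": 1, "dataset": "example_source", "instruction": "Explain what a prime number is."},
]

records = build_model_outputs(sample, echo_model, generator_name="echo-baseline")

# ensure_ascii=False keeps Chinese and Korean text readable in the saved file.
serialized = json.dumps(records, ensure_ascii=False, indent=2)
```

The serialized records can then be passed to a human annotator or an LLM judge. Keeping `id` in each record makes it possible to compare one model's responses across languages later.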

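Out-of-Scope Use

Evaluating foundation (pre-trained) LLMs that have not been instruction-tuned
Evaluating non-generative (non-autoregressive) models

Dataset Structure

Each example contains three fields:

id: A numeric ID of the example. Examples in different languages that share an ID are translations of each other.
dataset: AlpacaEval was originally collected from five distinct test sets; this field identifies the example's original source.
instruction: The instruction given to the LLM.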
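
Because examples that share an `id` are translations of each other, the per-language splits can be joined into parallel instruction sets. The sketch below assumes the Hugging Face `datasets` library is installed and the Hub is reachable for `load_split`; `load_split`, `align_by_id`, and the sample records are hypothetical names and made-up data used only for illustration.

```python
from collections import defaultdict

LANGUAGES = ["english", "chinese", "korean", "italian", "spanish"]


def load_split(language: str) -> list[dict]:
    """Fetch one language split from the Hub (requires network access)."""
    # Imported lazily so the rest of this sketch runs without the library.
    from datasets import load_dataset

    return list(load_dataset("zhihz0535/X-AlpacaEval", split=language))


def align_by_id(splits: dict[str, list[dict]]) -> dict[int, dict[str, str]]:
    """Group instructions by shared ID, keyed by language name."""
    aligned: dict[int, dict[str, str]] = defaultdict(dict)
    for language, examples in splits.items():
        for example in examples:
            aligned[example["id"]][language] = example["instruction"]
    return dict(aligned)


# Made-up records standing in for two downloaded splits.
splits = {
    "english": [
        {"id": 0, "dataset": "example_source", "instruction": "What is the capital of Italy?"}
    ],
    "spanish": [
        {"id": 0, "dataset": "example_source", "instruction": "¿Cuál es la capital de Italia?"}
    ],
}

parallel = align_by_id(splits)
```

With real data, `align_by_id({lang: load_split(lang) for lang in LANGUAGES})` would give one entry per ID containing the instruction in each of the five languages.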

Citation

If you find the data useful, please cite our paper:

@article{zhang2023plug,
  title={PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning},
  author={Zhang, Zhihan and Lee, Dong-Ho and Fang, Yuwei and Yu, Wenhao and Jia, Mengzhao and Jiang, Meng and Barbieri, Francesco},
  journal={arXiv preprint arXiv:2311.08711},
  year={2023}
}
