zhihz0535/X-AlpacaEval
Source: Hugging Face. Updated 2024-01-27; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/zhihz0535/X-AlpacaEval
Resource description:
---
license: cc-by-nc-4.0
configs:
- config_name: default
  data_files:
  - split: english
    path: english.json
  - split: chinese
    path: chinese.json
  - split: korean
    path: korean.json
  - split: italian
    path: italian.json
  - split: spanish
    path: spanish.json
task_categories:
- text-generation
- conversational
language:
- en
- zh
- ko
- it
- es
size_categories:
- 1K<n<10K
---
# X-AlpacaEval
[**🤗 Paper**](https://huggingface.co/papers/2311.08711) | [**📖 arXiv**](https://arxiv.org/abs/2311.08711)
### Dataset Description
X-AlpacaEval is an evaluation benchmark for multilingual instruction-tuned large language models (LLMs). It contains open-ended instructions in 5 languages (English, Chinese, Korean, Italian and Spanish).
It is described in the paper [PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning](https://arxiv.org/abs/2311.08711).
The instructions in this benchmark are translated from the original English version of [AlpacaEval](https://huggingface.co/datasets/tatsu-lab/alpaca_eval).
Translations were completed by professional translators who are native speakers of the target languages.
The data is intended to be used as evaluation data for instruction-tuned LLMs.
Generate responses to the X-AlpacaEval instructions with your LLM, then use human, GPT-4, or other LLM judges to rate the quality of the responses or to express a preference between them.
For GPT-4 evaluation, refer to the implementations in the original [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval) or [MT-bench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge).
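The generation step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `build_model_outputs`, `echo_model`, and the generator name are hypothetical, the toy record is a made-up placeholder rather than a real dataset row, and the `instruction` / `output` / `generator` layout is only loosely modeled on the kind of JSON a judge script might consume. Check the judge implementation you choose for its actual input format.

```python
import json


def build_model_outputs(examples, generate_fn, generator_name):
    """Run generate_fn on every instruction and collect one record per example."""
    records = []
    for example in examples:
        records.append({
            "id": example["id"],                    # keeps the cross-language alignment key
            "instruction": example["instruction"],
            "output": generate_fn(example["instruction"]),
            "generator": generator_name,            # label for the model being evaluated
        })
    return records


def echo_model(instruction):
    """Stand-in for a real LLM call (assumption: replace with your own model)."""
    return "Response to: " + instruction


# Made-up placeholder record with the three documented fields.
toy_examples = [{"id": 1, "dataset": "toy_source", "instruction": "Say hello."}]

outputs = build_model_outputs(toy_examples, echo_model, "echo-baseline")
print(json.dumps(outputs, ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` keeps Chinese and Korean text readable when the records are written to disk for a human or LLM judge.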
- **Languages:** English, Chinese, Korean, Italian, Spanish
- **License:** CC BY-NC 4.0
## Uses
Use the instructions as inputs when evaluating instruction-tuned LLMs.
### Out-of-Scope Use
- Evaluate foundation LLMs (pre-trained LLMs) without instruction tuning
- Evaluate non-generative (non-autoregressive) models
## Dataset Structure
Each example is composed of 3 fields:
- id: a numeric ID of the example. Examples in different languages with the same ID are translations of each other.
- dataset: AlpacaEval was originally collected from 5 distinct test sets. This field identifies the original source of the example.
- instruction: The instruction to the LLM.
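
Because examples that share an `id` are translations of each other, the language splits can be joined into parallel records. The sketch below assumes each split is available as a list of dictionaries with the three fields above; `align_by_id` is a hypothetical helper and the records are made-up placeholders, not real dataset rows.

```python
from collections import defaultdict


def align_by_id(splits):
    """Group instructions from several language splits by their shared id.

    splits maps a language name to a list of example dicts.
    Returns {id: {language: instruction}}.
    """
    aligned = defaultdict(dict)
    for language, examples in splits.items():
        for example in examples:
            aligned[example["id"]][language] = example["instruction"]
    return dict(aligned)


# Made-up placeholder records, not taken from the dataset.
toy_splits = {
    "english": [{"id": 1, "dataset": "toy_source", "instruction": "Say hello."}],
    "spanish": [{"id": 1, "dataset": "toy_source", "instruction": "Di hola."}],
}

parallel = align_by_id(toy_splits)
print(parallel[1]["spanish"])
```

With the Hugging Face `datasets` library, `load_dataset("zhihz0535/X-AlpacaEval")` should return one split per language as declared in the card's `configs` section; each split can be iterated as dictionaries and passed to a helper like this one.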
## Citation
If you find the data useful, please kindly cite our paper:
```
@article{zhang2023plug,
title={PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning},
author={Zhang, Zhihan and Lee, Dong-Ho and Fang, Yuwei and Yu, Wenhao and Jia, Mengzhao and Jiang, Meng and Barbieri, Francesco},
journal={arXiv preprint arXiv:2311.08711},
year={2023}
}
```
Provider:
zhihz0535



