
zhihz0535/X-AlpacaEval

Hugging Face | Updated 2024-01-27 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/zhihz0535/X-AlpacaEval
Resource description:
---
license: cc-by-nc-4.0
configs:
- config_name: default
  data_files:
  - split: english
    path: english.json
  - split: chinese
    path: chinese.json
  - split: korean
    path: korean.json
  - split: italian
    path: italian.json
  - split: spanish
    path: spanish.json
task_categories:
- text-generation
- conversational
language:
- en
- zh
- ko
- it
- es
size_categories:
- 1K<n<10K
---

# X-AlpacaEval

[**🤗 Paper**](https://huggingface.co/papers/2311.08711) | [**📖 arXiv**](https://arxiv.org/abs/2311.08711)

### Dataset Description

X-AlpacaEval is an evaluation benchmark for multilingual instruction-tuned large language models (LLMs). It contains open-ended instructions in 5 languages (English, Chinese, Korean, Italian, and Spanish). It is described in the paper [PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning](https://arxiv.org/abs/2311.08711).

The instructions in this benchmark are translated from the original English version of [AlpacaEval](https://huggingface.co/datasets/tatsu-lab/alpaca_eval). Translations were completed by professional translators who are native speakers of the target languages.

The data is intended to be used as evaluation data for instruction-tuned LLMs. Generate responses to X-AlpacaEval instructions with your LLM, and use human, GPT-4, or other LLM judges to evaluate the quality or preference of the responses. For GPT-4 evaluation, refer to the implementations in the original [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval) or [MT-bench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge).

- **Languages:** English, Chinese, Korean, Italian, Spanish
- **License:** CC BY-NC 4.0

## Uses

Use as input instructions to evaluate instruction-tuned LLMs.

### Out-of-Scope Use

- Evaluating foundation LLMs (pre-trained LLMs) without instruction tuning
- Evaluating non-generative (non-autoregressive) models

## Dataset Structure

Each example is composed of 3 fields:

- id: a numeric ID of the example. Examples in different languages with the same ID are translations of each other.
- dataset: AlpacaEval was originally collected from 5 distinct test sets. This field identifies the example's original source.
- instruction: The instruction to the LLM.

## Citation

If you find the data useful, please kindly cite our paper:

```
@article{zhang2023plug,
  title={PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning},
  author={Zhang, Zhihan and Lee, Dong-Ho and Fang, Yuwei and Yu, Wenhao and Jia, Mengzhao and Jiang, Meng and Barbieri, Francesco},
  journal={arXiv preprint arXiv:2311.08711},
  year={2023}
}
```
Provider:
zhihz0535
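Summary of Original Information

X-AlpacaEval Dataset Overview

Dataset Description

X-AlpacaEval is an evaluation benchmark for multilingual instruction-tuned large language models (LLMs). It contains open-ended instructions in five languages (English, Chinese, Korean, Italian, and Spanish).

Languages

  • English
  • Chinese
  • Korean
  • Italian
  • Spanish

License

CC BY-NC 4.0

Uses

Use as input instructions to evaluate instruction-tuned LLMs.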
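
The intended workflow (generate a response for each instruction, then hand the responses to a human or LLM judge) could be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: `collect_responses` and `echo_model` are hypothetical names, the sample records are made up, and the `output` and `generator` keys are an assumed record layout that should be checked against whichever judge implementation is actually used.

```python
from typing import Callable, Dict, List


def collect_responses(
    examples: List[Dict],
    generate: Callable[[str], str],
    model_name: str,
) -> List[Dict]:
    """Run `generate` on every instruction and keep the metadata needed for judging."""
    records = []
    for ex in examples:
        records.append(
            {
                "id": ex["id"],
                "dataset": ex["dataset"],
                "instruction": ex["instruction"],
                "output": generate(ex["instruction"]),  # assumed key name
                "generator": model_name,  # assumed key name
            }
        )
    return records


# Made-up records in the three-field layout (not real benchmark entries).
sample = [
    {"id": 1, "dataset": "example_source_a", "instruction": "Name three primary colors."},
    {"id": 2, "dataset": "example_source_b", "instruction": "Explain what a prime number is."},
]


def echo_model(prompt: str) -> str:
    # Stand-in for a real LLM call; replace with your model's generation function.
    return f"[response to] {prompt}"


responses = collect_responses(sample, echo_model, model_name="toy-model")
print(responses[0]["output"])
```

Keeping `id` and `dataset` alongside each response makes it possible to break judge scores down by language split or by original source afterwards.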

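Out-of-Scope Use

  • Evaluating foundation LLMs (pre-trained LLMs) without instruction tuning
  • Evaluating non-generative (non-autoregressive) models

Dataset Structure

Each example contains three fields:

  • id: a numeric ID of the example. Examples with the same ID in different languages are translations of each other.
  • dataset: AlpacaEval was originally collected from five distinct test sets; this field identifies the example's original source.
  • instruction: the instruction to the LLM.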
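
Because examples that share an ID are translations of each other, the language splits can be joined into parallel instruction sets. The following is a small sketch under stated assumptions: `align_by_id` is a hypothetical helper, each split is assumed to be already loaded as a list of dictionaries with the three fields above, and the sample records are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def align_by_id(splits: Dict[str, List[Dict]]) -> Dict[int, Dict[str, str]]:
    """Map each example ID to a {language: instruction} dictionary."""
    aligned: Dict[int, Dict[str, str]] = defaultdict(dict)
    for language, examples in splits.items():
        for ex in examples:
            aligned[ex["id"]][language] = ex["instruction"]
    return dict(aligned)


# Invented records in the three-field layout (not real benchmark entries).
splits = {
    "english": [{"id": 1, "dataset": "example_source", "instruction": "Say hello."}],
    "spanish": [{"id": 1, "dataset": "example_source", "instruction": "Di hola."}],
}

parallel = align_by_id(splits)
print(parallel[1])
```

A mapping like this is convenient when comparing a model's behavior on the same instruction across languages.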

Citation

If you find the data useful, please cite our paper:

@article{zhang2023plug,
  title={PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning},
  author={Zhang, Zhihan and Lee, Dong-Ho and Fang, Yuwei and Yu, Wenhao and Jia, Mengzhao and Jiang, Meng and Barbieri, Francesco},
  journal={arXiv preprint arXiv:2311.08711},
  year={2023}
}
