gretelai/synthetic_multilingual_llm_prompts

Name: gretelai/synthetic_multilingual_llm_prompts
Creator: gretelai
Published: 2024-07-03 15:07:33
License: Apache 2.0

Source: Hugging Face
Updated: 2024-07-03
Indexed: 2024-06-15

Download link:

https://hf-mirror.com/datasets/gretelai/synthetic_multilingual_llm_prompts

Description:

Welcome to the Synthetic Multilingual LLM Prompts dataset! It contains 1,250 synthetic LLM prompts generated with Gretel Navigator, available in seven languages. Gretel Navigator was used as the generation tool to keep the prompts accurate and diverse, and an LLM-as-a-Judge method was used to evaluate translation quality and consistency. The dataset is intended for use with LLMs to generate diverse, multilingual responses. Each record includes a prompt ID, a role or scenario, the English prompt text, and translations into six other languages. Only translations with an LLM-as-a-Judge score of 7 or higher were retained.

Provider:

gretelai

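Summary of Original Information

Synthetic Multilingual LLM Prompts Dataset

Overview

The dataset contains 1,250 synthetic LLM prompts generated with Gretel Navigator, covering seven languages. It is intended for use with LLMs to generate diverse, multilingual responses based on the provided prompts.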

Supported Languages

English (en)
Dutch (nl_NL)
French (fr_FR)
Spanish (es_ES)
German (de_DE)
Portuguese (Brazil) (pt_BR)
Chinese (Simplified) (zh_CN)

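Dataset Structure

The dataset contains the following fields:

id: The prompt ID.
act: The role or scenario the prompt is designed for.
prompt: The prompt text in English.
prompt_nl_NL: The prompt text in Dutch.
prompt_fr_FR: The prompt text in French.
prompt_es_ES: The prompt text in Spanish.
prompt_de_DE: The prompt text in German.
prompt_pt_BR: The prompt text in Brazilian Portuguese.
prompt_zh_CN: The prompt text in Simplified Chinese.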
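The field layout above maps each language code to one column, so a prompt can be looked up by language with a small helper. The following is a minimal sketch, not part of the dataset's own tooling: the names `LANGUAGE_COLUMNS`, `get_prompt`, and `example_row` are hypothetical, and the row values are placeholders rather than real dataset content.

```python
# Language codes and column names as listed in the field description above.
LANGUAGE_COLUMNS = {
    "en": "prompt",
    "nl_NL": "prompt_nl_NL",
    "fr_FR": "prompt_fr_FR",
    "es_ES": "prompt_es_ES",
    "de_DE": "prompt_de_DE",
    "pt_BR": "prompt_pt_BR",
    "zh_CN": "prompt_zh_CN",
}


def get_prompt(row, lang="en"):
    """Return the prompt text of `row` (a dict-like record) in language `lang`."""
    if lang not in LANGUAGE_COLUMNS:
        raise ValueError(f"unsupported language code: {lang!r}")
    return row[LANGUAGE_COLUMNS[lang]]


# Placeholder record for illustration only; these values are NOT from the dataset.
example_row = {
    "id": 0,
    "act": "example role",
    "prompt": "english text",
    "prompt_nl_NL": "dutch text",
    "prompt_fr_FR": "french text",
    "prompt_es_ES": "spanish text",
    "prompt_de_DE": "german text",
    "prompt_pt_BR": "portuguese text",
    "prompt_zh_CN": "chinese text",
}

print(get_prompt(example_row, "de_DE"))
```

The same helper should work on a record returned by the `datasets` library, since such records behave like dictionaries keyed by column name. How a translation that fell below the quality threshold is represented in a record is not stated here, so callers may want to check for missing values.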

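Translation Quality

The translation quality of each prompt was evaluated with the LLM-as-a-Judge method against three criteria:

Accuracy: How well the meaning is preserved in the translation.
Fluency: How natural the translated text reads in the target language.
Consistency: How consistently terms and phrases are translated.

The overall score of each translation is the average of its accuracy, fluency, and consistency scores. Only translations with an overall score of 7 or higher were kept in the main dataset.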
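The filtering rule described above (average the three criterion scores, keep the translation if the average is at least 7) can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the average is assumed to be an unweighted arithmetic mean, and the scoring scale is not specified in this description.

```python
SCORE_THRESHOLD = 7  # retention threshold stated in the description above


def overall_score(accuracy, fluency, consistency):
    """Unweighted mean of the three criterion scores (assumed weighting)."""
    return (accuracy + fluency + consistency) / 3


def keep_translation(accuracy, fluency, consistency, threshold=SCORE_THRESHOLD):
    """Return True if the translation's overall score meets the threshold."""
    return overall_score(accuracy, fluency, consistency) >= threshold


# Illustrative scores only; these are not values from the dataset.
print(keep_translation(8, 7, 6))  # mean is 7.0, so the translation is kept
print(keep_translation(7, 7, 6))  # mean is about 6.67, so it is dropped
```

Because the rule uses the average, a translation can be kept even if one criterion scores below 7, as long as the other two compensate.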

Usage

The dataset can be loaded from Hugging Face with the following snippet:

    from datasets import load_dataset

    dataset = load_dataset("gretelai/synthetic_multilingual_llm_prompts")

License

The dataset is released under the Apache 2.0 license. It may be used publicly, with appropriate attribution.

Citation

If you use this dataset, please cite it as follows:

    @software{gretel-synthetic-multilingual-llm-prompts-2024,
      author = {Van Segbroeck, Maarten and Emadi, Marjan and Nathawani, Dhruv and Ramaswamy, Lipika and Greco, Johnny and Boyd, Kendrick and Grossman, Matthew and Meyer, Yev},
      title = {{Synthetic Multilingual LLM Prompts}: A synthetic multilingual prompt dataset for prompting LLMs},
      month = {June},
      year = {2024},
      url = {https://huggingface.co/datasets/gretelai/synthetic_multilingual_llm_prompts}
    }
