
synthetic_multilingual_llm_prompts

ModelScope · updated 2025-11-27 · indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/gretelai/synthetic_multilingual_llm_prompts
Resource description:
<center> <img src="https://cdn-uploads.huggingface.co/production/uploads/64373632c96e0edeaab5c49c/eJKNvvwfOsYCxNx5hl6BT.jpeg" width="600px"> <p><em>Image generated by DALL-E. See <a href="https://huggingface.co/datasets/gretelai/synthetic_multilingual_llm_prompts/blob/main/dalle_prompt.txt">prompt</a> for more details</em></p> </center>

# 📝🌐 Synthetic Multilingual LLM Prompts

Welcome to the "Synthetic Multilingual LLM Prompts" dataset! This collection contains 1,250 synthetic LLM prompts generated with Gretel Navigator, each available in seven languages. We used Gretel Navigator both as the generation tool and as an LLM-as-a-judge. This ensured that the prompts are accurate and diverse and that translations are high-quality and consistent across languages.

This dataset is designed to be used with LLMs to generate diverse, multilingual responses to the provided prompts. We are excited to contribute it directly to the [Awesome ChatGPT Prompts GitHub repository](https://github.com/f/awesome-chatgpt-prompts) and to the corresponding [awesome-chatgpt-prompts](https://huggingface.co/datasets/fka/awesome-chatgpt-prompts) dataset on Hugging Face.

We invite the community to explore, use, and contribute to this dataset, with the aim of making LLM interactions more versatile and richer.

**Disclaimer: The translations and overall quality of this dataset are generated synthetically and have not been verified by human review. As a result, inaccuracies may be present.**

## Dataset Overview

This dataset provides a rich collection of prompts that can be used with ChatGPT. Each prompt is available in the following languages:

- English (en)
- Dutch (nl_NL)
- French (fr_FR)
- Spanish (es_ES)
- German (de_DE)
- Brazilian Portuguese (pt_BR)
- Simplified Chinese (zh_CN)

### Dataset Schema

The main dataset consists of the following fields:

- **id**: Prompt ID.
- **act**: The role or scenario for which the prompt is designed.
- **prompt**: The prompt text in English.
- **prompt_nl_NL**: The prompt text in Dutch.
- **prompt_fr_FR**: The prompt text in French.
- **prompt_es_ES**: The prompt text in Spanish.
- **prompt_de_DE**: The prompt text in German.
- **prompt_pt_BR**: The prompt text in Brazilian Portuguese.
- **prompt_zh_CN**: The prompt text in Simplified Chinese.

### Translation Quality

The quality of each translation from English into the target language was assessed with the LLM-as-a-judge method, powered by Gretel Navigator. Each translation was scored from 1 to 10 on three key criteria:

- **Accuracy**: How well the meaning is preserved in the translation.
- **Fluency**: How naturally the translated text reads in the target language.
- **Consistency**: How consistently terms and phrases are translated.

Each translation was then assigned an overall score equal to the average of its accuracy, fluency, and consistency scores. Only prompts whose translations achieved an overall score of 7 or higher were retained in the main dataset.

The scores and detailed evaluations for each language are available in separately uploaded CSV files:

- `prompt_nl_NL.csv`
- `prompt_fr_FR.csv`
- `prompt_es_ES.csv`
- `prompt_de_DE.csv`
- `prompt_pt_BR.csv`
- `prompt_zh_CN.csv`

## Usage

To use this dataset, load it from Hugging Face with the following snippet:

```python
from datasets import load_dataset

# Load the "main" configuration of the dataset from the Hugging Face Hub
dataset = load_dataset("gretelai/synthetic_multilingual_llm_prompts", "main")
```

## License

This dataset is released under the Apache 2.0 license, which permits public use with proper attribution.
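The schema above stores each language in its own column. English sits in `prompt`, and every translation sits in `prompt_<locale>`. Below is a minimal sketch of selecting one language from a single row. It assumes the row is a plain dict keyed by those field names. The helper `prompt_for` and the placeholder row are hypothetical, for illustration only.

```python
# Locale codes from the Dataset Schema; English uses the bare "prompt" column.
LOCALES = ["en", "nl_NL", "fr_FR", "es_ES", "de_DE", "pt_BR", "zh_CN"]


def prompt_for(row: dict, locale: str) -> str:
    """Return the prompt text for `locale` from one row (hypothetical helper)."""
    if locale not in LOCALES:
        raise ValueError(f"unsupported locale: {locale!r}")
    # English has no suffix; every other language is stored as prompt_<locale>.
    column = "prompt" if locale == "en" else f"prompt_{locale}"
    return row[column]


# Hypothetical placeholder row: only the columns used below are filled in.
row = {
    "id": 0,
    "act": "<role>",
    "prompt": "<English prompt text>",
    "prompt_fr_FR": "<French prompt text>",
}
print(prompt_for(row, "fr_FR"))  # <French prompt text>
```

With the loading snippet from the Usage section, a call such as `prompt_for(dataset["train"][0], "de_DE")` should work. This assumes the `main` configuration exposes a `train` split, which this card does not state.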
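The retention rule under Translation Quality can be sketched as follows. The overall score is the mean of accuracy, fluency, and consistency, each on a 1 to 10 scale. A translation is kept only if its overall score is at least 7. The `Scores` type, the function names, and the sample numbers are hypothetical. The per-language CSV files may use different column names, which this card does not list. The prompt-level check also rests on an assumption: that a prompt survives only if every one of its translations clears the bar. The card's wording suggests this but does not say so explicitly.

```python
from dataclasses import dataclass
from statistics import mean

THRESHOLD = 7.0  # minimum overall score for a translation to be kept


@dataclass
class Scores:
    """Judge scores for one translation, each on a 1-10 scale (hypothetical type)."""
    accuracy: float
    fluency: float
    consistency: float

    @property
    def overall(self) -> float:
        # The overall score is the plain average of the three criteria.
        return mean([self.accuracy, self.fluency, self.consistency])


def keep(scores: Scores, threshold: float = THRESHOLD) -> bool:
    """Return True if a single translation meets the retention threshold."""
    return scores.overall >= threshold


def keep_prompt(per_language: dict[str, Scores]) -> bool:
    """Assumed rule: keep a prompt only if all of its translations pass."""
    return all(keep(s) for s in per_language.values())


# Hypothetical examples: (9 + 8 + 7) / 3 = 8.0 is kept; (8 + 6 + 6) / 3 ≈ 6.67 is dropped.
print(keep(Scores(9, 8, 7)))  # True
print(keep(Scores(8, 6, 6)))  # False
```

Averaging the criteria means a weak score on one criterion can be offset by strong scores on the others. A stricter per-criterion minimum would be a different filter from the one described above.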
## Reference

If you use this dataset, please cite it as follows:

```bibtex
@software{gretel-synthetic-multilingual-llm-prompts-2024,
  author = {Van Segbroeck, Maarten and Emadi, Marjan and Nathawani, Dhruv and Ramaswamy, Lipika and Greco, Johnny and Boyd, Kendrick and Grossman, Matthew and Meyer, Yev},
  title = {{Synthetic Multilingual LLM Prompts}: A synthetic multilingual prompt dataset for prompting LLMs},
  month = {June},
  year = {2024},
  url = {https://huggingface.co/datasets/gretelai/synthetic_multilingual_llm_prompts}
}
```

---

Feel free to reach out if you have any questions or need further assistance. Enjoy using the Synthetic Multilingual LLM Prompts dataset!

Provider:
maas
Created:
2025-05-20