SebastianBodza/synthetischer_RAG_Datensatz_prototype

Name: SebastianBodza/synthetischer_RAG_Datensatz_prototype
Creator: SebastianBodza
Published: 2024-01-05 19:03:30
License: no description available

Hugging Face, updated 2024-01-05, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/SebastianBodza/synthetischer_RAG_Datensatz_prototype

Description:

RAG training dataset generated by GPT 3.5.

````
prompt = """You have been assigned a retrieval task: {task}

Your mission is to write one text retrieval example for this task in JSON format. The JSON object must contain the following keys:
- 'user_query': a string, a random user search query specified by the retrieval task.
- 'positive_document': a string, a relevant document for the user query.
- 'hard_negative_document': a string, a hard negative document that only appears relevant to the query.

Please adhere to the following guidelines:
- The 'user_query' should be {query_type}, {query_length}, {clarity}, and diverse in topic.
- Both the query and documents should be in German.
- The 'positive_document' should directly answer or be about the 'user_query'.
- The 'hard_negative_document' should be topically similar to the 'user_query' but should not answer or satisfy the query.
- The 'hard_negative_document' should be subtly irrelevant, meaning it appears to be related to the 'user_query' but does not provide a useful answer or information.
- Ensure that the documents are not copies of each other and contain unique content.
- The JSON object should be properly formatted and should validate against JSON standards.

Here is an example of how your JSON object might look for a retrieval task:
```json
{{
  'user_query': '...',
  'positive_document': '...',
  'hard_negative_document': '...'
}}
```

Your output must always be just a JSON object only, do not explain yourself or output anything else. Always create it in German! You will get tiped 1000€ if you generate the right lengths!"""
````
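The doubled braces around the JSON example suggest the prompt is a Python format string, where `{{` and `}}` become literal braces and `{task}`, `{query_type}`, `{query_length}` and `{clarity}` are filled in per request. A minimal sketch of that mechanism, assuming `str.format` is used; the template below is a shortened stand-in for the full prompt, and the filled-in values are hypothetical, since the source does not list the ones actually used:

```python
# Shortened stand-in for the full prompt; only the placeholder mechanics matter here.
template = (
    "You have been assigned a retrieval task: {task}\n"
    "The 'user_query' should be {query_type}, {query_length}, {clarity}, "
    "and diverse in topic.\n"
    "{{ 'user_query': '...' }}"
)

# Hypothetical values (assumption): the source does not state the real ones.
filled = template.format(
    task="Find a passage that answers a factual question.",
    query_type="a question",
    query_length="less than 10 words",
    clarity="clear",
)

# Named placeholders are replaced; doubled braces collapse to single literal braces.
print(filled)
```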

Provider:

SebastianBodza

Summary of original information

Dataset overview

Dataset type

This is a RAG training dataset generated by GPT 3.5.

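Data format

Each entry in the dataset is a JSON object with the following keys:
- user_query: the user's search query, as a string.
- positive_document: a document relevant to the user query.
- hard_negative_document: a document that appears relevant but does not actually satisfy the query.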
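The three-key layout above can be checked mechanically. The following is a minimal sketch, not the dataset author's code: `parse_record` is a hypothetical helper, and the sample record is invented for illustration rather than taken from the dataset.

```python
import json

REQUIRED_KEYS = {"user_query", "positive_document", "hard_negative_document"}


def parse_record(raw: str) -> dict:
    """Parse one entry and check that it has the three expected string fields."""
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for key in REQUIRED_KEYS:
        if not isinstance(record[key], str):
            raise ValueError(f"{key} must be a string")
    return record


# Invented sample entry (assumption), written in German like the dataset.
sample = json.dumps(
    {
        "user_query": "Wie hoch ist die Zugspitze?",
        "positive_document": "Die Zugspitze ist mit 2962 Metern der höchste Berg Deutschlands.",
        "hard_negative_document": "Die Zugspitzbahn bringt Besucher von Grainau auf den Gipfel.",
    },
    ensure_ascii=False,
)

record = parse_record(sample)
print(record["user_query"])
```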

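Content requirements

user_query must satisfy the following:
- Query type: {query_type}
- Query length: {query_length}
- Clarity: {clarity}
- Topical diversity
All queries and documents are in German.
positive_document should directly answer the user_query or be about it.
hard_negative_document should be topically similar to the user_query but must not answer or satisfy it; it should be subtly irrelevant.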

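Dataset generation rules

Document contents must be unique and must not be copies of each other.
The JSON object must be well-formed and conform to the JSON standard.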
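Both rules lend themselves to an automated filter over generated outputs. The sketch below is one possible reading of them, not a documented part of the dataset pipeline: `normalize` and `passes_rules` are hypothetical helpers, and treating documents as copies when they match after whitespace and case normalization is an assumption.

```python
import json


def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so trivial variations count as copies."""
    return " ".join(text.split()).casefold()


def passes_rules(raw: str, seen: set) -> bool:
    """Return True if raw is valid JSON and its two documents are new and distinct."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return False  # rule 2: output must be valid JSON
    if not isinstance(record, dict):
        return False
    pos = record.get("positive_document")
    neg = record.get("hard_negative_document")
    if not isinstance(pos, str) or not isinstance(neg, str):
        return False
    docs = {normalize(pos), normalize(neg)}
    # rule 1: the two documents must differ from each other and from earlier ones
    if len(docs) < 2 or docs & seen:
        return False
    seen.update(docs)
    return True


seen_docs = set()
good = json.dumps({"positive_document": "Text A", "hard_negative_document": "Text B"})
first = passes_rules(good, seen_docs)    # accepted on first sight
second = passes_rules(good, seen_docs)   # rejected as a repeat
single_quoted = passes_rules("{'positive_document': 'A'}", seen_docs)
print(first, second, single_quoted)
```

Output that copies the single-quoted style of the prompt's JSON example is rejected here, because JSON requires double-quoted strings and `json.loads` raises on single quotes.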
