SebastianBodza/synthetischer_RAG_Datensatz_prototype
Source: Hugging Face. Updated 2024-01-05; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/SebastianBodza/synthetischer_RAG_Datensatz_prototype
Description:
A RAG training dataset generated by GPT 3.5.
````
prompt = """You have been assigned a retrieval task: {task}
Your mission is to write one text retrieval example for this task in JSON format. The JSON object must
contain the following keys:
- 'user_query': a string, a random user search query specified by the retrieval task.
- 'positive_document': a string, a relevant document for the user query.
- 'hard_negative_document': a string, a hard negative document that only appears relevant to the query.
Please adhere to the following guidelines:
- The 'user_query' should be {query_type}, {query_length}, {clarity}, and diverse in topic.
- Both the query and documents should be in German.
- The 'positive_document' should directly answer or be about the 'user_query'.
- The 'hard_negative_document' should be topically similar to the 'user_query' but should not answer or satisfy the query.
- The 'hard_negative_document' should be subtly irrelevant, meaning it appears to be related to the 'user_query' but does not provide a useful answer or information.
- Ensure that the documents are not copies of each other and contain unique content.
- The JSON object should be properly formatted and should validate against JSON standards.
Here is an example of how your JSON object might look for a retrieval task:
```json
{{
'user_query': '...',
'positive_document': '...',
'hard_negative_document': '...'
}}
```
Your output must always be just a JSON object only, do not explain yourself or output anything else. Always create it in German! You will get tiped 1000€ if you generate the right lengths!"""
````
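The doubled braces (`{{` and `}}`) in the template suggest it is filled with Python's `str.format`, which turns them into literal braces while substituting the named placeholders. The following is a minimal sketch under that assumption. `TEMPLATE` is a shortened stand-in for the full prompt, `build_prompt` is a hypothetical helper, and the task and slot values are invented for illustration; the dataset card does not list the values that were actually used.

```python
# Shortened stand-in for the full prompt above: same placeholders, same escaped braces.
TEMPLATE = """You have been assigned a retrieval task: {task}
- The 'user_query' should be {query_type}, {query_length}, {clarity}, and diverse in topic.
{{
'user_query': '...'
}}"""


def build_prompt(task: str, query_type: str, query_length: str, clarity: str) -> str:
    """Fill the placeholders; str.format turns '{{' and '}}' into literal braces."""
    return TEMPLATE.format(
        task=task,
        query_type=query_type,
        query_length=query_length,
        clarity=clarity,
    )


# Hypothetical slot values, chosen only for illustration.
filled = build_prompt(
    task="Find documentation that answers a technical question.",
    query_type="long-tail",
    query_length="5 to 15 words",
    clarity="clear",
)
print(filled)
```

Keeping the slots as keyword arguments makes it easy to vary query type, length, and clarity per request, which is presumably how topic and style diversity were obtained.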
Provider:
SebastianBodza
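Summary of original information
Dataset overview
Dataset type
- A RAG training dataset generated by GPT 3.5.
Data format
- Each entry in the dataset is a JSON object with the following keys:
  - user_query: the user's search query, a string.
  - positive_document: a document relevant to the user query.
  - hard_negative_document: a document that appears relevant but does not actually satisfy the query.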
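The three-key format can be checked mechanically when reading entries. The sketch below assumes each entry arrives as a JSON string; `parse_record` and `REQUIRED_KEYS` are hypothetical names, and the sample entry is made up rather than taken from the dataset.

```python
import json

REQUIRED_KEYS = ("user_query", "positive_document", "hard_negative_document")


def parse_record(raw: str) -> dict:
    """Parse one JSON entry and check that each required key holds a non-empty string."""
    record = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(record, dict):
        raise ValueError("entry is not a JSON object")
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty key: {key}")
    return record


# Made-up example entry (not taken from the dataset).
raw = json.dumps({
    "user_query": "Wie funktioniert eine Wärmepumpe?",
    "positive_document": "Eine Wärmepumpe entzieht der Umgebung Wärme und gibt sie an das Heizsystem ab.",
    "hard_negative_document": "Eine Klimaanlage kühlt Innenräume und wird meist im Sommer genutzt.",
})
record = parse_record(raw)
print(sorted(record))
```

Note that `json.loads` rejects single-quoted strings, so model output that copies the single-quoted example from the prompt verbatim would fail this check.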
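Content requirements
user_query should satisfy the following:
- Query type: {query_type}
- Query length: {query_length}
- Clarity: {clarity}
- Diverse in topic
- All queries and documents are in German.
positive_document should directly answer or be about the user_query.
hard_negative_document should be topically similar to the user_query but not answer or satisfy it; it should be subtly irrelevant.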
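One common way to use a query with a relevant document and a hard negative is as a (query, positive, negative) triplet for contrastive retriever training. The dataset card does not say how the data is meant to be consumed, so the sketch below is only an assumption; `Triplet` and `to_triplet` are hypothetical names and the sample entry is invented.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    query: str
    positive: str
    negative: str


def to_triplet(record: dict) -> Triplet:
    """Map one dataset entry onto a (query, positive, negative) triplet."""
    return Triplet(
        query=record["user_query"],
        positive=record["positive_document"],
        negative=record["hard_negative_document"],
    )


# Made-up example entry (not taken from the dataset).
entry = {
    "user_query": "Was ist die Hauptstadt von Österreich?",
    "positive_document": "Wien ist die Hauptstadt von Österreich.",
    "hard_negative_document": "Österreich grenzt an acht Nachbarstaaten.",
}
triplet = to_triplet(entry)
print(triplet.query)
```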
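Generation rules
- Documents must contain unique content and must not be copies of each other.
- The JSON object must be properly formatted and validate against JSON standards.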
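The uniqueness rule is stated as an instruction to the model, so generated entries may still need checking. The sketch below flags entries whose two documents match each other, or match a document seen earlier, after lowercasing and whitespace normalization. That comparison is an assumed, deliberately simple notion of "copy", and `normalize` and `find_duplicate_documents` are hypothetical helpers.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())


def find_duplicate_documents(records: list[dict]) -> list[int]:
    """Return indices of records whose documents repeat each other or an earlier document."""
    seen: set[str] = set()
    flagged: list[int] = []
    for index, record in enumerate(records):
        docs = [
            normalize(record["positive_document"]),
            normalize(record["hard_negative_document"]),
        ]
        if docs[0] == docs[1] or any(doc in seen for doc in docs):
            flagged.append(index)
        seen.update(docs)
    return flagged


# Made-up entries: record 1 repeats itself, record 2 repeats record 0's positive document.
records = [
    {"positive_document": "Der Rhein ist ein Fluss.", "hard_negative_document": "Die Elbe fließt durch Hamburg."},
    {"positive_document": "Berlin ist groß.", "hard_negative_document": "berlin  ist groß."},
    {"positive_document": "der rhein ist ein fluss.", "hard_negative_document": "München liegt in Bayern."},
]
duplicates = find_duplicate_documents(records)
print(duplicates)
```

Exact matching after normalization only catches near-verbatim copies; paraphrased duplicates would need a similarity measure instead.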



