JAYADIR/rag-query-transformations
Source: Hugging Face | Updated: 2025-12-15 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/JAYADIR/rag-query-transformations
Description:
This dataset contains synthetic query transformations intended to improve retrieval quality in Retrieval-Augmented Generation (RAG) systems. Each original query is paired with four transformed variants: paraphrase, broader, technical, and explicit. The original queries come from MS MARCO, and the transformations were generated by Gemini Flash Lite. The dataset holds approximately 66k original queries and 264k transformed queries (four per original). Intended uses include query expansion, embedding-based retrieval, and training lightweight models (e.g., a VAE) for query transformation. Limitations: the data is synthetically generated, may contain noise, and is not intended for direct use in question answering (QA).
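As a rough illustration of the query-expansion use case, the sketch below runs an original query and its four variants through a retriever and merges the ranked lists. Everything here is an assumption made for illustration: the field names (`original`, `paraphrase`, `broader`, `technical`, `explicit`) are hypothetical and may not match the dataset's actual columns, the sample record and corpus are invented, the retriever is a toy token-overlap ranker, and reciprocal rank fusion is just one common way to merge rankings, not something the dataset prescribes.

```python
from collections import defaultdict

# Hypothetical record layout: these field names are assumptions,
# not the dataset's confirmed column names.
record = {
    "original": "what causes tides",
    "paraphrase": "why do ocean tides happen",
    "broader": "how do the moon and sun affect the ocean",
    "technical": "gravitational forcing of tidal cycles",
    "explicit": "what physical forces cause the rise and fall of sea level known as tides",
}

# Invented toy corpus standing in for a real document store.
corpus = {
    "d1": "tides are caused by the gravitational pull of the moon and sun on the ocean",
    "d2": "the stock market rose sharply on monday",
    "d3": "tidal cycles follow gravitational forcing from the moon",
}

VARIANT_KEYS = ["original", "paraphrase", "broader", "technical", "explicit"]


def expand(rec):
    """Return the original query followed by its four variants."""
    return [rec[key] for key in VARIANT_KEYS]


def search(query, docs, k=3):
    """Toy lexical retriever: rank documents by number of shared tokens."""
    query_tokens = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        overlap = len(query_tokens & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def fuse(rankings, c=60):
    """Reciprocal rank fusion: each list adds 1 / (c + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


queries = expand(record)
fused = fuse([search(q, corpus) for q in queries])
print(fused)
```

In a real pipeline, `search` would be replaced by an embedding-based or BM25 retriever, and the variants would come from the dataset or from a model trained on it.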
Provided by:
JAYADIR



