
JAYADIR/rag-query-transformations

Source: Hugging Face. Updated 2025-12-15. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/JAYADIR/rag-query-transformations
Description:

This dataset contains synthetic query transformations intended to improve retrieval quality in Retrieval-Augmented Generation (RAG) systems. Each original query is paired with four transformed variants: paraphrase, broader, technical, and explicit. The original queries come from MS MARCO, and the transformations were generated with Gemini Flash Lite. The dataset holds approximately 66k original queries and 264k transformed queries (four per original). Intended uses include query expansion, embedding-based retrieval, and training lightweight models (e.g., a VAE) for query transformation. Limitations: the data is synthetically generated, may contain noise, and is not intended for direct use in question answering (QA).
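As a rough illustration of the query-expansion use case described above, the sketch below groups one original query with its four variants and returns a deduplicated list that could be passed to a retriever. The field names (`original`, `variants`), the `TransformedQuery` class, the `expand_query` helper, and the sample record are all hypothetical: this listing does not state the dataset's actual column names, and the sample text is invented for illustration, not drawn from the dataset.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four variant kinds named in the description.
VARIANT_KINDS = ("paraphrase", "broader", "technical", "explicit")


@dataclass(frozen=True)
class TransformedQuery:
    """One record: an original query plus its variants.

    Hypothetical schema; the real column names may differ.
    """
    original: str
    variants: Dict[str, str]  # variant kind -> rewritten query


def expand_query(record: TransformedQuery) -> List[str]:
    """Return the original query followed by its variants.

    Variants are emitted in VARIANT_KINDS order. Blank strings and
    case-insensitive duplicates are dropped, since synthetic data
    may contain noise.
    """
    candidates = [record.original]
    candidates += [record.variants[k] for k in VARIANT_KINDS if k in record.variants]

    seen = set()
    expanded = []
    for text in candidates:
        cleaned = text.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            expanded.append(cleaned)
    return expanded


# Invented sample record, for illustration only.
example = TransformedQuery(
    original="what causes rust on iron",
    variants={
        "paraphrase": "why does iron rust",
        "broader": "how do metals corrode",
        "technical": "oxidation mechanism of iron in the presence of water and oxygen",
        "explicit": "what chemical process causes rust to form on iron surfaces",
    },
)

for query in expand_query(example):
    print(query)
```

Each string in the returned list can be embedded or searched separately and the results merged. With four variants per original, the stated sizes are consistent: 66k originals times 4 gives 264k transformed queries.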
Provider:
JAYADIR