RAG retrieves the quality assessment dataset

Name: RAG retrieves the quality assessment dataset
Creator: figshare
Published: 2025-09-06 06:09:10
License: No description available

DataCite Commons. Updated 2025-09-06; indexed 2026-02-09.

Download link:

https://figshare.com/articles/dataset/RAG__/30067627/2

Description:

This dataset is designed to assess how retrieval quality affects generation quality in Retrieval-Augmented Generation (RAG) systems. It contains 728 samples.

During data collection, we selected 728 real user queries of varying complexity from public question-answering datasets (including SQuAD, Natural Questions, and WebQuestions), covering factual questions, multi-hop reasoning, and opinion queries. For each query, a hybrid retrieval strategy based on BM25 and Sentence-BERT retrieves 5 to 15 relevant documents from the Wikipedia corpus.

During data processing, we extracted eight key features through a combination of manual annotation and automatic calculation. Three annotators independently rated query complexity (1 to 10 points), and their ratings were averaged. Document relevance fuses an annotation-based relevance score (0 to 1) with the predictions of a pre-trained model. Semantic similarity between documents is computed with Sentence-BERT, and the retrieval diversity feature is generated in combination with a TF-IDF-based diversity score. Key entity coverage is computed automatically with an entity linking tool. Information redundancy is quantified from the degree of overlap in document content. The actual number of retrieved documents is recorded as the retrieval depth. Finally, five experts rated the generated responses on three dimensions (relevance, accuracy, and completeness) on a scale of 0 to 100 points, and this rating serves as the final quality label.

All feature values have been standardized so that the data ranges are reasonable and consistent with practical application scenarios, providing a reliable basis for research on retrieval strategy optimization and quality assessment of RAG systems.
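The description names several features without giving their formulas. The sketch below shows one plausible way to compute some of them, using only the Python standard library. Everything in it is an assumption made for illustration: the function names are hypothetical, token-set Jaccard overlap stands in for the unspecified "degree of overlap" measure, and z-scoring stands in for the unspecified standardization step. It is not the dataset authors' pipeline.

```python
from itertools import combinations
from statistics import mean, pstdev


def query_complexity(annotator_scores):
    """Average the independent 1-10 complexity ratings (three annotators in the dataset)."""
    return mean(annotator_scores)


def jaccard(doc_a, doc_b):
    """Token-set overlap between two documents: 0 = disjoint, 1 = identical token sets."""
    tokens_a = set(doc_a.lower().split())
    tokens_b = set(doc_b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def information_redundancy(docs):
    """Mean pairwise Jaccard overlap across retrieved documents (assumed metric)."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    return mean(jaccard(a, b) for a, b in pairs)


def retrieval_depth(docs):
    """Retrieval depth is the actual number of retrieved documents."""
    return len(docs)


def standardize(values):
    """Z-score a feature column (assumed form of the 'standardized' step)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Under these assumptions, a query with annotator ratings of 4, 5, and 6 has complexity 5, and two documents that share half of their combined vocabulary contribute a pairwise overlap of 0.5 to the redundancy score. If the published columns were min-max scaled instead of z-scored, `standardize` would need to be replaced accordingly.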

Provider: figshare
Created: 2025-09-06
