
RAG Retrieval Quality Assessment Dataset

DataCite Commons, updated 2025-09-06, indexed 2025-09-08
Download link:
https://figshare.com/articles/dataset/RAG__/30067627
Description:
This dataset is designed to assess how retrieval quality affects the output quality of Retrieval-Augmented Generation (RAG) systems. It contains 728 samples. During data collection, 728 real user queries of varying complexity were selected from public question-answering datasets (SQuAD, Natural Questions and WebQuestions), covering factual questions, multi-hop reasoning and opinion queries. For each query, a hybrid retrieval strategy combining BM25 and Sentence-BERT was used to retrieve 5 to 15 relevant documents from a Wikipedia corpus.

During data processing, eight key features were extracted through a combination of manual annotation and automatic computation. Three annotators independently rated query complexity on a 1 to 10 scale, and the average of their ratings was used. Document relevance was obtained by fusing annotation-based relevance scores (0 to 1) with the predictions of a pre-trained model. Semantic similarity between documents was computed with Sentence-BERT, and the retrieval diversity feature was derived from a TF-IDF-based diversity score. Key-entity coverage was computed automatically with an entity-linking tool, information redundancy was quantified from the degree of content overlap between documents, and the actual number of retrieved documents was recorded as the retrieval depth. Finally, five experts rated each generated response on three dimensions (relevance, accuracy and completeness) on a 0 to 100 scale; these ratings serve as the final quality label.

All feature values were standardized so that their ranges are reasonable and consistent with real application scenarios. The dataset is intended as a basis for research on optimizing retrieval strategies and assessing the quality of RAG systems.
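The description does not say how the BM25 and Sentence-BERT scores were combined. The sketch below shows one common way to build such a hybrid retriever, assuming min-max normalization of each score list and a weighted sum with a hypothetical weight `alpha = 0.5`. To stay dependency-free, BM25 is implemented directly (Okapi variant, k1 = 1.5, b = 0.75) and the Sentence-BERT embeddings are replaced by small hand-written vectors; in practice those would come from a sentence-embedding model. `hybrid_retrieve` and the toy corpus are illustrative, not the dataset authors' code.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter()  # document frequency of each term
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_retrieve(query_tokens, query_vec, docs_tokens, doc_vecs, k=5, alpha=0.5):
    """Hypothetical fusion: top-k indices by alpha * sparse + (1 - alpha) * dense."""
    sparse = min_max(bm25_scores(query_tokens, docs_tokens))
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * s + (1 - alpha) * d for s, d in zip(sparse, dense)]
    ranked = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return ranked[:k]


docs = [
    "paris is the capital of france".split(),
    "berlin is the capital of germany".split(),
    "the eiffel tower is in paris".split(),
]
# Toy 3-d vectors standing in for Sentence-BERT embeddings (hypothetical values)
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.7, 0.0, 0.3]]
print(hybrid_retrieve("capital of france".split(), [1.0, 0.0, 0.0], docs, doc_vecs, k=2))
# → [0, 2]
```

BM25 alone would rank the Berlin document second because it shares "capital" and "of" with the query; the toy dense scores lift the Eiffel Tower document above it. Here `k` plays the role of the retrieval depth, which in the dataset ranges from 5 to 15.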
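Several of the processing steps described above can be sketched in a few lines. The description names the inputs (TF-IDF for diversity, content overlap for redundancy, standardization of all features) but not the exact formulas, so the following choices are assumptions: diversity as 1 minus the mean pairwise TF-IDF cosine similarity (with the smoothed idf ln((1 + n) / (1 + df)) + 1), redundancy as the mean pairwise Jaccard overlap of token sets, relevance fusion as a weighted average with a hypothetical weight `w = 0.5`, and standardization as a z-score. All function names are hypothetical.

```python
import math
import statistics
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs_tokens):
    """Sparse TF-IDF vectors (term -> weight) with raw term counts and smoothed idf."""
    n = len(docs_tokens)
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs_tokens]


def sparse_cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieval_diversity(docs_tokens):
    """Assumed definition: 1 - mean pairwise TF-IDF cosine (higher = more diverse)."""
    vecs = tfidf_vectors(docs_tokens)
    sims = [sparse_cosine(a, b) for a, b in combinations(vecs, 2)]
    return 1.0 - statistics.mean(sims) if sims else 0.0


def redundancy(docs_tokens):
    """Assumed definition: mean pairwise Jaccard overlap of token sets."""
    sets = [set(doc) for doc in docs_tokens]
    overlaps = [len(a & b) / len(a | b) for a, b in combinations(sets, 2) if a | b]
    return statistics.mean(overlaps) if overlaps else 0.0


def fused_relevance(annotated, predicted, w=0.5):
    """Hypothetical weighted fusion of an annotated (0-1) and a model-predicted score."""
    return w * annotated + (1 - w) * predicted


def zscore(values):
    """Standardize one feature column to zero mean and unit population std."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [0.0 if sd == 0 else (x - mu) / sd for x in values]


# Example: features for one toy query with three retrieved documents
docs = [
    "paris is the capital of france".split(),
    "paris is the capital city of france".split(),
    "the eiffel tower is in paris".split(),
]
complexity = statistics.mean([6, 7, 8])  # three annotators, 1-10 scale
depth = len(docs)                        # retrieval depth = number of documents
print(complexity, depth, round(redundancy(docs), 3), round(retrieval_diversity(docs), 3))
```

In a full pipeline, `zscore` would be applied to each feature column across all samples after the per-query features have been computed.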
Provider:
figshare
Created:
2025-09-06