FRAMES-benchmark: Retrieval-Augmented Generation Test Set
Source: HyperAI (超神经) · Updated 2024-10-10 · Indexed 2024-12-14
Download link:
https://hyper.ai/cn/datasets/34835
Overview:
FRAMES-benchmark is a comprehensive evaluation dataset released by Google in 2024. It tests retrieval-augmented generation (RAG) systems on three dimensions: factuality, retrieval accuracy, and reasoning. The accompanying paper is "Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation". The dataset contains 824 challenging multi-hop questions, each requiring information from 2 to 15 Wikipedia articles. The questions span history, sports, science, animals, health, and other topics, and each is labeled with reasoning types such as numerical, tabular, multiple constraints, temporal, and post-processing. For every question, the dataset also provides the gold answer and the relevant Wikipedia articles.
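To illustrate how the per-question annotations described above might be used, here is a minimal Python sketch that groups results by reasoning type. It makes several assumptions: the `FramesItem` field names are hypothetical and may not match the published files, the three sample items are invented placeholders rather than real dataset entries, and normalized exact match is a simple stand-in for scoring, not necessarily the method used in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FramesItem:
    # Hypothetical field names; the published files may use different ones.
    question: str
    gold_answer: str
    wiki_links: list[str]
    reasoning_types: list[str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def accuracy_by_reasoning_type(items, predictions):
    """Return {reasoning_type: fraction of items answered correctly}.

    An item tagged with several reasoning types counts once per type.
    Scoring is normalized exact match (an assumption made for simplicity).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        hit = normalize(pred) == normalize(item.gold_answer)
        for rtype in item.reasoning_types:
            total[rtype] += 1
            correct[rtype] += int(hit)
    return {rtype: correct[rtype] / total[rtype] for rtype in total}


# Invented placeholder items, NOT taken from the dataset.
items = [
    FramesItem("placeholder question 1", "42",
               ["https://en.wikipedia.org/wiki/Example_A",
                "https://en.wikipedia.org/wiki/Example_B"],
               ["numerical", "temporal"]),
    FramesItem("placeholder question 2", "Paris",
               ["https://en.wikipedia.org/wiki/Example_C",
                "https://en.wikipedia.org/wiki/Example_D"],
               ["tabular"]),
    FramesItem("placeholder question 3", "1999",
               ["https://en.wikipedia.org/wiki/Example_E",
                "https://en.wikipedia.org/wiki/Example_F"],
               ["numerical"]),
]
predictions = ["42", "London", "2001"]  # hypothetical system outputs

scores = accuracy_by_reasoning_type(items, predictions)
for rtype, acc in sorted(scores.items()):
    print(f"{rtype}: {acc:.2f}")
```

Breaking accuracy down by reasoning type, rather than reporting one overall number, makes it easier to see whether a system struggles with, say, numerical questions specifically.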
Created:
2024-10-09
Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
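FRAMES-benchmark is a retrieval-augmented generation test set released by Google. It contains 824 multi-hop questions covering history, sports, and other topics, and tests RAG systems on factuality, retrieval accuracy, and reasoning. The dataset is designed to be challenging for state-of-the-art language models and can be used both to evaluate RAG system performance and to benchmark language models.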
The content above was collected and summarized by 遇见数据集.



