FRAMES-benchmark: Retrieval-Augmented Generation Test Set

Indexed by HyperAI (超神经) on 2024-10-10; updated 2024-12-14

Download link:

https://hyper.ai/cn/datasets/34835




Overview:


The FRAMES-benchmark is a comprehensive evaluation dataset released by Google in 2024. It is designed to assess Retrieval-Augmented Generation (RAG) systems along three dimensions: factuality, retrieval accuracy, and reasoning. The accompanying paper is titled "Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation". The dataset comprises 824 challenging multi-hop questions, each of which requires integrating information from between 2 and 15 Wikipedia articles. The questions span a broad range of topics, including history, sports, science, animals, and health, and each question is annotated with one or more reasoning types: numerical, tabular, multiple constraints, temporal, and post-processing. For every question, the dataset also provides a gold-standard answer and the supporting Wikipedia articles.

Created:

2024-10-09

Compiled Summary


Background Overview

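FRAMES-benchmark is a retrieval-augmented generation test set released by Google. It contains 824 multi-hop questions covering history, sports, and other topics, and is intended to test RAG systems on factuality, retrieval accuracy, and reasoning. The dataset is designed to be challenging for state-of-the-art language models and can be used both to evaluate RAG system performance and to benchmark language models.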

The above content was collected and summarized by 遇见数据集 (Yujian Datasets).