RUC-NLPIR/OmniEval-Human-Questions

Name: RUC-NLPIR/OmniEval-Human-Questions
Creator: RUC-NLPIR
Published: 2024-12-20 03:27:12
License: No description available

Source: Hugging Face. Updated 2024-12-20; indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/RUC-NLPIR/OmniEval-Human-Questions

Resource overview:

OmniEval is an omnidirectional, automatic RAG evaluation benchmark for the financial domain, built on a multi-dimensional evaluation framework. The framework includes a matrix-based RAG scenario evaluation system that categorizes queries into five task types and 16 financial topics, enabling structured assessment of diverse query scenarios. Evaluation data is generated through a multi-dimensional approach that combines GPT-4-based automatic generation with human annotation; the generated instances reached an 87.47% acceptance rate in human evaluation. A multi-stage evaluation system assesses both retrieval and generation performance, giving a comprehensive view of the RAG pipeline. The benchmark combines rule-based and LLM-based evaluation metrics, and improves their reliability through manual annotation and supervised fine-tuning of an LLM evaluator.
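As a rough illustration of the matrix-based idea described above, the sketch below groups per-query scores by (task type, topic) cell and averages each cell. The function name `score_matrix`, the field names `task`, `topic`, and `score`, and the placeholder labels are assumptions made for illustration; they are not taken from the OmniEval code or the dataset schema.

```python
from collections import defaultdict
from statistics import mean


def score_matrix(records):
    """Average per-query scores into (task, topic) cells.

    Each record is a dict with hypothetical keys "task", "topic", "score".
    """
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["task"], rec["topic"])].append(rec["score"])
    # One averaged score per populated cell of the task-by-topic matrix.
    return {cell: mean(scores) for cell, scores in cells.items()}


# Placeholder records: labels and scores are invented, not OmniEval data.
records = [
    {"task": "task_a", "topic": "topic_1", "score": 1.0},
    {"task": "task_a", "topic": "topic_1", "score": 0.0},
    {"task": "task_b", "topic": "topic_2", "score": 1.0},
]
matrix = score_matrix(records)
```

With five task types and 16 topics, a full matrix of this kind would have up to 80 cells, which makes it easy to see which query scenarios a RAG pipeline handles poorly.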

Provided by:

RUC-NLPIR
