
RUC-NLPIR/OmniEval-Human-Questions

Hugging Face | Updated: 2024-12-20 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/RUC-NLPIR/OmniEval-Human-Questions
Description:
OmniEval is an omnidirectional, automatic RAG (retrieval-augmented generation) evaluation benchmark for the financial domain, built on a multi-dimensional evaluation framework. The framework includes a matrix-based RAG scenario evaluation system that classifies queries into five task classes and 16 financial topics, which gives a structured assessment of diverse query scenarios. Evaluation data are produced by combining GPT-4-based automatic generation with human annotation; the generated instances reached an 87.47% acceptance rate in human evaluation. A multi-stage evaluation system assesses both retrieval and generation performance, so the RAG pipeline is evaluated comprehensively. The metrics combine rule-based and LLM-based measures, and their reliability is improved through manual annotation and supervised fine-tuning of an LLM evaluator.
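The matrix-based scenario evaluation described above can be pictured as a grid with one cell per (task class, topic) pair. The following is a minimal sketch of that idea, not OmniEval's own code: the field names `task`, `topic`, and `score`, the function `score_matrix`, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def score_matrix(records):
    """Average a per-instance score within each (task, topic) cell.

    `records` is an iterable of dicts with the hypothetical keys
    "task", "topic", and "score"; these are not the dataset's schema.
    """
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["task"], rec["topic"])].append(rec["score"])
    return {cell: mean(scores) for cell, scores in cells.items()}


# Made-up records for illustration; labels and scores are not from OmniEval.
records = [
    {"task": "task_a", "topic": "topic_1", "score": 1.0},
    {"task": "task_a", "topic": "topic_1", "score": 0.0},
    {"task": "task_b", "topic": "topic_2", "score": 1.0},
]
matrix = score_matrix(records)
```

With five task classes and 16 topics, such a grid would have at most 80 cells, and reporting a score per cell shows where a RAG pipeline is weak rather than only its overall average.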
Provider:
RUC-NLPIR