avemio/German-RAG-LLM-HARD-BENCHMARK
Source: Hugging Face | Updated: 2025-02-06 | Indexed: 2025-04-08
Download link:
https://hf-mirror.com/datasets/avemio/German-RAG-LLM-HARD-BENCHMARK
Description:
The German-RAG-LLM-HARD Benchmark is a specialized collection of datasets for evaluating how well language models solve hard RAG-specific tasks. It includes multiple subsets for different tasks: hard question answering with multiple references, hard reasoning tasks in German and English, and summarization of meeting topics and attendee topics. The data is synthetically generated, following an approach inspired by Tencent's paper and enhanced with the PersonaHub dataset. Quality is ensured through automatic validation and curation by open-source LLMs.
Provider:
avemio



