LVEval

Name: LVEval
Creator: maas
Published: 2026-05-14 16:19:42
License: No description provided

ModelScope community. Updated 2026-05-14. Indexed 2024-12-21.

Download link:

https://modelscope.cn/datasets/InfiniAI/LVEval

Resource description:

LV-Eval is a long-context evaluation benchmark with five length levels (16k, 32k, 64k, 128k, and 256k), reaching a maximum test length of 256k. The average text length is 102,380 words, with minimum and maximum lengths of 11,896 and 387,406 words. LV-Eval covers two main task types, single-hop QA and multi-hop QA, across 11 evaluation subsets in Chinese and English. Its design incorporates three key techniques: Confusing Facts Insertion (CFI), which makes the tasks more challenging; Keyword and Phrase Replacement (KPR), which reduces information leakage; and a keyword-recall-based metric (Answer Keywords, AK, which combines answer keywords with a word blacklist), which makes the scores more objective. We hope LV-Eval will serve as a valuable performance reference for future research on long-context large language models.
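The description above says the AK metric combines answer keywords with a word blacklist. The following is only a rough sketch of how such a keyword-gated score could work, not LV-Eval's actual implementation. The function names, the whitespace tokenization (which would not suit Chinese text), and the `recall_threshold` value are all assumptions made for illustration.

```python
from collections import Counter


def token_f1(prediction_tokens, reference_tokens):
    """Standard token-overlap F1 between two token lists."""
    common = Counter(prediction_tokens) & Counter(reference_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(prediction_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def keyword_gated_f1(prediction, reference, answer_keywords, blacklist,
                     recall_threshold=0.5):
    """Hypothetical AK-style score (illustrative only).

    Stage 1: require that enough answer keywords appear in the prediction.
    Stage 2: compute F1 after dropping blacklisted filler words, so that
    common words do not inflate the score.
    """
    # Assumption: simple lowercase whitespace tokenization.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # Stage 1: keyword recall gate (threshold value is an assumption).
    keywords = [k.lower() for k in answer_keywords]
    if keywords:
        hits = sum(1 for k in keywords if k in pred_tokens)
        if hits / len(keywords) < recall_threshold:
            return 0.0

    # Stage 2: remove blacklisted words from both sides before scoring.
    banned = {w.lower() for w in blacklist}
    pred_filtered = [t for t in pred_tokens if t not in banned]
    ref_filtered = [t for t in ref_tokens if t not in banned]
    return token_f1(pred_filtered, ref_filtered)


if __name__ == "__main__":
    score = keyword_gated_f1(
        prediction="the capital is paris",
        reference="paris",
        answer_keywords=["paris"],
        blacklist=["the", "is"],
    )
    print(f"score = {score:.3f}")
```

In this sketch a prediction that misses the answer keywords scores zero regardless of its word overlap with the reference, which is the intuition behind gating on keyword recall.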

Provider:

maas

Created:

2024-12-17

Collected summary

Dataset introduction