fang0608/LongBench-Pro
Source: Hugging Face | Updated: 2026-04-27 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/fang0608/LongBench-Pro
Description:
LongBench Pro is a more realistic and more comprehensive bilingual benchmark for long-context evaluation. It contains 1,500 samples built entirely on authentic, natural long documents, spanning 11 primary tasks and 25 secondary tasks that cover all long-context capabilities assessed by existing benchmarks. It employs diverse evaluation metrics for finer-grained measurement of model abilities and provides a balanced set of English and Chinese samples. LongBench Pro also introduces a multi-dimensional taxonomy for evaluating models under different operating conditions: Context Requirement (Full context, i.e. global integration, versus Partial context, i.e. local retrieval), Length (six lengths uniformly distributed from 8k to 256k tokens), and Difficulty (four levels ranging from Easy to Extreme).
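The three taxonomy dimensions lend themselves to per-bucket reporting. Below is a minimal Python sketch of how per-sample scores could be averaged along one dimension at a time. The field names (`context_requirement`, `length`, `difficulty`, `score`), the helper `mean_score_by`, and the toy records are all assumptions made for illustration; they are not taken from the dataset's actual schema or results.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by(records, dimension):
    """Average the per-sample score within each value of one taxonomy dimension.

    `records` is a list of dicts; the keys used here are hypothetical
    and may not match the real dataset columns.
    """
    buckets = defaultdict(list)
    for record in records:
        buckets[record[dimension]].append(record["score"])
    return {value: mean(scores) for value, scores in buckets.items()}


# Toy data invented for this sketch only; these are not benchmark results.
toy_records = [
    {"context_requirement": "Full", "length": "8k", "difficulty": "Easy", "score": 1.0},
    {"context_requirement": "Partial", "length": "8k", "difficulty": "Extreme", "score": 0.0},
    {"context_requirement": "Full", "length": "256k", "difficulty": "Easy", "score": 0.5},
]

by_length = mean_score_by(toy_records, "length")
by_difficulty = mean_score_by(toy_records, "difficulty")
print(by_length)
print(by_difficulty)
```

Reporting each dimension separately, rather than a single overall average, is what makes it possible to see whether a model degrades with length, with difficulty, or with the need to integrate the full context.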
Provider:
fang0608



