caskcsg/LongBench-Pro
收藏Hugging Face2025-12-14 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/caskcsg/LongBench-Pro
下载链接
链接失效反馈官方服务:
资源简介:
LongBench Pro是一个更真实、更全面的双语长上下文评估基准数据集,包含1,500个样本,完全基于真实的自然长文档构建。它涵盖11个主要任务和25个次要任务,评估现有基准测试中所有长上下文能力。数据集采用多样化的评估指标,支持对模型能力的细粒度测量,并提供平衡的英文和中文双语样本。此外,LongBench Pro引入了多维分类法,支持在不同操作条件下对模型进行全面评估,包括上下文要求(全局整合与局部检索)、长度(从8k到256k令牌的六种均匀分布)和难度(从易到极难的四个级别)。
LongBench Pro is a more realistic and comprehensive bilingual long-context evaluation benchmark containing 1,500 samples, entirely built on authentic, natural long documents. It includes 11 primary tasks and 25 secondary tasks, covering all long-context capabilities assessed by existing benchmarks. The dataset employs diverse evaluation metrics, enabling a more fine-grained measurement of model abilities, and provides a balanced set of bilingual samples in both English and Chinese. Additionally, LongBench Pro introduces a multi-dimensional taxonomy to support a comprehensive evaluation of models under different operating conditions, including Context Requirement (Full vs. Partial), Length (six lengths uniformly distributed from 8k to 256k tokens), and Difficulty (four levels ranging from Easy to Extreme).
提供机构:
caskcsg



