Loong
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/mozerwang/loong
Overview:
Loong is a benchmark for evaluating long-context language models through extended multi-document question answering, designed to reflect real-world scenarios. It defines four task types: spotlight locating, comparison, clustering, and chain of reasoning, which together assess how well a model understands long contexts. Each task type is covered at a range of context lengths.



