XL2Bench
Source: arXiv | Updated: 2024-04-08 | Indexed: 2024-06-21
Download link:
https://github.com/nuaa-nlp/XL2Bench
Description:
XL2Bench is a benchmark designed for extremely long context understanding. It covers three scenarios (novel reading, academic paper reading, and legal reading) and four tasks of increasing complexity (memory retrieval, detailed understanding, overall understanding, and open-ended generation), for a total of 27 subtasks in both Chinese and English. The average document length exceeds 100,000 words for English texts and 200,000 characters for Chinese texts. The benchmark aims to comprehensively evaluate how well large language models (LLMs) handle long texts and to address the shortcomings of existing benchmarks in long-text understanding.
Provided by:
Nanjing University of Aeronautics and Astronautics
Created:
2024-04-08



