
bigainlco/LooGLE

Source: Hugging Face | Updated 2024-05-13 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/bigainlco/LooGLE
Resource overview:
LooGLE is a comprehensive benchmark for evaluating the long-context understanding of large language models (LLMs). It contains up-to-date (all after 2022) and extremely long realistic documents (over 24k tokens per document, many exceeding 100k words) and 6,000 newly generated questions spanning diverse domains and categories. LooGLE comprises 7 major tasks that evaluate LLMs' ability to understand both short and long dependency content. Long dependency tasks require understanding the interdependency across multiple pieces of evidence spread widely over the entire long text. The dataset includes 5 types of long dependency tasks: comprehension and reasoning, computation, timeline reordering, multiple information retrieval, and summarization. LooGLE relies on automatic metrics based on semantic similarity, GPT4-as-judge, and human evaluation to give an overall performance reference. The dataset provides a systematic and comprehensive evaluation schema and sheds light on the future development of enhanced models toward true long-context understanding.
Provided by:
bigainlco
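Summary of original information

Dataset overview

Basic information

  • Languages: English and Chinese
  • License: CC BY-SA 4.0
  • Task categories: question answering, summarization, text generation, fill-mask
  • Tags: long context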

Dataset details

  • Config name: shortdep_qa
  • Features:
    • input: string
    • title: string
    • qa_pairs: string
    • output: string
  • Splits:
    • test: 105 examples, 10629114 bytes
  • Download size: 5740706 bytes
  • Dataset size: 10629114 bytes

Data files

  • Config name: shortdep_qa
  • Data files:
    • split: test
    • path: shortdep_qa/test-*
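
The config and split listed above map directly onto the arguments of `load_dataset` from the Hugging Face `datasets` library. The following is a minimal sketch, assuming the `datasets` package is installed and the repository is reachable over the network; the helper name `load_shortdep_qa` is hypothetical and not part of the dataset.

```python
# Identifiers taken from the listing above.
REPO_ID = "bigainlco/LooGLE"
CONFIG = "shortdep_qa"
SPLIT = "test"


def load_shortdep_qa():
    """Load the short dependency QA test split (hypothetical helper).

    The import is done lazily so that this module can be imported
    even when the third-party `datasets` package is not installed.
    """
    from datasets import load_dataset

    # load_dataset(path, name, split=...) selects one config and one split.
    return load_dataset(REPO_ID, CONFIG, split=SPLIT)
```

If the listing is accurate, `load_shortdep_qa()` should return a dataset whose `num_rows` is 105 and whose `column_names` are `input`, `title`, `qa_pairs`, and `output`.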

Data format

Records are standardized to the following JSON format:

{
  "input": "The original long input texts",
  "title": "The title of the given document",
  "qa_pairs": [
    {
      "Q": "Question to ask based on the given input",
      "A": "Groundtruth answer for the question",
      "S": [
        "One or more evidence (complete sentences) for answering the question, which are extracted directly from the original input"
      ]
    }
  ],
  "output": "none"
}
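
Because the feature list declares `qa_pairs` as a string, a consumer presumably has to decode it before reading the `Q`, `A`, and `S` keys. The sketch below assumes the string is either JSON or a Python literal (the listing does not say which); `parse_qa_pairs`, `evidence_in_input`, and the sample record are hypothetical and made up for illustration.

```python
import ast
import json


def parse_qa_pairs(raw):
    """Decode a qa_pairs value into a list of dicts with keys Q, A, S."""
    if isinstance(raw, list):
        return raw  # already decoded
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: a Python-literal string (e.g. single-quoted keys).
        return ast.literal_eval(raw)


def evidence_in_input(record, pair):
    """Check that every evidence sentence appears verbatim in the input."""
    return all(sentence in record["input"] for sentence in pair["S"])


# Made-up record following the documented layout (not real LooGLE data).
record = {
    "input": "Alice founded the lab in 2023. It studies long documents.",
    "title": "Made-up example",
    "qa_pairs": json.dumps([
        {
            "Q": "When was the lab founded?",
            "A": "2023",
            "S": ["Alice founded the lab in 2023."],
        }
    ]),
    "output": "none",
}

pairs = parse_qa_pairs(record["qa_pairs"])
for pair in pairs:
    print(pair["Q"], "->", pair["A"], evidence_in_input(record, pair))
```

The evidence check relies on the statement in the format description that `S` sentences are extracted directly from the original input.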

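In the long dependency QA data, each question carries an additional key, type, which indicates which of the 4 long dependency task types (all except summarization) it belongs to.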
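
As a rough illustration of how the `type` key could be used, the sketch below tallies questions per task type. The listing does not give the actual label strings, so `type_a` and `type_b` are placeholders, and `count_by_type` is a hypothetical helper.

```python
from collections import Counter


def count_by_type(qa_pairs):
    """Count questions per long dependency task type.

    Questions without a "type" key are counted under "unknown".
    """
    return Counter(pair.get("type", "unknown") for pair in qa_pairs)


# Made-up questions; the "type" values are placeholders, not real labels.
sample_pairs = [
    {"Q": "q1", "A": "a1", "S": ["s1"], "type": "type_a"},
    {"Q": "q2", "A": "a2", "S": ["s2"], "type": "type_b"},
    {"Q": "q3", "A": "a3", "S": ["s3"], "type": "type_a"},
]

counts = count_by_type(sample_pairs)
print(counts.most_common())
```

Grouping this way makes it possible to report scores separately for each long dependency task rather than as a single aggregate.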
