RMT-team/babilong

Name: RMT-team/babilong
Creator: RMT-team
Published: 2024-06-17 09:49:52
License: not specified

Hugging Face. Updated: 2024-06-17. Indexed: 2024-06-22.

Download link:

https://hf-mirror.com/datasets/RMT-team/babilong

Resource summary:

BABILong is a generative benchmark for evaluating the performance of NLP models in processing arbitrarily long documents with distributed facts. It contains 11 configs, corresponding to different sequence lengths in tokens: 0k, 1k, 2k, 4k, 8k, 16k, 32k, 128k, 256k, 512k, 1M. The dataset combines facts from the bAbI dataset and background text from PG19, resulting in test samples that might have lengths of millions of tokens. BABILong consists of 10 tasks designed for evaluation of basic aspects of reasoning. Each task is based on a varying number and complexity of facts, requiring the model to distinguish important information from large amounts of irrelevant details.
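The mixing of task facts into unrelated background text can be pictured with a toy sketch. This is not the authors' generation code: the helper `mix_facts_into_background`, the sample sentences, and the sentence-level granularity are illustrative assumptions, and the filler strings merely stand in for PG19 text.

```python
import random


def mix_facts_into_background(facts, background, seed=0):
    """Scatter `facts` among `background` sentences, keeping the fact order."""
    rng = random.Random(seed)
    total = len(facts) + len(background)
    # Pick which positions hold facts; every other position holds background.
    fact_slots = set(rng.sample(range(total), len(facts)))
    facts_iter, background_iter = iter(facts), iter(background)
    return [
        next(facts_iter) if i in fact_slots else next(background_iter)
        for i in range(total)
    ]


facts = ["Mary went to the office.", "John moved to the kitchen."]
# Placeholder filler (assumption); the benchmark draws background text from PG19.
background = [f"Background sentence {i}." for i in range(6)]

sample = mix_facts_into_background(facts, background)
print(" ".join(sample))
```

A model evaluated on such a sample has to locate the two fact sentences among the filler before it can answer a question about them.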

Provider:

RMT-team

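Summary of original information

Dataset overview

Dataset name

BABILong

Dataset description

BABILong is a generative benchmark for evaluating how well NLP models process arbitrarily long documents with distributed facts. The dataset contains 11 configs, corresponding to different sequence lengths in tokens: 0k, 1k, 2k, 4k, 8k, 16k, 32k, 128k, 256k, 512k, 1M. (The file listing further down also includes a 64k config, which this list omits.)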

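Data file structure

Each config contains several data files; each file corresponds to one split and has its own path. Examples from the configs are listed below: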

0k config
- split: qa1, path: data/qa1/0k.json
- split: qa2, path: data/qa2/0k.json
- ...
- split: qa20, path: data/qa20/0k.json
1k config
- split: qa1, path: data/qa1/1k.json
- split: qa2, path: data/qa2/1k.json
- ...
- split: qa10, path: data/qa10/1k.json
2k config
- split: qa1, path: data/qa1/2k.json
- split: qa2, path: data/qa2/2k.json
- ...
- split: qa10, path: data/qa10/2k.json
4k config
- split: qa1, path: data/qa1/4k.json
- split: qa2, path: data/qa2/4k.json
- ...
- split: qa10, path: data/qa10/4k.json
8k config
- split: qa1, path: data/qa1/8k.json
- split: qa2, path: data/qa2/8k.json
- ...
- split: qa10, path: data/qa10/8k.json
16k config
- split: qa1, path: data/qa1/16k.json
- split: qa2, path: data/qa2/16k.json
- ...
- split: qa10, path: data/qa10/16k.json
32k config
- split: qa1, path: data/qa1/32k.json
- split: qa2, path: data/qa2/32k.json
- ...
- split: qa10, path: data/qa10/32k.json
64k config
- split: qa1, path: data/qa1/64k.json
- split: qa2, path: data/qa2/64k.json
- ...
- split: qa10, path: data/qa10/64k.json
128k config
- split: qa1, path: data/qa1/128k.json
- split: qa2, path: data/qa2/128k.json
- ...
- split: qa10, path: data/qa10/128k.json
256k config
- split: qa1, path: data/qa1/256k.json
- split: qa2, path: data/qa2/256k.json
- ...
- split: qa10, path: data/qa10/256k.json
512k config
- split: qa1, path: data/qa1/512k.json
- split: qa2, path: data/qa2/512k.json
- ...
- split: qa10, path: data/qa10/512k.json
1M config
- split: qa1, path: data/qa1/1M.json
- split: qa2, path: data/qa2/1M.json
- ...
- split: qa10, path: data/qa10/1M.json
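Every entry in the listing above follows the same `data/<split>/<config>.json` pattern, which can be captured in a small helper. This is a sketch inferred from the listing only; `CONFIGS` and `data_file_path` are hypothetical names and not part of the dataset or of any library.

```python
# Config names exactly as they appear in the listing above.
CONFIGS = ["0k", "1k", "2k", "4k", "8k", "16k", "32k", "64k",
           "128k", "256k", "512k", "1M"]


def data_file_path(split, config):
    """Build the repository-relative path for one split of one config."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    return f"data/{split}/{config}.json"


print(data_file_path("qa1", "128k"))  # data/qa1/128k.json
```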

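Dataset tasks

BABILong contains 10 tasks for evaluating basic aspects of reasoning. The tasks are based on the bAbI dataset and simulate a set of characters and objects moving and interacting across multiple locations. Each interaction is represented by a fact, e.g. "Mary travelled to the office", and the task is to answer a question using the facts from the current simulation, e.g. "Where is Mary?".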
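The single-supporting-fact case can be illustrated with a toy simulation. This is a minimal sketch, assuming facts are given as already-parsed `(person, place)` pairs; the real tasks present facts as natural-language sentences, and `answer_where_is` is a hypothetical helper.

```python
def answer_where_is(facts, person):
    """Return the most recent location of `person`, or None if never mentioned."""
    location = None
    for actor, place in facts:
        # Later facts override earlier ones, so only the last movement counts.
        if actor == person:
            location = place
    return location


facts = [("Mary", "kitchen"), ("John", "garden"), ("Mary", "office")]
print(answer_where_is(facts, "Mary"))  # office
```

Only the last fact about Mary is needed for the answer, which is why this kind of question has a single supporting fact.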

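Task details

task	name	facts per task	supporting facts per task
qa1	single supporting fact	2 - 10	1
qa2	two supporting facts	2 - 68	2
qa3	three supporting facts	4 - 32	3
qa4	two arg relations	2	1
qa5	three arg relations	2 - 126	1
qa6	yes-no questions	2 - 26	1
qa7	counting	2 - 52	1-10
qa8	lists-sets	2 - 50	1-8
qa9	simple negation	2 - 10	1
qa10	indefinite knowledge	2 - 10	1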

Dataset loading example

    from datasets import load_dataset
    babilong = load_dataset("RMT-team/babilong", "128k")["qa1"]
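The same call can be wrapped to fetch several task splits of one length config at once. This is a sketch, not an official loader: `load_babilong_splits` and its defaults are hypothetical, and it relies only on `load_dataset` returning a mapping from split name to split, as the example above does.

```python
def load_babilong_splits(config="128k", splits=("qa1", "qa2")):
    """Load the requested task splits of one length config into a dict."""
    # Imported lazily so that defining this helper does not require the library.
    from datasets import load_dataset

    dataset = load_dataset("RMT-team/babilong", config)
    return {name: dataset[name] for name in splits}
```

Calling `load_babilong_splits("0k")` downloads the data on first use. For each returned split, `len(split)` and `split.column_names` show its size and fields without assuming any particular field names.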
