catbAbI QA-mode (concatenated-bAbI)

Name: catbAbI QA-mode (concatenated-bAbI)
Creator: OpenDataLab
Published: 2026-05-24 12:30:32
License: no description provided

OpenDataLab, updated 2026-05-24, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/catbAbI_QA-mode

Resource description:

Our goal is to improve the bAbI benchmark as a means of developing intelligent dialogue agents. To this end, we propose concatenated-bAbI (catbAbI): an infinite sequence of bAbI stories. catbAbI is generated from the bAbI dataset: during training, a random sample/story from any task is drawn without replacement and concatenated to the ongoing story. The preprocessing for catbAbI addresses several issues: it removes the supporting facts, leaves the questions embedded in the story, inserts the correct answer after the question mark, and tokenizes the full sample into a single sequence of words. catbAbI is therefore designed to be trained in an autoregressive manner, analogous to closed-book question answering.

catbAbI models can be trained in two ways: language modeling mode (LM-mode) or question-answering mode (QA-mode). In LM-mode, the model is trained like an autoregressive word-level language model. In QA-mode, the model is trained only to predict the tokens that are answers to questions, which makes it more similar to regular bAbI. QA-mode is implemented simply by masking out the loss on non-answer predictions. In both training modes, model performance is measured solely by accuracy and perplexity when answering the questions.
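The preprocessing and concatenation steps described above can be sketched roughly as follows. This is a minimal illustration, not the dataset authors' code: it assumes the raw bAbI line layout (a line id, then the text, with question lines carrying a tab-separated answer and supporting-fact ids), and the names `preprocess_story` and `catbabi_stream`, the lowercasing, the regex tokenizer, the single-token treatment of answers, and the reshuffle once every story has been used are all assumptions made for the example.

```python
import random
import re


def preprocess_story(lines):
    """Flatten one bAbI-style story into tokens plus per-token answer flags.

    Assumed input layout: "<id> <text>" for statements and
    "<id> <question>\t<answer>\t<supporting fact ids>" for questions.
    """
    tokens, is_answer = [], []
    for line in lines:
        _, text = line.split(" ", 1)           # drop the leading line id
        parts = text.split("\t")
        # Simple word/punctuation tokenizer (an assumption of this sketch).
        words = re.findall(r"\w+|[^\w\s]", parts[0].lower())
        tokens += words
        is_answer += [False] * len(words)
        if len(parts) > 1:
            # Question line: insert the answer right after the question mark
            # and discard the supporting-fact ids in parts[2:].
            tokens.append(parts[1].strip().lower())  # answer kept as one token
            is_answer.append(True)
    return tokens, is_answer


def catbabi_stream(stories, seed=0):
    """Yield (token, is_answer) pairs from an endless concatenation of stories.

    Stories are drawn without replacement within a pass; reshuffling after
    each pass is an assumption used here to make the stream infinite.
    """
    rng = random.Random(seed)
    while True:
        order = list(range(len(stories)))
        rng.shuffle(order)
        for i in order:
            tokens, flags = preprocess_story(stories[i])
            yield from zip(tokens, flags)
```

Calling `preprocess_story` on a two-line story such as `["1 Mary moved to the bathroom.", "2 Where is Mary? \tbathroom\t1"]` gives one flat token list in which only the final `bathroom` token is flagged as an answer. The flags are what a QA-mode loss mask or an answer-only evaluation would consume.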

Provider:

OpenDataLab

Created:

2022-09-01

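Collected summary

Dataset introduction

Background and challenges

Background overview

catbAbI QA-mode (concatenated-bAbI) is a dataset that improves on the bAbI benchmark: it generates an infinite sequence by concatenating random stories, removes the supporting facts, and embeds questions and answers in the story so that models can be trained autoregressively. The dataset supports two training modes, language modeling and question answering; QA-mode focuses on predicting answer tokens, and performance is evaluated by accuracy and perplexity. It was released in 2021 by the Dalle Molle Institute for Artificial Intelligence Research (IDSIA) and Microsoft Research.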
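As a rough illustration of QA-mode, the sketch below computes loss, accuracy, and perplexity over answer positions only, skipping every non-answer position. It is not taken from the original release: the function name `qa_mode_metrics`, the dictionary representation of the model's predictions, averaging over the number of answer tokens, and perplexity taken as the exponential of the mean negative log-likelihood are assumptions of this example.

```python
import math


def qa_mode_metrics(probs, targets, is_answer):
    """Compute answer-only loss, accuracy, and perplexity.

    probs[t] is a dict mapping each candidate token to the model's predicted
    probability at position t (a stand-in for a softmax output).
    """
    total_nll, correct, n_answers = 0.0, 0, 0
    for dist, target, flag in zip(probs, targets, is_answer):
        if not flag:
            continue  # masked: non-answer predictions contribute no loss
        # Floor the probability to avoid log(0) on unseen targets.
        total_nll += -math.log(max(dist.get(target, 0.0), 1e-12))
        correct += int(max(dist, key=dist.get) == target)
        n_answers += 1
    if n_answers == 0:
        raise ValueError("sequence contains no answer positions")
    mean_nll = total_nll / n_answers
    return {
        "loss": mean_nll,
        "accuracy": correct / n_answers,
        "perplexity": math.exp(mean_nll),
    }
```

In LM-mode the same loop would simply drop the `if not flag` check for the training loss, while the reported accuracy and perplexity would still be restricted to answer positions.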

The content above was collected and summarized by 遇见数据集.
