CHAPTERBREAK

Name: CHAPTERBREAK
Creator: University of Massachusetts Amherst
Published: 2022-04-23 02:20:23
License: Not specified

Source: arXiv. Updated: 2022-04-23. Indexed: 2024-06-21.

Download link:

https://github.com/SimengSun/ChapterBreak

Resource overview:

CHAPTERBREAK is a challenge dataset for long-range language models (LRLMs), created by a research team at the University of Massachusetts Amherst. It consists of long-form narratives collected from the PG-19 validation set and from the Archive of Our Own (AO3) website, and is designed to evaluate how well LRLMs handle complex chapter transitions such as parallel narratives and cliffhanger endings. The dataset is built by automatically detecting chapter boundaries, and it covers several types of chapter transition that a model can only understand by processing global context. Its main use is evaluating and improving LRLM performance on long texts, particularly the modeling of long-range dependencies and global narrative structure.
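
The overview says only that chapter boundaries are detected automatically; it does not give the rules. As a rough illustration of what such a step could look like, the sketch below finds lines that look like chapter headings and cuts the text into a prefix that ends at a boundary and a suffix that begins the next chapter. The heading pattern and the function names are assumptions made for this example, not the dataset authors' actual pipeline.

```python
import re

# Hypothetical heading pattern (an assumption for illustration): a line that
# starts with "Chapter" followed by an Arabic or Roman numeral.
CHAPTER_HEADING = re.compile(
    r"^[ \t]*chapter[ \t]+(\d+|[ivxlcdm]+)\b.*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_chapter_boundaries(text: str) -> list[int]:
    """Return the character offset where each detected heading line starts."""
    return [m.start() for m in CHAPTER_HEADING.finditer(text)]


def split_at_boundary(
    text: str, boundary: int, prefix_chars: int, suffix_chars: int
) -> tuple[str, str]:
    """Return (context ending at the boundary, text starting at the boundary)."""
    prefix = text[max(0, boundary - prefix_chars):boundary]
    suffix = text[boundary:boundary + suffix_chars]
    return prefix, suffix
```

A simple pattern like this will miss unnumbered or unusually formatted headings, so a real pipeline would likely need source-specific rules for PG-19 books and AO3 works.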

Provider:

University of Massachusetts Amherst

Created:

2022-04-23
