BookSum Dataset

Indexed from paperswithcode.com on 2025-03-24
Download link:
https://paperswithcode.com/dataset/booksum
Overview:
BookSum is a collection of datasets for long-form narrative summarization. It covers source documents from the literature domain, such as novels, plays, and stories, and includes highly abstractive, human-written summaries at three levels of granularity, in order of increasing difficulty: paragraph-level, chapter-level, and book-level. The domain and structure of the dataset pose a distinct set of challenges for summarization systems, including processing very long documents, handling non-trivial causal and temporal dependencies, and modeling rich discourse structure. BookSum contains summaries for 142,753 paragraphs, 12,293 chapters, and 436 books.
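The three granularity levels can be modeled as one record type tagged by level. The following is a minimal sketch under stated assumptions: the class names, the field names (`source_text`, `summaries`), and the helper functions are hypothetical and are not BookSum's official schema. Only the three levels and the per-level summary counts come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Granularity(Enum):
    """The three summary levels, listed in order of increasing difficulty."""
    PARAGRAPH = "paragraph"
    CHAPTER = "chapter"
    BOOK = "book"


# Per-level summary counts as stated in the dataset description.
REPORTED_COUNTS = {
    Granularity.PARAGRAPH: 142_753,
    Granularity.CHAPTER: 12_293,
    Granularity.BOOK: 436,
}


@dataclass
class SummaryExample:
    """Hypothetical record: one source text paired with human-written summaries.

    Field names are illustrative assumptions, not the dataset's real schema.
    """
    level: Granularity
    source_text: str
    summaries: list[str] = field(default_factory=list)

    def source_word_count(self) -> int:
        # Naive whitespace tokenization, used here only to gauge document length.
        return len(self.source_text.split())


def total_summarized_units(counts: dict[Granularity, int]) -> int:
    """Sum the number of summarized units across all granularity levels."""
    return sum(counts.values())


if __name__ == "__main__":
    example = SummaryExample(
        Granularity.CHAPTER,
        "It was a dark and stormy night.",
        ["A storm opens the chapter."],
    )
    print(example.level.value, example.source_word_count())  # chapter 7
    print(total_summarized_units(REPORTED_COUNTS))  # 155482
```

Keeping one record type with a level tag allows paragraph-, chapter-, and book-level examples to share a single loading and evaluation path. Real BookSum releases may organize these levels as separate splits or files.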

Provided by:
paperswithcode.com