
bloomberg/entsum

Source: Hugging Face. Updated: 2022-05-23. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/bloomberg/entsum
Resource description:
# Title

EntSUM: A Data Set for Entity-Centric Extractive Summarization

# Author list

Mounica Maddela*, Mayank Kulkarni*, Daniel Preotiuc-Pietro

# Description

Controllable summarization aims to provide summaries that take into account user-specified aspects and preferences to better assist users with their information need, as opposed to the standard summarization setup, which builds a single generic summary of a document. We introduce a human-annotated data set, EntSUM, for controllable summarization with a focus on named entities as the aspects to control. We conduct an extensive quantitative analysis to motivate the task of entity-centric summarization and show that existing methods for controllable summarization fail to generate entity-centric summaries. We propose extensions to state-of-the-art summarization approaches that achieve substantially better results on our data set. Our analysis and results show the challenging nature of this task and of the proposed data set.

As a part of this zip file, we release the EntSUM data set on which the evaluations are performed. There are three JSON files: one for entries with one summary annotation, one for entries with two summary annotations, and one combining both. Each file contains the document ID from the NYT corpus, the sentence IDs, the summary or summaries, the salient sentences, and the summary sentences corresponding to the sentence IDs. The source text can be obtained by downloading the original NYT corpus and mapping the document IDs. The annotation process and pre-processing details are described extensively in the research paper.
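The ID-mapping step mentioned above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the key names `doc_id` and `entity`, the helper functions `index_by_doc_id` and `attach_source_text`, and the toy data are all hypothetical, and the real JSON files may use different keys and layout. It assumes the NYT corpus has already been parsed into a dictionary from document ID to text.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical key name: the real EntSUM JSON files may use a different one.
DOC_ID_KEY = "doc_id"


def index_by_doc_id(records: List[dict], key: str = DOC_ID_KEY) -> Dict[str, List[dict]]:
    """Group annotation records by NYT document ID.

    A single document can carry several entity-specific annotations,
    so each ID maps to a list of records.
    """
    grouped: Dict[str, List[dict]] = {}
    for record in records:
        grouped.setdefault(str(record[key]), []).append(record)
    return grouped


def attach_source_text(
    grouped: Dict[str, List[dict]], nyt_texts: Dict[str, str]
) -> Tuple[Dict[str, dict], List[str]]:
    """Pair each annotated document with its source text.

    Returns the matched documents and the IDs that were not found
    in the locally available corpus.
    """
    matched: Dict[str, dict] = {}
    missing: List[str] = []
    for doc_id, annotations in grouped.items():
        if doc_id in nyt_texts:
            matched[doc_id] = {"text": nyt_texts[doc_id], "annotations": annotations}
        else:
            missing.append(doc_id)
    return matched, missing


# Toy data standing in for a loaded annotation file and a parsed NYT corpus.
toy_records = json.loads(
    '[{"doc_id": 101, "entity": "A"},'
    ' {"doc_id": 101, "entity": "B"},'
    ' {"doc_id": 202, "entity": "C"}]'
)
toy_nyt = {"101": "First sentence. Second sentence."}

grouped = index_by_doc_id(toy_records)
matched, missing = attach_source_text(grouped, toy_nyt)
print(sorted(matched), missing)
```

Reporting the missing IDs separately, rather than silently skipping them, makes it easier to notice when the local copy of the corpus is incomplete.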
# Language

English

# Keywords

Natural Language Processing, Summarization, Abstractive Summarization, Extractive Summarization

# Related identifiers

NYT is the source that this data set is derived from: https://doi.org/10.35111/77ba-9x74

License (LDC): https://catalog.ldc.upenn.edu/LDC2008T19

# Citation

```
@inproceedings{maddela-etal-2022-entsum,
    title = "{E}nt{SUM}: A Data Set for Entity-Centric Extractive Summarization",
    author = "Maddela, Mounica and Kulkarni, Mayank and Preotiuc-Pietro, Daniel",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.237",
    pages = "3355--3366",
    abstract = "Controllable summarization aims to provide summaries that take into account user-specified aspects and preferences to better assist them with their information need, as opposed to the standard summarization setup which build a single generic summary of a document. We introduce a human-annotated data set EntSUM for controllable summarization with a focus on named entities as the aspects to control. We conduct an extensive quantitative analysis to motivate the task of entity-centric summarization and show that existing methods for controllable summarization fail to generate entity-centric summaries. We propose extensions to state-of-the-art summarization approaches that achieve substantially better results on our data set. Our analysis and results show the challenging nature of this task and of the proposed data set.",
}
```
Provider:
bloomberg
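Summary of original information

Data set overview

Data set name

EntSUM: A Data Set for Entity-Centric Extractive Summarization

Authors

Mounica Maddela*, Mayank Kulkarni*, Daniel Preotiuc-Pietro

Description

EntSUM is a controllable summarization data set that uses named entities as the aspect to control. Its goal is summaries that reflect user-specified entities and preferences, in contrast to the traditional single generic summary. The data set consists of three JSON files, each corresponding to a different summary annotation type. Each file contains document IDs, sentence IDs, summaries, salient sentences, and the summary sentences corresponding to the sentence IDs. The source text can be obtained by downloading the NYT corpus and mapping the document IDs.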
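Because the annotations refer to sentences by ID rather than repeating the source text, an extractive summary can be rebuilt once a document has been split into sentences. The sketch below is illustrative only: the function name `select_sentences` is hypothetical, and it assumes sentence IDs are 0-based positions in the sentence list, which should be checked against the released files and the paper's pre-processing description.

```python
from typing import List, Sequence


def select_sentences(sentences: Sequence[str], sentence_ids: Sequence[int]) -> List[str]:
    """Return the sentences at the given positions, in document order.

    Assumption: IDs are 0-based indices into `sentences`.
    Duplicate IDs are collapsed; out-of-range IDs raise an error.
    """
    chosen = sorted(set(sentence_ids))
    out_of_range = [i for i in chosen if not 0 <= i < len(sentences)]
    if out_of_range:
        raise IndexError(f"sentence IDs out of range: {out_of_range}")
    return [sentences[i] for i in chosen]


# Toy document, not taken from the NYT corpus.
toy_sentences = [
    "Alice founded the firm.",
    "The weather was mild.",
    "Alice later sold it.",
]
toy_summary = select_sentences(toy_sentences, [2, 0])
print(" ".join(toy_summary))
```

Sorting the IDs keeps the extracted sentences in their original document order, which is the usual convention for extractive summaries.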

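Language

English

Keywords

Natural Language Processing, Summarization, Abstractive Summarization, Extractive Summarization

Related identifiers

The data set is derived from the NYT corpus; the related license information is available on the LDC website.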

