Webis EditorialSum Corpus 2020

NIAID Data Ecosystem, indexed 2026-03-12

Download link:

https://zenodo.org/record/4105764


Description:

The Webis EditorialSum Corpus consists of 1330 manually curated extractive summaries for 266 news editorials spanning three diverse portals: Al-Jazeera, Guardian and Fox News. Each editorial has 5 summaries, each labeled for overall quality and for fine-grained properties such as thesis-relevance, persuasiveness, reasonableness, and self-containedness.

The files are organized as follows:

corpus.csv - Contains all the editorials and their acquired summaries
Note: X = [1,5] for the five summaries
 - article_id : Article ID in the corpus
 - title : Title of the editorial
 - article_text : Plain text of the editorial
 - summary_{X}_text : Plain text of the corresponding summary
 - thesis_{X}_text : Plain text of the thesis from the corresponding summary
 - lead : top 15% of the editorial's segments
 - body : segments between lead and conclusion sections
 - conclusion : bottom 15% of the editorial's segments
 - article_segments : Collection of paragraphs, each further divided into a collection of segments containing:
   { "number": segment order in the editorial, "text": segment text, "label": ADU type }
 - summary_{X}_segments : Collection of summary segments containing:
   { "number": segment order in the editorial, "text": segment text, "adu_label": ADU type from the editorial, "summary_label": can be 'thesis' or 'justification' }

quality-groups.csv - Contains the IDs of the high-quality and low-quality summaries for each quality dimension per editorial
For example: article_id 2 has four high_quality summaries (summary_1, summary_2, summary_3, summary_4) and one low_quality summary (summary_5) in terms of overall quality. The corresponding summary texts can be obtained from corpus.csv.
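The column layout described above can be read with a short Python sketch such as the one below. It is illustrative only and rests on several assumptions that the description does not confirm: that corpus.csv is a standard comma-separated file with a header row, that it is UTF-8 encoded, and that the lead and conclusion sizes are obtained by rounding 15% of the segment count. The function names read_summaries, load_summaries and split_sections are hypothetical helpers, not part of the corpus distribution.

```python
import csv
import io

# X = 1..5, matching the summary_{X}_text columns in the description.
SUMMARY_IDS = range(1, 6)


def read_summaries(csv_file):
    """Yield (article_id, X, summary_text) for each summary in an open
    corpus.csv-style file object. Assumes a header row with the column
    names listed in the corpus description."""
    for row in csv.DictReader(csv_file):
        for x in SUMMARY_IDS:
            yield row["article_id"], x, row[f"summary_{x}_text"]


def load_summaries(path):
    """Read all summaries from a file on disk.
    UTF-8 encoding is an assumption, not stated in the description."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(read_summaries(f))


def split_sections(segments, fraction=0.15):
    """Split a list of segments into (lead, body, conclusion).

    The description only says lead = top 15% and conclusion = bottom 15%
    of the segments; the rounding rule used here is an assumption.
    The section size is capped at half the list so that lead and
    conclusion never overlap for very short editorials."""
    n = len(segments)
    k = min(max(1, round(n * fraction)), n // 2)
    return segments[:k], segments[k:n - k], segments[n - k:]
```

With the file downloaded from the Zenodo record, load_summaries("corpus.csv") would be expected to return 1330 tuples (266 editorials with 5 summaries each) if the assumptions above hold. The column names in quality-groups.csv are not given in the description, so that file is left out of the sketch.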

Created:

2020-10-19
