Webis EditorialSum Corpus 2020

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4105764
Description:
The Webis EditorialSum Corpus consists of 1330 manually curated extractive summaries for 266 news editorials spanning three diverse portals: Al-Jazeera, Guardian and Fox News. Each editorial has 5 summaries, each labeled for overall quality and fine-grained properties such as thesis-relevance, persuasiveness, reasonableness, and self-containedness.

The files are organized as follows:

corpus.csv - Contains all the editorials and their acquired summaries
Note: X = [1,5] for the five summaries
 - article_id : Article ID in the corpus
 - title : Title of the editorial
 - article_text : Plain text of the editorial
 - summary_{X}_text : Plain text of the corresponding summary
 - thesis_{X}_text : Plain text of the thesis from the corresponding summary
 - lead : top 15% of the editorial's segments
 - body : segments between lead and conclusion sections
 - conclusion : bottom 15% of the editorial's segments
 - article_segments : Collection of paragraphs, each further divided into a collection of segments containing:
     { "number": segment order in the editorial,
       "text" : segment text,
       "label": ADU type }
 - summary_{X}_segments : Collection of summary segments containing:
     { "number": segment order in the editorial,
       "text" : segment text,
       "adu_label": ADU type from the editorial,
       "summary_label": can be 'thesis' or 'justification' }

quality-groups.csv - Contains the IDs of the high-quality and low-quality summaries for each quality dimension per editorial
For example: article_id 2 has four high_quality summaries (summary_1, summary_2, summary_3, summary_4) and one low_quality summary (summary_5) in terms of overall quality. The summary texts can be obtained from corpus.csv.
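As a rough illustration of the corpus.csv column layout described above, the following Python sketch gathers the five summaries of one editorial from a row. The helper names (summary_fields, collect_summaries, iter_rows) and the toy in-memory CSV are hypothetical and exist only for this example. The sketch assumes the file has a header row that uses the documented column names. It does not parse the segment columns, since their on-disk serialization is not stated in the description.

```python
import csv
import io

# Precaution: editorial texts and segment collections can be long fields.
csv.field_size_limit(10**7)

SUMMARY_IDS = range(1, 6)  # X = 1..5, per the corpus description


def summary_fields(x):
    """Column names documented for summary X (hypothetical helper)."""
    return {
        "text": f"summary_{x}_text",
        "thesis": f"thesis_{x}_text",
        "segments": f"summary_{x}_segments",
    }


def collect_summaries(row):
    """Map X -> summary text and thesis text for one corpus.csv row (a dict).

    Columns missing from the row yield empty strings.
    """
    out = {}
    for x in SUMMARY_IDS:
        names = summary_fields(x)
        out[x] = {
            "text": row.get(names["text"], ""),
            "thesis": row.get(names["thesis"], ""),
        }
    return out


def iter_rows(handle):
    """Yield rows as dicts; assumes the first line is a header row."""
    return csv.DictReader(handle)


# Toy stand-in for corpus.csv; the values are invented for illustration.
sample = io.StringIO(
    "article_id,title,summary_1_text,thesis_1_text\n"
    "2,Toy title,Toy summary,Toy thesis\n"
)
rows = list(iter_rows(sample))
summaries = collect_summaries(rows[0])
print(summaries[1]["text"])
```

To use it on the real file, one would pass an open handle, for example open("corpus.csv", newline="", encoding="utf-8"), to iter_rows; the encoding is an assumption to be checked against the downloaded file.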
Created: 2020-10-19