
SubSumE Dataset

Indexed from paperswithcode.com on 2025-03-23
Download link:
https://paperswithcode.com/dataset/subsume
Description:
This repository contains the SubSumE dataset for subjective document summarization. See the paper and the talk for details on dataset creation. Also check out our work SuDocu on example-based document summarization.

Dataset Files

Download the dataset from here. The dataset contains:

- Simplified text from 48 Wikipedia pages of the states in the US. Additionally, all the sentences in these documents are put together in a single file processed_state_sentences.csv and are assigned a unique sentence id that is used in summary json files.
- Intent-based summaries created by human annotators. Each datapoint file in the directory user_summary_jsons contains a json containing summaries of Wikipedia pages of eight states with following keys:
  - intent: Summarization intent provided to human annotators for generating the summary
  - summaries: List of summary jsons for eight states assigned to the annotator. Each json in the list contains following keys:
    - state_name: Name of the state
    - sentence_ids: Global ids of sentences (wrt processed_state_sentences.csv) present in the summary
    - sentences: List of sentences present in the summary
    - use_keywords: Keywords used by the annotator to search the document when creating summaries
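
The datapoint layout described above can be read with a short script. The following is a minimal sketch rather than an official loader: it relies only on the keys named in the description (intent, summaries, state_name, sentence_ids, sentences, use_keywords). The function names are hypothetical, and the `.json` file extension and the possibility that use_keywords is absent are assumptions.

```python
import json
from pathlib import Path


def flatten_datapoint(datapoint):
    """Turn one annotator datapoint into one record per state summary."""
    records = []
    for summary in datapoint["summaries"]:
        records.append({
            "intent": datapoint["intent"],
            "state_name": summary["state_name"],
            "sentence_ids": list(summary["sentence_ids"]),
            "sentences": list(summary["sentences"]),
            # Assumption: use_keywords may be missing, so fall back to None.
            "use_keywords": summary.get("use_keywords"),
        })
    return records


def load_user_summaries(directory="user_summary_jsons"):
    """Read every *.json file in the directory (extension assumed) and flatten it."""
    records = []
    for path in sorted(Path(directory).glob("*.json")):
        with open(path, encoding="utf-8") as handle:
            records.extend(flatten_datapoint(json.load(handle)))
    return records
```

Each returned record pairs one intent with one state's summary, so the entries in sentence_ids can be matched against the global ids in processed_state_sentences.csv. The column names of that CSV are not given in the description, so the join itself is left out.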

Provider:
paperswithcode.com