SUMMSCREEN
Source: arXiv | Updated: 2022-06-07 | Indexed: 2024-06-21
Download link:
https://github.com/mingdachen/SummScreen
Resource description:
SUMMSCREEN is a dataset of television series transcripts paired with human-written summaries. It is challenging because plot details in the transcripts are usually conveyed indirectly through character dialogue and are scattered across the entire transcript. A summarization model must integrate these details into a concise plot description while coping with content unrelated to the main storyline, such as scenes devoted to character development or humor. Alongside the dataset, two entity-centric evaluation metrics are proposed for assessing the quality of generated plot summaries. SUMMSCREEN is suited to training and evaluating abstractive summarization models, particularly on long narrative texts and dialogue among many characters.
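The description does not say how the two entity-centric metrics are computed. As a rough illustration only, the sketch below compares the set of character names mentioned in a generated summary with the set mentioned in the reference summary and reports precision and recall. The function names `character_set` and `precision_recall`, the case-sensitive substring matching, and the toy cast are hypothetical assumptions, not the metrics as defined by the SummScreen authors.

```python
from typing import Iterable, Set, Tuple


def character_set(text: str, cast: Iterable[str]) -> Set[str]:
    """Hypothetical helper: return cast members whose names occur in text.

    Uses naive case-sensitive substring matching, which is an assumption;
    a real metric would need proper entity detection.
    """
    return {name for name in cast if name in text}


def precision_recall(generated: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Set-overlap precision and recall; an empty set scores 0.0 to avoid dividing by zero."""
    overlap = len(generated & reference)
    precision = overlap / len(generated) if generated else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy cast and summaries, invented for illustration (not taken from SUMMSCREEN).
    cast = ["Alice", "Bob", "Carol", "Dave"]
    reference = "Alice and Carol argue while Dave stays home."
    generated = "Alice and Bob build a robot; Dave visits."
    p, r = precision_recall(character_set(generated, cast), character_set(reference, cast))
    print(f"precision={p:.2f} recall={r:.2f}")
```

Set precision rewards a summary for not mentioning characters absent from the reference, while recall rewards it for covering the characters the reference mentions; reporting both makes the trade-off visible.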
Provided by:
Toyota Technological Institute at Chicago
Created:
2021-04-15



