DialogSum
OpenDataLab, updated 2026-05-17, indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/DialogSum
Resource description:
DialogSum is a large-scale dialogue summarization dataset consisting of 13,460 dialogues with manually annotated summaries and topics. We collected the dialogue data for DialogSum from three public dialogue corpora, namely Dailydialog (Li et al., 2017), DREAM (Sun et al., 2019) and MuTual (Cui et al., 2019), as well as an English speaking practice website. These sources contain face-to-face spoken dialogues covering a wide range of daily-life topics, including school education, work, medicine, shopping, leisure and travel. Most of the dialogues take place between friends, between colleagues, or between service providers and customers.
Compared with previous datasets, the dialogues in DialogSum have distinct characteristics:
They are set in rich real-life scenarios, including more diverse task-oriented scenarios;
They have clear communication patterns and intents, which makes them valuable as summarization sources;
They are of a reasonable length that suits the purpose of automatic summarization.
Provider:
OpenDataLab
Created:
2022-06-23
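Collected summary
Dataset introduction

Background and challenges
Background overview
DialogSum is a large-scale dialogue summarization dataset containing 13,460 dialogues with manually annotated summaries and topics. The data come from several public dialogue corpora and an English speaking practice website, and cover a diverse range of daily-life topics. The dataset is characterized by rich real-life scenarios and clear communication patterns, and is suited to research on automatic summarization and dialogue generation. It was released in 2021 by the University of Edinburgh and other institutions.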
The above content was collected and summarized by 遇见数据集.



