A Large-Scale Corpus for Conversation Disentanglement

Name: A Large-Scale Corpus for Conversation Disentanglement
Creator: University of Michigan
Published: 2019-07-19 02:14:53
License: No description available

Source: arXiv. Updated 2019-07-19; indexed 2024-06-21.

Download link:

https://irclogs.ubuntu.com/

Resource overview:

This dataset, "A Large-Scale Corpus for Conversation Disentanglement", was created by the University of Michigan. It contains 77,563 messages and is the first large-scale conversation disentanglement corpus to include context and adjudicated annotations. The data is sampled from technical support channel logs at 173 points in time between 2004 and 2018, so it covers a diverse range of topics and speakers while remaining within a single domain. The annotation guidelines were developed through three rounds of pilot annotation and discussion, and the annotation itself was carried out with the SLATE tool. The dataset is intended primarily for developing robust data-driven methods for conversation disentanglement, and it supports conversation research more broadly, particularly the understanding and analysis of synchronous online conversations among multiple participants.

Provided by:

University of Michigan

Created:

2018-10-26
