RiSAWOZ

Source: arXiv | Updated: 2020-10-17 | Indexed: 2024-06-21
Download link:
https://github.com/terryqj0107/RiSAWOZ
Overview:
RiSAWOZ is a large-scale Chinese multi-domain Wizard-of-Oz dataset created jointly by the School of Computer Science and Technology at Soochow University and the College of Intelligence and Computing at Tianjin University. It contains 11,200 multi-turn dialogues with more than 150,000 utterances in total, covering 12 domains such as dining, accommodation, and transportation. The dataset was built in four stages: database and ontology construction, goal generation, dialogue collection, and two rounds of annotation. Beyond conventional dialogue annotations, RiSAWOZ provides semantic annotations of discourse phenomena such as ellipsis and coreference, which support coreference and ellipsis resolution in dialogue. The dataset is intended for practical applications in industrial settings and for research on domain adaptation and few-shot/zero-shot learning, with the aim of addressing challenges in multi-domain dialogue modeling.
Provided by:
School of Computer Science and Technology, Soochow University
Created:
2020-10-17