RiSAWOZ
ModelScope community · Updated 2026-01-07 · Indexed 2024-08-31
Download link:
https://modelscope.cn/datasets/OmniData/RiSAWOZ
Resource description:
displayName: RiSAWOZ
labelTypes:
- SemanticSegMap
- Chinese Corpus
license:
- CC BY-NC 4.0
mediaTypes:
- Text
paperUrl: https://arxiv.org/pdf/2010.08738v1.pdf
publishDate: "2020"
publishUrl: https://terryqj0107.github.io/RiSAWOZ_webpage/
publisher:
- Soochow University
- Tianjin University
tags:
- Annotation
taskTypes:
- Natural Language Generation
- Slot Filling
- Dialogue State Tracking
---
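# Dataset Introduction
## Introduction
To alleviate the shortage of multi-domain data and to capture discourse phenomena for task-oriented dialogue modeling, we propose RiSAWOZ, a large-scale multi-domain Chinese Wizard-of-Oz dataset with rich semantic annotations. RiSAWOZ contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning 12 domains, which is larger than all previous annotated H2H conversational datasets. Both single-domain and multi-domain dialogues are constructed, accounting for 65% and 35% of the dialogues, respectively. Each dialogue is labeled with comprehensive dialogue annotations, including the dialogue goal in the form of a natural language description, the domain, dialogue states, and dialogue acts on both the user and system sides. In addition to traditional dialogue annotations, we specifically provide linguistic annotations of discourse phenomena in dialogues, such as ellipsis and coreference, which are useful for dialogue coreference and ellipsis resolution tasks. Apart from the fully annotated dataset, we also present a detailed description of the data collection procedure, statistics and analysis of the dataset. A series of benchmark models and results are reported, including natural language understanding (intent detection and slot filling), dialogue state tracking and dialogue context-to-text generation, as well as coreference and ellipsis resolution, which facilitate baseline comparison for future research on this corpus.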
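The annotation layers listed above (goal, domain, dialogue state, user and system acts, plus ellipsis and coreference annotations) can be pictured as a small record type. The following is only a sketch for orientation: every class name, field name, utterance and slot value in it is hypothetical and is not taken from the released RiSAWOZ files, whose actual schema should be checked in the dataset itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Turn:
    """One user/system exchange. All field names are hypothetical."""
    user_utterance: str
    system_utterance: str
    domains: List[str]
    # Dialogue state accumulated up to this turn: domain -> slot -> value.
    belief_state: Dict[str, Dict[str, str]]
    user_acts: List[str] = field(default_factory=list)
    system_acts: List[str] = field(default_factory=list)
    # Ellipsis/coreference layer: the user utterance rewritten in complete
    # form, or None when nothing had to be resolved.
    resolved_user_utterance: Optional[str] = None


@dataclass
class Dialogue:
    dialogue_id: str
    goal: str  # natural language description of the user's goal
    turns: List[Turn]

    def domains(self) -> List[str]:
        """Domains touched by the dialogue, in order of first appearance."""
        seen: List[str] = []
        for turn in self.turns:
            for domain in turn.domains:
                if domain not in seen:
                    seen.append(domain)
        return seen

    def is_multi_domain(self) -> bool:
        return len(self.domains()) > 1


# Made-up example content, written only to exercise the types above.
example = Dialogue(
    dialogue_id="demo-0001",
    goal="Book a hotel, then find a restaurant close to it.",
    turns=[
        Turn(
            user_utterance="I need a hotel near the city centre.",
            system_utterance="Hotel A is near the city centre.",
            domains=["hotel"],
            belief_state={"hotel": {"area": "city centre"}},
            user_acts=["inform(area)"],
            system_acts=["recommend(name)"],
        ),
        Turn(
            user_utterance="Is there a restaurant near it?",
            system_utterance="Restaurant B is a short walk away.",
            domains=["restaurant"],
            belief_state={
                "hotel": {"area": "city centre", "name": "Hotel A"},
                "restaurant": {"near": "Hotel A"},
            },
            user_acts=["request(name)"],
            system_acts=["recommend(name)"],
            # "it" refers back to the hotel from the previous turn.
            resolved_user_utterance="Is there a restaurant near Hotel A?",
        ),
    ],
)

print(example.domains(), example.is_multi_domain())
```

Keeping the resolved utterance next to the original one is a design choice of this sketch: it lets a coreference or ellipsis resolution model be trained on the pair, while the state tracker still sees the utterance as it was actually spoken.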
## Citation
```
@article{quan2020risawoz,
title={{RiSAWOZ}: A large-scale multi-domain {Wizard-of-Oz} dataset with rich semantic annotations for task-oriented dialogue modeling},
author={Quan, Jun and Zhang, Shian and Cao, Qian and Li, Zizhong and Xiong, Deyi},
journal={arXiv preprint arXiv:2010.08738},
year={2020}
}
```
## Download dataset
:modelscope-code[]{type="git"}
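The directive above is rendered as a git snippet on the ModelScope page. As a rough sketch of what a git-based download could look like, the helper below derives a `git clone` command from the dataset page URL given at the top of this card. It assumes that the repository is reachable over git at the page URL with a `.git` suffix, which is an assumption not confirmed by this page; `build_clone_command` is a hypothetical helper name.

```python
import shlex
from typing import List
from urllib.parse import urlparse


def build_clone_command(dataset_url: str) -> List[str]:
    """Build a `git clone` argument list for a dataset page URL.

    Assumption: the git remote is the page URL plus a ".git" suffix.
    """
    parsed = urlparse(dataset_url)
    path = parsed.path.rstrip("/")
    if not path.endswith(".git"):
        path += ".git"
    repo_url = f"{parsed.scheme}://{parsed.netloc}{path}"
    # Clone into a directory named after the repository, without ".git".
    target_dir = path.rsplit("/", 1)[-1][: -len(".git")]
    return ["git", "clone", repo_url, target_dir]


cmd = build_clone_command("https://modelscope.cn/datasets/OmniData/RiSAWOZ")
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run(cmd, check=True)` to perform the clone. If the data files are stored with Git LFS, `git lfs` may also need to be installed beforehand.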
Provided by:
maas
Created:
2024-07-03



