ConvLab/woz

Name: ConvLab/woz
Creator: ConvLab
Published: 2022-11-25 09:17:30
License: Apache-2.0

Source: Hugging Face | Updated: 2022-11-25 | Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/ConvLab/woz

Overview:

WOZ 2.0 is a monolingual English dataset for dialogue systems, used mainly for natural language understanding (NLU), dialogue state tracking (DST), and end-to-end (E2E) tasks. It contains 1,200 dialogues split into training, validation, and test sets. In this version, the domain is set to restaurant, categorical slot values in the state and dialogue acts are normalized, the request intent is ignored, and simple string matching is used to locate the spans of non-categorical slot values. Annotations cover user dialogue acts and the dialogue state.

Provider:

ConvLab

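Summary of original information

Dataset overview

Basic information

Name: WOZ 2.0
Language: English
License: Apache-2.0
Multilinguality: Monolingual
Size: 1K<n<10K
Task category: Conversational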

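Dataset details

Domain: Restaurant
Data conversion:
- After downloading the original data, convert it by running python preprocess.py.
- Main changes in the conversion: the domain is set to restaurant, values of categorical slots are normalized, the request intent in belief_states is ignored, and simple string matching is used to locate the values of non-categorical slots.
Annotations: User dialogue acts and state.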
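
The two conversion steps described above (normalizing categorical values and locating non-categorical values by string matching) can be sketched roughly as follows. This is an illustrative sketch only, not the actual contents of preprocess.py: the function names, the CANONICAL_VALUES table, and the slot name "price range" are all hypothetical, and the real script may handle casing, tokenization, and multiple matches differently.

```python
from typing import Optional, Tuple

# Hypothetical lookup table mapping raw surface forms to canonical values.
# The real preprocessing script may use a different vocabulary.
CANONICAL_VALUES = {
    "price range": {
        "cheap": "cheap",
        "inexpensive": "cheap",
        "moderate": "moderate",
        "moderately priced": "moderate",
        "expensive": "expensive",
    },
}


def normalize_categorical(slot: str, value: str) -> str:
    """Map a raw categorical value to a canonical form.

    Unknown slots or values are returned lowercased and stripped.
    """
    cleaned = value.strip().lower()
    return CANONICAL_VALUES.get(slot, {}).get(cleaned, cleaned)


def find_span(utterance: str, value: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the first case-insensitive
    occurrence of value in utterance, or None if there is no match.

    Assumes lowercasing does not change string length, which holds for
    ordinary English text.
    """
    if not value:
        return None
    start = utterance.lower().find(value.lower())
    if start == -1:
        return None
    return start, start + len(value)


utterance = "I want Thai food in the north"
span = find_span(utterance, "thai")
if span is not None:
    start, end = span
    print(utterance[start:end])  # the matched surface form
```

Plain substring matching of this kind fails whenever the annotated value is paraphrased or misspelled in the utterance, which is one plausible reason the span coverage for non-categorical slots reported for this dataset is below 100%.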

Supported tasks

NLU (natural language understanding)
DST (dialogue state tracking)
E2E (end-to-end)

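Data splits

Split	Dialogues	Utterances	Avg. utterances per dialogue	Avg. tokens per utterance	Avg. domains	Cat. slot match (state, %)	Cat. slot match (goal, %)	Cat. slot match (dialogue act, %)	Non-cat. slot span (dialogue act, %)
Train	600	4472	7.45	11.37	1	100	-	100	96.56
Validation	200	1460	7.3	11.28	1	100	-	100	95.52
Test	400	2892	7.23	11.49	1	100	-	100	94.83
All	1200	8824	7.35	11.39	1	100	-	100	95.83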
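
As a quick sanity check, the totals and averages in the table can be recomputed from the per-split counts. The sketch below hard-codes the figures from the table; the dictionary layout and variable names are only illustrative, and it assumes the overall token average is weighted by utterance count.

```python
# Per-split figures copied from the table above.
splits = {
    "train": {"dialogues": 600, "utterances": 4472, "avg_tokens": 11.37},
    "validation": {"dialogues": 200, "utterances": 1460, "avg_tokens": 11.28},
    "test": {"dialogues": 400, "utterances": 2892, "avg_tokens": 11.49},
}

total_dialogues = sum(s["dialogues"] for s in splits.values())
total_utterances = sum(s["utterances"] for s in splits.values())

# Average utterances per dialogue, per split and overall.
avg_utterances = {
    name: round(s["utterances"] / s["dialogues"], 2) for name, s in splits.items()
}
overall_avg_utterances = round(total_utterances / total_dialogues, 2)

# Overall tokens per utterance, weighted by each split's utterance count.
overall_avg_tokens = round(
    sum(s["utterances"] * s["avg_tokens"] for s in splits.values())
    / total_utterances,
    2,
)

print(total_dialogues, total_utterances)
print(avg_utterances, overall_avg_utterances, overall_avg_tokens)
```

The recomputed values agree with the last row of the table. The overall non-categorical span percentage cannot be checked the same way, because the table does not list the number of non-categorical slot values per split.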

License

Type: Apache License, Version 2.0
