ConvLab/sgd4

Name: ConvLab/sgd4
Creator: ConvLab
Published: 2022-11-25 08:51:35
License: not specified in this field (the dataset overview lists CC BY-SA 4.0)

Source: Hugging Face. Updated: 2022-11-25. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/ConvLab/sgd4

Overview:

The Schema-Guided Dialogue (SGD) dataset consists of over 20,000 annotated multi-domain, task-oriented human-machine dialogues involving service and API interactions across 20 domains, such as banking, events, media, calendar, travel, and weather. It covers multiple distinct APIs, many of which have overlapping functionality but different interfaces, mirroring a common real-world scenario. The SGD-X dataset provides five linguistic variants of every schema in the original SGD dataset, written by hundreds of paid crowd workers. Within the SGD-X directory, v1 is the variant closest to the original schemas and v5 is the variant that is linguistically furthest from them. The dataset can be used for intent prediction, slot filling, dialogue state tracking, policy imitation learning, language generation, and user simulation learning.

Provider:

ConvLab

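Summary of original information

Dataset overview

Dataset name: SGD-X v4

Language: English

License: CC BY-SA 4.0

Dataset size: 10K<n<100K

Task category: conversational

Dataset contents

Schema-Guided Dialogue (SGD) dataset: over 20,000 multi-domain, task-oriented human-machine dialogues spanning 20 domains, such as banking, events, media, calendar, travel, and weather.

SGD-X dataset: five linguistic variants of every schema in the original SGD dataset, written by hundreds of paid crowd workers. Within the SGD-X directory, v1 is closest to the original schemas and v5 is linguistically furthest from them.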

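Dataset usage

Data conversion: the generate_sgdx_dialogues.py script converts the dialogues to the SGD-X schemas.

Main conversion changes:

The original act is changed to intent.
A count slot is added for each domain; it is non-categorical, and its span is found by text matching.
Dialogue acts are categorized according to the intent.
Multiple values are concatenated with |.
active_intent, requested_slots, and service_call are retained.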
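
Two of the conversion changes above (joining multiple values with | and locating the span of a count value by text matching) can be illustrated with a minimal Python sketch. This is not the code used by the conversion script: the function names `join_values` and `find_count_span` are hypothetical, and the sketch assumes the count appears in the utterance as a digit string.

```python
import re
from typing import List, Optional, Tuple


def join_values(values: List[str]) -> str:
    """Concatenate several values of one slot into a single string using '|'."""
    return "|".join(values)


def find_count_span(utterance: str, count: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of `count` in `utterance`, or None.

    Assumption: the count is written with digits (e.g. "10"); spelled-out
    numbers such as "ten" are not handled by this sketch.
    """
    # Word boundaries prevent matching "10" inside a longer number like "110".
    match = re.search(rf"\b{count}\b", utterance)
    return match.span() if match else None


# Example usage with made-up utterances.
joined = join_values(["Italian", "Pizza"])
span = find_count_span("I found 10 restaurants for you.", 10)
missing = find_count_span("I found 110 options.", 10)
```

Returning character offsets rather than the matched text keeps the span usable for non-categorical slot annotation, where the value must be tied to a position in the utterance.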

Supported tasks

NLU (natural language understanding)
DST (dialogue state tracking)
Policy
NLG (natural language generation)
E2E (end-to-end)

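Data splits

Split	Dialogues	Utterances	Avg. utterances	Avg. tokens	Avg. domains	Categorical slot match (state)	Categorical slot match (goal)	Categorical slot match (dialogue act)	Non-categorical slot span (dialogue act)
Train	16142	329964	20.44	9.75	1.84	100	-	100	100
Validation	2482	48726	19.63	9.66	1.84	100	-	100	100
Test	4201	84594	20.14	10.4	2.02	100	-	100	100
All	22825	463284	20.3	9.86	1.87	100	-	100	100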
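
As a quick consistency check on the table, the short Python sketch below recomputes the totals and the average number of utterances per dialogue from the per-split counts. It uses only the numbers listed in the table; the variable names are illustrative.

```python
# (dialogues, utterances) per split, copied from the table above.
splits = {
    "train": (16142, 329964),
    "validation": (2482, 48726),
    "test": (4201, 84594),
}

# The "All" row should equal the sum over the three splits.
total_dialogues = sum(d for d, _ in splits.values())
total_utterances = sum(u for _, u in splits.values())

# Average utterances per dialogue, rounded to two decimals as in the table.
avg_utterances = {name: round(u / d, 2) for name, (d, u) in splits.items()}
avg_all = round(total_utterances / total_dialogues, 2)

print(total_dialogues, total_utterances, avg_utterances, avg_all)
```

The recomputed values agree with the "All" row (22825 dialogues, 463284 utterances) and with the per-split averages in the table.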

Citation

@inproceedings{lee2022sgd,
  title={SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems},
  author={Lee, Harrison and Gupta, Raghav and Rastogi, Abhinav and Cao, Yuan and Zhang, Bin and Wu, Yonghui},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={10},
  pages={10938--10946},
  year={2022}
}