
SSD_PLATE (Sub-Slot Dialogue dataset license plate number domain)

OpenDataLab | Updated: 2026-05-31 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/SSD_PLATE
Description:
Slot values can be provided segment by segment over multiple turns of a dialogue, especially for important information such as phone numbers and names. This phenomenon is common in daily life but has received little attention in previous work. To fill this gap, this paper defines a new task, Sub-slot-based Task-oriented Dialogue (SSTOD), and constructs a Chinese dialogue dataset, SSD, to promote research on SSTOD. The dataset contains a total of 40K dialogues and 500K utterances across four domains: Chinese names, phone numbers, ID card numbers, and license plate numbers. The data is thoroughly annotated with sub-slot values, slot values, dialogue states, and dialogue acts. We find several new linguistic phenomena and interaction patterns in SSTOD that pose serious challenges for building task-oriented dialogue agents. We test three state-of-the-art dialogue models on SSTOD and find that none of them handles the task well in any of the four domains. We also investigate improved models that incorporate slot knowledge in a plug-in manner. More work is needed to address the new challenges posed by SSTOD, which arises widely in real-life settings.
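To make the idea of segment-by-segment slot filling concrete, the following is a minimal sketch, not the dataset's annotation schema or any model from the paper. The `SlotState` class, its method names, and the example plate number are all hypothetical and exist only for illustration. The correction step is likewise an assumption about the kind of interaction that might occur.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SlotState:
    """Hypothetical dialogue state for one slot that is filled piece by piece."""

    segments: List[str] = field(default_factory=list)

    def inform(self, segment: str) -> None:
        # The user supplies the next sub-slot value (one segment of the slot).
        self.segments.append(segment)

    def update(self, index: int, segment: str) -> None:
        # The user corrects a segment given in an earlier turn (assumed interaction).
        self.segments[index] = segment

    @property
    def value(self) -> str:
        # The full slot value is the concatenation of its sub-slot values.
        return "".join(self.segments)


# A made-up license plate number given over several turns.
state = SlotState()
state.inform("京A")      # turn 1: region character and letter
state.inform("123")      # turn 2: first digits
state.update(1, "128")   # turn 3: user corrects the previous segment
state.inform("45")       # turn 4: remaining digits
print(state.value)
```

Tracking the segments separately, rather than only the joined string, is what lets a later turn revise one piece without restating the whole value.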
Provider:
OpenDataLab
Created:
2022-09-01
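Collected Summary
Dataset Introduction
Background and Challenges
Background Overview
SSD_PLATE is the license plate number domain of SSD, a Chinese dialogue dataset focused on Sub-slot-based Task-oriented Dialogue (SSTOD). SSD covers four domains, including license plate numbers, and contains 40K dialogues and 500K utterances in total. It supports research on multi-turn interactions in which slot values are provided segment by segment. The dataset targets a real-world linguistic phenomenon that earlier work rarely addressed, and it provides a basis for improving dialogue agents.
The content above was collected and summarized by 遇见数据集.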