SSD
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/shunjiu/SSTOD
Resource description:
SSD is a dataset of 40,000 dialogues and 500,000 utterances covering four domains: Chinese names, phone numbers, ID card numbers, and license plate numbers. It is annotated with sub-slot values, slot values, dialogue states, and dialogue acts. To avoid privacy issues, the dataset was constructed with a semi-automated method, and it covers a variety of dialogue acts and transition probabilities. The associated task is sub-slot based task-oriented dialogue (SSTOD).
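To make the sub-slot idea concrete, the sketch below shows how a slot value such as a phone number might be assembled from pieces given over several turns and then corrected. It is an illustration only: the class `SubSlotState`, the act names `inform` and `update`, and the example digits are all hypothetical and do not reflect the actual SSD annotation schema or file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubSlotState:
    """Hypothetical dialogue state for one slot that is filled piece by piece."""

    slot: str
    pieces: List[str] = field(default_factory=list)

    def inform(self, sub_value: str) -> None:
        # The user supplies the next fragment (sub-slot value) of the slot.
        self.pieces.append(sub_value)

    def update(self, old: str, new: str) -> bool:
        # The user corrects a fragment; replace the most recent match.
        for i in range(len(self.pieces) - 1, -1, -1):
            if self.pieces[i] == old:
                self.pieces[i] = new
                return True
        return False

    @property
    def value(self) -> str:
        # The full slot value is the concatenation of its sub-slot values.
        return "".join(self.pieces)


# Made-up example: a phone number dictated in three chunks, then corrected.
state = SubSlotState(slot="phone_number")
state.inform("138")
state.inform("0013")
state.inform("8000")
state.update("0013", "0031")
print(state.slot, state.value)
```

Keeping the fragments as a list rather than a single string is one possible design choice: it lets a correction target an individual sub-slot value without re-collecting the whole slot.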



