SSD

Source: arXiv, indexed 2025-09-30

Download link:

https://github.com/shunjiu/SSTOD

Description:

SSD is a dataset of 40,000 dialogues and 500,000 utterances covering four domains: Chinese names, phone numbers, ID card numbers, and license plate numbers. It is annotated with sub-slot values, slot values, dialogue states, and dialogue acts. To avoid privacy issues, the dataset was built with a semi-automated method, and it covers a variety of dialogue acts and transition probabilities. The corresponding task is sub-slot based task-oriented dialogue (SSTOD).
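To illustrate how a slot value can be spread over several utterances as sub-slot values, here is a minimal Python sketch. The record layout, the field names (`speaker`, `dialogue_act`, `sub_slot_value`), the act labels, and the sample dialogue are all assumptions made for illustration; they are not the actual SSD file schema, which should be checked in the repository linked above.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One annotated utterance (hypothetical layout, not the SSD schema)."""
    speaker: str               # "user" or "system"
    text: str
    dialogue_act: str          # illustrative labels such as "inform" or "request"
    sub_slot_value: str = ""   # fragment of the slot value carried by this turn


@dataclass
class Dialogue:
    domain: str
    turns: list[Turn] = field(default_factory=list)

    def slot_value(self) -> str:
        """Concatenate the user's sub-slot fragments in turn order."""
        return "".join(
            t.sub_slot_value for t in self.turns if t.speaker == "user"
        )


# Invented example: a phone number given in two pieces.
dialogue = Dialogue(
    domain="phone_number",
    turns=[
        Turn("system", "What is your phone number?", "request"),
        Turn("user", "It starts with 138", "inform", "138"),
        Turn("system", "138, please go on.", "continue"),
        Turn("user", "0013 8000", "inform", "00138000"),
    ],
)

print(dialogue.slot_value())  # prints 13800138000
```

Plain concatenation is the simplest possible way to assemble a slot value. A real system for this task would presumably also have to handle corrections and restarts by the user, which this sketch does not attempt.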
