STAR

Name: STAR
Creator: Language Technology Institute
Published: 2020-10-23 00:45:00
License: not specified

arXiv; updated 2020-10-23, indexed 2024-06-21

Download link:

https://github.com/RasaHQ/STAR

Overview:

The STAR dataset, created by the Language Technology Institute, contains 5,820 task-oriented dialogues totaling 127,833 utterances and knowledge base queries across 13 domains. It is designed to facilitate transfer learning across tasks and domains. The data was collected with a scalable crowdsourcing setup intended to ensure data quality. STAR can be used both to test model performance on known tasks and to evaluate zero-shot generalization to unseen tasks and domains. It also supports other tasks, such as knowledge base query prediction and out-of-domain detection, making it a broad resource for dialogue system research.
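
The zero-shot evaluation described above is commonly set up as a leave-one-domain-out split: train on all domains but one, then test on the held-out domain. The following is a minimal sketch of that idea, not the dataset's official loader or protocol. The record shape (a dict with a "domain" key), the function name leave_one_domain_out, and the toy domain names are all assumptions made for illustration; the actual STAR files may be organized differently.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record shape: each dialogue is a dict carrying a "domain" key.
# The real STAR files may use different field names.
Dialogue = Dict[str, object]


def leave_one_domain_out(
    dialogues: List[Dialogue], domain_key: str = "domain"
) -> Iterator[Tuple[str, List[Dialogue], List[Dialogue]]]:
    """Yield (held_out_domain, train, test), holding out each domain in turn."""
    # Group dialogues by their domain label.
    by_domain: Dict[str, List[Dialogue]] = defaultdict(list)
    for dialogue in dialogues:
        by_domain[str(dialogue[domain_key])].append(dialogue)

    # Hold out one domain at a time, in sorted order for reproducibility.
    for held_out in sorted(by_domain):
        train = [
            dialogue
            for name, group in by_domain.items()
            if name != held_out
            for dialogue in group
        ]
        test = list(by_domain[held_out])
        yield held_out, train, test


if __name__ == "__main__":
    # Toy records with made-up domain names, for illustration only.
    toy = [
        {"id": 1, "domain": "domain_a"},
        {"id": 2, "domain": "domain_b"},
        {"id": 3, "domain": "domain_a"},
    ]
    for held_out, train, test in leave_one_domain_out(toy):
        print(held_out, len(train), len(test))
```

A model trained on each `train` portion never sees dialogues from the held-out domain, so its score on `test` reflects generalization to an unseen domain rather than in-domain performance.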

Provided by:

Language Technology Institute

Created:

2020-10-23
