STAR
Source: arXiv | Updated: 2020-10-23 | Indexed: 2024-06-21
Download link:
https://github.com/RasaHQ/STAR
Description:
The STAR dataset was created by the Language Technology Institute. It contains 5,820 task-oriented dialogues with a total of 127,833 utterances and knowledge base queries across 13 domains. It is designed specifically to support transfer learning across tasks and domains. The data was collected with a scalable crowdsourcing setup intended to ensure data quality. Beyond testing models on known tasks, STAR can be used to evaluate zero-shot generalization to unseen tasks and domains. It also supports further tasks such as knowledge base query prediction and out-of-domain detection, making it a rich resource for dialogue system research.
Provided by:
Language Technology Institute
Created:
2020-10-23
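
The zero-shot evaluation mentioned in the description can be illustrated with a leave-one-domain-out split: train on every domain except one, then test on the held-out domain. The following is only a minimal sketch under stated assumptions. The `domain` field, the function name `leave_one_domain_out`, and the toy records are hypothetical and do not reflect the actual file layout of the STAR repository.

```python
def leave_one_domain_out(dialogues, held_out_domain):
    """Split dialogues so the held-out domain never appears in training."""
    train, test = [], []
    for dialogue in dialogues:
        # "domain" is a hypothetical field name used for illustration only.
        if dialogue["domain"] == held_out_domain:
            test.append(dialogue)
        else:
            train.append(dialogue)
    return train, test


# Toy placeholder records, not real STAR data.
toy_dialogues = [
    {"id": 1, "domain": "domain_a"},
    {"id": 2, "domain": "domain_b"},
    {"id": 3, "domain": "domain_a"},
    {"id": 4, "domain": "domain_c"},
]

train, test = leave_one_domain_out(toy_dialogues, "domain_b")
print(len(train), len(test))
```

Repeating the split once per domain and averaging the test scores would give one possible estimate of cross-domain generalization; the splits actually used with STAR may differ.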



