
STAR

arXiv 2020-10-23 · Updated 2024-06-21 · Indexed
Download link:
https://github.com/RasaHQ/STAR
Resource description:

The STAR dataset was created by the Language Technology Institute. It contains 5,820 task-oriented dialogues across 13 domains, totaling 127,833 utterances and knowledge base queries. The dataset is specifically designed to facilitate transfer learning across tasks and domains. The data was collected with a scalable crowdsourcing paradigm intended to ensure data quality. STAR can be used both to test model performance on known tasks and to evaluate zero-shot generalization to unseen tasks and domains. It also supports several other tasks, such as knowledge base query prediction and out-of-domain detection, making it a rich resource for dialogue system research.
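Zero-shot evaluation on unseen domains usually means training on some domains and testing on one that never appears in training. The sketch below shows a generic leave-one-domain-out split in Python. It is an illustration only: the record shape (a dict with hypothetical `domain` and `turns` keys) is an assumption and is NOT the STAR file format, the toy domain labels are made up, and the helper names `zero_shot_domain_split` and `leave_one_domain_out` are hypothetical, not part of the STAR repository or the authors' evaluation protocol.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical record shape: each dialogue is a dict with a "domain" label.
# This is a stand-in for illustration, not the actual STAR file format.
Dialogue = Dict[str, object]


def zero_shot_domain_split(
    dialogues: List[Dialogue], held_out: str
) -> Tuple[List[Dialogue], List[Dialogue]]:
    """Put every dialogue from `held_out` in the test set and all others in
    the training set, so the test domain is never seen during training."""
    train: List[Dialogue] = []
    test: List[Dialogue] = []
    for d in dialogues:
        (test if d["domain"] == held_out else train).append(d)
    return train, test


def leave_one_domain_out(
    dialogues: List[Dialogue],
) -> Iterator[Tuple[str, List[Dialogue], List[Dialogue]]]:
    """Yield one (held_out_domain, train, test) split per distinct domain,
    in sorted domain order."""
    for dom in sorted({str(d["domain"]) for d in dialogues}):
        train, test = zero_shot_domain_split(dialogues, dom)
        yield dom, train, test


if __name__ == "__main__":
    # Toy data with made-up domain labels.
    toy = [
        {"id": 1, "domain": "bank", "turns": ["..."]},
        {"id": 2, "domain": "hotel", "turns": ["..."]},
        {"id": 3, "domain": "bank", "turns": ["..."]},
        {"id": 4, "domain": "restaurant", "turns": ["..."]},
    ]
    for dom, train, test in leave_one_domain_out(toy):
        print(dom, [d["id"] for d in train], [d["id"] for d in test])
```

Holding out a whole domain, rather than sampling random dialogues, is what makes the test set "unseen": a model that scores well here cannot rely on having memorized domain-specific patterns from training.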
Provided by:
Language Technology Institute
Created:
2020-10-23