X-RiSAWOZ
Source: arXiv | Updated: 2023-06-30 | Indexed: 2024-06-21
Download link:
https://github.com/stanford-oval/dialogues
Resource overview:
X-RiSAWOZ is a large-scale, high-quality, multilingual task-oriented dialogue dataset created by the Department of Computer Science at Stanford University. It was produced by translating the Chinese RiSAWOZ dataset into English, French, Hindi, Korean, and Hinglish (code-mixed English-Hindi), with over 18,000 human-verified dialogue utterances per language. The dataset covers 12 domains, a broader scope than the earlier MultiWOZ dataset, and is intended as an end-to-end benchmark for building fully functional dialogue agents. To ensure data quality, its creation used a hybrid entity alignment technique that combines neural and dictionary-based methods, together with several automated and semi-automated validation checks. X-RiSAWOZ targets research and development of task-oriented dialogue systems, particularly in zero-shot and few-shot settings, where it aims to reduce the high cost and time required to build dialogue systems for new languages.
Provided by:
Department of Computer Science, Stanford University
Created:
2023-06-30



