Cross-lingual Outline-based Dialogue (COD)
Source: arXiv, 2022-02-01; updated 2024-06-21
Download link:
https://github.com/cambridgeltl/COD
Overview:
The Cross-lingual Outline-based Dialogue (COD) dataset was created by the University of Cambridge. It is a multilingual task-oriented dialogue dataset built through an outline-based annotation process. It supports natural language understanding, dialogue state tracking, and end-to-end dialogue modeling and evaluation in four languages: Arabic, Indonesian, Russian, and Swahili. Domain-specific abstract dialogue schemas are mapped to natural-language outlines that specify the intent and slot information for each turn. These outlines guide target-language annotators as they write the dialogues, which helps keep the resulting conversations natural and culturally specific. Application areas include, but are not limited to, financial services, travel planning, and medical consultation. The dataset is intended to support evaluating and improving dialogue systems in multilingual settings.
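The minimal Python sketch below illustrates the outline idea. It is not taken from the COD repository. The `AbstractTurn` class, the `TEMPLATES` table, the intent names, and the outline wording are all hypothetical. The sketch shows one way a turn-level intent plus its slot values could be rendered as an outline line. An annotator would then write the actual utterance in the target language.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical representation of one abstract dialogue turn (not the COD file format).
@dataclass
class AbstractTurn:
    speaker: str                                        # "user" or "system"
    intent: str                                         # illustrative intent name
    slots: dict[str, str] = field(default_factory=dict)  # slot name -> value


# Hypothetical templates mapping an intent to an outline instruction.
TEMPLATES = {
    "FindFlight": "Ask to find a flight",
    "InformCount": "Tell the user how many options were found",
}


def turn_to_outline(turn: AbstractTurn) -> str:
    """Render one abstract turn as a natural-language outline line."""
    # Fall back to the raw intent name when no template exists.
    action = TEMPLATES.get(turn.intent, turn.intent)
    speaker = turn.speaker.capitalize()
    if turn.slots:
        details = ", ".join(f"{name} = {value}" for name, value in turn.slots.items())
        return f"{speaker}: {action} ({details})."
    return f"{speaker}: {action}."


def dialogue_to_outline(turns: list[AbstractTurn]) -> list[str]:
    """Number each outline line so annotators can follow the turn order."""
    return [f"{i}. {turn_to_outline(t)}" for i, t in enumerate(turns, start=1)]


if __name__ == "__main__":
    turns = [
        AbstractTurn("user", "FindFlight", {"origin": "Nairobi", "destination": "Jakarta"}),
        AbstractTurn("system", "InformCount", {"count": "3"}),
    ]
    for line in dialogue_to_outline(turns):
        print(line)
    # 1. User: Ask to find a flight (origin = Nairobi, destination = Jakarta).
    # 2. System: Tell the user how many options were found (count = 3).
```

The outline only fixes what each turn must convey. The wording is left to the annotator, which is what lets dialogues stay natural in each target language.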
Provider:
University of Cambridge
Created:
2022-02-01



