IndoToD
Source: arXiv | Updated: 2023-11-02 | Indexed: 2024-06-21
Download link:
https://github.com/dehanalkautsar/IndoToD
Description:
IndoToD is a benchmark dataset for multi-domain, end-to-end task-oriented dialogue (ToD) systems in Indonesian. It extends two English ToD datasets to Indonesian, covers four domains, and uses delexicalization to reduce the size of the annotations. Native Indonesian speakers translated the data manually to keep data quality high. IndoToD can be used to evaluate both Indonesian and English ToD systems and to explore the potential benefits of cross-lingual and bilingual transfer learning. Its application areas include restaurant search, public transportation, and in-car assistance, with the aim of improving how dialogue systems understand and interact with users in multilingual settings.
Provider:
Bandung Institute of Technology (Institut Teknologi Bandung)
Created:
2023-11-02



