AfriWOZ
Source: arXiv, 2022-05-19; updated 2024-07-24
Download link:
https://huggingface.co/tosin
Resource description:
AfriWOZ is a dataset created to support dialogue generation in low-resource African languages. It covers six languages (Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda, and Yorùbá), with 1,500 dialogue turns per language. The dataset was built by translating subsets of the MultiWOZ dataset and spans several domains, including hotels, restaurants, taxis, and bookings. To ensure translation quality, the creators used both human translation and machine translation followed by human review. AfriWOZ is intended for training and evaluating open-domain dialogue models, and its main goal is to address the underrepresentation of African languages in natural language processing (NLP).
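To make the per-language layout concrete, the following is a minimal Python sketch that models one translated turn and checks that each of the six languages reaches the stated 1,500 turns. It is an illustration under assumptions, not the released file format: the `TranslatedTurn` fields, the ISO 639 language codes used as keys, the method labels `"human"` and `"mt+review"`, and the `check_coverage` helper are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical language keys (ISO 639 codes chosen for illustration only).
LANGUAGES = {
    "sw": "Swahili",
    "wo": "Wolof",
    "ha": "Hausa",
    "pcm": "Nigerian Pidgin English",
    "rw": "Kinyarwanda",
    "yo": "Yorùbá",
}
TURNS_PER_LANGUAGE = 1500  # figure stated in the dataset description
# Hypothetical labels for the two translation methods described in the text.
ALLOWED_METHODS = {"human", "mt+review"}


@dataclass(frozen=True)
class TranslatedTurn:
    """One MultiWOZ turn paired with its translation (hypothetical schema)."""
    dialogue_id: str   # identifier of the source MultiWOZ dialogue
    turn_index: int    # position of the turn within that dialogue
    source_text: str   # original English MultiWOZ utterance
    target_text: str   # translation into the target language
    lang: str          # key into LANGUAGES
    method: str        # one of ALLOWED_METHODS


def check_coverage(turns):
    """Return {lang: (count, ok)}, where ok means count == TURNS_PER_LANGUAGE.

    Raises ValueError on unknown language codes or translation methods.
    """
    counts = Counter()
    for t in turns:
        if t.lang not in LANGUAGES:
            raise ValueError(f"unexpected language code: {t.lang!r}")
        if t.method not in ALLOWED_METHODS:
            raise ValueError(f"unexpected translation method: {t.method!r}")
        counts[t.lang] += 1
    # Counter returns 0 for languages with no turns, so every language is reported.
    return {lang: (counts[lang], counts[lang] == TURNS_PER_LANGUAGE) for lang in LANGUAGES}
```

For example, a list of 1,500 Swahili turns and nothing else would report `(1500, True)` for `"sw"` and `(0, False)` for the other five languages. Counting per language rather than in total keeps a shortfall in one language from being hidden by a surplus in another.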
Provided by:
Luleå University of Technology / Masakhane / CIS
Created:
2022-04-18



