MASSIVE
arXiv | Updated 2022-06-18 | Indexed 2024-06-21
Download link:
https://github.com/alexa/massive
Resource description:
MASSIVE is a multilingual natural language understanding (NLU) dataset created by Amazon. It contains 1 million realistic, parallel, annotated virtual assistant utterances covering 51 languages, 18 domains, 60 intents, and 55 slots. The dataset was created by localizing the English-only SLURP dataset into 50 other languages. It aims to address the shortage of training and evaluation data for multilingual NLU models, in particular data that is both task-relevant and natural in each language. Applications of MASSIVE include training and evaluating multilingual NLU models, as well as advancing research on low-resource languages.
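To make "annotated with intents and slots" concrete, the sketch below parses one hypothetical record. The field names (`locale`, `intent`, `utt`, `annot_utt`), the sample values, and the `[slot : value]` bracket notation are assumptions made for illustration and are not stated in this listing; check the repository at the download link for the actual file layout before relying on them.

```python
import json
import re

# Hypothetical record shaped like one line of a JSON Lines file.
# Field names and the "[slot : value]" notation are assumptions.
SAMPLE_LINE = json.dumps({
    "locale": "en-US",
    "intent": "alarm_set",
    "utt": "wake me up at nine am on friday",
    "annot_utt": "wake me up at [time : nine am] on [date : friday]",
})

# Matches one bracketed span and captures the slot name and its value.
SLOT_PATTERN = re.compile(r"\[\s*([^\[\]:]+?)\s*:\s*([^\[\]]+?)\s*\]")


def parse_record(line):
    """Turn one JSON line into a dict with the intent and a list of slots."""
    record = json.loads(line)
    slots = SLOT_PATTERN.findall(record["annot_utt"])
    return {
        "locale": record["locale"],
        "intent": record["intent"],
        "text": record["utt"],
        "slots": slots,  # list of (slot_name, slot_value) tuples
    }


if __name__ == "__main__":
    print(parse_record(SAMPLE_LINE))
```

Because the utterances are parallel across languages, the same parsing step would apply to every locale under these assumptions; only the text inside the brackets would differ.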
Provider:
Amazon
Created:
2022-04-19



