Bangla and Sylheti Intent Detection and Slot Filling Dataset
Source: arXiv. Updated: 2023-10-17. Indexed: 2024-06-21.
Download link:
https://github.com/mushfiqur11/banglasylheti-snips.git
Description:
This study introduces the first comprehensive dataset for intent detection and slot filling in formal Bangla, colloquial Bangla, and colloquial Sylheti. It contains 984 samples covering 10 unique intents. The samples are derived from 328 English samples in the SNIPS dataset, which were manually corrected and then converted into the three language varieties (328 × 3 = 984). Multiple annotators worked independently, and their output was cross-validated and then reviewed by a third party to check accuracy and reliability. The dataset is intended mainly to support natural language understanding models for home assistants, specifically intent detection and slot filling in low-resource languages, with the aim of improving human-computer interaction in those languages.
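For readers unfamiliar with the task, the following is a minimal sketch of what an intent detection and slot filling sample typically looks like. The record layout, the field names, the variety labels, and the English example sentence are assumptions made for illustration; they are not taken from the repository, whose actual file format may differ. The sketch also checks the sample count stated above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """Hypothetical record layout; not the repository's actual schema."""
    variety: str          # assumed label, e.g. "formal_bangla"
    tokens: List[str]     # the utterance, split into tokens
    slot_tags: List[str]  # one BIO tag per token
    intent: str           # utterance-level intent label


def slots_from_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (slot_name, text) pairs from a BIO-tagged token sequence."""
    spans: List[Tuple[str, str]] = []
    name, words = None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot begins; close any slot that is still open.
            if name is not None:
                spans.append((name, " ".join(words)))
            name, words = tag[2:], [tok]
        elif tag.startswith("I-") and name == tag[2:]:
            # Continuation of the currently open slot.
            words.append(tok)
        else:
            # "O" tag or a mismatched "I-" tag: close any open slot.
            if name is not None:
                spans.append((name, " ".join(words)))
            name, words = None, []
    if name is not None:
        spans.append((name, " ".join(words)))
    return spans


# Illustrative English utterance in the style of SNIPS (not a dataset row).
example = Sample(
    variety="english_source",
    tokens=["play", "jazz", "on", "spotify"],
    slot_tags=["O", "B-genre", "O", "B-service"],
    intent="PlayMusic",
)

# Sample count from the description: 328 source samples, 3 varieties.
SOURCE_SAMPLES = 328
VARIETIES = ["formal_bangla", "colloquial_bangla", "colloquial_sylheti"]
total_samples = SOURCE_SAMPLES * len(VARIETIES)

if __name__ == "__main__":
    print(example.intent, slots_from_bio(example.tokens, example.slot_tags))
    print(total_samples)
```

A model for this task predicts the intent label for the whole utterance and one slot tag per token; `slots_from_bio` only shows how per-token tags map back to slot values.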
Provided by:
Department of Computer Science, George Mason University
Created:
2023-10-17



