
Multilingual intent recognition datasets

Indexed from NIAID Data Ecosystem on 2026-05-10
Download link:
https://data.mendeley.com/datasets/mj5d7y7kmh
Resource description:
The dataset contains textual utterances in Amharic, Afaan Oromo, and English, each labeled with its intent category.

Abstract: Spoken language understanding is an emerging field at the intersection of speech processing and language understanding, concerned with extracting meaning from human spoken utterances. It can be applied across sectors and domains to represent the meaning of an utterance. In political speech, analyzing and identifying the intent behind a candidate's use of language reveals the speaker's motivation, an important factor for both the speaker and the audience. The study used the XLM-RoBERTa Base model, trained on annotated speech in English, Amharic, and Afaan Oromo. The methodology involved several key steps: subword-level tokenization, a fair split of the dataset, methodical preprocessing of the data, and fine-tuning the model to classify the different intents. The model performed well across all three languages, and training showed consistent learning trends. The method handled uneven data distribution and language differences, although minor errors occurred for intents with similar meanings. The model attained 95% accuracy and weighted F1 score. The findings show that transformer-based models can serve as trustworthy baselines for intent recognition when several languages are involved and few resources are available. Future work will focus on incorporating more regional languages, applying data augmentation strategies, and examining ensemble designs to further improve the model's generalization and robustness.
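The abstract mentions a fair dataset split and a weighted F1 score without giving details. The following is a minimal sketch of how those two steps might look, not the authors' code. It assumes that "splitting fairly" means a split stratified by language and intent, that each sample is a (text, language, intent) tuple, and that the intent labels shown are hypothetical placeholders. It uses only the Python standard library.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, test_ratio=0.2, seed=0):
    """Split (text, language, intent) samples so that every
    (language, intent) group keeps roughly the same share in both sets.
    Assumption: this is one reading of "splitting the dataset fairly"."""
    groups = defaultdict(list)
    for sample in samples:
        _, language, intent = sample
        groups[(language, intent)].append(sample)

    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(groups):
        items = groups[key][:]
        rng.shuffle(items)
        # Hold out at least one item from any group with two or more items.
        n_test = max(1, round(len(items) * test_ratio)) if len(items) > 1 else 0
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        f1 = 2 * tp / (2 * tp + fp + fn)
        total += f1 * count
    return total / len(y_true)


# Toy data: language codes and intent names are illustrative only.
LANGUAGES = ["en", "am", "om"]
INTENTS = ["promise", "criticism"]  # hypothetical intent labels
samples = [
    (f"utterance {i}", lang, intent)
    for lang in LANGUAGES
    for intent in INTENTS
    for i in range(10)
]
train, test = stratified_split(samples)
score = weighted_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Weighting each class's F1 by its support keeps rare intents from being ignored while still reflecting the class distribution, which matters when the data is unevenly distributed across intents.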
Created: 2025-10-23