airline-customersupport-Hinglish-audio
Source: ModelScope community (魔搭社区) · Updated 2026-01-06 · Indexed 2025-08-16
Download link:
https://modelscope.cn/datasets/Kratos-AI/airline-customersupport-Hinglish-audio
Resource description:
# Airline Customer Support Hinglish Audio Dataset
*This dataset contains intentionally low-quality (“B-grade”) data. It has been curated to include noisy, imperfect, or otherwise suboptimal samples for the purpose of testing model robustness and performance under degraded input conditions.*
**Text spoken by all participants:**
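"Meri flight delay ho gayi, next flight mein seat book kar sakte hain? Main airport par phasa hoon aur jaldi destination pahunchna chahta hoon."
(English gloss: "My flight has been delayed; can a seat be booked on the next flight? I am stuck at the airport and want to reach my destination quickly.")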
The dataset supports training and evaluation of models in:
- Automatic Speech Recognition (ASR)
- Emotional tone classification
- Voice synthesis and generation
- Emotion-aware conversational agents
---
## Intended Uses
### ✅ Direct Use
- Training and benchmarking ASR models with Indian-accented English
- Emotion detection and classification from voice
- Research in affective computing and empathetic AI
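Because every participant reads the same sentence, an ASR benchmark on this dataset can score each transcript against that single fixed reference. The sketch below computes word error rate (WER) as word-level edit distance divided by reference length. The helper names, the normalization (lowercasing and dropping punctuation), and the sample transcript are illustrative assumptions, not part of the dataset or any official evaluation script.

```python
import re

# Reference text spoken by all participants, as given in this dataset card.
REFERENCE = (
    "Meri flight delay ho gayi, next flight mein seat book kar sakte hain? "
    "Main airport par phasa hoon aur jaldi destination pahunchna chahta hoon."
)


def normalize(text: str) -> list[str]:
    """Lowercase and keep only alphabetic word tokens (assumed normalization)."""
    return re.findall(r"[a-z']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up ASR output for illustration only.
    transcript = (
        "meri flight delay ho gaya next flight me seat book kar sakte hain "
        "main airport par phasa hoon aur jaldi destination pahunchna chahta hoon"
    )
    print(f"WER: {word_error_rate(REFERENCE, transcript):.3f}")
```

Romanized Hindi has no single standard spelling (for example "mein" versus "me"), so a stricter or looser normalization step may be needed before WER figures are compared across models.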
### ❌ Out-of-Scope Use
- Real-time or production-grade systems
- Commercial use without proper CC BY 4.0 attribution
- Clinical or diagnostic use cases
---
## Considerations and Limitations
- ❗ The dataset is small (<1,000 samples) and not fully representative of India's linguistic and emotional diversity
- 💡 Emotions are subjective — classification results may vary by listener or model
- 🔄 Future versions will aim to expand multilingual support and speaker diversity
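One way to quantify how much emotion judgments vary between listeners is an agreement statistic such as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal implementation for two annotators. The label names are hypothetical, since this card does not specify the dataset's emotion label set.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels from two listeners for four clips.
    listener_1 = ["frustrated", "frustrated", "calm", "calm"]
    listener_2 = ["frustrated", "calm", "calm", "calm"]
    print(f"kappa: {cohens_kappa(listener_1, listener_2):.2f}")
```

A low kappa between human listeners suggests that a model's accuracy against any single annotator's labels should be read with caution.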
---
## License
**CC BY 4.0** — You can use, modify, and share the dataset with appropriate credit.
---
## Contact
- For queries or collaborations related to datasets, contact:
- anoushka@kgen.io
- abhishek.vadapalli@kgen.io
---
Provider:
maas
Created:
2025-08-01
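Background overview:
This dataset is a collection of Hinglish audio recordings for the airline customer-support domain. It contains intentionally low-quality samples, intended to test model robustness under degraded input conditions. It supports tasks such as automatic speech recognition and emotion classification, but the sample size is limited and the emotion labels are subjective. It is released under the CC BY 4.0 license.
Summary collected and generated by 遇见数据集.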



