medical-symptoms-english-audio
ModelScope community (魔搭社区). Updated 2025-10-03. Indexed 2025-08-16.
Download link:
https://modelscope.cn/datasets/Kratos-AI/medical-symptoms-english-audio
Description:
# Medical Symptoms English Audio Dataset
*This dataset contains intentionally low-quality (“B-grade”) data. It has been curated to include noisy, imperfect, or otherwise suboptimal samples, so that models can be tested for robustness and performance under degraded input conditions.*
**Text spoken by all participants:**
"Doctor, I'm constantly tired, like a heavy fog I can't shake. Sharp headaches hit, worse at night, and sleep is tough. I get dizzy, and my stomach feels uneasy after meals. I'm really worried it’s serious. Please help me figure out what's wrong."
The dataset supports training and evaluation of models in:
- Automatic Speech Recognition (ASR)
- Emotional tone classification
- Voice synthesis and generation
- Emotion-aware conversational agents
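
Because every participant reads the same passage, an ASR transcript of any clip can be scored against one fixed reference. The sketch below computes word error rate (WER) with a plain word-level edit distance. The normalization rules (lowercasing, stripping punctuation, unifying apostrophes) and the function names are assumptions made for illustration; the dataset card does not prescribe a scoring method, and a dedicated library may be a better choice for real benchmarks.

```python
import re

# The passage spoken by all participants, as given in the dataset card.
REFERENCE = (
    "Doctor, I'm constantly tired, like a heavy fog I can't shake. "
    "Sharp headaches hit, worse at night, and sleep is tough. "
    "I get dizzy, and my stomach feels uneasy after meals. "
    "I'm really worried it's serious. Please help me figure out what's wrong."
)


def normalize(text):
    """Lowercase, unify curly apostrophes, drop punctuation, split into words.

    These normalization choices are an assumption, not part of the dataset.
    """
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z0-9']+", text)


def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical ASR output with one misrecognized word ("dizzy" -> "busy").
    hypothesis = REFERENCE.replace("dizzy", "busy")
    print(f"WER: {word_error_rate(REFERENCE, hypothesis):.3f}")
```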
---
## Intended Uses
### ✅ Direct Use
- Training and benchmarking ASR models with Indian-accented English
- Emotion detection and classification from voice
- Research in affective computing and empathetic AI
### ❌ Out-of-Scope Use
- Real-time or production-grade systems
- Commercial use without proper CC BY 4.0 attribution
- Clinical or diagnostic use cases
---
## Considerations and Limitations
- ❗ The dataset is small (<1,000 samples) and not fully representative of India's linguistic and emotional diversity
- 💡 Emotions are subjective — classification results may vary by listener or model
- 🔄 Future versions will aim to expand multilingual support and speaker diversity
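
One way to quantify how much emotion labels vary between two listeners, or between a listener and a model, is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is a minimal illustration. The label names are hypothetical, since the dataset card does not list an emotion label set, and kappa is only one of several possible agreement measures.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same clips."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of clips on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each rater labeled independently at their own rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used the same single label for every clip
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels from two listeners for four clips.
    listener_1 = ["worried", "worried", "calm", "calm"]
    listener_2 = ["worried", "calm", "calm", "calm"]
    print(cohens_kappa(listener_1, listener_2))
```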
---
## License
**CC BY 4.0** — You can use, modify, and share the dataset with appropriate credit.
---
## Contact
- For queries or collaborations related to the dataset, contact:
- anoushka@kgen.io
- abhishek.vadapalli@kgen.io
---
Provider: maas
Created: 2025-08-01
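Background overview
This dataset contains low-quality English audio in which every participant reads the same medical-symptom text. It is intended for training and evaluating models for ASR, emotion classification, and similar tasks. It is small and suited to research, not to production or clinical use, and it is released under the CC BY 4.0 license.
The summary above was collected and generated by 遇见数据集.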



