
Kratos-AI/Natural-ASR-Samples

Source: Hugging Face. Updated 2025-12-03. Indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/Kratos-AI/Natural-ASR-Samples
Resource description:
---
license: cc-by-4.0
pretty_name: Natural ASR Speech Dataset
tags:
- speech
- asr
- automatic-speech-recognition
- hinglish
- tamil
- bengali
- conversational-ai
- multilingual
- audio-dataset
task_categories:
- automatic-speech-recognition
size_categories:
- n<1K
---

# Natural ASR Speech Samples

This set contains natural two-person conversations recorded in Hinglish, Tamil, and Bengali, paired with high-quality human-generated transcriptions. The recordings capture spontaneous, real-life dialogue, including pauses, fillers, overlaps, and informal phrasing, which makes the set well suited to building robust Automatic Speech Recognition (ASR) systems.

We can also support all Indic languages and accents across India, as well as Arabic, Korean, Vietnamese, and Portuguese.

---

## Dataset Features

- Natural, unscripted two-speaker conversations
- Hinglish, Tamil, and Bengali multilingual coverage
- Varied speaking speeds, tones, and regional accents
- Clean and accurate human transcriptions
- Includes pauses, interruptions, and conversational flow
- Suitable for research and commercial ASR development with attribution

---

## Intended Uses

### ✅ Direct Use

- Training multilingual and code-mixed ASR models
- Benchmarking conversational ASR performance
- Hinglish language modeling
- Accent-robust ASR system development
- Dialogue understanding and speech-to-text tasks
- Evaluation of spontaneous-speech ASR accuracy

### ❌ Out-of-Scope Use

- Speaker or biometric identification
- Psychological, emotion, or behavior profiling
- Medical or clinical speech analysis
- Commercial deployment without CC BY 4.0 credit
- Real-time mission-critical ASR applications

---

## Considerations and Limitations

- ❗ Dataset size is limited (<1,000 samples) and may not include all dialects
- 🎧 Contains fillers, hesitations, and overlapping conversation (true natural speech)
- 🗣️ Accent diversity exists but is not fully representative of all regions
- 🔄 Future versions will add more speakers, languages, and recording environments

---

## License

**CC BY 4.0**: free to use, modify, distribute, and publish with attribution.

---

## Contact

For dataset collaboration, contribution, or citation details, contact:

- arunabh@kgen.io
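
For the benchmarking and evaluation uses listed under Intended Uses, a common way to score an ASR system against human transcriptions is word error rate (WER). The dataset card does not name a metric, so WER is an assumption here, and the function name `word_error_rate` and the sample sentences are illustrative only (they are not taken from the dataset). A minimal sketch in plain Python:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Made-up Hinglish example: the hypothesis drops one of five reference words.
reference = "haan main kal office jaunga"
hypothesis = "haan main office jaunga"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Because the recordings include fillers, hesitations, and overlapping speech, the score depends heavily on how transcripts are normalized (casing, punctuation, filler handling) before comparison. This sketch leaves normalization out and compares whitespace-separated tokens exactly as given.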
Provider:
Kratos-AI