
Swecha Telugu ASR Dataset

Source: India Data. Updated 2026-02-20; indexed 2026-05-16.
Download link:
https://india-data.org/dataset-details/6fc7cd5a-7e0e-4235-9e55-3c33739677db
Description:
This dataset was collected by Swecha through several community-driven initiatives, with the goal of building a comprehensive Telugu ASR corpus:
- Gonthuka Activity: a participatory voice-data collection initiative in which volunteers read curated Telugu sentences, giving broad pronunciation coverage.
- Telugu Corpus Collection: crowdsourced voice recordings from various dialects and regions, digitally preserving linguistic diversity and cultural heritage for future generations.
- Storytelling & Conversational Data: natural speech samples from folktales, real-life conversations, and storytelling sessions, intended to improve recognition of spontaneous speech.
- Public Voice Submissions: contributions from volunteers across Telangana and Andhra Pradesh, covering diverse speech variations, accents, and real-world noise conditions.
The dataset is intended to support the development of robust Telugu ASR models, to improve language accessibility, and to advance open-source AI initiatives.
Provider:
Natural Language Processing (NLP)
Created:
2025-02-14