Swecha Telugu ASR Dataset
Source: India Data | Updated: 2026-02-20 | Indexed: 2026-05-16
Download link:
https://india-data.org/dataset-details/6fc7cd5a-7e0e-4235-9e55-3c33739677db
Description:
This dataset was collected through several community-driven initiatives run by Swecha, with the aim of building a comprehensive Telugu ASR corpus:
- Gonthuka Activity: a participatory voice data collection initiative in which volunteers read curated Telugu sentences, ensuring diverse pronunciation coverage.
- Telugu Corpus Collection: crowdsourced voice recordings from various dialects and regions, digitally preserving linguistic diversity and cultural heritage for future generations.
- Storytelling & Conversational Data: natural speech samples from folktales, real-life conversations, and storytelling sessions, included to improve recognition of spontaneous speech.
- Public Voice Submissions: contributions from volunteers across Telangana and Andhra Pradesh, covering diverse speech variations, accents, and real-world noise conditions.
The dataset supports the development of robust Telugu ASR models, promotes language accessibility, and advances open-source AI initiatives.
Provider:
Natural Language Processing (NLP)
Created:
2025-02-14



