Kathbath
Source: arXiv · Updated: 2022-12-15 · Indexed: 2024-06-21
Download link:
https://github.com/AI4Bharat/indicSUPERB
Resource overview:
The Kathbath dataset was created jointly by the Indian Institute of Technology Madras and AI4Bharat. It contains 1,684 hours of labelled speech data covering 12 Indian languages, collected from 1,218 contributors located in 203 districts across India. A purpose-built collection and validation pipeline was used to ensure data quality. The dataset is intended to support research on speech understanding tasks such as automatic speech recognition, speaker verification, and language identification, with the aim of advancing speech technology for Indian languages.
Provider:
Indian Institute of Technology Madras
Created:
2022-08-25



