वाक् सञ्चयः (/Vāksañcayaḥ/)
Source: arXiv, 2021-07-23 · Updated: 2024-07-18
Download links:
https://www.cse.iitb.ac.in/~asr/
https://github.com/cyfer0618/Vaksanca
Description:
The वाक् सञ्चयः (/Vāksañcayaḥ/) dataset was created by the Indian Institute of Technology Bombay and partner institutions to address the challenges of Sanskrit automatic speech recognition (ASR). It contains over 78 hours of speech in about 46,000 utterances. The material spans three periods, from 1500 BCE to the present day, and covers several domains, including philosophy and literature. The recordings come from 27 speakers with different native languages, which gives the corpus linguistic diversity and authenticity. The dataset was built with particular attention to phonetic and grammatical features specific to Sanskrit, such as the application of Sandhi rules. It is intended for Sanskrit ASR research, and it can also serve as a reference and a starting point for ASR systems in other Indian languages.
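Contributing institutions:
Indian Institute of Technology Bombay, India; University of Cambridge, UK; Indian Institute of Technology Kharagpur, India
Created:
2021-06-03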



