Datatang: 759 Hours of Hindi Speech Data Collected by Mobile Phone
Source: ModelScope community (魔搭社区). Updated 2025-07-31; indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/DatatangBeijing/759Hours-HindiSpeechDataByMobilePhone
Resource description:
This 759-hour Hindi speech dataset, collected on mobile phones, was recorded by 1,425 native speakers from India with authentic accents. The recording scripts were designed with input from language experts and cover several categories, including general, interactive, in-vehicle, and smart-home scenarios. The transcripts were manually proofread for high accuracy. The dataset is compatible with mainstream Android and Apple iOS phones, and can be applied to speech recognition, machine translation, and voiceprint recognition.
Provider:
maas
Created:
2024-05-06
Collected summary
Dataset introduction

Background and challenges
Background overview
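The dataset contains 759 hours of Hindi speech collected on mobile phones and recorded by 1,425 native speakers from India. It covers general, interactive, in-vehicle, and home scenarios, and is suited to tasks such as speech recognition, machine translation, and voiceprint recognition. The audio is stored as 16 kHz, 16-bit, mono WAV and was recorded in quiet indoor environments.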
The content above was collected and summarized by 遇见数据集.
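The stated audio format (16 kHz, 16-bit, mono WAV) can be checked on a downloaded file before training. The following is a minimal sketch using only Python's standard `wave` module. The function names `matches_stated_format`, `estimated_pcm_bytes`, and `make_silent_wav` are hypothetical helpers introduced here for illustration and are not part of the dataset or of any ModelScope tooling. The size estimate assumes uncompressed PCM and ignores WAV headers and transcript files.

```python
import io
import wave

# Format stated in the dataset summary: 16 kHz, 16-bit, mono WAV.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1            # mono


def matches_stated_format(wav_file) -> bool:
    """Return True if a WAV file (path string or binary file object) matches the stated format."""
    with wave.open(wav_file, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def estimated_pcm_bytes(hours: float) -> int:
    """Estimate raw PCM size for the given duration, excluding WAV headers."""
    seconds = hours * 3600
    bytes_per_second = (
        EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS
    )
    return int(seconds * bytes_per_second)


def make_silent_wav(rate_hz: int = EXPECTED_RATE_HZ, seconds: int = 1) -> io.BytesIO:
    """Build an in-memory silent mono 16-bit WAV, used only to exercise the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(EXPECTED_CHANNELS)
        wav.setsampwidth(EXPECTED_SAMPLE_WIDTH_BYTES)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * rate_hz * seconds)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(matches_stated_format(make_silent_wav()))        # synthetic 16 kHz file
    print(matches_stated_format(make_silent_wav(8_000)))   # synthetic 8 kHz file
    print(estimated_pcm_bytes(759))                         # raw PCM bytes for 759 hours
```

At 32,000 bytes per second, 759 hours of audio in this format comes to about 87.4 GB of raw PCM, which gives a rough lower bound for storage planning. To check a real file, pass its path to `matches_stated_format` in place of the synthetic buffer.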



