Malay Speech Data by Mobile Phone - 370 Hours
Source: catalogue.elra.info (indexed 2025-03-22)
Download link:
https://catalogue.elra.info/en-us/repository/browse/ELRA-S0438/
Resource description:
675 native speakers from Malaysia took part in the recording, speaking with authentic accents. The recorded script was designed by linguists and covers a wide range of topics, including generic, interactive, on-board and home. The text was manually proofread for high accuracy. The data is compatible with mainstream Android and Apple phones. The data set can be used for automatic speech recognition and machine translation.
Format: 16 kHz, 16-bit, uncompressed WAV, mono channel
Recording environment: quiet indoor environment, low background noise, no echo
Recording content (read speech): oral category; human-machine interaction category; smart home command and in-car command category; numbers; news category
Demographics: 675 speakers in total, 44% male and 56% female; 66% of speakers are aged 18-25, 32% are aged 26-45, and 5% are aged 46-60, with a floating rate of 2%
Device: Android mobile phone, iPhone
Language: Malay
Application scenarios: speech recognition; voiceprint recognition
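The stated audio format (16 kHz, 16-bit, uncompressed WAV, mono) can be checked on a downloaded file before it enters a training pipeline. The following is a minimal sketch using Python's standard `wave` module; the function name `wav_format_problems` and the constant names are hypothetical and not part of the dataset or the ELRA catalogue.

```python
import wave

# Expected values, taken from the format line of the description.
EXPECTED_SAMPLE_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit
EXPECTED_CHANNELS = 1            # mono


def wav_format_problems(path):
    """Return a list of mismatches between a WAV file and the stated format.

    An empty list means the file header matches the description.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
        # The wave module reports "NONE" for uncompressed PCM data.
        if wav.getcomptype() != "NONE":
            problems.append(f"compression type is {wav.getcomptype()}")
    return problems
```

Calling `wav_format_problems("some_recording.wav")` (the file name is a placeholder) returns an empty list when the header agrees with the description. The check reads only the header, so it says nothing about background noise or transcript accuracy.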
Provider:
ELRA Catalogue of Language Resources



