Datatang: 639 Hours of Indonesian Speech Data Collected by Mobile Phone
ModelScope community, updated 2025-11-26, indexed 2024-05-15
Download link:
https://modelscope.cn/datasets/DatatangBeijing/639Hours-IndonesianSpeechDataByMobilePhone
Resource overview:
This dataset contains 639 hours of Indonesian speech collected on mobile phones and recorded by 1,285 local Indonesian speakers with authentic accents. The recording texts were designed with the participation of language experts and cover multiple categories, including general, interactive, in-vehicle, and home scenarios. All texts were manually proofread for high accuracy. The data matches mainstream Android and Apple phones. The dataset can be applied to speech recognition, machine translation, and similar scenarios.
Provider:
maas
Created:
2024-05-06
Collected summary
Dataset introduction

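Background and challenges
Background overview
The dataset contains 639 hours of Indonesian speech collected on mobile phones, recorded by 1,285 native speakers and covering general, interactive, in-vehicle, and home scenarios. The audio is 16 kHz, 16-bit, mono WAV. It is mainly used for testing Indonesian speech recognition models, and its content has been manually proofread to ensure high accuracy.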
The content above was collected and summarized by 遇见数据集.
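The background overview states an audio format of 16 kHz, 16-bit, mono WAV. The sketch below shows one way a downloaded file could be checked against that format using only Python's standard `wave` module. The function names and the example file name are hypothetical, and the dataset's actual directory layout is not described on this page, so nothing here assumes one.

```python
import wave

# Expected format, taken from the background overview on this page.
EXPECTED_RATE_HZ = 16_000      # 16 kHz sample rate
EXPECTED_SAMPLE_WIDTH = 2      # 16 bit = 2 bytes per sample
EXPECTED_CHANNELS = 1          # mono


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the stated format.

    An empty list means the file matches 16 kHz, 16-bit, mono.
    `path` must be a string path to an uncompressed PCM WAV file.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimated_pcm_bytes(hours):
    """Estimate raw PCM size for the stated format, ignoring WAV headers."""
    bytes_per_second = EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH * EXPECTED_CHANNELS
    return int(hours * 3600 * bytes_per_second)
```

For example, `check_wav_format("sample.wav")` (a hypothetical file name) returns an empty list when the file matches the stated format. At 32,000 bytes per second of audio, `estimated_pcm_bytes(639)` gives roughly 73.6 GB of raw samples, which is a rough planning figure for storage rather than the published download size.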



