mesolitica/Malaysian-Emilia-annotated
Source: Hugging Face. Updated: 2025-03-29. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/mesolitica/Malaysian-Emilia-annotated
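Resource overview:
Malaysian Emilia Annotated is a speech dataset built from Malaysian and Singaporean YouTube videos, podcasts, and parliamentary recordings. The data has been processed with gender prediction, language prediction, and sampling-rate conversion, and synthetic descriptions have been generated for it. The languages are mainly Malay and English. Contents:
1. Malaysian YouTube videos, 3168.8 hours in total, processed at 24k and 44k sampling rates, with gender and language predictions.
2. Malaysian podcasts, 622.8 hours in total, likewise processed at 24k and 44k sampling rates.
3. Singaporean podcasts, 175.9 hours in total, processed at 24k and 44k sampling rates.
4. Malaysian parliamentary recordings, 2317.9 hours in total, processed at 24k and 44k sampling rates.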
The Malaysian Emilia Annotated dataset was created by annotating the Malaysian-Emilia dataset with the Data-Speech pipeline. It includes audio from Malaysian YouTube, Malaysian Podcast, Singaporean Podcast, and Malaysia Parliament, with total durations of 3168.8 hours, 622.8 hours, 175.9 hours, and 2317.9 hours respectively. The data undergoes gender prediction and language prediction and is processed at 24k and 44k sampling rates. Each audio file is accompanied by a detailed synthetic description covering transcription, gender, country, pitch, speaking rate, noise level, and other relevant information.
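The per-source durations quoted above can be tallied to see the overall size and how it is split. The following is a minimal sketch that uses only the four figures from the description; the dictionary keys and the `summarize` helper are illustrative names, not part of the dataset or its tooling.

```python
# Subset durations in hours, copied from the description above.
# The keys are labels for this example, not confirmed subset names.
HOURS = {
    "Malaysian YouTube": 3168.8,
    "Malaysian Podcast": 622.8,
    "Singaporean Podcast": 175.9,
    "Malaysia Parliament": 2317.9,
}


def summarize(hours):
    """Return (total_hours, {subset: share_of_total})."""
    total = sum(hours.values())
    shares = {name: h / total for name, h in hours.items()}
    return total, shares


total, shares = summarize(HOURS)
print(f"total: {total:.1f} h")
# List subsets from largest to smallest share of the total duration.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%}")
```

By these figures the four sources add up to about 6285.4 hours, with the YouTube portion accounting for roughly half of the total.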
Provided by:
mesolitica

The above content was collected and summarized by 遇见数据集.



