Scicom-intl/Malaysian-Emilia
Source: Hugging Face. Updated 2026-02-13. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Scicom-intl/Malaysian-Emilia
Description:
---
dataset_info:
- config_name: audio_length_ratio_text
  features:
  - name: audio_filename
    dtype: string
  - name: audio_filename_trim
    dtype: string
  - name: audio_length
    dtype: float64
  - name: text
    dtype: string
  - name: audio_length_ratio_text
    dtype: float64
  - name: audio_length_ratio_text_accept
    dtype: bool
  splits:
  - name: train
    num_bytes: 993579111
    num_examples: 1572014
  download_size: 194482310
  dataset_size: 993579111
- config_name: default
  features:
  - name: reference_audio
    dtype: string
  - name: reference_text
    dtype: string
  - name: target_audio
    dtype: string
  - name: target_text
    dtype: string
  splits:
  - name: train
    num_bytes: 7285025442
    num_examples: 8664602
  download_size: 817749796
  dataset_size: 7285025442
- config_name: dialects_v1
  features:
  - name: text
    dtype: string
  - name: start
    dtype: float64
  - name: end
    dtype: float64
  - name: speaker
    dtype: string
  - name: language
    dtype: string
  - name: dnsmos
    dtype: float64
  - name: audio_filename
    dtype: string
  - name: folder
    dtype: string
  splits:
  - name: train
    num_bytes: 1421096802
    num_examples: 2946861
  download_size: 390102357
  dataset_size: 1421096802
- config_name: dialects_v1_permutation_sample
  features:
  - name: reference_audio
    dtype: string
  - name: reference_text
    dtype: string
  - name: target_audio
    dtype: string
  - name: target_text
    dtype: string
  splits:
  - name: train
    num_bytes: 3554683417
    num_examples: 4963410
  download_size: 664980435
  dataset_size: 3554683417
- config_name: klasik
  features:
  - name: text
    dtype: string
  - name: start
    dtype: float64
  - name: end
    dtype: float64
  - name: speaker
    dtype: string
  - name: language
    dtype: string
  - name: dnsmos
    dtype: float64
  - name: audio_filename
    dtype: string
  - name: folder
    dtype: string
  splits:
  - name: train
    num_bytes: 4320146
    num_examples: 10369
  download_size: 1283151
  dataset_size: 4320146
- config_name: dialects_v1_audio_length_ratio_text
  features:
  - name: audio_filename
    dtype: string
  - name: audio_filename_trim
    dtype: string
  - name: audio_length
    dtype: float64
  - name: text
    dtype: string
  - name: audio_length_ratio_text
    dtype: float64
  - name: audio_length_ratio_text_accept
    dtype: bool
  splits:
  - name: train
    num_bytes: 1603936949
    num_examples: 2946861
  download_size: 362944016
  dataset_size: 1603936949
configs:
- config_name: audio_length_ratio_text
  data_files:
  - split: train
    path: audio_length_ratio_text/train-*
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: dialects_v1
  data_files:
  - split: train
    path: dialects_v1/train-*
- config_name: dialects_v1_permutation_sample
  data_files:
  - split: train
    path: dialects_v1_permutation_sample/train-*
- config_name: klasik
  data_files:
  - split: train
    path: klasik/train-*
- config_name: dialects_v1_audio_length_ratio_text
  data_files:
  - split: train
    path: dialects_v1_audio_length_ratio_text/train-*
---
# Malaysian Emilia
This dataset gathers Malaysian Emilia data from:
1. https://huggingface.co/datasets/mesolitica/Malaysian-Emilia-v2
2. https://huggingface.co/datasets/Scicom-intl/Malaysian-Chinese-Emilia
3. https://huggingface.co/datasets/mesolitica/Malaysian-Emilia#malaysian-dialect
The following processing is then applied:
1. Trim silence.
2. Generate permutations for voice conversion, with post-filtering applied during permutation.
3. Convert the audio to Neucodec speech tokens.
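
The permutation step is not documented in detail here, so the following is only a minimal sketch of how reference/target pairs for voice conversion might be built from rows shaped like the `dialects_v1` config. The function name `build_vc_pairs`, the grouping by `(folder, speaker)`, the `dnsmos` threshold used as the post-filter, and the per-group cap are all assumptions made for illustration, not the authors' actual procedure. The output keys mirror the `default` config (`reference_audio`, `reference_text`, `target_audio`, `target_text`).

```python
from collections import defaultdict
from itertools import permutations


def build_vc_pairs(rows, min_dnsmos=3.0, max_pairs_per_group=None):
    """Build ordered (reference, target) pairs from utterances of one speaker.

    Assumptions (not confirmed by the dataset card):
    - speaker labels are only meaningful within a folder, so utterances
      are grouped by (folder, speaker);
    - post-filtering drops pairs where either side has dnsmos < min_dnsmos.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["folder"], row["speaker"])].append(row)

    pairs = []
    for utterances in groups.values():
        kept = 0
        # permutations(..., 2) yields every ordered pair of distinct utterances.
        for ref, tgt in permutations(utterances, 2):
            # Hypothetical post-filter applied while permuting.
            if ref["dnsmos"] < min_dnsmos or tgt["dnsmos"] < min_dnsmos:
                continue
            pairs.append(
                {
                    "reference_audio": ref["audio_filename"],
                    "reference_text": ref["text"],
                    "target_audio": tgt["audio_filename"],
                    "target_text": tgt["text"],
                }
            )
            kept += 1
            # Optional cap so large groups do not dominate the output.
            if max_pairs_per_group is not None and kept >= max_pairs_per_group:
                break
    return pairs
```

To try it on real rows, a config can be loaded with the `datasets` library, for example `load_dataset("Scicom-intl/Malaysian-Emilia", "dialects_v1", split="train")`, and the resulting rows passed to the function. The number of ordered pairs grows quadratically with group size, which is why the sketch includes a cap.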
Provider:
Scicom-intl



