Malaysian-TTS-v2
Source: ModelScope community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/mesolitica/Malaysian-TTS-v2
Description:
# Malaysian TTS v2
A TTS dataset of generated Malay and localized English speech. It currently supports only 2 speakers, `husein` and `idayu`, with 4642.77 hours of audio in total.
## How to prepare the dataset
```bash
# Main audio archives from the dataset repo.
huggingface-cli download \
  mesolitica/Malaysian-TTS-v2 \
  --include "all-*.zip" \
  --repo-type "dataset" \
  --local-dir './'

# husein archives from STT-Normalizer, skipping anything matching *force*.
huggingface-cli download \
  mesolitica/STT-Normalizer \
  --include "*husein*.zip" \
  --exclude "*force*" \
  --repo-type "dataset" \
  --local-dir './'

# idayu archives from STT-Normalizer, skipping anything matching *force*.
huggingface-cli download \
  mesolitica/STT-Normalizer \
  --include "*idayu*.zip" \
  --exclude "*force*" \
  --repo-type "dataset" \
  --local-dir './'

# Fetch the unzip helper script and run it.
wget https://gist.githubusercontent.com/huseinzol05/2e26de4f3b29d99e993b349864ab6c10/raw/9b2251f3ff958770215d70c8d82d311f82791b78/unzip.py
python3 unzip.py
```
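
The `--include` and `--exclude` flags above select files by glob pattern. As a rough illustration of how such a pair of patterns combines, here is a minimal sketch that assumes shell-style matching as implemented by Python's `fnmatch`; the helper `select_files` and the file names are hypothetical and are not taken from the actual repository listing.

```python
from fnmatch import fnmatch


def select_files(names, include, exclude=()):
    """Keep names that match any include pattern and no exclude pattern."""
    return [
        name
        for name in names
        if any(fnmatch(name, pattern) for pattern in include)
        and not any(fnmatch(name, pattern) for pattern in exclude)
    ]


# Hypothetical file names, for illustration only (not the real repo contents).
names = [
    "husein-part-0.zip",
    "husein-force-aligned.zip",
    "idayu-part-0.zip",
    "README.md",
]

# Mirrors: --include "*husein*.zip" --exclude "*force*"
print(select_files(names, include=["*husein*.zip"], exclude=["*force*"]))
# prints ['husein-part-0.zip']
```

Under this assumption, a file has to match an include pattern and avoid every exclude pattern, which is why the `*force*` archive is dropped even though it also matches `*husein*.zip`.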
### Chunk-based (optional)
```bash
# Chunked, deduplicated audio archives from the dataset repo.
huggingface-cli download \
  mesolitica/Malaysian-TTS-v2 \
  --include "tts-filtered-chunk-rows-audio-dedup-*.zip" \
  --repo-type "dataset" \
  --local-dir './'

# Chunk metadata file.
wget https://huggingface.co/datasets/mesolitica/Malaysian-TTS-v2/resolve/main/chunk.parquet

# Fetch the unzip helper script and run it.
wget https://gist.githubusercontent.com/huseinzol05/2e26de4f3b29d99e993b349864ab6c10/raw/9b2251f3ff958770215d70c8d82d311f82791b78/unzip.py
python3 unzip.py
```
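
Both recipes finish by running the downloaded `unzip.py`, whose contents are not reproduced here. If fetching a script from a gist is not an option, the following stand-in sketch extracts every `*.zip` in a directory using only the Python standard library. It is an assumption about what the extraction step needs, not a copy of the author's script, and the names `extract_archive` and `extract_all` are hypothetical.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_archive(zip_path: Path, dest: Path) -> int:
    """Extract one archive into dest and return its number of members."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return len(archive.namelist())


def extract_all(src_dir: str = ".", dest_dir: str = ".", workers: int = 4) -> dict:
    """Extract every *.zip in src_dir; return {archive name: member count}."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archives = sorted(src.glob("*.zip"))
    # Threads let several archives be read and written concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(lambda path: extract_archive(path, dest), archives))
    return {path.name: count for path, count in zip(archives, counts)}


if __name__ == "__main__":
    for name, count in extract_all().items():
        print(f"{name}: {count} files")
```

Run it from the directory passed as `--local-dir`. The destination layout produced by the original script may differ, so check the extracted paths before pointing training code at them.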
## Acknowledgement
Special thanks to https://www.sns.com.my and Nvidia for the 8x H100 node!
Provider:
maas
Created:
2025-10-03



