Malaysian-TTS-v2

ModelScope community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/mesolitica/Malaysian-TTS-v2
Overview:
# Malaysian TTS v2

A generated TTS dataset of Malay and localized English speech. It currently supports only 2 speakers, `husein` and `idayu`, with 4642.77 hours of audio in total.

## How to prepare the dataset

```bash
# Main audio archives
huggingface-cli download \
mesolitica/Malaysian-TTS-v2 \
--include "all-*.zip" \
--repo-type "dataset" \
--local-dir './'

# Archives for speaker husein, skipping the "force" files
huggingface-cli download \
mesolitica/STT-Normalizer \
--include "*husein*.zip" \
--exclude "*force*" \
--repo-type "dataset" \
--local-dir './'

# Archives for speaker idayu, skipping the "force" files
huggingface-cli download \
mesolitica/STT-Normalizer \
--include "*idayu*.zip" \
--exclude "*force*" \
--repo-type "dataset" \
--local-dir './'

# Fetch the extraction script and run it
wget https://gist.githubusercontent.com/huseinzol05/2e26de4f3b29d99e993b349864ab6c10/raw/9b2251f3ff958770215d70c8d82d311f82791b78/unzip.py
python3 unzip.py
```

### Chunk based, optional

```bash
# Chunked, filtered, deduplicated audio archives
huggingface-cli download \
mesolitica/Malaysian-TTS-v2 \
--include "tts-filtered-chunk-rows-audio-dedup-*.zip" \
--repo-type "dataset" \
--local-dir './'

# Chunk metadata
wget https://huggingface.co/datasets/mesolitica/Malaysian-TTS-v2/resolve/main/chunk.parquet

# Fetch the extraction script and run it
wget https://gist.githubusercontent.com/huseinzol05/2e26de4f3b29d99e993b349864ab6c10/raw/9b2251f3ff958770215d70c8d82d311f82791b78/unzip.py
python3 unzip.py
```

## Acknowledgement

Special thanks to https://www.sns.com.my and Nvidia for the 8x H100 node!
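The preparation commands end by running `unzip.py` from a gist whose contents are not reproduced on this page. As a rough sketch only, assuming the script does little more than extract every downloaded `.zip` archive in the working directory, the step could look like the following. The function name `extract_archives` and its parameters are hypothetical and are not taken from the gist.

```python
import zipfile
from pathlib import Path


def extract_archives(src_dir=".", pattern="*.zip", dest_dir=None):
    """Extract every archive in src_dir matching pattern.

    Hypothetical stand-in for the gist's unzip.py: the real script may
    differ (for example, it may extract in parallel or delete archives).
    Returns the names of the archives that were extracted, in sorted order.
    """
    src = Path(src_dir)
    # By default, extract next to the archives, as `--local-dir './'` suggests.
    dest = Path(dest_dir) if dest_dir is not None else src
    dest.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(src.glob(pattern)):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        extracted.append(archive.name)
    return extracted
```

Calling `extract_archives(".")` after the `huggingface-cli download` commands would unpack the archives in place. Passing `pattern="all-*.zip"` would restrict extraction to the main audio archives.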

Provider:
maas
Created:
2025-10-03