mesolitica/semisupervised-audiobook
Source: Hugging Face. Updated 2024-01-01. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/mesolitica/semisupervised-audiobook
Resource description:
---
language:
- ms
task_categories:
- automatic-speech-recognition
- text-to-speech
---
# Pseudolabel Youtube Malay audiobooks using Whisper Large V3
Notebooks at https://github.com/mesolitica/malaysian-dataset/tree/master/speech-to-text-semisupervised/youtube-audiobook
1. Split the audio into 10-second utterances using WebRTC VAD.
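To illustrate the splitting step, here is a minimal sketch of how per-frame voice-activity decisions could be grouped into utterances capped at 10 seconds. It is not taken from the linked notebooks. The function name `group_speech_frames`, the 30 ms frame size, and the rule of closing a segment at the first non-speech frame are all assumptions made for illustration. The boolean flags are assumed to come from a VAD such as WebRTC VAD, and a real pipeline would likely smooth short pauses instead of cutting at every one.

```python
from typing import List, Optional, Tuple


def group_speech_frames(
    is_speech: List[bool],
    frame_ms: int = 30,            # assumed frame length, not from the source
    max_segment_ms: int = 10_000,  # the 10-second cap mentioned above
) -> List[Tuple[int, int]]:
    """Group per-frame speech flags into (start_ms, end_ms) segments.

    A segment is closed at the first non-speech frame, or as soon as it
    holds the largest whole number of frames that fits in max_segment_ms.
    """
    max_frames = max(1, max_segment_ms // frame_ms)
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index of the first frame of the open segment

    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i
            # Close the segment once it reaches the length cap.
            if i + 1 - start == max_frames:
                segments.append((start * frame_ms, (i + 1) * frame_ms))
                start = None
        elif start is not None:
            # A non-speech frame ends the open segment.
            segments.append((start * frame_ms, i * frame_ms))
            start = None

    # Flush a segment that is still open when the audio ends.
    if start is not None:
        segments.append((start * frame_ms, len(is_speech) * frame_ms))
    return segments
```

The returned millisecond offsets can then be used to slice the source audio into one file per utterance.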
## how-to
Download the files:
```bash
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/bukan-kerana-aku-5secs-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/bukan-kerana-aku-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/harry-potter-5secs-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/harry-potter-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/teme-5secs-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/teme-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/semisupervised-audiobook-part1.json
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/semisupervised-audiobook-part2.json
```
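The same downloads can be scripted, with the archives unpacked afterwards. The sketch below uses only the Python standard library and the file names listed above. The helper names `file_urls` and `download_and_extract` and the default destination directory are hypothetical. Because the layout inside the archives and JSON files is not described here, the sketch stops at extraction.

```python
import tarfile
import urllib.request
from pathlib import Path
from typing import List

BASE_URL = (
    "https://huggingface.co/datasets/mesolitica/"
    "semisupervised-audiobook/resolve/main/"
)

# File names exactly as listed in the download commands above.
ARCHIVES = [
    "bukan-kerana-aku-5secs-noisy.tar.gz",
    "bukan-kerana-aku-noisy.tar.gz",
    "harry-potter-5secs-noisy.tar.gz",
    "harry-potter-noisy.tar.gz",
    "teme-5secs-noisy.tar.gz",
    "teme-noisy.tar.gz",
]
JSON_FILES = [
    "semisupervised-audiobook-part1.json",
    "semisupervised-audiobook-part2.json",
]


def file_urls() -> List[str]:
    """Return the full download URL for every dataset file."""
    return [BASE_URL + name for name in ARCHIVES + JSON_FILES]


def download_and_extract(dest: str = "semisupervised-audiobook") -> None:
    """Download each file into dest and unpack the .tar.gz archives there."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for url in file_urls():
        target = dest_dir / url.rsplit("/", 1)[-1]
        if not target.exists():  # skip files that were already fetched
            urllib.request.urlretrieve(url, str(target))
        if target.name.endswith(".tar.gz"):
            with tarfile.open(target, "r:gz") as archive:
                archive.extractall(dest_dir)
```

Nothing is fetched when the code is loaded. Calling `download_and_extract()` starts the downloads, which may be large.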
Provided by:
mesolitica
Summary of original information
Dataset overview
Dataset name
Pseudolabel Youtube Malay audiobooks using Whisper Large V3
Language
- Malay (ms)
Task categories
- Automatic speech recognition
- Text-to-speech
Dataset contents
- Contains several audio files, named:
  - bukan-kerana-aku-5secs-noisy
  - bukan-kerana-aku-noisy
  - harry-potter-5secs-noisy
  - harry-potter-noisy
  - teme-5secs-noisy
  - teme-noisy
- Contains two JSON files, named:
  - semisupervised-audiobook-part1.json
  - semisupervised-audiobook-part2.json
Dataset download
- The dataset files can be downloaded from the following links:
  - https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/bukan-kerana-aku-5secs-noisy.tar.gz
  - https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/bukan-kerana-aku-noisy.tar.gz
  - https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/harry-potter-5secs-noisy.tar.gz
  - https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/harry-potter-noisy.tar.gz
  - https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/teme-5secs-noisy.tar.gz
  - https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/teme-noisy.tar.gz
  - https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/semisupervised-audiobook-part1.json
  - https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/semisupervised-audiobook-part2.json



