
mesolitica/semisupervised-audiobook

Hugging Face · Updated 2024-01-01 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/mesolitica/semisupervised-audiobook
Resource description:
---
language:
- ms
task_categories:
- automatic-speech-recognition
- text-to-speech
---

# Pseudolabel Youtube Malay audiobooks using Whisper Large V3

Notebooks at https://github.com/mesolitica/malaysian-dataset/tree/master/speech-to-text-semisupervised/youtube-audiobook

1. Split the audio into 10-second utterances using WebRTC VAD.

## how-to

Download the files:

```bash
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/bukan-kerana-aku-5secs-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/bukan-kerana-aku-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/harry-potter-5secs-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/harry-potter-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/teme-5secs-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/teme-noisy.tar.gz
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/semisupervised-audiobook-part1.json
wget https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/semisupervised-audiobook-part2.json
```
Provided by:
mesolitica
Summary of original information

Dataset overview

Dataset name

Pseudolabel Youtube Malay audiobooks using Whisper Large V3

Language

  • Malay (ms)

Task categories

  • Automatic speech recognition
  • Text-to-speech

Dataset contents

  • Audio archives (.tar.gz), named:
    • bukan-kerana-aku-5secs-noisy
    • bukan-kerana-aku-noisy
    • harry-potter-5secs-noisy
    • harry-potter-noisy
    • teme-5secs-noisy
    • teme-noisy
  • Two JSON files, named:
    • semisupervised-audiobook-part1.json
    • semisupervised-audiobook-part2.json
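A minimal sketch for unpacking the audio archives, assuming they are standard gzip-compressed tarballs (as the `.tar.gz` suffix suggests). The helper names `extract_archive` and `list_archive`, and the one-directory-per-archive layout, are hypothetical choices for illustration and are not part of the dataset's own instructions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: unpack one .tar.gz archive into a directory named
# after the archive, e.g. teme-noisy.tar.gz -> <dest_root>/teme-noisy/.
# Prints the directory it extracted into.
extract_archive() {
  local archive="$1"
  local dest_root="${2:-.}"
  local dest
  dest="$dest_root/$(basename "$archive" .tar.gz)"
  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest"
  echo "$dest"
}

# Hypothetical helper: list an archive's members without extracting it,
# useful for checking what an archive holds before unpacking it.
list_archive() {
  tar -tzf "$1"
}
```

For example, `for f in *-noisy.tar.gz; do extract_archive "$f" data; done` would unpack every downloaded archive under `data/`. Keeping each archive in its own directory avoids collisions if two archives happen to contain files with the same name.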

Dataset download

  • The dataset files can be downloaded from the following links:
    • https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/bukan-kerana-aku-5secs-noisy.tar.gz
    • https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/bukan-kerana-aku-noisy.tar.gz
    • https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/harry-potter-5secs-noisy.tar.gz
    • https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/harry-potter-noisy.tar.gz
    • https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/teme-5secs-noisy.tar.gz
    • https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/teme-noisy.tar.gz
    • https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/semisupervised-audiobook-part1.json
    • https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main/semisupervised-audiobook-part2.json
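The eight links above share one base URL, so they can be fetched in a loop rather than with eight separate `wget` calls. This is a minimal sketch: the base URL and file names are copied from the list above, while `BASE_URL`, `FILES`, `dataset_url` and `download_all` are hypothetical names chosen for illustration. Pointing `BASE_URL` at the hf-mirror.com host may also work, assuming the mirror uses the same path layout; the source does not confirm this.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Base URL and file names taken from the dataset's download links.
BASE_URL="https://huggingface.co/datasets/mesolitica/semisupervised-audiobook/resolve/main"
FILES=(
  bukan-kerana-aku-5secs-noisy.tar.gz
  bukan-kerana-aku-noisy.tar.gz
  harry-potter-5secs-noisy.tar.gz
  harry-potter-noisy.tar.gz
  teme-5secs-noisy.tar.gz
  teme-noisy.tar.gz
  semisupervised-audiobook-part1.json
  semisupervised-audiobook-part2.json
)

# Hypothetical helper: build the full download URL for one file name.
dataset_url() {
  printf '%s/%s\n' "$BASE_URL" "$1"
}

# Hypothetical helper: download every file into a target directory.
# wget -c resumes a partially downloaded file instead of starting over,
# and -P sets the directory the file is saved into.
download_all() {
  local dest="${1:-.}"
  local f
  mkdir -p "$dest"
  for f in "${FILES[@]}"; do
    wget -c -P "$dest" "$(dataset_url "$f")"
  done
}
```

Usage: `download_all ./semisupervised-audiobook`. Because `wget -c` skips data already on disk, the loop can simply be re-run after an interrupted download of the larger archives.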