
amanuelbyte/merged_speech_dataset

Source: Hugging Face. Updated 2026-02-15. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/amanuelbyte/merged_speech_dataset
Resource description:
---
dataset_info:
- config_name: amharic
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 5984412487
    num_examples: 10839
  download_size: 5262240245
  dataset_size: 5984412487
- config_name: arabic
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 922902417
    num_examples: 1159
  download_size: 920293901
  dataset_size: 922902417
- config_name: default
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: language
    dtype: string
  - name: source
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 13098597804
    num_examples: 46583
  download_size: 12034412106
  dataset_size: 13098597804
- config_name: hausa
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 602620189
    num_examples: 1572
  download_size: 602412677
  dataset_size: 602620189
- config_name: lingala
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 4143017698
    num_examples: 14400
  download_size: 3805028034
  dataset_size: 4143017698
- config_name: somali
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 724935288
    num_examples: 17164
  download_size: 723399205
  dataset_size: 724935288
- config_name: swahili
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 8763983905
    num_examples: 412435
  download_size: 5012049012
  dataset_size: 8763983905
- config_name: wolof
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 4275421272
    num_examples: 36009
  download_size: 4100478830
  dataset_size: 4275421272
- config_name: yoruba
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 25749381711
    num_examples: 1648471
  download_size: 22184813205
  dataset_size: 25749381711
configs:
- config_name: amharic
  data_files:
  - split: train
    path: amharic/train-*
- config_name: arabic
  data_files:
  - split: train
    path: arabic/train-*
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: hausa
  data_files:
  - split: train
    path: hausa/train-*
- config_name: lingala
  data_files:
  - split: train
    path: lingala/train-*
- config_name: somali
  data_files:
  - split: train
    path: somali/train-*
- config_name: swahili
  data_files:
  - split: train
    path: swahili/train-*
- config_name: wolof
  data_files:
  - split: train
    path: wolof/train-*
- config_name: yoruba
  data_files:
  - split: train
    path: yoruba/train-*
---
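The metadata above lists nine configs, and only some of them carry a `text` column. The following is a minimal Python sketch of how one might pick a config and load it with the Hugging Face `datasets` library. The column lists are copied from the metadata; the helper names `configs_with_column` and `load_config` are hypothetical and not part of the dataset or the library. Streaming is used on the assumption that a full download is undesirable (the largest config has a download size of about 22 GB), and decoding the `audio` column may require extra audio dependencies that are not shown here.

```python
# Columns per config, copied from the dataset card metadata.
CONFIG_COLUMNS = {
    "amharic": ["audio", "source"],
    "arabic": ["audio", "source"],
    "default": ["audio", "language", "source", "text"],
    "hausa": ["audio", "text", "source"],
    "lingala": ["audio", "source"],
    "somali": ["audio", "source"],
    "swahili": ["audio", "text", "source"],
    "wolof": ["audio", "text", "source"],
    "yoruba": ["audio", "text", "source"],
}

REPO_ID = "amanuelbyte/merged_speech_dataset"


def configs_with_column(column):
    """Return the sorted config names whose features include `column`."""
    return sorted(name for name, cols in CONFIG_COLUMNS.items() if column in cols)


def load_config(name, streaming=True):
    """Load the train split of one config (hypothetical helper).

    Assumes the `datasets` package is installed and the Hub is reachable.
    The import is deferred so the rest of this sketch runs without it.
    """
    if name not in CONFIG_COLUMNS:
        raise ValueError(f"unknown config: {name!r}")
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split="train", streaming=streaming)


if __name__ == "__main__":
    # Configs usable for supervised ASR, i.e. those that include transcripts.
    print(configs_with_column("text"))
```

Calling `load_config("hausa")` would then return an iterable dataset whose examples expose the `audio`, `text`, and `source` fields listed for that config.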
Provider:
amanuelbyte