
tomas-gajarsky/speech-commands-lt

Hugging Face | Updated 2026-03-30 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/tomas-gajarsky/speech-commands-lt
Resource description:
---
language:
- en
license: cc-by-4.0
task_categories:
- audio-classification
tags:
- long-tail
- imbalanced
- speech-commands
size_categories:
- 10K<n<100K
configs:
- config_name: r-10
  data_files:
  - split: train
    path: r-10/train-*
  - split: validation
    path: r-10/validation-*
  - split: test
    path: r-10/test-*
- config_name: r-13
  data_files:
  - split: train
    path: r-13/train-*
  - split: validation
    path: r-13/validation-*
  - split: test
    path: r-13/test-*
- config_name: r-20
  data_files:
  - split: train
    path: r-20/train-*
  - split: validation
    path: r-20/validation-*
  - split: test
    path: r-20/test-*
- config_name: r-50
  data_files:
  - split: train
    path: r-50/train-*
  - split: validation
    path: r-50/validation-*
  - split: test
    path: r-50/test-*
---

# Speech Commands-LT (Long Tail)

Long-tail variants of [Google Speech Commands v0.02](https://arxiv.org/abs/1804.03209) for benchmarking imbalanced audio classification.

## Dataset Summary

Speech Commands v0.02 (Warden, 2018) contains 35 spoken word classes with ~1,200-3,200 samples each. This dataset applies **exponential decay** to the training set to create long-tail distributions with varying imbalance ratios, simulating real-world class imbalance in audio classification. The `_silence_` class (label 35) is removed. Validation and test sets remain balanced for fair evaluation.

## Configs

| Config | Imbalance Ratio | Train Samples | Head Count | Tail Count |
|--------|-----------------|---------------|------------|------------|
| r-10   | 10x             | 44,954        | 3,228      | 325        |
| r-13   | 13x             | 41,496        | 3,228      | 249        |
| r-20   | 20x             | 36,729        | 3,228      | 162        |
| r-50   | 50x             | 29,330        | 3,228      | 64         |

All configs share the same validation (9,981 samples) and test (4,890 samples) sets.

## Features

- Audio waveform (16 kHz, ~1-second WAV clips)
- ClassLabel (35 classes)
- Four additional metadata fields

## Usage

Select one of the four configs (`r-10`, `r-13`, `r-20`, `r-50`); each provides `train`, `validation`, and `test` splits.

## Long-Tail Construction

Exponential decay is applied per class: each class's training count is derived from the imbalance ratio and the largest class count. Random seed 42 is used for reproducible subsampling.
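The exact decay formula did not survive in this copy of the card, so the following is only a minimal sketch of one common exponential-decay scheme, not necessarily the author's method. It assumes classes are ranked from largest to smallest and that the class at rank `i` keeps `n_max * ratio ** (-i / (C - 1))` samples, where `C` is the number of classes. The function names `long_tail_counts` and `subsample`, the floor rounding, and the ranking rule are all hypothetical. With `n_max = 3228` and `ratio = 10`, this sketch yields a tail of 322 rather than the 325 listed in the table, so the original rounding or ordering likely differs.

```python
import random


def long_tail_counts(n_max, num_classes, ratio):
    """Target sample count per class rank (0 = head) under exponential decay.

    Assumed form: n_i = n_max * ratio ** (-i / (num_classes - 1)), floored.
    """
    if num_classes == 1:
        return [n_max]
    return [
        int(n_max * ratio ** (-i / (num_classes - 1)))
        for i in range(num_classes)
    ]


def subsample(indices_by_class, ratio, seed=42):
    """Randomly keep a long-tail subset of sample indices for each class.

    indices_by_class maps a class name to a list of sample indices.
    Classes are ranked by size (largest first); this ranking is an assumption.
    """
    ranked = sorted(
        indices_by_class,
        key=lambda cls: len(indices_by_class[cls]),
        reverse=True,
    )
    n_max = len(indices_by_class[ranked[0]])
    targets = long_tail_counts(n_max, len(ranked), ratio)

    rng = random.Random(seed)  # fixed seed makes the subsampling reproducible
    kept = {}
    for cls, target in zip(ranked, targets):
        pool = indices_by_class[cls]
        # Never request more samples than the class actually has.
        kept[cls] = sorted(rng.sample(pool, min(target, len(pool))))
    return kept
```

Calling `subsample` twice with the same seed returns the same subset, which is the property the card's fixed seed of 42 is meant to provide.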

## Source

Derived from [beeneptune/speech_commands](https://huggingface.co/datasets/beeneptune/speech_commands) (parquet conversion of [google/speech_commands](https://huggingface.co/datasets/google/speech_commands) v0.02).

Original paper: [Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition](https://arxiv.org/abs/1804.03209) (Warden, 2018)

## License

CC BY 4.0 (same as the original Google Speech Commands dataset)

Provider:
tomas-gajarsky