tomas-gajarsky/speech-commands-lt
Hugging Face · Updated 2026-03-30 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/tomas-gajarsky/speech-commands-lt
Resource description:
---
language:
- en
license: cc-by-4.0
task_categories:
- audio-classification
tags:
- long-tail
- imbalanced
- speech-commands
size_categories:
- 10K<n<100K
configs:
- config_name: r-10
  data_files:
  - split: train
    path: r-10/train-*
  - split: validation
    path: r-10/validation-*
  - split: test
    path: r-10/test-*
- config_name: r-13
  data_files:
  - split: train
    path: r-13/train-*
  - split: validation
    path: r-13/validation-*
  - split: test
    path: r-13/test-*
- config_name: r-20
  data_files:
  - split: train
    path: r-20/train-*
  - split: validation
    path: r-20/validation-*
  - split: test
    path: r-20/test-*
- config_name: r-50
  data_files:
  - split: train
    path: r-50/train-*
  - split: validation
    path: r-50/validation-*
  - split: test
    path: r-50/test-*
---
# Speech Commands-LT (Long Tail)
Long-tail variants of [Google Speech Commands v0.02](https://arxiv.org/abs/1804.03209)
for benchmarking imbalanced audio classification.
## Dataset Summary
Speech Commands v0.02 (Warden, 2018) contains 35 spoken word classes with ~1,200-3,200
samples each. This dataset applies **exponential decay** to the training set to create
long-tail distributions with varying imbalance ratios, simulating real-world class
imbalance in audio classification.
The \_silence\_ class (label 35) is removed. Validation and test sets remain balanced
for fair evaluation.
## Configs
| Config | Imbalance Ratio | Train Samples | Head Count | Tail Count |
|--------|-----------------|---------------|------------|------------|
| r-10 | 10x | 44,954 | 3,228 | 325 |
| r-13 | 13x | 41,496 | 3,228 | 249 |
| r-20 | 20x | 36,729 | 3,228 | 162 |
| r-50 | 50x | 29,330 | 3,228 | 64 |
All configs share the same validation (9,981) and test (4,890) sets.
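
The nominal imbalance ratio of each config can be checked against the head and tail counts in the table. The snippet below is a small sketch of that arithmetic. The dictionary and function names are illustrative, and the numbers are copied from the table rather than read from the dataset.

```python
# (head count, tail count) per config, copied from the Configs table
CONFIG_COUNTS = {
    "r-10": (3228, 325),
    "r-13": (3228, 249),
    "r-20": (3228, 162),
    "r-50": (3228, 64),
}


def realized_ratio(head: int, tail: int) -> float:
    """Imbalance ratio as largest class count divided by smallest."""
    return head / tail


for name, (head, tail) in CONFIG_COUNTS.items():
    print(f"{name}: {realized_ratio(head, tail):.2f}x")
```

The realized ratios land within about 1% of the nominal 10x, 13x, 20x, and 50x values, since per-class counts must be whole numbers.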
## Features
- Audio column: audio waveform (16 kHz, ~1-second WAV clips)
- Label column: ClassLabel (35 classes)
- Four additional metadata columns
## Usage
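
A minimal loading sketch, assuming the Hugging Face `datasets` library is installed. The repository ID and config names are taken from this card. The helper name `load_config` is hypothetical and only there to validate the config name before downloading.

```python
REPO_ID = "tomas-gajarsky/speech-commands-lt"
CONFIGS = ("r-10", "r-13", "r-20", "r-50")  # imbalance ratios listed in this card


def load_config(name: str = "r-10"):
    """Return the train/validation/test splits for one long-tail config."""
    if name not in CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIGS}")
    # Imported lazily so the constants above are usable without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name)
```

Calling `load_config("r-20")` downloads that config and returns its three splits. Only the train split differs between configs, so validation and test results are comparable across imbalance ratios.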
## Long-Tail Construction
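Exponential decay is applied per class:

$$n_i = n_{\max} \cdot r^{-i/(C-1)}$$

where $r$ is the imbalance ratio, $i = 0, \dots, C-1$ is the class index over the $C = 35$ classes, and $n_{\max}$ is the largest
class count. Random seed 42 is used for reproducible subsampling.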
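
The construction can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the script used to build the dataset: the decay is taken to be the common form `n_max * ratio ** (-i / (num_classes - 1))`, counts are truncated to integers, and the function names are hypothetical. Because the rounding rule and class ordering are assumptions, the counts need not match the Configs table exactly.

```python
import random


def long_tail_counts(n_max: int, ratio: float, num_classes: int = 35) -> list[int]:
    """Target samples per class under exponential decay from n_max to n_max / ratio."""
    return [
        int(n_max * ratio ** (-i / (num_classes - 1)))  # truncation is an assumption
        for i in range(num_classes)
    ]


def subsample(items: list, target: int, seed: int = 42) -> list:
    """Keep at most `target` items, chosen reproducibly with a fixed seed."""
    rng = random.Random(seed)
    return rng.sample(items, min(target, len(items)))


counts = long_tail_counts(n_max=3228, ratio=10)
print(counts[0], counts[-1])  # head and tail targets for a 10x ratio
```

Each class would then be reduced with `subsample(class_items, counts[i])`. Fixing the seed makes the same clips survive on every run.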
## Source
Derived from [beeneptune/speech_commands](https://huggingface.co/datasets/beeneptune/speech_commands)
(parquet conversion of [google/speech_commands](https://huggingface.co/datasets/google/speech_commands) v0.02).
Original paper: [Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition](https://arxiv.org/abs/1804.03209) (Warden, 2018)
## License
CC BY 4.0 (same as the original Google Speech Commands dataset)
Provider:
tomas-gajarsky



