Marek324/speech-music-classification
Source: Hugging Face · Updated 2026-03-26 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Marek324/speech-music-classification
Resource description:
---
pretty_name: Speech / Music classification
license: unknown
language: en
tags:
- audio
- parquet
configs:
- config_name: full
  data_files:
  - split: train
    path: full/train/**/*.parquet
  - split: validation
    path: full/validation/**/*.parquet
  - split: test
    path: full/test/**/*.parquet
- config_name: mid
  data_files:
  - split: train
    path: mid/train/**/*.parquet
  - split: validation
    path: mid/validation/**/*.parquet
  - split: test
    path: mid/test/**/*.parquet
- config_name: mini
  data_files:
  - split: train
    path: mini/train/**/*.parquet
  - split: validation
    path: mini/validation/**/*.parquet
  - split: test
    path: mini/test/**/*.parquet
---
## Structure
Running `uv run python build.py {mini|mid|full}` once per tier, with the same `--out-dir` staging parent each time, produces:
```text
<staging>/
  full/                      # or mid/, mini/
    train/
      speech/  part_*.parquet
      music/
      inactive/
    validation/
      speech/
      ...
    test/
      ...
```
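To sanity-check a staging folder before uploading, the layout above can be walked with a small script. This is a minimal sketch, not part of `build.py`: the helper name `count_shards` is hypothetical, and it assumes the tier and split names shown in the tree and reuses the `<tier>/<split>/**/*.parquet` glob shape from the card's YAML.

```python
from collections import Counter
from pathlib import Path

# Tier and split names taken from the card; adjust if your build differs.
TIERS = ("mini", "mid", "full")
SPLITS = ("train", "validation", "test")


def count_shards(staging: Path) -> Counter:
    """Count Parquet files per (tier, split, class directory) under a staging parent."""
    counts: Counter = Counter()
    for tier in TIERS:
        for split in SPLITS:
            split_dir = staging / tier / split
            if not split_dir.is_dir():
                continue  # this tier or split has not been built
            # Same shape as the card's `<tier>/<split>/**/*.parquet` pattern.
            for shard in split_dir.glob("**/*.parquet"):
                # First path component below the split, e.g. "speech".
                class_dir = shard.relative_to(split_dir).parts[0]
                counts[(tier, split, class_dir)] += 1
    return counts
```

Calling `count_shards(Path("."))` from the staging parent returns a mapping such as `("mini", "train", "speech") -> number of shards`; a missing key points at a tier, split, or class directory with no Parquet files.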
Parquet columns: `audio` (HF Audio struct), `sampling_rate`, `class`, `subclass`, `source`, `row_idx`, `labels` (list of `{label, start, end}` in ms).
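Because the `labels` intervals are given in milliseconds while the waveform is indexed in samples, a conversion through `sampling_rate` is usually needed before slicing. The sketch below assumes each entry is a dict with the keys `label`, `start`, and `end` as listed above, and that the times are relative to the start of the clip; the helper name is hypothetical.

```python
def label_to_sample_range(label: dict, sampling_rate: int) -> tuple[str, int, int]:
    """Convert one {label, start, end} entry (times in ms) to sample indices."""
    # samples = milliseconds * samples_per_second / 1000
    start = int(round(label["start"] * sampling_rate / 1000))
    end = int(round(label["end"] * sampling_rate / 1000))
    return label["label"], start, end


# Example with made-up values: a 500 ms to 1500 ms segment at 16 kHz.
name, start, end = label_to_sample_range(
    {"label": "speech", "start": 500, "end": 1500}, 16000
)
```

With a decoded waveform array `samples`, the segment would then be `samples[start:end]`.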
## Upload
Run from the **staging parent** (the folder that contains `mini/`, `mid/`, `full/`):
```bash
hf upload-large-folder --repo-type dataset YOUR_ORG/YOUR_REPO .
```
## Load
```python
from datasets import load_dataset
ds = load_dataset("YOUR_ORG/YOUR_REPO", "full", split="train")
# Rows include columns from parquet; decode WAV from bytes if needed.
```
Replace `YOUR_ORG/YOUR_REPO` with the actual repository ID. If `load_dataset` raises an error, check the YAML `configs` block against the current [Hub dataset data files](https://huggingface.co/docs/hub/datasets-data-files-configuration) documentation.
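The comment in the loading snippet mentions decoding WAV from bytes. One way to do that with only the standard library is sketched below. It assumes the `audio` struct exposes the encoded file under a `bytes` key (the usual Hugging Face Audio storage layout) and that the payload is PCM WAV, which is what Python's `wave` module reads; the helper name is hypothetical.

```python
import io
import wave


def decode_wav_bytes(wav_bytes: bytes) -> tuple[int, int, bytes]:
    """Return (sampling_rate, n_channels, raw PCM frames) from in-memory WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        sampling_rate = wav.getframerate()
        n_channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    return sampling_rate, n_channels, frames
```

If `datasets` already decodes the column for you, this step is unnecessary. To get the raw dict instead, casting the column with `ds.cast_column("audio", Audio(decode=False))` (with `Audio` imported from `datasets`) should leave `row["audio"]["bytes"]` available to pass to `decode_wav_bytes`.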
Provider:
Marek324



