
Gariscat/HouseX

Hugging Face | Updated 2026-04-02 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/Gariscat/HouseX
Resource description:
---
pretty_name: HouseX Genre Audio Classification
task_categories:
- audio-classification
task_ids:
- multi-class-classification
tags:
- audio
- music
- edm
- house
- classification
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: train
    path: train/**
  - split: validation
    path: validation/**
---

# HouseX Genre Audio Classification

[![arXiv](https://img.shields.io/badge/arXiv-2409.06690-b31b1b.svg)](https://arxiv.org/abs/2409.06690) [![GitHub](https://img.shields.io/badge/GitHub-181717?logo=github&logoColor=white)](https://github.com/Gariscat/housex-v2)

## Dataset Description

This dataset contains EDM/house audio tracks organized for Hugging Face `audiofolder` loading. Labels are inferred directly from folder names.

- Total annotated examples: 1037
- Audio files found and included: 1035
- Missing audio excluded: 2
- Splits: `train` (932), `validation` (103)

## Task

Multi-class audio classification for 8 house-related genres.

## Labels

- `bass_house`
- `bigroom`
- `deep_house`
- `future_house`
- `future_rave`
- `progressive_house`
- `slap_house`
- `techno`

## Dataset Structure

The dataset follows `audiofolder` format:

```text
train/<label_name>/*.ogg
validation/<label_name>/*.ogg
```

## Data Preparation Notes

- Source annotations were numeric scores per class.
- A single class label was assigned per track via argmax score.
- If multiple classes tied at max score, deterministic tie-break (alphabetical) was used.
- Random seed for split: `42`.
- Train/validation split ratio: `9:1` (stratified by class).
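
The labeling and split rules in the notes above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' preparation script: the score format (a dict mapping class name to a numeric score), the function names `assign_label` and `stratified_split`, and the use of the standard-library `random` module are all assumptions made for the example.

```python
import random
from collections import defaultdict


def assign_label(scores):
    """Pick the class with the highest score.

    `scores` is assumed to be a dict {class_name: numeric_score}.
    Ties at the maximum are broken alphabetically by class name.
    """
    best = max(scores.values())
    return min(name for name, score in scores.items() if score == best)


def stratified_split(items, val_fraction=0.1, seed=42):
    """Split (track_id, label) pairs class by class (hypothetical helper).

    Each class contributes about `val_fraction` of its tracks to validation,
    so the class proportions stay similar in both splits.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for track_id, label in items:
        by_label[label].append(track_id)

    train, val = [], []
    for label in sorted(by_label):
        ids = sorted(by_label[label])  # fixed order before the seeded shuffle
        rng.shuffle(ids)
        n_val = round(len(ids) * val_fraction)
        val += [(track_id, label) for track_id in ids[:n_val]]
        train += [(track_id, label) for track_id in ids[n_val:]]
    return train, val


# Two classes tie at the maximum score; the alphabetically first one wins.
print(assign_label({"techno": 3, "bigroom": 3, "deep_house": 1}))  # bigroom
```

Because the inputs are sorted before a seeded shuffle, repeated runs give the same split. The specific tracks selected would match the published split only if the original script followed the same procedure, which the notes do not specify.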

## Loading Example

```python
from datasets import load_dataset

# data_dir="." assumes the script runs from the dataset root (the folder containing train/ and validation/)
train_ds = load_dataset("audiofolder", data_dir=".", split="train")
val_ds = load_dataset("audiofolder", data_dir=".", split="validation")
print(train_ds, val_ds)
```

## Class Distribution

### Train

- `bass_house`: 139
- `bigroom`: 99
- `deep_house`: 85
- `future_house`: 169
- `future_rave`: 106
- `progressive_house`: 158
- `slap_house`: 93
- `techno`: 83

### Validation

- `bass_house`: 15
- `bigroom`: 11
- `deep_house`: 9
- `future_house`: 19
- `future_rave`: 12
- `progressive_house`: 18
- `slap_house`: 10
- `techno`: 9

## License

We are committed to ethical and legal research practices and have carefully considered copyright implications in our non-commercial, academic work aimed at advancing music information retrieval (MIR) for EDM. Below, we address these concerns and clarify our approach:

- Justification for Commercial Releases: To ensure high audio quality and representativeness of contemporary EDM, we used commercial releases, as these reflect the production standards and diversity of the genre’s biggest hits. This choice strengthens the validity and generalizability of our findings, which aim to benefit the MIR community and, indirectly, the music industry through improved music analysis tools.
- Non-Commercial Academic Research: Our study is purely academic, with no commercial intent or application. The dataset and model are developed solely to advance MIR techniques for EDM sub-genre classification, contributing to the broader scientific community’s understanding of music structure and style.
- Transformative Use of Limited Excerpts: We extracted only the “drop” sections of the songs, which are short, distinct segments (typically 15–30 seconds). This use is transformative, as the drops are processed for feature extraction and classification, not for reproduction or consumption as music.
  The dataset does not enable reconstruction of the original songs, ensuring no substitution for the artists’ or labels’ original market.
- No Market Harm: Our work, even if the dataset is made public, poses no threat to the commercial market of the original artists or record labels. The dataset consists of processed audio features and short excerpts, not full tracks, and is intended for research purposes only. To the best of our knowledge, these could not be used to replicate or compete with the original songs.

We are sincerely grateful to all the artists who produced these amazing tracks. If you have copyright concerns, please contact xl3133@nyu.edu.
Provider:
Gariscat