treadon/banger-scorer-generated-songs

Name: treadon/banger-scorer-generated-songs
Creator: treadon
Published: 2026-03-26 15:58:20
License: apache-2.0

Hugging Face | Updated 2026-03-26 | Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/treadon/banger-scorer-generated-songs



Description:

---
license: apache-2.0
tags:
- music
- audio
- ai-generated-music
- ace-step
- quality-evaluation
- music-generation
- banger-scorer
task_categories:
- audio-classification
size_categories:
- n<1K
language:
- en
- es
- hi
- pa
- zh
---

# Banger Scorer Generated Songs

230 AI-generated songs across 10 genres and 5 languages, all scored by the [banger scorer](https://huggingface.co/treadon/banger-scorer). Includes 10 genre tests (20 songs each) plus 1 banger-optimized run (30 songs) that used data-driven parameter selection to maximize scores.

Every song includes its MP3 audio, generation metadata (BPM, key, seed, caption prompt), and banger score. Useful for research into AI music quality, training better scorers, or just listening to what works and what does not.

![Global scatter: all 230 songs scored](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/overview/global_scatter.png)

## Dataset Description

### What is this?

A systematic evaluation of AI music generation quality. 230 songs were generated with [ACE-Step 1.5](https://huggingface.co/ACE-Step/Ace-Step1.5) on an Apple M4 Pro, spanning a grid of caption styles, BPMs, keys, and languages. Each song was then scored by the [banger scorer](https://huggingface.co/treadon/banger-scorer) (MERT + MLP, trained on FMA play counts).

The dataset captures what an automated quality model thinks about AI-generated music -- and reveals clear patterns in what makes some generated songs score higher than others.

### Structure

```
230 songs total:
├── test01/ (20 songs) -- Hip Hop (English)
├── test02/ (20 songs) -- Pop/Dance (English)
├── test03/ (20 songs) -- R&B/Soul (English)
├── test04/ (20 songs) -- Latin/Reggaeton (Spanish)
├── test05/ (20 songs) -- Bollywood (Hindi)
├── test06/ (20 songs) -- Punjabi/Bhangra (Punjabi)
├── test07/ (20 songs) -- C-Pop (Chinese)
├── test08/ (20 songs) -- Rock/Alternative (English)
├── test09/ (20 songs) -- Electronic/EDM (Instrumental)
├── test10/ (20 songs) -- Acoustic/Folk (English)
└── bangers/ (30 songs) -- Banger-optimized run (mixed EDM/Bhangra/Bollywood)
```

### Per-Song Fields

| Field | Type | Description |
|-------|------|-------------|
| `audio` | MP3 (192 kbps) | The generated song, 2 minutes, 48 kHz stereo |
| `genre` | string | Genre category (e.g., "EDM", "Hip Hop", "Bollywood") |
| `bpm` | int | Beats per minute used for generation |
| `key` | string | Musical key used for generation (e.g., "Eb minor", "D major") |
| `seed` | int | Random seed for deterministic reproduction |
| `caption` | string | Full text prompt given to ACE-Step |
| `banger_score` | float (0-10) | Score from the banger scorer model |
| `language` | string | Lyrics language (en/es/hi/pa/zh or instrumental) |

## Generation Details

### Model

[ACE-Step 1.5](https://huggingface.co/ACE-Step/Ace-Step1.5) -- a full-song AI music generator with vocals, instruments, and structure. It generates ~2-minute songs from a text caption and optional lyrics.

### Hardware

Apple M4 Pro with 64 GB unified memory. Each song took ~110 seconds to generate (language model planning ~27 s, DiT diffusion ~9 s for 8 turbo steps, VAE decoding ~18 s, plus overhead). Total generation time for all 230 songs: ~7 hours. Scoring all 230 songs took ~8 minutes.

**200/200 success rate on genre tests, 30/30 on the banger run. Zero generation failures.** This was achieved by spawning a fresh subprocess per song to avoid MPS memory accumulation.

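The subprocess-per-song approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the author's actual runner: the function names `generate_in_subprocess` and `run_batch` are hypothetical, and the real generation script and its arguments are not described in this card. The point is only that each job runs in its own interpreter, so the operating system reclaims all of its memory when the child exits.

```python
import subprocess
import sys


def generate_in_subprocess(argv, timeout_s=600):
    """Run one generation job in a fresh Python interpreter.

    `argv` is whatever follows the interpreter on the command line, e.g.
    ["generate_song.py", "--seed", "42"] (hypothetical script and flag).
    Returns (succeeded, captured stdout).
    """
    result = subprocess.run(
        [sys.executable, *argv],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.returncode == 0, result.stdout.strip()


def run_batch(jobs):
    """Run jobs strictly one after another; return (successes, total)."""
    successes = 0
    for argv in jobs:
        ok, _ = generate_in_subprocess(argv)
        if ok:
            successes += 1
    return successes, len(jobs)
```

Calling `run_batch` with one argument list per song yields a success count comparable to the 200/200 and 30/30 figures reported above. Running jobs sequentially rather than in parallel is a deliberate choice in this sketch, since only one child at a time holds accelerator memory.
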
### Parameter Grid

Each genre test sampled 20 combinations from a grid of 4 caption styles x 5 BPMs x 5 keys (100 possible combos). Within each genre, lyrics were held constant while musical parameters varied, isolating the effect of BPM, key, and caption style on score.

## Score Statistics

![Score distribution histogram](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/overview/score_histogram.png)

### By Genre

![Genre ranking by mean score](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/overview/genre_ranking.png)

![Box plot of scores by genre](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/overview/global_boxplot.png)

| Rank | Genre | Language | Mean | Best | Worst | Range |
|------|-------|----------|------|------|-------|-------|
| 1 | Punjabi/Bhangra | Punjabi | 3.77 | 4.26 | 2.79 | 1.47 |
| 2 | Electronic/EDM | Instrumental | 3.71 | **5.29** | 2.80 | 2.49 |
| 3 | Bollywood | Hindi | 3.53 | 4.38 | 2.70 | 1.68 |
| 4 | C-Pop | Chinese | 3.20 | 4.47 | 2.13 | 2.34 |
| 5 | Latin/Reggaeton | Spanish | 3.19 | 3.90 | 2.36 | 1.54 |
| 6 | Pop/Dance | English | 3.05 | 4.31 | 1.98 | 2.33 |
| 7 | Rock/Alternative | English | 3.03 | 3.66 | 2.02 | 1.64 |
| 8 | Hip Hop | English | 2.92 | 3.38 | 2.52 | 0.86 |
| 9 | Acoustic/Folk | English | 2.63 | 3.31 | 2.03 | 1.28 |
| 10 | R&B/Soul | English | 2.62 | 3.21 | 2.14 | 1.07 |
| -- | **Banger-optimized** | **Mixed** | **3.48** | **5.29** | **2.01** | **3.28** |

### Top 10 Overall (out of 230 songs)

| Rank | Score | Genre | BPM | Key | Caption Style |
|------|-------|-------|-----|-----|---------------|
| #1 | **5.29** | EDM | 130 | Eb minor | Melodic techno, atmospheric, driving |
| #2 | **5.29** | Banger Run | 128 | D minor | Dark electronic, industrial, driving |
| #3 | **4.90** | EDM | 138 | F minor | Deep house, groovy bassline |
| #4 | **4.87** | EDM | 126 | Eb minor | Progressive house, euphoric drop |
| #5 | **4.66** | EDM | 138 | Bb minor | Melodic techno, dark and dreamy |
| #6 | **4.47** | C-Pop | 120 | A minor | Emotional ballad, piano-driven |
| #7 | **4.40** | Banger Run | 128 | Bb minor | Progressive house, festival anthem |
| #8 | **4.38** | Bollywood | 120 | E minor | Club-ready beat, heavy bass |
| #9 | **4.36** | Banger Run | 130 | D minor | Deep house, groovy bassline |
| #10 | **4.31** | Pop/Dance | 128 | D major | Electronic pop, shimmering arpeggios |

### Best vs Worst Examples

![Top vs bottom songs](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/overview/top_vs_bottom.png)

The highest-scoring songs share common traits: driving four-on-the-floor beats, minor keys, 126-138 BPM, electronic/dark textures. The lowest-scoring songs tend to be slow R&B, acoustic folk, or jazzy hip hop with mellow dynamics.

## Musical Analysis

### Optimization Impact

The banger-optimized run (30 songs) used only parameter combinations that scored highest in the 200-song random sweep: dark electronic/industrial captions, 126-138 BPM, minor keys only.

![Optimization impact](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/overview/optimization_impact.png)

| Metric | Random (200 songs) | Optimized (30 songs) | Change |
|--------|-------------------|---------------------|--------|
| Mean score | 3.17 | **3.48** | +10% |
| Songs >= 3.5 | 20% | **60%** | 3x |
| Songs >= 4.0 | 5% | **20%** | 4x |
| Top score | 5.29 | 5.29 | Same |

Data-driven parameter selection raised the mean score and the share of high-scoring songs without raising the ceiling. It did not eliminate low scores: the worst optimized song scored 2.01, close to the lowest genre-test score of 1.98. The winning formula: dark electronic/industrial styles, 128-130 BPM, D minor or Bb minor.

![Hit rate comparison](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/overview/hit_rate_comparison.png)

### Key and Tonality

Minor keys consistently outperformed major keys across all genres. All top-5 overall songs were in minor keys.

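As a rough illustration of how a major-versus-minor comparison could be computed from the `key` and `banger_score` fields, the sketch below groups scores by key mode. The helper names `key_mode` and `mean_score_by_mode` are hypothetical, and the sample rows are copied from the Top 10 Overall table, so the result describes only those ten songs and not the full 230-song dataset.

```python
from collections import defaultdict
from statistics import mean

# (banger_score, key) pairs copied from the Top 10 Overall table.
TOP_10 = [
    (5.29, "Eb minor"),
    (5.29, "D minor"),
    (4.90, "F minor"),
    (4.87, "Eb minor"),
    (4.66, "Bb minor"),
    (4.47, "A minor"),
    (4.40, "Bb minor"),
    (4.38, "E minor"),
    (4.36, "D minor"),
    (4.31, "D major"),
]


def key_mode(key):
    """Return 'minor' or 'major' from a key string such as 'Eb minor'."""
    return key.rsplit(" ", 1)[-1].lower()


def mean_score_by_mode(rows):
    """Group scores by key mode and return the mean score per mode."""
    grouped = defaultdict(list)
    for score, key in rows:
        grouped[key_mode(key)].append(score)
    return {mode: mean(scores) for mode, scores in grouped.items()}


if __name__ == "__main__":
    for mode, avg in sorted(mean_score_by_mode(TOP_10).items()):
        print(f"{mode}: {avg:.2f}")
```

The same function could be applied to every row of the dataset. Because several genre tests used only major or only minor keys, a per-genre breakdown would be needed to separate the effect of key from the effect of genre.
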
![Major vs minor key comparison](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/analysis/major_vs_minor.png)

![Detailed key analysis](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/analysis/key_analysis.png)

### BPM

Each genre has a clear BPM sweet spot. Slower BPMs (< 85) consistently underperformed. The highest-scoring region across all genres was 126-138 BPM.

![BPM vs score (all genres)](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/analysis/bpm_vs_score_global.png)

![BPM/Key heatmap for banger-optimized run](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/per_genre/bangers_bpm_key_heatmap.png)

### Caption Style

The caption prompt significantly affects scores. Descriptive, production-oriented captions ("melodic techno, atmospheric pads, driving beat") outperformed vague ones. The "traditional/ethnic" keyword cluster (dhol, tumbi, bhangra) achieved the highest style-level mean (3.8).

![Caption style analysis](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/analysis/caption_style_analysis.png)

### Generated vs Training Data

AI-generated songs cluster tightly around the FMA training mean (3.17 generated vs 3.27 FMA), while real music has a long right tail up to 10. Even optimized generation peaks at 5.29 (67th percentile). The scorer correctly identifies that AI music does not yet match the quality distribution of human-made music.

![Generated vs training score distributions](https://raw.githubusercontent.com/treadon/banger-scorer/main/plots/training/generated_vs_training.png)

## Use Cases

- **AI music quality research.** Study what musical parameters (BPM, key, style, language) lead to higher/lower quality scores. The structured parameter grid makes controlled comparisons possible.
- **Listening comparison.** Download the top-5 and bottom-5 from each genre to hear what the scorer prefers. Do you agree with the model?
- **Train better scorers.** Use these scored songs as additional training data or evaluation benchmarks for music quality models.
- **ACE-Step evaluation.** Benchmark ACE-Step 1.5 across genres and languages with quantitative quality scores.
- **Preference modeling.** Use the ranked pairs (best vs worst within each genre) to train preference models or reward models for music generation.

## Limitations

- **Scorer bias.** All scores come from a single model trained on FMA play counts. The model favors high-energy, beat-driven music and underrates mellow genres. Scores reflect learned statistical patterns, not absolute quality.
- **Single generator.** All songs are from ACE-Step 1.5. Results may not generalize to other music generation models.
- **Limited seeds per combo.** Each parameter combination was tested with only 1 random seed. Different seeds produce different songs from the same parameters.
- **Lyrics held constant per genre.** Within each genre test, all 20 songs share the same lyrics. Only musical parameters (BPM, key, caption style) vary.
- **2-minute songs.** All songs are approximately 2 minutes. Longer or shorter durations may produce different quality characteristics.

## How the Scores Were Computed

Each song was:

1. Loaded at 24 kHz mono
2. Truncated to 30 seconds
3. Encoded through [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) to produce a 1024-dim embedding
4. Scored by the [banger scorer](https://huggingface.co/treadon/banger-scorer) MLP (1024 -> 512 -> 256 -> 128 -> 1)
5. Clamped to the 0-10 range

The scorer model (MAE 0.858, Spearman 0.468 on FMA test set) was trained on pre-computed MERT embeddings from [treadon/fma-mert-embeddings](https://huggingface.co/datasets/treadon/fma-mert-embeddings).

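The shape of steps 2, 4, and 5 above can be sketched in plain Python. This is an illustration under stated assumptions, not the banger scorer itself: the weights are random placeholders rather than the trained parameters, ReLU between layers is assumed because the card does not name the activation, and the MERT encoding of step 3 is skipped, so the input is simply any 1024-value list. All function names are hypothetical.

```python
import random

SAMPLE_RATE_HZ = 24_000                  # step 1: audio is loaded at 24 kHz mono
CLIP_SECONDS = 30                        # step 2: only the first 30 s are used
LAYER_SIZES = [1024, 512, 256, 128, 1]   # step 4: MLP widths from the card


def truncate(samples):
    """Keep at most the first 30 seconds of a mono sample sequence."""
    return samples[: SAMPLE_RATE_HZ * CLIP_SECONDS]


def init_layers(rng):
    """Build placeholder (weights, biases) pairs; NOT the trained scorer."""
    layers = []
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:]):
        scale = n_in ** -0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_forward(embedding, layers):
    """Apply each linear layer, with ReLU (assumed) on all but the last."""
    x = embedding
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < last:
            x = [max(0.0, v) for v in x]
    return x[0]


def clamp_score(raw):
    """Step 5: clamp the raw MLP output to the 0-10 range."""
    return min(10.0, max(0.0, raw))


def score_embedding(embedding, layers):
    """Score one 1024-dim embedding with the given layers."""
    if len(embedding) != LAYER_SIZES[0]:
        raise ValueError(f"expected {LAYER_SIZES[0]} values, got {len(embedding)}")
    return clamp_score(mlp_forward(embedding, layers))
```

With placeholder weights the returned number is meaningless; the sketch only shows how the 1024 -> 512 -> 256 -> 128 -> 1 stack and the final clamp fit together. A real pipeline would load the published scorer weights and feed it embeddings from MERT-v1-330M.
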
## Genre Test Details

| Test | Genre | BPMs | Keys | Language | Lyrics Theme |
|------|-------|------|------|----------|-------------|
| 01 | Hip Hop | 78, 85, 90, 95, 100 | C/Bb/D/E/Ab minor | English | East coast, gritty, street |
| 02 | Pop/Dance | 110, 118, 124, 128, 135 | C/G/D/A/F major | English | Upbeat, summer, dance |
| 03 | R&B/Soul | 65, 72, 80, 88, 95 | Eb/Ab/Db maj, Bb/F min | English | Romantic, intimate |
| 04 | Latin/Reggaeton | 90, 95, 100, 105, 110 | A/D/E/G/C minor | Spanish | Party, dancing, heat |
| 05 | Bollywood | 100, 110, 120, 130, 140 | C/D/A/E min, G maj | Hindi | Film dance number |
| 06 | Punjabi/Bhangra | 95, 105, 115, 125, 135 | G/D/A/C maj, E min | Punjabi | Celebration, dance |
| 07 | C-Pop | 75, 90, 105, 120, 130 | C/G/F maj, A/D min | Chinese | Modern pop, emotional |
| 08 | Rock/Alt | 120, 130, 140, 150, 160 | E/A/D/G/B minor | English | Indie, garage, alt |
| 09 | Electronic/EDM | 122, 126, 130, 138, 150 | A/F/C/Eb/Bb minor | Instrumental | House, techno, trance |
| 10 | Acoustic/Folk | 85, 95, 105, 115, 125 | G/C/D/A/E major | English | Singer-songwriter |
| -- | Banger Run | 126, 128, 130, 135, 138 | Eb/F/Bb/C/D minor | Mixed | Optimized for score |

## Citation

```bibtex
@misc{acestep2025,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={ACE-Step Team},
  year={2025},
  url={https://huggingface.co/ACE-Step/Ace-Step1.5}
}

@article{li2023mert,
  title={MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training},
  author={Li, Yizhi and Yuan, Ruibin and Zhang, Ge and Ma, Yinghao and others},
  journal={arXiv preprint arXiv:2306.00107},
  year={2023}
}
```

## Dataset Card Contact

[treadon](https://huggingface.co/treadon) on HuggingFace

Provider:

treadon
