ChaoticEconomist/Jazz-Blues-Music-Dataset_SFT-or-LoRA
Hugging Face | Updated: 2026-04-19 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/ChaoticEconomist/Jazz-Blues-Music-Dataset_SFT-or-LoRA
Description:
---
language:
- en
license: cc-by-4.0
task_categories:
- text-generation
- question-answering
tags:
- jazz
- blues
- music-history
- sft
- lora
- instruction-tuning
- alpaca-format
- music
- cultural-heritage
size_categories:
- 1K<n<10K
---
# Jazz & Blues Music Dataset (SFT / LoRA Ready)
A structured dataset covering **82 iconic Jazz and Blues songs**, **21 artist
profiles**, and **41 historical events**, expanded into **1,219
instruction-tuning rows** across 8 row types.
Designed for fine-tuning LLMs on music knowledge, cultural history, artist
biography, and domain-specific Q&A tasks.
---
## Overview
| Property | Value |
|------------------|------------------------------------|
| Domain | Jazz & Blues Music |
| Total rows | 1,219 |
| Train split | 1,036 (85%) |
| Validation split | 91 (~7.5%) |
| Test split | 92 (~7.5%) |
| Format | CSV (Alpaca-style prompt format) |
| Songs covered | 82 (41 Blues, 41 Jazz) |
| Artists covered | 21 (11 Blues, 10 Jazz) |
| History events | 41 (18 Blues, 23 Jazz) |
| License | CC BY 4.0 |
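
The split sizes in the table can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; `split_shares` is a helper name made up for this illustration, not part of the dataset's scripts.

```python
# Split sizes as stated in the overview table.
SPLITS = {"train": 1036, "validation": 91, "test": 92}


def split_shares(splits):
    """Return each split's share of the total as a percentage, rounded to 0.1."""
    total = sum(splits.values())
    return {name: round(100 * n / total, 1) for name, n in splits.items()}


total_rows = sum(SPLITS.values())
shares = split_shares(SPLITS)
print(total_rows, shares)
```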
---
## Row Types
| Type | Count | Description |
|-------------------------|-------|-------------|
| `song_qa` | 656 | 8 Q&A pairs per song (artist, year, genre, key, tempo, description, significance, subgenre) |
| `history_qa` | 205 | 5 Q&A pairs per historical event |
| `listening_guide` | 82 | How to listen to and appreciate each song |
| `artist_qa` | 126 | 6 Q&A pairs per artist (genre, origin, instruments, style, legacy, birth) |
| `song_explanation` | 82 | Full structured explanation of each song |
| `history_explanation` | 41 | Full structured explanation of each historical event |
| `artist_profile` | 21 | Full artist biography and legacy |
| `genre_comparison` | 6 | Side-by-side comparison of related subgenres |
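
The counts in this table follow from the entity counts (82 songs, 21 artists, 41 events) multiplied by the rows generated per entity. The sketch below reproduces that arithmetic from the figures on this card; the constant and function names are illustrative and are not taken from `gen_jazz_blues.py`.

```python
# Entity counts as stated on this card.
ENTITY_COUNTS = {"song": 82, "artist": 21, "event": 41}

# row_type -> (entity it is generated from, rows per entity)
ROWS_PER_ENTITY = {
    "song_qa": ("song", 8),
    "history_qa": ("event", 5),
    "listening_guide": ("song", 1),
    "artist_qa": ("artist", 6),
    "song_explanation": ("song", 1),
    "history_explanation": ("event", 1),
    "artist_profile": ("artist", 1),
}

# Genre comparisons are not tied to a single entity; the card lists 6 of them.
FIXED_COUNTS = {"genre_comparison": 6}


def expected_row_counts():
    """Derive the expected number of rows for each row type."""
    counts = {
        row_type: ENTITY_COUNTS[entity] * per_entity
        for row_type, (entity, per_entity) in ROWS_PER_ENTITY.items()
    }
    counts.update(FIXED_COUNTS)
    return counts


counts = expected_row_counts()
for row_type, n in counts.items():
    print(f"{row_type:<20} {n}")
print("total", sum(counts.values()))
```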
---
## Columns
| Column | Type | Description |
|---------------|--------|-------------|
| `instruction` | string | Task instruction for the model |
| `input` | string | The question, song title, artist name, or event name |
| `completion` | string | The expected model output |
| `prompt` | string | Full Alpaca-style prompt (`### Instruction / ### Input / ### Response:`) |
| `text` | string | `prompt + completion` — ready for SFT trainers |
| `genre` | string | `jazz` or `blues` |
| `subgenre` | string | Musical subgenre, historical era, or `artist_profile` / `genre_comparison` |
| `difficulty` | string | `beginner`, `intermediate`, or `advanced` |
| `row_type` | string | One of the 8 row types above |
| `source` | string | Always `jazz_blues` |
| `split` | string | `train`, `validation`, or `test` |
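
The column descriptions imply a few invariants that are cheap to verify before training: every column is present, the categorical columns hold only the listed values, and `text` equals `prompt + completion`. The following is a minimal sketch of such a check using the standard library. The sample row is invented for illustration and `check_row` is a hypothetical helper, not part of the dataset's tooling.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "instruction", "input", "completion", "prompt", "text",
    "genre", "subgenre", "difficulty", "row_type", "source", "split",
]

# Allowed values for the categorical columns, as described in the table.
ALLOWED = {
    "genre": {"jazz", "blues"},
    "difficulty": {"beginner", "intermediate", "advanced"},
    "split": {"train", "validation", "test"},
    "source": {"jazz_blues"},
}


def check_row(row):
    """Return a list of problems found in one row (empty if none)."""
    problems = [f"missing column: {c}" for c in EXPECTED_COLUMNS if c not in row]
    if problems:
        return problems
    for column, allowed in ALLOWED.items():
        if row[column] not in allowed:
            problems.append(f"unexpected {column}: {row[column]!r}")
    if row["text"] != row["prompt"] + row["completion"]:
        problems.append("text is not prompt + completion")
    return problems


# Invented sample row, written to an in-memory CSV and read back.
sample = {
    "instruction": "Answer the question about this song.",
    "input": 'Who recorded "So What"?',
    "completion": "Miles Davis",
    "prompt": "### Instruction:\nAnswer the question about this song.\n"
              '### Input:\nWho recorded "So What"?\n### Response:\n',
    "genre": "jazz",
    "subgenre": "modal_jazz",
    "difficulty": "advanced",
    "row_type": "song_qa",
    "source": "jazz_blues",
    "split": "train",
}
sample["text"] = sample["prompt"] + sample["completion"]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=EXPECTED_COLUMNS)
writer.writeheader()
writer.writerow(sample)
buffer.seek(0)

rows = list(csv.DictReader(buffer))
problems = [check_row(row) for row in rows]
print(problems)
```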
---
## Genres & Subgenres
### Blues Subgenres
| Subgenre | Key Artists |
|-----------------------|-------------|
| `delta_blues` | Robert Johnson, Elmore James, Robert Petway |
| `chicago_blues` | Muddy Waters, Howlin' Wolf, Jimmy Reed, Junior Wells |
| `electric_blues` | B.B. King, Elmore James ("Shake Your Moneymaker") |
| `texas_blues` | Stevie Ray Vaughan |
| `west_coast_blues` | T-Bone Walker, Lowell Fulson |
| `boogie_blues` | John Lee Hooker |
| `soul_blues` | Albert King, Etta James, Bobby 'Blue' Bland |
| `country_blues` | Big Bill Broonzy |
| `rhythm_and_blues` | Big Mama Thornton |
| `classic_blues` | Bessie Smith, Mamie Smith |
### Jazz Subgenres
| Subgenre | Key Artists |
|-----------------------|-------------|
| `new_orleans_jazz` | Louis Armstrong, Jelly Roll Morton |
| `swing_era` | Duke Ellington, Benny Goodman, Billie Holiday |
| `bebop` | Charlie Parker, Dizzy Gillespie, Thelonious Monk |
| `cool_jazz` | Miles Davis, Dave Brubeck, Chet Baker |
| `hard_bop` | Herbie Hancock, Art Blakey, Bobby Timmons |
| `modal_jazz` | Miles Davis (Kind of Blue), John Coltrane |
| `post_bop` | Wayne Shorter, Miles Davis (second quintet), Bill Evans |
| `spiritual_jazz` | John Coltrane (A Love Supreme) |
| `jazz_fusion` | Weather Report, Herbie Hancock (Head Hunters) |
| `jazz_standard` | Songs rather than artists: "Autumn Leaves", "Summertime", "Body and Soul", "Cherokee" |
| `traditional_jazz` | Louis Armstrong |
| `free_jazz` | Ornette Coleman, Cecil Taylor |
---
## Difficulty Levels
| Level | Criteria |
|----------------|----------|
| `beginner` | Foundational songs, early history, and well-known artists |
| `intermediate` | Core genre developments such as Chicago blues, bebop, cool jazz, and hard bop |
| `advanced` | Modal jazz, free jazz, fusion, spiritual jazz, post-bop |
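
One possible use of the `difficulty` column is to order training rows from easier to harder material. The sketch below shows that idea on plain dictionaries. The card does not prescribe curriculum ordering, the sample rows are placeholders, and `curriculum_order` is a name chosen for this example.

```python
# Rank for each difficulty label listed in the table above.
DIFFICULTY_RANK = {"beginner": 0, "intermediate": 1, "advanced": 2}


def curriculum_order(rows):
    """Sort rows from beginner to advanced.

    sorted() is stable, so rows keep their original relative order
    within each difficulty level.
    """
    return sorted(rows, key=lambda row: DIFFICULTY_RANK[row["difficulty"]])


# Placeholder rows; only the `difficulty` column matters here.
rows = [
    {"input": "example 1", "difficulty": "advanced"},
    {"input": "example 2", "difficulty": "beginner"},
    {"input": "example 3", "difficulty": "intermediate"},
    {"input": "example 4", "difficulty": "beginner"},
]

ordered = curriculum_order(rows)
print([row["input"] for row in ordered])
```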
---
## Genre Comparisons Included
- Delta Blues vs Chicago Blues
- Bebop vs Cool Jazz
- Hard Bop vs Modal Jazz
- Texas Blues vs West Coast Blues
- Jazz Fusion vs Post-Bop
- Soul Blues vs Rhythm and Blues
---
## Prompt Format
All rows use the **Alpaca instruction format**:
```
### Instruction:
Explain the following jazz or blues song, including its artist, year, genre,
musical characteristics, and historical significance.
### Input:
"So What" by Miles Davis
### Response:
**So What** — Miles Davis (1959)
**Genre:** Jazz / Modal Jazz
**Key:** D Dorian | **Tempo:** Slow
**Description:** The opening track of Kind of Blue; built on two modal scales
rather than complex chord changes, giving improvisers vast melodic freedom.
**Historical Significance:** Defined modal jazz and remains the most-streamed
jazz recording of all time.
```
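For prompts written at inference time, the same layout can be rebuilt from an instruction and an optional input. This is a sketch under one assumption: the exact whitespace of the dataset's `prompt` column is not documented here, so the newline placement below simply mirrors the example above. `build_prompt` and `build_text` are hypothetical helpers.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble an Alpaca-style prompt ending with the response header."""
    lines = ["### Instruction:", instruction.strip()]
    if input_text.strip():
        # The Input section is omitted when there is no input.
        lines += ["### Input:", input_text.strip()]
    lines.append("### Response:")
    return "\n".join(lines) + "\n"


def build_text(prompt: str, completion: str) -> str:
    """Mirror the `text` column: prompt followed directly by the completion."""
    return prompt + completion


prompt = build_prompt(
    "Explain the following jazz or blues song, including its artist, year, "
    "genre, musical characteristics, and historical significance.",
    '"So What" by Miles Davis',
)
text = build_text(prompt, "**Genre:** Jazz / Modal Jazz")
print(text)
```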
---
## Usage
### Load with 🤗 Datasets
```python
from datasets import load_dataset

ds = load_dataset("csv", data_files={
    "train": "jazz_blues_data/train.csv",
    "validation": "jazz_blues_data/validation.csv",
    "test": "jazz_blues_data/test.csv",
})
```
### Fine-tune with TRL SFTTrainer
```python
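from trl import SFTConfig, SFTTrainer

# `model` is assumed to be defined earlier (a model ID string or a loaded model).
# Recent TRL releases expect `dataset_text_field` on SFTConfig; older releases
# accepted it directly on SFTTrainer. Check the version you have installed.
config = SFTConfig(
    output_dir="jazz-blues-sft",  # placeholder output path
    dataset_text_field="text",
)
trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=ds["train"],
    # ... remaining arguments omitted
)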
```
### Filter examples
```python
# Only blues rows
blues = ds["train"].filter(lambda x: x["genre"] == "blues")
# Only artist profiles
profiles = ds["train"].filter(lambda x: x["row_type"] == "artist_profile")
# Only advanced jazz history
adv_jazz_hist = ds["train"].filter(
    lambda x: x["genre"] == "jazz"
    and x["row_type"] == "history_explanation"
    and x["difficulty"] == "advanced"
)
# Listening guides only
guides = ds["train"].filter(lambda x: x["row_type"] == "listening_guide")
```
---
## Files
| File | Description |
|----------------------------------|-------------|
| `jazz_blues_dataset.csv` | Full dataset (1,219 rows) |
| `jazz_blues_data/train.csv` | Training split (1,036 rows, 85%) |
| `jazz_blues_data/validation.csv` | Validation split (91 rows, ~7.5%) |
| `jazz_blues_data/test.csv` | Test split (92 rows, ~7.5%) |
| `gen_jazz_blues.py` | Dataset generator script |
| `jazz_blues_songs.py` | Source knowledge base |
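
Because every row carries a `split` column, the single-file `jazz_blues_dataset.csv` can also be partitioned without the per-split files. The sketch below shows one way to do that with the standard library. An in-memory CSV with placeholder rows stands in for the real file, and `group_by_split` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def group_by_split(csv_file):
    """Group CSV rows by the value of their `split` column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["split"]].append(row)
    return dict(groups)


# Stand-in for: open("jazz_blues_dataset.csv", newline="", encoding="utf-8")
sample_csv = io.StringIO(
    "input,genre,split\n"
    "example 1,blues,train\n"
    "example 2,jazz,train\n"
    "example 3,jazz,validation\n"
    "example 4,blues,test\n"
)

groups = group_by_split(sample_csv)
sizes = {name: len(rows) for name, rows in groups.items()}
print(sizes)
```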
---
## Intended Uses
- **Music LLM fine-tuning** — teach models to reason about jazz and blues
- **Cultural history Q&A** — question answering over music history
- **Artist biography generation** — structured artist profiles
- **LoRA adapters** — lightweight music-domain adapters
- **Educational tools** — interactive music history tutors
- **Genre classification** — structured subgenre knowledge
## Out-of-Scope Uses
- Does not cover rock, soul, R&B, classical, or other genres in depth
- Not a substitute for a full musicology reference or discography database
---
## License
[Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
Provided by:
ChaoticEconomist



