five

ChaoticEconomist/Jazz-Blues-Music-Dataset_SFT-or-LoRA

收藏
Hugging Face2026-04-19 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/ChaoticEconomist/Jazz-Blues-Music-Dataset_SFT-or-LoRA
下载链接
链接失效反馈
官方服务:
资源简介:
--- language: - en license: cc-by-4.0 task_categories: - text-generation - question-answering tags: - jazz - blues - music-history - sft - lora - instruction-tuning - alpaca-format - music - cultural-heritage size_categories: - 1K<n<10K --- # Jazz & Blues Music Dataset (SFT / LoRA Ready) A structured dataset covering **82 iconic Jazz and Blues songs**, **21 artist profiles**, and **41 historical events**, expanded into **1,219 instruction-tuning rows** across 7 task types. Designed for fine-tuning LLMs on music knowledge, cultural history, artist biography, and domain-specific Q&A tasks. --- ## Overview | Property | Value | |------------------|------------------------------------| | Domain | Jazz & Blues Music | | Total rows | 1,219 | | Train split | 1,036 (85%) | | Validation split | 91 (~7.5%) | | Test split | 92 (~7.5%) | | Format | CSV (Alpaca-style prompt format) | | Songs covered | 82 (41 Blues, 41 Jazz) | | Artists covered | 21 (11 Blues, 10 Jazz) | | History events | 41 (18 Blues, 23 Jazz) | | License | CC BY 4.0 | --- ## Row Types | Type | Count | Description | |-------------------------|-------|-------------| | `song_qa` | 656 | 8 Q&A pairs per song (artist, year, genre, key, tempo, description, significance, subgenre) | | `history_qa` | 205 | 5 Q&A pairs per historical event | | `listening_guide` | 82 | How to listen to and appreciate each song | | `artist_qa` | 126 | 6 Q&A pairs per artist (genre, origin, instruments, style, legacy, birth) | | `song_explanation` | 82 | Full structured explanation of each song | | `history_explanation` | 41 | Full structured explanation of each historical event | | `artist_profile` | 21 | Full artist biography and legacy | | `genre_comparison` | 6 | Side-by-side comparison of related subgenres | --- ## Columns | Column | Type | Description | |---------------|--------|-------------| | `instruction` | string | Task instruction for the model | | `input` | string | The question, song title, artist name, or event name | | `completion` | string | The expected model output | | `prompt` | string | Full Alpaca-style prompt (`### Instruction / ### Input / ### Response:`) | | `text` | string | `prompt + completion` — ready for SFT trainers | | `genre` | string | `jazz` or `blues` | | `subgenre` | string | Musical subgenre, historical era, or `artist_profile` / `genre_comparison` | | `difficulty` | string | `beginner`, `intermediate`, or `advanced` | | `row_type` | string | One of the 7 row types above | | `source` | string | Always `jazz_blues` | | `split` | string | `train`, `validation`, or `test` | --- ## Genres & Subgenres ### Blues Subgenres | Subgenre | Key Artists | |-----------------------|-------------| | `delta_blues` | Robert Johnson, Elmore James, Robert Petway | | `chicago_blues` | Muddy Waters, Howlin' Wolf, Jimmy Reed, Junior Wells | | `electric_blues` | B.B. King, Shake Your Moneymaker | | `texas_blues` | Stevie Ray Vaughan | | `west_coast_blues` | T-Bone Walker, Lowell Fulson | | `boogie_blues` | John Lee Hooker | | `soul_blues` | Albert King, Etta James, Bobby 'Blue' Bland | | `country_blues` | Big Bill Broonzy | | `rhythm_and_blues` | Big Mama Thornton | | `classic_blues` | Bessie Smith, Mamie Smith | ### Jazz Subgenres | Subgenre | Key Artists | |-----------------------|-------------| | `new_orleans_jazz` | Louis Armstrong, Jelly Roll Morton | | `swing_era` | Duke Ellington, Benny Goodman, Billie Holiday | | `bebop` | Charlie Parker, Dizzy Gillespie, Thelonious Monk | | `cool_jazz` | Miles Davis, Dave Brubeck, Chet Baker | | `hard_bop` | Herbie Hancock, Art Blakey, Bobby Timmons | | `modal_jazz` | Miles Davis (Kind of Blue), John Coltrane | | `post_bop` | Wayne Shorter, Miles Davis (second quintet), Bill Evans | | `spiritual_jazz` | John Coltrane (A Love Supreme) | | `jazz_fusion` | Weather Report, Herbie Hancock (Head Hunters) | | `jazz_standard` | Autumn Leaves, Summertime, Body and Soul, Cherokee | | `traditional_jazz` | Louis Armstrong | | `free_jazz` | Ornette Coleman, Cecil Taylor | --- ## Difficulty Levels | Level | Criteria | |----------------|----------| | `beginner` | Foundational songs, early history, and well-known artists | | `intermediate` | Core genre developments, Chicago blues, bebop, cool jazz, hard bop | | `advanced` | Modal jazz, free jazz, fusion, spiritual jazz, post-bop | --- ## Genre Comparisons Included - Delta Blues vs Chicago Blues - Bebop vs Cool Jazz - Hard Bop vs Modal Jazz - Texas Blues vs West Coast Blues - Jazz Fusion vs Post-Bop - Soul Blues vs Rhythm and Blues --- ## Prompt Format All rows use the **Alpaca instruction format**: ``` ### Instruction: Explain the following jazz or blues song, including its artist, year, genre, musical characteristics, and historical significance. ### Input: "So What" by Miles Davis ### Response: **So What** — Miles Davis (1959) **Genre:** Jazz / Modal Jazz **Key:** D Dorian | **Tempo:** Slow **Description:** The opening track of Kind of Blue; built on two modal scales rather than complex chord changes, giving improvisers vast melodic freedom. **Historical Significance:** Defined modal jazz and remains the most-streamed jazz recording of all time. ``` --- ## Usage ### Load with 🤗 Datasets ```python from datasets import load_dataset ds = load_dataset("csv", data_files={ "train": "jazz_blues_data/train.csv", "validation": "jazz_blues_data/validation.csv", "test": "jazz_blues_data/test.csv", }) ``` ### Fine-tune with TRL SFTTrainer ```python from trl import SFTTrainer trainer = SFTTrainer( model=model, train_dataset=ds["train"], dataset_text_field="text", ... ) ``` ### Filter examples ```python # Only blues rows blues = ds["train"].filter(lambda x: x["genre"] == "blues") # Only artist profiles profiles = ds["train"].filter(lambda x: x["row_type"] == "artist_profile") # Only advanced jazz history adv_jazz_hist = ds["train"].filter( lambda x: x["genre"] == "jazz" and x["row_type"] == "history_explanation" and x["difficulty"] == "advanced" ) # Listening guides only guides = ds["train"].filter(lambda x: x["row_type"] == "listening_guide") ``` --- ## Files | File | Description | |----------------------------------|-------------| | `jazz_blues_dataset.csv` | Full dataset (1,219 rows) | | `jazz_blues_data/train.csv` | Training split (1,036 rows, 85%) | | `jazz_blues_data/validation.csv` | Validation split (91 rows, ~7.5%) | | `jazz_blues_data/test.csv` | Test split (92 rows, ~7.5%) | | `gen_jazz_blues.py` | Dataset generator script | | `jazz_blues_songs.py` | Source knowledge base | --- ## Intended Uses - **Music LLM fine-tuning** — teach models to reason about jazz and blues - **Cultural history Q&A** — question answering over music history - **Artist biography generation** — structured artist profiles - **LoRA adapters** — lightweight music-domain adapters - **Educational tools** — interactive music history tutors - **Genre classification** — structured subgenre knowledge ## Out-of-Scope Uses - Does not cover rock, soul, R&B, classical, or other genres in depth - Not a substitute for a full musicology reference or discography database --- ## License [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
提供机构:
ChaoticEconomist
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作