SongFormDB
Source: ModelScope Community (updated 2025-12-04; indexed 2025-09-27)
Download link:
https://modelscope.cn/datasets/ASLP-lab/SongFormDB
Resource summary:
# SongFormDB 🎵
[English | [中文](README_ZH.md)]
**A Large-Scale Multilingual Music Structure Analysis Dataset for Training [SongFormer](https://huggingface.co/ASLP-lab/SongFormer) 🚀**
<div align="center">


[arXiv Paper](https://arxiv.org/abs/2510.02797)
[GitHub Code](https://github.com/ASLP-lab/SongFormer)
[Hugging Face Space Demo](https://huggingface.co/spaces/ASLP-lab/SongFormer)
[Hugging Face Model](https://huggingface.co/ASLP-lab/SongFormer)
[Dataset: SongFormDB](https://huggingface.co/datasets/ASLP-lab/SongFormDB)
[Benchmark: SongFormBench](https://huggingface.co/datasets/ASLP-lab/SongFormBench)
[Discord](https://discord.gg/p5uBryC4Zs)
[Lab: ASLP@NPU](http://www.npu-aslp.org/)
</div>
<div align="center">
<h3>
Chunbo Hao<sup>1*</sup>, Ruibin Yuan<sup>2,5*</sup>, Jixun Yao<sup>1</sup>, Qixin Deng<sup>3,5</sup>,<br>Xinyi Bai<sup>4,5</sup>, Wei Xue<sup>2</sup>, Lei Xie<sup>1†</sup>
</h3>
<p>
<sup>*</sup>Equal contribution <sup>†</sup>Corresponding author
</p>
<p>
<sup>1</sup>Audio, Speech and Language Processing Group (ASLP@NPU),<br>Northwestern Polytechnical University<br>
<sup>2</sup>Hong Kong University of Science and Technology<br>
<sup>3</sup>Northwestern University<br>
<sup>4</sup>Cornell University<br>
<sup>5</sup>Multimodal Art Projection (M-A-P)
</p>
</div>
---
## 🌟 What is SongFormDB?
SongFormDB is a **large-scale, multilingual dataset** for Music Structure Analysis (MSA). It serves as the training set for our state-of-the-art SongFormer model and combines three subsets (HX, Hook, and Gem) totaling about 11,000 songs, giving MSA research a scale and diversity not previously available.
---
## ✨ Key Highlights
### 🎯 **Three Powerful Subsets**
#### 🎸 **SongForm-HX (HX)** - *Precision & Quality*
- ✅ **Rule-corrected HarmonixSet** with improved annotation accuracy
- 🎛️ **Custom BigVGAN vocoder** trained on internal data for superior mel spectrogram reconstruction
- 📊 **Unified train/validation/test splits** for consistent evaluation
#### 🎵 **SongForm-Hook (H)** - *Scale & Diversity*
- 🎼 **5,933 songs** with precise structural annotations
- 🌍 Helps improve the model's **generalization ability**
#### 💎 **SongForm-Gem (G)** - *Global Coverage*
- 🌐 **47 different languages** for true multilingual coverage
- 🎶 **Diverse BPMs and musical styles** for comprehensive training
- 🤖 **Gemini-annotated** with strong performance on ACC and HR3F metrics
- 🎯 **4,387 high-quality songs** with music structure annotations
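HR3F, mentioned above, is the boundary hit-rate F-measure with a ±3 s tolerance (HR.5F uses ±0.5 s). For readers unfamiliar with it, the following is a minimal sketch that scores estimated boundary times against reference boundary times using a maximum one-to-one matching, similar in spirit to `mir_eval.segment.detection`. The function names are hypothetical, and the evaluation code used for SongFormer may differ in details such as whether the first and last boundaries are trimmed.

```python
from typing import Dict, List, Sequence, Set


def _max_hits(ref: Sequence[float], est: Sequence[float], window: float) -> int:
    """Size of a maximum one-to-one matching with |ref[i] - est[j]| <= window."""
    # For every reference boundary, list the estimated boundaries within the window
    candidates: List[List[int]] = [
        [j for j, e in enumerate(est) if abs(r - e) <= window] for r in ref
    ]
    owner: Dict[int, int] = {}  # estimated index -> matched reference index

    def augment(i: int, visited: Set[int]) -> bool:
        # Kuhn's algorithm: look for an augmenting path starting at reference boundary i
        for j in candidates[i]:
            if j in visited:
                continue
            visited.add(j)
            if j not in owner or augment(owner[j], visited):
                owner[j] = i
                return True
        return False

    return sum(1 for i in range(len(ref)) if augment(i, set()))


def hit_rate_f(ref: Sequence[float], est: Sequence[float], window: float = 3.0) -> float:
    """Boundary hit-rate F-measure: window=3.0 gives HR3F, window=0.5 gives HR.5F."""
    if not ref or not est:
        return 0.0
    hits = _max_hits(ref, est, window)
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


print(hit_rate_f([0.0, 15.2, 42.8, 70.1], [0.5, 17.0, 50.0, 71.0]))  # → 0.75
```

The matching step matters: a greedy nearest-neighbor assignment can undercount hits when two reference boundaries compete for the same estimate, whereas the augmenting-path search always finds the largest valid pairing.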
---
## 📊 Dataset Composition
### 🎸 SongForm-HX (HX) - 712 Songs
Enhanced HarmonixSet with rule-based corrections and unified evaluation protocol.
**Data Location:** `data/HX/SongFormDB-HX.jsonl`
| Field | Description |
|-------|-------------|
| `id` | Unique song identifier |
| `youtube_url` | Original YouTube source (⚠️ Note: May differ from HarmonixSet audio) |
| `split` | Dataset split (`train`/`val`) |
| `subset` | Always "HX" |
| `duration` | Total song duration in seconds |
| `mel_path` | Path to mel spectrogram file |
| `label_path` | Path to structural annotation file |
| `labels` | JSON-formatted structural information |
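A minimal sketch for reading the HX JSON Lines file and grouping songs by `split`. The exact structure inside `labels` is not documented beyond "JSON-formatted", so the sketch assumes it may arrive either as a JSON-encoded string or as an already-decoded object; the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def group_by_split(lines: Iterable[str]) -> Dict[str, List[Dict[str, Any]]]:
    """Parse JSON Lines text and group SongForm-HX records by their `split` field."""
    by_split: Dict[str, List[Dict[str, Any]]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        # Assumption: `labels` may be stored as a JSON-encoded string; decode it if so
        if isinstance(rec.get("labels"), str):
            rec["labels"] = json.loads(rec["labels"])
        by_split.setdefault(rec["split"], []).append(rec)
    return by_split


def load_hx(path: Path) -> Dict[str, List[Dict[str, Any]]]:
    """Read data/HX/SongFormDB-HX.jsonl (or any file with the same schema)."""
    with path.open(encoding="utf-8") as f:
        return group_by_split(f)
```

For example, `load_hx(Path("data/HX/SongFormDB-HX.jsonl"))` returns a dict keyed by the split values found in the file, such as `train` and `val`.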
### 🎵 SongForm-Hook (H) - 5,933 Songs
Large-scale dataset with precise structural annotations for enhanced generalization.
**Data Location:** `data/Hook/SongFormDB-Hook.jsonl`
| Field | Description |
|-------|-------------|
| `id` | Unique song identifier |
| `youtube_url` | YouTube source URL |
| `split` | Always `train` |
| `subset` | Always "Hook" |
| `duration` | Total song duration |
| `mel_path` | Mel spectrogram file path |
| `start` | Segment start time |
| `end` | Segment end time |
| `label` | List of structural labels for this segment |
**⚠️ Important Notes:**
- Each row corresponds to a structurally annotated segment
- One song may have multiple annotation rows
- Labels are provided as lists (multi-label support)
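Because SongForm-Hook stores one row per annotated segment, a common first step is to regroup the rows by song. A minimal sketch, assuming each row has already been parsed from `data/Hook/SongFormDB-Hook.jsonl` into a dict with the `id`, `start`, `end`, and `label` fields listed above; the function name and the label values in the usage note are illustrative only.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def segments_per_song(rows: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Collect SongForm-Hook segment rows into one time-ordered list per song `id`."""
    songs: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        songs[row["id"]].append(
            # `label` is a list, so a segment can carry several structural labels
            {"start": row["start"], "end": row["end"], "label": list(row["label"])}
        )
    for segs in songs.values():
        segs.sort(key=lambda s: s["start"])  # order each song's segments by start time
    return dict(songs)
```

For instance, two rows for the same `id` with labels `["verse"]` and `["chorus"]` end up in a single list sorted by `start`, while a row labeled `["intro", "verse"]` keeps both labels.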
### 💎 SongForm-Gem (G) - 4,387 Songs
Globally diverse dataset with Gemini-powered annotations across 47 languages.
**Data Location:** `data/Gem/SongFormDB-Gem.jsonl`
**⚠️ Important Notes:**
- Some YouTube links may be inactive, so the number of samples actually available is slightly lower than listed.
- The format is similar to SongForm-HX.
- The YouTube URLs point to the audio that was actually used.
- Gaps between segments are labeled `NO_LABEL` because of Gemini's limited time resolution.
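One way to handle the `NO_LABEL` gaps during training is to exclude them from the loss. The sketch below is an assumption, not necessarily what SongFormer does: it takes already-extracted `(start, end, label)` triples (the Gem field names are not documented here), converts them to per-frame class ids at a hypothetical `frame_rate`, and marks `NO_LABEL` or uncovered frames with `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. `label_to_id` is likewise a hypothetical mapping.

```python
from typing import Dict, List, Sequence, Tuple

IGNORE_INDEX = -100  # chosen to match PyTorch CrossEntropyLoss's default ignore_index


def frame_targets(
    segments: Sequence[Tuple[float, float, str]],
    label_to_id: Dict[str, int],
    duration: float,
    frame_rate: float = 1.0,
) -> List[int]:
    """Convert (start, end, label) segments to per-frame class ids.

    Frames inside `NO_LABEL` segments, or not covered by any segment,
    keep IGNORE_INDEX so they can be excluded from the training loss.
    """
    n_frames = int(round(duration * frame_rate))
    targets = [IGNORE_INDEX] * n_frames
    for start, end, label in segments:
        if label == "NO_LABEL":
            continue  # leave gap frames masked
        # Round segment edges to the nearest frame index and clamp to the valid range
        first = max(0, int(round(start * frame_rate)))
        last = min(n_frames, int(round(end * frame_rate)))
        for i in range(first, last):
            targets[i] = label_to_id[label]
    return targets
```

Masking keeps the model from being trained to predict a "gap" class that only reflects the annotator's time resolution rather than the music.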
---
## 🚀 Quick Start
### Download Options
You can speed up the download by skipping the `mels` folder and downloading only the parts you need.
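As a sketch of selective downloading, the snippet below uses `huggingface_hub.snapshot_download` against the Hugging Face mirror linked at the top of this page (`ASLP-lab/SongFormDB`). The location of the `mels` folder is an assumption (repository root or nested one level or more), so the ignore patterns cover both cases; the helper names are hypothetical.

```python
from typing import List

REPO_ID = "ASLP-lab/SongFormDB"  # Hugging Face dataset repo linked in the badges above


def ignore_patterns(skip_mels: bool = True) -> List[str]:
    """Glob patterns excluded from the download.

    Assumption: mel files live in a folder named `mels`, either at the
    repository root or nested under another directory.
    """
    return ["mels/*", "*/mels/*"] if skip_mels else []


def download_songformdb(local_dir: str, skip_mels: bool = True) -> str:
    """Download a snapshot of the dataset repo and return the local path."""
    # Imported lazily; requires `pip install huggingface_hub`
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
        ignore_patterns=ignore_patterns(skip_mels),
    )
```

Calling `download_songformdb("SongFormDB")` fetches the annotation files; pass `skip_mels=False` later if you decide you need the mel spectrograms for vocoder reconstruction.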
### Getting the Audio Files
The dataset provides annotations (and mel spectrograms) but no audio. To obtain the audio, follow the instructions for each subset:
#### SongForm-HX
You have two options:
**Option 1 (Recommended): Audio Reconstruction**
- Use the mel-spectrograms provided in the official HarmonixSet dataset, which are also included in this repository.
- Follow the `Audio Reconstruction` steps described later in this document
**Option 2: YouTube Download**
- Download songs from YouTube using [*this list*](https://github.com/urinieto/harmonixset/blob/main/dataset/youtube_urls.csv)
- **Important:** Pay attention to the notes in brackets after each link
- YouTube versions may differ from the original HarmonixSet
- If needed, you can align the audio using: [*Reference code*](https://github.com/urinieto/harmonixset/blob/main/notebooks/Audio%20Alignment.ipynb) and mel-spectrograms from the HarmonixSet README
- **Note:** Alignment may cause audio discontinuities, so Option 1 is preferred
#### SongForm-Hook (H) and SongForm-Gem (G)
Choose either method:
- **Direct download from YouTube** (better quality)
- **Use a vocoder** to reconstruct from mel-spectrograms (may have lower quality)
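For the direct-download route, a sketch using the third-party `yt-dlp` downloader is shown below; the dataset authors do not name a specific tool, so `yt-dlp`, its options, and the `<id>.<ext>` output naming are assumptions. Since SongForm-Hook has several rows per song, URLs are de-duplicated first, and failed downloads (for example, inactive links) are returned so those songs can fall back to vocoder reconstruction.

```python
from typing import Any, Dict, Iterable, List, Tuple


def unique_sources(rows: Iterable[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return one (id, youtube_url) pair per song, preserving first-seen order."""
    seen = set()
    pairs: List[Tuple[str, str]] = []
    for row in rows:
        key = (row["id"], row["youtube_url"])
        if key not in seen:  # Hook repeats the same song across segment rows
            seen.add(key)
            pairs.append(key)
    return pairs


def download_audio(pairs: Iterable[Tuple[str, str]], out_dir: str) -> List[str]:
    """Fetch the best available audio stream per song; return ids that failed."""
    # Imported lazily; requires `pip install yt-dlp`
    import yt_dlp
    from yt_dlp.utils import DownloadError

    failed: List[str] = []
    for song_id, url in pairs:
        opts = {
            "format": "bestaudio/best",
            "outtmpl": f"{out_dir}/{song_id}.%(ext)s",  # hypothetical naming scheme
            "quiet": True,
        }
        try:
            with yt_dlp.YoutubeDL(opts) as ydl:
                ydl.download([url])
        except DownloadError:
            failed.append(song_id)  # e.g. an inactive link
    return failed
```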
---
## 🎼 Audio Reconstruction
If YouTube sources become unavailable, reconstruct audio using mel spectrograms:
### For SongForm-HX:
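```bash
# Assumes you are in the root of the downloaded SongFormDB repository
# 1. Clone the BigVGAN repository
git clone https://github.com/NVIDIA/BigVGAN.git
# 2. Enter the HarmonixSet reconstruction scripts
cd utils/HarmonixSet
# 3. Edit inference_e2e.sh so BIGVGAN_REPO_DIR points to the BigVGAN clone
# 4. Run mel-to-waveform inference
bash inference_e2e.sh
```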
### For SongForm-Hook & SongForm-Gem:
Use [bigvgan_v2_44khz_128band_256x](https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_256x):
```python
# Add BigVGAN to PYTHONPATH, then:
# See implementation in utils/CN/infer.py
```
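The reference implementation is `utils/CN/infer.py`. As a rough sketch of what vocoding one Hook or Gem mel file involves, the code below follows the loading pattern from the BigVGAN README (`bigvgan.BigVGAN.from_pretrained(..., use_cuda_kernel=False)`). It assumes mels are stored as 2-D `.npy` arrays computed with the same settings BigVGAN expects; the file format, orientation, output naming, and helper names are all assumptions, so check them against `infer.py` before relying on the output.

```python
import wave
from pathlib import Path
from typing import Any, Tuple

N_MELS = 128  # bigvgan_v2_44khz_128band_256x expects 128 mel bands
MODEL_ID = "nvidia/bigvgan_v2_44khz_128band_256x"


def needs_transpose(shape: Tuple[int, ...], n_mels: int = N_MELS) -> bool:
    """True if a 2-D mel array looks like (frames, n_mels) instead of (n_mels, frames)."""
    return len(shape) == 2 and shape[0] != n_mels and shape[1] == n_mels


def output_wav_path(mel_path: str, out_dir: str) -> Path:
    """Hypothetical naming scheme: <out_dir>/<mel file stem>.wav."""
    return Path(out_dir) / (Path(mel_path).stem + ".wav")


def load_vocoder(device: str = "cpu") -> Any:
    """Load the pretrained BigVGAN vocoder (the BigVGAN repo must be on PYTHONPATH)."""
    import bigvgan  # module from the cloned NVIDIA/BigVGAN repository

    model = bigvgan.BigVGAN.from_pretrained(MODEL_ID, use_cuda_kernel=False)
    model.remove_weight_norm()
    return model.eval().to(device)


def reconstruct(model: Any, mel_path: str, out_dir: str, device: str = "cpu") -> Path:
    """Vocode one mel spectrogram file into a 16-bit mono WAV file."""
    import numpy as np
    import torch

    mel = np.load(mel_path)  # assumption: a 2-D .npy array
    if needs_transpose(mel.shape):
        mel = mel.T
    # Add a batch dimension: [1, n_mels, frames]
    mel_t = torch.from_numpy(np.ascontiguousarray(mel)).float().unsqueeze(0).to(device)

    with torch.inference_mode():
        wav = model(mel_t).squeeze().cpu().numpy()  # model output is [1, 1, samples]

    pcm = (np.clip(wav, -1.0, 1.0) * 32767.0).astype("<i2")  # little-endian int16
    out_path = output_wav_path(mel_path, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(out_path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(model.h.sampling_rate)
        wf.writeframes(pcm.tobytes())
    return out_path
```

Loading the vocoder once with `load_vocoder("cuda")` and then calling `reconstruct(model, mel_path, "wavs", "cuda")` for each `mel_path` in a subset's JSONL file avoids reloading the weights per song; `mel_path` values may need to be resolved relative to the repository root.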
---
## 📈 Impact & Applications
- 🎯 **Enhanced MSA Performance:** Train more robust and accurate music structure analysis models
- 🌍 **Cross-lingual Music Understanding:** Analyze song structure across many languages (SongForm-Gem alone covers 47)
- 🎵 **Genre Adaptability:** Improve model generalization across diverse musical styles and genres
---
## 📚 Resources
- 📖 **Paper:** [arXiv:2510.02797](https://arxiv.org/abs/2510.02797)
- 🧑‍💻 **Model:** [SongFormer](https://huggingface.co/ASLP-lab/SongFormer)
- 📊 **Benchmark:** [SongFormBench](https://huggingface.co/datasets/ASLP-lab/SongFormBench)
- 💻 **Code:** [GitHub Repository](https://github.com/ASLP-lab/SongFormer)
---
## 🤝 Citation
```bibtex
@misc{hao2025songformer,
title = {SongFormer: Scaling Music Structure Analysis with Heterogeneous Supervision},
author = {Chunbo Hao and Ruibin Yuan and Jixun Yao and Qixin Deng and Xinyi Bai and Wei Xue and Lei Xie},
year = {2025},
eprint = {2510.02797},
archivePrefix = {arXiv},
primaryClass = {eess.AS},
url = {https://arxiv.org/abs/2510.02797}
}
```
---
## 📧 Contact & Support
🐛 **Issues?** Open an issue on our [GitHub repository](https://github.com/ASLP-lab/SongFormer)
📧 **Collaboration?** Contact us through GitHub
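Provider: maas
Created: 2025-09-15

Summary
SongFormDB is a comprehensive, large-scale, multilingual music structure analysis dataset. Its three subsets contain more than 10,000 songs in total, with high-quality annotations that help improve generalization, making it suitable for training advanced music structure analysis models.
The summary above was collected and generated by 遇见数据集.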



