
SongFormDB

ModelScope Community · Updated 2025-12-04 · Indexed 2025-09-27
Download link:
https://modelscope.cn/datasets/ASLP-lab/SongFormDB
Resource summary:
# SongFormDB 🎵

[English | [中文](README_ZH.md)]

**A Large-Scale Multilingual Music Structure Analysis Dataset for Training [SongFormer](https://huggingface.co/ASLP-lab/SongFormer) 🚀**

<div align="center">

![Python](https://img.shields.io/badge/Python-3.10-brightgreen) ![License](https://img.shields.io/badge/License-CC%20BY%204.0-lightblue) [![arXiv Paper](https://img.shields.io/badge/arXiv-2510.02797-blue)](https://arxiv.org/abs/2510.02797) [![GitHub](https://img.shields.io/badge/GitHub-SongFormer-black)](https://github.com/ASLP-lab/SongFormer) [![HuggingFace Space](https://img.shields.io/badge/HuggingFace-space-yellow)](https://huggingface.co/spaces/ASLP-lab/SongFormer) [![HuggingFace Model](https://img.shields.io/badge/HuggingFace-model-blue)](https://huggingface.co/ASLP-lab/SongFormer) [![Dataset SongFormDB](https://img.shields.io/badge/HF%20Dataset-SongFormDB-green)](https://huggingface.co/datasets/ASLP-lab/SongFormDB) [![Dataset SongFormBench](https://img.shields.io/badge/HF%20Dataset-SongFormBench-orange)](https://huggingface.co/datasets/ASLP-lab/SongFormBench) [![Discord](https://img.shields.io/badge/Discord-join%20us-purple?logo=discord&logoColor=white)](https://discord.gg/p5uBryC4Zs) [![lab](https://img.shields.io/badge/🏫-ASLP-grey?labelColor=lightgrey)](http://www.npu-aslp.org/)

</div>

<div align="center">
<h3>
Chunbo Hao<sup>1*</sup>, Ruibin Yuan<sup>2,5*</sup>, Jixun Yao<sup>1</sup>, Qixin Deng<sup>3,5</sup>,<br>Xinyi Bai<sup>4,5</sup>, Wei Xue<sup>2</sup>, Lei Xie<sup>1†</sup>
</h3>
<p>
<sup>*</sup>Equal contribution &nbsp;&nbsp; <sup>†</sup>Corresponding author
</p>
<p>
<sup>1</sup>Audio, Speech and Language Processing Group (ASLP@NPU),<br>Northwestern Polytechnical University<br>
<sup>2</sup>Hong Kong University of Science and Technology<br>
<sup>3</sup>Northwestern University<br>
<sup>4</sup>Cornell University<br>
<sup>5</sup>Multimodal Art Projection (M-A-P)
</p>
</div>

---

## 🌟 What is SongFormDB?
SongFormDB is a **comprehensive, large-scale, multilingual dataset** for Music Structure Analysis (MSA). It is the training foundation for our state-of-the-art SongFormer model and offers MSA research an unprecedented combination of scale and diversity.

---

## ✨ Key Highlights

### 🎯 **Three Powerful Subsets**

#### 🎸 **SongForm-HX (HX)** - *Precision & Quality*

- ✅ **Rule-corrected HarmonixSet** with improved annotation accuracy
- 🎛️ **Custom BigVGAN vocoder** trained on internal data for better mel-spectrogram reconstruction
- 📊 **Unified train/validation/test splits** for consistent evaluation

#### 🎵 **SongForm-Hook (H)** - *Scale & Diversity*

- 🎼 **5,933 songs** with precise structural annotations
- 🌍 Helps improve the model's **generalization ability**

#### 💎 **SongForm-Gem (G)** - *Global Coverage*

- 🌐 **47 languages** for true multilingual coverage
- 🎶 **Diverse BPMs and musical styles** for comprehensive training
- 🤖 **Annotated by Gemini**, with strong performance on the ACC and HR3F metrics
- 🎯 **4,387 high-quality songs** with music structure annotations

---

## 📊 Dataset Composition

### 🎸 SongForm-HX (HX) - 712 Songs

An enhanced HarmonixSet with rule-based corrections and a unified evaluation protocol.

**Data Location:** `data/HX/SongFormDB-HX.jsonl`

| Field | Description |
|-------|-------------|
| `id` | Unique song identifier |
| `youtube_url` | Original YouTube source (⚠️ Note: may differ from the HarmonixSet audio) |
| `split` | Dataset split (`train`/`val`) |
| `subset` | Always "HX" |
| `duration` | Total song duration in seconds |
| `mel_path` | Path to the mel-spectrogram file |
| `label_path` | Path to the structural annotation file |
| `labels` | Structural information in JSON format |

### 🎵 SongForm-Hook (H) - 5,933 Songs

A large-scale dataset with precise structural annotations for better generalization.
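The Hook subset is stored as JSONL with one row per annotated segment (fields `id`, `duration`, `mel_path`, `start`, `end`, `label`), so a single song usually spans several rows. The following is a minimal sketch, using only the Python standard library, that reads a JSONL file and regroups Hook rows into per-song, time-sorted segment lists. The helper names `read_jsonl` and `group_hook_segments` are hypothetical, and the assumption that `start` and `end` are numeric times is ours; check it against the actual file contents.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def group_hook_segments(rows: Iterable[dict]) -> Dict[str, dict]:
    """Regroup per-segment SongForm-Hook rows into one entry per song.

    Hypothetical helper. Assumes each row has the Hook fields `id`,
    `duration`, `mel_path`, `start`, `end` and `label` (a list of labels),
    and that `start`/`end` are numeric times.
    """
    songs: Dict[str, dict] = {}
    for row in rows:
        # Create the song entry the first time its id is seen.
        song = songs.setdefault(
            row["id"],
            {"duration": row["duration"], "mel_path": row["mel_path"], "segments": []},
        )
        # Keep every label of the segment (multi-label support).
        song["segments"].append(
            (float(row["start"]), float(row["end"]), list(row["label"]))
        )
    # Rows of one song are not assumed to be in time order, so sort them.
    for song in songs.values():
        song["segments"].sort(key=lambda seg: seg[0])
    return songs
```

Typical usage would be `songs = group_hook_segments(read_jsonl("data/Hook/SongFormDB-Hook.jsonl"))`. Keeping `label` as a list, rather than collapsing it to a single string, preserves the multi-label annotations.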
**Data Location:** `data/Hook/SongFormDB-Hook.jsonl`

| Field | Description |
|-------|-------------|
| `id` | Unique song identifier |
| `youtube_url` | YouTube source URL |
| `split` | Always `train` |
| `subset` | Always "Hook" |
| `duration` | Total song duration |
| `mel_path` | Path to the mel-spectrogram file |
| `start` | Segment start time |
| `end` | Segment end time |
| `label` | List of structural labels for this segment |

**⚠️ Important Notes:**

- Each row corresponds to one structurally annotated segment
- One song may span multiple annotation rows
- Labels are provided as lists (multi-label support)

### 💎 SongForm-Gem (G) - 4,387 Songs

A globally diverse dataset with Gemini-generated annotations across 47 languages.

**Data Location:** `data/Gem/SongFormDB-Gem.jsonl`

**⚠️ Important Notes:**

- Some YouTube links may be inactive, so slightly fewer samples may actually be available
- The format is similar to SongForm-HX
- The YouTube URLs point to the audio that was actually used
- Because of Gemini's limited time resolution, gaps between segments are labeled `NO_LABEL`

---

## 🚀 Quick Start

### Download Options

To speed up the download, you can skip the `mels` folder and download only the parts you need.

### Getting the Audio Files

The dataset does not include audio files. To obtain the audio, follow the instructions for each subset:

#### SongForm-HX

You have two options:

**Option 1 (Recommended): Audio Reconstruction**

- Use the mel-spectrograms provided by the official HarmonixSet dataset, which are also included in this repository
- Follow the `Audio Reconstruction` steps described later in this document

**Option 2: YouTube Download**

- Download the songs from YouTube using [*this list*](https://github.com/urinieto/harmonixset/blob/main/dataset/youtube_urls.csv)
- **Important:** Pay attention to the notes in brackets after each link
- The YouTube versions may differ from the original HarmonixSet audio
- If needed, align the audio using this [*reference code*](https://github.com/urinieto/harmonixset/blob/main/notebooks/Audio%20Alignment.ipynb) together with the mel-spectrograms referenced in the HarmonixSet README
- **Note:** Alignment may introduce audio discontinuities, so Option 1 is preferred

#### SongForm-Hook (H) and SongForm-Gem (G)

Choose either method:

- **Download directly from YouTube** (better quality)
- **Use a vocoder** to reconstruct the audio from the mel-spectrograms (possibly lower quality)

---

## 🎼 Audio Reconstruction

If the YouTube sources become unavailable, reconstruct the audio from the mel-spectrograms:

### For SongForm-HX:

```bash
# Clone the BigVGAN repository
git clone https://github.com/NVIDIA/BigVGAN.git

# Switch to the HarmonixSet reconstruction scripts
cd utils/HarmonixSet

# Set BIGVGAN_REPO_DIR in inference_e2e.sh to your BigVGAN clone, then run:
bash inference_e2e.sh
```

### For SongForm-Hook & SongForm-Gem:

Use [bigvgan_v2_44khz_128band_256x](https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_256x):

```python
# Add the BigVGAN repository to PYTHONPATH, then run the reconstruction;
# see utils/CN/infer.py for the implementation.
```

---

## 📈 Impact & Applications

- 🎯 **Enhanced MSA Performance:** Train more robust and accurate music structure analysis models
- 🌍 **Cross-lingual Music Understanding:** Enable music structure analysis across many languages
- 🎵 **Genre Adaptability:** Improve model generalization across diverse musical styles and genres

---

## 📚 Resources

- 📖 **Paper:** [arXiv:2510.02797](https://arxiv.org/abs/2510.02797)
- 🧑‍💻 **Model:** [SongFormer](https://huggingface.co/ASLP-lab/SongFormer)
- 📊 **Benchmark:** [SongFormBench](https://huggingface.co/datasets/ASLP-lab/SongFormBench)
- 💻 **Code:** [GitHub Repository](https://github.com/ASLP-lab/SongFormer)

---

## 🤝 Citation

```bibtex
@misc{hao2025songformer,
  title         = {SongFormer: Scaling Music Structure Analysis with Heterogeneous Supervision},
  author        = {Chunbo Hao and Ruibin Yuan and Jixun Yao and Qixin Deng and Xinyi Bai and Wei Xue and Lei Xie},
  year          = {2025},
  eprint        = {2510.02797},
  archivePrefix = {arXiv},
  primaryClass  = {eess.AS},
  url           = {https://arxiv.org/abs/2510.02797}
}
```

---

## 📧 Contact & Support

🐛 **Issues?** Open an issue on our [GitHub repository](https://github.com/ASLP-lab/SongFormer)

📧 **Collaboration?** Contact us through GitHub

Provider: maas
Created: 2025-09-15
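Dataset Introduction
Background Overview
SongFormDB is a comprehensive, large-scale, multilingual music structure analysis dataset. It comprises three subsets totaling more than 10,000 songs, provides high-quality annotations that help improve model generalization, and is suited to training advanced music structure analysis models.
The above content was collected and summarized by 遇见数据集.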