SongFormDB

Name: SongFormDB
Creator: maas
Published: 2025-12-04 16:50:03
License: No description provided

ModelScope Community | Updated 2025-12-04 | Indexed 2025-09-27

Download link:

https://modelscope.cn/datasets/ASLP-lab/SongFormDB

Description:

# SongFormDB 🎵

[English ｜ [中文](README_ZH.md)]

**A Large-Scale Multilingual Music Structure Analysis Dataset for Training [SongFormer](https://huggingface.co/ASLP-lab/SongFormer) 🚀**

<div align="center">

![Python](https://img.shields.io/badge/Python-3.10-brightgreen) ![License](https://img.shields.io/badge/License-CC%20BY%204.0-lightblue) [![arXiv Paper](https://img.shields.io/badge/arXiv-2510.02797-blue)](https://arxiv.org/abs/2510.02797) [![GitHub](https://img.shields.io/badge/GitHub-SongFormer-black)](https://github.com/ASLP-lab/SongFormer) [![HuggingFace Space](https://img.shields.io/badge/HuggingFace-space-yellow)](https://huggingface.co/spaces/ASLP-lab/SongFormer) [![HuggingFace Model](https://img.shields.io/badge/HuggingFace-model-blue)](https://huggingface.co/ASLP-lab/SongFormer) [![Dataset SongFormDB](https://img.shields.io/badge/HF%20Dataset-SongFormDB-green)](https://huggingface.co/datasets/ASLP-lab/SongFormDB) [![Dataset SongFormBench](https://img.shields.io/badge/HF%20Dataset-SongFormBench-orange)](https://huggingface.co/datasets/ASLP-lab/SongFormBench) [![Discord](https://img.shields.io/badge/Discord-join%20us-purple?logo=discord&logoColor=white)](https://discord.gg/p5uBryC4Zs) [![lab](https://img.shields.io/badge/🏫-ASLP-grey?labelColor=lightgrey)](http://www.npu-aslp.org/)

</div>

<div align="center">
<h3>
Chunbo Hao<sup>1*</sup>, Ruibin Yuan<sup>2,5*</sup>, Jixun Yao<sup>1</sup>, Qixin Deng<sup>3,5</sup>, Xinyi Bai<sup>4,5</sup>, Wei Xue<sup>2</sup>, Lei Xie<sup>1†</sup>
</h3>

<sup>*</sup>Equal contribution &nbsp;&nbsp; <sup>†</sup>Corresponding author

<sup>1</sup>Audio, Speech and Language Processing Group (ASLP@NPU), Northwestern Polytechnical University<br>
<sup>2</sup>Hong Kong University of Science and Technology<br>
<sup>3</sup>Northwestern University<br>
<sup>4</sup>Cornell University<br>
<sup>5</sup>Multimodal Art Projection (M-A-P)

</div>

---

## 🌟 What is SongFormDB?

SongFormDB is a **comprehensive, large-scale, multilingual dataset** designed to advance Music Structure Analysis (MSA). It serves as the training foundation for our state-of-the-art SongFormer model. It combines three complementary subsets (HX, Hook, and Gem) to give MSA research greater scale and diversity.

---

## ✨ Key Highlights

### 🎯 **Three Powerful Subsets**

#### 🎸 **SongForm-HX (HX)** - *Precision & Quality*

- ✅ **Rule-corrected HarmonixSet** with improved annotation accuracy
- 🎛️ **Custom BigVGAN vocoder** trained on internal data for higher-quality audio reconstruction from mel spectrograms
- 📊 **Unified train/validation/test splits** for consistent evaluation

#### 🎵 **SongForm-Hook (H)** - *Scale & Diversity*

- 🎼 **5,933 songs** with precise structural annotations
- 🌍 Helps improve the model's **generalization ability**

#### 💎 **SongForm-Gem (G)** - *Global Coverage*

- 🌐 **47 different languages** for true multilingual coverage
- 🎶 **Diverse BPMs and musical styles** for comprehensive training
- 🤖 **Gemini-annotated**, with strong performance on the ACC and HR3F metrics
- 🎯 **4,387 high-quality songs** with music structure annotations

---

## 📊 Dataset Composition

### 🎸 SongForm-HX (HX) - 712 Songs

An enhanced HarmonixSet with rule-based corrections and a unified evaluation protocol.

**Data Location:** `data/HX/SongFormDB-HX.jsonl`

| Field | Description |
|-------|-------------|
| `id` | Unique song identifier |
| `youtube_url` | Original YouTube source (⚠️ may differ from the HarmonixSet audio) |
| `split` | Dataset split (`train`/`val`) |
| `subset` | Always `"HX"` |
| `duration` | Total song duration in seconds |
| `mel_path` | Path to the mel spectrogram file |
| `label_path` | Path to the structural annotation file |
| `labels` | Structural information in JSON format |

### 🎵 SongForm-Hook (H) - 5,933 Songs

A large-scale dataset with precise structural annotations that improves generalization.

**Data Location:** `data/Hook/SongFormDB-Hook.jsonl`

| Field | Description |
|-------|-------------|
| `id` | Unique song identifier |
| `youtube_url` | YouTube source URL |
| `split` | Always `train` |
| `subset` | Always `"Hook"` |
| `duration` | Total song duration |
| `mel_path` | Path to the mel spectrogram file |
| `start` | Segment start time |
| `end` | Segment end time |
| `label` | List of structural labels for this segment |

**⚠️ Important Notes:**

- Each row corresponds to one structurally annotated segment
- One song may span multiple annotation rows
- Labels are provided as lists (multi-label support)

### 💎 SongForm-Gem (G) - 4,387 Songs

A globally diverse dataset with Gemini-generated annotations across 47 languages.

**Data Location:** `data/Gem/SongFormDB-Gem.jsonl`

**⚠️ Important Notes:**

- Some YouTube links may be inactive, so slightly fewer samples are actually available
- The format is similar to SongForm-HX
- The YouTube URLs correspond to the audio actually used
- Gemini's time resolution is limited, so gaps between segments are labeled `NO_LABEL`

---

## 🚀 Quick Start

### Download Options

To speed up the download, skip the `mels` folder and download only the parts you need.

### Getting the Audio Files

The dataset provides annotations and mel spectrograms but no audio files. To obtain the audio, follow the instructions for each subset.

#### SongForm-HX

You have two options:

**Option 1 (Recommended): Audio Reconstruction**

- Use the mel spectrograms from the official HarmonixSet dataset, which are also included in this repository
- Follow the steps in the `Audio Reconstruction` section below

**Option 2: YouTube Download**

- Download the songs from YouTube using [*this list*](https://github.com/urinieto/harmonixset/blob/main/dataset/youtube_urls.csv)
- **Important:** Pay attention to the notes in brackets after each link
- YouTube versions may differ from the original HarmonixSet audio
- If needed, align the audio using the [*reference code*](https://github.com/urinieto/harmonixset/blob/main/notebooks/Audio%20Alignment.ipynb) together with the mel spectrograms described in the HarmonixSet README
- **Note:** Alignment may introduce audio discontinuities, so Option 1 is preferred

#### SongForm-Hook (H) and SongForm-Gem (G)

Choose either method:

- **Download directly from YouTube** (better quality)
- **Reconstruct from the mel spectrograms with a vocoder** (possibly lower quality)

---

## 🎼 Audio Reconstruction

If the YouTube sources become unavailable, you can reconstruct the audio from the mel spectrograms.

### For SongForm-HX:

```bash
# Clone the BigVGAN repository
git clone https://github.com/NVIDIA/BigVGAN.git

# From the SongFormDB repository root, enter the HarmonixSet utilities
cd utils/HarmonixSet

# Set BIGVGAN_REPO_DIR in inference_e2e.sh to the cloned BigVGAN path, then run:
bash inference_e2e.sh
```

### For SongForm-Hook & SongForm-Gem:

Use [bigvgan_v2_44khz_128band_256x](https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_256x):

```python
# Add the cloned BigVGAN repository to PYTHONPATH, then load the vocoder:
import bigvgan
model = bigvgan.BigVGAN.from_pretrained("nvidia/bigvgan_v2_44khz_128band_256x", use_cuda_kernel=False)
# See utils/CN/infer.py for the full mel-to-audio implementation
```

---

## 📈 Impact & Applications

- 🎯 **Enhanced MSA Performance:** Train more robust and accurate music structure analysis models
- 🌍 **Cross-lingual Music Understanding:** Enable multilingual music analysis that works across language barriers
- 🎵 **Genre Adaptability:** Strengthen model generalization across diverse musical styles and genres

---

## 📚 Resources

- 📖 **Paper:** [arXiv:2510.02797](https://arxiv.org/abs/2510.02797)
- 🧑‍💻 **Model:** [SongFormer](https://huggingface.co/ASLP-lab/SongFormer)
- 📊 **Benchmark:** [SongFormBench](https://huggingface.co/datasets/ASLP-lab/SongFormBench)
- 💻 **Code:** [GitHub Repository](https://github.com/ASLP-lab/SongFormer)

---

## 🤝 Citation

```bibtex
@misc{hao2025songformer,
  title         = {SongFormer: Scaling Music Structure Analysis with Heterogeneous Supervision},
  author        = {Chunbo Hao and Ruibin Yuan and Jixun Yao and Qixin Deng and Xinyi Bai and Wei Xue and Lei Xie},
  year          = {2025},
  eprint        = {2510.02797},
  archivePrefix = {arXiv},
  primaryClass  = {eess.AS},
  url           = {https://arxiv.org/abs/2510.02797}
}
```

---

## 📧 Contact & Support

🐛 **Issues?** Open an issue on our [GitHub repository](https://github.com/ASLP-lab/SongFormer)

📧 **Collaboration?** Contact us through GitHub
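---

In SongForm-Hook, each JSONL row is one annotated segment, and a single song can span several rows. The sketch below shows one way to regroup `data/Hook/SongFormDB-Hook.jsonl` into a time-ordered segment list per song. It relies only on the `id`, `start`, `end`, and `label` fields listed in the Hook table, and it assumes `start` and `end` are numeric. The function name `group_hook_segments` is hypothetical and not part of the repository.

```python
import json
from collections import defaultdict
from pathlib import Path


def group_hook_segments(jsonl_path):
    """Group SongForm-Hook segment rows by song id (hypothetical helper).

    Returns a dict that maps each song id to a list of
    (start, end, labels) tuples sorted by start time.
    """
    songs = defaultdict(list)
    with Path(jsonl_path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            row = json.loads(line)
            # `label` is a list because a segment may carry several labels
            songs[row["id"]].append(
                (float(row["start"]), float(row["end"]), list(row["label"]))
            )
    # Order each song's segments chronologically
    return {
        song_id: sorted(segs, key=lambda seg: (seg[0], seg[1]))
        for song_id, segs in songs.items()
    }
```

A typical call would be `songs = group_hook_segments("data/Hook/SongFormDB-Hook.jsonl")`. Each value is then one song's annotation timeline.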
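---

SongForm-Gem labels the gaps between Gemini's segments as `NO_LABEL`. A training pipeline may therefore prefer to exclude those spans from the loss rather than treat `NO_LABEL` as a real section class. The sketch below shows one common masking approach; it is not necessarily how SongFormer handles these gaps. It makes four assumptions:

- The segments have already been parsed into `(start, end, label)` tuples in seconds. The exact layout of the Gem `labels` field is not specified here.
- Each segment has a single string label.
- You supply the label vocabulary yourself.
- You choose the frame rate.

Masked frames receive -100, the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`. The function name is hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def segments_to_frame_targets(segments, duration, label_to_id, frame_rate=10.0):
    """Convert (start, end, label) segments in seconds into per-frame class ids.

    Frames are set to IGNORE_INDEX, so a loss function can skip them, when they
    are covered by NO_LABEL, by a label missing from label_to_id, or by no
    segment at all.
    """
    n_frames = round(duration * frame_rate)
    targets = [IGNORE_INDEX] * n_frames
    for start, end, label in segments:
        if label == "NO_LABEL" or label not in label_to_id:
            continue  # leave these frames masked
        # Round to the nearest frame to avoid float artifacts (0.3 * 10 != 3.0 exactly)
        first = max(0, round(start * frame_rate))
        last = min(n_frames, round(end * frame_rate))
        for i in range(first, last):
            targets[i] = label_to_id[label]  # later segments overwrite shared boundary frames
    return targets
```

Masking keeps Gemini's coarse timing from teaching the model that a `NO_LABEL` section exists. The trade-off is that frames near uncertain boundaries contribute no supervision.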
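---

Skipping the `mels` folder is the suggested way to speed up the download. The sketch below shows one way to do that with `huggingface_hub.snapshot_download` and its `ignore_patterns` argument. It makes two assumptions: the Hugging Face copy (`ASLP-lab/SongFormDB`, linked in the badges) has the same layout as the ModelScope page, and the mel folder sits at the repository root as `mels/`. The helper names are hypothetical. `is_skipped` lets you preview which files a pattern excludes, using the same fnmatch-style matching that `huggingface_hub` applies to these patterns.

```python
from fnmatch import fnmatch

REPO_ID = "ASLP-lab/SongFormDB"  # Hugging Face dataset repo from the badges above
SKIP_PATTERNS = ["mels/*"]       # assumption: the mel folder sits at the repo root


def is_skipped(path, patterns=SKIP_PATTERNS):
    """Return True if a repo-relative path matches any ignore pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def download_without_mels(local_dir="SongFormDB"):
    """Download annotations and utilities but skip the mel spectrograms."""
    # Imported lazily so the helper above works without huggingface_hub installed
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        ignore_patterns=SKIP_PATTERNS,
        local_dir=local_dir,
    )
```

If the mel folder turns out to live elsewhere, for example under `data/`, adjust `SKIP_PATTERNS`. Then check the new pattern with `is_skipped` before starting a large download.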
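---

The checkpoint named for SongForm-Hook and SongForm-Gem, `bigvgan_v2_44khz_128band_256x`, produces 44.1 kHz audio from 128-band mel spectrograms and generates 256 samples per mel frame. A stored mel with `n_frames` frames therefore decodes to `n_frames * 256 / 44100` seconds. That figure gives a quick sanity check against the `duration` field, assuming `duration` is in seconds as stated for HX.

The sketch below pairs that check with a vocoder call that follows BigVGAN's documented `from_pretrained` / `inference_mode` usage. It makes three assumptions:

- The mel files are `.npy` arrays of shape `[128, n_frames]`.
- The files are already in the log-mel format this checkpoint expects.
- A 0.5 s tolerance is acceptable.

`utils/CN/infer.py` remains the authoritative pipeline. The helper names are hypothetical.

```python
# Properties of the nvidia/bigvgan_v2_44khz_128band_256x checkpoint
SAMPLE_RATE = 44_100  # output sampling rate in Hz
HOP_SIZE = 256        # audio samples generated per mel frame ("256x")
N_MELS = 128          # mel bands expected by the vocoder ("128band")


def mel_frames_to_seconds(n_frames):
    """Duration of the audio the vocoder produces for n_frames mel frames."""
    return n_frames * HOP_SIZE / SAMPLE_RATE


def matches_duration(n_frames, duration, tolerance=0.5):
    """Compare a mel's implied length with a `duration` value in seconds."""
    return abs(mel_frames_to_seconds(n_frames) - duration) <= tolerance


def vocode_mel_file(mel_path, device="cpu"):
    """Reconstruct a waveform from a stored mel (assumed .npy, shape [128, T])."""
    # Heavy dependencies are imported lazily; `bigvgan` needs the cloned
    # BigVGAN repository on PYTHONPATH.
    import numpy as np
    import torch
    import bigvgan

    mel = np.load(mel_path)
    if mel.ndim != 2 or mel.shape[0] != N_MELS:
        raise ValueError(f"expected shape [{N_MELS}, n_frames], got {mel.shape}")

    model = bigvgan.BigVGAN.from_pretrained(
        "nvidia/bigvgan_v2_44khz_128band_256x", use_cuda_kernel=False
    )
    model.remove_weight_norm()
    model = model.eval().to(device)

    with torch.inference_mode():
        # Input [1, 128, T] -> output [1, 1, T * HOP_SIZE], values in [-1, 1]
        wav = model(torch.from_numpy(mel).float().unsqueeze(0).to(device))
    return wav.squeeze().cpu().numpy(), SAMPLE_RATE
```

Running `matches_duration(mel.shape[1], row["duration"])` before vocoding catches mismatched files early and avoids a full model load.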

Provider: maas
Created: 2025-09-15
