Sleeping-DISCO-9M
Source: ModelScope community · Updated 2025-12-05 · Indexed 2025-07-12
Download link:
https://modelscope.cn/datasets/sleeping-ai/Sleeping-DISCO-9M
Description:
# Sleeping-DISCO-9M
**Sleeping-DISCO-9M** is a large-scale foundation dataset for **generative music modeling**, featuring **9 million songs** along with associated metadata, lyric embeddings, and song IDs. Each ID links back to the original Genius page from which the data was sourced.
## 🔹 Dataset Structure
**Sleeping-DISCO** is split into two components:
### 1. Sleeping-DISCO-Public
- Metadata for 8.89M songs
- Lyric embeddings
- YouTube video links for each song
- YouTube video metadata
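
As an illustration of how the public components listed above might be used together, the sketch below ranks songs by cosine similarity of their lyric embeddings. It is not the dataset's official loader: the `PublicRecord` class, all of its field names, and the toy values are assumptions made for this example, and the real schema and embedding dimensionality may differ.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class PublicRecord:
    # Hypothetical record layout; the actual column names are not documented here.
    song_id: int                  # Genius song ID
    title: str                    # song title from the metadata
    lyric_embedding: List[float]  # lyric embedding vector
    youtube_url: str              # YouTube video link


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: PublicRecord, records: List[PublicRecord], k: int = 1) -> List[PublicRecord]:
    """Return the k records whose lyric embeddings are closest to the query's."""
    others = [r for r in records if r.song_id != query.song_id]
    others.sort(
        key=lambda r: cosine_similarity(query.lyric_embedding, r.lyric_embedding),
        reverse=True,
    )
    return others[:k]


# Toy records with made-up values, only to exercise the functions.
records = [
    PublicRecord(1, "Song A", [1.0, 0.0], "https://example.com/a"),
    PublicRecord(2, "Song B", [0.9, 0.1], "https://example.com/b"),
    PublicRecord(3, "Song C", [0.0, 1.0], "https://example.com/c"),
]

nearest = most_similar(records[0], records, k=1)
print(nearest[0].title)
```

A linear scan like this is only practical for small samples; at the scale of millions of songs, an approximate nearest-neighbor index would be the more realistic choice.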
### 2. Sleeping-DISCO-Private *(restricted)*
- Full lyrics
- Genius annotations
> ⚠️ Lyrics and annotations are **not included** in the public release. Access is available **only** to verified academic or research institutions for a limited period, upon request.
To request access, please email: **[sleeping4cat@gmail.com](mailto:sleeping4cat@gmail.com)**
## 📄 Paper
Read the first-version research paper on arXiv:
👉 [https://arxiv.org/abs/2506.14293](https://arxiv.org/abs/2506.14293)
A full version (arXiv and conference) is planned for release in **2026**.
## ⚖️ License
This dataset is released under the **Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)** license.
More details: [https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en](https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en)
- ✅ **Attribution required**
- 🚫 **Non-commercial use only**
- 🚫 **No derivatives or redistribution allowed** unless by the original authors.
> For academic access to the private subset, contact: **sleeping4cat@gmail.com**
Provider:
maas
Created:
2025-07-07



