
Sleeping-DISCO-9M

ModelScope community · Updated 2025-12-05 · Indexed 2025-07-12
Download link:
https://modelscope.cn/datasets/sleeping-ai/Sleeping-DISCO-9M
Description:
# Sleeping-DISCO-9M

**Sleeping-DISCO-9M** is a large-scale foundation dataset for **generative music modeling**, featuring **9 million songs** along with associated metadata, lyric embeddings, and song IDs. These IDs backlink to the original Genius pages, where the data was sourced.

## 🔹 Dataset Structure

**Sleeping-DISCO** is split into two components:

### 1. Sleeping-DISCO-Public

- Metadata for 8.89M songs
- Lyric embeddings
- YouTube video links for each song
- YouTube video metadata

### 2. Sleeping-DISCO-Private *(restricted)*

- Full lyrics
- Genius annotations

> ⚠️ Lyrics and annotations are **not included** in the public release. Access is available **only** to verified academic or research institutions for a limited period, upon request.

To request access, please email: **[sleeping4cat@gmail.com](mailto:sleeping4cat@gmail.com)**

## 📄 Paper

Read the first-version research paper on arXiv:
👉 [https://arxiv.org/abs/2506.14293](https://arxiv.org/abs/2506.14293)

A full arXiv + conference version will be released in **2026**.

## ⚖️ License

This dataset is released under the **Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)** license.

More details: [https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en](https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en)

- ✅ **Attribution required**
- 🚫 **Non-commercial use only**
- 🚫 **No derivatives or redistribution allowed** unless by the original authors.

> For academic access to the private subset, contact: **sleeping4cat@gmail.com**
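The public subset lists lyric embeddings alongside song IDs. As a minimal sketch of one way such embeddings might be used, the snippet below ranks songs by cosine similarity. The record layout (a dict mapping a song ID to a list of floats), the IDs `song_a` to `song_c`, the three-dimensional toy vectors, and the helper names are all hypothetical and chosen for illustration only; the dataset's actual schema and embedding dimension are not described here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, records, top_k=2):
    """Return up to top_k (song_id, score) pairs closest to query_id, best first."""
    query = records[query_id]
    scored = [
        (song_id, cosine_similarity(query, embedding))
        for song_id, embedding in records.items()
        if song_id != query_id  # skip the query song itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy records standing in for real lyric embeddings.
records = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [0.9, 0.1, 0.0],
    "song_c": [0.0, 1.0, 0.0],
}

for song_id, score in most_similar("song_a", records):
    print(song_id, round(score, 3))
```

A brute-force scan like this is only practical for small samples; at the scale of millions of songs, an approximate nearest-neighbor index would likely be needed instead.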

Provider:
maas
Created:
2025-07-07