Mureka-384K
Source: ModelScope community. Updated: 2026-01-02. Indexed: 2025-07-12.
Download link:
https://modelscope.cn/datasets/sleeping-ai/Mureka-384K
Description:
<h1 align='center'>Mureka-384K</h1>
<div align="center">
<img src="mureka.gif" alt="loading" width="100">
</div>
We are excited to share Mureka-384K, the world's first dataset of AI songs generated by a music reasoning model. It is the largest known dataset in its category to date.
### Why Mureka AI?
Mureka AI is a sister concern of Skyworks AI, which built the world's first music reasoning models, Mureka O1 and Mureka V6; the current version is V7. Mureka differs from SUNO and its counterparts in that it is a reasoning model that uses CoT (Chain of Thought) to remix and generate music. Its songs are nearly impossible to tell apart from real, human-made songs.
Although the corpus is smaller than SUNO and the other datasets in the Sleeping-Imagination family, it is the most exciting and impressive one.
### How many songs and which metadata do we provide?
We provide 384K songs with the following metadata:
1. `uuid`: used to trace each individual song
2. `song_id`: a unique identifier for each individual song
3. `title`: song title
4. `version`: version of the song
5. `duration in milliseconds`: length of the actual song
6. `generated_at`: when the song was made (timestamp)
7. `genres`: the style of the song
8. `moods`: mood tags that describe what the song actually feels like; we treat this as the primary field
9. `model`: model version number
10. `audio_url`: link to the audio
11. `video_url`: link to the video
We provide more metadata, but those fields are not relevant for training models or for research.
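As an illustration of how the fields listed above might be used, here is a minimal Python sketch that turns one metadata record into a typed object and filters songs by mood. The storage format is not specified in this card, so several details are assumptions: that each record is a dict, that the duration is stored under a key named `duration_ms`, that `generated_at` is a Unix timestamp in seconds, and that `genres` and `moods` are lists of strings. The `Song` class, the helper functions, and the sample record are hypothetical and are not part of the dataset.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Song:
    """Typed view of one metadata record (hypothetical helper)."""
    uuid: str
    song_id: str
    title: str
    duration_ms: int
    generated_at: datetime
    genres: list
    moods: list
    model: str
    audio_url: str

    @property
    def duration_seconds(self) -> float:
        # The card states that durations are given in milliseconds.
        return self.duration_ms / 1000.0


def parse_song(record: dict) -> Song:
    """Convert a raw record into a Song.

    Assumptions: the key "duration_ms" and a Unix-seconds "generated_at"
    are guesses; adjust them to the real schema.
    """
    return Song(
        uuid=record["uuid"],
        song_id=record["song_id"],
        title=record["title"],
        duration_ms=int(record["duration_ms"]),
        generated_at=datetime.fromtimestamp(record["generated_at"], tz=timezone.utc),
        genres=list(record.get("genres", [])),
        moods=list(record.get("moods", [])),
        model=record["model"],
        audio_url=record["audio_url"],
    )


def filter_by_mood(songs, mood, min_seconds=0.0):
    """Keep songs tagged with `mood` (case-insensitive) and at least `min_seconds` long."""
    wanted = mood.lower()
    return [
        s for s in songs
        if s.duration_seconds >= min_seconds
        and wanted in (m.lower() for m in s.moods)
    ]


# Made-up record for demonstration only; not taken from the dataset.
SAMPLE = {
    "uuid": "00000000-0000-0000-0000-000000000000",
    "song_id": "example-1",
    "title": "Example Song",
    "duration_ms": 185000,
    "generated_at": 1751846400,
    "genres": ["pop"],
    "moods": ["Calm", "dreamy"],
    "model": "V6",
    "audio_url": "https://example.com/audio.mp3",
}

if __name__ == "__main__":
    song = parse_song(SAMPLE)
    print(song.title, song.duration_seconds, song.generated_at.isoformat())
    print(len(filter_by_mood([song], "calm", min_seconds=60)))
```

Filtering on `moods` and duration before downloading anything from `audio_url` keeps the expensive step (fetching audio) limited to the songs you actually need.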
### Is this part of a paper?
Yes, we plan to include all these datasets under the Sleeping-Imagination initiative, which aims to provide the largest and most robust collection of music and its metadata.
### Is this a multilingual dataset?
Yes, it includes songs in Korean, Chinese, Japanese, and English.
### Ethics statement
We downloaded the data as responsible internet users and compiled it in accordance with local and EU laws, for scientific and research purposes.
### LICENCE
We are releasing this dataset under the restrictive CC BY-NC-ND 4.0 licence. That means:
1. Nobody is allowed to copy and share derivatives of this dataset.
2. You need the explicit permission of Sleeping AI to modify this dataset, and we limit such use to research.
### Acknowledgements
We want to thank one of our contributors, Azeem, who helped us identify this resource and gave us a few hints on compilation.
Provider:
maas
Created:
2025-07-07



