
LAION-DISCO-12M

ModelScope community. Updated 2025-11-27; indexed 2024-11-30.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/LAION-DISCO-12M
Description:
The LAION-DISCO-12M dataset contains 12M links to music on YouTube, collected with a methodology inspired by DISCO-10M. Starting from an initial seed list of artists, we discover new artists by recursively exploring the artists listed in each artist's "Fans might also like" section, and we keep exploring this related-artists graph for as long as new artists turn up. For each artist we extract metadata such as their name and number of subscribers, as well as a list of all of their songs and music videos. Importantly, each song or music video is associated with a YouTube URL (obtained from its ID). The collected metadata fields are: song_id, title, artist_names, artist_ids, album_name, album_id, isExplicit, views, duration.

The authors of DISCO-10M used a seed list of 18 artists, chosen to represent a variety of genres. However, we found that this is not sufficient for exploring the artist graph of YouTube Music: starting from this seed list, we discovered only 90,007 artists and 5,399,389 songs. We therefore compiled a larger seed list from the artists that appear on YouTube Music charts of top songs by country and in genre playlists. This yielded an initial list of 45,218 artists, and exploring the artist graph from this seed list resulted in 250,516 artists and 12,648,485 songs.

This work was inspired by [DISCO-10M](https://arxiv.org/abs/2306.13512); please consider citing it if you use this dataset.
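The crawl described above amounts to a breadth-first traversal of the related-artists graph that stops once no unseen artists remain. The Python sketch below illustrates it under stated assumptions; it is not the authors' implementation. `explore_artist_graph`, `related` and `SongRecord` are hypothetical names: `related` stands in for reading an artist's "Fans might also like" section (the source does not describe the scraping code), `SongRecord` simply mirrors the listed metadata fields, and the URL is assumed to be the standard `https://www.youtube.com/watch?v=` form built from `song_id`.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Assumption: song_id is a plain YouTube video ID, so the standard watch URL applies.
YOUTUBE_WATCH_PREFIX = "https://www.youtube.com/watch?v="


@dataclass
class SongRecord:
    """One crawled song (hypothetical type); field names mirror the listed metadata."""
    song_id: str
    title: str
    artist_names: List[str]  # assumed to be a list, one entry per credited artist
    artist_ids: List[str]
    album_name: str
    album_id: str
    isExplicit: bool
    views: int
    duration: int  # unit not stated in the source; seconds assumed here

    @property
    def url(self) -> str:
        # Derive the YouTube URL from the song/video ID.
        return YOUTUBE_WATCH_PREFIX + self.song_id


def explore_artist_graph(
    seeds: Iterable[str],
    related: Callable[[str], List[str]],
) -> List[str]:
    """Breadth-first traversal of the related-artists graph (hypothetical helper).

    `related` is a stand-in callback returning the artist IDs shown in an
    artist's "Fans might also like" section. Exploration ends when the queue
    holds no more unexpanded artists. Returns artists in discovery order.
    """
    seen = set()
    discovered: List[str] = []
    queue = deque()

    def visit(artist_id: str) -> None:
        # Record a newly found artist and schedule it for expansion.
        if artist_id not in seen:
            seen.add(artist_id)
            discovered.append(artist_id)
            queue.append(artist_id)

    for seed in seeds:
        visit(seed)
    while queue:
        for neighbour in related(queue.popleft()):
            visit(neighbour)
    return discovered


if __name__ == "__main__":
    # Toy related-artists graph; "x" and "y" are unreachable from "a".
    toy_graph: Dict[str, List[str]] = {
        "a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["e"], "e": [],
        "x": ["y"], "y": [],
    }

    def lookup(artist_id: str) -> List[str]:
        # Unknown artists have no related artists in the toy graph.
        return toy_graph.get(artist_id, [])

    print(explore_artist_graph(["a"], lookup))       # ['a', 'b', 'c', 'd', 'e']
    print(explore_artist_graph(["a", "x"], lookup))  # ['a', 'x', 'b', 'c', 'y', 'd', 'e']

    # Placeholder values for illustration only.
    song = SongRecord("VIDEO_ID_1", "Example", ["Artist A"], ["a"],
                      "Album", "album_1", False, 1000, 215)
    print(song.url)  # https://www.youtube.com/watch?v=VIDEO_ID_1
```

In the toy graph, seeding from "a" alone reaches 5 artists, while adding the otherwise unreachable seed "x" reaches 7: a small-scale analogue of why a larger, more diverse seed list uncovers more of the artist graph.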

Provider: maas
Created: 2024-11-23