
Music Streams Labelled with Listening Situation - [User/Track/Device/Situation] Dataset

Indexed by NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/5552287
Description:
This is a contextual music dataset in which each stream is labelled with its associated listening situation. Each stream consists of user, track, and device data, labelled with a situation. The data were collected from Deezer in France and Brazil during August 2019.

The dataset comprises 3 subsets of situations, corresponding to 4, 8, and 12 different situations. The situations are extracted by keyword matching against the associated playlist title in the Deezer catalog. The full set of situational tags is: "work, gym, party, sleep | morning, run, night, dance | car, train, relax, club". Each instance contains a track/user/device triplet and a situational tag indicating that the user listened to the track in the associated situation, together with the corresponding data received from the device.

The device data contain: "linear-time, linear-day, circular-time X, circular-time Y, circular-day X, circular-day Y, device-type, network-type". Users are represented as embeddings of their listening history, computed through matrix factorization of the user/track matrix. Additionally, users are represented by their demographic data: "age, country, gender".

The creation of the dataset and our experimental results are described in the paper: Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, and Gaël Richard. "Audio Autotagging as Proxy for Contextual Music Recommendation" [Under Revision]. The source code of the paper is available here: https://github.com/KarimMibrahim/Situational_Session_Generator.git

Each track is identified by its media_id, the ID of the track in the Deezer catalog. The 30-second track previews used to train the model in the paper can be accessed through the Deezer API: https://developers.deezer.com/api. Each user is represented by an anonymized user_id, which is associated with the user embedding available in the user_embeddings.npy file.
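The situation labels come from keyword matching on playlist titles, and the tag list is split by "|" into three groups of four. A minimal sketch of that idea follows, under two loudly stated assumptions: that the 4-, 8- and 12-situation subsets are nested (first group, first two groups, all three), and that a title is labelled only when exactly one tag appears in it as a whole word. The helpers `situation_vocabulary` and `match_situation` are hypothetical; the keyword lists and matching rules used for the actual dataset may differ (for example, to handle French and Portuguese titles).

```python
import re

# Tag groups as listed in the dataset description, separated by "|".
# ASSUMPTION: the 4-, 8- and 12-situation subsets are nested, i.e. built
# from the first one, two and three groups respectively.
TAG_GROUPS = [
    ["work", "gym", "party", "sleep"],
    ["morning", "run", "night", "dance"],
    ["car", "train", "relax", "club"],
]


def situation_vocabulary(n_situations):
    """Return the tag list for the 4-, 8- or 12-situation subset (assumed nesting)."""
    if n_situations not in (4, 8, 12):
        raise ValueError("subsets exist for 4, 8 and 12 situations only")
    groups = TAG_GROUPS[: n_situations // 4]
    return [tag for group in groups for tag in group]


def match_situation(playlist_title, vocabulary):
    """Hypothetical matcher: return the single tag that occurs as a whole word
    in the title (case-insensitive), or None when zero or several tags match."""
    words = set(re.findall(r"\w+", playlist_title.lower()))
    hits = [tag for tag in vocabulary if tag in words]
    return hits[0] if len(hits) == 1 else None
```

For example, `match_situation("Gym Motivation", situation_vocabulary(4))` returns `"gym"`, while "Morning Run Mix" is left unlabelled under the 8-situation vocabulary because it matches two tags.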
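The device data include both linear and circular encodings of time and day. The description does not define the circular features; a common convention, assumed here, is to place the hour of the day (period 24) and the day of the week (period 7) on the unit circle with cosine and sine, so that values on either side of the wrap-around stay close together. The units, the X = cos / Y = sin assignment, and the helper names are all assumptions.

```python
import math


def circular_encoding(value, period):
    """Map a periodic value onto the unit circle and return (x, y).

    ASSUMPTION: x = cos and y = sin of the angle 2*pi*value/period; the
    dataset description does not specify the exact convention or units.
    """
    angle = 2.0 * math.pi * (value % period) / period
    return math.cos(angle), math.sin(angle)


def encode_stream_time(hour, weekday):
    """Hypothetical helper: hour in [0, 24), weekday in [0, 7)."""
    time_x, time_y = circular_encoding(hour, 24.0)
    day_x, day_y = circular_encoding(weekday, 7.0)
    return {
        "circular-time X": time_x,
        "circular-time Y": time_y,
        "circular-day X": day_x,
        "circular-day Y": day_y,
    }
```

With this encoding, 23:00 and 01:00 lie about 0.52 apart on the unit circle, whereas noon and midnight are 2.0 apart; a linear hour feature would instead put 23:00 and 01:00 at opposite ends of its range.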
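The user embeddings are described as the result of matrix factorization of the user/track matrix, but the method is not named. As an illustration only, the sketch below swaps in a plain truncated SVD with NumPy on a small dense play-count matrix. The log scaling, the dense input, and the function name `user_embeddings_from_counts` are assumptions; the factorization behind user_embeddings.npy may differ.

```python
import numpy as np


def user_embeddings_from_counts(user_track, dim):
    """Factorize a dense user x track count matrix with a truncated SVD and
    return one `dim`-dimensional embedding per user (row i -> user i).

    Plain truncated SVD is used here only as an illustration of matrix
    factorization; it is not necessarily the method used for the dataset.
    """
    # Log-scaling dampens the influence of very heavy listeners (illustrative choice).
    m = np.log1p(np.asarray(user_track, dtype=float))
    u, s, _vt = np.linalg.svd(m, full_matrices=False)
    # Scale the leading left singular vectors by their singular values.
    return u[:, :dim] * s[:dim]
```

On a real catalog the user/track matrix is large and very sparse, so a sparse solver (for example `scipy.sparse.linalg.svds`) or an implicit-feedback factorization would be more practical than a dense SVD.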
Note: The index of the embeddings in the user_embeddings array corresponds to the user_id, i.e., the embedding of user_id = 100 is at user_embeddings[100].

Finally, the dataset also contains the splits used in our experiments. Each split was made under one of three conditions: ColdTrack (no overlap of tracks between the splits), ColdUser (no overlap of users between the splits), and WarmCase (overlaps allowed). Each condition is split into 4 subsets for cross-validation, each marked with a "fold" number.
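Because row i of the user_embeddings array holds the embedding of user_id i, per-stream user features can be gathered with NumPy integer indexing. A minimal sketch, assuming user_embeddings.npy stores a single 2-D array with one row per user; the helper names `load_user_embeddings` and `embeddings_for` are hypothetical.

```python
import numpy as np


def load_user_embeddings(path="user_embeddings.npy"):
    """Load the user embedding matrix; row i is the embedding of user_id i."""
    emb = np.load(path)
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {emb.shape}")
    return emb


def embeddings_for(user_ids, user_embeddings):
    """Gather one embedding per stream via integer (fancy) indexing,
    relying on the convention user_id == row index."""
    ids = np.asarray(user_ids, dtype=np.int64)
    if ids.size and (ids.min() < 0 or ids.max() >= len(user_embeddings)):
        raise IndexError("user_id outside the embedding table")
    return user_embeddings[ids]
```

Usage: `embeddings_for(stream_user_ids, load_user_embeddings())` returns an array with one row per stream, in the same order as `stream_user_ids`.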
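The three split conditions differ only in what may be shared across folds. The sketch below illustrates the constraint with a hypothetical `assign_folds` helper operating on streams represented as dicts with `user_id` and `media_id` keys: grouping by track gives ColdTrack-style folds, grouping by user gives ColdUser-style folds, and independent assignment gives WarmCase-style folds. It is not the procedure used to build the released splits; to reproduce the paper's results, use the provided fold numbers.

```python
import random


def assign_folds(streams, key=None, n_folds=4, seed=0):
    """Assign each stream (a dict) to one of `n_folds` folds.

    key="media_id" -> ColdTrack-style: every track lands in exactly one fold.
    key="user_id"  -> ColdUser-style: every user lands in exactly one fold.
    key=None       -> WarmCase-style: streams are assigned independently,
                      so users and tracks may appear in several folds.
    Illustrative sketch only, not the procedure behind the released splits.
    """
    rng = random.Random(seed)
    if key is None:
        return [rng.randrange(n_folds) for _ in streams]
    # Shuffle the distinct group ids, then deal them round-robin into folds,
    # so all streams sharing a group id end up in the same fold.
    groups = sorted({s[key] for s in streams})
    rng.shuffle(groups)
    fold_of = {g: i % n_folds for i, g in enumerate(groups)}
    return [fold_of[s[key]] for s in streams]
```

Dealing whole groups into folds is what guarantees the "no overlap" property; the cost is that fold sizes can be uneven when a few tracks or users account for many streams.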
Created:
2021-10-06