
Yambda-5B — A Large-Scale Multi-modal Dataset for Ranking And Retrieval

Source: Figshare. Updated: 2025-07-30. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Yambda-5B_A_Large-Scale_Multi-modal_Dataset_for_Ranking_And_Retrieval/29672570/1
Description:
The Yambda-5B dataset is a large-scale open dataset comprising 4.79 billion user-item interactions collected from 1 million users and spanning 9.39 million tracks. This release contains the 50 million-event and 500 million-event subsets. Each event is timestamped in 5-second bins and labelled with an is_organic flag that separates organic discovery from recommendation-driven actions, enabling counterfactual and causal studies. The package also includes two look-up tables, album_item_mapping.parquet and artist_item_mapping.parquet, which link tracks to their albums and artists. Note: audio-embedding files are not included in this Figshare release; the full 5B-event version, with embeddings, is hosted on Hugging Face. All data are released under the Apache License 2.0.
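As a rough illustration of how the is_organic flag and the artist look-up table might be used together, the sketch below splits a handful of toy events into organic and recommendation-driven groups and counts each group per artist. The rows are made up, and the field names (uid, item_id, timestamp, artist_id) and the 0/1 encoding of is_organic are assumptions for illustration; check the released Parquet schema before relying on them.

```python
from collections import defaultdict

# Toy rows standing in for records read from the event files.
# Field names and the 0/1 encoding of is_organic are assumptions.
events = [
    {"uid": 1, "item_id": 10, "timestamp": 0, "is_organic": 1},
    {"uid": 1, "item_id": 11, "timestamp": 5, "is_organic": 0},
    {"uid": 2, "item_id": 10, "timestamp": 10, "is_organic": 1},
    {"uid": 2, "item_id": 12, "timestamp": 15, "is_organic": 0},
]

# Toy stand-in for artist_item_mapping.parquet (column names assumed).
artist_item_mapping = [
    {"artist_id": 100, "item_id": 10},
    {"artist_id": 100, "item_id": 11},
    {"artist_id": 200, "item_id": 12},
]


def split_by_organic(rows):
    """Return (organic, recommendation-driven) events."""
    organic = [r for r in rows if r["is_organic"]]
    recommended = [r for r in rows if not r["is_organic"]]
    return organic, recommended


def count_by_artist(rows, mapping):
    """Count events per artist; a track may map to several artists."""
    item_to_artists = defaultdict(list)
    for m in mapping:
        item_to_artists[m["item_id"]].append(m["artist_id"])
    counts = defaultdict(int)
    for r in rows:
        for artist_id in item_to_artists.get(r["item_id"], []):
            counts[artist_id] += 1
    return dict(counts)


organic, recommended = split_by_organic(events)
organic_counts = count_by_artist(organic, artist_item_mapping)
recommended_counts = count_by_artist(recommended, artist_item_mapping)
```

With the real files, the toy lists would be replaced by rows read from the Parquet files with a Parquet reader of your choice; the split and join logic stays the same.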
Provided by:
Krofto, Eugene; Pismenny, Alexey; Savushkin, Nikolay; Tytskiy, Vladislav; Burlakov, Daniil; Taychinov, Evgeny; Ploshkin, Alexander; Baikalov, Vladimir; Permiakov, Artem
Created:
2025-07-30