Yambda-5B — A Large-Scale Multi-modal Dataset for Ranking And Retrieval
Source: Figshare. Updated: 2025-07-30. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Yambda-5B_A_Large-Scale_Multi-modal_Dataset_for_Ranking_And_Retrieval/29672570
Description:
The Yambda-5B dataset is a large-scale open dataset comprising 4.79 billion user-item interactions collected from 1 million users and spanning 9.39 million tracks. This release contains the 50-million-event and 500-million-event subsets. Each event is time-stamped (in 5 s bins) and labelled with an is_organic flag that separates organic discovery from recommendation-driven actions, enabling counterfactual and causal studies. The package also includes two look-up tables, album_item_mapping.parquet and artist_item_mapping.parquet, which link tracks to their albums and artists. Note: audio-embedding files are not included in this Figshare release; the full 5-billion-event version, with embeddings, is hosted on Hugging Face. All data are released under the Apache License 2.0.
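As a rough illustration of how the is_organic flag and the look-up tables might be used together, the sketch below splits a handful of toy events into organic and recommendation-driven groups and counts them per artist. Everything apart from the is_organic name is an assumption made for illustration: the field names uid, item_id, timestamp and artist_id, the toy values, and the reading of "5 s bins" as timestamps rounded down to a multiple of 5 seconds are not confirmed by the description above.

```python
from collections import Counter

# Toy stand-ins for rows of an event file. Field names other than
# is_organic are assumptions, not taken from the dataset description.
events = [
    {"uid": 1, "item_id": 10, "timestamp": 0, "is_organic": 1},
    {"uid": 1, "item_id": 11, "timestamp": 5, "is_organic": 0},
    {"uid": 2, "item_id": 10, "timestamp": 5, "is_organic": 0},
    {"uid": 2, "item_id": 12, "timestamp": 15, "is_organic": 1},
]

# Toy stand-in for artist_item_mapping.parquet, reduced to a dict
# item_id -> artist_id (hypothetical column names).
artist_of_item = {10: 100, 11: 100, 12: 200}


def to_bin(seconds, width=5):
    """Round a timestamp in seconds down to the start of its bin."""
    return (seconds // width) * width


def split_by_origin(rows):
    """Return (organic, recommendation-driven) events."""
    organic = [r for r in rows if r["is_organic"]]
    recommended = [r for r in rows if not r["is_organic"]]
    return organic, recommended


def artist_counts(rows, mapping):
    """Count events per artist, skipping items missing from the mapping."""
    return Counter(
        mapping[r["item_id"]] for r in rows if r["item_id"] in mapping
    )


organic, recommended = split_by_origin(events)
organic_by_artist = artist_counts(organic, artist_of_item)
recommended_by_artist = artist_counts(recommended, artist_of_item)
```

With the real files, the same logic would more likely be written against tables loaded from Parquet (for example with pandas.read_parquet), after checking the actual column names in the release.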
Created:
2025-07-30



