Yambda-5B — A Large-Scale Multi-modal Dataset for Ranking And Retrieval
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://figshare.com/articles/dataset/Yambda-5B_A_Large-Scale_Multi-modal_Dataset_for_Ranking_And_Retrieval/29672570
Description:
The Yambda-5B dataset is a large-scale open dataset of 4.79 billion user-item interactions collected from 1 million users across 9.39 million tracks. This Figshare release contains the 50-million-event and 500-million-event subsets.
Each event is time-stamped in 5-second bins and carries an is_organic flag that separates organic discovery from recommendation-driven actions, which supports counterfactual and causal studies. The package also includes two lookup tables, album_item_mapping.parquet and artist_item_mapping.parquet, which link tracks to their albums and artists, respectively.
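The sketch below illustrates, in plain Python, three operations the description implies: rounding a raw timestamp down to its 5-second bin, splitting events on the is_organic flag, and joining events to artists through a mapping table. It works on in-memory rows rather than the Parquet files, and the field names uid, item_id and timestamp, as well as the (item_id, artist_id) row layout of the mapping table, are assumptions for illustration, not the documented schema. In practice the Parquet files would be read with a library such as pandas or pyarrow first.

```python
from collections import Counter, defaultdict
from typing import Iterable

BIN_SECONDS = 5  # events are time-stamped in 5-second bins


def to_bin(seconds: int) -> int:
    """Round a raw timestamp in seconds down to the start of its 5 s bin."""
    return (seconds // BIN_SECONDS) * BIN_SECONDS


def split_by_origin(events: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Separate organic events from recommendation-driven ones via is_organic."""
    organic, recommended = [], []
    for event in events:
        (organic if event["is_organic"] else recommended).append(event)
    return organic, recommended


def build_lookup(rows: Iterable[tuple[int, int]]) -> dict[int, list[int]]:
    """Turn (item_id, artist_id) rows into item_id -> [artist_id, ...].

    Assumed row layout; a list is kept per item in case a track
    maps to more than one artist.
    """
    lookup: dict[int, list[int]] = defaultdict(list)
    for item_id, other_id in rows:
        lookup[item_id].append(other_id)
    return dict(lookup)


def count_by_key(events: Iterable[dict], lookup: dict[int, list[int]]) -> Counter:
    """Count events per mapped key (e.g. artist); unmapped items are skipped."""
    counts: Counter = Counter()
    for event in events:
        for key in lookup.get(event["item_id"], []):
            counts[key] += 1
    return counts
```

For example, splitting a few toy events and counting recommendation-driven plays per artist isolates what the recommender contributed, which is the kind of contrast the is_organic flag is meant to enable.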
Note: the audio-embedding files are not included in this Figshare release. The full 5B-event version of Yambda-5B, including the embeddings, is hosted on Hugging Face.
All data are released under the Apache License 2.0.
Created:
2025-07-30