
Yambda-5B — A Large-Scale Multi-modal Dataset for Ranking And Retrieval

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Yambda-5B_A_Large-Scale_Multi-modal_Dataset_for_Ranking_And_Retrieval/29672570
Description:
Yambda-5B is a large-scale open dataset comprising 4.79 billion user-item interactions from 1 million users across 9.39 million tracks. This release contains the 50 million-event and 500 million-event subsets. Each event is timestamped (5 s bins) and labelled with an is_organic flag that separates organic discovery from recommendation-driven actions, enabling counterfactual and causal studies. The package also includes two look-up tables, album_item_mapping.parquet and artist_item_mapping.parquet, which link tracks to their albums and artists. Audio-embedding files are not included in this Figshare release; the full 5B-event version, with embeddings, is hosted on Hugging Face. All data are released under the Apache License 2.0.
Created:
2025-07-30
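
The description says each event carries an is_organic flag and a timestamp in 5 s bins. As a minimal sketch of how that flag might be used, the Python below splits a handful of made-up events into organic and recommendation-driven groups and computes the organic share. Only is_organic comes from the description. The other field names (uid, item_id, timestamp), the toy values, and the reading of timestamps as seconds aligned to multiples of 5 are assumptions for illustration, not the dataset's confirmed schema.

```python
from collections import Counter

BIN_SECONDS = 5  # assumption: timestamps are seconds, aligned to 5 s bins

# Hypothetical toy events; only the is_organic field is named in the description.
events = [
    {"uid": 1, "item_id": 10, "timestamp": 0, "is_organic": True},
    {"uid": 1, "item_id": 11, "timestamp": 5, "is_organic": False},
    {"uid": 2, "item_id": 10, "timestamp": 15, "is_organic": False},
    {"uid": 2, "item_id": 12, "timestamp": 20, "is_organic": True},
    {"uid": 3, "item_id": 11, "timestamp": 20, "is_organic": True},
]


def split_by_organic(rows):
    """Return (organic, recommended) lists based on the is_organic flag."""
    organic = [r for r in rows if r["is_organic"]]
    recommended = [r for r in rows if not r["is_organic"]]
    return organic, recommended


def organic_share(rows):
    """Fraction of events flagged as organic; 0.0 for an empty input."""
    return sum(1 for r in rows if r["is_organic"]) / len(rows) if rows else 0.0


# Sanity check of the assumed binning on the toy data.
bins_aligned = all(r["timestamp"] % BIN_SECONDS == 0 for r in events)

organic, recommended = split_by_organic(events)
item_counts_organic = Counter(r["item_id"] for r in organic)
share = organic_share(events)
```

On the real files the same split would presumably be a filter on the is_organic column after loading the Parquet data with a library of your choice; check the actual column names against the release before relying on them.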