
MovieLens 20M Posters and Subtitles Multi-modal

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14571725
Description:
A multi-modal composite dataset derived from the well-established MovieLens 20M dataset, which provides 20 million movie ratings and tagging activities collected through the MovieLens project. While MovieLens 20M is rich in user-movie interaction data, it lacks multi-modal characteristics. To address this limitation, we have enhanced the dataset by integrating additional modalities from complementary sources. Visual data, in the form of movie posters, was obtained from the PosterLens 25M dataset, which associates MovieLens movies with corresponding poster images and precomputed ResNet-34 embeddings. Textual data was introduced through subtitles sourced from the Sublens-20M dataset, which provides detailed subtitle files for 71% of the movies in MovieLens 20M and covers 98% of user interactions. Graph data, including comprehensive cast and crew information, was incorporated from The Movies Dataset, originally extracted from The Movie Database (TMDB, https://www.themoviedb.org), to provide detailed contextual insights into each movie. All modalities are keyed directly to the MovieLens movieId.
Created:
2024-12-29
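
Because every modality is keyed to the MovieLens movieId, combining them amounts to a lookup on that one field. The sketch below is only an illustration under stated assumptions: the tables are tiny made-up stand-ins, and the file paths, field names, and helper functions (join_on_movie_id, coverage) are hypothetical and do not reflect the actual layout of the archive. It also shows how a modality can cover a smaller share of movies than of interactions, as the subtitle figures above suggest, when the movies that have subtitles are the frequently rated ones.

```python
from collections import Counter

# Toy stand-ins for the real tables; every value here is invented for illustration.
ratings = [  # (userId, movieId, rating)
    (1, 10, 4.0), (1, 20, 3.5), (2, 10, 5.0),
    (2, 30, 2.0), (3, 10, 4.5), (3, 20, 4.0),
]
posters = {10: "posters/10.jpg", 20: "posters/20.jpg", 30: "posters/30.jpg"}
subtitles = {10: "subs/10.srt", 20: "subs/20.srt"}  # movie 30 has no subtitles


def join_on_movie_id(ratings, **modalities):
    """Attach each modality's entry (or None if missing) to every rating row."""
    return [
        {"userId": u, "movieId": m, "rating": r,
         **{name: table.get(m) for name, table in modalities.items()}}
        for u, m, r in ratings
    ]


def coverage(ratings, table):
    """Return (share of rated movies, share of interactions) found in table."""
    per_movie = Counter(m for _, m, _ in ratings)
    covered = [m for m in per_movie if m in table]
    movie_share = len(covered) / len(per_movie)
    interaction_share = sum(per_movie[m] for m in covered) / len(ratings)
    return movie_share, interaction_share


rows = join_on_movie_id(ratings, poster=posters, subtitle=subtitles)
movie_share, interaction_share = coverage(ratings, subtitles)
print(f"movies covered: {movie_share:.0%}, interactions covered: {interaction_share:.0%}")
```

In this toy data, subtitles exist for two of the three movies but for five of the six ratings, since the one movie without subtitles was rated only once.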