Enhancing MovieLens Dataset: Enriching Recommendations with Audio Information, Transcriptions, and Metadata

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://zenodo.org/record/8037432

Description:

Many datasets are now available for training and experimentation in the field of recommender systems. For the recommendation of audiovisual content, the MovieLens dataset is a prominent example. It focuses on the user-item relationship and provides actual interaction data between users and movies. However, although movies can be described by many characteristics, this dataset offers only limited information about the movie genres. In this work, we propose enriching the MovieLens dataset by incorporating metadata available on the web (such as cast, description, and keywords) and movie trailers. From the trailers, we extract audio information and generate a transcription for each trailer, adding a textual dimension to the dataset. The audio information was extracted through waveform and frequency analysis, followed by dimensionality reduction. The transcriptions were generated with the deep learning model Whisper. Finally, metadata was obtained from TMDB, and the BERT model was applied to extract embeddings. These additional attributes enrich the original dataset and support deeper, more precise analysis. This extended dataset could therefore drive significant advances in recommender systems, improving user experience through more relevant movie recommendations tailored to users' tastes and preferences.
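
The description does not say which audio descriptors were computed, so the following is only a minimal sketch of what "waveform and frequency analysis" might look like for one clip. The choice of RMS energy, zero-crossing rate, and spectral centroid is an assumption, as are the function names and the synthetic 440 Hz test tone. The dimensionality reduction, Whisper, and BERT steps are not shown.

```python
import cmath
import math


def waveform_features(samples):
    """Time-domain descriptors (assumed): RMS energy and zero-crossing rate."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Count sign changes between consecutive samples.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return rms, crossings / (n - 1)


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency, using a naive O(n^2) DFT."""
    n = len(samples)
    weighted = total = 0.0
    # Only the non-negative frequency bins are needed for a real signal.
    for k in range(n // 2 + 1):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * t / n)
            for t, s in enumerate(samples)
        )
        mag = abs(coeff)
        weighted += mag * (k * sample_rate / n)
        total += mag
    return weighted / total if total else 0.0


# Hypothetical input: a 440 Hz sine tone, 800 samples at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(800)]

rms, zcr = waveform_features(tone)
centroid = spectral_centroid(tone, SAMPLE_RATE)
features = [rms, zcr, centroid]  # one small feature vector per clip
```

In practice a trailer would be decoded to samples and processed frame by frame with an FFT library. The naive DFT is used here only to keep the sketch dependency-free.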

Created:

2023-06-16
