
krishnakamath/movielens-32m-sequential-recommender

Hugging Face. Updated 2025-11-29. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/krishnakamath/movielens-32m-sequential-recommender
Resource description:
---
dataset_info:
  features:
  - name: input_sequence
    dtype: string
  - name: target_item
    dtype: string
  splits:
  - name: train
    num_bytes: 267855410
    num_examples: 250000
  - name: validation
    num_bytes: 158391666
    num_examples: 50000
  - name: test
    num_bytes: 159385395
    num_examples: 50000
  download_size: 240900536
  dataset_size: 585632471
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
pretty_name: MovieLens 32M Sequential Recommender
size_categories:
- n<1M
source_datasets:
- movielens
---

# MovieLens 32M Sequential Recommender Dataset

This dataset is a processed version of the [MovieLens 32M dataset](https://grouplens.org/datasets/movielens/32m/), specifically formatted for sequential recommendation tasks. It contains user-item interaction sequences, enriched with rating and timestamp information, split into training, validation, and test sets.

## Dataset Structure

The dataset is provided as a `DatasetDict` with three splits: `train`, `validation`, and `test`. Each split contains:

- `input_sequence`: A string representing a user's interaction history. Each interaction is formatted as `movieId:rating:timestamp`. Sequences vary in length and starting points to provide diverse training examples.
- `target_item`: A string representing the `movieId` of the next item the user interacted with, which the model is expected to predict.

## Generation Parameters

This dataset was generated with the following parameters:

- `NUM_USERS`: 50000 (Number of unique users included in the dataset)
- `MAX_SEQUENCES_PER_USER`: 5 (Maximum number of training sequences sampled from each user's history)

These parameters are also embedded in the dataset's metadata for reproducibility.

## Citation

Please cite the original MovieLens dataset if you use this data in your research:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872

## Acknowledgement

The Python scripts used to generate and process this dataset were developed with the assistance of Google's Gemini.
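
## Example: Parsing `input_sequence`

The `movieId:rating:timestamp` format described under Dataset Structure could be parsed roughly as follows. This is a minimal sketch rather than the dataset author's own code. It assumes that interactions are separated by whitespace or commas (the card does not state the delimiter) and that the timestamp is an integer Unix time, as in the original MovieLens ratings file. The names `Interaction` and `parse_input_sequence` are hypothetical, and the example row is made up.

```python
import re
from typing import List, NamedTuple


class Interaction(NamedTuple):
    """One parsed interaction (hypothetical helper type, not part of the dataset)."""
    movie_id: str
    rating: float
    timestamp: int


def parse_input_sequence(sequence: str) -> List[Interaction]:
    """Parse 'movieId:rating:timestamp' tokens into Interaction tuples.

    Assumption: tokens are separated by whitespace and/or commas, and the
    timestamp is an integer. Adjust both if the real data differs.
    """
    interactions = []
    for token in re.split(r"[\s,]+", sequence.strip()):
        if not token:
            continue  # skip the empty token produced by an empty string
        movie_id, rating, timestamp = token.split(":")
        interactions.append(Interaction(movie_id, float(rating), int(timestamp)))
    return interactions


# Made-up row in the documented shape; values are not taken from the dataset.
example = {"input_sequence": "1:4.0:964982703 3:4.5:964981247", "target_item": "6"}
history = parse_input_sequence(example["input_sequence"])
```

With the Hugging Face `datasets` library, calling `load_dataset("krishnakamath/movielens-32m-sequential-recommender")` should return the `DatasetDict` described above, and each row's `input_sequence` could then be passed to a parser like this one. Inspect a few real rows first to confirm the delimiter and timestamp format.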
Provided by:
krishnakamath