krishnakamath/movielens-32m-sequential-recommender
Hugging Face | Updated: 2025-11-29 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/krishnakamath/movielens-32m-sequential-recommender
Description:
---
dataset_info:
  features:
  - name: input_sequence
    dtype: string
  - name: target_item
    dtype: string
  splits:
  - name: train
    num_bytes: 267855410
    num_examples: 250000
  - name: validation
    num_bytes: 158391666
    num_examples: 50000
  - name: test
    num_bytes: 159385395
    num_examples: 50000
  download_size: 240900536
  dataset_size: 585632471
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
pretty_name: MovieLens 32M Sequential Recommender
size_categories:
- n<1M
source_datasets:
- movielens
---
# MovieLens 32M Sequential Recommender Dataset
This dataset is a processed version of the [MovieLens 32M dataset](https://grouplens.org/datasets/movielens/32m/), specifically formatted for sequential recommendation tasks. It contains user-item interaction sequences, enriched with rating and timestamp information, split into training, validation, and test sets.
## Dataset Structure
The dataset is provided as a `DatasetDict` with three splits: `train`, `validation`, and `test`. Each split contains:
- `input_sequence`: A string representing a user's interaction history. Each interaction is formatted as `movieId:rating:timestamp`. Sequences vary in length and starting points to provide diverse training examples.
- `target_item`: A string representing the `movieId` of the next item the user interacted with, which the model is expected to predict.
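Because `input_sequence` is stored as a single string, it usually needs to be parsed before modelling. The following is a minimal sketch of such a parser. The card does not state how individual interactions are delimited, so the space separator (`sep`) is an assumption, as are the `Interaction` type, the function name, and the sample values, which are made up for illustration.

```python
from typing import List, NamedTuple


class Interaction(NamedTuple):
    """One parsed `movieId:rating:timestamp` token (hypothetical helper type)."""
    movie_id: str
    rating: float
    timestamp: int


def parse_input_sequence(sequence: str, sep: str = " ") -> List[Interaction]:
    """Split an input_sequence string into Interaction tuples.

    ASSUMPTION: interactions are separated by `sep` (a space by default).
    The dataset card does not document the delimiter, so adjust as needed.
    """
    interactions = []
    for token in sequence.split(sep):
        token = token.strip()
        if not token:
            continue  # skip empty tokens, e.g. from an empty string
        movie_id, rating, timestamp = token.split(":")
        # movieId stays a string to match the dtype of `target_item`.
        interactions.append(
            Interaction(movie_id, float(rating), int(float(timestamp)))
        )
    return interactions


# Made-up sample values in the documented movieId:rating:timestamp format.
example = "1:4.0:964982703 3:4.0:964981247 6:4.0:964982224"
parsed = parse_input_sequence(example)
print(parsed[0].movie_id, parsed[0].rating, parsed[0].timestamp)
```

Rows would typically come from `datasets.load_dataset("krishnakamath/movielens-32m-sequential-recommender")`, after which `parse_input_sequence(row["input_sequence"])` could be applied to each example.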
## Generation Parameters
This dataset was generated with the following parameters:
- `NUM_USERS`: 50000 (Number of unique users included in the dataset)
- `MAX_SEQUENCES_PER_USER`: 5 (Maximum number of training sequences sampled from each user's history)
These parameters are also embedded in the dataset's metadata for reproducibility.
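To illustrate how these two parameters might interact, here is a hypothetical sketch of sampling up to `MAX_SEQUENCES_PER_USER` input/target pairs from one user's chronological history, with varying lengths and starting points. This is not the author's generation script: the function name `sample_sequences`, the random window selection, and the seed are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

NUM_USERS = 50000
MAX_SEQUENCES_PER_USER = 5


def sample_sequences(
    history: List[str],
    max_sequences: int = MAX_SEQUENCES_PER_USER,
    seed: int = 0,
) -> List[Tuple[List[str], str]]:
    """Sample (input window, target movieId) pairs from a sorted history.

    ASSUMPTION: `history` is a chronologically ordered list of
    'movieId:rating:timestamp' tokens; the sampling scheme is illustrative.
    """
    rng = random.Random(seed)
    n = len(history)
    if n < 2:
        return []  # need at least one input item and one target
    spans = set()
    for _ in range(max_sequences):
        end = rng.randrange(1, n)      # index of the target item
        start = rng.randrange(0, end)  # first item of the input window
        spans.add((start, end))        # duplicates collapse, so <= max_sequences
    # The target keeps only the movieId, matching the `target_item` field.
    return [(history[s:e], history[e].split(":")[0]) for s, e in sorted(spans)]


# Made-up history for a single user.
history = ["1:4.0:100", "2:3.5:200", "3:5.0:300", "4:2.0:400"]
for window, target in sample_sequences(history):
    print(" ".join(window), "->", target)

# Upper bound on training examples implied by the two parameters.
print(NUM_USERS * MAX_SEQUENCES_PER_USER)
```

The product of the two parameters, 50000 × 5 = 250000, equals the `num_examples` value listed for the `train` split in the metadata, which suggests that each user contributed the maximum number of training sequences.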
## Citation
Please cite the original MovieLens dataset if you use this data in your research:
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872
## Acknowledgement
The Python scripts used to generate and process this dataset were developed with the assistance of Google's Gemini.
Provider:
krishnakamath



