Tianyi-Lab/Agentic_MovieLens

Hugging Face | Updated 2026-04-18 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Tianyi-Lab/Agentic_MovieLens
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: mixed
  data_files:
  - split: train
    path: data_mixed/train-*
- config_name: movies
  data_files: metadata/movies/*.parquet
- config_name: users
  data_files: metadata/users/*.parquet
dataset_info:
  config_name: mixed
  features:
  - name: user_id
    dtype: int64
  - name: movie_id
    dtype: int64
  - name: rating
    dtype: float64
  - name: reasoning
    dtype: string
  splits:
  - name: train
    num_bytes: 12879921379
    num_examples: 100000000
  download_size: 5726140888
  dataset_size: 12879921379
---

# Dataset Card for Agentic_MovieLens

## Dataset Description

This dataset contains movie ratings paired with model-generated reasoning text. Per the metadata above, each record in the `mixed` configuration has four fields: `user_id`, `movie_id`, `rating`, and `reasoning`.

## Usage

### 1. Rating & Reasoning data

The dataset is sorted by `user_id` and then `movie_id` to support efficient lookup. Quick-start with the following helper class:

```python
from datasets import load_dataset


class MovieMatrix:
    def __init__(self, dataset_name="Tianyi-Lab/Agentic_MovieLens"):
        # Load in standard (non-streaming) mode to enable memory mapping (instant access)
        self.ds = load_dataset(dataset_name, split="train")
        self.COLS = 10000  # Total movies per user

    def get_interaction(self, user_id: int, movie_id: int):
        """
        Retrieves interaction in O(1) time using matrix indexing.
        """
        # 1. Validate IDs
        if not (0 <= user_id <= 9999):
            raise ValueError("User ID must be 0-9999")
        if not (1 <= movie_id <= 10000):
            raise ValueError("Movie ID must be 1-10000")

        # 2. Calculate index: (row * width) + column
        # Note: movie_id is 1-based, so we subtract 1 to get a 0-based offset
        index = (user_id * self.COLS) + (movie_id - 1)

        # 3. Direct access
        return self.ds[index]


# --- Usage ---
matrix = MovieMatrix()

# Instant lookup
data = matrix.get_interaction(user_id=42, movie_id=500)
print(data)
```

By default, this will load the 100M dataset generated by Qwen.
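
The index arithmetic used by the helper can be checked without downloading any data. The sketch below is illustrative only: `to_index` and `from_index` are hypothetical names, not part of the dataset repository, and the constants assume the 10,000 users by 10,000 movies layout implied by the ID validation above.

```python
N_USERS = 10_000   # assumed: user IDs run 0-9999, as validated in the helper
N_MOVIES = 10_000  # assumed: movie IDs run 1-10000, as validated in the helper


def to_index(user_id: int, movie_id: int) -> int:
    """Map (user_id, 1-based movie_id) to a 0-based row offset."""
    if not 0 <= user_id < N_USERS:
        raise ValueError("User ID must be 0-9999")
    if not 1 <= movie_id <= N_MOVIES:
        raise ValueError("Movie ID must be 1-10000")
    return user_id * N_MOVIES + (movie_id - 1)


def from_index(index: int) -> tuple[int, int]:
    """Invert to_index: recover (user_id, movie_id) from a row offset."""
    if not 0 <= index < N_USERS * N_MOVIES:
        raise ValueError("Index out of range")
    user_id, offset = divmod(index, N_MOVIES)
    return user_id, offset + 1  # convert back to a 1-based movie ID


index = to_index(42, 500)
print(index)              # 420499
print(from_index(index))  # (42, 500)
```

The inverse mapping may be useful when iterating over a slice of rows and recovering which user and movie each row belongs to, provided the row order really follows the sort described above.
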
To access the dataset generated by mixed models, use the following command:

```python
from datasets import load_dataset

ds = load_dataset("Tianyi-Lab/Agentic_MovieLens", "mixed", split="train")
```

The mixed dataset is constructed from the following models:

| Users | Model | Records |
| :--- | :--- | :--- |
| 0-1250 | Gemini Flash | 12.5M |
| 1250-1500 | Gemini Pro | 2.5M |
| 1500-2750 | GPT-5 Mini | 12.5M |
| 2750-3000 | GPT-5 | 2.5M |
| 3000-3540 | Claude Haiku 4.5 | 5.4M |
| 3540-4000 | Qwen (default) | 4.6M |
| 4000-5500 | Gemini Flash | 15M |
| 5500-7000 | GPT-5 Mini | 15M |
| 7000-8000 | DeepSeek v3.2 | 10M |
| 8000-10000 | Qwen (default) | 20M |
| **Total** | **6 models** | **100M** |

### 2. Movie Metadata

Load the movie metadata with:

```python
from datasets import load_dataset

ds_movies = load_dataset("Tianyi-Lab/Agentic_MovieLens", "movies", split="train")
```

### 3. User Metadata

Load the user metadata with:

```python
from datasets import load_dataset

ds_users = load_dataset("Tianyi-Lab/Agentic_MovieLens", "users", split="train")
```

The features are defined in `metadata/users/metadata_mappings.json`.

Specifically, each quiz item contains four elements representing the answer to a single question in the main self-report quiz:

- `[0]`: the position of that question in the quiz (question order was shuffled for each user).
- `[1]`: the question ID; see the key for the text of each item.
- `[2]`: the user's response (originally on a 1-100 scale, but rounded to the nearest 10 here for privacy protection).
- `[3]`: the time between question load and answer, in milliseconds.

Answers are omitted from the dataset if the question was skipped or answered in less than 1000 ms.

User features are defined according to the [Statistical "Which Character" Personality Quiz (SWCPQ)](https://openpsychometrics.org/tests/characters/).

## Dataset Structure

The dataset is provided in the `train` split and includes all collected data.

## Additional Information

For questions or issues, please refer to the repository documentation.
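
As a small illustration of the user-range table for the `mixed` configuration, the sketch below looks up which model generated the records for a given user. It assumes each range is half-open (start inclusive, end exclusive), which is consistent with the record counts at 10,000 movies per user (for example, 1,250 users for 12.5M records). `model_for_user` and the constants are hypothetical names, not part of the dataset repository.

```python
from bisect import bisect_right

# (first user_id of the range, model) pairs copied from the table.
# Assumption: each range is half-open, [start, next start).
RANGES = [
    (0, "Gemini Flash"),
    (1250, "Gemini Pro"),
    (1500, "GPT-5 Mini"),
    (2750, "GPT-5"),
    (3000, "Claude Haiku 4.5"),
    (3540, "Qwen (default)"),
    (4000, "Gemini Flash"),
    (5500, "GPT-5 Mini"),
    (7000, "DeepSeek v3.2"),
    (8000, "Qwen (default)"),
]
STARTS = [start for start, _ in RANGES]
N_USERS = 10_000  # the last range in the table ends at 10000


def model_for_user(user_id: int) -> str:
    """Return the model name listed for this user_id in the table."""
    if not 0 <= user_id < N_USERS:
        raise ValueError("User ID must be 0-9999")
    # bisect_right finds the first range start greater than user_id;
    # the range containing user_id is the one just before it.
    return RANGES[bisect_right(STARTS, user_id) - 1][1]


print(model_for_user(1249))  # Gemini Flash
print(model_for_user(1250))  # Gemini Pro
print(model_for_user(3540))  # Qwen (default)
```

A lookup like this could be used to group or filter `mixed` records by generating model, since the records themselves carry only `user_id`, `movie_id`, `rating`, and `reasoning`.
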
Provider:
Tianyi-Lab