jamesding0302/memgen-annotations

Name: jamesding0302/memgen-annotations
Creator: jamesding0302
Published: 2026-03-25 04:12:04
License: No description available

Hugging Face · Updated 2026-03-25 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/jamesding0302/memgen-annotations

Description:

---
task_categories:
- other
configs:
- config_name: AmazonReviews2014-Beauty
  data_files:
  - split: val
    path: AmazonReviews2014-Beauty/val.jsonl
  - split: test
    path: AmazonReviews2014-Beauty/test.jsonl
- config_name: AmazonReviews2014-Sports_and_Outdoors
  data_files:
  - split: val
    path: AmazonReviews2014-Sports_and_Outdoors/val.jsonl
  - split: test
    path: AmazonReviews2014-Sports_and_Outdoors/test.jsonl
- config_name: AmazonReviews2023-Industrial_and_Scientific
  data_files:
  - split: val
    path: AmazonReviews2023-Industrial_and_Scientific/val.jsonl
  - split: test
    path: AmazonReviews2023-Industrial_and_Scientific/test.jsonl
- config_name: AmazonReviews2023-Musical_Instruments
  data_files:
  - split: val
    path: AmazonReviews2023-Musical_Instruments/val.jsonl
  - split: test
    path: AmazonReviews2023-Musical_Instruments/test.jsonl
- config_name: AmazonReviews2023-Office_Products
  data_files:
  - split: val
    path: AmazonReviews2023-Office_Products/val.jsonl
  - split: test
    path: AmazonReviews2023-Office_Products/test.jsonl
- config_name: Steam
  data_files:
  - split: val
    path: Steam/val.jsonl
  - split: test
    path: Steam/test.jsonl
- config_name: Yelp-Yelp_2020
  data_files:
  - split: val
    path: Yelp-Yelp_2020/val.jsonl
  - split: test
    path: Yelp-Yelp_2020/test.jsonl
---

# MemGen Annotations

This is the annotation dataset for the paper **[How Well Does Generative Recommendation Generalize?](https://huggingface.co/papers/2603.19809)**.
<a href="https://huggingface.co/papers/2603.19809"><img src="https://img.shields.io/badge/Paper-ArXiv-red"></a> <a href="https://github.com/Jamesding000/MemGen-GR"><img src="https://img.shields.io/badge/Code-GitHub-green"></a> <a href="https://huggingface.co/jamesding0302/memgen-checkpoints"><img src="https://img.shields.io/badge/Models-Hugging%20Face-blue"></a>

The annotations categorize evaluation instances under the leave-one-out protocol:

- **test** split uses the **last** item in the user history sequence as target,
- **val** split uses the **second-to-last** item as target.

## Columns

- `sample_id`: row index within the split in the original dataset.
- `user_id`: raw user identifier (join key).
- `master`: one of `memorization`, `generalization`, `uncategorized`.
- `subcategories`: list of `{rule, hop}` for fine-grained generalization types.
- `all_labels`: all string labels (e.g., `["generalization", "symmetry_3"]`).

## Load in M&G annotations

```python
from datasets import load_dataset

labels = load_dataset(
    "jamesding0302/memgen-annotations",
    "AmazonReviews2014-Beauty",
    split="test",
)
print(labels[0])
```

## Merge with processed dataset

```python
# 1) Load your processed dataset split (must be aligned with labels by row order)
ds = pipeline.split_datasets["test"]

# 2) Append label columns to the original dataset
ds = (ds
      .add_column("master", labels["master"])
      .add_column("subcategories", labels["subcategories"])
      .add_column("all_labels", labels["all_labels"]))
```
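
The merge above relies on row order alone, so a silent misalignment would attach the wrong labels. Since `user_id` is described as the join key, one cautious option is to verify it row by row before attaching the label columns. The following is a minimal sketch on plain Python dictionaries, not part of the dataset's tooling: the helper `attach_labels`, the toy rows, and the assumption that the processed dataset exposes a `user_id` field are all hypothetical, as is the exact shape of the `subcategories` entries (`rule` as a string, `hop` as an integer).

```python
from collections import Counter

LABEL_COLUMNS = ("master", "subcategories", "all_labels")


def attach_labels(rows, labels):
    """Attach annotation columns to rows after checking alignment by user_id.

    `rows` and `labels` are lists of dicts in the same order. The presence of
    a `user_id` field on the processed rows is an assumption of this sketch.
    """
    if len(rows) != len(labels):
        raise ValueError(
            f"length mismatch: {len(rows)} rows vs {len(labels)} labels"
        )
    merged = []
    for i, (row, label) in enumerate(zip(rows, labels)):
        if row["user_id"] != label["user_id"]:
            raise ValueError(
                f"row {i}: user_id {row['user_id']!r} != {label['user_id']!r}"
            )
        # Copy the row and add only the annotation columns.
        merged.append({**row, **{col: label[col] for col in LABEL_COLUMNS}})
    return merged


# Toy data for illustration only (not taken from the real dataset).
rows = [
    {"user_id": "u1", "history": ["a", "b"], "target": "c"},
    {"user_id": "u2", "history": ["d"], "target": "e"},
]
labels = [
    {"sample_id": 0, "user_id": "u1", "master": "memorization",
     "subcategories": [], "all_labels": ["memorization"]},
    {"sample_id": 1, "user_id": "u2", "master": "generalization",
     "subcategories": [{"rule": "symmetry", "hop": 3}],  # assumed entry shape
     "all_labels": ["generalization", "symmetry_3"]},
]

merged = attach_labels(rows, labels)

# Tally instances per master category, e.g. to report metrics per group.
master_counts = Counter(row["master"] for row in merged)
print(master_counts)
```

With a `datasets.Dataset`, the same check could be run by comparing the two `user_id` columns before calling `add_column`, assuming both sides carry that column.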

Provided by:

jamesding0302
