
jamesding0302/memgen-annotations

Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/jamesding0302/memgen-annotations
Resource description:
---
task_categories:
- other
configs:
- config_name: AmazonReviews2014-Beauty
  data_files:
  - split: val
    path: AmazonReviews2014-Beauty/val.jsonl
  - split: test
    path: AmazonReviews2014-Beauty/test.jsonl
- config_name: AmazonReviews2014-Sports_and_Outdoors
  data_files:
  - split: val
    path: AmazonReviews2014-Sports_and_Outdoors/val.jsonl
  - split: test
    path: AmazonReviews2014-Sports_and_Outdoors/test.jsonl
- config_name: AmazonReviews2023-Industrial_and_Scientific
  data_files:
  - split: val
    path: AmazonReviews2023-Industrial_and_Scientific/val.jsonl
  - split: test
    path: AmazonReviews2023-Industrial_and_Scientific/test.jsonl
- config_name: AmazonReviews2023-Musical_Instruments
  data_files:
  - split: val
    path: AmazonReviews2023-Musical_Instruments/val.jsonl
  - split: test
    path: AmazonReviews2023-Musical_Instruments/test.jsonl
- config_name: AmazonReviews2023-Office_Products
  data_files:
  - split: val
    path: AmazonReviews2023-Office_Products/val.jsonl
  - split: test
    path: AmazonReviews2023-Office_Products/test.jsonl
- config_name: Steam
  data_files:
  - split: val
    path: Steam/val.jsonl
  - split: test
    path: Steam/test.jsonl
- config_name: Yelp-Yelp_2020
  data_files:
  - split: val
    path: Yelp-Yelp_2020/val.jsonl
  - split: test
    path: Yelp-Yelp_2020/test.jsonl
---

# MemGen Annotations

This is the annotation dataset for the paper **[How Well Does Generative Recommendation Generalize?](https://huggingface.co/papers/2603.19809)**.

<a href="https://huggingface.co/papers/2603.19809"><img src="https://img.shields.io/badge/Paper-ArXiv-red"></a>
<a href="https://github.com/Jamesding000/MemGen-GR"><img src="https://img.shields.io/badge/Code-GitHub-green"></a>
<a href="https://huggingface.co/jamesding0302/memgen-checkpoints"><img src="https://img.shields.io/badge/Models-Hugging%20Face-blue"></a>

The annotations categorize evaluation instances under the leave-one-out protocol:

- **test** split uses the **last** item in the user history sequence as target,
- **val** split uses the **second-to-last** item as target.

## Columns

- `sample_id`: row index within the split in the original dataset.
- `user_id`: raw user identifier (join key).
- `master`: one of `memorization`, `generalization`, `uncategorized`.
- `subcategories`: list of `{rule, hop}` for fine-grained generalization types.
- `all_labels`: all string labels (e.g., `["generalization", "symmetry_3"]`).

## Load in M&G annotations

```python
from datasets import load_dataset

labels = load_dataset(
    "jamesding0302/memgen-annotations",
    "AmazonReviews2014-Beauty",
    split="test",
)
print(labels[0])
```

## Merge with processed dataset

```python
# 1) Load your processed dataset split (must be aligned with labels by row order)
ds = pipeline.split_datasets["test"]

# 2) Append label columns to the original dataset
ds = (ds
      .add_column("master", labels["master"])
      .add_column("subcategories", labels["subcategories"])
      .add_column("all_labels", labels["all_labels"]))
```
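
Because the merge above depends on row order, it may be worth verifying alignment on the `user_id` join key before appending the label columns. The following is a minimal sketch, not part of the MemGen code: `attach_labels` is a hypothetical helper, the toy rows are invented for illustration, and it assumes your processed split also exposes a `user_id` field.

```python
from collections import Counter


def attach_labels(rows, labels, key="user_id"):
    """Merge label columns into rows after checking row-by-row alignment.

    Hypothetical helper: `rows` and `labels` are sequences of dict-like
    records; `key` is assumed to exist on both sides.
    """
    if len(rows) != len(labels):
        raise ValueError(f"length mismatch: {len(rows)} rows vs {len(labels)} labels")
    merged = []
    for i, (row, lab) in enumerate(zip(rows, labels)):
        # Fail loudly if the two sources disagree on who this row belongs to.
        if row[key] != lab[key]:
            raise ValueError(f"row {i}: {key} mismatch ({row[key]!r} != {lab[key]!r})")
        merged.append({
            **row,
            "master": lab["master"],
            "subcategories": lab["subcategories"],
            "all_labels": lab["all_labels"],
        })
    return merged


# Toy stand-ins for a processed split and its annotations (values are made up).
rows = [
    {"user_id": "u1", "target": "item_9"},
    {"user_id": "u2", "target": "item_4"},
]
labels = [
    {"sample_id": 0, "user_id": "u1", "master": "memorization",
     "subcategories": [], "all_labels": ["memorization"]},
    {"sample_id": 1, "user_id": "u2", "master": "generalization",
     "subcategories": [{"rule": "symmetry", "hop": 3}],
     "all_labels": ["generalization", "symmetry_3"]},
]

merged = attach_labels(rows, labels)

# Distribution of top-level categories in the merged split.
master_counts = Counter(record["master"] for record in merged)
print(master_counts)
```

Iterating a `datasets.Dataset` yields one dict per row, so the same helper should accept `ds` and `labels` directly, although a column-wise comparison such as `ds["user_id"] == labels["user_id"]` is likely faster on large splits.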
Provided by:
jamesding0302