THEATLAS/PENS
Hugging Face | Updated 2024-10-29 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/THEATLAS/PENS
Overview:
PENS (PErsonalized News headlineS) is an English dataset tailored for Personalized News Headline Generation research. The dataset is divided into training and test sets to support both model development and evaluation. The training set contains approximately 113k English news articles across 15 categories and 500k impression logs from over 445k users. Each news article includes a title, body, category, and associated entities. The test set, manually created by 103 native English speakers, includes over 100k personalized news headlines. PENS ensures user privacy by anonymizing user IDs through secure hashing.
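As a rough illustration of the record layout and the privacy step described above, the following Python sketch models an article with the four listed attributes and hashes a user ID. The source does not say which hash function or salting scheme PENS uses, so the choice of salted SHA-256, the `NewsArticle` field names, the `anonymize_user_id` helper, and all sample values are assumptions made for this example only.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class NewsArticle:
    # Hypothetical field names mirroring the four attributes in the description.
    title: str
    body: str
    category: str
    entities: list[str] = field(default_factory=list)


def anonymize_user_id(raw_user_id: str, salt: str) -> str:
    """Return a salted SHA-256 hex digest of a user ID.

    This is an assumed scheme for illustration; the actual PENS
    anonymization procedure is not specified in the description.
    """
    digest = hashlib.sha256((salt + raw_user_id).encode("utf-8"))
    return digest.hexdigest()


# Made-up sample values, not taken from the dataset.
article = NewsArticle(
    title="Example headline",
    body="Example body text.",
    category="sports",
    entities=["Example Team"],
)
hashed = anonymize_user_id("user-12345", salt="hypothetical-salt")
```

Hashing is deterministic, so the same raw ID always maps to the same anonymized ID. That property is what would let impression logs from one user still be grouped together after anonymization.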
Provided by:
THEATLAS
Summary
Background overview
PENS is an English dataset designed for Personalized News Headline Generation, featuring 113k news articles and 500k user impression logs for training, and over 100k manually created personalized headlines for testing. It supports research on tailoring news headlines to individual preferences while protecting user privacy through anonymization.
The content above was collected and summarized by 遇见数据集.



