
PersonalLab/PersonalSum

Source: Hugging Face. Updated 2024-06-13. Indexed 2024-06-15.
Download link:
https://hf-mirror.com/datasets/PersonalLab/PersonalSum
Resource description:

---
license: cc-by-nc-4.0
task_categories:
- summarization
language:
- 'no'
pretty_name: PersonalSum
---

# PersonalSum: A User-Subjective Guided Personalized Summarization Dataset

PersonalSum is a dataset designed to support research on personalized text summarization. It offers high-quality, manually annotated news summaries that reflect individual users' preferences and points of focus. The dataset is meant to facilitate the development of personalized summarization models and to fill a gap in existing research, which often relies on generic summaries or pseudo datasets. PersonalSum makes it possible to explore how personal interests and preferences can be incorporated into summarization tasks.

## Functions of the Dataset

1. **Personalized Summarization**: Facilitates the creation of summaries that align with individual user preferences by incorporating user profiles and personalized annotations.
2. **Generic Summarization**: Includes machine-generated summaries for comparison with the personalized summaries.

## Dataset Structure

The dataset consists of two primary CSV files, each serving a distinct purpose:

1. **PersonalSum_original.csv**: The original dataset, with personalized summaries written by human annotators to reflect their personal interests and preferences. This file also includes user profiles and the source sentences from the articles.
2. **Topic_centric_PersonalSum.csv**: The dataset organized around specific topics, allowing focused analysis and comparison across thematic areas. The data in this file is almost identical to PersonalSum_original.csv, with the key difference that each assignment focused on the same topic. This structure is intended for investigating the correlation between summary quality and users' topic preferences.

### Difference Between the Two CSV Files

- **PersonalSum_original.csv**:
  - Contains human-annotated summaries that reflect individual user preferences.
- **Topic_centric_PersonalSum.csv**:
  - Organizes summaries around specific topics.
  - Facilitates analysis and comparison of summaries within specific thematic areas.
  - Was collected after PersonalSum_original.csv, with each assignment focused on the same topic, to examine the potential correlation between summary quality and users' topic preferences.

## Main Attributes of the Dataset

- **User Profiles**: Each annotator is assigned a unique WorkerID, which identifies the individual performing the annotation. This allows annotations by the same person to be tracked across different tasks.
- **AssignmentID**: Represents a specific annotation task. Each annotator summarizes three different news articles under the same AssignmentID, indicating that they were part of the same annotation session.
- **Duration**: The total time a worker took to complete an annotation assignment, that is, the combined time spent annotating the three news articles.
- **Summaries**: Both generic and personalized summaries are provided, with the corresponding source sentences from the news articles.
- **Question-Answer Sets**: Three question-and-answer pairs are included for each article, each tied directly to the article's content.

## License

This dataset is made available under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You are free to share and adapt the material for non-commercial purposes, as long as appropriate credit is given and any changes made are indicated.
Provider:
PersonalLab
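Summary of the original information

PersonalSum: A User-Subjective Guided Personalized Summarization Dataset

PersonalSum is a dataset intended to support research on personalized text summarization. It provides high-quality, manually annotated news summaries that reflect individual users' preferences and points of focus. The dataset aims to advance personalized summarization models and to fill a gap in existing research, which often relies on generic summaries or pseudo datasets. PersonalSum makes it possible to explore how personal interests and preferences can be incorporated into summarization tasks.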

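Functions of the dataset

  1. Personalized summarization: supports the creation of summaries aligned with individual user preferences by combining user profiles with personalized annotations.
  2. Generic summarization: includes machine-generated summaries for comparison with the personalized summaries.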

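Dataset structure

The dataset contains two main CSV files, each serving a different purpose:

  1. PersonalSum_original.csv: the original dataset, containing personalized summaries written by human annotators to reflect their personal interests and preferences. This file also includes user profiles and the source sentences from the articles.
  2. Topic_centric_PersonalSum.csv: the dataset organized around specific topics, allowing focused analysis and comparison across thematic areas. The data in this file is almost identical to PersonalSum_original.csv; the main difference is that each assignment focused on the same topic. This structure is intended for studying the potential correlation between summary quality and users' topic preferences.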
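
As a minimal sketch of how the two files might be read once downloaded, the snippet below uses only the Python standard library. The file names come from the description above; the local directory, the UTF-8 encoding, and the helper names `read_rows` and `load_personalsum` are assumptions made for illustration, not part of the dataset's tooling.

```python
import csv
from pathlib import Path

# File names as given in the dataset description.
ORIGINAL_FILE = "PersonalSum_original.csv"
TOPIC_FILE = "Topic_centric_PersonalSum.csv"


def read_rows(path):
    """Read one CSV file into a list of dicts keyed by its header row."""
    # UTF-8 is an assumption; adjust if the files use another encoding.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_personalsum(data_dir):
    """Load both CSV files from a local directory (hypothetical helper)."""
    data_dir = Path(data_dir)
    return {
        "original": read_rows(data_dir / ORIGINAL_FILE),
        "topic_centric": read_rows(data_dir / TOPIC_FILE),
    }
```

Calling `load_personalsum("path/to/download")` returns a dict with one list of rows per file, so the actual column headers can be inspected with `data["original"][0].keys()` before relying on any particular column name.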

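Difference between the two CSV files

  • PersonalSum_original.csv

    • Contains human-annotated summaries that reflect individual user preferences.
  • Topic_centric_PersonalSum.csv

    • Organizes summaries around specific topics.
    • Facilitates analysis and comparison of summaries within specific thematic areas.
    • Was collected after PersonalSum_original.csv, with each assignment focused on the same topic, to examine the potential correlation between summary quality and users' topic preferences.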

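Main attributes of the dataset

  • User profiles: each annotator is assigned a unique WorkerID that identifies the individual performing the annotation. This allows annotations by the same person to be tracked across different tasks.
  • AssignmentID: represents a specific annotation task. Each annotator summarizes three different news articles under the same AssignmentID, indicating that they belong to the same annotation session.
  • Duration: the total time a worker took to complete an annotation assignment, that is, the combined time spent annotating the three news articles.
  • Summaries: both generic and personalized summaries are provided, with the corresponding source sentences from the news articles.
  • Question-answer sets: each article comes with three question-and-answer pairs directly related to its content.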
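
The attributes above suggest a simple way to work with the rows: group them by AssignmentID and divide the combined Duration by the three articles in each assignment. The sketch below illustrates that idea on rows represented as dicts. The column names `WorkerID`, `AssignmentID`, and `Duration` are assumed to match the attribute names in the description, and the code also assumes that Duration is numeric and repeated on every row of an assignment; the unit is not stated in the source.

```python
from collections import defaultdict

# The description states that each assignment covers three news articles.
ARTICLES_PER_ASSIGNMENT = 3


def group_by(rows, key):
    """Group row dicts by the value of one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def mean_duration_per_article(rows):
    """Estimate the average time per article for each assignment.

    Assumes the 'Duration' column holds the combined time for the
    whole assignment and is repeated on each of its rows.
    """
    result = {}
    for assignment_id, group in group_by(rows, "AssignmentID").items():
        total = float(group[0]["Duration"])
        result[assignment_id] = total / ARTICLES_PER_ASSIGNMENT
    return result
```

The same `group_by` helper can be called with `"WorkerID"` to collect all annotations made by one annotator across assignments.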

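License

This dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You are free to share and adapt the material for non-commercial purposes, as long as appropriate credit is given and any changes made are indicated.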
