SynthPAI

Indexed from arXiv, 2025-09-30
Download link:
https://huggingface.co/datasets/robinsta/synthpai
Overview:
SynthPAI is a human-verified synthetic conversation dataset designed for research on personal attribute inference. It contains 7,823 comments generated by 300 distinct synthetic users and covers 8 personal attributes: age, gender, income level, geographic location, place of birth, education level, occupation, and marital status. The comments closely resemble real Reddit comments in both style and quality. The dataset is used to build a perturbation database for evaluating the effectiveness of defense methods.