
Paraphrase choice based on user traits

Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2024-07-25.
Download link:
https://figshare.com/articles/dataset/Paraphrase_choice_based_on_user_traits/1613525/2
Description:
PPDB paraphrase pairs and clusters with their associated usage scores across three user traits:
a. Gender: male or female
b. Age: <25 or >30
c. Occupational class: low or high

Contents:
frequencies.tar.gz - contains the raw frequency statistics for all phrases and each trait
pairs.tar.gz - contains files with pairwise usage scores for each trait
clusters.tar.gz - contains files with cluster usage scores for each trait

In pairs and clusters, negative values mark phrases that are more associated with females, lower occupational class, and users over 30 years old.

If you are using this dataset, please reference our work:

@inproceedings{paraphrase16aaai,
author = {Preo\c{t}iuc-Pietro, Daniel and Xu, Wei and Ungar, Lyle},
title = {{Discovering user attribute stylistic differences via paraphrasing}},
booktitle = {{Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence}},
series = {AAAI},
year = {2016}
}
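
The sign convention in the description can be sketched as a small lookup. This is only an illustration: the function name, the trait keys, and the handling of a zero score are assumptions, and the description states only the negative side explicitly, so the positive-side groups here are inferred as the opposite category of each trait. The layout of the files inside the archives is not documented here, so the sketch works on a bare score value and does not parse any file.

```python
# Hypothetical helper; names and trait keys are illustrative, not part of the dataset.

# Groups the description associates with negative scores.
NEGATIVE_GROUP = {
    "gender": "female",
    "age": "over 30",
    "occupation": "low occupational class",
}

# Assumed counterparts for positive scores (inferred, not stated in the description).
POSITIVE_GROUP = {
    "gender": "male",
    "age": "under 25",
    "occupation": "high occupational class",
}


def associated_group(trait: str, score: float) -> str:
    """Return the user group a usage score leans toward for the given trait."""
    if trait not in NEGATIVE_GROUP:
        raise ValueError(f"unknown trait: {trait!r}")
    if score < 0:
        return NEGATIVE_GROUP[trait]
    if score > 0:
        return POSITIVE_GROUP[trait]
    # Assumption: a score of exactly zero is treated as leaning toward neither group.
    return "neutral"
```

For example, `associated_group("age", -0.2)` returns `"over 30"` under these assumptions.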

Provider:
figshare
Created:
2016-01-20