SocialCD-3K
Source: DataCite Commons. Updated: 2024-06-09. Indexed: 2024-07-13.
Download link:
https://ieee-dataport.org/documents/socialcd-3k
Description:
We collected the data by crawling comments from the "Zoufan" blog on the Weibo social platform. A team of qualified psychologists then annotated the data. Strict data preprocessing measures were adopted to protect users' privacy.
SocialCD-3K (Cognitive Distortion Classification)
Labels and number of samples:
All-or-nothing thinking: 77
Over-generalization: 141
Mental filter: 378
Disqualifying the positive: 27
Mind reading: 121
The fortune teller error: 652
Magnification: 321
Emotional reasoning: 16
Should statements: 84
Labeling and mislabeling: 1961
Blaming oneself: 188
Blaming others: 27
Data split:
Training set: 2725 samples
Test set: 682 samples
Average number of labels per sample: 1.71
Average number of words per post: 42.56
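The label counts above are heavily imbalanced (1961 samples for the most frequent label against 16 for the rarest), which usually matters when training a multi-label classifier. The following is a minimal sketch, not part of the dataset's own tooling, that derives the total sample count, the training fraction, and a per-label negative-to-positive ratio, a common heuristic for weighting positive examples in a binary cross-entropy loss. It assumes the per-label counts cover the full dataset (training plus test), which the description does not state explicitly; the function name `positive_class_weights` is hypothetical.

```python
# Label counts copied from the dataset description above.
# ASSUMPTION: these counts span the full dataset (train + test);
# the description does not say which split they refer to.
LABEL_COUNTS = {
    "All-or-nothing thinking": 77,
    "Over-generalization": 141,
    "Mental filter": 378,
    "Disqualifying the positive": 27,
    "Mind reading": 121,
    "The fortune teller error": 652,
    "Magnification": 321,
    "Emotional reasoning": 16,
    "Should statements": 84,
    "Labeling and mislabeling": 1961,
    "Blaming oneself": 188,
    "Blaming others": 27,
}
TRAIN_SIZE = 2725
TEST_SIZE = 682


def positive_class_weights(counts, n_samples):
    """Return the negatives/positives ratio for each label.

    Hypothetical helper: a ratio above 1 means the label is rarer than
    its absence, so its positive examples would be up-weighted.
    """
    return {label: (n_samples - n) / n for label, n in counts.items()}


total_samples = TRAIN_SIZE + TEST_SIZE
train_fraction = TRAIN_SIZE / total_samples
weights = positive_class_weights(LABEL_COUNTS, total_samples)

print(f"total samples: {total_samples}, train fraction: {train_fraction:.3f}")
# List labels from most to least frequent with their weight.
for label, n in sorted(LABEL_COUNTS.items(), key=lambda kv: -kv[1]):
    print(f"{label:28s} n={n:5d} weight={weights[label]:8.2f}")
```

Ratios like these could be passed as the positive-class weights of a binary cross-entropy loss, one entry per label; whether that helps on this dataset would need to be checked empirically.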
Provider:
IEEE DataPort
Created:
2024-06-09
Collected Summary
Dataset Introduction

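Background and Challenges
Background Overview
SocialCD-3K is a dataset for cognitive distortion classification. It contains 3,407 samples drawn from Weibo comments, labeled across 12 cognitive distortion categories, and the data underwent strict preprocessing to protect user privacy. The dataset is split into a training set and a test set, with an average of 1.71 labels per sample and 42.56 words per post.
The summary above was collected and generated by 遇见数据集.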



