five

Social Media Corpus: Stigma Identification in Vaccination Discourse

收藏
NIAID Data Ecosystem2026-05-01 收录
下载链接:
https://figshare.com/articles/dataset/Social_Media_Corpus_Stigma_Identification_in_Vaccination_Discourse_COVID-19_/23277392
下载链接
链接失效反馈
官方服务:
资源简介:
Current research introduces an annotated gold standard dataset based on 2,663 comments from Meta (Facebook). The dataset is manually labelled for stigma, not stigma, and ambiguous sentiment. Each comment is labelled three times (four times in case of dissensus) by independent expert annotators. The overall observed share of agreement reached 68% and Fleiss Kappa agreement rate achieved 0.62 on the annotation task with three labels ("stigma, "not stigma", and "ambiguous" category). Annotation share of agreement between two labels ("stigma, "not stigma") is 89% and Fleiss Kappa is 0.84. The labels are consequently propagated from the annotated Facebook (Meta) to a dataset discussing COVID vaccines with 40,084 comments from Twitter, Reddit, and YouTube corpora. In addition, the corpora are annotated with linguistic features from LIWC (Linguistic Inquiry and Word Count) [1], [2] and additional features: number of characters in the comment string, sentiment score, subjectivity score. 1. Pennebaker, J. W., Francis, M. E. & Booth, R. J. Linguistic inquiry and word count: Liwc 2001. Mahway: Lawrence Erlbaum Assoc. 71, 2001 (2001). 2. Tausczik, Y. R. & Pennebaker, J. W. The psychological meaning of words: Liwc and computerised text analysis methods. J. language social psychology 29, 24–54 (2010)
创建时间:
2023-06-01
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作