Social Media Corpus: Stigma Identification in Vaccination Discourse

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://figshare.com/articles/dataset/Social_Media_Corpus_Stigma_Identification_in_Vaccination_Discourse_COVID-19_/23277392

Description:

This research introduces an annotated gold-standard dataset of 2,663 comments from Facebook (Meta). Each comment is manually labelled as "stigma", "not stigma", or "ambiguous". Each comment is labelled three times (four times in case of disagreement) by independent expert annotators. On the annotation task with three labels ("stigma", "not stigma", and "ambiguous"), the overall observed share of agreement reached 68% and Fleiss' kappa reached 0.62. With two labels ("stigma", "not stigma"), the share of agreement is 89% and Fleiss' kappa is 0.84. The labels are then propagated from the annotated Facebook (Meta) corpus to a dataset of 40,084 comments discussing COVID vaccines, drawn from Twitter, Reddit, and YouTube corpora. In addition, the corpora are annotated with linguistic features from LIWC (Linguistic Inquiry and Word Count) [1], [2] and with additional features: number of characters in the comment string, sentiment score, and subjectivity score.

[1] Pennebaker, J. W., Francis, M. E. & Booth, R. J. Linguistic Inquiry and Word Count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates 71 (2001).

[2] Tausczik, Y. R. & Pennebaker, J. W. The psychological meaning of words: LIWC and computerised text analysis methods. Journal of Language and Social Psychology 29, 24–54 (2010).
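The agreement figures above are reported as Fleiss' kappa. As an illustration only, the following minimal sketch shows how that statistic is typically computed from a table of per-comment label counts. It assumes every comment received the same number of ratings (three here), whereas the dataset adds a fourth rating in cases of disagreement; the function name `fleiss_kappa` and the toy counts are hypothetical and are not taken from the dataset or from the authors' code.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of label counts.

    counts[i][j] is the number of annotators who assigned category j
    to item i. Every row must sum to the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two ratings per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_cats = len(counts[0])

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    pairs = n_raters * (n_raters - 1)
    p_items = [(sum(c * c for c in row) - n_raters) / pairs for row in counts]
    p_bar = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_cats = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    p_e = sum(p * p for p in p_cats)

    if p_e == 1.0:
        # All ratings fall in one category; kappa is undefined, report 1.0.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


# Toy data (made up): columns are "stigma", "not stigma", "ambiguous".
toy_counts = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 0, 3],
]
print(round(fleiss_kappa(toy_counts), 3))  # 0.745
```

Handling the comments that received a fourth rating would need either a generalisation of the statistic or a rule for reducing them to three ratings; the description does not say which approach was used.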

Created:

2023-06-01
