COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes
Source: DataCite Commons | Updated: 2026-02-16 | Indexed: 2026-05-03
Download link:
https://www.openicpsr.org/openicpsr/project/120321/version/V12/view?path=/openicpsr/120321/fcr:versions/V12/Twitter-COVID-dataset---June2022/tweetid_userid_keyword_sentiments_emotions_United-Kingdom.zip&type=file
Resource description:
This paper describes a large global dataset of people's discourse about and
responses to the COVID-19 pandemic on Twitter. From 28 January 2020 to 1 June 2022,
we collected and processed over 252
million Twitter posts from more than 29 million unique users using four
keywords: “corona”, “wuhan”, “nCov” and “covid”. Leveraging probabilistic topic
modelling and pre-trained machine learning-based emotion recognition algorithms,
we labelled each tweet with seventeen attributes, including a) ten binary
attributes indicating the tweet’s relevance (1) or irrelevance (0) to the top
ten detected topics, b) five quantitative emotion attributes indicating the
degree of intensity of the valence or sentiment (from 0: extremely negative to
1: extremely positive) and the degree of intensity of fear, anger, sadness and happiness
emotions (from 0: not at all to 1: extremely intense), and c) two categorical attributes
indicating the sentiment (very negative, negative, neutral or mixed, positive,
very positive) and the dominant emotion (fear, anger, sadness, happiness, no
specific emotion) the tweet is mainly expressing. We discuss the technical
validity and report the descriptive statistics of these attributes, their
temporal distribution, and geographic representation. The paper concludes with
a discussion of the dataset’s usage in communication, psychology, public
health, economics, and epidemiology.
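
The seventeen attributes described above can be pictured as a single per-tweet record. The following is a minimal sketch in Python, not the dataset's actual file layout: the class name, the field names, and the `validate` helper are all hypothetical, chosen only to mirror the description (ten binary topic flags, five intensities between 0 and 1, and two categorical labels). The real column names should be taken from the downloaded files.

```python
from dataclasses import dataclass
from typing import List

# Category values as listed in the description above.
SENTIMENT_LABELS = {
    "very negative", "negative", "neutral or mixed", "positive", "very positive",
}
EMOTION_LABELS = {"fear", "anger", "sadness", "happiness", "no specific emotion"}

# Hypothetical field names for the five quantitative attributes.
INTENSITY_FIELDS = ("valence", "fear", "anger", "sadness", "happiness")


@dataclass
class TweetAttributes:
    """Hypothetical container for the seventeen per-tweet attributes."""
    topic_flags: List[int]   # ten values, 1 = relevant to topic, 0 = irrelevant
    valence: float           # 0 = extremely negative, 1 = extremely positive
    fear: float              # 0 = not at all, 1 = extremely intense
    anger: float
    sadness: float
    happiness: float
    sentiment: str           # one of SENTIMENT_LABELS
    dominant_emotion: str    # one of EMOTION_LABELS


def validate(rec: TweetAttributes) -> List[str]:
    """Return a list of problems; an empty list means the record is consistent
    with the ranges and categories stated in the description."""
    problems = []
    if len(rec.topic_flags) != 10:
        problems.append(f"expected 10 topic flags, got {len(rec.topic_flags)}")
    if any(flag not in (0, 1) for flag in rec.topic_flags):
        problems.append("topic flags must be 0 or 1")
    for name in INTENSITY_FIELDS:
        value = getattr(rec, name)
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name} out of range [0, 1]: {value}")
    if rec.sentiment not in SENTIMENT_LABELS:
        problems.append(f"unknown sentiment label: {rec.sentiment!r}")
    if rec.dominant_emotion not in EMOTION_LABELS:
        problems.append(f"unknown emotion label: {rec.dominant_emotion!r}")
    return problems
```

A check of this kind can be run on each row after loading, so that out-of-range intensities or unexpected labels are caught before analysis.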
Provider:
ICPSR - Interuniversity Consortium for Political and Social Research
Created:
2026-02-16



