Secondary data from Brazilian tweets about COVID-19
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/kv52cwvskc
Description:
These datasets contain secondary data extracted from Brazilian tweets about the COVID-19 pandemic. The data were generated between March 2020 and May 2021. Because of Twitter's restrictions, the raw data cannot be shared. In addition, to respect users' privacy and in accordance with the recommendations of the Brazilian Research Ethics Committee, tweet IDs cannot be shared here either. Instead, a set of scripts was applied to the raw data to extract useful information and create these datasets for further research.
Files whose names begin with the prefix top contain the most cited words per day in three categories: general tweets, vaccine-related tweets, and tweets from verified accounts. Files whose names begin with the prefix subject contain the daily count of mentions in the following categories: symptoms, drugs, vaccines, and vaccine brands or manufacturers.
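As a minimal sketch of how the two file families described above could be separated after download, the following Python snippet groups file names by their prefix. Only the prefixes top and subject come from the description. The function name group_by_prefix, the example file names, and the .csv extension are hypothetical and may not match the actual files in the dataset.

```python
from collections import defaultdict
from pathlib import PurePath

# Prefixes named in the dataset description; nothing else about the
# file naming scheme is known here.
KNOWN_PREFIXES = ("top", "subject")


def group_by_prefix(filenames):
    """Group file names by the known prefix they start with ("other" if none)."""
    groups = defaultdict(list)
    for name in filenames:
        base = PurePath(name).name.lower()
        prefix = next((p for p in KNOWN_PREFIXES if base.startswith(p)), "other")
        groups[prefix].append(name)
    return dict(groups)


# Hypothetical file names, for illustration only.
example = ["top_general.csv", "top_vaccine.csv", "subject_symptoms.csv", "README.txt"]
for prefix, names in group_by_prefix(example).items():
    print(prefix, names)
```

Grouping by prefix first keeps the word-ranking files and the mention-count files apart, since the two families hold different kinds of daily data.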
Because the data are associated with the post date, studies can take the temporal dimension into account, for example to compare how users' perception of a given subject changes over time. Note that some issues became more prominent over time, especially those related to vaccines.
Created:
2021-06-29



