
Coronavirus (COVID-19) Geo-tagged Tweets Dataset

Source: Mendeley Data. Updated 2024-01-31; indexed 2024-06-29.
Download link:
https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tweets-dataset
Description:
This dataset contains the IDs of geo-tagged tweets. The tweets were captured by an ongoing project deployed at https://live.rlamsal.com.np. The geolocation data was extracted from tweets that mentioned "corona", "coronavirus", "covid", or possible variants of "sarscov2", "nCov", "covid-19", and "ncov2019". To comply with Twitter's content redistribution policy, only the tweet IDs are shared; you can reconstruct the dataset by hydrating these IDs. Every tweet ID in this dataset belongs to a tweet that provides an exact location. Please note that this dataset should be used solely for non-commercial research purposes (ignore every other LICENSE category given on this page).

-------------------------------------------------------------------------
Also: Coronavirus (COVID-19) Tweets Dataset (180+ million English-language tweets; ongoing collection)
-------------------------------------------------------------------------

Note: I began sharing the IDs of tweets with exact 'point' location information only on April 28, 2020. This followed genuine requests from academic researchers who did not want to hydrate the entire ID lists shared in the Coronavirus (COVID-19) Tweets Dataset. If you need geolocation-based data starting March 20, 2020, use the Coronavirus (COVID-19) Tweets Dataset and add the following condition when hydrating the IDs:

    import json

    data = json.loads(data)  # data: one hydrated tweet as a JSON string
    if data["coordinates"]:
        # the GeoJSON point is ordered [longitude, latitude]
        longitude, latitude = data["coordinates"]["coordinates"]

The data is available in two formats: CSV and JSON. I'll be sharing new files every day, and the files are named by period. For example, april28-june5.zip contains the tweet IDs and sentiment scores (in CSV and JSON formats) of tweets created between April 28, 2020, and June 5, 2020.

Why are only tweet IDs being shared? Twitter's content redistribution policy restricts sharing tweet information other than tweet IDs and/or user IDs. Twitter wants researchers to always pull fresh data, because a user might delete a tweet or make their profile protected. Suppose a tweet has already been pulled and shared in the public domain. The user or community could then be exposed to inferences drawn from data that has since been deleted or made private.
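The `coordinates` condition above can be applied to a whole batch of hydrated tweets. The following is a minimal sketch, not the author's tooling. It assumes the hydration tool writes one Twitter API v1.1 tweet object per line (JSON Lines) and that each object carries the standard `id_str` field. The function name `extract_points` and the one-object-per-line layout are illustrative assumptions.

```python
import json
from typing import Iterable, Iterator, Tuple


def extract_points(lines: Iterable[str]) -> Iterator[Tuple[str, float, float]]:
    """Yield (tweet_id, longitude, latitude) for tweets with an exact point location.

    Assumption: each line holds one hydrated tweet in Twitter API v1.1 JSON,
    where "coordinates" is null or a GeoJSON Point ordered [longitude, latitude].
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        tweet = json.loads(line)
        point = tweet.get("coordinates")
        if point:  # None (JSON null) or a missing key means no exact location
            longitude, latitude = point["coordinates"]
            yield tweet["id_str"], float(longitude), float(latitude)


# Illustrative in-memory records; a real run would iterate over an open file.
sample = [
    '{"id_str": "1", "coordinates": {"type": "Point", "coordinates": [-73.99, 40.73]}}',
    '{"id_str": "2", "coordinates": null}',
]
for tweet_id, lon, lat in extract_points(sample):
    print(tweet_id, lon, lat)
```

`extract_points` accepts any iterable of strings, so an open file handle can be passed directly. Large hydrated files are then processed line by line without being loaded into memory.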

Created:
2024-01-31