
Coronavirus (COVID-19) Geo-tagged Tweets Dataset

Mendeley Data. Updated 2024-01-31; indexed 2024-06-29.
Download link:
https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tweets-dataset
Description:
This dataset contains the IDs of geo-tagged tweets. The tweets are captured by an ongoing project deployed at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for the following keywords and their respective hashtags: "corona", "coronavirus", "covid", "pandemic", "lockdown", "quarantine", "hand sanitizer", "ppe", "n95", different possible variants of "sarscov2", "nCov", "covid-19", "ncov2019", "2019ncov", "flatten(ing) the curve", "social distancing", and "work(ing) from home".

To comply with Twitter's content redistribution policy, only the tweet IDs are shared. You can reconstruct the dataset by hydrating these IDs. Every tweet ID in this dataset belongs to a tweet that was posted with an exact location. Please note that this dataset should be used solely for non-commercial research purposes (ignore every other LICENSE category given on this page).

Related dataset: Coronavirus (COVID-19) Tweets Dataset (190+ million English-language tweets; ongoing collection)

Note: I started sharing the IDs of tweets with exact 'point' location information only on April 28, 2020, in response to genuine requests from academic researchers who did not want to hydrate the full lists of IDs shared in the Coronavirus (COVID-19) Tweets Dataset.

Update: I have received many requests, especially from social science researchers, to also make the geo-tagged tweets created between March 20, 2020, and April 28, 2020, available in this dataset. Hydrating millions of tweet IDs can be tedious for people with less technical expertise. Therefore, I have started hydrating the IDs provided in the Coronavirus (COVID-19) Tweets Dataset, and I will share the geo-tagged tweets posted between these dates as the hydration task progresses.
I'll be adding new CSV files, and these newly added files will be named day-wise (instead of period-wise). Bookmark this page for further updates.

The data is available in two formats: CSV and JSON. I'll be sharing new files every day, and the files will be named period-wise. For example, april28-june5.zip will contain the tweet IDs and sentiment scores (in CSV and JSON formats) of the tweets created between April 28, 2020, and June 05, 2020.

Why are only tweet IDs being shared? Twitter's content redistribution policy restricts the sharing of tweet information other than tweet IDs and/or user IDs. Twitter wants researchers to always pull fresh data, because a user might delete a tweet or make their profile protected. If such a tweet has already been pulled and shared publicly, the user or community could be exposed to inferences drawn from shared data that no longer exists or has since become private.
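
The description above mentions CSV files holding tweet IDs with sentiment scores, and says the full tweets are recovered by hydrating those IDs. As a rough illustration of the preparation step, the Python sketch below parses such rows and groups the IDs into batches for a hydration tool. Several things here are assumptions and are not confirmed by the dataset page: the column order (tweet ID first, sentiment score second), the absence of a header row, the helper names `read_rows` and `batch_ids`, and the batch size of 100. The sample IDs are made up. The sketch does not call any Twitter API.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple


def read_rows(csv_file: Iterable[str]) -> List[Tuple[str, float]]:
    """Parse (tweet_id, sentiment_score) rows.

    Assumption: no header row, ID in column 0, score in column 1.
    """
    rows = []
    for record in csv.reader(csv_file):
        if len(record) < 2:
            continue  # skip blank or malformed lines
        # Keep IDs as strings: tweet IDs are large integers, and some
        # tools silently round them when they are parsed as floats.
        rows.append((record[0].strip(), float(record[1])))
    return rows


def batch_ids(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs (size is an assumption)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Made-up sample data standing in for one of the CSV files.
sample = io.StringIO(
    "1254993137174679552,0.25\n"
    "1254993137174679553,-0.1\n"
    "1254993137174679554,0.0\n"
)
rows = read_rows(sample)
ids = [tweet_id for tweet_id, _ in rows]
batches = list(batch_ids(ids, size=2))
print(batches)
```

Each batch can then be handed to whichever hydration tool or API client you use. Tweets that have since been deleted or protected will not come back, which is the behavior the redistribution policy intends.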

Created:
2024-01-31