
Coronavirus (COVID-19) Geo-tagged Tweets Dataset

Mendeley Data. Updated 2024-01-31; indexed 2024-06-29.
Download link:
https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tweets-dataset
Description:
This dataset contains the IDs of geo-tagged tweets. The tweets were captured by an ongoing project deployed at https://live.rlamsal.com.np. The geolocation data was extracted from tweets that mention "corona", "coronavirus", "covid", "pandemic", "quarantine", "ppe", or "n95", along with possible variants of "sarscov2", "nCov", "covid-19", "ncov2019", "2019ncov", "flatten(ing) the curve", "social distancing", and "work from home", and their respective hashtags.

To comply with Twitter's content redistribution policy, only the tweet IDs are shared. You can reconstruct the dataset by hydrating these IDs. Every tweet ID in this dataset belongs to a tweet that was posted with an exact location. Please note that this dataset should be used solely for non-commercial research purposes (ignore every other LICENSE category given on this page).

-------------------------------------------------------------------------
Also: Coronavirus (COVID-19) Tweets Dataset (180+ Million English Language Tweets; ongoing collection)
-------------------------------------------------------------------------

Note: I started sharing the IDs of tweets with exact 'point' location information only on April 28, 2020, in response to genuine requests from academic researchers who did not want to hydrate the full lists of IDs shared in the Coronavirus (COVID-19) Tweets Dataset. If you need geolocation-based data starting from March 20, 2020, use the Coronavirus (COVID-19) Tweets Dataset and hydrate the IDs while adding the following condition:

    import json

    data = json.loads(data)  # data: one hydrated tweet as a JSON string
    if data["coordinates"]:
        # the pair is ordered longitude first, then latitude
        longitude, latitude = data["coordinates"]["coordinates"]

The data is available in two formats: CSV and JSON. I'll be sharing new files every day, and the files will be named by the period they cover. For example, april28-june5.zip will contain the tweet IDs and sentiment scores (in CSV and JSON formats) of the tweets that were created between April 28, 2020, and June 5, 2020.

Why are only tweet IDs being shared? Twitter's content redistribution policy restricts the sharing of tweet information other than tweet IDs and/or user IDs. Twitter wants researchers to always pull fresh data, because a user might delete a tweet or make their profile protected. If that tweet has already been pulled and shared publicly, the user or community could be exposed to inferences drawn from shared data that no longer exists or has since become private.
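The coordinates condition in the description can be wrapped into a small filter over hydrated tweets. The following is a minimal sketch, not the dataset author's own tooling. It assumes the hydrated tweets are stored one JSON object per line, and that each object follows the Twitter API v1.1 tweet layout with an `id_str` field and a GeoJSON-style `coordinates` field. The function name `iter_point_locations` and the sample records are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_point_locations(lines: Iterable[str]) -> Iterator[Tuple[str, float, float]]:
    """Yield (tweet_id, longitude, latitude) for tweets with an exact point location.

    Assumption: each input line is one hydrated tweet serialized as JSON.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        coords = tweet.get("coordinates")
        if coords:
            # The pair is ordered longitude first, then latitude.
            longitude, latitude = coords["coordinates"]
            yield tweet["id_str"], longitude, latitude


# Hypothetical sample records: one with a point location, one without.
sample = [
    '{"id_str": "1", "coordinates": {"type": "Point", "coordinates": [85.324, 27.7172]}}',
    '{"id_str": "2", "coordinates": null}',
]

for tweet_id, lon, lat in iter_point_locations(sample):
    print(tweet_id, lon, lat)
```

Because the function accepts any iterable of strings, an open file handle can be passed in directly, so a large hydrated dump does not have to be loaded into memory at once.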

Created: 2024-01-31