Coronavirus (COVID-19) Geo-tagged Tweets Dataset
Source: Mendeley Data | Updated: 2024-01-31 | Indexed: 2024-06-29
Download link:
https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tweets-dataset
Description:
This dataset contains the IDs of geo-tagged tweets. The tweets were captured by an ongoing project deployed at https://live.rlamsal.com.np. The geolocation data was extracted from tweets that mentioned "corona", "coronavirus", "covid", and possible variants of "sarscov2", "nCov", "covid-19", and "ncov2019". In compliance with Twitter's content redistribution policy, only the tweet IDs are shared. You can reconstruct the dataset by hydrating these IDs. The tweet IDs in this dataset belong to tweets that were posted with an exact location. Please note that this dataset should be used solely for non-commercial research purposes (ignore every other LICENSE category given on this page).

Note: I started sharing the IDs of tweets that had exact 'point' location information only on April 28, 2020, after genuine requests from academic researchers who did not want to hydrate the full lists of IDs (more than 180 million tweets) shared in the Coronavirus (COVID-19) Tweets Dataset.

-------------------------------------------------------------------------
Project: Live Twitter Sentiment | Author's URL
-------------------------------------------------------------------------
Also: Coronavirus (COVID-19) Tweets Dataset
-------------------------------------------------------------------------

If you need geolocation-based data starting March 20, 2020, use the Coronavirus (COVID-19) Tweets Dataset and hydrate the IDs while adding the following condition:

    import json

    data = json.loads(data)
    # "coordinates" is null unless the tweet carries an exact point location
    if data["coordinates"]:
        longitude, latitude = data["coordinates"]["coordinates"]

The data is available in two formats: CSV and JSON. I'll be sharing new files every day, and the files will be named by period. For example, april28-june5.zip will contain the tweet IDs and sentiment scores (in CSV and JSON formats) of tweets created between April 28, 2020, and June 05, 2020.

Why are only tweet IDs being shared?
Twitter's content redistribution policy restricts the sharing of tweet information other than tweet IDs and/or user IDs. Twitter wants researchers to always pull fresh data, because a user might delete a tweet or make their profile protected. If the same tweet has already been pulled and shared publicly, it might expose the user or community to inferences drawn from shared data that no longer exists or has since become private.
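
The coordinate condition given in the description for the full Coronavirus (COVID-19) Tweets Dataset can be sketched as a small filter over hydrated tweets. This is only an illustration under stated assumptions: it assumes the hydration tool writes one JSON tweet object per line in the Twitter API v1.1 layout (with an `id_str` field and a GeoJSON-style `coordinates` field, longitude first). The function name `iter_point_locations` and the sample values are hypothetical and do not come from the dataset.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_point_locations(lines: Iterable[str]) -> Iterator[Tuple[str, float, float]]:
    """Yield (tweet_id, longitude, latitude) for tweets that carry a point location.

    Assumes each non-blank line is one hydrated tweet serialized as JSON
    (an assumption about the hydration output, not something the dataset specifies).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        coords = tweet.get("coordinates")
        if not coords:
            continue  # field missing or null: no exact location attached
        # GeoJSON order is [longitude, latitude]
        longitude, latitude = coords["coordinates"]
        yield tweet["id_str"], longitude, latitude


if __name__ == "__main__":
    # Made-up sample lines, for illustration only
    sample = [
        '{"id_str": "1", "coordinates": {"type": "Point", "coordinates": [85.324, 27.7172]}}',
        '{"id_str": "2", "coordinates": null}',
    ]
    for row in iter_point_locations(sample):
        print(row)
```

Using `tweet.get("coordinates")` rather than direct indexing is a design choice of this sketch: it keeps the filter from raising a `KeyError` if a hydrated object lacks the field entirely.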
Created:
2024-01-31



