
GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information

Source: IEEE. Updated: 2020-06-24. Indexed: 2026-04-17.
Download link:
https://ieee-dataport.org/open-access/geocov19-dataset-hundreds-millions-multilingual-covid-19-tweets-location-information
Description:
Abstract: We present GeoCoV19, a large-scale Twitter dataset related to the ongoing COVID-19 pandemic. The dataset was collected over a period of 90 days, from February 1 to May 1, 2020, and consists of more than 524 million multilingual tweets. Because geolocation information is essential for many tasks, such as disease tracking and surveillance, we employed a gazetteer-based approach to extract toponyms from user location and tweet content, and derived their geolocation information from Nominatim (OpenStreetMap) data at different levels of geolocation granularity. In terms of geographical coverage, the dataset spans over 218 countries and 47K cities in the world. The tweets in the dataset are from more than 43 million Twitter users, including around 209K verified accounts. These users posted tweets in 62 different languages. The dataset was collected using more than 800 multilingual keywords and hashtags. The complete list of keywords can be downloaded from https://crisisnlp.qcri.org/covid19. For more details, please refer to the paper: https://arxiv.org/abs/2005.11177. Explore interesting trends in the GeoCoV19 dataset using our new service: https://covid19-trends.qcri.org/
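The gazetteer-based toponym extraction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `GAZETTEER` table, the `MAX_NGRAM` limit, and the `extract_toponyms` function are hypothetical names invented for illustration, and a small in-memory dictionary stands in for the Nominatim (OpenStreetMap) data. The sketch assumes a simple longest-match lookup over word n-grams.

```python
import re

# Hypothetical toy gazetteer: lowercase place name -> (country code, granularity).
# The dataset itself relies on Nominatim (OpenStreetMap) data; these entries
# are illustrative stand-ins only.
GAZETTEER = {
    "new york": ("US", "city"),
    "doha": ("QA", "city"),
    "italy": ("IT", "country"),
    "lombardy": ("IT", "state"),
}

MAX_NGRAM = 3  # longest place name, in tokens, that the lookup tries to match


def extract_toponyms(text):
    """Return (name, country_code, granularity) for each gazetteer match in text."""
    # ASCII-only tokenization keeps the sketch short; multilingual text
    # would need a Unicode-aware tokenizer.
    tokens = re.findall(r"[a-z]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                country, level = GAZETTEER[candidate]
                matches.append((candidate, country, level))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no match starts here; move on one token
    return matches


# Works the same way on a user-location string or on tweet text.
print(extract_toponyms("Lockdown extended in Lombardy, Italy"))
```

Matching the longest n-gram first keeps multi-word names such as "new york" from being split into unrelated single-word lookups, and the granularity label mirrors the abstract's point that locations are resolved at different levels (country, state, city).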
Provided by:
Ofli, Ferda; Qazi, Umair; Imran, Muhammad
Created:
2020-06-24