
2019 CA earthquake words dictionaries and time-series labeled data

Mendeley Data, updated 2020-06-12, indexed 2026-04-09
Download link:
https://data.mendeley.com/datasets/z9xjcmg6s2/1
Description:
The dataset consists of two files related to the 2019 CA earthquake: the word dictionaries used to clean the data and the time-series labels for damage level. The Twitter Standard Search API was queried with the search term “earthquake” to collect related tweets posted from 07/04/2019 to 07/10/2019 (UTC). The original data were stored in JavaScript Object Notation (.json) files, which were converted to Excel (.xlsx) files for subsequent processing. The first data file contains three dictionaries of words: location names used to identify tweets related to the California earthquake, damage-related terms used to filter in damage-related tweets, and words used to filter out tweets that do not imply real physical damage. It also contains descriptive words for each of the four identified damage levels. This file can guide the construction of word dictionaries for cleaning data in studies of text-based damage assessment for disaster events. The second data file contains 20,708 records labeled by our model. Under the Twitter Developer Policy, https://developer.twitter.com/en/developer-terms/agreement-and-policy#id34, user information (screen name, verified status, description, user-input location, favorites, followers, etc.) and message text may not be distributed. We attach this file for the sole purpose of validating our research results on the temporal dimension.
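The three-dictionary cleaning step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the word lists, the function names, and the rule that a tweet must match a location term and a damage term while matching no exclusion term are all assumptions made for illustration. The real dictionaries are in the first data file.

```python
import re

# Hypothetical entries for illustration only; the actual dictionaries
# ship in the dataset's first file and are not reproduced here.
LOCATION_TERMS = {"california", "los angeles"}
DAMAGE_TERMS = {"damage", "collapsed", "cracked"}
EXCLUDE_TERMS = {"movie", "drill"}


def contains_any(text, terms):
    """Return True if any term occurs in text as a whole word or phrase."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms
    )


def keep_tweet(text):
    """Assumed rule: keep a tweet that names a location and a damage term
    and contains no exclusion term."""
    return (
        contains_any(text, LOCATION_TERMS)
        and contains_any(text, DAMAGE_TERMS)
        and not contains_any(text, EXCLUDE_TERMS)
    )


# Made-up example texts, not records from the dataset.
tweets = [
    "Earthquake in California cracked our kitchen wall",
    "Earthquake damage reported in Japan",
    "That California earthquake movie showed so much damage",
    "Felt the earthquake in Los Angeles, everyone is fine",
]
kept = [t for t in tweets if keep_tweet(t)]
```

Whole-word matching is used so that a short term does not fire inside an unrelated longer word; a real pipeline might also need stemming or phrase variants, which this sketch leaves out.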
Created:
2020-06-12