HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks

Name: HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks
Creator: Harvard Dataverse
Published: 2025-05-12 00:07:47
License: No description available

DataCite Commons | Updated 2025-05-12 | Indexed 2025-05-17

Download link:

https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/A7NVF7



Resource description:

The <strong>HumAID</strong> Twitter dataset consists of several thousand manually annotated tweets collected during nineteen major natural disaster events, including earthquakes, hurricanes, wildfires, and floods, that occurred between 2016 and 2019 in different parts of the world. It is the largest social media dataset (~77K tweets) for crisis informatics so far (for details, please refer to our paper). The annotations consist of the following humanitarian categories. <br> <br>
<h4><strong>Humanitarian categories</strong></h4>
<ul> <li>Caution and advice</li> <li>Displaced people and evacuations</li> <li>Dont know cant judge</li> <li>Infrastructure and utility damage</li> <li>Injured or dead people</li> <li>Missing or found people</li> <li>Not humanitarian</li> <li>Other relevant information</li> <li>Requests or urgent needs</li> <li>Rescue volunteering or donation effort</li> <li>Sympathy and support</li> </ul> <br>
<h4><strong>Data format and directories</strong></h4>
<p>The data directory contains the following three sub-directories and one JSONL file:</p>
<ul> <li>events/ This directory contains one sub-directory per event. Each event directory contains three tab-separated (TSV) files: train, dev, and test. Each TSV file stores the ground-truth annotations for the humanitarian categories listed above. The format of these files is described in detail below.</li> <li>event_type/ This directory contains data combined by event type: the training, development, and test sets of all events that belong to the same event type are merged.</li> <li>all_combined/ This directory contains the whole combined set.</li> <li>HumAID_ICWSM_data.jsonl: JSON objects of the tweets.</li> </ul>
<h5><strong>Format of the TSV files</strong></h5>
<p>Each TSV file contains the following columns, separated by a tab:<br> </p>
<ul> <li>tweet_id: the actual tweet ID from Twitter.</li> <li>tweet_text: the text of the tweet.</li> <li>class_label: the label assigned to the tweet text.</li> </ul>
<h4><strong>More details can also be found in:</strong></h4> <a href="https://crisisnlp.qcri.org/humaid_dataset" target="_blank">https://crisisnlp.qcri.org/humaid_dataset</a> <br> <br>
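The TSV layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: it assumes each file starts with a header row naming the three columns (tweet_id, tweet_text, class_label), and the file path and the helper names load_humaid_tsv and label_counts are hypothetical, introduced only for illustration.

```python
import csv
from collections import Counter
from pathlib import Path


def load_humaid_tsv(path):
    """Read one HumAID-style TSV split into a list of dicts.

    Assumption: the first line is a header row with the column names
    tweet_id, tweet_text, and class_label.
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps stray quote characters in tweet text literal.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            rows.append({
                # Kept as a string so long tweet IDs are never altered.
                "tweet_id": row["tweet_id"],
                "tweet_text": row["tweet_text"],
                "class_label": row["class_label"],
            })
    return rows


def label_counts(rows):
    """Count how many tweets carry each class_label."""
    return Counter(row["class_label"] for row in rows)


if __name__ == "__main__":
    # Hypothetical path; substitute a real file from the events/ directory.
    example = Path("events/example_event/example_event_train.tsv")
    if example.exists():
        for label, count in label_counts(load_humaid_tsv(example)).most_common():
            print(f"{label}\t{count}")
```

Keeping tweet_id as a string is a deliberate choice here, since tweet IDs are large integers that some tools round when they parse them as floating-point numbers.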

Provider:

Harvard Dataverse

Created:

2021-04-04
