
HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks

DataCite Commons | Updated 2025-05-12 | Indexed 2025-05-17
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/A7NVF7
Resource description:
<p>The <strong>HumAID</strong> Twitter dataset consists of several thousand manually annotated tweets collected during nineteen major natural disaster events, including earthquakes, hurricanes, wildfires, and floods, that occurred between 2016 and 2019 in different parts of the world. With about 77K tweets, it is the largest social media dataset for crisis informatics so far (for details, please refer to our paper). Each tweet is annotated with one of the following humanitarian categories.</p>
<h4><strong>Humanitarian categories</strong></h4>
<ul>
<li>Caution and advice</li>
<li>Displaced people and evacuations</li>
<li>Don't know or can't judge</li>
<li>Infrastructure and utility damage</li>
<li>Injured or dead people</li>
<li>Missing or found people</li>
<li>Not humanitarian</li>
<li>Other relevant information</li>
<li>Requests or urgent needs</li>
<li>Rescue, volunteering, or donation effort</li>
<li>Sympathy and support</li>
</ul>
<h4><strong>Data format and directories</strong></h4>
<p>The data directory contains the following three sub-directories and one file:</p>
<ul>
<li>events/ This directory contains one sub-directory per event. Each event directory contains three tab-separated (TSV) files: train, dev, and test. Each TSV file stores the ground-truth annotations for the humanitarian categories listed above. The format of these files is described in detail below.</li>
<li>event_type/ This directory contains the data combined by event type: the training, development, and test sets of all events that belong to the same event type are merged.</li>
<li>all_combined/ This directory contains the whole combined set.</li>
<li>HumAID_ICWSM_data.jsonl: JSON objects of the tweets.</li>
</ul>
<h5><strong>Format of the TSV files</strong></h5>
<p>Each TSV file contains the following columns, separated by a tab:</p>
<ul>
<li>tweet_id: the actual tweet ID from Twitter.</li>
<li>tweet_text: the text of the tweet.</li>
<li>class_label: the label assigned to the tweet text.</li>
</ul>
<h4><strong>More details can also be found in:</strong></h4>
<a href="https://crisisnlp.qcri.org/humaid_dataset" target="_blank">https://crisisnlp.qcri.org/humaid_dataset</a>
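The three-column TSV layout described above can be read with Python's standard library. The following is a minimal sketch, not the authors' loader: the sample rows are invented, the label spellings (for example `caution_and_advice`) are assumed rather than taken from the files, and it assumes each file starts with a header row naming the three columns. The helper names `read_split` and `label_counts` are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for one split file (train, dev, or test).
# The rows and the exact label spellings are invented for illustration.
SAMPLE_TSV = (
    "tweet_id\ttweet_text\tclass_label\n"
    "1001\tRoads closed near the river, avoid the area\tcaution_and_advice\n"
    "1002\tVolunteers needed at the shelter tonight\trescue_volunteering_or_donation_effort\n"
    "1003\tStay safe everyone, thinking of you\tsympathy_and_support\n"
    "1004\tBridge collapsed after the quake\tinfrastructure_and_utility_damage\n"
    "1005\tPlease stay away from downed power lines\tcaution_and_advice\n"
)

EXPECTED_COLUMNS = ["tweet_id", "tweet_text", "class_label"]


def read_split(handle):
    """Read one TSV split into a list of dicts keyed by column name."""
    # QUOTE_NONE treats quote characters inside tweet text as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    # tweet_id stays a string so long IDs are never rounded or reformatted.
    return list(reader)


def label_counts(rows):
    """Count how many tweets carry each class_label."""
    return Counter(row["class_label"] for row in rows)


rows = read_split(io.StringIO(SAMPLE_TSV))
counts = label_counts(rows)
print(counts.most_common())
```

For a real split, the same function can be given a file handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points at one of the TSV files under events/.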
Provider:
Harvard Dataverse
Created:
2021-04-04