Name: QCRI/HumAID-events
Creator: QCRI
Published: 2024-11-06 16:07:03
License: No description available

Download link:

https://hf-mirror.com/datasets/QCRI/HumAID-events

Resource description:

---
license: cc-by-nc-sa-4.0
task_categories:
- text-classification
language:
- en
tags:
- Disaster
- Crisis Informatics
pretty_name: 'HumAID: Human-Annotated Disaster Incidents Data from Twitter -- Event wise dataset'
size_categories:
- 10K<n<100K
dataset_info:
- config_name: hurricane_florence_2018
  splits:
  - name: train
    num_examples: 4384
  - name: dev
    num_examples: 639
  - name: test
    num_examples: 1241
- config_name: kaikoura_earthquake_2016
  splits:
  - name: train
    num_examples: 1536
  - name: dev
    num_examples: 224
  - name: test
    num_examples: 435
- config_name: kerala_floods_2018
  splits:
  - name: train
    num_examples: 5588
  - name: dev
    num_examples: 814
  - name: test
    num_examples: 1582
- config_name: hurricane_harvey_2017
  splits:
  - name: train
    num_examples: 6378
  - name: dev
    num_examples: 929
  - name: test
    num_examples: 1805
- config_name: hurricane_maria_2017
  splits:
  - name: train
    num_examples: 5094
  - name: dev
    num_examples: 742
  - name: test
    num_examples: 1442
- config_name: midwestern_us_floods_2019
  splits:
  - name: train
    num_examples: 1316
  - name: dev
    num_examples: 191
  - name: test
    num_examples: 373
- config_name: puebla_mexico_earthquake_2017
  splits:
  - name: train
    num_examples: 1410
  - name: dev
    num_examples: 205
  - name: test
    num_examples: 400
- config_name: maryland_floods_2018
  splits:
  - name: train
    num_examples: 519
  - name: dev
    num_examples: 75
  - name: test
    num_examples: 148
- config_name: hurricane_irma_2017
  splits:
  - name: train
    num_examples: 6579
  - name: dev
    num_examples: 958
  - name: test
    num_examples: 1862
- config_name: ecuador_earthquake_2016
  splits:
  - name: train
    num_examples: 1094
  - name: dev
    num_examples: 159
  - name: test
    num_examples: 310
- config_name: cyclone_idai_2019
  splits:
  - name: train
    num_examples: 2753
  - name: dev
    num_examples: 401
  - name: test
    num_examples: 779
- config_name: canada_wildfires_2016
  splits:
  - name: train
    num_examples: 1569
  - name: dev
    num_examples: 228
  - name: test
    num_examples: 445
- config_name: italy_earthquake_aug_2016
  splits:
  - name: train
    num_examples: 840
  - name: dev
    num_examples: 122
  - name: test
    num_examples: 239
- config_name: greece_wildfires_2018
  splits:
  - name: train
    num_examples: 1060
  - name: dev
    num_examples: 154
  - name: test
    num_examples: 301
- config_name: hurricane_dorian_2019
  splits:
  - name: train
    num_examples: 5329
  - name: dev
    num_examples: 776
  - name: test
    num_examples: 1508
- config_name: .git
  splits:
  - name: train
    num_examples: 0
  - name: dev
    num_examples: 0
  - name: test
    num_examples: 0
- config_name: california_wildfires_2018
  splits:
  - name: train
    num_examples: 5163
  - name: dev
    num_examples: 752
  - name: test
    num_examples: 1461
- config_name: pakistan_earthquake_2019
  splits:
  - name: train
    num_examples: 1370
  - name: dev
    num_examples: 199
  - name: test
    num_examples: 389
- config_name: hurricane_matthew_2016
  splits:
  - name: train
    num_examples: 1157
  - name: dev
    num_examples: 168
  - name: test
    num_examples: 329
- config_name: srilanka_floods_2017
  splits:
  - name: train
    num_examples: 392
  - name: dev
    num_examples: 57
  - name: test
    num_examples: 111
configs:
- config_name: hurricane_florence_2018
  data_files:
  - split: train
    path: hurricane_florence_2018/train.json
  - split: dev
    path: hurricane_florence_2018/dev.json
  - split: test
    path: hurricane_florence_2018/test.json
- config_name: kaikoura_earthquake_2016
  data_files:
  - split: train
    path: kaikoura_earthquake_2016/train.json
  - split: dev
    path: kaikoura_earthquake_2016/dev.json
  - split: test
    path: kaikoura_earthquake_2016/test.json
- config_name: kerala_floods_2018
  data_files:
  - split: train
    path: kerala_floods_2018/train.json
  - split: dev
    path: kerala_floods_2018/dev.json
  - split: test
    path: kerala_floods_2018/test.json
- config_name: hurricane_harvey_2017
  data_files:
  - split: train
    path: hurricane_harvey_2017/train.json
  - split: dev
    path: hurricane_harvey_2017/dev.json
  - split: test
    path: hurricane_harvey_2017/test.json
- config_name: hurricane_maria_2017
  data_files:
  - split: train
    path: hurricane_maria_2017/train.json
  - split: dev
    path: hurricane_maria_2017/dev.json
  - split: test
    path: hurricane_maria_2017/test.json
- config_name: midwestern_us_floods_2019
  data_files:
  - split: train
    path: midwestern_us_floods_2019/train.json
  - split: dev
    path: midwestern_us_floods_2019/dev.json
  - split: test
    path: midwestern_us_floods_2019/test.json
- config_name: puebla_mexico_earthquake_2017
  data_files:
  - split: train
    path: puebla_mexico_earthquake_2017/train.json
  - split: dev
    path: puebla_mexico_earthquake_2017/dev.json
  - split: test
    path: puebla_mexico_earthquake_2017/test.json
- config_name: maryland_floods_2018
  data_files:
  - split: train
    path: maryland_floods_2018/train.json
  - split: dev
    path: maryland_floods_2018/dev.json
  - split: test
    path: maryland_floods_2018/test.json
- config_name: hurricane_irma_2017
  data_files:
  - split: train
    path: hurricane_irma_2017/train.json
  - split: dev
    path: hurricane_irma_2017/dev.json
  - split: test
    path: hurricane_irma_2017/test.json
- config_name: ecuador_earthquake_2016
  data_files:
  - split: train
    path: ecuador_earthquake_2016/train.json
  - split: dev
    path: ecuador_earthquake_2016/dev.json
  - split: test
    path: ecuador_earthquake_2016/test.json
- config_name: cyclone_idai_2019
  data_files:
  - split: train
    path: cyclone_idai_2019/train.json
  - split: dev
    path: cyclone_idai_2019/dev.json
  - split: test
    path: cyclone_idai_2019/test.json
- config_name: canada_wildfires_2016
  data_files:
  - split: train
    path: canada_wildfires_2016/train.json
  - split: dev
    path: canada_wildfires_2016/dev.json
  - split: test
    path: canada_wildfires_2016/test.json
- config_name: italy_earthquake_aug_2016
  data_files:
  - split: train
    path: italy_earthquake_aug_2016/train.json
  - split: dev
    path: italy_earthquake_aug_2016/dev.json
  - split: test
    path: italy_earthquake_aug_2016/test.json
- config_name: greece_wildfires_2018
  data_files:
  - split: train
    path: greece_wildfires_2018/train.json
  - split: dev
    path: greece_wildfires_2018/dev.json
  - split: test
    path: greece_wildfires_2018/test.json
- config_name: hurricane_dorian_2019
  data_files:
  - split: train
    path: hurricane_dorian_2019/train.json
  - split: dev
    path: hurricane_dorian_2019/dev.json
  - split: test
    path: hurricane_dorian_2019/test.json
- config_name: california_wildfires_2018
  data_files:
  - split: train
    path: california_wildfires_2018/train.json
  - split: dev
    path: california_wildfires_2018/dev.json
  - split: test
    path: california_wildfires_2018/test.json
- config_name: pakistan_earthquake_2019
  data_files:
  - split: train
    path: pakistan_earthquake_2019/train.json
  - split: dev
    path: pakistan_earthquake_2019/dev.json
  - split: test
    path: pakistan_earthquake_2019/test.json
- config_name: hurricane_matthew_2016
  data_files:
  - split: train
    path: hurricane_matthew_2016/train.json
  - split: dev
    path: hurricane_matthew_2016/dev.json
  - split: test
    path: hurricane_matthew_2016/test.json
- config_name: srilanka_floods_2017
  data_files:
  - split: train
    path: srilanka_floods_2017/train.json
  - split: dev
    path: srilanka_floods_2017/dev.json
  - split: test
    path: srilanka_floods_2017/test.json
---

# HumAID: Human-Annotated Disaster Incidents Data from Twitter

## Dataset Description

- **Homepage:** https://crisisnlp.qcri.org/humaid_dataset
- **Repository:** https://crisisnlp.qcri.org/data/humaid/humaid_data_all.zip
- **Paper:** https://ojs.aaai.org/index.php/ICWSM/article/view/18116/17919

### Dataset Summary

The HumAID Twitter dataset consists of several thousand manually annotated tweets that were collected during 19 major natural disaster events, including earthquakes, hurricanes, wildfires, and floods, which occurred from 2016 to 2019 in different parts of the world. The annotations in the provided datasets consist of the following humanitarian categories. The dataset contains only English tweets and is the largest dataset for crisis informatics so far.

**Humanitarian categories**

- Caution and advice
- Displaced people and evacuations
- Dont know cant judge
- Infrastructure and utility damage
- Injured or dead people
- Missing or found people
- Not humanitarian
- Other relevant information
- Requests or urgent needs
- Rescue volunteering or donation effort
- Sympathy and support

The resulting annotated dataset consists of 11 labels.

### Supported Tasks and Benchmark

The dataset can be used to train a model for multiclass tweet classification for disaster response. The benchmark results can be found at https://ojs.aaai.org/index.php/ICWSM/article/view/18116/17919.

The dataset is also released event-wise and as JSON objects for further research. The full dataset can be found at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/A7NVF7

### Languages

English

## Dataset Structure

### Data Instances

```
{
  "tweet_text": "@RT_com: URGENT: Death toll in #Ecuador #quake rises to 233 \u2013 President #Correa #1 in #Pakistan",
  "class_label": "injured_or_dead_people"
}
```

### Data Fields

* tweet_text: the text of the tweet.
* class_label: the label assigned to the tweet text.

### Data Splits

* Train
* Development
* Test

## Dataset Creation

Tweets were collected during several disaster events.

### Annotations

Amazon Mechanical Turk (AMT) was used to annotate the dataset. Please check the paper for more detail.

#### Who are the annotators?

- crowdsourced

### Licensing Information

- cc-by-nc-4.0

### Citation Information

```
@inproceedings{humaid2020,
  Author = {Firoj Alam and Umair Qazi and Muhammad Imran and Ferda Ofli},
  booktitle = {Proceedings of the Fifteenth International AAAI Conference on Web and Social Media},
  series = {ICWSM~'21},
  Keywords = {Social Media, Crisis Computing, Tweet Text Classification, Disaster Response},
  Title = {HumAID: Human-Annotated Disaster Incidents Data from Twitter},
  Year = {2021},
  publisher = {AAAI},
  address = {Online},
}
```
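
As a companion to the Data Instances and Data Fields sections, the following is a minimal sketch of how a single record might be parsed and checked against the 11 humanitarian categories. It is not part of the official release. The helper names (`to_label`, `parse_record`, `label_counts`) are hypothetical, and the snake_case mapping from category name to `class_label` is an assumption inferred from the one example in the card (`injured_or_dead_people`), not a documented rule.

```python
import json
from collections import Counter
from typing import Iterable, Tuple

# The 11 humanitarian categories, as named in the card.
CATEGORIES = [
    "Caution and advice",
    "Displaced people and evacuations",
    "Dont know cant judge",
    "Infrastructure and utility damage",
    "Injured or dead people",
    "Missing or found people",
    "Not humanitarian",
    "Other relevant information",
    "Requests or urgent needs",
    "Rescue volunteering or donation effort",
    "Sympathy and support",
]


def to_label(name: str) -> str:
    """ASSUMPTION: labels are the lower-cased names joined by underscores."""
    return name.lower().replace(" ", "_")


LABELS = {to_label(name) for name in CATEGORIES}


def parse_record(raw: str) -> Tuple[str, str]:
    """Parse one JSON object and return (tweet_text, class_label)."""
    record = json.loads(raw)
    text, label = record["tweet_text"], record["class_label"]
    if label not in LABELS:
        raise ValueError(f"unexpected class_label: {label!r}")
    return text, label


def label_counts(raw_records: Iterable[str]) -> Counter:
    """Count how often each class_label occurs in the given records."""
    return Counter(parse_record(raw)[1] for raw in raw_records)


# The data instance shown in the card, as a raw JSON string.
SAMPLE = (
    r'{"tweet_text": "@RT_com: URGENT: Death toll in #Ecuador #quake rises '
    r'to 233 \u2013 President #Correa #1 in #Pakistan", '
    r'"class_label": "injured_or_dead_people"}'
)

text, label = parse_record(SAMPLE)
counts = label_counts([SAMPLE])
print(label, dict(counts))
```

The same functions could be applied to the records of any event file, for example `ecuador_earthquake_2016/train.json`, once its on-disk layout (a single JSON array versus one object per line) has been checked. The card does not state which layout is used.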

Application scenarios: