
joker122322222/natural-disasters-from-social-media

Hugging Face · Updated 2025-12-06 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/joker122322222/natural-disasters-from-social-media
Resource description:
---
task_categories:
- text-classification
language:
- en
tags:
- natural disasters
- tweets
- classification
- catastrophic events
pretty_name: Natural Disasters from Social Media
size_categories:
- 100K<n<1M
annotations_creators:
- crowdsourced
- expert-generated
source_datasets:
- "Kaggle 1 - jannesklaas/disasters-on-social-media"
- "Kaggle 2 - vstepanenko/disaster-tweets"
- "Kaggle 3 - sidharth178/disaster-response-messages"
- "Zahra et al. - doi: 10.1016/j.ipm.2019.102107"
- "CrisisMMD - arxiv: 1805.00713"
- "Alam et al. - arxiv: 1805.05151"
- "CrisisLexT26 - doi: 10.1145/2675133.2675242"
- "Imran et al. - aclanthology: L16-1259"
- "CrisisLexT6 - doi: 10.1609/icwsm.v8i1.14538"
- "HumAID - doi: 10.1609/icwsm.v15i1.18116"
- "CrisisBench - doi: 10.1609/icwsm.v15i1.18115"
configs:
- config_name: default
  default: true
  data_files:
  - split: train
    path: "train.csv"
  - split: validation
    path: "validation.csv"
  - split: test
    path: "test.csv"
- config_name: full
  data_files: "meta/natural-disasters-from-social-media.csv"
- config_name: meta
  data_files: "meta/distributions/*.csv"
dataset_info:
  config_name: default
  splits:
  - name: train
    num_bytes: 39817704
    num_examples: 169109
  - name: validation
    num_bytes: 4977163
    num_examples: 21139
  - name: test
    num_bytes: 4981112
    num_examples: 21139
  dataset_size: 49775824
---

# Description

Dataset created for the Master's thesis "Detection of Catastrophic Events from Social Media" at the Slovak Technical University Faculty of Informatics. It contains social media posts split into two categories:

- Informative - related to natural disasters and informative about them
- Non-Informative - unrelated to natural disasters

Other metadata include event type, source dataset, etc. To balance the classes, 50k tweets from the Twitter archive for the years 2017-2022 were added.
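The `configs` and `dataset_info` entries in the metadata above describe a `default` configuration with `train`, `validation`, and `test` splits. A minimal sketch of how one of these splits might be loaded with the Hugging Face `datasets` library follows. It assumes the repository id shown on this page is reachable from the Hub or a mirror; the helper names `load_split` and `split_fractions` are illustrative and not part of the dataset.

```python
# Repository id as listed on this page; config and split names come from the card's metadata.
REPO_ID = "joker122322222/natural-disasters-from-social-media"

# Split sizes declared under dataset_info for the "default" config.
SPLIT_SIZES = {"train": 169109, "validation": 21139, "test": 21139}


def load_split(split="train", config="default"):
    """Load one split from the Hugging Face Hub (hypothetical helper, needs network access)."""
    # Imported lazily so the rest of this sketch runs without the `datasets` package.
    from datasets import load_dataset

    if config == "default" and split not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {split!r}")
    return load_dataset(REPO_ID, name=config, split=split)


def split_fractions(sizes=SPLIT_SIZES):
    """Return each split's share of all examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    # Prints the share of each split; no download is triggered here.
    for name, share in split_fractions().items():
        print(f"{name}: {share:.3f}")
```

The declared split sizes work out to roughly an 80/10/10 partition. The card does not list column names, so after calling `load_split("validation")` it is worth inspecting `column_names` on the returned object before relying on any particular field.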
# Distributions

![Distributions](meta/distributions/split_distribution.png)

# Source Datasets:

| **Name** | **Count** |
|:----------------------------------------------------------------------------------------:|:---------:|
| Kaggle 1 - [URL](https://www.kaggle.com/datasets/jannesklaas/disasters-on-social-media) | 951 |
| Kaggle 2 - [URL](https://www.kaggle.com/datasets/vstepanenko/disaster-tweets) | 579 |
| Kaggle 3 - [URL](https://www.kaggle.com/datasets/sidharth178/disaster-response-messages) | 3782 |
| Zahra et al. - [URL](https://doi.org/10.1016/j.ipm.2019.102107) | 6494 |
| CrisisMMD - [URL](https://arxiv.org/abs/1805.00713) | 11043 |
| Alam et al. - [URL](https://arxiv.org/abs/1805.05151) | 11133 |
| CrisisLexT26 - [URL](https://doi.org/10.1145/2675133.2675242) | 14998 |
| Imran et al. - [URL](https://aclanthology.org/L16-1259) | 16549 |
| CrisisLexT6 - [URL](https://doi.org/10.1609/icwsm.v8i1.14538) | 22672 |
| HumAID - [URL](https://doi.org/10.1609/icwsm.v15i1.18116) | 42837 |
| CrisisBench - [URL](https://doi.org/10.1609/icwsm.v15i1.18115) | 31158 |
| ArchiveTeam - [URL](https://archive.org/details/twitterstream) | 49191 |
| **Total** | 211387 |

# Total Event counts:

| **Type** | **Non-Informative** | **Informative** | **Total** |
|:----------:|:-------------------:|:---------------:|:---------:|
| Unknown | 61880 | 14740 | 76620 |
| Storm | 20944 | 47301 | 68245 |
| Flood | 13104 | 14637 | 27741 |
| Earthquake | 7844 | 15549 | 23393 |
| Fire | 2343 | 8595 | 10938 |
| Landslide | 2392 | 384 | 2776 |
| Meteorite | 193 | 545 | 738 |
| Haze | 51 | 503 | 554 |
| Volcano | 243 | 139 | 382 |
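The event-count table above can be used to check how balanced the two classes are, overall and per event type. The sketch below copies the counts from that table and computes the class totals and the informative share of each event type; the names `EVENT_COUNTS`, `class_totals`, and `informative_share` are illustrative and do not come from the dataset.

```python
# (non_informative, informative) counts per event type, copied from the table above.
EVENT_COUNTS = {
    "Unknown": (61880, 14740),
    "Storm": (20944, 47301),
    "Flood": (13104, 14637),
    "Earthquake": (7844, 15549),
    "Fire": (2343, 8595),
    "Landslide": (2392, 384),
    "Meteorite": (193, 545),
    "Haze": (51, 503),
    "Volcano": (243, 139),
}


def class_totals(counts):
    """Return (non_informative_total, informative_total) across all event types."""
    non_informative = sum(non for non, _ in counts.values())
    informative = sum(inf for _, inf in counts.values())
    return non_informative, informative


def informative_share(counts):
    """Return the fraction of informative posts for each event type."""
    return {event: inf / (non + inf) for event, (non, inf) in counts.items()}


if __name__ == "__main__":
    non_informative, informative = class_totals(EVENT_COUNTS)
    print(f"non-informative: {non_informative}, informative: {informative}")
    for event, share in informative_share(EVENT_COUNTS).items():
        print(f"{event}: {share:.1%} informative")
```

Summed over all rows, the two classes are close to balanced (roughly 48% informative), while individual event types are skewed in one direction or the other, which may matter when evaluating a classifier per event type.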
Provided by:
joker122322222