
HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/zppcyk2fdt
Resource description:
Social networks are widely used for information consumption and dissemination, especially during time-critical events such as natural disasters. Despite its significantly large volume, social media content is often too noisy for direct use in any application. Therefore, it is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making. To address such issues, automatic classification systems have been developed using supervised modeling approaches, thanks to earlier efforts on creating labeled datasets. However, existing datasets are limited in different aspects (e.g., size, presence of duplicates) and are less suitable for supporting more advanced, data-hungry deep learning models. In this paper, we present a new large-scale dataset with ∼77K human-labeled tweets, sampled from a pool of ∼24 million tweets across 19 disaster events that happened between 2016 and 2019. Moreover, we propose a data collection and sampling pipeline, which is important for sampling social media data for human annotation. We report multiclass classification results using classic and deep learning (fastText and transformer) based models to set the ground for future studies. The dataset and associated resources are publicly available: https://crisisnlp.qcri.org/humaid_dataset.html
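The abstract mentions classic (non-neural) multiclass baselines alongside fastText and transformers. The sketch below illustrates what a classic multiclass tweet classifier involves: a multinomial Naive Bayes model with add-one smoothing, written with the Python standard library only. The tokenizer, the class names, and the six example tweets are all hypothetical and were invented for illustration. They are not HumAID's label set, data format, or the models used in the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase, drop URLs and @mentions,
    # then keep runs of letters, digits, '#' and apostrophes as tokens.
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z0-9#']+", text)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log(
                        (self.word_counts[label][tok] + 1) / (self.total_words[label] + v)
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (not from HumAID).
train = [
    ("Bridge collapsed and power lines are down downtown", "infrastructure_damage"),
    ("Roads flooded, power outage across the city", "infrastructure_damage"),
    ("Our prayers are with everyone affected", "sympathy"),
    ("Thoughts and prayers to the victims and families", "sympathy"),
    ("Donate now to help flood victims, volunteers needed", "donation"),
    ("Volunteers collecting donations at the shelter", "donation"),
]
texts, labels = zip(*train)
model = NaiveBayes().fit(texts, labels)

print(model.predict("power lines down after the storm"))   # → infrastructure_damage
print(model.predict("sending prayers to the families"))    # → sympathy
print(model.predict("volunteers needed, please donate"))   # → donation
```

Naive Bayes is used here only because it is a compact, dependency-free example of a classic bag-of-words classifier. A real baseline on the released data would need the dataset's actual file layout and label names, and it would be scored on held-out splits rather than on hand-picked sentences.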
Created:
2021-04-09