
HumAID (Human-Annotated Disaster Incidents Data)

OpenDataLab · Updated 2026-05-24 · Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/HumAID
Resource summary:

Social networks are widely used for consuming and spreading information, especially during time-critical events such as natural disasters. Although social media produces very large volumes of content, that content is often too noisy to be used directly in any application. It is therefore important to filter, classify, and concisely summarize the available content to support effective information consumption and decision-making. To address these needs, automatic classification systems have been built with supervised learning, made possible by earlier efforts to create labeled datasets. However, existing datasets are limited in several respects (e.g., size, presence of duplicates) and are poorly suited to more advanced, data-intensive deep learning models.

HumAID is a large-scale dataset for crisis informatics research. It contains approximately 77,000 human-labeled tweets, sampled from about 24 million tweets collected during 19 disaster events that occurred between 2016 and 2019. The dataset contains only English tweets, and it is the largest crisis informatics dataset to date. The annotations cover the following humanitarian categories:
* Caution and advice
* Displaced people and evacuations
* Don't know / can't judge
* Infrastructure and utility damage
* Injured or dead people
* Missing or found people
* Not humanitarian
* Other relevant information
* Requests or urgent needs
* Rescue, volunteering, or donation effort
* Sympathy and support
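To make the 11-way labeling scheme concrete, here is a minimal Python sketch that counts tweets per humanitarian category in one data split. It rests on assumptions this page does not confirm: that each split is a tab-separated file with a header row containing a `class_label` column, and that the labels use the snake_case identifiers listed in `HUMANITARIAN_LABELS`. The function name `label_distribution` is hypothetical, and the rows in the usage example are toy placeholders, not HumAID data.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: snake_case identifiers for the 11 humanitarian categories.
# Verify them against the files you actually download.
HUMANITARIAN_LABELS = {
    "caution_and_advice",
    "displaced_people_and_evacuations",
    "dont_know_cant_judge",
    "infrastructure_and_utility_damage",
    "injured_or_dead_people",
    "missing_or_found_people",
    "not_humanitarian",
    "other_relevant_information",
    "requests_or_urgent_needs",
    "rescue_volunteering_or_donation_effort",
    "sympathy_and_support",
}


def label_distribution(tsv_file) -> Counter:
    """Count tweets per humanitarian category in a tab-separated split.

    ASSUMPTION (hypothetical layout): the file starts with a header row that
    includes a `class_label` column; all other columns are ignored here.
    Raises ValueError on any label outside HUMANITARIAN_LABELS.
    """
    # QUOTE_NONE: treat quote characters in tweet text as literal text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        label = row["class_label"].strip()
        if label not in HUMANITARIAN_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


if __name__ == "__main__":
    # Toy placeholder rows, invented for illustration only.
    sample = io.StringIO(
        "tweet_id\ttweet_text\tclass_label\n"
        "1\ttoy tweet one\tinfrastructure_and_utility_damage\n"
        "2\ttoy tweet two\tsympathy_and_support\n"
        "3\ttoy tweet three\tinfrastructure_and_utility_damage\n"
    )
    print(label_distribution(sample).most_common(1))
    # prints [('infrastructure_and_utility_damage', 2)]
```

Rejecting unknown labels, rather than silently counting them, surfaces any mismatch between the assumed label identifiers and the real files early.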
Provider:
OpenDataLab
Created:
2022-05-23
Dataset introduction
Background overview
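HumAID is a large-scale crisis informatics dataset of about 77,000 human-annotated, disaster-related English tweets covering 19 disaster events. The tweets are labeled with 11 humanitarian categories, and it is currently the largest dataset in this field.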
The above content was collected and summarized by 遇见数据集 (Meet Datasets).