
Replication Data: A Foundation Model Approach for Disaster Detection from Social Media, News, and Weather Data

Source: DataCite Commons. Updated 2026-05-05; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20038115
Description:
This repository contains the replication data for the paper:

Hanny, D., Dastidar, K.G., Wieland, M., Granitzer, M. & Resch, B. (2026). Towards Multimodal Geospatial Reasoning: A Foundation Model Approach for Disaster Detection from Social Media, News, and Weather Data. [Accepted for publication in Natural Hazards]

To run our disaster detection benchmark, the required files are h3_aggregates_valid_cells_only.parquet, bsky_posts.parquet, and gdelt_articles.parquet. All additional files are for reproducing the specific experiments outlined in the paper.

📱 Social Media Data

bsky_posts.parquet: Bluesky social media posts with geoparsed location and semantic/emotion attributes. Contains 676,337 posts.
Attributes: cid, uri, author_displayName, author_handle, author_did, createdAt, langs, text, replyCount, repostCount, likeCount, quoteCount, reply_parent_cid, reply_root_cid, image_thumbnails, image_fullsizes, urls, geocoded_dict, language, place, geometry, cleaned_text, topic_id, keywords, topic_label, p_anger, p_fear, p_joy, p_sadness, anger, fear, joy, sadness, no_emotion, disaster_related, event

📰 News Articles Data

gdelt_articles.parquet: GDELT global news articles with geographic and thematic metadata. Contains 58,707 articles.
Attributes: modeSearch, gkgthemeSearch, datetimeStartSearch, datetimeEndSearch, isOR, countMaximumRecords, urlGDELTV2FTAPI, urlArticle, urlArticleMobile, titleArticle, datetimeArticle, urlImage, domainArticle, languageArticle, countryArticle, idGKG, idDateTimeArticle, dateTimeDocument, idSourceCollectionIdentifier, isDocumentURL, nameSource, domainSource, counts, countsCharLoc, themes, themesCharLoc, locations, locationsCharLoc, persons, personsCharLoc, organizations, organizationsCharLoc, tone, dates, gcam, urlImageRelated, urlSocialMediaImageEmbeds, urlSocialMediaVideoEmbeds, quotations, mentionedNamesCounts, mentionedNumericsCounts, xmlExtras, idTypeLocation*, typeLocation*, location*, idCountry*, idADM1Code*, latitude*, longitude*, idFeature*, geometry, disaster_related, event

ℹ️ Few-Shot Examples

few_shot_examples.parquet: Curated and anonymised examples for few-shot learning prompts with aggregated social media and news data. Contains 8 examples.
Attributes: event, date, h3_index, h3_polygon, h3_resolution, bsky_post_indices, bsky_posts_disaster_related_indices, gdelt_article_indices, gdelt_articles_disaster_related_indices, bsky_posts_count, bsky_posts_disaster_related_count, ratio_bsky_posts_disaster_related, gdelt_articles_count, gdelt_articles_disaster_related_count, ratio_gdelt_articles_disaster_related, ground_truth_indices, ground_truth_count, ground_truth_date, tavg, tmin, tmax, prcp, snow, wdir, wspd, wpgt, pres, tsun, h3_centroid, centroid_location, event_type, prompt_dict, event_detected, event_probability, summary, classification_result

🌍 H3 Aggregated Data

h3_aggregates_valid_cells_only.parquet: H3 aggregation of social media, news, and weather data, with ground reference labels derived from satellite data. Contains only validated cells with ground truth data, for a total of 2,024 aggregated cells.
Attributes: event, date, h3_index, h3_polygon, h3_resolution, bsky_post_indices, bsky_posts_disaster_related_indices, gdelt_article_indices, gdelt_articles_disaster_related_indices, bsky_posts_count, bsky_posts_disaster_related_count, ratio_bsky_posts_disaster_related, gdelt_articles_count, gdelt_articles_disaster_related_count, ratio_gdelt_articles_disaster_related, ground_truth_indices, ground_truth_count, ground_truth_date, tavg, tmin, tmax, prcp, snow, wdir, wspd, wpgt, pres, tsun, h3_centroid, centroid_location, event_type, ground_truth_presence

h3_aggregates_cell_based.parquet: Cell-based aggregation of social media posts, news articles, and weather data using H3 hexagonal grids at resolution 4. Contains 2,024 aggregated cells.
Attributes: event, date, h3_index, h3_polygon, h3_resolution, bsky_post_indices, bsky_posts_disaster_related_indices, gdelt_article_indices, gdelt_articles_disaster_related_indices, bsky_posts_count, bsky_posts_disaster_related_count, ratio_bsky_posts_disaster_related, gdelt_articles_count, gdelt_articles_disaster_related_count, ratio_gdelt_articles_disaster_related, ground_truth_indices, ground_truth_count, ground_truth_date, tavg, tmin, tmax, prcp, snow, wdir, wspd, wpgt, pres, tsun, h3_centroid, centroid_location, event_type, ground_truth_presence, prompt_dict

h3_aggregates_salience.parquet: Score-based aggregation of social media posts, news articles, and weather data using H3 hexagonal grids at resolution 4. Contains 2,024 aggregated cells.
Attributes: Same as h3_aggregates_cell_based.parquet

h3_aggregates_salience_val_balanced.parquet: Balanced validation subset. Contains 100 aggregated cells.
Attributes: Same as h3_aggregates_cell_based.parquet

h3_aggregates_salience_val_random.parquet: Random validation subset. Contains 202 aggregated cells.
Attributes: Same as h3_aggregates_cell_based.parquet

📖 Citation

If you use this code or material in your research, please cite our work accordingly.

@article{Hanny.2026,
  title = {Towards Multimodal Geospatial Reasoning: A Foundation Model Approach for Disaster Detection from Social Media, News, and Weather Data},
  author = {Hanny, David and Dastidar, Kanishka Ghosh and Wieland, Marc and Granitzer, Michael and Resch, Bernd},
  journal = {Natural Hazards},
  year = {2026}
}
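The description names three files as required for the benchmark. As a minimal sketch (not the authors' loading code), the following Python checks that those files are present in a local download directory and reads them with pandas. The function names `missing_files` and `load_benchmark` and the directory layout are illustrative assumptions; only the file names come from the description. Reading Parquet with pandas also requires an engine such as pyarrow.

```python
from pathlib import Path

# File names exactly as listed in the dataset description.
REQUIRED_FILES = (
    "h3_aggregates_valid_cells_only.parquet",
    "bsky_posts.parquet",
    "gdelt_articles.parquet",
)


def missing_files(data_dir):
    """Return the required benchmark files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def load_benchmark(data_dir):
    """Load the three required tables into a dict keyed by file stem.

    Hypothetical helper: assumes all files were downloaded into one folder.
    """
    # Imported lazily so the file check above works without pandas installed.
    import pandas as pd

    absent = missing_files(data_dir)
    if absent:
        raise FileNotFoundError(f"Missing benchmark files: {absent}")
    root = Path(data_dir)
    return {Path(name).stem: pd.read_parquet(root / name) for name in REQUIRED_FILES}
```

Assuming the files were saved to a folder called `data` (a hypothetical path), `tables = load_benchmark("data")` would return a dictionary in which `tables["bsky_posts"]` holds the Bluesky posts and `tables["h3_aggregates_valid_cells_only"]` holds the aggregated cells.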
Provider:
Zenodo
Created:
2026-05-05