Replication Data: A Foundation Model Approach for Disaster Detection from Social Media, News, and Weather Data
Source: DataCite Commons. Updated 2026-05-05; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20038115
Description:
This repository contains the replication data for the paper:
Hanny, D., Dastidar, K.G., Wieland, M., Granitzer, M. & Resch, B. (2026). Towards Multimodal Geospatial Reasoning: A Foundation Model Approach for Disaster Detection from Social Media, News, and Weather Data. [Accepted for publication in Natural Hazards]
To run our disaster detection benchmark, the required files are h3_aggregates_valid_cells_only.parquet, bsky_posts.parquet, and gdelt_articles.parquet. All other files serve to reproduce the specific experiments outlined in the paper.
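As a starting point, the three required files could be checked and loaded roughly as follows. This is a minimal sketch, not part of the published benchmark code: the function names `missing_required_files` and `load_required_files` and the `data_dir` argument are hypothetical, and it assumes pandas with a Parquet engine such as pyarrow is installed. If the `geometry` columns are stored as GeoParquet, `geopandas.read_parquet` may be a better fit.

```python
from pathlib import Path

# File names as listed in the description above.
REQUIRED_FILES = (
    "h3_aggregates_valid_cells_only.parquet",
    "bsky_posts.parquet",
    "gdelt_articles.parquet",
)


def missing_required_files(data_dir):
    """Return the required file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


def load_required_files(data_dir):
    """Load the three benchmark files into DataFrames keyed by file stem.

    Hypothetical helper: assumes pandas plus a Parquet engine is available.
    """
    import pandas as pd  # imported lazily so the check above works without it

    missing = missing_required_files(data_dir)
    if missing:
        raise FileNotFoundError(f"Missing required files: {missing}")
    return {
        Path(name).stem: pd.read_parquet(Path(data_dir) / name)
        for name in REQUIRED_FILES
    }
```

Calling `load_required_files(".")` from the directory holding the downloaded files would return a dictionary with the keys `h3_aggregates_valid_cells_only`, `bsky_posts`, and `gdelt_articles`.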
📱 Social Media Data
bsky_posts.parquet: Bluesky social media posts with geoparsed location and semantic/emotion attributes. Contains 676,337 posts.
Attributes: cid, uri, author_displayName, author_handle, author_did, createdAt, langs, text, replyCount, repostCount, likeCount, quoteCount, reply_parent_cid, reply_root_cid, image_thumbnails, image_fullsizes, urls, geocoded_dict, language, place, geometry, cleaned_text, topic_id, keywords, topic_label, p_anger, p_fear, p_joy, p_sadness, anger, fear, joy, sadness, no_emotion, disaster_related, event
📰 News Articles Data
gdelt_articles.parquet: GDELT global news articles with geographic and thematic metadata. Contains 58,707 articles.
Attributes: modeSearch, gkgthemeSearch, datetimeStartSearch, datetimeEndSearch, isOR, countMaximumRecords, urlGDELTV2FTAPI, urlArticle, urlArticleMobile, titleArticle, datetimeArticle, urlImage, domainArticle, languageArticle, countryArticle, idGKG, idDateTimeArticle, dateTimeDocument, idSourceCollectionIdentifier, isDocumentURL, nameSource, domainSource, counts, countsCharLoc, themes, themesCharLoc, locations, locationsCharLoc, persons, personsCharLoc, organizations, organizationsCharLoc, tone, dates, gcam, urlImageRelated, urlSocialMediaImageEmbeds, urlSocialMediaVideoEmbeds, quotations, mentionedNamesCounts, mentionedNumericsCounts, xmlExtras, idTypeLocation*, typeLocation*, location*, idCountry*, idADM1Code*, latitude*, longitude*, idFeature*, geometry, disaster_related, event
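Both the posts and the articles carry `disaster_related` and `event` attributes, so a quick per-event tally is one way to get oriented in either table. The sketch below is illustrative only: the function name `disaster_related_per_event` is hypothetical, the sample records are made up, and it assumes that `disaster_related` is boolean-like (True/False or 1/0), which the description does not state explicitly.

```python
from collections import Counter


def disaster_related_per_event(records):
    """Count records flagged as disaster-related, grouped by `event`.

    `records` is any iterable of mappings with `event` and
    `disaster_related` keys.
    ASSUMPTION: `disaster_related` is boolean-like (True/False or 1/0).
    """
    counts = Counter()
    for record in records:
        if bool(record["disaster_related"]):
            counts[record["event"]] += 1
    return dict(counts)


# Tiny made-up records, for illustration only (not taken from the dataset).
sample = [
    {"event": "event_a", "disaster_related": True},
    {"event": "event_a", "disaster_related": False},
    {"event": "event_b", "disaster_related": True},
    {"event": "event_a", "disaster_related": True},
]
print(disaster_related_per_event(sample))  # {'event_a': 2, 'event_b': 1}
```

With a pandas DataFrame `df` loaded from either file, the records could be obtained via `df[["event", "disaster_related"]].to_dict("records")`.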
ℹ️ Few-Shot Examples
few_shot_examples.parquet: Curated and anonymised examples for few-shot learning prompts with aggregated social media and news data. Contains 8 examples.
Attributes: event, date, h3_index, h3_polygon, h3_resolution, bsky_post_indices, bsky_posts_disaster_related_indices, gdelt_article_indices, gdelt_articles_disaster_related_indices, bsky_posts_count, bsky_posts_disaster_related_count, ratio_bsky_posts_disaster_related, gdelt_articles_count, gdelt_articles_disaster_related_count, ratio_gdelt_articles_disaster_related, ground_truth_indices, ground_truth_count, ground_truth_date, tavg, tmin, tmax, prcp, snow, wdir, wspd, wpgt, pres, tsun, h3_centroid, centroid_location, event_type, prompt_dict, event_detected, event_probability, summary, classification_result
🌍 H3 Aggregated Data
h3_aggregates_valid_cells_only.parquet: H3 aggregation of social media, news, and weather data, with ground reference labels derived from satellite data. It contains only validated cells with ground truth data, for a total of 2,024 aggregated cells.
Attributes: event, date, h3_index, h3_polygon, h3_resolution, bsky_post_indices, bsky_posts_disaster_related_indices, gdelt_article_indices, gdelt_articles_disaster_related_indices, bsky_posts_count, bsky_posts_disaster_related_count, ratio_bsky_posts_disaster_related, gdelt_articles_count, gdelt_articles_disaster_related_count, ratio_gdelt_articles_disaster_related, ground_truth_indices, ground_truth_count, ground_truth_date, tavg, tmin, tmax, prcp, snow, wdir, wspd, wpgt, pres, tsun, h3_centroid, centroid_location, event_type, ground_truth_presence
h3_aggregates_cell_based.parquet: Cell-based aggregation of social media posts, news articles, and weather data using H3 hexagonal grids at resolution 4. Contains 2,024 aggregated cells.
Attributes: event, date, h3_index, h3_polygon, h3_resolution, bsky_post_indices, bsky_posts_disaster_related_indices, gdelt_article_indices, gdelt_articles_disaster_related_indices, bsky_posts_count, bsky_posts_disaster_related_count, ratio_bsky_posts_disaster_related, gdelt_articles_count, gdelt_articles_disaster_related_count, ratio_gdelt_articles_disaster_related, ground_truth_indices, ground_truth_count, ground_truth_date, tavg, tmin, tmax, prcp, snow, wdir, wspd, wpgt, pres, tsun, h3_centroid, centroid_location, event_type, ground_truth_presence, prompt_dict
h3_aggregates_salience.parquet: Score-based aggregation of social media posts, news articles, and weather data using H3 hexagonal grids at resolution 4. Contains 2,024 aggregated cells.
Attributes: Same as h3_aggregates_cell_based.parquet
h3_aggregates_salience_val_balanced.parquet: Balanced validation subset. Contains 100 aggregated cells.
Attributes: Same as h3_aggregates_cell_based.parquet
h3_aggregates_salience_val_random.parquet: Random validation subset. Contains 202 aggregated cells.
Attributes: Same as h3_aggregates_cell_based.parquet
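The aggregate files store both counts and ratios per cell, which allows a simple sanity check after loading. The sketch below assumes that each `ratio_*_disaster_related` column equals the disaster-related count divided by the total count; the column names suggest this, but the description does not confirm it. The function name `ratio_is_consistent` and the sample cell are hypothetical.

```python
import math


def ratio_is_consistent(cell, source="bsky_posts", tol=1e-6):
    """Compare ratio_<source>_disaster_related with the two count columns.

    ASSUMPTION: ratio = disaster-related count / total count.
    Returns None when the total is zero, because the stored value for
    that case is not documented.
    """
    total = cell[f"{source}_count"]
    related = cell[f"{source}_disaster_related_count"]
    stored = cell[f"ratio_{source}_disaster_related"]
    if total == 0:
        return None
    return math.isclose(stored, related / total, abs_tol=tol)


# Made-up cell, for illustration only (not taken from the dataset).
cell = {
    "bsky_posts_count": 8,
    "bsky_posts_disaster_related_count": 2,
    "ratio_bsky_posts_disaster_related": 0.25,
    "gdelt_articles_count": 0,
    "gdelt_articles_disaster_related_count": 0,
    "ratio_gdelt_articles_disaster_related": 0.0,
}
print(ratio_is_consistent(cell))                           # True
print(ratio_is_consistent(cell, source="gdelt_articles"))  # None
```

If the check fails on real cells, the assumption about how the ratio is defined is the first thing to revisit.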
📖 Citation
If you use this code or material in your research, please cite our work accordingly.
@article{Hanny.2026,
title = {Towards Multimodal Geospatial Reasoning: A Foundation Model Approach for Disaster Detection from Social Media, News, and Weather Data},
author = {Hanny, David and Dastidar, Kanishka Ghosh and Wieland, Marc and Granitzer, Michael and Resch, Bernd},
journal = {Natural Hazards},
year = {2026}
}
Provider:
Zenodo
Created:
2026-05-05



