
sachinkg12/us-county-hazard-features

Source: Hugging Face. Updated 2026-03-14; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/sachinkg12/us-county-hazard-features
Description:
---
license: apache-2.0
task_categories:
- tabular-classification
tags:
- disaster-prediction
- fema
- hazard-assessment
- climate
- geospatial
- us-counties
- multi-hazard
- cascade-interactions
pretty_name: US County Multi-Hazard Features for Disaster Declaration Prediction
size_categories:
- 1M<n<10M
---

# US County Multi-Hazard Features Dataset

A curated, ML-ready dataset of **1,014,930 county-month observations** spanning **3,222 US counties** from 2000 to 2026, integrating 7 federal data sources into 42 engineered features for predicting FEMA disaster declarations 90 days in advance.

## Dataset Summary

| Property | Value |
|----------|-------|
| **Rows** | 1,014,930 |
| **Columns** | 50 (42 features + target + metadata) |
| **Counties** | 3,222 (all US counties with available data) |
| **Time span** | 2000-01 to 2026-03 |
| **Granularity** | County-month |
| **Target** | `declaration_next_90d`: binary, FEMA disaster declaration within 90 days |
| **Positive rate** | 11.02% |
| **Format** | Apache Parquet (Snappy compression) |

## Data Sources

This dataset integrates 7 federal data sources, all free and requiring no API keys:

| Source | What |
|--------|------|
| **FEMA Disaster Declarations** | Every federal disaster declaration by county (1953–present) |
| **USGS Earthquakes** | M2.5+ seismic events with coordinates (1964–present) |
| **NOAA Storm Events** | Tornadoes, floods, hurricanes, hail + casualties/damage (2000–present) |
| **US Census Bureau** | County demographics, housing, economics |
| **US Drought Monitor** | Weekly drought severity by county, D0–D4 (2000–present) |
| **NIFC Wildfires** | Wildfire incidents with acres burned (2000–present) |
| **NFIP Flood Claims** | National Flood Insurance Program claims and payouts (1978–present) |

## Features (42 total)

### FEMA History (7)

Rolling window declaration counts and recency metrics.

| Feature | Type | Description |
|---------|------|-------------|
| `declarations_1yr` | int | Disaster declarations in prior 1 year |
| `declarations_3yr` | int | Disaster declarations in prior 3 years |
| `declarations_5yr` | int | Disaster declarations in prior 5 years |
| `declarations_10yr` | int | Disaster declarations in prior 10 years |
| `months_since_last_decl` | int | Months since most recent declaration (-1 if none) |
| `major_disaster_ratio` | float | Fraction of declarations that were major disasters |
| `ia_program_ratio` | float | Fraction with Individual Assistance programs |

### Storm Events (10)

NOAA severe weather aggregations.

| Feature | Type | Description |
|---------|------|-------------|
| `storm_event_count_1yr` | int | Storm events in prior 1 year |
| `storm_event_count_5yr` | int | Storm events in prior 5 years |
| `storm_deaths_5yr` | int | Storm-related deaths in prior 5 years |
| `storm_injuries_5yr` | int | Storm-related injuries in prior 5 years |
| `storm_property_damage_5yr` | float | Property damage ($) in prior 5 years |
| `storm_crop_damage_5yr` | float | Crop damage ($) in prior 5 years |
| `tornado_count_5yr` | int | Tornado events in prior 5 years |
| `flood_count_5yr` | int | Flood events in prior 5 years |
| `hail_count_5yr` | int | Hail events in prior 5 years |
| `max_tor_f_scale_5yr` | int | Maximum tornado F-scale in prior 5 years |

### Socioeconomic (5)

US Census demographic and economic indicators.

| Feature | Type | Description |
|---------|------|-------------|
| `population` | long | County population |
| `housing_units` | long | Number of housing units |
| `median_home_value` | long | Median home value ($) |
| `population_density` | float | People per square mile |
| `land_area_sq_mi` | float | County land area in square miles |

### Drought (4)

US Drought Monitor severity metrics.

| Feature | Type | Description |
|---------|------|-------------|
| `drought_severity_avg_5yr` | float | Average drought severity score (5yr) |
| `drought_max_severity_5yr` | float | Maximum drought severity score (5yr) |
| `severe_drought_weeks_5yr` | int | Weeks of severe drought (D2+) in 5 years |
| `drought_d4_pct_max_5yr` | float | Peak percentage of county in D4 (exceptional) drought |

### Wildfire (4)

NIFC wildfire incident metrics.

| Feature | Type | Description |
|---------|------|-------------|
| `wildfire_count_1yr` | int | Wildfire incidents in prior 1 year |
| `wildfire_count_5yr` | int | Wildfire incidents in prior 5 years |
| `wildfire_acres_burned_5yr` | float | Total acres burned in prior 5 years |
| `wildfire_max_acres_5yr` | float | Largest single wildfire (acres) in 5 years |

### NFIP Flood Insurance (3)

National Flood Insurance Program claim patterns.

| Feature | Type | Description |
|---------|------|-------------|
| `nfip_claim_count_5yr` | int | NFIP claims in prior 5 years |
| `nfip_total_payout_5yr` | float | Total NFIP payouts ($) in prior 5 years |
| `nfip_avg_payout_5yr` | float | Average NFIP payout ($) in prior 5 years |

### Spatial (2)

Neighborhood and state-level context.

| Feature | Type | Description |
|---------|------|-------------|
| `neighbor_avg_declarations_5yr` | float | Average 5yr declarations of neighboring counties |
| `state_avg_declarations_5yr` | float | Average 5yr declarations across the state |

### Cascade Interaction Features (7)

**Novel contribution**: Multiplicative interaction terms capturing multi-hazard co-occurrence.

| Feature | Type | Description |
|---------|------|-------------|
| `cascade_drought_fire_risk` | float | `drought_severity_6mo × log1p(wildfire_acres_1yr)` |
| `cascade_fire_flood_risk` | float | `log1p(burn_scar_acres_18mo) × flood_events_1yr` |
| `cascade_hurricane_flood_risk` | float | `hurricane_declarations_60d × flood_events_30d` |
| `cascade_earthquake_landslide_risk` | float | `significant_quakes_90d × severe_storms_30d` |
| `cascade_storm_compound_count` | int | Severe storms in prior 30 days (compound events) |
| `cascade_active_chains` | int | Count of active cascade interactions (0–5) |
| `cascade_max_chain_length` | int | Longest active hazard chain (1–3) |

### Target & Metadata

| Column | Type | Description |
|--------|------|-------------|
| `fips` | string | 5-digit FIPS county code |
| `year_month` | string | Observation month (YYYY-MM) |
| `declaration_next_90d` | bool | **Target**: FEMA declaration within 90 days |
| `declaration_type_next_90d` | string | Declaration type if positive (DR, EM, etc.) |

## Usage

```python
import pandas as pd

df = pd.read_parquet("us-county-hazard-features.parquet")

# Temporal train/validation/test split (recommended).
# year_month is a zero-padded "YYYY-MM" string, so string comparison orders months correctly.
train = df[df["year_month"] < "2022-01"]
val = df[(df["year_month"] >= "2022-01") & (df["year_month"] < "2023-01")]
test = df[(df["year_month"] >= "2023-01") & (df["year_month"] <= "2024-12")]
# Note: exclude months after 2024-12; FEMA declaration data is incomplete there.

# Feature columns (42 features, temporal columns excluded; see paper for ablation justification)
FEATURE_COLS = [c for c in df.columns if c not in [
    "fips", "year_month", "declaration_next_90d", "declaration_type_next_90d",
    "month_of_year", "is_hurricane_season", "is_tornado_season", "is_wildfire_season"
]]
```

## Benchmark Results

Results on the temporal split above; XGBoost is the best-performing model:

| Model | ROC-AUC | PR-AUC | F1 |
|-------|---------|--------|-----|
| Naive (prior) | 0.500 | 0.549 | 0.000 |
| Logistic Regression | 0.542 | 0.139 | 0.199 |
| Random Forest | 0.845 | 0.287 | 0.334 |
| **XGBoost** | **0.893** | **0.555** | **0.482** |

95% Bootstrap CI: ROC-AUC [0.890, 0.896]

## Key Findings

1. **FEMA Dominance**: Removing FEMA features drops AUC from 0.89 to 0.63. Declaration history is the strongest predictor, suggesting the federal process is path-dependent.
2. **Cascade Interactions**: Multi-hazard cascade features improve compound disaster detection (recall lift +2.9% for cascade events, ROC-AUC 0.907 vs 0.893 overall).
3. **Declaration Equity**: Low-income counties (Q1) show 2.3x higher prediction residuals than wealthy counties (Q4) at the same hazard exposure level (p < 1e-100), suggesting structural inequities in federal disaster declarations.

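The four multiplicative cascade terms in the feature tables are products of short-window hazard signals, so each term is nonzero only when both hazards are present in their windows. The following is a minimal sketch of those formulas as written in the Description column, not the HazardCast implementation. The function names are hypothetical, and the inputs (`drought_severity_6mo`, `wildfire_acres_1yr`, and so on) are intermediate quantities named in the formulas that are assumed to be precomputed per county-month; they are not columns of the published file.

```python
import math


def cascade_drought_fire_risk(drought_severity_6mo: float, wildfire_acres_1yr: float) -> float:
    """drought_severity_6mo x log1p(wildfire_acres_1yr), as listed in the feature table."""
    return drought_severity_6mo * math.log1p(wildfire_acres_1yr)


def cascade_fire_flood_risk(burn_scar_acres_18mo: float, flood_events_1yr: int) -> float:
    """log1p(burn_scar_acres_18mo) x flood_events_1yr."""
    return math.log1p(burn_scar_acres_18mo) * flood_events_1yr


def cascade_hurricane_flood_risk(hurricane_declarations_60d: int, flood_events_30d: int) -> float:
    """hurricane_declarations_60d x flood_events_30d."""
    return float(hurricane_declarations_60d * flood_events_30d)


def cascade_earthquake_landslide_risk(significant_quakes_90d: int, severe_storms_30d: int) -> float:
    """significant_quakes_90d x severe_storms_30d."""
    return float(significant_quakes_90d * severe_storms_30d)


if __name__ == "__main__":
    # Illustrative inputs only (made-up numbers, not dataset values).
    # Drought without any burned acreage: log1p(0) is 0, so the term vanishes.
    print(cascade_drought_fire_risk(3.0, 0.0))
    # Drought plus a large fire: log1p damps the heavy-tailed acreage.
    print(cascade_drought_fire_risk(3.0, 25_000.0))
    # One hurricane declaration and four recent flood events.
    print(cascade_hurricane_flood_risk(1, 4))
```

Because the terms are products, a tree model sees a single feature that is zero unless both hazards co-occur, and `log1p` keeps very large fires from dominating the scale.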
## Citation

```bibtex
@dataset{gupta2026uscountyhazard,
  title={US County Multi-Hazard Features for Disaster Declaration Prediction},
  author={Gupta, Sachin},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/sachinkg12/us-county-hazard-features}
}
```

## License

Apache 2.0

## Source Code

[HazardCast](https://github.com/sachinkg12/HazardCast): full pipeline covering data ingestion, feature engineering, model training, and REST API.

Provider:
sachinkg12