KC-OverdoseModels2025/Dataset_Pairs_2021-2025_Census-Rate
Hugging Face · updated 2026-03-28 · indexed 2026-03-29
Download link: https://hf-mirror.com/datasets/KC-OverdoseModels2025/Dataset_Pairs_2021-2025_Census-Rate
---
license: mit
size_categories:
- 1K<n<10K
---
# Fentanyl Overdose Forecasting — Lagged Census-to-Overdose Datasets
**Author: Ansh Gupta**
## Overview
Seven paired census-to-overdose datasets for King County, Washington, designed for temporal forecasting of opioid/fentanyl overdose death rates at the census tract level. Each dataset pairs American Community Survey (ACS) socioeconomic features from one year with overdose rates from a subsequent year, creating a lagged prediction framework.
## Dataset Structure
Each file pairs **input** census features with **target** overdose rates for a later year *t*. The census input comes from either one year earlier (t-1, a 1-year lag) or two years earlier (t-2, a 2-year lag).
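
The lag is encoded in each file name. Assuming the naming pattern in the tables below holds for every file, a small parser can recover the census year, target year, and lag. `parse_lagged_filename` and `LagPair` are hypothetical helpers, and the TTM target is mapped to 2025 as the tables indicate.

```python
import re
from typing import NamedTuple, Optional


class LagPair(NamedTuple):
    census_year: int     # ACS release year used as input
    target_year: int     # calendar year of the overdose target
    target_is_ttm: bool  # True for the "TTMoverdose" files
    lag: int             # target_year - census_year (1 or 2)


# Matches e.g. "lagged_2022census_2023overdose.csv", "lagged_2023census_TTMoverdose.csv",
# and the 2021 files' "_no_cRATE" suffix; a leading directory such as "t-2/" is allowed.
_PATTERN = re.compile(
    r"lagged_(?P<census>\d{4})census_(?P<target>\d{4}|TTM)overdose(?:_no_cRATE)?\.csv$"
)


def parse_lagged_filename(name: str, ttm_year: int = 2025) -> Optional[LagPair]:
    """Decode census year, target year, and lag from a file name; None if it doesn't match."""
    m = _PATTERN.search(name)
    if m is None:
        return None
    census = int(m.group("census"))
    is_ttm = m.group("target") == "TTM"
    target = ttm_year if is_ttm else int(m.group("target"))
    return LagPair(census, target, is_ttm, target - census)


print(parse_lagged_filename("t-2/lagged_2022census_2024overdose.csv"))
# LagPair(census_year=2022, target_year=2024, target_is_ttm=False, lag=2)
```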
### t-1 Lag (1-Year Gap)
| File | Census Input | Overdose Target | Tracts |
|---|---|---|---|
| `lagged_2021census_2022overdose_no_cRATE.csv` | 2021 ACS (2017–2021) | 2022 rates | 308 |
| `lagged_2022census_2023overdose.csv` | 2022 ACS (2018–2022) | 2023 rates | 463 |
| `lagged_2023census_2024overdose.csv` | 2023 ACS (2019–2023) | 2024 rates | 463 |
| `lagged_2024census_TTMoverdose.csv` | 2024 ACS (2020–2024) | 2025 TTM rates | 463 |
### t-2 Lag (2-Year Gap)
Located in the `t-2/` subdirectory.
| File | Census Input | Overdose Target | Tracts |
|---|---|---|---|
| `lagged_2021census_2023overdose_no_cRATE.csv` | 2021 ACS (2017–2021) | 2023 rates | 308 |
| `lagged_2022census_2024overdose.csv` | 2022 ACS (2018–2022) | 2024 rates | 463 |
| `lagged_2023census_TTMoverdose.csv` | 2023 ACS (2019–2023) | 2025 TTM rates | 463 |
> **Note**: The two 2021 files contain 308 tracts and 57 columns. They lack the three historical rate columns (`Rate_Independent`, `Rate_Independent_Neighbor_1`, and `Rate_Independent_Neighbor_Avg`) because no prior-year overdose rate data was available for the 2021 input year. All other files contain 463 tracts and 60 columns.
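
To pool the 57-column 2021 files with the 60-column files (for example, to train on several years at once), one option is to keep only the columns every file shares. A minimal standard-library sketch; `read_header` and `common_columns` are hypothetical helper names.

```python
import csv
from typing import List, Sequence

# The three columns that only the 60-column (non-2021) files carry.
RATE_HISTORY_COLS = [
    "Rate_Independent",
    "Rate_Independent_Neighbor_1",
    "Rate_Independent_Neighbor_Avg",
]


def read_header(path: str) -> List[str]:
    """Return the header row of a CSV file without loading the data."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))


def common_columns(headers: Sequence[List[str]]) -> List[str]:
    """Columns present in every header, kept in the order of the first header."""
    if not headers:
        return []
    shared = set(headers[0]).intersection(*map(set, headers[1:]))
    return [c for c in headers[0] if c in shared]
```

Applied as `common_columns([read_header(p) for p in paths])` across a 2021 file and a later file, this should drop exactly the `RATE_HISTORY_COLS`, assuming the remaining column names match across years.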
## Variables
### Identifiers
| Column | Description |
|---|---|
| `GIDTR` | Census tract FIPS code |
| `State` / `State_name` | State FIPS code and name |
| `County` / `County_name` | County FIPS code and name |
| `Tract` | Tract number |
| `Num_BGs_in_Tract` | Number of block groups in tract |
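
Assuming `GIDTR` follows the standard 11-digit Census tract GEOID layout (2-digit state FIPS, 3-digit county FIPS, 6-digit tract code; Washington is 53 and King County is 033), the identifier can be split back into its parts. `split_gidtr`, `tract_label`, and `TractId` are hypothetical helpers.

```python
from typing import NamedTuple


class TractId(NamedTuple):
    state: str   # 2-digit state FIPS (53 = Washington)
    county: str  # 3-digit county FIPS (033 = King County)
    tract: str   # 6-digit tract code, e.g. "000100"


def split_gidtr(gidtr) -> TractId:
    """Split an 11-digit tract GEOID into state, county, and tract parts."""
    s = str(gidtr).strip()
    if s.endswith(".0"):   # tolerate IDs that were read back as floats
        s = s[:-2]
    s = s.zfill(11)        # restore leading zeros lost in numeric parsing
    if len(s) != 11 or not s.isdigit():
        raise ValueError(f"not an 11-digit tract GEOID: {gidtr!r}")
    return TractId(s[:2], s[2:5], s[5:])


def tract_label(tract: str) -> str:
    """Format a 6-digit tract code as a tract number, e.g. '005302' -> '53.02'."""
    base, suffix = int(tract[:4]), tract[4:]
    return str(base) if suffix == "00" else f"{base}.{suffix}"
```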
### Socioeconomic Features (ACS)
| Column | Description |
|---|---|
| `Med_HHD_Inc_Thousands_ACS__` | Median household income (thousands of US dollars) |
| `pct_Prs_Blw_Pov_Lev_ACS__` | % persons below poverty level |
| `pct_Civ_unemp_16p_ACS__` | % civilian unemployed (age 16+) |
| `pct_Not_HS_Grad_ACS__` | % without high school diploma |
| `pct_College_ACS__` | % with college degree |
| `pct_Renter_Occp_HU_ACS__` | % renter-occupied housing units |
| `pct_Vacant_Units_ACS__` | % vacant housing units |
### Demographics
| Column | Description |
|---|---|
| `Tot_Population_ACS__` | Total population |
| `pct_Males_ACS__` | % male |
| `pct_NH_White_alone_ACS__` | % non-Hispanic White |
| `pct_NH_Blk_alone_ACS__` | % non-Hispanic Black |
| `pct_Pop_25_44_ACS__` | % aged 25–44 |
| `pct_Pop_Below25_ACS__` | % under age 25 |
| `pct_Pop_45plus_ACS__` | % aged 45 and over |
| `Pop_density_calculated_ACS` | Population density (per unit land area) |
| `LAND_AREA` | Land area of tract |
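
`Pop_density_calculated_ACS` is presumably `Tot_Population_ACS__` divided by `LAND_AREA`; since the area unit is not stated, the sketch below leaves the result unitless. Because the under-25, 25–44, and 45+ bands cover every age, their shares should sum to about 100%, which makes a cheap sanity check. Both helpers and the 0.5-point rounding tolerance are assumptions.

```python
from typing import Optional


def population_density(total_population: float, land_area: float) -> Optional[float]:
    """People per unit of LAND_AREA; None when the land area is zero or None."""
    if not land_area:
        return None
    return total_population / land_area


def age_shares_consistent(below25: float, age25_44: float, age45plus: float,
                          tol: float = 0.5) -> bool:
    """The three age bands partition the population, so their % shares should sum to ~100."""
    return abs(below25 + age25_44 + age45plus - 100.0) <= tol
```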
### Healthcare Access
| Column | Description |
|---|---|
| `No_Health_Ins_ACS__` | Count of persons without health insurance |
| `Pct_No_Health_Ins_CALCULATED_ACS__` | % without health insurance |
| `dist_general_km` | Distance to nearest general healthcare facility (km) |
| `dist_sud_km` | Distance to nearest substance use disorder (SUD) facility (km) |
| `general_facility_count` | Number of general healthcare facilities within the tract |
| `sud_facility_count` | Number of SUD treatment facilities within the tract |
| `general_facility_count_5km` | Number of general healthcare facilities within 5 km of the tract |
| `sud_facility_count_5km` | Number of SUD facilities within 5 km of the tract |
| `general_facilities_per_10k` | General facilities per 10,000 population |
| `sud_facilities_per_10k` | SUD facilities per 10,000 population |
| `general_facilities_per_10k_5km` | General facilities within 5 km, per 10,000 population |
| `sud_facilities_per_10k_5km` | SUD facilities within 5 km, per 10,000 population |
| `accessibility_general_2sfca_*_per10k` | 2-Step Floating Catchment Area (2SFCA) accessibility score for general facilities, per 10,000 population; `*` is the catchment radius (10, 20, or 30 km) |
| `accessibility_sud_2sfca_*_per10k` | 2SFCA accessibility score for SUD facilities, per 10,000 population; `*` is the catchment radius (10, 20, or 30 km) |
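
The card does not describe how the distance and count columns were computed. One plausible approach, assumed here, measures great-circle (haversine) distance from a tract centroid to each facility; centroids and facility coordinates are not part of the dataset, and every helper name below is hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_km(origin: LatLon, facilities: Iterable[LatLon]) -> Optional[float]:
    """Distance to the closest facility (analogous to dist_general_km / dist_sud_km)."""
    return min((haversine_km(origin, f) for f in facilities), default=None)


def count_within(origin: LatLon, facilities: Iterable[LatLon], radius_km: float = 5.0) -> int:
    """Facilities within radius_km of origin (analogous to *_facility_count_5km)."""
    return sum(1 for f in facilities if haversine_km(origin, f) <= radius_km)


def per_10k(count: float, population: float) -> Optional[float]:
    """Facilities per 10,000 residents; None for zero population."""
    return count * 10_000 / population if population else None
```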
### Overdose Outcome Variables
| Column | Description |
|---|---|
| `Rate` | **Target variable** — overdose death rate per 100,000 |
| `Count` | Overdose death count (may be suppressed as "1-9") |
| `Rate_M` | Rate margin/reliability indicator |
| `Rate_M_CI` | Rate confidence interval |
| `Intent` | Intent classification (Drug_OD) |
| `Period` | Time period of overdose data |
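
Because suppressed counts are stored as the string "1-9", the `Count` column may load as text. The sketch below assumes suppression (or a blank) is the only non-numeric value; it converts counts to integers or `None` and shows the standard crude rate per 100,000. The card does not say whether `Rate` is crude or age-adjusted, so a recomputed value may not match `Rate` exactly. Helper names are hypothetical.

```python
from typing import Optional, Union

SUPPRESSED = "1-9"  # how suppressed small counts appear in the Count column


def parse_count(value: Union[str, int, float, None]) -> Optional[int]:
    """Return the death count as an int, or None when suppressed or missing."""
    if value is None:
        return None
    text = str(value).strip()
    if text in ("", SUPPRESSED) or text.lower() == "nan":
        return None
    return int(float(text))


def rate_per_100k(count: int, population: float) -> Optional[float]:
    """Crude death rate per 100,000 residents; None for zero population."""
    return count * 100_000 / population if population else None
```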
### Historical Rate Features
| Column | Description |
|---|---|
| `Rate_Independent` | Prior-year overdose rate (not available in 2021 files) |
| `Rate_Independent_Neighbor_1` | Prior-year rate of nearest neighbor tract |
| `Rate_Independent_Neighbor_Avg` | Average prior-year rate of 3 nearest neighbor tracts |
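
`Rate_Independent` supports a simple persistence baseline: forecast each tract's `Rate` with its prior-year rate and measure the error. This is a useful reference point for any model trained on the census features. A minimal sketch (the function name is hypothetical); it applies only to the 60-column files, since the 2021 files lack `Rate_Independent`.

```python
from typing import Optional, Sequence


def persistence_mae(prior_rates: Sequence[Optional[float]],
                    target_rates: Sequence[Optional[float]]) -> float:
    """Mean absolute error of predicting Rate with Rate_Independent, tract by tract.

    Tracts where either value is missing are skipped.
    """
    pairs = [(p, t) for p, t in zip(prior_rates, target_rates)
             if p is not None and t is not None]
    if not pairs:
        raise ValueError("no tracts with both a prior and a target rate")
    return sum(abs(t - p) for p, t in pairs) / len(pairs)
```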
### Spatial Neighbor Features
| Column | Description |
|---|---|
| `Med_HHD_Inc_Thousands_ACS___Neighbor_1` | Median income of nearest neighbor tract |
| `Med_HHD_Inc_Thousands_ACS___Neighbor_Avg` | Average median income of 3 nearest neighbors |
| `pct_Renter_Occp_HU_ACS___Neighbor_1` | Renter rate of nearest neighbor |
| `pct_Renter_Occp_HU_ACS___Neighbor_Avg` | Average renter rate of 3 nearest neighbors |
| `pct_Prs_Blw_Pov_Lev_ACS___Neighbor_1` | Poverty rate of nearest neighbor |
| `pct_Prs_Blw_Pov_Lev_ACS___Neighbor_Avg` | Average poverty rate of 3 nearest neighbors |
| `pct_Vacant_Units_ACS___Neighbor_1` | Vacancy rate of nearest neighbor |
| `pct_Vacant_Units_ACS___Neighbor_Avg` | Average vacancy rate of 3 nearest neighbors |
| `Pct_No_Health_Ins_CALCULATED_ACS___Neighbor_1` | Uninsured rate of nearest neighbor |
| `Pct_No_Health_Ins_CALCULATED_ACS___Neighbor_Avg` | Average uninsured rate of 3 nearest neighbors |
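
The card defines these (and the `Rate_Independent_Neighbor_*` columns) using the nearest tract and the 3 nearest tracts, but not how nearness is measured. A sketch assuming centroid-to-centroid great-circle distance; centroid coordinates are not in the files, and `neighbor_features` is a hypothetical helper.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]


def _dist_km(a: LatLon, b: LatLon) -> float:
    """Haversine great-circle distance in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(min(1.0, math.sqrt(h)))


def neighbor_features(centroids: Dict[str, LatLon], values: Dict[str, float],
                      k: int = 3) -> Dict[str, Tuple[float, float]]:
    """For each tract, return (nearest neighbor's value, mean value of the k nearest).

    Mirrors the *_Neighbor_1 / *_Neighbor_Avg columns; a tract is never its own neighbor.
    """
    out: Dict[str, Tuple[float, float]] = {}
    for tract, origin in centroids.items():
        others: List[Tuple[float, str]] = sorted(
            (_dist_km(origin, c), other)
            for other, c in centroids.items() if other != tract
        )
        nearest = [other for _, other in others[:k]]
        if not nearest:
            continue  # a lone tract has no neighbors
        out[tract] = (values[nearest[0]], sum(values[n] for n in nearest) / len(nearest))
    return out
```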
## Data Sources
- **Census data**: American Community Survey (ACS) 5-year estimates, U.S. Census Bureau
- **Overdose rates**: CDC WONDER, drug overdose death rates per 100,000 population
- **Healthcare facilities**: SAMHSA treatment locator and general healthcare facility databases
- **Accessibility scores**: Computed using the 2-Step Floating Catchment Area (2SFCA) method
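
The card names the 2SFCA method but not its variant. The sketch below is the basic, unweighted form: each facility gets a supply of 1 unless `supply` says otherwise, a hard distance cutoff is used, and there is no distance decay. Whether the dataset used this form, a different supply measure, or a distance-decay variant is an assumption, and the function and its inputs are hypothetical. Step 1 divides each facility's supply by the population within the radius; step 2 sums those ratios over the facilities each tract can reach.

```python
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[float, float]


def two_step_fca(
    tract_pop: Dict[str, float],
    tract_xy: Dict[str, Point],
    facility_xy: Dict[str, Point],
    dist_km: Callable[[Point, Point], float],
    radius_km: float,
    supply: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Basic (hard-cutoff, no distance decay) 2SFCA score per 10,000 residents."""
    supply = supply or {}
    # Step 1: for each facility, supply divided by the population within radius_km.
    ratios: Dict[str, float] = {}
    for f, fxy in facility_xy.items():
        demand = sum(pop for t, pop in tract_pop.items()
                     if dist_km(tract_xy[t], fxy) <= radius_km)
        if demand > 0:
            ratios[f] = supply.get(f, 1.0) / demand
    # Step 2: for each tract, sum the ratios of facilities within radius_km.
    return {
        t: 10_000 * sum(r for f, r in ratios.items()
                        if dist_km(txy, facility_xy[f]) <= radius_km)
        for t, txy in tract_xy.items()
    }
```

To mirror the 10/20/30 km columns, call it once per radius, passing a great-circle distance function when coordinates are latitude/longitude.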
## Region
- **Geography**: King County, Washington (includes Seattle metro area)
- **Unit of observation**: Census tract
## Usage
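```python
import pandas as pd

# Load a single dataset (e.g., 2024 census → 2025 TTM overdose rates, 1-year lag)
df = pd.read_csv("lagged_2024census_TTMoverdose.csv")

# Identifier and outcome columns should be excluded from the predictors
id_cols = ["GIDTR", "State", "State_name", "County", "County_name", "Tract", "Num_BGs_in_Tract"]
outcome_cols = ["Rate", "Count", "Rate_M", "Rate_M_CI", "Intent", "Period"]

# Features (X): all remaining columns; or pick a subset explicitly, e.g.
# X = df[["Med_HHD_Inc_Thousands_ACS__", "pct_Renter_Occp_HU_ACS__"]]
X = df.drop(columns=id_cols + outcome_cols, errors="ignore")
y = df["Rate"]  # target: overdose death rate per 100,000
```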
## License
MIT
Provided by: KC-OverdoseModels2025