alinutzal/CA-HVF2017

Source: Hugging Face. Updated 2026-03-17. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/alinutzal/CA-HVF2017
Description:
---
license: cc
language:
- en
tags:
- synthetic-data
- transportation
- vehicle-fleet
- california
- agent-based-models
- emissions
pretty_name: 'CA-HVF2017: California Household Vehicle Fleet Dataset'
size_categories:
- 10M<n<100M
---

[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)

# CA-HVF2017: California Household Vehicle Fleet Dataset (2017)

A statewide synthetic vehicle fleet dataset containing approximately 13 million households and more than 25 million vehicles across California. This dataset provides detailed household-level vehicle ownership information, including vehicle type, powertrain, vintage, and body type, generated using a Multiple Discrete Continuous Extreme Value (MDCEV) model and a geographically explicit synthetic population.

This dataset is designed to support transportation modeling, energy/emissions analysis, policy scenario evaluation, EV adoption studies, and agent-based simulations.

---

## Overview

Modern transportation models require realistic household vehicle fleets, but privacy constraints limit access to micro-level vehicle ownership data. CA-HVF2017 fills this gap by providing a statewide, publicly available, synthetic, validated representation of vehicle ownership across all California census tracts.

The dataset includes:

- Household characteristics (demographics, size, income)
- Detailed vehicle fleets for each household
- Geographic attributes at the census-tract level
- Location-based accessibility and built environment indicators

All values are synthetic but statistically consistent with real-world data.

---

# Data Dictionary: Input

Below is the full set of household-level variables included in the CA-HVF2017 dataset.


---

## Household Demographics

| Variable | Description | Values |
|---------|-------------|--------|
| child | Household has at least one child | 0, 1 |
| HOUSEID | Household ID | numeric |
| HHSIZE | Household size | numeric |
| HHSIZE1 | Household size = 1 | 0, 1 |
| HHSIZE2 | Household size = 2 | 0, 1 |
| HHSIZE3 | Household size = 3 | 0, 1 |
| HHSIZE4 | Household size = 4 or more | 0, 1 |
| NUMADLT | Number of adults | numeric |
| NUMCHILD | Number of children | numeric |
| NUM_WORKERS | Number of workers in household | numeric |
| retired | Household has at least one retiree | 0, 1 |

---

## Race / Ethnicity Indicators

| Variable | Description | Values |
|---------|-------------|--------|
| hhwhite | Householder identifies as White | 0, 1 |
| hhasian | Householder identifies as Asian | 0, 1 |
| hhblack | Householder identifies as Black or African American | 0, 1 |
| hhothers | Householder identifies as another race (not White/Black/Asian) | 0, 1 |

---

## Income Categories

| Variable | Description | Values |
|---------|-------------|--------|
| income1 | Income < $25,000 | 0, 1 |
| income2 | $25,000 ≤ income < $50,000 | 0, 1 |
| income3 | $50,000 ≤ income < $75,000 | 0, 1 |
| income4 | $75,000 ≤ income < $100,000 | 0, 1 |
| income5 | Income ≥ $100,000 | 0, 1 |

---

## Life Cycle Categories

| Variable | Description | Values |
|---------|-------------|--------|
| LIF_CYC1 | 1 adult, no children | 0, 1 |
| LIF_CYC2 | 2+ adults, no children | 0, 1 |
| LIF_CYC3 | 1 adult + child age 0–5 | 0, 1 |
| LIF_CYC4 | 2+ adults + child age 0–5 | 0, 1 |
| LIF_CYC5 | 1 adult + child age 6–15 | 0, 1 |
| LIF_CYC6 | 2+ adults + child age 6–15 | 0, 1 |
| LIF_CYC7 | 1 adult + child age 16–21 | 0, 1 |
| LIF_CYC8 | 2+ adults + child age 16–21 | 0, 1 |
| LIF_CYC9 | Household has at least one senior (65+) | 0, 1 |
| LIF_CYC10 | Household has 2+ seniors (65+) | 0, 1 |

---

## Housing & Tenure

| Variable | Description | Values |
|---------|-------------|--------|
| hhown | Household owns home | 0, 1 |
| perrent | % rental housing in tract | numeric |
| perrent1 | Rental housing < 25% | 0, 1 |
| perrent2 | Rental housing 25–45% | 0, 1 |
| perrent3 | Rental housing > 45% | 0, 1 |

---

## Work Status

| Variable | Description | Values |
|---------|-------------|--------|
| work0 | No members employed | 0, 1 |
| work1 | 1 worker in household | 0, 1 |
| work2 | 2 workers in household | 0, 1 |
| work3 | 3+ workers in household | 0, 1 |

---

## Geographic Identifiers

| Variable | Description | Values |
|---------|-------------|--------|
| county_fips | County FIPS code | numeric |
| county_name | County name | character |
| state_fips | State FIPS code | numeric |
| state_name | State name | character |
| tractid | Census tract ID | numeric |

---

## Transit & Accessibility Variables

| Variable | Description | Values |
|---------|-------------|--------|
| emp_zscore | Standardized jobs reachable by 30-min transit | numeric |
| tractmean | Average number of jobs reachable from tract | numeric |
| tas_acres | Total acres accessible via 30-minute transit | numeric |
| tci | Transit Connectivity Index (0–100) | 0–100 |
| hi_tps | AllTransit Performance Score ≥ 8 | 0, 1 |
| transit_performance_score | Transit Performance Score (0–10) | 0–10 |

---

## Built Environment Variables

| Variable | Description | Values |
|---------|-------------|--------|
| job_density | Jobs per km² | numeric |
| pop_density | People per km² | numeric |
| res_density | Housing units per acre (unprotected) | numeric |
| pct_ag_land | % agricultural land | numeric |
| pct_water | % water area | numeric |
| urban_cbsa | Census tract is urban | 0, 1 |
| walkndx | Walkability index (0–20) | 0–20 |

---

## Log-Transformed Built Environment / Transit Indicators

| Variable | Description | Values |
|---------|-------------|--------|
| log_job_density | Log(job_density) | numeric |
| log_job_above8 | Log(job_density) > 8 | 0, 1 |
| log_job_below4 | Log(job_density) < 4 | 0, 1 |
| log_pop_density | Log(pop_density) | numeric |
| log_pop_above9 | Log(pop_density) > 9 | 0, 1 |
| log_pop_below3 | Log(pop_density) < 3 | 0, 1 |
| log_res_density | Log(res_density) | numeric |
| log_pct_agland | Log(pct_ag_land) | numeric |
| log_pct_water | Log(pct_water) | numeric |
| log_lastyear_zevpct | Log(previous-year ZEV share) | numeric |

---

## Zero-Emission Vehicle (ZEV) Exposure

| Variable | Description | Values |
|---------|-------------|--------|
| lastyear_zev_pct | Percentage of ZEVs in prior year | numeric |

# Data Dictionary: Output

## Household-Level Variables

| Variable name | Description | Value |
|---------------|-------------|--------|
| county_fips | County FIPS code | numeric |
| tractid | Census tract ID | numeric |
| HOUSEID | Household ID | numeric |
| nvehicles | Number of vehicle(s) owned by the household | numeric |

## Vehicle-Level Variables

| Variable name | Description | Value |
|-------------------|-----------------------------------------------|--------|
| county_fips | County FIPS code | numeric |
| tractid | Census tract ID | numeric |
| HOUSEID | Household ID | numeric |
| VEHID | Vehicle ID associated with the household | numeric |
| bodytype | The vehicle's body type | car, van, suv (Sport Utility Vehicle), pickup (Light-duty pick-up truck) |
| vintage_category | Vehicle age range | 0–5 years, 6–11 years, 12+ years |
| annual_mileage | The vehicle's annual mileage | numeric |
| pred_power | The vehicle's powertrain | ICE (Internal Combustion Engine), AEV (All-Electric Vehicle), PHEV (Plug-in Hybrid Electric Vehicle), Hybrid (hybrid vehicle) |
| modelyear | Vehicle model year (year manufactured) | numeric |

---

## Methodology Summary

### 1. Synthetic Population

Generated using PopulationSim, producing approximately 13 million California households with demographics matched to ACS distributions.

### 2. Multiple Discrete-Choice Vehicle Ownership Model

The dataset extends the MDCEV-based fleet composition model by Garikapati et al. (2014).
The model jointly predicts:

- Number of vehicles per household
- Vehicle category combinations
- Powertrain shares
- Vintage distributions

Predictors include:

- Income, household size, and workers
- Built environment metrics
- Accessibility indices
- Regional land-use patterns

### 3. Validation

The synthetic fleet is externally validated against:

- California DMV vehicle registration data
- County-level vintage and powertrain distributions
- Household vehicle count statistics

The dataset reproduces observed distributions with high fidelity.

---

## Geographic and Temporal Scope

- Region: California
- Spatial resolution: Census tract (GEOID)
- Households: ~13 million
- Vehicles: more than 25 million
- Base demographic year: 2017
- Fleet calibration year: 2017

---

## License

This dataset is licensed under the **Creative Commons Attribution 4.0 International (CC BY 4.0)** license.

You are free to:

- **Share**: copy and redistribute the material in any medium or format
- **Adapt**: remix, transform, and build upon the material for any purpose, even commercially

Under the following terms:

- **Attribution**: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in a way that suggests the licensor endorses you or your use.

Full license text: https://creativecommons.org/licenses/by/4.0/

---

## How to Load the Dataset

### Python (polars)

```python
import polars as pl

# Household-level output table
hh = pl.read_parquet("households.parquet")
# Vehicle-level output table
veh = pl.read_parquet("vehicles.parquet")
```
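Once both tables are loaded, a natural first step is to check that they agree with each other and to summarize the fleet. The sketch below uses only the Python standard library and a handful of toy rows, so it runs without the data files. Every value in it is invented, and the helper names `fleet_mismatches` and `powertrain_shares` are hypothetical. It assumes that `HOUSEID` identifies a household uniquely and that `nvehicles` equals the number of vehicle rows for that household. The data dictionary suggests both, but neither is stated outright.

```python
from collections import Counter, defaultdict

# Toy stand-ins for the household and vehicle tables; all values are invented.
households = [
    {"HOUSEID": 1, "county_fips": 6037, "nvehicles": 2},
    {"HOUSEID": 2, "county_fips": 6037, "nvehicles": 1},
    {"HOUSEID": 3, "county_fips": 6075, "nvehicles": 0},
]
vehicles = [
    {"HOUSEID": 1, "VEHID": 1, "county_fips": 6037, "pred_power": "ICE"},
    {"HOUSEID": 1, "VEHID": 2, "county_fips": 6037, "pred_power": "AEV"},
    {"HOUSEID": 2, "VEHID": 1, "county_fips": 6037, "pred_power": "ICE"},
]


def fleet_mismatches(households, vehicles):
    """Return HOUSEIDs whose nvehicles differs from their vehicle-row count."""
    rows_per_household = Counter(v["HOUSEID"] for v in vehicles)
    # Counter returns 0 for households that have no vehicle rows.
    return [
        h["HOUSEID"]
        for h in households
        if h["nvehicles"] != rows_per_household[h["HOUSEID"]]
    ]


def powertrain_shares(vehicles):
    """Return {county_fips: {pred_power: share of that county's vehicles}}."""
    counts = defaultdict(Counter)
    for v in vehicles:
        counts[v["county_fips"]][v["pred_power"]] += 1
    return {
        county: {power: n / sum(c.values()) for power, n in c.items()}
        for county, c in counts.items()
    }


mismatches = fleet_mismatches(households, vehicles)
shares = powertrain_shares(vehicles)
print(mismatches)
print(shares)
```

With the real tables, the same two checks can be expressed as a group-by on `HOUSEID` and on `county_fips` in whichever dataframe library was used to load the files. Counties whose households own no vehicles do not appear in the shares dictionary.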
Provider:
alinutzal