Causal Feature Importance Dataset for Urban Traffic Level of Service Across Four U.S. Metropolitan Areas

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/tbdn8yhs83
Description:
This dataset supports the study 'What Really Drives Urban Traffic Congestion? A Causal Feature Importance Analysis Across Four Major U.S. Metropolitan Areas'. It contains the processed analytical feature matrix for 134,530 census blocks across Chicago, Houston, Los Angeles, and New York City. Each block record includes 46 causally upstream predictor features spanning six thematic categories (Built Environment, Accessibility/Network, Safety & Environment, Demographics, Mode Choice, and Land Use), along with two outcome variables: the Congestion Index and a three-class Level of Service classification. All features were derived from five primary sources: TomTom GPS probe traffic data (2024 AM peak); US Census Bureau Decennial 2020 and ACS 5-Year 2019-2023 estimates; city Open Data Portal building footprints and parcel land use records; OpenStreetMap and city street network centerlines; and EPA AirNow PM2.5 monitoring and state DOT crash records. Twenty-three circular and endogenous variables were excluded prior to model training, as described in the accompanying manuscript. The Python analysis code (scikit-learn, XGBoost, LightGBM) for model training, evaluation, and feature importance extraction is included.

The accompanying study was submitted to the Journal of Transport Geography.
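
The description mentions feature importance extraction for a three-class outcome but does not say which importance measure is used. As a rough illustration only, the sketch below computes permutation importance (the drop in accuracy when one feature column is shuffled) on synthetic data. Everything in it is an assumption made for illustration: the two toy features, the nearest-centroid classifier, and all function names are hypothetical stand-ins, and it uses only the Python standard library rather than the scikit-learn, XGBoost, or LightGBM code shipped with the dataset.

```python
import random

def make_synthetic_blocks(n, rng):
    """Toy rows (NOT the real data): feature 0 drives the class, feature 1 is noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(3)  # three LOS-like classes: 0, 1, 2
        informative = label * 2.0 + rng.gauss(0.0, 0.5)
        noise = rng.gauss(0.0, 1.0)
        X.append([informative, noise])
        y.append(label)
    return X, y

def fit_centroids(X, y):
    """Stand-in model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=sq_dist)

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)

def permutation_importance(centroids, X, y, rng, n_repeats=5):
    """Mean accuracy drop per feature after shuffling that feature's column."""
    base = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return base, importances

rng = random.Random(0)  # fixed seed so the sketch is reproducible
X_train, y_train = make_synthetic_blocks(600, rng)
X_test, y_test = make_synthetic_blocks(300, rng)
model = fit_centroids(X_train, y_train)
baseline, importances = permutation_importance(model, X_test, y_test, rng)

print(f"baseline accuracy: {baseline:.3f}")
for name, imp in zip(["informative", "noise"], importances):
    print(f"{name}: mean accuracy drop {imp:.3f}")
```

On this toy data the informative feature should show a large accuracy drop and the noise feature a drop near zero. Applying the same idea to the real feature matrix would mean swapping in the actual 46 predictor columns and a trained model from the included code.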
Created:
2026-04-27