
Comprehensive multi-level dataset of motor vehicle crashes in Ohio, USA (2017–2023): Crash, vehicle, and occupant-level records with detailed attributes and severity outcomes

Source: DataCite Commons. Updated: 2025-07-24. Indexed: 2026-04-25.
Download link:
https://figshare.com/articles/dataset/Comprehensive_multi-level_dataset_of_motor_vehicle_crashes_in_Ohio_USA_2017_2023_Crash_vehicle_and_occupant-level_records_with_detailed_attributes_and_severity_outcomes/29437694/1
Description:
<b><i>Abstract</i></b>
This dataset comprises detailed records of motor vehicle crashes that occurred in Ohio, USA, from January 1, 2017, to December 31, 2023. Collected by law enforcement agencies using standardized OH-1 crash reporting forms and centralized by the Ohio Department of Public Safety, the dataset covers 1,679,019 crashes involving 2,656,086 vehicles and 3,577,822 occupants. It is structured across three levels (crash, vehicle, and occupant) and includes attributes such as crash timing and location, environmental and road conditions, vehicle specifications, operational factors, occupant demographics, injury severity, safety equipment usage, and behavioral indicators such as alcohol or drug involvement. Severity is documented at both the crash level and the individual occupant level, with outcomes ranging from no injury to fatal. The dataset has a total of 119 systematically named variables across the crash, vehicle, and occupant levels. A complete list of features, along with categorical value mappings, is provided in the accompanying documentation.

<b><i>Description of the data and file structure</i></b>
This dataset contains comprehensive records of motor vehicle crashes reported across the state of Ohio, USA, from January 1, 2017, to December 31, 2023. The data were collected by law enforcement agencies using standardized crash reporting forms (OH-1) and centralized through the Ohio Department of Public Safety’s data systems. The dataset captures detailed, structured information on crash events, the vehicles involved, and the individuals affected. Each record corresponds to one occupant of a vehicle, and each crash and each involved vehicle has a unique identifier.
Hence, the dataset is organized into three primary levels:

<b>Crash-Level Data:</b> Includes unique identifiers for each of the 1,679,019 reported crashes, along with temporal details (date, time), location attributes, environmental conditions (e.g., weather, light, road surface), and overall crash characteristics (e.g., number of units involved, severity classification, work zone presence). The crash identifier is the feature <i>“DocumentNumber”</i>.

<b>Vehicle-Level Data:</b> Comprises identifiers for each of the 2,656,086 vehicles (units) involved in a crash. Attributes include vehicle type, make, model, year of manufacture, vehicle defects, and operational details such as posted speed, traffic control devices, and pre-crash actions. Interacting vehicle types and hazardous material indicators are also documented. Vehicle-level features are identified by the prefix <i>“Units.”</i> in the feature name.

<b>Occupant-Level Data:</b> Contains 3,577,822 records detailing individuals involved in crashes. This includes demographic information (age, gender), seating position, person injury severity, use of safety equipment (e.g., seat belts, airbags, helmets), and behavioral factors such as alcohol or drug involvement, distraction status, and test results where applicable. Occupant-level features are identified by the prefix <i>“Units.People.”</i> in the feature name.

Crash severity is also documented. The <i>“CrashSeverity”</i> feature records the severity of the crash at the following levels: <i>Fatal</i> (15021), <i>Suspected Serious Injury</i> (83764), <i>Suspected Minor Injury</i> (483026), <i>Possible Injury</i> (461019), and <i>No Apparent Injury</i> (2440823). Similarly, the injury level of each individual person is recorded in the feature <i>“Units.People.Injury”</i>.
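The prefix convention above lends itself to a simple column classifier. The following is a minimal sketch, not the authors' tooling: it assumes the records have already been loaded as dictionaries keyed by feature name, and the vehicle identifier name <i>"Units.UnitNumber"</i> as well as the toy rows are hypothetical placeholders, since the description does not name the vehicle identifier.

```python
from collections import defaultdict


def level_of(column):
    """Classify a feature name by the prefix convention described above."""
    # Test the longer prefix first: "Units.People." also starts with "Units."
    if column.startswith("Units.People."):
        return "occupant"
    if column.startswith("Units."):
        return "vehicle"
    return "crash"


def group_columns(columns):
    """Return {level: [feature names]} for a list of feature names."""
    groups = defaultdict(list)
    for column in columns:
        groups[level_of(column)].append(column)
    return dict(groups)


def count_entities(rows, vehicle_key="Units.UnitNumber"):
    """Count crashes, vehicles, and occupants in occupant-level rows.

    vehicle_key is a HYPOTHETICAL column name; replace it with the real
    vehicle identifier from the dataset documentation.
    """
    crashes = set()
    vehicles = set()
    occupants = 0
    for row in rows:
        crashes.add(row["DocumentNumber"])
        # A vehicle is identified here by (crash id, unit id within the crash).
        vehicles.add((row["DocumentNumber"], row[vehicle_key]))
        occupants += 1  # one row per occupant
    return {"crashes": len(crashes), "vehicles": len(vehicles), "occupants": occupants}


# Toy rows, invented for illustration only.
rows = [
    {"DocumentNumber": "A1", "Units.UnitNumber": 1, "Units.People.Injury": 5},
    {"DocumentNumber": "A1", "Units.UnitNumber": 1, "Units.People.Injury": 4},
    {"DocumentNumber": "A1", "Units.UnitNumber": 2, "Units.People.Injury": 5},
    {"DocumentNumber": "B2", "Units.UnitNumber": 1, "Units.People.Injury": 3},
]

print(group_columns(rows[0].keys()))
print(count_entities(rows))
```

Because each row is one occupant, crash-level and vehicle-level values repeat across rows, so any crash-level or vehicle-level statistic should be computed after deduplicating on the corresponding identifier.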
The file <i>"summary_2023_new.pdf"</i> contains a summary analysis of the dataset (statistics and plots). There are 119 unique features in the data; their names and types are listed below. For integer-encoded features, the categorical levels are given in the file <i>“mapping.yaml”</i>.

<b><i>Access information</i></b>
Other publicly accessible locations of the data:
The full dataset submitted to figshare is not available elsewhere in its complete and curated form. However, data covering the most recent five years, including the current year, are publicly accessible through the following sources:
Ohio Department of Public Safety Crash Retrieval Portal: https://ohtrafficdata.dps.ohio.gov/crashretrieval
Ohio Statistics and Analytics for Traffic Safety (OSTATS): https://statepatrol.ohio.gov/dashboards-statistics/ostats-dashboards
These public portals provide access to selected crash data but do not include the full historical dataset or the cleaned, integrated, and reformatted version provided through this submission.

Data was derived from the following sources:
Ohio Department of Public Safety

<b><i>Human subjects data</i></b>
This dataset was derived entirely from publicly available traffic crash reports collected and disseminated by the Ohio Department of Public Safety through the Ohio Statistics and Analytics for Traffic Safety (OSTATS) platform.

To ensure compliance with ethical standards for data sharing, this dataset contains no direct identifiers (e.g., names, addresses, license plate numbers, or VINs linked to individuals). All personal identifiers have been removed or were not included in the public dataset. Furthermore, the dataset contains no more than three indirect identifiers per record.
These indirect identifiers (e.g., crash year, crash county, and age group) were selected based on their relevance to the study while minimizing re-identification risk.

Where possible, continuous variables were converted to categories (e.g., age groups instead of exact age), and geographic detail was limited to broader regional indicators rather than precise location data. Data cleaning and aggregation procedures were conducted to further reduce identifiability while retaining the analytic value of the dataset for modeling injury risk across system domains.

As described in the associated manuscript, all analyses were conducted on this de-identified dataset, and no additional linkage to identifiable information was performed. As such, this dataset does not require IRB oversight or data use agreements and is suitable for open-access publication under a CC-BY license. No direct interaction or intervention with human participants occurred during the creation of this dataset, and no personally identifiable information (PII) is included. Given the publicly available nature of the source data and the absence of PII, explicit participant consent was not required. By relying exclusively on open-access government data and following de-identification protocols aligned with the Common Rule (45 CFR 46), this dataset meets ethical standards for public data sharing.
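Since integer-encoded features are decoded through <i>“mapping.yaml”</i>, a small lookup step is typically needed before analysis. The sketch below is illustrative only: the layout of the mapping (feature name, then integer code, then label) and the integer codes themselves are assumptions, because the description does not show the file's structure. Only the label strings come from the text above. In practice the dictionary would be loaded from the file with a YAML parser rather than written inline.

```python
# ASSUMED layout: {feature name: {integer code: label}}.
# The integer codes below are HYPOTHETICAL; only the labels appear in the
# dataset description. Consult mapping.yaml for the real codes.
MAPPING = {
    "CrashSeverity": {
        1: "Fatal",
        2: "Suspected Serious Injury",
        3: "Suspected Minor Injury",
        4: "Possible Injury",
        5: "No Apparent Injury",
    },
}


def decode_row(row, mapping):
    """Replace integer codes with labels for every feature that has a mapping.

    Features without a mapping, and codes missing from a mapping, are
    passed through unchanged so that no information is silently lost.
    """
    decoded = {}
    for feature, value in row.items():
        labels = mapping.get(feature)
        if labels is None:
            decoded[feature] = value
        else:
            decoded[feature] = labels.get(value, value)
    return decoded


# Toy record, invented for illustration only.
record = {"DocumentNumber": "A1", "CrashSeverity": 5}
print(decode_row(record, MAPPING))
```

Passing unknown codes through unchanged, rather than raising an error, keeps the step safe to run on partially documented features; a stricter variant could collect unmapped codes for review instead.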
Provider:
figshare
Created:
2025-07-24