
claritystorm/vehicle-safety-profile

Source: Hugging Face · Updated 2026-03-31 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/claritystorm/vehicle-safety-profile
Resource description:
---
license: other
license_name: public-domain
task_categories:
- tabular-classification
- tabular-regression
tags:
- vehicles
- safety
- recalls
- automotive
- insurance
- litigation
- united-states
pretty_name: Vehicle Safety Profile — Complaints + Recalls + Fatal Crashes
size_categories:
- 10K<n<100K
---

# Vehicle Safety Profile — Complaints + Recalls + Fatal Crashes

The only pre-built dataset linking **NHTSA complaints, recalls, and FARS fatal crash data** by make/model/year. **33K+ vehicle-year profiles** with complaint counts, recall flags, do-not-drive alerts, and fatality rates, ready for insurance scoring, litigation research, and automotive safety ML.

No data engineering required: the three source datasets are already joined and aggregated.

| 📊 Records | 📅 Coverage | 🏷️ License | 🔄 Updated |
|-----------|-------------|-----------|-----------|
| 33K+ vehicle profiles | All years on record | Public Domain | Annual |

**This repo contains a free 1,000-row sample.** Full dataset (CSV + Parquet) → **[claritystorm.com/datasets/vehicle-safety-profile](https://claritystorm.com/datasets/vehicle-safety-profile)**

---

## Quick Start

```python
from datasets import load_dataset
import pandas as pd

# Load the 1,000-row sample
ds = load_dataset("claritystorm/vehicle-safety-profile")
df = ds["train"].to_pandas()

# Riskiest vehicles: most complaints with crash involvement
risky = df.sort_values("complaints_with_crash", ascending=False)
print(risky[["make", "model", "model_year", "complaints_with_crash",
             "recall_count", "fatality_count"]].head(10))

# Do-not-drive vehicles
dnd = df[df["do_not_drive_flag"] == 1]
print(f"Do-not-drive vehicles in dataset: {len(dnd)}")

# Average mileage at failure by component
print(df.groupby("top_complaint_component")["avg_miles_at_failure"]
        .mean().sort_values().head(10))
```

## Use Cases

- **Auto insurance risk scoring**: one-row-per-vehicle-year feature set ready for underwriting models
- **Product liability & litigation**: recall history, do-not-drive flags, and fatality counts by make/model/year
- **Used vehicle valuation**: safety history as a feature for residual value and depreciation models
- **Recall prediction modeling**: predict future recall campaigns from complaint patterns and component flags
- **Fleet risk management**: screen fleet inventory for high-complaint or do-not-drive vehicles
- **Consumer vehicle lookup**: instant safety profile for any make/model/year combination

## Schema (selected fields)

| Field | Type | Description |
|-------|------|-------------|
| make | string | Vehicle manufacturer (normalized uppercase) |
| model | string | Vehicle model (normalized uppercase) |
| model_year | int | Model year |
| complaint_count | int | Total NHTSA consumer complaints |
| complaints_with_crash | int | Complaints where a crash was reported |
| complaints_with_fire | int | Complaints where a fire was reported |
| complaints_with_injury | int | Complaints reporting at least one injury |
| avg_miles_at_failure | float | Average mileage at failure |
| top_complaint_component | string | Most cited component (e.g. ENGINE, STEERING) |
| recall_count | int | Number of distinct recall campaigns |
| total_affected_units | int | Total units affected across all recalls |
| has_safety_recall | int | 1 if any safety-critical recall, 0 otherwise |
| do_not_drive_flag | int | 1 if any do-not-drive advisory issued |
| fatality_count | int | FARS-reported fatalities for this vehicle |
| fatal_crash_count | int | Distinct fatal crash cases |

## ⬇️ Get the Full Dataset

| Tier | Price | Includes |
|------|-------|----------|
| Sample | Free | 1,000 rows, Public Domain (this repo) |
| Complete | $99 | Full 33K+ profiles, CSV + Parquet, commercial license |
| Annual | $199/yr | Complete + annual updates |

👉 **[Purchase at claritystorm.com/datasets/vehicle-safety-profile](https://claritystorm.com/datasets/vehicle-safety-profile)**

## Sources

- **NHTSA ODI Vehicle Complaints**: consumer complaints since 1995
- **NHTSA ODI Vehicle Recalls**: recall campaigns since 1967
- **NHTSA FARS**: fatal crash data since 1975

All source data is US federal government work in the **public domain** (17 U.S.C. 105). Linked, aggregated, and processed by [ClarityStorm Data](https://claritystorm.com).
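
The card says the three sources are linked by make/model/year, with make and model normalized to uppercase. It does not describe the actual pipeline, so the following is only a minimal sketch of how such a join could look in pandas. The three input frames, their raw column names (`crash`, `campaign_id`, `do_not_drive`, `case_id`, `fatalities`), and the `normalize` helper are hypothetical stand-ins, not the real NHTSA or FARS file layouts; only the output column names are taken from the schema above.

```python
import pandas as pd

# Hypothetical miniature inputs; the real NHTSA/FARS files use different layouts.
complaints = pd.DataFrame({
    "make": ["Honda", "honda ", "FORD"],
    "model": ["Civic", "CIVIC", "F-150"],
    "model_year": [2016, 2016, 2018],
    "crash": [1, 0, 1],  # 1 if the complaint reported a crash
})
recalls = pd.DataFrame({
    "make": ["HONDA", "FORD", "FORD"],
    "model": ["CIVIC", "F-150", "F-150"],
    "model_year": [2016, 2018, 2018],
    "campaign_id": ["R-001", "R-002", "R-003"],
    "do_not_drive": [0, 0, 1],
})
fatal = pd.DataFrame({
    "make": ["FORD"],
    "model": ["f-150"],
    "model_year": [2018],
    "case_id": ["C-1"],
    "fatalities": [2],
})

KEYS = ["make", "model", "model_year"]


def normalize(df):
    """Return a copy with make/model stripped and uppercased (assumed rule)."""
    out = df.copy()
    for col in ("make", "model"):
        out[col] = out[col].str.strip().str.upper()
    return out


# Aggregate each source to one row per make/model/year.
complaint_agg = normalize(complaints).groupby(KEYS, as_index=False).agg(
    complaint_count=("crash", "size"),
    complaints_with_crash=("crash", "sum"),
)
recall_agg = normalize(recalls).groupby(KEYS, as_index=False).agg(
    recall_count=("campaign_id", "nunique"),
    do_not_drive_flag=("do_not_drive", "max"),
)
fatal_agg = normalize(fatal).groupby(KEYS, as_index=False).agg(
    fatal_crash_count=("case_id", "nunique"),
    fatality_count=("fatalities", "sum"),
)

# Outer joins keep vehicles that appear in only one or two of the sources.
profile = (
    complaint_agg
    .merge(recall_agg, on=KEYS, how="outer")
    .merge(fatal_agg, on=KEYS, how="outer")
)

# A vehicle missing from a source has zero events there, not an unknown count.
count_cols = [c for c in profile.columns if c not in KEYS]
profile[count_cols] = profile[count_cols].fillna(0).astype(int)
profile = profile.sort_values(KEYS).reset_index(drop=True)
print(profile)
```

The outer joins are a design choice in this sketch: an inner join would silently drop any vehicle that has complaints but no recalls or fatal crashes, which would bias counts toward the worst-documented vehicles.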
Provider:
claritystorm