claritystorm/vehicle-safety-profile
Hugging Face · Updated 2026-03-31 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/claritystorm/vehicle-safety-profile
Resource description:
---
license: other
license_name: public-domain
task_categories:
- tabular-classification
- tabular-regression
tags:
- vehicles
- safety
- recalls
- automotive
- insurance
- litigation
- united-states
pretty_name: Vehicle Safety Profile — Complaints + Recalls + Fatal Crashes
size_categories:
- 10K<n<100K
---
# Vehicle Safety Profile — Complaints + Recalls + Fatal Crashes
The only pre-built dataset linking **NHTSA complaints, recalls, and FARS fatal crash data** by make/model/year.
**33K+ vehicle-year profiles** with complaint counts, recall flags, do-not-drive alerts, and fatality rates — ready for insurance scoring, litigation research, and automotive safety ML.
No data engineering required — the three source datasets are already joined and aggregated.
| 📊 Records | 📅 Coverage | 🏷️ License | 🔄 Updated |
|-----------|-------------|-----------|-----------|
| 33K+ vehicle profiles | All years on record | Public Domain | Annual |
**This repo contains a free 1,000-row sample.**
Full dataset (CSV + Parquet) → **[claritystorm.com/datasets/vehicle-safety-profile](https://claritystorm.com/datasets/vehicle-safety-profile)**
---
## Quick Start
```python
from datasets import load_dataset
import pandas as pd  # needed by Dataset.to_pandas()

# Load the 1,000-row sample
ds = load_dataset("claritystorm/vehicle-safety-profile")
df = ds["train"].to_pandas()

# Riskiest vehicles: most complaints with crash involvement
risky = df.sort_values("complaints_with_crash", ascending=False)
print(risky[["make", "model", "model_year", "complaints_with_crash",
             "recall_count", "fatality_count"]].head(10))

# Do-not-drive vehicles
dnd = df[df["do_not_drive_flag"] == 1]
print(f"Do-not-drive vehicles in dataset: {len(dnd)}")

# Mean of the per-profile average mileage at failure, grouped by the
# most-cited complaint component (unweighted across profiles)
print(df.groupby("top_complaint_component")["avg_miles_at_failure"]
      .mean().sort_values().head(10))
```
## Use Cases
- **Auto insurance risk scoring** — one-row-per-vehicle-year feature set ready for underwriting models
- **Product liability & litigation** — recall history, do-not-drive flags, and fatality counts by make/model/year
- **Used vehicle valuation** — safety history as a feature for residual value and depreciation models
- **Recall prediction modeling** — predict future recall campaigns from complaint patterns and component flags
- **Fleet risk management** — screen fleet inventory for high-complaint or do-not-drive vehicles
- **Consumer vehicle lookup** — instant safety profile for any make/model/year combination
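As one illustration of the fleet-screening use case, the sketch below left-joins a fleet inventory to the profile table on make/model/year and flags units that carry a do-not-drive advisory or a high complaint count. The function name `screen_fleet`, the `unit_id` column, the threshold of 100 complaints, and the makes and models in the toy data are all made up for the example; only `make`, `model`, `model_year`, `complaint_count`, and `do_not_drive_flag` come from the schema.

```python
import pandas as pd

KEYS = ["make", "model", "model_year"]

def screen_fleet(fleet: pd.DataFrame, profiles: pd.DataFrame,
                 complaint_threshold: int = 100) -> pd.DataFrame:
    """Left-join a fleet inventory to safety profiles and flag risky units."""
    fleet = fleet.copy()
    # The schema describes make/model as normalized uppercase, so match that.
    fleet["make"] = fleet["make"].str.strip().str.upper()
    fleet["model"] = fleet["model"].str.strip().str.upper()

    cols = KEYS + ["complaint_count", "do_not_drive_flag"]
    merged = fleet.merge(profiles[cols], on=KEYS, how="left")

    # Units without a matching profile get NaN: treat them as unknown, not safe.
    merged["profile_found"] = merged["complaint_count"].notna()
    merged["flagged"] = (
        (merged["do_not_drive_flag"] == 1)
        | (merged["complaint_count"] >= complaint_threshold)
    )
    return merged

# Toy data with fictitious makes and models (not taken from the dataset)
profiles = pd.DataFrame({
    "make": ["ACME", "ACME", "GLOBEX"],
    "model": ["ROADSTER", "HAULER", "COMMUTER"],
    "model_year": [2015, 2018, 2020],
    "complaint_count": [250, 12, 40],
    "do_not_drive_flag": [0, 1, 0],
})
fleet = pd.DataFrame({
    "unit_id": ["U1", "U2", "U3", "U4"],
    "make": ["acme", "Acme ", "globex", "initech"],
    "model": ["roadster", "hauler", "commuter", "van"],
    "model_year": [2015, 2018, 2020, 2019],
})

result = screen_fleet(fleet, profiles)
print(result[["unit_id", "profile_found", "flagged"]])
```

Keeping `profile_found` separate from `flagged` matters with the 1,000-row sample in particular, since most fleet vehicles will have no matching row there.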
## Schema (selected fields)
| Field | Type | Description |
|-------|------|-------------|
| make | string | Vehicle manufacturer (normalized uppercase) |
| model | string | Vehicle model (normalized uppercase) |
| model_year | int | Model year |
| complaint_count | int | Total NHTSA consumer complaints |
| complaints_with_crash | int | Complaints where a crash was reported |
| complaints_with_fire | int | Complaints where a fire was reported |
| complaints_with_injury | int | Complaints reporting at least one injury |
| avg_miles_at_failure | float | Average mileage at failure |
| top_complaint_component | string | Most cited component (e.g. ENGINE, STEERING) |
| recall_count | int | Number of distinct recall campaigns |
| total_affected_units | int | Total units affected across all recalls |
| has_safety_recall | int | 1 if any safety-critical recall, 0 otherwise |
| do_not_drive_flag | int | 1 if any do-not-drive advisory issued |
| fatality_count | int | FARS-reported fatalities for this vehicle |
| fatal_crash_count | int | Distinct fatal crash cases |
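Raw counts in this schema scale with how many vehicles were sold, so modeling work will often want ratios instead. A minimal sketch, assuming only the fields listed above: the derived column names and the helper `add_rate_features` are hypothetical, and the toy values are invented. These shares are relative to complaint volume, not to vehicles on the road, because the selected fields include no exposure measure.

```python
import pandas as pd

def add_rate_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add complaint-share and per-crash ratios; zero denominators become NaN."""
    out = df.copy()

    # Mask zero denominators so the division yields NaN instead of inf.
    complaints = out["complaint_count"].where(out["complaint_count"] > 0)
    out["crash_complaint_share"] = out["complaints_with_crash"] / complaints
    out["fire_complaint_share"] = out["complaints_with_fire"] / complaints
    out["injury_complaint_share"] = out["complaints_with_injury"] / complaints

    crashes = out["fatal_crash_count"].where(out["fatal_crash_count"] > 0)
    out["fatalities_per_fatal_crash"] = out["fatality_count"] / crashes
    return out

# Invented values that follow the schema's column names
toy = pd.DataFrame({
    "complaint_count": [200, 0, 50],
    "complaints_with_crash": [20, 0, 5],
    "complaints_with_fire": [2, 0, 0],
    "complaints_with_injury": [10, 0, 1],
    "fatality_count": [6, 0, 0],
    "fatal_crash_count": [4, 0, 0],
})

feats = add_rate_features(toy)
print(feats[["crash_complaint_share", "fatalities_per_fatal_crash"]])
```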
## ⬇️ Get the Full Dataset
| Tier | Price | Includes |
|------|-------|----------|
| Sample | Free | 1,000 rows, Public Domain (this repo) |
| Complete | $99 | Full 33K+ profiles, CSV + Parquet, commercial license |
| Annual | $199/yr | Complete + annual updates |
👉 **[Purchase at claritystorm.com/datasets/vehicle-safety-profile](https://claritystorm.com/datasets/vehicle-safety-profile)**
## Sources
- **NHTSA ODI Vehicle Complaints** — consumer complaints since 1995
- **NHTSA ODI Vehicle Recalls** — recall campaigns since 1967
- **NHTSA FARS** — fatal crash data since 1975
All source data is US federal government work in the **public domain** (17 U.S.C. § 105).
Linked, aggregated, and processed by [ClarityStorm Data](https://claritystorm.com).
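For readers curious what "linked and aggregated" by make/model/year can look like, here is a rough sketch on toy data. It is not ClarityStorm's pipeline: the input column names (`complaint_id`, `crash`, `campaign_id`, `case_id`, `fatalities`), the helper names, and the fictitious vehicles are assumptions for illustration, and real source files generally need far more name reconciliation than a strip-and-uppercase step.

```python
import pandas as pd

KEYS = ["make", "model", "model_year"]

def normalize_keys(df: pd.DataFrame) -> pd.DataFrame:
    """Uppercase and trim make/model so the three sources share join keys."""
    out = df.copy()
    out["make"] = out["make"].str.strip().str.upper()
    out["model"] = out["model"].str.strip().str.upper()
    return out

def build_profiles(complaints, recalls, fatal) -> pd.DataFrame:
    """Aggregate each source per vehicle-year, then outer-join the results."""
    c = (normalize_keys(complaints).groupby(KEYS)
         .agg(complaint_count=("complaint_id", "count"),
              complaints_with_crash=("crash", "sum"))
         .reset_index())
    r = (normalize_keys(recalls).groupby(KEYS)
         .agg(recall_count=("campaign_id", "nunique"))
         .reset_index())
    f = (normalize_keys(fatal).groupby(KEYS)
         .agg(fatal_crash_count=("case_id", "nunique"),
              fatality_count=("fatalities", "sum"))
         .reset_index())

    # Outer joins keep vehicle-years that appear in only one source.
    profiles = c.merge(r, on=KEYS, how="outer").merge(f, on=KEYS, how="outer")
    count_cols = [col for col in profiles.columns if col not in KEYS]
    profiles[count_cols] = profiles[count_cols].fillna(0).astype(int)
    return profiles

# Toy inputs (hypothetical layouts, fictitious vehicles)
complaints = pd.DataFrame({
    "complaint_id": [1, 2, 3],
    "make": ["Acme", "ACME", "Globex"],
    "model": ["Roadster", "roadster", "Commuter"],
    "model_year": [2015, 2015, 2020],
    "crash": [1, 0, 0],
})
recalls = pd.DataFrame({
    "campaign_id": ["R1", "R1", "R2"],
    "make": ["ACME", "ACME", "ACME"],
    "model": ["ROADSTER", "HAULER", "HAULER"],
    "model_year": [2015, 2018, 2018],
})
fatal = pd.DataFrame({
    "case_id": [101, 102],
    "make": ["ACME", "ACME"],
    "model": ["ROADSTER", "ROADSTER"],
    "model_year": [2015, 2015],
    "fatalities": [1, 2],
})

profiles = build_profiles(complaints, recalls, fatal)
print(profiles)
```

Aggregating each source before joining avoids the row multiplication that a direct three-way join of record-level tables would cause.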
Provider:
claritystorm



