claritystorm/dot-airline-ontime
Source: Hugging Face · Updated 2026-04-01 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/claritystorm/dot-airline-ontime
Description:
---
license: other
license_name: public-domain
task_categories:
- tabular-classification
- tabular-regression
tags:
- aviation
- airline
- on-time-performance
- flight-delays
- bts
- dot
- united-states
- machine-learning
pretty_name: DOT Airline On-Time Performance 2018–Present
size_categories:
- 10M<n<100M
---
# DOT Airline On-Time Performance 2018–Present
**35M+ domestic flights (2018–2024)** — BTS Reporting Carrier On-Time Performance (USDOT Form 41), unified from 84 monthly files into a single analysis-ready table.
Departure/arrival delays, cancellation codes, delay cause breakdowns (carrier/weather/NAS/security/late aircraft), taxi times, and aircraft routing.
| 📊 Records | 📅 Coverage | 🏷️ License | 🔄 Updated |
|-----------|-------------|-----------|-----------|
| 35M+ flights | 2018–2024 (84 months) | Public Domain | Annual |
**This repo contains a free 1,000-row sample.**
Full dataset (CSV + Parquet + year-partitioned Parquet) → **[claritystorm.com/datasets/dot-airline-ontime](https://claritystorm.com/datasets/dot-airline-ontime)**
---
## Quick Start
```python
from datasets import load_dataset
import pandas as pd

# Load the 1,000-row sample and convert it to a pandas DataFrame
ds = load_dataset("claritystorm/dot-airline-ontime")
df = ds["train"].to_pandas()

# Delay rate (%) by carrier, excluding cancelled flights; lowest rate first
carrier_perf = (
    df[df["cancelled"] == 0]
    .groupby("carrier")["is_delayed"]
    .agg(flights="count", delayed="sum")
    .assign(delay_rate=lambda x: (x["delayed"] / x["flights"] * 100).round(1))
    .sort_values("delay_rate")
)
print(carrier_perf)
```
## Use Cases
- **Flight delay prediction** — 35M+ labeled examples for ML models
- **Airline benchmarking** — on-time performance by carrier, route, airport, and season
- **COVID-19 aviation impact** — study collapse and recovery 2020–2024
- **Airport operations research** — NAS and weather delay propagation through hubs
- **Insurance & risk pricing** — delay distributions for flight delay insurance models
- **Travel product optimization** — connection time recommendations and flight-risk scoring
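
For the delay-propagation and benchmarking use cases, a common first step is to see how delay minutes split across causes. The following is a minimal sketch, not part of the dataset tooling: `cause_shares` is a hypothetical helper, it uses only the four cause columns listed in the schema, and it assumes missing values (`None` or NaN) should count as zero minutes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional

# Cause columns taken from the schema table; other cause fields are not assumed here.
CAUSE_FIELDS = ("carrier_delay", "weather_delay", "nas_delay", "late_aircraft_delay")


def cause_shares(rows: Iterable[Mapping[str, Optional[float]]]) -> Dict[str, float]:
    """Return each cause's share of total attributed delay minutes (0.0 to 1.0)."""
    totals = defaultdict(float)
    for row in rows:
        for field in CAUSE_FIELDS:
            value = row.get(field)
            # Skip None and NaN (NaN is the only value not equal to itself).
            if value is not None and value == value:
                totals[field] += value
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No attributed delay minutes at all: report zero shares.
        return {field: 0.0 for field in CAUSE_FIELDS}
    return {field: totals[field] / grand_total for field in CAUSE_FIELDS}
```

With the sample loaded as a DataFrame `df`, one way to call it would be `cause_shares(df.to_dict("records"))`.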
## Schema (selected fields)
| Field | Type | Description |
|-------|------|-------------|
| flight_date | date | Date of flight (YYYY-MM-DD) |
| carrier | string | IATA carrier code (AA, DL, WN, etc.) |
| origin | string | Origin airport IATA code |
| dest | string | Destination airport IATA code |
| route | string | Route key (e.g. JFK-LAX) — computed |
| dep_delay | float | Departure delay in minutes |
| arr_delay | float | Arrival delay in minutes |
| is_delayed | int | 1 if arr_delay ≥ 15 and not cancelled — computed |
| cancelled | int | 1 if flight was cancelled |
| cancellation_reason | string | carrier / weather / national_air_system / security — computed |
| carrier_delay | float | Delay minutes attributable to carrier |
| weather_delay | float | Delay minutes attributable to weather |
| nas_delay | float | Delay minutes attributable to NAS |
| late_aircraft_delay | float | Delay from late incoming aircraft |
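
The two simplest computed fields can be reproduced from the raw columns, which is useful for validating the sample or rebuilding the fields from BTS files. This is a sketch based only on the descriptions in the table above, not the publisher's actual pipeline: the function names are hypothetical, and treating a missing arrival delay as "not delayed" is an assumption.

```python
from typing import Optional

DELAY_THRESHOLD_MIN = 15  # threshold stated in the schema for is_delayed


def route_key(origin: str, dest: str) -> str:
    """Build the route key in the ORIGIN-DEST form shown in the schema (e.g. JFK-LAX)."""
    return f"{origin}-{dest}"


def is_delayed(arr_delay: Optional[float], cancelled: int) -> int:
    """Return 1 if the flight arrived 15+ minutes late and was not cancelled, else 0."""
    if cancelled == 1 or arr_delay is None:
        # Assumption: cancelled flights and missing arrival delays are not counted as delayed.
        return 0
    # A NaN arr_delay compares False here, so it also yields 0.
    return int(arr_delay >= DELAY_THRESHOLD_MIN)
```

Early arrivals carry negative `arr_delay` values in the BTS source data, so they fall below the threshold and map to 0.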
## ⬇️ Get the Full Dataset
| Tier | Price | Includes |
|------|-------|----------|
| Sample | Free | 1,000 rows, Public Domain (this repo) |
| Complete | $99 | Full 35M+ flights, CSV + Parquet + year-partitioned Parquet |
| Annual | $199/yr | Complete + annual updates as BTS releases new monthly data |
👉 **[Purchase at claritystorm.com/datasets/dot-airline-ontime](https://claritystorm.com/datasets/dot-airline-ontime)**
## Source
**Bureau of Transportation Statistics (BTS)**, US Department of Transportation — Reporting Carrier On-Time Performance (Form 41 Traffic).
Under 14 CFR Part 234, US carriers with at least 0.5% of total domestic scheduled-service passenger revenue must report monthly (the threshold was 1% before 2018).
Source data is US federal government work in the **public domain** (17 U.S.C. 105).
Unified, typed, and enriched by [ClarityStorm Data](https://claritystorm.com).
Provider:
claritystorm



