345rf4gt56t4r3e3/flight-delays-europe-2023-2025
Hugging Face · Updated 2026-03-24 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/345rf4gt56t4r3e3/flight-delays-europe-2023-2025
Resource description:
---
license: apache-2.0
tags:
- flight-delays
- europe
- aviation
- 2023-2025
- parquet
- opdi
- eurocontrol
- time-series
- regression
- prediction
---
# Flight Delays Europe 2023‑2025
This dataset contains per‑flight delay estimates for European commercial flights from January 2023 to December 2025, derived from the Open Performance Data Initiative (OPDI) flight‑list data published by Eurocontrol.
## Dataset Description
- **Total flights:** 15,300,770 (with delay estimates)
- **Raw flight lists:** 46,416,928 flights (36 months)
- **Years covered:** 2023, 2024, 2025
- **Date range:** 2023‑01‑02 to 2025‑12‑31
- **Source:** OPDI flight‑list Parquet files (v002) via the Eurocontrol Performance Review Unit
- **Delay definition:** `delay_min = duration_min - expected_duration_min`, i.e. the actual flight duration minus the median duration for the same route and month
  - Actual duration: `last_seen - first_seen` (ADS‑B tracked times)
  - Expected duration: median duration of all flights on the same airport pair (`adep‑ades`) in the same calendar month of the same year. Because the flight itself belongs to this group, the reference is computed in-sample rather than from strictly earlier flights.
- **Binary label:** `delayed_15min = 1` if `delay_min > 15`, otherwise `0`.
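The delay definition above can be sketched in pandas as follows. This is a minimal illustration, not the dataset author's pipeline: the helper name `add_delay_columns` is hypothetical, and it assumes that `first_seen`/`last_seen` parse as UTC timestamps and that the year and month used for grouping are taken from `dof`.

```python
import pandas as pd


def add_delay_columns(flights: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive duration_min, expected_duration_min,
    delay_min and delayed_15min from a raw flight list."""
    df = flights.copy()
    first = pd.to_datetime(df["first_seen"], utc=True)
    last = pd.to_datetime(df["last_seen"], utc=True)
    # Actual duration: last minus first ADS-B detection, in minutes.
    df["duration_min"] = (last - first).dt.total_seconds() / 60.0

    # Assumption: year and calendar month come from the date of flight (dof).
    dof = pd.to_datetime(df["dof"])
    df["_year"] = dof.dt.year
    df["_month"] = dof.dt.month
    # Expected duration: median over all flights with the same adep, ades,
    # year and month (each flight is part of its own group).
    df["expected_duration_min"] = df.groupby(["adep", "ades", "_year", "_month"])[
        "duration_min"
    ].transform("median")
    df = df.drop(columns=["_year", "_month"])

    df["delay_min"] = df["duration_min"] - df["expected_duration_min"]
    # Binary label: strictly more than 15 minutes above the route-month median.
    df["delayed_15min"] = (df["delay_min"] > 15).astype(int)
    return df


# Toy example: three made-up EGLL-LFPG flights in May 2024.
raw = pd.DataFrame({
    "adep": ["EGLL", "EGLL", "EGLL"],
    "ades": ["LFPG", "LFPG", "LFPG"],
    "dof": ["2024-05-01", "2024-05-02", "2024-05-03"],
    "first_seen": ["2024-05-01 08:00", "2024-05-02 08:00", "2024-05-03 08:00"],
    "last_seen": ["2024-05-01 09:00", "2024-05-02 09:10", "2024-05-03 09:30"],
})
out = add_delay_columns(raw)
print(out[["duration_min", "expected_duration_min", "delay_min", "delayed_15min"]])
```

`groupby(...).transform("median")` broadcasts each group's median back to every row of that group, so all flights on a route in a given month share the same `expected_duration_min`.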
## Data Schema
| Column | Type | Description |
|--------|------|-------------|
| `id` | int64 | Unique flight identifier (OPDI) |
| `icao24` | string | ICAO 24‑bit address of the aircraft |
| `flt_id` | string | Callsign / flight identifier |
| `dof` | date | Date of flight (YYYY‑MM‑DD) |
| `adep` | string | ICAO 4‑letter departure airport code |
| `ades` | string | ICAO 4‑letter arrival airport code |
| `adep_p` | string | Predicted departure airport (often empty) |
| `ades_p` | string | Predicted arrival airport (often empty) |
| `registration` | string | Aircraft registration (e.g., G‑ABCD) |
| `model` | string | Aircraft model description |
| `typecode` | string | ICAO aircraft type code (e.g., B738) |
| `icao_aircraft_class` | string | ICAO aircraft class (e.g., L2J) |
| `icao_operator` | string | ICAO airline code (e.g., BAW) |
| `first_seen` | datetime | First ADS‑B detection (UTC) |
| `last_seen` | datetime | Last ADS‑B detection (UTC) |
| `version` | string | OPDI data version (e.g., v2.0.0) |
| `unix_time` | int64 | Unix timestamp of first_seen |
| `source_month` | string | Year‑month of the source file (YYYYMM) |
| `source_year` | int | Year extracted from source_month |
| `duration_min` | float | Actual flight duration in minutes |
| `expected_duration_min` | float | Median duration of all flights on the same route in the same month and year (minutes) |
| `delay_min` | float | Delay in minutes (`duration_min - expected_duration_min`) |
| `delayed_15min` | int | Binary flag: 1 if `delay_min` > 15, else 0 |
| `route` | string | Concatenated `adep‑ades` |
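Several columns in the schema are derived from others, so a quick consistency check can catch parsing or unit problems after loading. The sketch below is hypothetical: `check_derived_columns` and the `tol_min` tolerance are illustrative choices, it assumes `unix_time` counts whole seconds, and it does not check the separator inside `route` because the card only documents the `adep‑ades` form.

```python
import pandas as pd


def check_derived_columns(df: pd.DataFrame, tol_min: float = 0.01) -> dict:
    """Hypothetical check: count rows that violate each documented relationship."""
    first = pd.to_datetime(df["first_seen"], utc=True)
    last = pd.to_datetime(df["last_seen"], utc=True)
    epoch = pd.Timestamp("1970-01-01", tz="UTC")
    duration = (last - first).dt.total_seconds() / 60.0
    expected_delay = df["duration_min"] - df["expected_duration_min"]
    # route should start with adep and end with ades; the separator is not checked.
    route_ok = pd.Series(
        [str(r).startswith(str(a)) and str(r).endswith(str(d))
         for r, a, d in zip(df["route"], df["adep"], df["ades"])],
        index=df.index,
        dtype=bool,
    )
    return {
        # duration_min == last_seen - first_seen, in minutes
        "duration_min": int(((df["duration_min"] - duration).abs() > tol_min).sum()),
        # delay_min == duration_min - expected_duration_min
        "delay_min": int(((df["delay_min"] - expected_delay).abs() > tol_min).sum()),
        # delayed_15min == 1 exactly when delay_min > 15
        "delayed_15min": int((df["delayed_15min"] != (df["delay_min"] > 15).astype(int)).sum()),
        # unix_time == whole seconds from the epoch to first_seen (assumed unit)
        "unix_time": int((df["unix_time"] != (first - epoch) // pd.Timedelta(seconds=1)).sum()),
        "route": int((~route_ok).sum()),
        # source_year == first four characters of source_month (YYYYMM)
        "source_year": int(
            (df["source_year"] != df["source_month"].astype(str).str[:4].astype(int)).sum()
        ),
    }


# Toy row with made-up values that satisfy every relationship.
sample = pd.DataFrame({
    "first_seen": ["2024-05-01 08:00:00"], "last_seen": ["2024-05-01 09:30:00"],
    "duration_min": [90.0], "expected_duration_min": [70.0], "delay_min": [20.0],
    "delayed_15min": [1], "unix_time": [1714550400], "route": ["EGLL-LFPG"],
    "adep": ["EGLL"], "ades": ["LFPG"], "source_month": ["202405"], "source_year": [2024],
})
report = check_derived_columns(sample)
print(report)
```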
## Key Statistics
### Overall Delay Distribution (minutes)
| Statistic | Value |
|-----------|-------|
| Count | 15,300,770 |
| Mean | 2.46 |
| Std | 18.34 |
| Min | -120.0 |
| 25% | -3.88 |
| 50% | 0.00 |
| 75% | 4.96 |
| Max | 1,047.5 |
### Flights Delayed >15 min
| Year | Total Flights | % Delayed >15 min |
|------|---------------|-------------------|
| 2023 | 4,595,566 | 6.97 % |
| 2024 | 5,126,450 | 7.43 % |
| 2025 | 5,578,754 | 10.37 % |
| **Overall** | **15,300,770** | **8.36 %** |
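The yearly shares can be recomputed from the `delayed_15min` flag. A minimal sketch, assuming the table's "Year" corresponds to `source_year`; the helper name `delayed_share_by_year` is hypothetical.

```python
import pandas as pd


def delayed_share_by_year(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: flight counts and % with delayed_15min == 1,
    per source_year plus a pooled 'Overall' row."""
    per_year = df.groupby("source_year")["delayed_15min"].agg(
        total_flights="size", pct_delayed="mean"
    )
    per_year["pct_delayed"] = (100 * per_year["pct_delayed"]).round(2)
    # Pooled over all flights, i.e. weighted by each year's flight count.
    overall = pd.DataFrame(
        {
            "total_flights": [len(df)],
            "pct_delayed": [round(100 * df["delayed_15min"].mean(), 2)],
        },
        index=["Overall"],
    )
    return pd.concat([per_year, overall])


# Toy example with made-up labels (not the real data).
demo = pd.DataFrame({
    "source_year": [2023, 2023, 2024, 2024, 2024, 2024],
    "delayed_15min": [1, 0, 0, 0, 0, 1],
})
table = delayed_share_by_year(demo)
print(table)
```

The overall row is a flight-weighted share, not the mean of the yearly percentages. With the counts in the table, weighting 6.97 %, 7.43 % and 10.37 % by 4,595,566, 5,126,450 and 5,578,754 flights does give about 8.36 %.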
### Routes with the Highest Share of Flights Delayed >3 h (at least 50 flights per route)
| Route | Flights | Flights delayed >3 h | Share delayed >3 h |
|-------|---------|--------------|-------------|
| LLBG‑EDDH | 50 | 17 | 34.00 % |
| EFHK‑EHRD | 50 | 13 | 26.00 % |
| LPPM‑LPFR | 485 | 77 | 15.88 % |
| EGNR‑EIWF | 52 | 8 | 15.38 % |
| EBLG‑ETNG | 1,399 | 206 | 14.72 % |
*(The full lists of probabilities by airport, airline, month, etc. are available in the companion CSV files.)*
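The route table can be reproduced along these lines. This is a sketch under assumptions: ">3 h" is read as `delay_min > 180`, "min 50 flights" as at least 50 flights per `route`, and the function and parameter names are hypothetical.

```python
import pandas as pd


def routes_by_long_delay_share(
    df: pd.DataFrame, threshold_min: float = 180.0, min_flights: int = 50
) -> pd.DataFrame:
    """Hypothetical helper: per-route count and share of flights with
    delay_min > threshold_min, keeping routes with >= min_flights flights."""
    stats = (
        df.assign(long_delay=df["delay_min"] > threshold_min)
        .groupby("route")["long_delay"]
        .agg(flights="size", delayed="sum")  # sum of booleans = number delayed
    )
    stats = stats[stats["flights"] >= min_flights].copy()
    stats["probability_pct"] = (100 * stats["delayed"] / stats["flights"]).round(2)
    return stats.sort_values("probability_pct", ascending=False)


# Toy data with made-up route names: R2 has only 49 flights and is dropped.
demo = pd.DataFrame({
    "route": ["R1"] * 50 + ["R2"] * 49 + ["R3"] * 60,
    "delay_min": [200.0] * 17 + [0.0] * 33 + [300.0] * 49 + [190.0] * 6 + [5.0] * 54,
})
top = routes_by_long_delay_share(demo)
print(top)
```

The `min_flights` cut-off matters: on a route with exactly 50 flights, each delayed flight moves the share by 2 percentage points, so the top entries are sensitive to small samples.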
## Data Coverage
- **Mapping coverage:** about 33 % of raw flights (15,300,770 of 46,416,928) have valid `adep`, `ades`, `first_seen` and `last_seen` values and therefore a delay estimate. The remaining flights are mostly overflights or records with missing airport data.
- **Geographic focus:** European flights (Eurocontrol member states).
- **Temporal coverage:** Calendar years 2023‑2025 (flight dates 2023‑01‑02 to 2025‑12‑31).
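Coverage can be measured on a raw OPDI flight list by flagging rows that have all four inputs. A minimal sketch, assuming "valid" means non-missing timestamps and non-empty airport codes (the card does not define it further); `has_delay_inputs` is a hypothetical name.

```python
import pandas as pd

# Columns the card lists as required for a delay estimate.
REQUIRED = ["adep", "ades", "first_seen", "last_seen"]


def has_delay_inputs(raw: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: True where all four required inputs are present."""
    ok = raw[REQUIRED].notna().all(axis=1)
    # Assumption: empty or whitespace-only airport codes also count as missing.
    for col in ("adep", "ades"):
        ok &= raw[col].fillna("").astype(str).str.strip() != ""
    return ok


# Toy raw flight list with made-up rows: only the first one is usable.
raw = pd.DataFrame({
    "adep": ["EGLL", None, "LFPG", ""],
    "ades": ["LFPG", "EDDF", "EHAM", "EDDM"],
    "first_seen": ["2024-05-01 08:00", "2024-05-01 09:00", None, "2024-05-01 10:00"],
    "last_seen": ["2024-05-01 09:00", "2024-05-01 10:00", "2024-05-01 11:00", "2024-05-01 11:00"],
})
mask = has_delay_inputs(raw)
coverage_pct = 100 * mask.mean()
print(mask.tolist(), coverage_pct)
```

For the full raw flight lists, the reported counts give 15,300,770 / 46,416,928, or roughly 33 %.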
## Data Sources & Methodology
1. **Flight‑list data:** Monthly Parquet files from the [Open Performance Data Initiative (OPDI)](https://www.eurocontrol.int/Performance/data/download/OPDI/v002/flight_list/) (Eurocontrol).
2. **Delay calculation:** For each flight, compute the actual duration (`last_seen - first_seen`), then the median duration of all flights on the same route (`adep‑ades`) in the same calendar month of the same year. The delay is the actual duration minus this median.
3. **Assumptions:**
   - `first_seen` and `last_seen` are reliable proxies for the off-block and on-block times.
   - The median duration for the route and month approximates the scheduled duration.
   - Negative delays (flights faster than the median) are possible; the minimum in the data is -120.0 min.
## Usage
Load the dataset with 🤗 `datasets`:
```python
from datasets import load_dataset
ds = load_dataset("345rf4gt56t4r3e3/flight-delays-europe-2023-2025")
print(ds)
# DatasetDict({
# train: Dataset({
# features: ['id', 'icao24', 'flt_id', ...],
# num_rows: 15300770
# })
# })
# Convert to pandas for analysis (this loads all ~15.3 M rows into memory)
df = ds['train'].to_pandas()
print(df['delay_min'].describe())
```
Example: compute average delay by hour of day:
```python
import pandas as pd
# first_seen is in UTC, so these are UTC hours of the first ADS-B detection
df['hour'] = pd.to_datetime(df['first_seen']).dt.hour
hourly = df.groupby('hour')['delay_min'].mean()
print(hourly)
```
## License
This dataset is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
## Citation
If you use this dataset in your research, please cite:
```bibtex
@dataset{flight_delays_europe_2023_2025,
title = {Flight Delays Europe 2023‑2025},
author = {Open‑source community},
year = {2026},
url = {https://huggingface.co/datasets/345rf4gt56t4r3e3/flight-delays-europe-2023-2025}
}
```
## Acknowledgments
- Data sourced from [Eurocontrol Performance Review Unit](https://www.eurocontrol.int/Performance/) via the Open Performance Data Initiative.
- Delay estimation methodology inspired by common aviation‑delay research.
## Contact
For questions or issues, open a discussion on the Hugging Face dataset page.
Provided by: 345rf4gt56t4r3e3