
345rf4gt56t4r3e3/flight-delays-europe-2023-2025

Hugging Face · Updated 2026-03-24 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/345rf4gt56t4r3e3/flight-delays-europe-2023-2025
Resource description:
---
license: apache-2.0
tags:
- flight-delays
- europe
- aviation
- 2023-2025
- parquet
- opdi
- eurocontrol
- time-series
- regression
- prediction
---

# Flight Delays Europe 2023‑2025

This dataset contains per‑flight delay estimates for European commercial flights from January 2023 to December 2025, derived from the Open Performance Data Initiative (OPDI) flight‑list data published by Eurocontrol.

## Dataset Description

- **Total flights:** 15,300,770 (with delay estimates)
- **Raw flight lists:** 46,416,928 flights (36 months)
- **Years covered:** 2023, 2024, 2025
- **Date range:** 2023‑01‑02 to 2025‑12‑31
- **Source:** OPDI flight‑list Parquet files (v002) via Eurocontrol Performance Review Unit
- **Delay definition:** `delay_min = actual_flight_duration - median_historical_duration_for_route_and_month`
  - Actual duration: `last_seen - first_seen` (ADS‑B tracked times)
  - Expected duration: median of historical durations for the same airport pair (`adep‑ades`) and calendar month, computed within the same year.
- **Binary label:** `delayed_15min = 1` if `delay_min > 15`.
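As a quick illustration of the delay definition and binary label above, here is a minimal sketch. The function name `label_flight` is hypothetical and is not part of the dataset or of any library; its inputs correspond to the `duration_min` and `expected_duration_min` columns.

```python
def label_flight(duration_min: float, expected_duration_min: float) -> tuple:
    """Return (delay_min, delayed_15min) following the definitions in this card.

    `label_flight` is a hypothetical helper used only for illustration.
    """
    delay_min = duration_min - expected_duration_min
    # The label uses a strict inequality: exactly 15 minutes is not "delayed".
    delayed_15min = 1 if delay_min > 15 else 0
    return delay_min, delayed_15min


print(label_flight(140.0, 120.0))  # (20.0, 1)
print(label_flight(135.0, 120.0))  # (15.0, 0)
print(label_flight(110.0, 120.0))  # (-10.0, 0)
```

Because the threshold is strict, a flight that runs exactly 15 minutes over the route median is labeled 0, and flights faster than the median get a negative `delay_min`.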

## Data Schema

| Column | Type | Description |
|--------|------|-------------|
| `id` | int64 | Unique flight identifier (OPDI) |
| `icao24` | string | ICAO 24‑bit address of the aircraft |
| `flt_id` | string | Callsign / flight identifier |
| `dof` | date | Date of flight (YYYY‑MM‑DD) |
| `adep` | string | ICAO 4‑letter departure airport code |
| `ades` | string | ICAO 4‑letter arrival airport code |
| `adep_p` | string | Predicted departure airport (often empty) |
| `ades_p` | string | Predicted arrival airport (often empty) |
| `registration` | string | Aircraft registration (e.g., G‑ABCD) |
| `model` | string | Aircraft model description |
| `typecode` | string | ICAO aircraft type code (e.g., B738) |
| `icao_aircraft_class` | string | ICAO aircraft class (e.g., L2J) |
| `icao_operator` | string | ICAO airline code (e.g., BAW) |
| `first_seen` | datetime | First ADS‑B detection (UTC) |
| `last_seen` | datetime | Last ADS‑B detection (UTC) |
| `version` | string | OPDI data version (e.g., v2.0.0) |
| `unix_time` | int64 | Unix timestamp of `first_seen` |
| `source_month` | string | Year‑month of the source file (YYYYMM) |
| `source_year` | int | Year extracted from `source_month` |
| `duration_min` | float | Actual flight duration in minutes |
| `expected_duration_min` | float | Median historical duration for same route‑month (minutes) |
| `delay_min` | float | Delay in minutes (actual - expected) |
| `delayed_15min` | int | Binary flag: 1 if delay > 15 min |
| `route` | string | Concatenated `adep‑ades` |

## Key Statistics

### Overall Delay Distribution (minutes)

| Statistic | Value |
|-----------|-------|
| Count | 15,300,770 |
| Mean | 2.46 |
| Std | 18.34 |
| Min | -120.0 |
| 25% | -3.88 |
| 50% | 0.00 |
| 75% | 4.96 |
| Max | 1,047.5 |

### Flights Delayed >15 min

| Year | Total Flights | % Delayed >15 min |
|------|---------------|-------------------|
| 2023 | 4,595,566 | 6.97 % |
| 2024 | 5,126,450 | 7.43 % |
| 2025 | 5,578,754 | 10.37 % |
| **Overall** | **15,300,770** | **8.36 %** |

### Top Routes with Highest Delay >3 h Probability (min 50 flights)

| Route | Flights | Delayed >3 h | Probability |
|-------|---------|--------------|-------------|
| LLBG‑EDDH | 50 | 17 | 34.00 % |
| EFHK‑EHRD | 50 | 13 | 26.00 % |
| LPPM‑LPFR | 485 | 77 | 15.88 % |
| EGNR‑EIWF | 52 | 8 | 15.38 % |
| EBLG‑ETNG | 1,399 | 206 | 14.72 % |

*(Full list of probabilities by airport, airline, month, etc. is available in the companion CSV files.)*

## Data Coverage

- **Mapping coverage:** 33 % of raw flights have valid `adep`, `ades`, `first_seen`, `last_seen` and thus a delay estimate. The remaining flights are mostly overflights or missing airport data.
- **Geographic focus:** European flights (Eurocontrol member states).
- **Temporal coverage:** Full calendar years 2023‑2025.

## Data Sources & Methodology

1. **Flight‑list data:** Monthly Parquet files from the [Open Performance Data Initiative (OPDI)](https://www.eurocontrol.int/Performance/data/download/OPDI/v002/flight_list/) (Eurocontrol).
2. **Delay calculation:** For each flight, compute actual duration (`last_seen - first_seen`). Then compute the median historical duration for the same route (`adep‑ades`) and calendar month using all flights in the same year. The delay is the difference.
3. **Assumptions:**
   - `first_seen` and `last_seen` are reliable proxies for off‑block‑time and on‑block‑time.
   - The median historical duration approximates the scheduled duration.
   - Flights with negative delay (faster than median) are possible.
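
The delay calculation described in the methodology can be sketched with pandas as follows. This is an illustration under stated assumptions, not the authors' actual pipeline: the helper name `add_delay_columns` is hypothetical, the calendar month and year are assumed to come from `dof`, the route separator is assumed to be a plain hyphen, and the input is assumed to contain only flights with valid `adep`, `ades`, `first_seen`, and `last_seen`.

```python
import pandas as pd


def add_delay_columns(flights: pd.DataFrame) -> pd.DataFrame:
    """Add duration, expected duration, delay, and the >15 min flag.

    Hypothetical helper for illustration only.
    """
    out = flights.copy()
    first = pd.to_datetime(out["first_seen"])
    last = pd.to_datetime(out["last_seen"])
    dof = pd.to_datetime(out["dof"])

    # Actual duration in minutes: last_seen - first_seen
    out["duration_min"] = (last - first).dt.total_seconds() / 60.0
    # Assumption: routes are joined with a plain hyphen
    out["route"] = out["adep"] + "-" + out["ades"]

    # Expected duration: median over the same route, year, and calendar month
    # (assumption: year and month are derived from dof)
    out["_year"] = dof.dt.year
    out["_month"] = dof.dt.month
    out["expected_duration_min"] = (
        out.groupby(["route", "_year", "_month"])["duration_min"].transform("median")
    )

    out["delay_min"] = out["duration_min"] - out["expected_duration_min"]
    out["delayed_15min"] = (out["delay_min"] > 15).astype(int)
    return out.drop(columns=["_year", "_month"])


# Toy data (made up for this example, not taken from the dataset)
toy = pd.DataFrame({
    "adep": ["EGLL", "EGLL", "EGLL", "LFPG"],
    "ades": ["EDDF", "EDDF", "EDDF", "LEMD"],
    "dof": ["2023-01-05", "2023-01-12", "2023-01-19", "2023-01-05"],
    "first_seen": ["2023-01-05 08:00", "2023-01-12 08:00",
                   "2023-01-19 08:00", "2023-01-05 09:00"],
    "last_seen": ["2023-01-05 09:30", "2023-01-12 09:40",
                  "2023-01-19 10:10", "2023-01-05 11:00"],
})
result = add_delay_columns(toy)
print(result[["route", "duration_min", "expected_duration_min",
              "delay_min", "delayed_15min"]])
```

In the toy data the three EGLL-EDDF flights last 90, 100, and 130 minutes, so the median is 100 and only the last flight (30 minutes over) is flagged. The single LFPG-LEMD flight is its own median and therefore gets a delay of exactly 0, which is worth keeping in mind when interpreting routes with very few flights.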

## Usage

Load the dataset with 🤗 `datasets`:

```python
from datasets import load_dataset

ds = load_dataset("345rf4gt56t4r3e3/flight-delays-europe-2023-2025")
print(ds)
# DatasetDict({
#     train: Dataset({
#         features: ['id', 'icao24', 'flt_id', ...],
#         num_rows: 15300770
#     })
# })

# Convert to pandas for analysis
df = ds['train'].to_pandas()
print(df['delay_min'].describe())
```

Example: compute average delay by hour of day:

```python
import pandas as pd

df['hour'] = pd.to_datetime(df['first_seen']).dt.hour
hourly = df.groupby('hour')['delay_min'].mean()
print(hourly)
```

## License

This dataset is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Citation

If you use this dataset in your research, please cite:

```bibtex
@dataset{flight_delays_europe_2023_2025,
  title  = {Flight Delays Europe 2023‑2025},
  author = {Open‑source community},
  year   = {2026},
  url    = {https://huggingface.co/datasets/345rf4gt56t4r3e3/flight-delays-europe-2023-2025}
}
```

## Acknowledgments

- Data sourced from [Eurocontrol Performance Review Unit](https://www.eurocontrol.int/Performance/) via the Open Performance Data Initiative.
- Delay estimation methodology inspired by common aviation‑delay research.

## Contact

For questions or issues, open a discussion on the Hugging Face dataset page.
Provider:
345rf4gt56t4r3e3