Flight Status Prediction
Source: www.kaggle.com. Updated 2022-10-07; indexed 2025-01-21.
Download link:
https://www.kaggle.com/robikscube/flight-delay-dataset-20182022
Description:
- Can you predict which flights will be cancelled or delayed?
- Can you predict the delay time?
- Can you explore how different airlines compare?
This dataset makes all of these possible. Perfect for a school project, research project or resume builder.
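As a rough illustration of the airline comparison question, the sketch below computes each airline's cancellation rate and its share of departures delayed by 15 minutes or more. It assumes the rows are available as dictionaries keyed by the `Operating_Airline`, `Cancelled` and `DepDel15` columns from the record layout, with the indicators stored as numeric 0/1 values. The function name, the sample rows and the carrier codes `XA` and `XB` are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def airline_summary(rows):
    """Per-airline cancellation rate and share of departures delayed 15+ minutes."""
    counts = defaultdict(
        lambda: {"flights": 0, "cancelled": 0, "departed": 0, "delayed": 0}
    )
    for row in rows:
        c = counts[row["Operating_Airline"]]
        c["flights"] += 1
        if float(row["Cancelled"]) == 1:
            c["cancelled"] += 1
            continue  # cancelled flights are not counted as departures here
        if row.get("DepDel15") in (None, ""):
            continue  # skip rows with a missing delay indicator
        c["departed"] += 1
        c["delayed"] += int(float(row["DepDel15"]))
    return {
        airline: {
            "cancel_rate": c["cancelled"] / c["flights"],
            "delay_rate": c["delayed"] / c["departed"] if c["departed"] else None,
        }
        for airline, c in counts.items()
    }


# Hypothetical sample rows; "XA" and "XB" are placeholder carrier codes.
sample = [
    {"Operating_Airline": "XA", "Cancelled": 0, "DepDel15": 1},
    {"Operating_Airline": "XA", "Cancelled": 0, "DepDel15": 0},
    {"Operating_Airline": "XA", "Cancelled": 1, "DepDel15": None},
    {"Operating_Airline": "XB", "Cancelled": 0, "DepDel15": 0},
]
summary = airline_summary(sample)
```

Keeping cancelled flights out of the delay denominator is a design choice of this sketch; other definitions of a delay rate are equally defensible.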
# Air Flight Dataset
This dataset contains flight information, including cancellations and delays by airline, dating back to January 2018.
For your convenience, you can use the `Combined_Flights_XXXX.csv` or `Combined_Flights_XXXX.parquet` files to access the combined data for an entire year. These files also omit columns that are mostly null in the original dataset.
The raw data, including all columns, is split by month in the files named `Flights_XXXX_X.csv`.
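A minimal sketch of how these files might be located and loaded follows. It assumes that `XXXX` in the file names stands for a four-digit year and the trailing `X` for an unpadded month number, that the files sit in a local directory, and that pandas with a Parquet engine is installed. The helper names are hypothetical and not part of the dataset.

```python
from pathlib import Path


def combined_flights_name(year: int, fmt: str = "parquet") -> str:
    """File name of the combined yearly file (assumes XXXX is the year)."""
    if fmt not in ("csv", "parquet"):
        raise ValueError("fmt must be 'csv' or 'parquet'")
    return f"Combined_Flights_{year}.{fmt}"


def raw_flights_name(year: int, month: int) -> str:
    """File name of a raw monthly file (assumes the trailing X is the month)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"Flights_{year}_{month}.csv"


def load_combined(data_dir: str, year: int):
    """Load one year of combined data, preferring Parquet over CSV."""
    import pandas as pd  # imported lazily so the name helpers work without pandas

    parquet_path = Path(data_dir) / combined_flights_name(year, "parquet")
    if parquet_path.exists():
        return pd.read_parquet(parquet_path)
    return pd.read_csv(Path(data_dir) / combined_flights_name(year, "csv"))
```

Preferring the Parquet file is only a convenience, since it usually loads faster and keeps column types; the CSV fallback reads the same combined data.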
# BACKGROUND
The data in the compressed file has been extracted from the Marketing Carrier On-Time Performance (Beginning January 2018) data table of the "On-Time" database from the TranStats data library. The time period is indicated in the name of the compressed file; for example, XXX\_XXXXX\_2001\_1 contains data for the first month of 2001.
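The naming convention can be read back programmatically. The sketch below assumes only what the paragraph above states, namely that a file name ends in `_<year>_<month>`; the function name and the `ABC_DEFGH` prefix used in the examples are hypothetical placeholders, not real file names.

```python
import re
from pathlib import Path
from typing import Tuple


def parse_period(file_name: str) -> Tuple[int, int]:
    """Return (year, month) from a name ending in _<year>_<month>."""
    stem = Path(file_name).stem  # drops one trailing extension such as .zip
    match = re.search(r"_(\d{4})_(\d{1,2})$", stem)
    if match is None:
        raise ValueError(f"no trailing _<year>_<month> in {file_name!r}")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {file_name!r}")
    return year, month
```

For instance, `parse_period("ABC_DEFGH_2001_1")` yields the year 2001 and month 1, matching the example in the text.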
# RECORD LAYOUT
Below are fields in the order that they appear on the records:
| Column | Description |
|----------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Year | Year |
| Quarter | Quarter (1-4) |
| Month | Month |
| DayofMonth | Day of Month |
| DayOfWeek | Day of Week |
| FlightDate | Flight Date (yyyymmdd) |
| Marketing_Airline_Network | Unique Marketing Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years. |
| Operated_or_Branded_Code_Share_Partners | Reporting Carrier Operated or Branded Code Share Partners |
| DOT_ID_Marketing_Airline | An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation. |
| IATA_Code_Marketing_Airline | Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code. |
| Flight_Number_Marketing_Airline | Flight Number |
| Originally_Scheduled_Code_Share_Airline            | Unique Scheduled Operating Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years. |
| DOT_ID_Originally_Scheduled_Code_Share_Airline | An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation. |
| IATA_Code_Originally_Scheduled_Code_Share_Airline | Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code. |
| Flight_Num_Originally_Scheduled_Code_Share_Airline | Flight Number |
| Operating_Airline | Unique Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years. |
| DOT_ID_Operating_Airline | An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation. |
| IATA_Code_Operating_Airline | Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code. |
| Tail_Number | Tail Number |
| Flight_Number_Operating_Airline | Flight Number |
| OriginAirportID | Origin Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused. |
| OriginAirportSeqID | Origin Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time. |
| OriginCityMarketID | Origin Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market. |
| Origin | Origin Airport |
| OriginCityName | Origin Airport, City Name |
| OriginState | Origin Airport, State Code |
| OriginStateFips | Origin Airport, State Fips |
| OriginStateName | Origin Airport, State Name |
| OriginWac | Origin Airport, World Area Code |
| DestAirportID | Destination Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused. |
| DestAirportSeqID | Destination Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time. |
| DestCityMarketID | Destination Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market. |
| Dest | Destination Airport |
| DestCityName | Destination Airport, City Name |
| DestState | Destination Airport, State Code |
| DestStateFips | Destination Airport, State Fips |
| DestStateName | Destination Airport, State Name |
| DestWac | Destination Airport, World Area Code |
| CRSDepTime | CRS Departure Time (local time: hhmm) |
| DepTime | Actual Departure Time (local time: hhmm) |
| DepDelay | Difference in minutes between scheduled and actual departure time. Early departures show negative numbers. |
| DepDelayMinutes | Difference in minutes between scheduled and actual departure time. Early departures set to 0. |
| DepDel15 | Departure Delay Indicator, 15 Minutes or More (1=Yes) |
| DepartureDelayGroups                               | Departure Delay intervals, every (15 minutes from <-15 to >180) |
| DepTimeBlk | CRS Departure Time Block, Hourly Intervals |
| TaxiOut | Taxi Out Time, in Minutes |
| WheelsOff | Wheels Off Time (local time: hhmm) |
| WheelsOn | Wheels On Time (local time: hhmm) |
| TaxiIn | Taxi In Time, in Minutes |
| CRSArrTime | CRS Arrival Time (local time: hhmm) |
| ArrTime | Actual Arrival Time (local time: hhmm) |
| ArrDelay | Difference in minutes between scheduled and actual arrival time. Early arrivals show negative numbers. |
| ArrDelayMinutes | Difference in minutes between scheduled and actual arrival time. Early arrivals set to 0. |
| ArrDel15 | Arrival Delay Indicator, 15 Minutes or More (1=Yes) |
| ArrivalDelayGroups                                 | Arrival Delay intervals, every (15 minutes from <-15 to >180) |
| ArrTimeBlk | CRS Arrival Time Block, Hourly Intervals |
| Cancelled | Cancelled Flight Indicator (1=Yes) |
| CancellationCode | Specifies The Reason For Cancellation |
| Diverted | Diverted Flight Indicator (1=Yes) |
| CRSElapsedTime | CRS Elapsed Time of Flight, in Minutes |
| ActualElapsedTime | Elapsed Time of Flight, in Minutes |
| AirTime | Flight Time, in Minutes |
| Flights | Number of Flights |
| Distance | Distance between airports (miles) |
| DistanceGroup | Distance Intervals, every 250 Miles, for Flight Segment |
| CarrierDelay | Carrier Delay, in Minutes |
| WeatherDelay | Weather Delay, in Minutes |
| NASDelay | National Air System Delay, in Minutes |
| SecurityDelay | Security Delay, in Minutes |
| LateAircraftDelay | Late Aircraft Delay, in Minutes |
| FirstDepTime | First Gate Departure Time at Origin Airport |
| TotalAddGTime | Total Ground Time Away from Gate for Gate Return or Cancelled Flight |
| LongestAddGTime | Longest Time Away from Gate for Gate Return or Cancelled Flight |
| DivAirportLandings | Number of Diverted Airport Landings |
| DivReachedDest | Diverted Flight Reaching Scheduled Destination Indicator (1=Yes) |
| DivActualElapsedTime | Elapsed Time of Diverted Flight Reaching Scheduled Destination, in Minutes. The ActualElapsedTime column remains NULL for all diverted flights. |
| DivArrDelay | Difference in minutes between scheduled and actual arrival time for a diverted flight reaching scheduled destination. The ArrDelay column remains NULL for all diverted flights. |
| DivDistance | Distance between scheduled destination and final diverted airport (miles). Value will be 0 for diverted flight reaching scheduled destination. |
| Div1Airport | Diverted Airport Code1 |
| Div1AirportID | Airport ID of Diverted Airport 1. Airport ID is a Unique Key for an Airport |
| Div1AirportSeqID | Airport Sequence ID of Diverted Airport 1. Unique Key for Time Specific Information for an Airport |
| Div1WheelsOn | Wheels On Time (local time: hhmm) at Diverted Airport Code1 |
| Div1TotalGTime | Total Ground Time Away from Gate at Diverted Airport Code1 |
| Div1LongestGTime | Longest Ground Time Away from Gate at Diverted Airport Code1 |
| Div1WheelsOff | Wheels Off Time (local time: hhmm) at Diverted Airport Code1 |
| Div1TailNum | Aircraft Tail Number for Diverted Airport Code1 |
| Div2Airport | Diverted Airport Code2 |
| Div2AirportID | Airport ID of Diverted Airport 2. Airport ID is a Unique Key for an Airport |
| Div2AirportSeqID | Airport Sequence ID of Diverted Airport 2. Unique Key for Time Specific Information for an Airport |
| Div2WheelsOn | Wheels On Time (local time: hhmm) at Diverted Airport Code2 |
| Div2TotalGTime | Total Ground Time Away from Gate at Diverted Airport Code2 |
| Div2LongestGTime | Longest Ground Time Away from Gate at Diverted Airport Code2 |
| Div2WheelsOff | Wheels Off Time (local time: hhmm) at Diverted Airport Code2 |
| Div2TailNum | Aircraft Tail Number for Diverted Airport Code2 |
| Div3Airport | Diverted Airport Code3 |
| Div3AirportID | Airport ID of Diverted Airport 3. Airport ID is a Unique Key for an Airport |
| Div3AirportSeqID | Airport Sequence ID of Diverted Airport 3. Unique Key for Time Specific Information for an Airport |
| Div3WheelsOn | Wheels On Time (local time: hhmm) at Diverted Airport Code3 |
| Div3TotalGTime | Total Ground Time Away from Gate at Diverted Airport Code3 |
| Div3LongestGTime | Longest Ground Time Away from Gate at Diverted Airport Code3 |
| Div3WheelsOff | Wheels Off Time (local time: hhmm) at Diverted Airport Code3 |
| Div3TailNum | Aircraft Tail Number for Diverted Airport Code3 |
| Div4Airport | Diverted Airport Code4 |
| Div4AirportID | Airport ID of Diverted Airport 4. Airport ID is a Unique Key for an Airport |
| Div4AirportSeqID | Airport Sequence ID of Diverted Airport 4. Unique Key for Time Specific Information for an Airport |
| Div4WheelsOn | Wheels On Time (local time: hhmm) at Diverted Airport Code4 |
| Div4TotalGTime | Total Ground Time Away from Gate at Diverted Airport Code4 |
| Div4LongestGTime | Longest Ground Time Away from Gate at Diverted Airport Code4 |
| Div4WheelsOff | Wheels Off Time (local time: hhmm) at Diverted Airport Code4 |
| Div4TailNum | Aircraft Tail Number for Diverted Airport Code4 |
| Div5Airport | Diverted Airport Code5 |
| Div5AirportID | Airport ID of Diverted Airport 5. Airport ID is a Unique Key for an Airport |
| Div5AirportSeqID | Airport Sequence ID of Diverted Airport 5. Unique Key for Time Specific Information for an Airport |
| Div5WheelsOn | Wheels On Time (local time: hhmm) at Diverted Airport Code5 |
| Div5TotalGTime | Total Ground Time Away from Gate at Diverted Airport Code5 |
| Div5LongestGTime | Longest Ground Time Away from Gate at Diverted Airport Code5 |
| Div5WheelsOff | Wheels Off Time (local time: hhmm) at Diverted Airport Code5 |
| Div5TailNum | Aircraft Tail Number for Diverted Airport Code5 |
| Duplicate | Duplicate flag marked Y if the flight is swapped based on Form-3A data |
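Some of the relationships among the time and delay columns can be sketched in a few lines. This is one reading of the descriptions in the table, not the official derivation used by the data provider. The function names are hypothetical, and treating `2400` as midnight is an assumption.

```python
import math
from datetime import time
from typing import Optional


def parse_hhmm(value) -> Optional[time]:
    """Convert an hhmm local-time value such as 905 or '1430' to datetime.time."""
    if value is None or value == "":
        return None
    number = float(value)
    if math.isnan(number):
        return None  # missing value, e.g. for a cancelled flight
    hours, minutes = divmod(int(number), 100)
    if hours == 24 and minutes == 0:
        return time(0, 0)  # assumption: 2400 denotes midnight
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        raise ValueError(f"not a valid hhmm value: {value!r}")
    return time(hours, minutes)


def delay_minutes(delay: float) -> float:
    """DepDelayMinutes / ArrDelayMinutes: early departures or arrivals set to 0."""
    return max(delay, 0.0)


def delayed_15(delay: float) -> int:
    """DepDel15 / ArrDel15: 1 if the delay is 15 minutes or more, else 0."""
    return 1 if delay >= 15 else 0
```

For example, a `DepDelay` of -4 gives a `DepDelayMinutes` of 0 and a `DepDel15` of 0, while a `DepDelay` of 22 gives 22 and 1.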
Provider:
Kaggle
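Background overview
This dataset contains U.S. flight information from January 2018 onward, including detailed data on cancellations and delays. It suits analyses such as flight status prediction and airline comparison. The data comes from the U.S. Department of Transportation's TranStats database, an authoritative source with a rich set of fields.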



