Danielolevsky92/wwii-bombing-operations-eda
Source: Hugging Face | Updated: 2026-04-12 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/Danielolevsky92/wwii-bombing-operations-eda
---
license: cc0-1.0
task_categories:
- tabular-classification
pretty_name: WWII Aerial Bombing Operations EDA
size_categories:
- 100K<n<1M
configs:
- config_name: default
  data_files:
  - path: operations.csv
    split: train
---
# WWII Aerial Bombing Operations - EDA
## Video Presentation
<video src="https://huggingface.co/datasets/Danielolevsky92/wwii-bombing-operations-eda/resolve/main/DanielOlevskyEDA_Presentation.mp4" controls="controls" style="max-width: 720px;"></video>
## Overview
This project presents an end-to-end Exploratory Data Analysis (EDA) of aerial bombing operations conducted during World War II. The goal is to uncover patterns in bombing intensity, target selection, aircraft usage, and geographic distribution across the war years.
The analysis is based on a dataset published by the U.S. Air Force (USAF), containing over 178,000 bombing mission records from 1939 to 1945.
---
## Objectives
- Understand how bombing intensity changed over the course of the war
- Identify the most heavily targeted countries and regions
- Analyze which aircraft types were used most frequently
- Examine seasonal and geographic patterns in bombing campaigns
- Investigate the distribution of bomb loads across missions
---
## Dataset Description
- **Source:** U.S. Air Force (USAF) - published on Kaggle
- **Original rows:** 178,281 | **After cleaning:** 113,139
- **Original columns:** 46 | **After cleaning:** 21
- **File size:** 28.48 MB
- **Time period:** 1939 - 1945
Key columns in the dataset:
- `Mission Date` - date of the bombing mission
- `Theater of Operations` - European (ETO), Pacific (PTO), Mediterranean (MTO), etc.
- `Target Country` - country that was bombed
- `Target City` - city that was bombed
- `Aircraft Series` - type of aircraft used
- `Altitude (Hundreds of Feet)` - flight altitude, recorded in hundreds of feet (e.g. 141 means 14,100 feet)
- `Attacking Aircraft` - number of aircraft in the mission
- `Total Weight (Tons)` - total weight of bombs dropped
- `High Explosives Weight (Tons)` - weight of high explosive bombs
- `Target Type` - type of target (airdrome, city, bridge, etc.)
---
## Main Research Question
> How did the intensity of WWII aerial bombing operations change over the years of the war, and which countries were targeted the most?
---
## Data Cleaning
Steps performed:
- Dropped 27 columns with more than 50% missing values
- Removed rows with impossible values:
  - Altitude above 100,000 feet
  - Coordinates outside the valid ranges (latitude -90 to 90, longitude -180 to 180)
  - Bomb weights above 2,000 tons
- Parsed `Mission Date` from text into a proper datetime type
- Extracted Year and Month as separate columns for time analysis
- Checked for duplicates - none found
**Result:** Dataset reduced from 178,281 rows and 46 columns to 113,139 rows and 21 columns
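The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the code from the author's notebook: the coordinate column names `Target Latitude` / `Target Longitude`, the `M/D/YYYY` date format, and the helper name `clean_operations` are assumptions, and the thresholds are the ones listed above. The sketch only removes rows that fail the listed checks, so it is not expected to reproduce the exact 113,139-row count; a small made-up frame stands in for `operations.csv` so it runs on its own.

```python
import numpy as np
import pandas as pd

ALT_COL = "Altitude (Hundreds of Feet)"
WEIGHT_COL = "Total Weight (Tons)"
# Assumed names for the coordinate columns; check them against the real CSV header.
LAT_COL = "Target Latitude"
LON_COL = "Target Longitude"


def clean_operations(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper applying the cleaning rules described above."""
    # Drop columns in which more than `max_missing` of the values are missing.
    df = df.loc[:, df.isna().mean() <= max_missing].copy()

    # Flag impossible values. Comparisons with NaN evaluate to False,
    # so a missing value on its own does not get a row removed here.
    impossible = pd.Series(False, index=df.index)
    if ALT_COL in df.columns:
        impossible |= df[ALT_COL] * 100 > 100_000  # column is in hundreds of feet
    if LAT_COL in df.columns:
        impossible |= df[LAT_COL].abs() > 90
    if LON_COL in df.columns:
        impossible |= df[LON_COL].abs() > 180
    if WEIGHT_COL in df.columns:
        impossible |= df[WEIGHT_COL] > 2_000
    df = df.loc[~impossible].copy()

    # Parse the date text (assumed M/D/YYYY); non-matching strings become NaT.
    df["Mission Date"] = pd.to_datetime(df["Mission Date"], format="%m/%d/%Y", errors="coerce")
    # Separate Year and Month columns for time-based grouping.
    df["Year"] = df["Mission Date"].dt.year
    df["Month"] = df["Mission Date"].dt.month
    return df


# Toy rows (made up) that trigger each removal rule once.
raw = pd.DataFrame({
    "Mission Date": ["8/15/1943", "3/1/1944", "4/2/1944", "5/3/1944", "6/4/1944"],
    ALT_COL: [150.0, 40_000.0, 120.0, 100.0, np.nan],  # 40,000 -> 4,000,000 ft
    LAT_COL: [50.0, 48.0, 1108.0, 45.0, 51.0],          # 1,108 is not a latitude
    LON_COL: [8.0, 9.0, 10.0, 11.0, 12.0],
    WEIGHT_COL: [20.0, 15.0, 10.0, 2_500.0, 5.0],       # 2,500 t exceeds the cap
    "Mostly Empty": [np.nan, np.nan, np.nan, np.nan, 1.0],
})
clean = clean_operations(raw)
n_duplicates = int(clean.duplicated().sum())  # duplicate check from the list above
print(clean.shape, n_duplicates)  # (2, 7) 0
```

Keeping each check as its own mask line makes it easy to count how many rows each rule removes before combining them.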
---
## Outlier Handling
Outliers were identified using descriptive statistics and the IQR method:
- **Invalid outliers** (impossible values) were removed - e.g. altitude of 4,000,000 feet, coordinates like latitude 1,108
- **Realistic outliers** were kept - e.g. missions with 332 aircraft, which did happen historically in massive raids
**Justification:** This data was digitized from old WWII paper records, so data entry errors are expected. However, extreme but historically plausible values were retained to preserve dataset integrity.
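The IQR method mentioned above flags values outside the fences Q1 - k·IQR and Q3 + k·IQR. A minimal sketch, assuming the conventional multiplier k = 1.5 (the card does not state which multiplier was used); the helper names and the aircraft counts are invented for illustration. Flagged values are only candidates: in the approach described above, each one is then judged as either an impossible value (removed) or a realistic extreme such as a 332-aircraft raid (kept).

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) IQR fences for a numeric Series."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask that is True where a value lies outside the fences."""
    lower, upper = iqr_bounds(values, k)
    return (values < lower) | (values > upper)


# Made-up aircraft counts per mission; 332 sits far above the upper fence.
aircraft = pd.Series([10, 12, 18, 20, 24, 30, 332])
mask = iqr_outliers(aircraft)
print(iqr_bounds(aircraft))      # (-3.0, 45.0)
print(aircraft[mask].tolist())   # [332]
```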
---
## Descriptive Statistics
Key statistics from the cleaned dataset:
- **Average altitude:** 14,100 feet per mission
- **Median bomb load:** 17 tons per mission
- **Average bomb load:** 30.4 tons (pulled up by large raids)
- **Most active year:** 1944 with 55,384 missions
- **Peak bombing:** 1,974,380 tons dropped in 1944
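Statistics like these come from a handful of pandas aggregations. A sketch, assuming a cleaned frame with the `Year` column described in the cleaning steps; the helper name `summarize` and the toy rows are invented. The toy data includes one large raid to show how it pulls the mean above the median, the same effect noted for the 30.4-ton average versus the 17-ton median.

```python
import pandas as pd

ALT_COL = "Altitude (Hundreds of Feet)"
WEIGHT_COL = "Total Weight (Tons)"


def summarize(df: pd.DataFrame) -> dict:
    """Hypothetical helper computing the headline statistics listed above."""
    missions_per_year = df.groupby("Year").size()          # rows = missions
    tons_per_year = df.groupby("Year")[WEIGHT_COL].sum()   # total tonnage
    return {
        "avg_altitude_ft": df[ALT_COL].mean() * 100,  # hundreds of feet -> feet
        "median_load_t": df[WEIGHT_COL].median(),
        "mean_load_t": df[WEIGHT_COL].mean(),
        "most_active_year": missions_per_year.idxmax(),
        "peak_tons_year": tons_per_year.idxmax(),
        "peak_tons": tons_per_year.max(),
    }


# Invented rows: one 200-ton raid skews the mean upward.
toy = pd.DataFrame({
    "Year": [1943, 1943, 1944, 1944, 1944],
    ALT_COL: [100, 120, 150, 160, 170],
    WEIGHT_COL: [10.0, 12.0, 15.0, 17.0, 200.0],
})
stats = summarize(toy)
print(stats)
```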
---
## Exploratory Data Analysis
### Key Finding 1: Bombing Intensity per Year
1944 was by far the most active year with 55,384 missions - coinciding with the Allied D-Day invasion of Normandy. In 1940 there were just 18 missions; from there, activity escalated dramatically.

### Key Finding 2: Most Bombed Countries
Germany was the most targeted country with 36,496 missions - more than double Italy in second place (15,445). The top 3 were all European countries.

### Key Finding 3: Most Used Aircraft
The B24 Liberator (25,365 missions) and B17 Flying Fortress (23,801 missions) were the backbone of Allied bombing operations - together accounting for nearly 50,000 missions.

### Key Finding 4: Total Bombs Dropped per Year
Nearly 2 million tons of bombs were dropped in 1944 alone - more than all previous years combined. This is roughly 25,000 times the 1940 tonnage and reflects the full mobilization of Allied air power.
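Both comparisons in this finding (more than all previous years combined, and the multiple of the first year's total) reduce to simple arithmetic on a yearly tonnage series. A minimal sketch; the input is assumed to be total tons indexed by year, e.g. `df.groupby("Year")["Total Weight (Tons)"].sum()`, and the helper name and the round numbers in the example are invented.

```python
import pandas as pd


def compare_to_prior_years(tons_per_year: pd.Series) -> pd.DataFrame:
    """For each year, compare tonnage with the sum of all earlier years
    and with the first year in the series (hypothetical helper)."""
    tons = tons_per_year.sort_index()
    prior = tons.cumsum().shift(1, fill_value=0)  # total of all earlier years
    return pd.DataFrame({
        "tons": tons,
        "prior_years_total": prior,
        "exceeds_prior_total": tons > prior,
        "x_first_year": tons / tons.iloc[0],
    })


# Invented round numbers, not values from the dataset.
example = pd.Series({1940: 100, 1941: 1_000, 1942: 10_000, 1943: 100_000, 1944: 1_000_000})
result = compare_to_prior_years(example)
print(result)
```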

### Key Finding 5: Correlation Analysis
Strongest correlation: High Explosives Weight vs Total Weight (0.93), which is expected because high-explosive weight is one component of the total weight. Surprisingly, year had almost no correlation with bomb load per mission (0.03) - suggesting that escalation was driven by MORE missions, not bigger ones.
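A correlation matrix like this one is a single `DataFrame.corr` call. Pearson correlation (pandas' default) is assumed here, since the card does not name the method. The synthetic data below is built so that later years have more missions while the per-mission load distribution stays the same, which reproduces the pattern described: near-zero year/load correlation alongside rising total tonnage, plus a strong HE/total correlation because HE weight is a fraction of the total.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000
# Synthetic missions: later years get MORE missions, but the per-mission
# load is drawn from the same distribution regardless of year.
year = rng.choice([1942, 1943, 1944], size=n, p=[0.1, 0.3, 0.6])
total = rng.gamma(shape=1.5, scale=20.0, size=n)
he = total * rng.uniform(0.7, 1.0, size=n)  # HE weight is part of the total

df = pd.DataFrame({
    "Year": year,
    "Total Weight (Tons)": total,
    "High Explosives Weight (Tons)": he,
})

corr = df.corr(method="pearson")
tons_by_year = df.groupby("Year")["Total Weight (Tons)"].sum()
print(corr.round(2))
print(tons_by_year.round(0))
```

Pearson correlation only captures linear association, so a value near zero rules out a linear trend in load per mission, not every possible relationship.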

---
## Research Questions
### Q1: Did bombing intensity differ between theaters?
Yes - the European Theater (ETO) had 55,299 missions vs the Pacific Theater (PTO) with 30,375 - nearly double. This is consistent with Germany being the most bombed country.
**Insight:** The Allied strategy prioritized defeating Germany before focusing on Japan - clearly reflected in the data.

---
### Q2: Which months had the most bombing missions?
Spring months - March (13,948) and April (14,962) - had the most activity. Winter months (January, December) had significantly fewer missions.
**Insight:** Spring likely offered better flying weather in Europe than winter - longer days, clearer skies, and less fog.
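A per-month count like the one behind this question can be sketched as below, assuming the `Month` column (1-12) created during cleaning; the helper name and the month values in the example are made up. Reindexing over all twelve months keeps months with no missions visible as zeros instead of silently dropping them.

```python
import calendar

import pandas as pd


def missions_per_month(months: pd.Series) -> pd.Series:
    """Count missions per calendar month (1-12), including months with zero."""
    counts = months.value_counts().reindex(range(1, 13), fill_value=0)
    counts.index = [calendar.month_abbr[m] for m in counts.index]  # 3 -> "Mar"
    return counts


example = missions_per_month(pd.Series([3, 3, 4, 4, 4, 12, 7]))
print(example)
```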

---
### Q3: Did missions fly higher as the war progressed?
Yes - average altitude increased from 6,740 feet in 1940 to a peak of 15,510 feet in 1944, then dropped in 1945.
**Insight:** As German anti-aircraft (Flak) defenses grew stronger, Allied planes flew higher to avoid being shot down. The drop in 1945 suggests weakening German defenses near the end of the war.
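Because altitude is stored in hundreds of feet, the per-year averages quoted above involve a unit conversion after the group mean. A sketch with invented toy rows and a hypothetical helper name; note that `mean()` skips missing altitudes, so years with many blank altitude fields are averaged over fewer missions.

```python
import pandas as pd


def mean_altitude_feet_by_year(df: pd.DataFrame) -> pd.Series:
    """Average altitude per year, converted from hundreds of feet to feet.
    Missing altitudes are ignored by mean()."""
    return df.groupby("Year")["Altitude (Hundreds of Feet)"].mean() * 100


toy = pd.DataFrame({
    "Year": [1940, 1940, 1944, 1944, 1944, 1945],
    "Altitude (Hundreds of Feet)": [60, 70, 150, 160, None, 120],
})
altitude = mean_altitude_feet_by_year(toy)
print(altitude)
print("peak year:", altitude.idxmax())
```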

---
### Q4: What was the distribution of bomb loads across missions?
The distribution is heavily right-skewed - most missions carried under 50 tons (median 17 tons), while a small number of massive raids carried hundreds of tons.
**Insight:** WWII bombing escalation came from flying MORE missions rather than loading more bombs onto each mission - a pattern supported by the near-zero correlation between year and bomb load per mission.
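The right skew described here can be quantified rather than only read off a histogram. A sketch: the 50-ton cut-off mirrors the text, sample skewness comes from pandas' `Series.skew`, and the helper name and the loads in the example are invented.

```python
import pandas as pd


def load_distribution_summary(loads: pd.Series, threshold: float = 50.0) -> dict:
    """Describe how right-skewed a bomb-load column is (hypothetical helper)."""
    return {
        "median": loads.median(),
        "mean": loads.mean(),
        "skewness": loads.skew(),  # > 0 means a long right tail
        "share_below_threshold": (loads < threshold).mean(),
    }


loads = pd.Series([5, 8, 10, 12, 15, 17, 20, 25, 40, 400], dtype=float)
summary = load_distribution_summary(loads)
print(summary)
```

A mean well above the median together with positive skewness is the numeric signature of the few very large raids.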

---
### Q5: Which target types were most common?
Airdromes (military airfields) were the most common known target at 27.3%, followed by City Areas (20%) and Marshalling Yards (7.2%). Nearly 30% of targets were unidentified.
**Insight:** The Allied strategy prioritized destroying enemy air power first (airdromes), then cutting off supply lines (marshalling yards and bridges).
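Target-type shares depend on whether unidentified targets are counted in the denominator, so the sketch below makes that choice explicit. It assumes unidentified targets appear as missing values; if the CSV uses an explicit label instead, the `fillna` step is unnecessary. The helper name and the example labels are invented.

```python
import pandas as pd


def target_type_shares(target_types: pd.Series) -> pd.Series:
    """Percentage of missions per target type, with missing entries counted
    as 'Unidentified' so they stay in the denominator."""
    labeled = target_types.fillna("Unidentified")
    return (labeled.value_counts(normalize=True) * 100).round(1)


example = pd.Series([
    "AIRDROME", "AIRDROME", "CITY AREA", None, None,
    "MARSHALLING YARD", "AIRDROME", "CITY AREA", None, "BRIDGE",
])
shares = target_type_shares(example)
print(shares)
```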

---
## Key Insights Summary
- Bombing activity escalated dramatically - from 18 missions in 1940 to over 55,000 in 1944
- Germany was the most bombed country with 36,496 missions
- The European Theater was nearly twice as active as the Pacific Theater by mission count (55,299 vs 30,375)
- The B24 and B17 were the backbone of Allied bombing operations
- Spring months had the most activity, likely due to better flying weather
- Escalation came from MORE missions - not bigger ones
- Airdromes were the most common target - reflecting the strategy of destroying enemy air power first
---
## Limitations
- The dataset primarily covers USAF records - the Pacific campaign against Japan is underrepresented
- Nearly 30% of target types are unidentified - expected from digitized historical paper records
- Some columns had up to 99.99% missing values and had to be dropped
- Data was collected from paper records digitized decades after the war - errors are expected
---
## Tools & Technologies
- Python (Pandas, NumPy, Matplotlib, Seaborn)
- Google Colab
- Kaggle API
---
## Files
- `operations.csv` - the original dataset from USAF via Kaggle
- `WWII_Bombing_EDA_Daniel_Olevsky.ipynb` - full EDA notebook with all code and analysis
---
## Author
Daniel Olevsky
*This analysis was conducted as part of a Data Science assignment at Reichman University. The dataset is used for educational purposes only.*
Provided by: Danielolevsky92