Movies Dataset
收藏Netflix Movies and TV Shows Data Analysis using SQL
Overview
This project involves a comprehensive analysis of Netflixs movies and TV shows data using SQL. The goal is to extract valuable insights and answer various business questions based on the dataset.
Objectives
- Analyze the distribution of content types (movies vs TV shows).
- Identify the most common ratings for movies and TV shows.
- List and analyze content based on release years, countries, and durations.
- Explore and categorize content based on specific criteria and keywords.
Dataset
The data for this project is sourced from the Kaggle dataset:
- Dataset Link: Movies Dataset
Schema
sql DROP TABLE IF EXISTS netflix; CREATE TABLE netflix ( show_id VARCHAR(5), type VARCHAR(10), title VARCHAR(250), director VARCHAR(550), casts VARCHAR(1050), country VARCHAR(550), date_added VARCHAR(55), release_year INT, rating VARCHAR(15), duration VARCHAR(15), listed_in VARCHAR(250), description VARCHAR(550) );
Business Problems and Solutions
1. Count the Number of Movies vs TV Shows
Objective: Determine the distribution of content types on Netflix.
2. Find the Most Common Rating for Movies and TV Shows
Objective: Identify the most frequently occurring rating for each type of content.
3. List All Movies Released in a Specific Year (e.g., 2020)
Objective: Retrieve all movies released in a specific year.
4. Find the Top 5 Countries with the Most Content on Netflix
Objective: Identify the top 5 countries with the highest number of content items.
5. Identify the Longest Movie
Objective: Find the movie with the longest duration.
6. Find Content Added in the Last 5 Years
Objective: Retrieve content added to Netflix in the last 5 years.
7. Find All Movies/TV Shows by Director Rajiv Chilaka
Objective: List all content directed by Rajiv Chilaka.
8. List All TV Shows with More Than 5 Seasons
Objective: Identify TV shows with more than 5 seasons.
9. Count the Number of Content Items in Each Genre
Objective: Count the number of content items in each genre.
10. Find each year and the average numbers of content release in India on Netflix. Return top 5 year with highest avg content release!
Objective: Calculate and rank years by the average number of content releases by India.
11. List All Movies that are Documentaries
Objective: Retrieve all movies classified as documentaries.
12. Find All Content Without a Director
Objective: List content that does not have a director.
13. Find How Many Movies Actor Salman Khan Appeared in the Last 10 Years
Objective: Count the number of movies featuring Salman Khan in the last 10 years.
14. Find the Top 10 Actors Who Have Appeared in the Highest Number of Movies Produced in India
Objective: Identify the top 10 actors with the most appearances in Indian-produced movies.
15. Categorize Content Based on the Presence of Kill and Violence Keywords
Objective: Categorize content as Bad if it contains kill or violence and Good otherwise. Count the number of items in each category.
Findings and Conclusion
- Content Distribution: The dataset contains a diverse range of movies and TV shows with varying ratings and genres.
- Common Ratings: Insights into the most common ratings provide an understanding of the contents target audience.
- Geographical Insights: The top countries and the average content releases by India highlight regional content distribution.
- Content Categorization: Categorizing content based on specific keywords helps in understanding the nature of content available on Netflix.




