This project presents an exploratory data analysis (EDA) of a Netflix dataset using Python. It covers data cleaning, transformation, visualization, and insight extraction to uncover meaningful patterns in Netflix movie/TV data. The goal is to understand genre trends, popularity distribution, voting patterns, and release year insights.
- 📥 Data Loading: Load and inspect structured CSV datasets.
- 🧹 Data Cleaning:
  - Handle missing values and duplicates.
  - Convert date columns to datetime format and extract year.
  - Drop irrelevant columns.
- 📊 Exploratory Data Analysis:
  - Descriptive statistics.
  - Vote categorization into `popular`, `average`, `below_avg`, `not_popular`.
  - Genre splitting and normalization.
- 📈 Visualizations:
  - Genre frequency distribution.
  - Vote category distribution.
  - Popularity extremes (most/least popular movies).
  - Release year trends.
- 📌 Insights Extraction: Identify top genres, most popular titles, and yearly content trends.
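
The cleaning and categorization steps listed above could look roughly like the sketch below. It is not the notebook's code. The column names (`Title`, `Release_Date`, `Popularity`, `Vote_Average`, `Genre`, `Overview`), the helper names, the choice of `Overview` as an "irrelevant" column, and the quartile-based cut-offs for the four vote categories are all assumptions made for illustration. A small inline sample stands in for `netflix_dataset.csv`.

```python
import pandas as pd

# Hypothetical sample rows standing in for netflix_dataset.csv.
# All column names are assumptions; adjust them to the real file,
# e.g. after raw = pd.read_csv("netflix_dataset.csv").
raw = pd.DataFrame(
    {
        "Title": ["A", "B", "C", "D", "D", "E"],
        "Release_Date": ["2021-12-15", "2022-03-01", "2021-07-09",
                         "2020-01-20", "2020-01-20", "2019-05-05"],
        "Popularity": [5083.9, 3827.6, 13.4, 120.0, 120.0, 55.0],
        "Vote_Average": [8.3, 7.7, 5.1, 6.4, 6.4, 6.0],
        "Genre": ["Action, Adventure", "Crime, Mystery", "Drama",
                  "Comedy, Drama", "Comedy, Drama", None],
        "Overview": ["text"] * 6,
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates and rows with missing values, extract the release year."""
    out = df.drop_duplicates().dropna().copy()
    out["Release_Year"] = pd.to_datetime(out["Release_Date"]).dt.year
    # Which columns count as irrelevant is an assumption of this sketch.
    return out.drop(columns=["Release_Date", "Overview"]).reset_index(drop=True)


def categorize_votes(df: pd.DataFrame, col: str = "Vote_Average") -> pd.DataFrame:
    """Bin the vote column into four quartile-based categories (assumed cut-offs)."""
    labels = ["not_popular", "below_avg", "average", "popular"]
    out = df.copy()
    # qcut raises if quartile edges coincide; real data may need explicit bins.
    out["Vote_Category"] = pd.qcut(out[col], q=4, labels=labels)
    return out


def explode_genres(df: pd.DataFrame, col: str = "Genre") -> pd.DataFrame:
    """Split comma-separated genres so that each row holds exactly one genre."""
    out = df.copy()
    out[col] = out[col].str.split(",")
    out = out.explode(col)
    out[col] = out[col].str.strip()
    return out.reset_index(drop=True)


cleaned = clean(raw)
categorized = categorize_votes(cleaned)
exploded = explode_genres(categorized)
print(exploded[["Title", "Genre", "Vote_Category", "Release_Year"]])
```

Keeping each step as a small function that returns a new DataFrame makes it easy to inspect intermediate results cell by cell in a notebook.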
netflix-data-analysis/
│
├── Netflix_Data_Analysis.ipynb # Main Jupyter notebook
├── netflix_dataset.csv # Dataset used
├── README.md # Project description
└── requirements.txt # Python dependencies
git clone https://github.com/vinitjain2005/Netflix-Data-Analysis.git
cd Netflix-Data-Analysis
pip install -r requirements.txt
⚠️ Make sure you have Jupyter installed: `pip install notebook`
Start Jupyter with `jupyter notebook`, then open `Netflix_Data_Analysis.ipynb` and run the cells to reproduce the analysis.
- Python 3.x
- Jupyter Notebook
- Pandas – Data manipulation and cleaning
- Matplotlib / Seaborn – Data visualization
- Genre distribution bar charts.
- Vote category counts.
- Most popular vs least popular movies.
- Release year histogram.
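
As a rough illustration of how charts like these might be produced, here is a minimal sketch using pandas plotting on top of Matplotlib. It is not taken from the notebook. The sample rows, the column names (`Title`, `Genre`, `Vote_Category`, `Release_Year`, `Popularity`), and the helper names `plot_overview` and `popularity_extremes` are hypothetical. The data is assumed to be cleaned already, with one genre per row.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, already-cleaned rows with one genre per row.
# Column names are assumptions, not read from netflix_dataset.csv.
df = pd.DataFrame(
    {
        "Title": ["A", "A", "B", "C", "D", "D"],
        "Genre": ["Action", "Adventure", "Crime", "Drama", "Comedy", "Drama"],
        "Vote_Category": ["popular", "popular", "average",
                          "not_popular", "below_avg", "below_avg"],
        "Release_Year": [2021, 2021, 2022, 2021, 2020, 2020],
        "Popularity": [5083.9, 5083.9, 3827.6, 13.4, 120.0, 120.0],
    }
)


def plot_overview(data: pd.DataFrame):
    """Draw genre frequency, vote category counts, and a release year histogram."""
    # One row per title, so multi-genre titles are not counted twice.
    titles = data.drop_duplicates("Title")
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    data["Genre"].value_counts().plot.bar(ax=axes[0], title="Genre frequency")
    titles["Vote_Category"].value_counts().plot.bar(ax=axes[1], title="Vote categories")
    titles["Release_Year"].plot.hist(ax=axes[2], bins=5, title="Release years")
    fig.tight_layout()
    return fig, axes


def popularity_extremes(data: pd.DataFrame):
    """Return the titles with the highest and lowest popularity score."""
    titles = data.drop_duplicates("Title")
    most = titles.loc[titles["Popularity"].idxmax(), "Title"]
    least = titles.loc[titles["Popularity"].idxmin(), "Title"]
    return most, least


fig, axes = plot_overview(df)
most, least = popularity_extremes(df)
print(f"Most popular: {most}, least popular: {least}")
```

In a notebook the figure renders inline; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.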
This project is open-source under the MIT License.
Contributions are welcome! Fork the repository, enhance the notebook, or suggest new visualizations via pull requests.