This project focuses on building a personalized movie recommendation system using user ratings, demographics, and movie metadata.
The objective is to analyze user behavior and apply multiple recommendation techniques to deliver accurate and relevant movie suggestions.
The dataset follows a MovieLens-style format and contains ratings, user profiles, and movie metadata.
The goals, from the perspective of a Data Scientist at Zee, are to:
- Understand user viewing preferences
- Build scalable and accurate recommender systems
- Improve user engagement through personalization
- Compare multiple recommendation algorithms and evaluation metrics
The dataset consists of three main files:
Ratings file:
- Format: UserID::MovieID::Rating::Timestamp
- Ratings are on a 5-star scale
- Each user has rated at least 20 movies
- Used as the core signal for preference learning
Users file:
- Format: UserID::Gender::Age::Occupation::Zip-code
- Contains demographic information
- Useful for user segmentation and behavior analysis
Movies file:
- Format: MovieID::Title::Genres
- Genres are pipe-separated (|), e.g. Action|Drama|Comedy
- Supports content-based filtering
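A minimal loading sketch for the three "::"-delimited files, assuming pandas is used to read them. The helper name `load_dat` and the in-memory sample rows are hypothetical; with real files you would pass a path instead of `io.StringIO`, and files with non-ASCII titles may also need `encoding="latin-1"`, which the helper forwards to `pandas.read_csv` along with any other keyword argument.

```python
import io

import pandas as pd

# Column names follow the Format lines above
RATING_COLS = ["UserID", "MovieID", "Rating", "Timestamp"]
USER_COLS = ["UserID", "Gender", "Age", "Occupation", "Zip-code"]
MOVIE_COLS = ["MovieID", "Title", "Genres"]


def load_dat(source, columns, **read_kwargs):
    """Read a '::'-delimited file (path or file-like object) into a DataFrame."""
    # Multi-character separators are only supported by the python parser engine
    return pd.read_csv(source, sep="::", engine="python", header=None,
                       names=columns, **read_kwargs)


# Hypothetical sample rows in the documented formats
ratings = load_dat(io.StringIO("1::10::5::978300760\n1::20::3::978302109\n2::10::4::978298413\n"),
                   RATING_COLS)
# Keep zip codes as strings so leading zeros are not lost
users = load_dat(io.StringIO("1::F::1::10::48067\n2::M::56::16::70072\n"),
                 USER_COLS, dtype={"Zip-code": str})
movies = load_dat(io.StringIO("10::Example Movie A (1995)::Action|Drama\n"
                              "20::Example Movie B (1999)::Comedy\n"), MOVIE_COLS)

# Split the pipe-separated genre string into a list of genres per movie
movies["Genres"] = movies["Genres"].str.split("|")

# One row per rating, enriched with user demographics and movie metadata
merged = (ratings.merge(users, on="UserID", how="left")
                 .merge(movies, on="MovieID", how="left"))
print(merged.head())
```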
Collaborative filtering:
- User–User similarity
- Item–Item similarity
Similarity measures:
- Pearson Correlation
- Cosine Similarity
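The user–user and item–item similarities can be computed with either measure on a user × movie rating matrix. This is an illustration under assumptions: the ratings are made up, Pearson correlation comes from pandas' `DataFrame.corr` over co-rated movies, and cosine similarity treats unrated movies as 0, a common simplification that may differ from the project's own implementation. The helper `cosine_rows` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings in the UserID/MovieID/Rating layout
ratings = pd.DataFrame({
    "UserID":  [1, 1, 1, 2, 2, 3, 3, 3],
    "MovieID": [10, 20, 30, 10, 20, 10, 20, 30],
    "Rating":  [5, 3, 4, 4, 2, 1, 5, 2],
})

# User x movie matrix; NaN marks a movie the user has not rated
R = ratings.pivot_table(index="UserID", columns="MovieID", values="Rating")

# Pearson correlation over co-rated entries (DataFrame.corr compares columns)
user_pearson = R.T.corr(method="pearson")  # users as columns -> user-user
item_pearson = R.corr(method="pearson")    # movies as columns -> item-item


def cosine_rows(M):
    """Cosine similarity between the rows of M, treating NaN as 0."""
    X = np.nan_to_num(np.asarray(M, dtype=float), nan=0.0)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows end up with similarity 0
    X_unit = X / norms
    return X_unit @ X_unit.T


user_cosine = pd.DataFrame(cosine_rows(R), index=R.index, columns=R.index)
item_cosine = pd.DataFrame(cosine_rows(R.T), index=R.columns, columns=R.columns)
print(user_pearson.round(2))
print(user_cosine.round(2))
```

Pearson centres each vector on its mean over the co-rated entries, so a harsh rater and a generous rater with the same relative preferences still score close to 1; zero-filled cosine makes no such correction.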
Content-based filtering:
- Genre-based similarity
- User preference profiling
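Genre-based similarity and user preference profiles can be sketched with a multi-hot genre matrix. Assumptions: the movies and ratings are made up, genre vectors are compared with cosine similarity, and a profile is the rating-weighted average of the genre vectors of the movies a user has rated, which is one simple profiling scheme among several.

```python
import numpy as np
import pandas as pd

# Hypothetical movies (pipe-separated genres) and ratings
movies = pd.DataFrame({
    "MovieID": [10, 20, 30],
    "Genres": ["Action|Drama", "Comedy", "Action|Comedy"],
})
ratings = pd.DataFrame({
    "UserID": [1, 1, 2],
    "MovieID": [10, 20, 30],
    "Rating": [5, 1, 4],
})

# Multi-hot genre matrix: one row per movie, one 0/1 column per genre
genres = movies.set_index("MovieID")["Genres"].str.get_dummies(sep="|")

# Genre-based item-item similarity (cosine of genre vectors)
G = genres.to_numpy(dtype=float)
G_unit = G / np.linalg.norm(G, axis=1, keepdims=True)  # every movie has >= 1 genre here
genre_sim = pd.DataFrame(G_unit @ G_unit.T, index=genres.index, columns=genres.index)

# User preference profile: rating-weighted average of rated movies' genre vectors
rated = ratings.join(genres, on="MovieID")
weighted = rated[genres.columns].mul(rated["Rating"], axis=0)
profiles = (weighted.groupby(rated["UserID"]).sum()
            .div(rated.groupby("UserID")["Rating"].sum(), axis=0))

# Content-based score of every movie for every user: profile . genre vector
scores = profiles @ genres.T
print(scores.round(2))
```

Ranking each user's unrated movies by `scores` gives a simple content-based recommendation list.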
Neighborhood-based collaborative filtering (KNN):
- User-based KNN
- Item-based KNN
- Both use cosine similarity to find neighbors
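Item-based KNN with cosine similarity can be sketched in NumPy: among the items the user has already rated, pick the `k` most similar to the target item, then take a similarity-weighted average of the user's ratings on them. The function name `predict_item_knn`, the default `k`, and treating unrated entries as 0 when computing similarity are assumptions for illustration. User-based KNN follows the same procedure with rows and columns swapped.

```python
import numpy as np


def predict_item_knn(R, user, item, k=2):
    """Predict R[user, item] from the user's ratings on the k most similar items.

    R is a users x items array with np.nan marking unrated entries.
    """
    X = np.nan_to_num(R, nan=0.0)
    # Cosine similarity of every item column to the target column (unrated = 0)
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    X_unit = X / norms
    sim = X_unit.T @ X_unit[:, item]

    # Candidate neighbours: other items this user has actually rated
    rated = ~np.isnan(R[user])
    rated[item] = False
    candidates = np.flatnonzero(rated)
    if candidates.size == 0:
        return float("nan")

    # Keep the k candidates most similar to the target item
    top = candidates[np.argsort(sim[candidates])[::-1][:k]]
    weights = sim[top]
    if np.abs(weights).sum() == 0:
        return float("nan")
    # Similarity-weighted average of the user's ratings on those neighbours
    return float(weights @ R[user, top] / np.abs(weights).sum())


# Hypothetical ratings, np.nan marks unrated movies
R = np.array([
    [5, 4, np.nan, 1],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [2, np.nan, 4, 5],
], dtype=float)

pred = predict_item_knn(R, user=0, item=2, k=2)
print(round(pred, 2))
```

A small `k` makes predictions noisier, while a large `k` dilutes them with weakly similar items.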
Matrix factorization:
- SVD (Singular Value Decomposition)
- SVD from the Surprise library
- CMFREC (Collective Matrix Factorization)
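The matrix factorization idea can be illustrated with a plain truncated SVD in NumPy. This is a swapped-in technique for illustration only: Surprise's `SVD` is a Funk-style factorization trained by stochastic gradient descent on observed ratings, and cmfrec can additionally factorize user and item side information, so neither is reproduced here. The sketch assumes user-mean centring, zero-filling of unrated entries, clipping to a 1-5 scale, and a hypothetical helper name `svd_predict`.

```python
import numpy as np


def svd_predict(R, k=2):
    """Rank-k SVD reconstruction of a users x items rating matrix (np.nan = unrated)."""
    mask = ~np.isnan(R)
    # Centre each user's observed ratings on the user's mean; unrated entries become 0
    user_means = np.nanmean(R, axis=1, keepdims=True)
    centred = np.where(mask, R - user_means, 0.0)

    # Keep only the k largest singular values and their vectors
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]

    # Undo the centring and clip to the 1-5 star range
    return np.clip(approx + user_means, 1.0, 5.0)


# Hypothetical ratings, np.nan marks unrated movies
R = np.array([
    [5, 4, np.nan, 1],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [2, np.nan, 4, 5],
], dtype=float)

pred = svd_predict(R, k=2)
mask = ~np.isnan(R)
# Training RMSE on the observed ratings only
train_rmse = float(np.sqrt(np.mean((pred[mask] - R[mask]) ** 2)))
print(np.round(pred, 2), round(train_rmse, 3))
```

Zero-filling after centring treats every unrated movie as an average rating for that user; this is one reason methods such as Surprise's SVD fit only the observed entries.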
Embedding-based similarity:
- User embeddings
- User–User similarity
- User–Item similarity
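Once users and items have embeddings, user–user similarity can be measured as the cosine between user vectors and user–item affinity as a dot product. In this sketch the embeddings come from a plain NumPy SVD of a mean-centred rating matrix, split as `P = U·sqrt(Σ)` and `Q = V·sqrt(Σ)`; that split, the helper names `svd_embeddings` and `cosine_rows`, and the sample ratings are assumptions standing in for the factors learned by the SVD, Surprise, or CMFREC models.

```python
import numpy as np


def svd_embeddings(centred, k=2):
    """Split a rank-k SVD into user embeddings P and item embeddings Q."""
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    root = np.sqrt(s[:k])
    P = U[:, :k] * root     # users x k
    Q = Vt[:k, :].T * root  # items x k, so P @ Q.T is the rank-k approximation
    return P, Q


def cosine_rows(A):
    """Pairwise cosine similarity between the rows of A."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    A_unit = A / norms
    return A_unit @ A_unit.T


# Hypothetical ratings, np.nan marks unrated movies; centre on each user's mean
R = np.array([
    [5, 4, np.nan, 1],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [2, np.nan, 4, 5],
], dtype=float)
centred = np.where(np.isnan(R), 0.0, R - np.nanmean(R, axis=1, keepdims=True))

P, Q = svd_embeddings(centred, k=2)
user_user = cosine_rows(P)  # users x users similarity
user_item = P @ Q.T         # users x items affinity (higher = stronger predicted preference)
print(np.round(user_user, 2))
print(np.round(user_item, 2))
```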
The recommender systems are evaluated using:
- MAPE (Mean Absolute Percentage Error)
- RMSE / MSE (Root Mean Squared Error / Mean Squared Error)
- NDCG (Normalized Discounted Cumulative Gain)
- MRR (Mean Reciprocal Rank)
MAPE and RMSE/MSE measure how accurately the models predict ratings, while NDCG and MRR measure the quality of the ranked recommendation lists.
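The four metrics can be written out in a few lines of NumPy. These are textbook definitions chosen for illustration, not necessarily the exact variants used in the project: NDCG uses linear gains (`rel / log2(position + 1)`) with the ideal ordering taken from the same list, MAPE assumes true ratings are never 0 (true on a 1-5 star scale), and MRR gives a user with no relevant item in the list a reciprocal rank of 0.

```python
import numpy as np


def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def rmse(y_true, y_pred):
    return mse(y_true, y_pred) ** 0.5


def mape(y_true, y_pred):
    # Absolute error relative to the true rating, in percent (true ratings must be non-zero)
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


def ndcg_at_k(ranked_relevance, k):
    """ranked_relevance: graded relevance of the items in recommended order."""
    rel = np.asarray(ranked_relevance, float)
    # Positions 1..k are discounted by log2(2)..log2(k + 1)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    top = rel[:k]
    dcg = float(np.sum(top * discounts[: top.size]))
    # Ideal DCG: the same relevance values sorted best-first
    ideal = np.sort(rel)[::-1][:k]
    idcg = float(np.sum(ideal * discounts[: ideal.size]))
    return dcg / idcg if idcg > 0 else 0.0


def mrr(ranked_lists, relevant_sets):
    """Mean over users of 1 / rank of the first relevant item (0 if none is found)."""
    reciprocal_ranks = []
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        rank = next((i for i, item in enumerate(ranked, start=1) if item in relevant), None)
        reciprocal_ranks.append(1.0 / rank if rank else 0.0)
    return float(np.mean(reciprocal_ranks))


print(rmse([4, 3, 5], [3.5, 3, 4]), mape([4, 2], [3, 3]))
print(ndcg_at_k([3, 2, 3, 0, 1], k=5), mrr([[10, 20, 30], [40, 50, 60]], [{20}, {99}]))
```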
Libraries used:
- pandas
- numpy
- matplotlib
- seaborn
- scipy
- scikit-learn
- surprise (installed as scikit-surprise)
- cmfrec