
A cloud-based movie recommendation system built on the MovieLens 32M dataset. The project includes a full GCP big-data pipeline, BigQuery analytics, Databricks Spark processing, Spark MLlib ALS model training, a Looker Studio dashboard, and a live Next.js portfolio app deployed on Vercel. The live app serves ALS-powered collaborative-filtering recommendations, with TMDB as a cold-start fallback.
I built an end-to-end cloud computing and big data project using the MovieLens 32M dataset, which contains more than 32 million ratings and 87,000+ movies.
The project starts with a full Google Cloud pipeline: raw MovieLens CSV files are staged in Cloud Storage, loaded into BigQuery, analyzed with SQL, and optimized using partitioning and clustering. I also built a Looker Studio dashboard with key insights such as ratings over time, genre performance, top-rated movies, rating distribution, and interactive filters.
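As a rough illustration of the partitioning and clustering step, the sketch below builds a BigQuery CREATE TABLE ... AS SELECT statement as a plain string, so it runs without GCP credentials. The table names, the monthly partition granularity, and the choice of clustering columns are assumptions made for the example, not the project's actual settings. The column names (userId, movieId, rating, timestamp) follow the MovieLens ratings CSV header.

```python
def build_optimized_ratings_ddl(source_table: str, target_table: str) -> str:
    """Return a BigQuery CTAS statement that partitions and clusters ratings.

    Both table names are hypothetical placeholders supplied by the caller.
    """
    return f"""
CREATE OR REPLACE TABLE `{target_table}`
PARTITION BY DATE_TRUNC(rating_date, MONTH)  -- assumed granularity
CLUSTER BY movieId, userId                   -- assumed clustering columns
AS
SELECT
  userId,
  movieId,
  rating,
  -- MovieLens stores the rating time as Unix seconds
  DATE(TIMESTAMP_SECONDS(`timestamp`)) AS rating_date
FROM `{source_table}`
""".strip()


# Hypothetical project and dataset names, for illustration only.
ddl = build_optimized_ratings_ddl(
    "my_project.movielens.ratings_raw",
    "my_project.movielens.ratings_optimized",
)
print(ddl)
```

Partitioning on a date derived from the rating timestamp lets time-bounded queries (such as ratings over time) scan fewer partitions, while clustering on movieId groups rows that per-movie aggregations read together.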
For distributed processing, I used Databricks and Apache Spark to reproduce the same analytics using both Spark SQL and DataFrame operations. I also inspected the Spark physical plan, including Photon execution operators, to understand how Databricks optimized the queries.
The machine learning stage uses Spark MLlib ALS to train a collaborative-filtering recommender. I split the ratings 80/20 into training and test sets, and the model achieved an RMSE of 0.8081 on the held-out test set. I then exported the ALS item factors and used them to build movie-to-movie recommendations.
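One common way to turn exported item factors into movie-to-movie recommendations is to rank movies by the cosine similarity of their factor vectors. The sketch below shows that idea in plain Python. Cosine similarity is an assumption here (the similarity measure is not stated above), and the three-dimensional toy factors and the names cosine_similarity and similar_movies are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_movies(movie_id, item_factors, top_k=10):
    """Return the top_k (movie_id, similarity) pairs closest to movie_id."""
    query = item_factors[movie_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in item_factors.items()
        if other_id != movie_id  # never recommend the movie itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy factors: movies 1 and 2 point one way, movies 3 and 4 another.
toy_factors = {
    1: [0.9, 0.1, 0.0],
    2: [0.8, 0.2, 0.1],
    3: [0.0, 0.1, 0.95],
    4: [0.1, 0.0, 0.9],
}
neighbors = similar_movies(1, toy_factors, top_k=2)
print(neighbors)
```

With real ALS factors the same ranking would typically be precomputed offline for every movie, so the serving layer only has to read a stored list of neighbors.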
Finally, I turned the project into a live portfolio app using Next.js, Vercel, Supabase, and the TMDB API. The app allows users to search for movies, view posters and metadata, and get recommendations. For movies covered by the trained MovieLens ALS model, the app serves collaborative-filtering recommendations from Supabase. For newer or cold-start movies, it gracefully falls back to TMDB-based recommendations.
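The hybrid serving logic described above can be sketched as a small function: try the precomputed ALS recommendations first, and use the TMDB-based source only when the movie has none. The live app is written in Next.js, so this Python version is only an illustration of the control flow. The function name recommend, the two lookup callables, and the in-memory stand-ins for Supabase and TMDB are all hypothetical.

```python
from typing import Callable, Dict, List


def recommend(
    movie_id: int,
    als_lookup: Callable[[int], List[int]],
    tmdb_fallback: Callable[[int], List[int]],
    limit: int = 10,
) -> Dict[str, object]:
    """Serve ALS recommendations when available, otherwise fall back to TMDB."""
    als_recs = als_lookup(movie_id)
    if als_recs:
        return {"source": "als", "movie_ids": als_recs[:limit]}
    # Cold-start path: the movie is not covered by the trained model.
    return {"source": "tmdb", "movie_ids": tmdb_fallback(movie_id)[:limit]}


# In-memory stand-ins for the Supabase table and the TMDB API (illustrative data only).
precomputed_als = {1: [2, 4, 3]}


def lookup_als(movie_id: int) -> List[int]:
    return precomputed_als.get(movie_id, [])


def lookup_tmdb(movie_id: int) -> List[int]:
    return [101, 102, 103]


covered = recommend(1, lookup_als, lookup_tmdb, limit=2)
cold_start = recommend(999, lookup_als, lookup_tmdb, limit=2)
print(covered)
print(cold_start)
```

Returning the source alongside the results lets the front end label which recommender produced a given list.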
This project demonstrates cloud data engineering, big-data analytics, distributed Spark processing, ML model training, dashboarding, deployment, and production-style hybrid recommendation design.