A Smart Traffic Management System for Ho Chi Minh City, Vietnam, that uses batch and real-time data processing, intuitive dashboards, and monitoring tools to optimize traffic flow, enhance safety, and support sustainable urban mobility through advanced analytics and user-friendly applications.
This project simulates a real-world enterprise data migration and modernization strategy. It extracts transactional data from a simulated "On-Premise" environment (hosted on AWS EC2), performs heavy distributed processing on a Hadoop/Spark cluster, and ultimately serves the data through a cloud-native, serverless architecture to optimize costs.
Big Data technologies can be defined as software tools for analyzing, processing, and extracting information from data sets that are too large and complex for traditional data management tools to handle.
Production-ready Wikipedia crawler with PySpark and Apache Hive integration. Extracts article data and stores it in Hive in Parquet format with date partitioning.
End-to-end Big Data ETL pipeline that tracks medicine stock across hospital branches. Automatically detects expired medicines and low-stock situations daily. Built with Python, MySQL, Hadoop HDFS, Apache Hive, and Apache Airflow DAG automation.
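The daily check that the medicine stock pipeline describes could look roughly like the sketch below. This is not the repository's code: the `StockRecord` fields, the `daily_stock_alerts` function, and the default threshold of 10 units are all hypothetical names and values chosen for illustration, and the sketch uses plain Python rather than Hive or Airflow.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StockRecord:
    """One medicine batch at one hospital branch (hypothetical schema)."""
    branch: str
    medicine: str
    quantity: int
    expiry_date: date


def daily_stock_alerts(records, today, low_stock_threshold=10):
    """Split records into expired batches and unexpired low-stock batches.

    A batch counts as expired when its expiry date is strictly before `today`.
    The threshold of 10 units is an assumed default, not a value from the project.
    """
    expired = [r for r in records if r.expiry_date < today]
    low_stock = [
        r for r in records
        if r.expiry_date >= today and r.quantity < low_stock_threshold
    ]
    return {"expired": expired, "low_stock": low_stock}


# Example data, invented for this sketch.
records = [
    StockRecord("Branch A", "Amoxicillin", 120, date(2024, 5, 1)),
    StockRecord("Branch A", "Ibuprofen", 4, date(2025, 1, 15)),
    StockRecord("Branch B", "Paracetamol", 300, date(2026, 3, 31)),
]
alerts = daily_stock_alerts(records, today=date(2024, 6, 1))
```

In a scheduled pipeline, a function like this would typically run once per day with `today` set to the run date, so that re-running a past day gives the same result.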
Big Data project integrating Polymarket prediction data and Binance cryptocurrency rates to analyse relationships between market expectations and real prices.
An automated engine that bridges Data Engineering and System Architecture. It fetches real-world Kaggle datasets and dynamically generates professional pipeline diagrams using Architecture-as-Code (AaC).