M20Jay/README.md

      


I build end-to-end production pipelines — from raw data through model development, containerised deployment, real-time monitoring, and automated retraining. Four domains: fraud detection, credit risk, environmental ML, and African language NLP — built to enterprise scale.

Currently executing a 15-week intensive MLOps programme — one complete production-grade project per week. Every project is fully deployed, monitored, and documented. No shortcuts.


📍 Based in Nairobi, Kenya
✉️ ngangam93@gmail.com
Air Quality API: 18.184.3.203:8000/docs — Live on AWS EC2 Frankfurt
Fraud Detection: 18.184.3.203:8003/docs
Churn Prediction: 18.184.3.203:8002/docs
Customer Segmentation: 18.184.3.203:8004/docs
Credit Risk Scoring: 18.184.3.203:8005/docs
Recommendation Dashboard: recommendation-system-dashboard.onrender.com
Recommendation API: 18.184.3.203:8001/docs

Deployment Status

Air Quality API live on AWS EC2 Frankfurt — Deployed 30 May 2026. Docker containerised. No cold starts. No monthly suspensions. 24/7 uptime.

🔭 Currently building: Week 9 — Apache Airflow · DAGs · Parallel Model Training · Scheduling
🌱 Next: Week 10 — AWS S3 · RDS · ECR · HTTPS · Domain Name

🤝 Open to: African AI · banking · telecom · healthcare · environmental analytics
🏆 Best Paper Award — Beijing Institute of Technology 2018 · 34 countries
"Data is only as powerful as the institution's willingness to act on it. I have spent ten years building both."


🔥 Proven Results

| Achievement | Detail |
|---|---|
| 🌍 Environmental anomaly detection | ARIMA RMSE 9.93 · PM2.5 spike 469 µg/m³ detected · 11,998 OpenAQ readings · live dashboard + API |
| 🎬 Movie Recommendation System | Item-CF RMSE 0.9540 · P@10 69.7% · 943 users · 1,682 movies · CineAI Netflix-standard dashboard · live |
| 🎯 Real-time fraud detection | 284,807 transactions · Kafka streaming · 22ms response · live production API |
| 💳 Credit risk scoring + SHAP explainability | XGBoost · ROC-AUC 0.703 · SHAP waterfall · Basel III ready · live production API |
| 🔍 RAG Document Search (prototype) | 1,244-page semantic search · 4,329 chunks · LaBSE · 109 languages · full system Week 12 |
| 🌍 Kiswahili NLP (prototype) | Zero-shot classification · UNEP Strategic Objectives · mBERT · 104 languages · full system Week 15 |
| 📊 Institutional M&E Architecture | 50+ KPIs · Results-Based Management · World Bank KYEOP, A-rating from Ministry of Public Service |
| 🏆 Best Paper Award | 22nd International ICIT · Beijing Institute of Technology · 2018 · 34 countries |
| 🔄 8 production systems | 6 systems live on AWS EC2 Frankfurt · deployed 31 May 2026 |

🗓️ 15-Week MLOps Programme

One production-grade project per week — model development · containerisation · cloud deployment · live monitoring · automated retraining. Every week ships.

Progress: Week 8 complete · 8 done · 7 remaining

| Week | Project | Stack | Status |
|---|---|---|---|
| 01 | Churn Prediction Pipeline | XGBoost · FastAPI · Docker · PostgreSQL · Grafana | Live API · Repo |
| 02 | Real-Time Fraud Detection | LightGBM · Kafka · Redis · FastAPI · Docker · Grafana | Live API · Repo |
| 03 | Customer Segmentation | KMeans · PCA · MLflow · Evidently · Streamlit · FastAPI · Docker | Live API · Repo |
| 04 | RAG Document Search System | LaBSE · ChromaDB · FastAPI · pypdf · Docker | ✅ Local confirmed · AWS pending · Repo |
| 05 | Credit Risk Scoring + Propensity + RFM | XGBoost · SHAP · ADASYN · DVC · RFM · FastAPI · PostgreSQL · Grafana · Docker | Live API · Repo |
| 06 | Environmental Anomaly Detection + Time Series 🌍 | ARIMA · Prophet · LSTM (PyTorch) · Isolation Forest · Streamlit · FastAPI · Docker | Dashboard · API · Repo |
| 07 | Recommendation System | Item-CF · SVD · scikit-surprise · FastAPI · Streamlit · PostgreSQL · Docker | Dashboard · API · Repo |
| 08 | MLOps Automation | MLflow · DVC · Evidently AI · GitHub Actions · Prefect · Week 6 case study | ✅ Complete |
| 09 | Apache Airflow: Pipeline Orchestration | Airflow · DAGs · Scheduling · Alerting · Docker | 🔄 In Progress |
| 10 | Cloud Deployment: AWS / GCP | EC2 · RDS · ECR · HTTPS · Cloud Run | 🔲 Pending |
| 11 | Environmental Capstone 🌍 | Random Forest · LSTM · Global Forest Watch · FastAPI · Docker | 🔲 Pending |
| 12 | Advanced RAG Chatbot | LangChain · FAISS · HuggingFace · FastAPI · Docker | 🔲 Pending · Prototype |
| 13 | Apache Spark: Big Data | PySpark · Spark MLlib · dbt · BigQuery | 🔲 Pending |
| 14 | NLP: Text Classification | HuggingFace · BERT · spaCy · FastAPI · Docker | 🔲 Pending |
| 15 | Kiswahili NLP 🌍 | mBERT · AfriBERTa · HuggingFace Hub · AWS | 🔲 Pending · Prototype |

Every project: production-grade code · containerised deployment · documented README · tested endpoints · no shortcuts.


💼 Portfolio Projects


🌍 Project 15 — Kiswahili NLP Environmental Classifier (Week 15 — Prototype)

mBERT · AfriBERTa · HuggingFace Transformers · MLflow · FastAPI · Docker · AWS

Kiswahili environmental text classifier connecting East African language knowledge to global environmental monitoring. Over 200 million East Africans speak Kiswahili, yet most AI systems are built primarily for English, which leaves indigenous communities unable to contribute environmental observations in their own language.

Classifies Kiswahili text by UNEP Strategic Objective:

| Objective | Focus | Example |
|---|---|---|
| SO1 | Climate Stability | Mabadiliko ya tabianchi yanaathiri wakulima |
| SO2 | Biodiversity | Viumbe vingi vya porini viko hatarini kutoweka |
| SO3 | Pollution & Waste | Plastiki nyingi zinatupwa baharini |

Python HuggingFace PyTorch MLflow Docker AWS

🔨 Prototype Complete · Zero-shot classification running · Full system Week 15 · Open-source release on HuggingFace Hub · Repository

🎬 Project 7 — Movie Recommendation System (Week 7 — Latest)

Item-CF · SVD · scikit-surprise · FastAPI · Streamlit · PostgreSQL · Docker · AWS EC2

Production recommendation engine trained on 100,000 ratings from 943 users on 1,682 movies. Item-CF wins with RMSE 0.9540 and Precision@10 of 69.7%. Netflix-standard CineAI dashboard with three tabs — recommendations, data insights, model performance.
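For readers new to the technique, here is a minimal sketch of item-based collaborative filtering and Precision@k on a toy ratings dictionary. It is not the project's code: the ratings, the helper names `item_similarity`, `predict` and `precision_at_k`, and the use of plain cosine similarity are all illustrative assumptions, and the deployed system relies on scikit-surprise rather than hand-written loops.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. Illustrative values only.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "C": 1},
}

def item_similarity(i, j):
    """Cosine similarity between items i and j over users who rated both."""
    common = [u for u, r in ratings.items() if i in r and j in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    norm_i = sqrt(sum(ratings[u][i] ** 2 for u in common))
    norm_j = sqrt(sum(ratings[u][j] ** 2 for u in common))
    return dot / (norm_i * norm_j)

def predict(user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    weighted = [(item_similarity(item, j), r)
                for j, r in ratings[user].items() if j != item]
    total = sum(w for w, _ in weighted)
    if total == 0:
        return None  # no overlapping evidence for this item
    return sum(w * r for w, r in weighted) / total

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

# u4 liked A and disliked C; B is rated much like A, so the estimate leans high.
predicted_b = predict("u4", "B")
p_at_2 = precision_at_k(["A", "B", "C"], relevant={"A", "C"}, k=2)
```

Precision@10 in the results above is the same ratio computed over each user's top ten recommendations and then averaged across users.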

Python FastAPI Streamlit Docker PostgreSQL scikit-surprise

CineAI Data Insights Dashboard

✅ Complete · Week 7 · Tests: 6/6 passing · Live Dashboard · Live API · Repository

⚠️ Deployment Notice: Full production system deploying to AWS EC2 as part of Week 15. Prototype complete and documented above.


🌍 Project 6 — Environmental Anomaly Detection + Time Series (Week 6)

ARIMA · Prophet · LSTM (PyTorch) · Isolation Forest · Streamlit · FastAPI · Docker · AWS EC2

Production environmental monitoring pipeline trained on 11,998 real PM2.5 sensor readings from 5 Nairobi locations via OpenAQ. Answers two questions automatically for every hourly reading: What will PM2.5 be next? Is this reading dangerous?

Key EDA findings:

  • PM2.5 peaks at 4am every day — night burning trapped in cold air
  • Friday is consistently the worst day of the week
  • Maximum spike: 469 µg/m³ on 2024-02-18 at 4am — 93x the WHO annual safe limit
  • 1.8% of all readings exceed the dangerous US EPA threshold of 55 µg/m³
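
The ratios in these findings come from simple arithmetic. The sketch below assumes the WHO annual PM2.5 guideline of 5 µg/m³ (the value implied by the 93x figure) and reuses the 55 µg/m³ threshold quoted above. The `readings` list is made-up illustrative data, not the OpenAQ dataset, and the function names are hypothetical.

```python
WHO_ANNUAL_LIMIT = 5.0   # µg/m³, WHO annual PM2.5 guideline (assumed here)
EPA_THRESHOLD = 55.0     # µg/m³, threshold quoted in the findings

def times_over_limit(value, limit=WHO_ANNUAL_LIMIT):
    """How many times a reading exceeds a reference limit."""
    return value / limit

def share_exceeding(readings, threshold=EPA_THRESHOLD):
    """Fraction of readings strictly above the threshold."""
    if not readings:
        return 0.0
    return sum(1 for r in readings if r > threshold) / len(readings)

# The 469 µg/m³ spike is about 93.8 times the 5 µg/m³ guideline.
spike_ratio = times_over_limit(469)

# Illustrative readings only: two of ten are above 55 µg/m³.
readings = [12.0, 30.5, 58.1, 20.2, 469.0, 41.0, 18.3, 25.4, 33.3, 15.0]
share = share_exceeding(readings)
```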

| Model | RMSE | MAE | Type |
|---|---|---|---|
| ARIMA ✅ Best | 9.93 | 8.35 | Forecasting |
| LSTM (PyTorch) | 19.46 | 17.87 | Deep Learning |
| Prophet | 22.05 | 19.40 | Forecasting |
| Isolation Forest | | | Anomaly Detection |
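
As a reminder of what the two error columns measure, here is a small sketch of RMSE and MAE on made-up numbers. The `actual` and `forecast` values are illustrative assumptions and do not come from the project's evaluation.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error: penalises large misses more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error: the average size of a miss."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Illustrative hourly PM2.5 values in µg/m³.
actual = [20.0, 35.0, 50.0, 28.0]
forecast = [22.0, 30.0, 58.0, 27.0]

forecast_rmse = rmse(actual, forecast)
forecast_mae = mae(actual, forecast)
```

RMSE is never smaller than MAE on the same errors, which matches every row of the table; a wide gap between the two suggests a few large misses rather than uniformly poor forecasts.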

Python PyTorch FastAPI Streamlit Docker scikit-learn

Nairobi Air Quality Dashboard — Anomaly Detection and PM2.5 Forecast

✅ Complete · Week 6 · Tests: 10/10 passing · Live Dashboard · Live API · Repository


💳 Project 5 — Credit Risk Scoring Pipeline (Week 5)

XGBoost · SHAP · ADASYN · DVC · RFM · FastAPI · PostgreSQL · Grafana · Docker · AWS EC2

Credit risk scoring for loan applicants that answers three questions simultaneously: Will they default? Will they accept the offer? How valuable are they? Built on 252,971 real LendingClub loans. ROC-AUC 0.703. SHAP explainability for every decision, providing a Basel III compliant audit trail. Propensity scoring, RFM segmentation, 6-panel Grafana monitoring dashboard, PostgreSQL predictions storage.
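
One way to read the ROC-AUC figure: it is the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter. The sketch below computes that quantity by direct pairwise comparison. The labels and scores are made up, and `roc_auc` is a hypothetical helper written for illustration; a library routine would normally be used on real data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Illustrative data: 1 = defaulted, 0 = repaid.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.7]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so 0.703 sits between the two.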

Python XGBoost SHAP FastAPI Docker PostgreSQL Grafana

SHAP Waterfall — Credit Risk Explainability

✅ Complete · Week 5 · Live API · Repository


🔍 Project 4 — RAG Document Search System (Week 4 — Prototype)

LaBSE · ChromaDB · FastAPI · pypdf · Docker

Semantic search across a 1,244-page technical document: 4,329 chunks indexed using LaBSE multilingual embeddings supporting 109 languages. Questions in any language retrieve relevant passages with exact page references in under one second. Full production system completing at Week 12.
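
The retrieval flow can be sketched in a few lines: split each page into overlapping chunks that remember their page number, embed the chunks and the query, then rank by cosine similarity. Everything below is an illustrative assumption. The `embed` function is a toy bag-of-words stand-in for a LaBSE embedding (so it is not multilingual), the chunk sizes are arbitrary, and a plain list replaces ChromaDB.

```python
from collections import Counter
from math import sqrt

def chunk_pages(pages, size=8, overlap=2):
    """Split each page into overlapping word windows, keeping the page number."""
    chunks = []
    step = size - overlap
    for page_no, text in pages.items():
        words = text.split()
        for start in range(0, len(words), step):
            chunks.append({"page": page_no,
                           "text": " ".join(words[start:start + size])})
            if start + size >= len(words):
                break  # the last window already reached the end of the page
    return chunks

def embed(text):
    """Toy bag-of-words vector; a stand-in for a sentence embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:top_k]

# Two made-up pages standing in for the indexed document.
pages = {
    1: "air pollution in cities is driven by traffic and waste burning",
    2: "forest loss reduces biodiversity and threatens wildlife habitats",
}
chunks = chunk_pages(pages)
best = search("what threatens wildlife habitats", chunks)[0]
```

Because every chunk carries its `page` field, the answer can cite the page it came from, which is how a page reference can be returned with each passage.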

Python FastAPI Docker ChromaDB HuggingFace

🔨 Prototype Complete · Week 4 · Full system Week 12 · Repository


📊 Project 3 — Customer Segmentation Pipeline (Week 3)

KMeans · PCA · StandardScaler · MLflow · Evidently · Streamlit · FastAPI · Docker

Telecom customer segmentation: 7,032 customers grouped into 4 behavioural segments. Optimal K selected through elbow method and silhouette scoring. Live FastAPI inference, Streamlit dashboard, Evidently drift monitoring, MLflow experiment tracking.
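
Silhouette scoring compares how close each point is to its own cluster against the nearest other cluster. The sketch below is a plain-Python version for illustration only, with made-up 2D points and a hypothetical `silhouette_score` helper; it needs at least two clusters, and the project would more plausibly use the scikit-learn implementation.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (higher is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(point, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated groups of illustrative points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible structure
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the two groups

good = silhouette_score(points, good_labels)
bad = silhouette_score(points, bad_labels)
```

Repeating this for several values of K and keeping the K with the highest score is the selection idea; the elbow method plays the same role using within-cluster variance.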

Python scikit-learn FastAPI Docker MLflow Streamlit

Customer Segmentation Streamlit Dashboard

✅ Complete · Week 3 · Live API · Repository


🚨 Project 2 — Real-Time Fraud Detection Pipeline (Week 2)

LightGBM · Kafka · Redis · FastAPI · Prometheus · Grafana · Docker

Real-time fraud scoring: 284,807 transactions, 22ms response time. Kafka streaming pipeline with Redis caching, Prometheus monitoring and 5-panel Grafana dashboard.
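
The caching pattern behind a low response time can be illustrated without any infrastructure. In the sketch below, an in-memory dictionary with expiry times stands in for Redis and a one-line rule stands in for the LightGBM model. The class `ScoreCache`, the functions `toy_model` and `score`, and the transaction fields are all hypothetical names chosen for this example.

```python
import time

class ScoreCache:
    """In-memory stand-in for a Redis cache: a dict with expiry times."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self.store[key]  # expired entries are dropped on read
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

def toy_model(transaction):
    """Hypothetical scoring rule used only to make the sketch runnable."""
    return 0.9 if transaction["amount"] > 1000 else 0.1

def score(transaction, cache):
    """Return (fraud_score, cache_hit, latency_ms) for one transaction."""
    start = time.perf_counter()
    cached = cache.get(transaction["id"])
    hit = cached is not None
    result = cached if hit else toy_model(transaction)
    if not hit:
        cache.set(transaction["id"], result)
    latency_ms = (time.perf_counter() - start) * 1000
    return result, hit, latency_ms

cache = ScoreCache()
tx = {"id": "tx-001", "amount": 2500.0}
first = score(tx, cache)    # computed by the model, then cached
second = score(tx, cache)   # served from the cache
```

Measuring latency inside the handler, as `score` does, is also what makes it possible to export the figure to a monitoring dashboard.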

Python LightGBM Kafka FastAPI Docker Grafana

Fraud Detection Grafana Dashboard

✅ Complete · Week 2 · Live API · Repository


💳 Project 1 — Customer Churn Prediction (Week 1)

XGBoost · SMOTE · FastAPI · Docker · PostgreSQL · Grafana

End-to-end telecom churn pipeline: feature engineering, class balancing, model training, containerised API deployment with live monitoring dashboard.
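
The core idea of SMOTE, the class-balancing step named in the stack, is to create synthetic minority samples by interpolating between a real minority sample and one of its nearest minority neighbours. The sketch below shows only that idea on made-up 2D points. The function `smote_like` and its parameters are hypothetical, and a maintained library implementation would normally be used instead.

```python
import random
from math import dist

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points between minority samples and neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Illustrative minority-class samples (for example, churned customers).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
new_points = smote_like(minority, n_new=4)
```

Each synthetic point lies on a line segment between two real minority samples, so the new data stays inside the region the minority class already occupies.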

Python XGBoost FastAPI Docker Grafana

Churn Prediction API Docs

✅ Complete · Week 1 · Live API · Repository


🛠️ Skills

💻 Programming and Data Science: Python · R · SQL · PyTorch · scikit-learn · Pandas · NumPy

🤖 NLP and AI: HuggingFace · mBERT · AfriBERTa · LangChain · PyTorch · LaBSE · ChromaDB

⚙️ Deployment and Infrastructure: FastAPI · Streamlit · Docker · PostgreSQL · Redis · Kafka

☁️ Cloud and MLOps: AWS · GitHub Actions · Grafana · MLflow · Prometheus · Evidently · DVC

📊 Visualisation and BI: Streamlit · Grafana · Plotly · Power BI · Tableau


🎯 Core Competencies

| Area | Skills |
|---|---|
| Machine Learning | Fraud detection · churn · credit risk · segmentation · time series forecasting · anomaly detection · SHAP explainability |
| Deep Learning | LSTM (PyTorch) · sequence modelling · time series · OOP neural network architecture |
| MLOps | End-to-end pipelines · Docker · MLflow · Evidently drift monitoring · pytest · DVC |
| Streaming | Real-time scoring · Apache Kafka · Redis caching · 22ms response time |
| NLP & RAG | Semantic search · LaBSE · ChromaDB · vector embeddings · multilingual · 109 languages |
| Environmental ML | Air quality forecasting · PM2.5 anomaly detection · OpenAQ · ARIMA · Isolation Forest |
| Cloud | AWS EC2 · RDS · ECR · Docker · HTTPS · Render deployment |
| Research & M&E | MSc Marketing Analytics · World Bank KYEOP · RBM · 50+ KPI frameworks · Board diversity research |

🌐 Socials

GitHub  LinkedIn  HuggingFace  X




🏆 Goals

  • 🎓 MSc Marketing Analytics — University of Nairobi — In Progress 2026
  • 🌍 Kiswahili NLP — Building African language AI for East African communities — Full system Week 15
  • ☁️ AWS/GCP Cloud Certification — Target 2026

15 weeks. 15 production projects. One complete MLOps engineer. Building in public — no shortcuts.

Pinned

  1. air-quality-anomaly-detection (Public)

    Production anomaly detection pipeline for air quality time series data. ARIMA · Prophet · LSTM · Isolation Forest · FastAPI · Streamlit · Docker

    Python

  2. rag-unep-documents (Public)

    RAG system on UNEP GEO-7 documents. Week 4 of 15 MLOps Programme

    Python

  3. recommendation-system (Public)

    Production recommendation engine: collaborative filtering, matrix factorization (SVD), and item-based filtering trained on MovieLens 100K. FastAPI · PostgreSQL · Streamlit · Docker · Render · Week…

    Python

  4. kiswahili-nlp (Public)

    Kiswahili environmental text classifier using mBERT, classifying East African language text by UNEP Strategic Objective. Built by Martin James Ng'ang'a | github.com/M20Jay

  5. credit-risk-scoring-pipeline (Public)

    Production credit risk scoring API: XGBoost · SHAP · DVC · RFM · FastAPI · Docker · PostgreSQL · Grafana

    Python

  6. fraud-detection-pipeline (Public)

    Week 2: Real-time fraud detection | XGBoost · LightGBM · Kafka · Redis · Prometheus · FastAPI · Docker · PostgreSQL · Grafana

    Python