A selection of hands-on projects demonstrating real-world data science, modeling, and cloud deployment. They are built with Python, scikit-learn, PySpark, XGBoost/CatBoost, and SHAP; shipped via Streamlit/Render and AWS (S3, SageMaker, Lambda, MWAA/Airflow); and visualized in Tableau.


SignalGraph (PySpark + Postgres/Teradata + Prophet)

SignalGraph is a telecom-focused anomaly detection and forecasting project that processes large-scale 4G/5G performance data (latency, jitter, PRB utilization, packet loss) through a Spark ETL pipeline and delivers real-time network insights. It demonstrates modern data workflows, from feature engineering and anomaly flagging to forecasting and graph analytics, and is built for scale, transparency, and decision-making in telecom environments.
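
As a rough illustration of the anomaly-flagging step, the sketch below scores each latency sample against a trailing window with a z-score. It is a standard-library stand-in, not the project's Spark code; the function name `flag_anomalies`, the window size, the threshold, and the sample values are all assumptions made for this example.

```python
from statistics import mean, stdev


def flag_anomalies(values, window=5, z_threshold=3.0):
    """Flag points that deviate strongly from the trailing window.

    Returns one boolean per value. The first `window` points are never
    flagged because there is not enough history to score them.
    """
    flags = []
    for i, value in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            flags.append(False)
            continue
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            flags.append(value != mu)
        else:
            flags.append(abs(value - mu) / sigma > z_threshold)
    return flags


# Hypothetical latency samples (milliseconds) for a single cell site.
latency_ms = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 48.5, 20.3, 20.1]
flags = flag_anomalies(latency_ms, window=5, z_threshold=3.0)
anomalous_indices = [i for i, flagged in enumerate(flags) if flagged]
```

Here only the 48.5 ms spike (index 6) is flagged. Raising or lowering `z_threshold` is one simple way to explore the kind of policy-threshold trade-off the project describes.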

Highlights

📌 Business Impact: Helps telecom teams detect anomalies early, forecast degradation risk, and evaluate trade-offs in policy thresholds—improving service reliability and decision-making at network scale.

Tech Stack

📁 View Full Project


NetworkIQ — Incident Risk Monitor (“One Project, Three Platforms”)

NetworkIQ is a telecom-grade incident risk system that predicts network congestion and visualizes cell-site risk across three deployment platforms (Render and GCP Cloud Run, with AWS on the roadmap). It showcases how AI-first system design can be made platform-agnostic, scalable, and portable, in line with orchestration and enterprise deployment strategies.

Highlights

📌 Business Impact: NetworkIQ accelerates incident detection (reducing MTTD), supports customer-experience proxies such as NPS, and lowers cost per GB, while enabling consistent, explainable AI across multiple clouds.

Tech Stack

👉 View Full Project


BNPL Credit Risk Insights Dashboard (Python + Streamlit)

A hands-on, end-to-end BNPL risk project that turns raw lending/repayment data into an interactive decision dashboard. It demonstrates modern risk workflows—from feature engineering and modeling to monitoring and “what-if” policy simulation—built for clarity, speed, and explainability.

Highlights

📌 Business Impact: Helps risk teams test policies before rollout, quantify approval vs. losses, and document governance-ready decisions.
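
The "what-if" policy simulation can be pictured with a small sketch like the one below: apply a risk-score cutoff to scored applicants and compare approval rate against the default rate among approved loans. The function `simulate_policy`, the field names, and the applicant records are hypothetical and chosen only for illustration; the dashboard's actual logic may differ.

```python
def simulate_policy(applicants, max_risk_score):
    """Approve applicants whose risk score is at or below the cutoff.

    Returns the approval rate and the default rate among approved loans.
    Assumes `applicants` is non-empty.
    """
    approved = [a for a in applicants if a["risk_score"] <= max_risk_score]
    approval_rate = len(approved) / len(applicants)
    if approved:
        default_rate = sum(a["defaulted"] for a in approved) / len(approved)
    else:
        default_rate = 0.0
    return {"approval_rate": approval_rate, "default_rate": default_rate}


# Hypothetical scored applicants; `defaulted` is the observed outcome (1 = default).
applicants = [
    {"risk_score": 0.05, "defaulted": 0},
    {"risk_score": 0.10, "defaulted": 0},
    {"risk_score": 0.20, "defaulted": 0},
    {"risk_score": 0.35, "defaulted": 1},
    {"risk_score": 0.50, "defaulted": 0},
    {"risk_score": 0.65, "defaulted": 1},
    {"risk_score": 0.80, "defaulted": 1},
    {"risk_score": 0.90, "defaulted": 1},
]

strict = simulate_policy(applicants, max_risk_score=0.25)
loose = simulate_policy(applicants, max_risk_score=0.70)
```

With this toy data the strict cutoff approves 3 of 8 applicants with no defaults, while the looser cutoff approves 6 of 8 and lets in 2 defaults, which is the approval-versus-loss trade-off in miniature.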

🔗 View Full Project


Credit Risk Model Deployment & Monitoring (AWS + PySpark + CatBoost)

This flagship project showcases an end-to-end credit risk modeling pipeline — from scalable data processing to cloud deployment — aligned with best practices in financial services. Built using PySpark, CatBoost, SHAP, and AWS (S3, CLI), it simulates how modern risk pipelines are deployed and monitored at scale.

The full solution includes:

💼 Business Impact: This project simulates a realistic production-grade credit risk pipeline — bridging data engineering, ML modeling, and cloud deployment. It highlights how interpretability and geographic segmentation can inform policy, governance, and model recalibration.

📁 View Full Project


Telecom Churn Modeling & Retention Strategy

This project demonstrates how predictive modeling and customer segmentation can be used to drive retention strategy in a telecom context. Using a publicly available customer dataset, I developed a full churn risk pipeline.

The final solution integrates:

💡 Business Impact: The project enables strategic prioritization by identifying high-value customers at high risk of churn, supporting proactive retention efforts, revenue protection, and long-term profitability.
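
One simple way to combine churn risk with customer value is to rank customers by expected revenue at risk (churn probability times annual revenue). The sketch below assumes that scoring rule; the function `rank_by_revenue_at_risk`, the field names, and the customer records are hypothetical and are not taken from the project.

```python
def rank_by_revenue_at_risk(customers, top_n=3):
    """Rank customers by churn probability times annual revenue."""
    scored = [
        {**c, "revenue_at_risk": c["churn_prob"] * c["annual_revenue"]}
        for c in customers
    ]
    scored.sort(key=lambda c: c["revenue_at_risk"], reverse=True)
    return scored[:top_n]


# Hypothetical model outputs joined with customer value.
customers = [
    {"id": "A", "churn_prob": 0.80, "annual_revenue": 300.0},
    {"id": "B", "churn_prob": 0.30, "annual_revenue": 1200.0},
    {"id": "C", "churn_prob": 0.65, "annual_revenue": 900.0},
    {"id": "D", "churn_prob": 0.10, "annual_revenue": 1500.0},
]

priority = rank_by_revenue_at_risk(customers, top_n=2)
priority_ids = [c["id"] for c in priority]
```

Customer A has the highest churn probability but little revenue, so the ranking puts C and B first. That is the sense in which value and risk are weighed together rather than separately.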

👉 View Full Project


Telecom Customer Segmentation with Python

Objective:
Develop a customer segmentation model using unsupervised learning on simulated postpaid telecom data to identify actionable behavioral clusters for marketing, retention, and product strategy.

Highlights:

Key Findings:

| Segment | Description | Strategy |
| --- | --- | --- |
| 💬 Voice-Dominant Users | High voice & intl use, short tenure | Add voice bundles, retention plans |
| 📱 High-Usage Streamers | Heavy data/streaming, higher churn | Promote unlimited/entertainment perks |
| 💸 Low-Value Starters | Low usage, low tenure | Grow via onboarding & upselling |
| 🧭 Loyal Minimalists | Long tenure, low usage, least churn | Reward loyalty, protect margin |

Tech Stack: Python, pandas, scikit-learn, matplotlib, seaborn
Core Skills Demonstrated: Customer analytics, unsupervised learning, PCA, strategic interpretation, stakeholder communication

👉 View Full Project


Customer Churn Predictor

Goal: Predict whether a telecom customer is likely to churn using an end-to-end machine learning pipeline.

Description:
This interactive app allows users to input customer features (e.g., tenure, contract type, monthly charges) and receive a real-time churn prediction. It includes data preprocessing, feature engineering, model training, cloud deployment, and live user interaction.

Screenshot: Churn Prediction App

⚙️ Tech Stack

| Purpose | Tool |
| --- | --- |
| Language | Python 3 |
| ML Library | scikit-learn |
| Visualization | seaborn, matplotlib |
| Data Handling | pandas, NumPy |
| Deployment | GitHub Pages |

📶 Telecom Engagement Monitoring with Fractional Logistic Regression

This project builds a full monitoring pipeline to track postpaid customer engagement over time using simulated telecom data. The model uses fractional logistic regression to predict monthly engagement as a proportion and evaluates its stability across development and monitoring datasets.
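
To make the idea of fractional logistic regression concrete, here is a minimal sketch that fits E[y | x] = sigmoid(b0 + b1 * x) to proportions in [0, 1] by maximizing the Bernoulli quasi-log-likelihood. The project itself uses a statsmodels binomial GLM with a logit link; this version swaps in plain gradient ascent with a single feature so it runs with the standard library only. The variable names and data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_fractional_logit(x, y, learning_rate=0.5, n_iter=5000):
    """Fit E[y | x] = sigmoid(b0 + b1 * x) for targets y in [0, 1].

    Uses gradient ascent on the Bernoulli quasi-log-likelihood, whose
    gradient is the average of (y - p) for b0 and (y - p) * x for b1.
    """
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        residuals = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += learning_rate * sum(residuals) / n
        b1 += learning_rate * sum(r * xi for r, xi in zip(residuals, x)) / n
    return b0, b1


# Hypothetical data: standardized tenure vs. share of days active in a month.
tenure_z = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
engagement = [0.20, 0.28, 0.41, 0.50, 0.62, 0.70, 0.81]

b0, b1 = fit_fractional_logit(tenure_z, engagement)
predicted = [sigmoid(b0 + b1 * xi) for xi in tenure_z]
```

Because the target is a proportion rather than a 0/1 label, predictions stay inside (0, 1) by construction, which is the main reason to prefer this model over ordinary linear regression for engagement rates.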

👉 View Full Project Notebook


🧰 Tech Stack

| Component | Library / Tool |
| --- | --- |
| Modeling | statsmodels (GLM - Binomial with Logit link) |
| Data Handling | pandas, numpy |
| Evaluation Metrics | sklearn.metrics |
| Stability Analysis | Custom PSI logic |
| Visualization | matplotlib |
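
The custom PSI logic is not shown on this page, so the following is only a sketch of how a Population Stability Index is commonly computed: bin the scores from the development and monitoring samples, then sum (actual - expected) * ln(actual / expected) over the bins. Equal-width bins on [0, 1], the `eps` floor for empty bins, and the sample scores are assumptions made for this example.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between two samples of scores in [0, 1], using equal-width bins."""

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = min(int(v * n_bins), n_bins - 1)  # put v == 1.0 in the last bin
            counts[idx] += 1
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    expected_shares = bin_shares(expected)
    actual_shares = bin_shares(actual)
    return sum(
        (a - e) * math.log(a / e)
        for a, e in zip(actual_shares, expected_shares)
    )


# Hypothetical predicted engagement scores for the two datasets.
dev_scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
mon_scores = [0.35, 0.45, 0.55, 0.55, 0.65, 0.65, 0.75, 0.85, 0.95, 0.95]

psi_same = population_stability_index(dev_scores, dev_scores, n_bins=5)
psi_shifted = population_stability_index(dev_scores, mon_scores, n_bins=5)
```

Identical samples give a PSI of exactly zero, and the value grows as the monitoring distribution drifts away from the development one. Note that the `eps` floor inflates the contribution of empty bins, so coarser bins may be preferable for small samples.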

📌 Highlights & Findings


This project demonstrates how to proactively monitor engagement models using interpretable statistics and custom stability metrics, with outputs ready for integration into model governance workflows.

Fraud Detection with XGBoost & SHAP

An end-to-end machine learning pipeline, built on simulated data, that predicts fraudulent transactions using XGBoost and interprets the model with SHAP values.

Objective

Detect fraudulent transactions using synthetic data with engineered features such as transaction type, amount, time, and customer behavior patterns.

Key Steps

⚙️ Tech Stack

| Purpose | Tool |
| --- | --- |
| Language | Python |
| ML Library | XGBoost, scikit-learn |
| Explainability | SHAP |
| Data Simulation | NumPy, pandas |
| Visualization | matplotlib, seaborn |
| Deployment | Local / GitHub |

📈 Sample Output

📎 View on GitHub


Airline Flight Delay Prediction with Python

A full machine learning pipeline that predicts flight delays using simulated airline data enriched with real U.S. airport codes and weather features. The project covers exploratory analysis, model training, and practical recommendations for airport operations.

Objective

Predict whether a flight will be delayed based on features like carrier, origin, departure time, distance, and simulated weather patterns.

Key Steps

⚙️ Tech Stack

| Purpose | Tool |
| --- | --- |
| Language | Python 3 |
| ML Library | scikit-learn |
| Visualization | matplotlib, seaborn |
| Simulation | NumPy, pandas |
| Mapping (EDA) | Plotly, geopandas |
| Deployment | GitHub Pages (Markdown) |

📂 Read the Full Report

📎 View Full Project

🛠️ In Progress

🗺️ Geospatial Risk Dashboard (Tableau)

Building an interactive Tableau dashboard to visualize public health and economic risk indicators across Texas counties.

Will be added soon…


What’s Next


For more details, view my full portfolio homepage or connect via LinkedIn.