A selection of hands-on projects demonstrating real-world data science, AI system design, and cloud deployment. Built with Python, SQL, PySpark, scikit-learn, XGBoost/CatBoost, SHAP, Prophet, Neo4j, and LangChain, and shipped via Streamlit, Render, Hugging Face Spaces, and multi-cloud platforms (AWS, GCP), including AWS SageMaker, Lambda, S3, MWAA/Airflow, and GCP Vertex AI/BigQuery ML. Workflows include RAG pipelines, schema validation, and drift monitoring, with visuals in Tableau and Plotly.


EvalOps: Production-Grade LLM Evaluation and Observability Platform
A systematic evaluation framework for LLM applications addressing non-deterministic outputs. Provides semantic similarity matching using BERT embeddings, statistical drift detection, A/B comparison with effect size calculation, and a full observability stack. Deployed on AWS via Docker with 285 tests passing and a live demo at http://44.213.248.8:8501.
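The effect-size step of the A/B comparison can be sketched in a few lines. This is an illustrative stand-in, not EvalOps' actual implementation; the scores and helper name are hypothetical.

```python
from statistics import mean, stdev

def cohens_d(scores_a, scores_b):
    """Effect size for an A/B comparison of per-example eval scores,
    using the pooled standard deviation of the two groups."""
    na, nb = len(scores_a), len(scores_b)
    pooled_var = ((na - 1) * stdev(scores_a) ** 2
                  + (nb - 1) * stdev(scores_b) ** 2) / (na + nb - 2)
    return (mean(scores_a) - mean(scores_b)) / pooled_var ** 0.5

# Hypothetical per-example scores for two prompt variants
a = [0.82, 0.79, 0.88, 0.91, 0.77]
b = [0.71, 0.74, 0.69, 0.80, 0.72]
d = cohens_d(a, b)  # a large effect by the usual d > 0.8 rule of thumb
```

Reporting effect size alongside a significance test keeps small-but-real differences from being confused with large, decision-relevant ones.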

Highlights:

Business / Research Impact:
Traditional testing fails for LLMs because “correct” can be phrased a thousand ways. EvalOps provides the semantic understanding, statistical rigor, and continuous monitoring that production LLM applications require.

Tech Stack:

🔗 Live Demo: http://44.213.248.8:8501
📁 View Full Project


AutoDoc AI: Multi-Agent RAG System for Regulatory Documentation
A production-grade multi-agent system that automates regulatory documentation generation for credit risk models. Four specialized AI agents (Research, Writer, Compliance, Editor) collaborate through custom Python orchestration to generate SR 11-7 compliant documentation with 100% source fidelity and zero hallucinations. Presented to 200+ colleagues.
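The four-agent collaboration can be pictured as a sequential pipeline over shared state. The sketch below is illustrative only; agent names mirror the description, but the logic is a toy stand-in for the real orchestration.

```python
# Each "agent" is a function that receives and enriches a shared document state.

def research_agent(state):
    # Retrieve source passages; stubbed here with a fixed citation.
    state["sources"] = [{"id": "SR-11-7-sec-3", "text": "Models must be validated."}]
    return state

def writer_agent(state):
    # Draft strictly from retrieved text, never from parametric memory.
    state["draft"] = " ".join(s["text"] for s in state["sources"])
    return state

def compliance_agent(state):
    # Source-fidelity gate: every source must be traceable in the draft.
    state["grounded"] = all(s["text"] in state["draft"] for s in state["sources"])
    return state

def editor_agent(state):
    # Only grounded drafts survive to the final document.
    state["final"] = state["draft"].strip() if state["grounded"] else None
    return state

def run_pipeline(state):
    for agent in (research_agent, writer_agent, compliance_agent, editor_agent):
        state = agent(state)
    return state

result = run_pipeline({})
```

Separating validation into its own agent is what makes the audit trail possible: the compliance step produces an explicit pass/fail artifact instead of trusting the writer.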

Highlights:

Business / Research Impact:
AutoDoc AI proves that regulated industries can adopt AI safely by designing for auditability first. The multi-agent architecture separates concerns (retrieval, generation, validation, editing) in ways that mirror human workflows while maintaining the traceability regulators require.

Tech Stack:

📁 View Full Project


ChurnGuard: Production MLOps for Customer Retention
An end-to-end MLOps pipeline demonstrating production-grade machine learning deployment. From XGBoost model training through Docker containerization to AWS EC2 deployment, ChurnGuard showcases the full lifecycle of taking a model from notebook to production with proper experiment tracking, model registry, and API serving.
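The experiment-tracking discipline can be sketched with a toy tracker; ChurnGuard uses real tooling for this, so the class below is a minimal illustration of the idea, not its implementation.

```python
import hashlib

class ExperimentTracker:
    """Toy stand-in for an experiment tracker / model registry: every run
    records its params, metrics, and an artifact hash, so the question
    'which model was that?' always has an answer."""
    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics, artifact_bytes):
        run = {
            "run_id": len(self.runs) + 1,
            "params": params,
            "metrics": metrics,
            "artifact_sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        }
        self.runs.append(run)
        return run["run_id"]

    def best_run(self, metric):
        # Registry lookup: which logged model is currently best?
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = ExperimentTracker()
tracker.log_run({"max_depth": 4}, {"auc": 0.81}, b"model-v1")
tracker.log_run({"max_depth": 6}, {"auc": 0.84}, b"model-v2")
```

Hashing the serialized artifact ties each metric to the exact binary that produced it, which is what makes rollback trustworthy.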

Highlights:

Business / Research Impact:
ChurnGuard demonstrates that MLOps isn’t about tools but about discipline: experiment tracking prevents “which model was that?”, model registry enables rollback, containerization eliminates “works on my machine”, and proper deployment means the model actually reaches users.

Tech Stack:

📁 View Full Project


MCP Banking Workflows: Model Context Protocol Tools for Regulated Industries
A comprehensive Model Context Protocol (MCP) server implementing nine specialized tools for banking model risk management workflows. Built to address authentic pain points in SR 11-7 compliance, cross-file consistency validation, and model dependency mapping. Demonstrates how MCP can bring AI capabilities to domain-specific enterprise workflows.
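The tool-serving pattern can be sketched as a dispatch table of named, described functions. Note this is not the real MCP SDK; it is a minimal illustration of how a server exposes callable tools to an AI client, with a hypothetical consistency-check tool.

```python
TOOLS = {}

def tool(name, description):
    """Register a function as a named, described tool."""
    def register(fn):
        TOOLS[name] = {"fn": fn, "description": description}
        return fn
    return register

@tool("check_doc_consistency", "Flag fields that disagree across model documents")
def check_doc_consistency(docs):
    # Cross-file consistency: every document must report the same model_id.
    ids = {d["model_id"] for d in docs}
    return {"consistent": len(ids) == 1, "model_ids": sorted(ids)}

def call_tool(name, **kwargs):
    # The client invokes tools by name; the server dispatches.
    return TOOLS[name]["fn"](**kwargs)

result = call_tool("check_doc_consistency",
                   docs=[{"model_id": "CR-001"}, {"model_id": "CR-002"}])
```

The value of the pattern is that the mechanical check is deterministic and auditable; the AI client only decides when to call it and how to act on the result.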

Highlights:

Business / Research Impact:
MCP Banking Workflows shows how AI can augment (not replace) expert judgment in regulated industries. Model validators still make decisions but they spend less time on mechanical checks and more time on substantive analysis.

Tech Stack:

📁 View Full Project


CreditNLP: Fine-Tuned LLM for Startup Default Risk Prediction
A fine-tuned language model that identifies default risk signals in startup loan applications where traditional quantitative data is sparse. Using QLoRA on Mistral-7B, the model learns to detect implicit risk patterns in application narratives that experienced underwriters recognize intuitively but cannot codify into rules. Achieves 93.9% accuracy on parseable outputs compared to 60% for few-shot prompting.
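Since the 93.9% figure is computed only over outputs the parser can read, it helps to make that denominator explicit. The helper and parser below are hypothetical illustrations of the metric, not the project's code.

```python
def accuracy_on_parseable(outputs, labels, parse):
    """Accuracy over the subset of model outputs that parse cleanly,
    plus the parse rate so the denominator is never hidden."""
    parsed = [(parse(o), y) for o, y in zip(outputs, labels)]
    usable = [(p, y) for p, y in parsed if p is not None]
    if not usable:
        return 0.0, 0.0
    acc = sum(p == y for p, y in usable) / len(usable)
    parse_rate = len(usable) / len(outputs)
    return acc, parse_rate

def parse_risk_label(text):
    # Toy parser for free-text risk assessments.
    text = text.lower()
    if "high risk" in text:
        return "high"
    if "low risk" in text:
        return "low"
    return None  # unparseable output

outputs = ["Assessment: high risk", "low risk overall", "uncertain"]
labels = ["high", "low", "low"]
acc, parse_rate = accuracy_on_parseable(outputs, labels, parse_risk_label)
```

Reporting accuracy and parse rate together is what makes the fine-tuned vs few-shot comparison fair: a model can look accurate simply by refusing to emit parseable answers on hard cases.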

Highlights:

Business / Research Impact:
CreditNLP proves that domain expertise can be encoded into model weights through labeled examples. The patterns experienced underwriters “feel” after thousands of applications can be learned by a 7B model in 41 minutes with the right training data.

Tech Stack:

📁 View Full Project


Fraud RT Sandbox: Real-Time Fraud Simulation & Detection
A sandbox environment to simulate and detect fraud in real time, combining streaming pipelines, hybrid detection logic, and dynamic responses—built to explore latency, model drift, decision rules, and system robustness under adversarial patterns.
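The hybrid detection logic can be sketched as a hard rule layered over a streaming statistic. Thresholds and amounts below are illustrative, not the sandbox's tuned values.

```python
from collections import deque

def make_detector(window=50, z_thresh=3.0, amount_cap=10_000):
    """Hybrid detector: a hard rule (absolute amount cap) plus a streaming
    z-score against a trailing window of recent transaction amounts."""
    history = deque(maxlen=window)

    def score(txn):
        amount = txn["amount"]
        # Rule path: the cap fires regardless of history.
        if amount > amount_cap:
            history.append(amount)
            return "block"
        # Statistical path: flag outliers vs the recent window.
        if len(history) >= 10:
            mean = sum(history) / len(history)
            var = sum((x - mean) ** 2 for x in history) / len(history)
            std = var ** 0.5 or 1.0
            if (amount - mean) / std > z_thresh:
                history.append(amount)
                return "review"
        history.append(amount)
        return "allow"

    return score

detect = make_detector()
amounts = [20, 25, 22, 30, 18, 24, 27, 21, 19, 23, 26, 5_000, 50_000]
decisions = [detect({"amount": a}) for a in amounts]
```

Because the window keeps moving, the same dollar amount can be routine one hour and anomalous the next, which is exactly the drift behavior the sandbox is built to expose.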

Highlights:

Business / Research Impact:
This sandbox helps practitioners see how fraud detection systems behave under realistic pressure: how fast models decay, how rules must adapt, and how to balance responsiveness vs false alarms in live applications.

Tech Stack:

📁 View Full Project


Zero-Hallucination RAG Agent: Custom vs Pre-Built Tools
Built to answer queries about this portfolio, the project compares off-the-shelf RAG tools (such as Flowise) against a custom LangChain architecture designed to prevent hallucination at the system level: separate metadata and semantic retrieval paths, strict grounding, and output validation.
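Two of those system-level guards can be sketched directly: route metadata questions away from the semantic path, and refuse any semantic answer that is not backed by retrieved text. All names and data below are illustrative, not the project's architecture.

```python
METADATA_FIELDS = {"title": "Zero-Hallucination RAG Agent", "stack": "LangChain"}

def route(query):
    # Metadata questions get exact lookups, never generation.
    return "metadata" if any(f in query.lower() for f in METADATA_FIELDS) else "semantic"

def answer(query, chunks):
    if route(query) == "metadata":
        field = next(f for f in METADATA_FIELDS if f in query.lower())
        return METADATA_FIELDS[field]
    # Semantic path with strict grounding: only return retrieved text,
    # and refuse when nothing overlaps the query.
    q_terms = set(query.lower().split())
    best = max(chunks, key=lambda c: len(q_terms & set(c.lower().split())),
               default=None)
    if best and q_terms & set(best.lower().split()):
        return best
    return "No grounded answer available."

chunks = ["the custom pipeline separates retrieval from validation"]
```

The refusal branch is the point: a system that says "no grounded answer" when retrieval comes back empty cannot fabricate one.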

Highlights:

Business / Research Impact:
In contexts where credibility is essential (e.g. portfolio Q&A, knowledge systems), a system design that prevents hallucination is far more valuable than one that sounds fluent but fabricates answers.

Tech Stack:

👉 View Full Project


Prompt Engineering Lab: From Zero-Shot to Production-Ready Systems
A structured deep dive into prompt engineering, evolving from naive zero-shot prompts to robust, production-grade systems by layering schema enforcement, validation, and retrieval augmentation, with iterative debugging via confusion matrices.
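The schema-enforcement layer can be sketched with the standard library alone. The required keys below are hypothetical; the pattern (parse, then validate keys and types before anything downstream runs) is the one described above.

```python
import json

REQUIRED = {"label": str, "confidence": float}

def validate_llm_output(raw):
    """Schema enforcement: parse the model's text as JSON and verify
    required keys and types before downstream code consumes it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, "not valid JSON"
    for key, typ in REQUIRED.items():
        if key not in data:
            return None, f"missing key: {key}"
        if not isinstance(data[key], typ):
            return None, f"wrong type for {key}"
    return data, None

good, err = validate_llm_output('{"label": "churn", "confidence": 0.87}')
bad, err2 = validate_llm_output("The customer will probably churn.")
```

Returning a structured error instead of raising lets the caller decide whether to retry the prompt, fall back, or log the failure into a confusion matrix for debugging.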

Highlights:

Business / Research Impact:
This lab shows how prompt engineering can mature from prototype to dependable system, critical for any AI product that needs consistency, trust, and governance.

Tech Stack:

📁 View Full Project


AI-in-the-Cloud Knowledge Shootout (Perplexity vs NotebookLM)
An experiment comparing two AI “knowledge copilots” — Perplexity and NotebookLM — across identical cloud architecture, cost, and governance prompts. The goal: see how each tool answers with varying levels of grounding, scope, and practical signal.

Highlights:

Business / Research Impact:
This shootout surfaces how knowledge tools differ in utility depending on use case — whether you want precise, source-aligned advice or agile, up-to-date pointers. For teams architecting cloud + AI systems, knowing which tool to lean on (or how to combine them) is as critical as choosing the cloud services themselves.

Tech Stack:

👉 View Full Project


RiskBench AI Coding Shootout (Code Agent / AI Coding Tools Comparison)
A controlled “shootout” comparing three AI coding assistants — GitHub Copilot, Claude Code, and Cursor — as they each build the same end-to-end ML pipeline. Across sequential sprints, they generate synthetic data, build and tune XGBoost models, and deploy a serving API with SHAP interpretability. Prompts, acceptance tests, and repo structure were held constant, so differences reflect tool behavior, not environment.

Highlights:

Business / Research Impact:
This shootout surfaces how coding agents behave in context, not just in toy demos. It highlights that tool choice can influence data quality, modeling decisions, and the eventual readiness of a system. For teams exploring LLM-based development, this acts as both a benchmark and a blueprint for tool-risk awareness.

Tech Stack:

📁 View Full Project


Cross-Cloud AutoML Shootout
A benchmarking exploration comparing AWS AutoML, GCP Vertex AI, and BigQuery ML on the same dataset, revealing how each cloud’s constraints, quotas, and design philosophies shape real-world ML development.

Highlights:

Business Impact:
Cloud choices are part of the model. The shootout demonstrates that designing AI systems isn’t only about algorithms and datasets — it’s about navigating constraints, quotas, and trade-offs that directly affect deployment and business value.

Tech Stack:

👉 View Full Project


SignalGraph (PySpark + Postgres/Teradata + Prophet)

SignalGraph is a telecom-focused anomaly detection and forecasting project that processes large-scale 4G/5G performance data (latency, jitter, PRB utilization, packet loss) through a Spark ETL pipeline and delivers real-time network insights. It demonstrates modern data workflows, from feature engineering and anomaly flagging to forecasting and graph analytics, and is built for scale, transparency, and decision-making in telecom environments.
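The anomaly-flagging step can be illustrated without Spark: flag a sample when it exceeds the mean plus k standard deviations of a trailing window. This pure-Python sketch uses illustrative thresholds and toy latency values, not SignalGraph's Spark job.

```python
def flag_anomalies(series, window=6, k=3.0):
    """Flag points that exceed mean + k*std of the trailing window.
    Points without a full window of history are never flagged."""
    flags = []
    for i, value in enumerate(series):
        past = series[max(0, i - window):i]
        if len(past) < window:
            flags.append(False)
            continue
        mean = sum(past) / len(past)
        std = (sum((x - mean) ** 2 for x in past) / len(past)) ** 0.5
        flags.append(value > mean + k * std)
    return flags

latency_ms = [20, 21, 19, 22, 20, 21, 20, 95, 21]
flags = flag_anomalies(latency_ms)  # only the 95 ms spike is flagged
```

In the Spark version the same computation becomes a windowed aggregation over cell-site partitions, but the flagging rule itself is this simple.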

Highlights

📌 Business Impact: Helps telecom teams detect anomalies early, forecast degradation risk, and evaluate trade-offs in policy thresholds—improving service reliability and decision-making at network scale.

Tech Stack

📁 View Full Project


NetworkIQ — Incident Risk Monitor (“One Project, Three Platforms”)

NetworkIQ is a telecom-grade incident risk system that predicts network congestion and visualizes cell-site risk across three deployment platforms (Render, GCP Cloud Run, AWS on the roadmap). It showcases how AI-first system design can be made platform-agnostic, scalable, and portable, aligning with orchestration and enterprise deployment strategies.

Highlights

📌 Business Impact: NetworkIQ accelerates incident detection (reducing MTTD), supports better customer experience proxies (NPS), and lowers cost per GB—while enabling consistent, explainable AI across multiple clouds.

Tech Stack

👉 View Full Project


BNPL Credit Risk Insights Dashboard (Python + Streamlit)

A hands-on, end-to-end BNPL risk project that turns raw lending/repayment data into an interactive decision dashboard. It demonstrates modern risk workflows—from feature engineering and modeling to monitoring and “what-if” policy simulation—built for clarity, speed, and explainability.
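The “what-if” policy simulation can be sketched as a sweep over an approval cutoff, trading approval rate against expected loss. The portfolio and scores below are toy data, not the dashboard's dataset.

```python
# Each applicant: (risk_score, defaulted, outstanding_balance)
applicants = [
    (0.10, False, 200), (0.20, False, 350), (0.35, False, 150),
    (0.45, True, 400), (0.60, False, 250), (0.80, True, 500),
]

def simulate_policy(cutoff, book):
    """Approve everyone at or below the score cutoff; report the
    approval rate and the loss realized on approved defaulters."""
    approved = [a for a in book if a[0] <= cutoff]
    approval_rate = len(approved) / len(book)
    expected_loss = sum(bal for _, defaulted, bal in approved if defaulted)
    return approval_rate, expected_loss

rate_50, loss_50 = simulate_policy(0.50, applicants)
rate_90, loss_90 = simulate_policy(0.90, applicants)
```

Plotting this trade-off curve in the dashboard lets a risk team see exactly what each extra point of approval rate costs before a policy ships.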

Highlights

📌 Business Impact: Helps risk teams test policies before rollout, quantify approval vs. losses, and document governance-ready decisions.

🔗 View Full Project


Credit Risk Model Deployment & Monitoring (AWS + PySpark + CatBoost)

This flagship project showcases an end-to-end credit risk modeling pipeline — from scalable data processing to cloud deployment — aligned with best practices in financial services. Built using PySpark, CatBoost, SHAP, and AWS (S3, CLI), it simulates how modern risk pipelines are deployed and monitored at scale.

The full solution includes:

💼 Business Impact: This project simulates a realistic production-grade credit risk pipeline — bridging data engineering, ML modeling, and cloud deployment. It highlights how interpretability and geographic segmentation can inform policy, governance, and model recalibration.

📁 View Full Project


Telecom Churn Modeling & Retention Strategy

This project demonstrates how predictive modeling and customer segmentation can be used to drive retention strategy in a telecom context. Using a publicly available customer dataset, I developed a full churn risk pipeline.

The final solution integrates:

💡 Business Impact: The project enables strategic prioritization by identifying high-risk, high-value customers at risk of churn, supporting proactive retention efforts, revenue protection, and long-term profitability.

👉 View Full Project


Telecom Customer Segmentation with Python

Objective:
Developed a customer segmentation model using unsupervised learning on simulated postpaid telecom data to identify actionable behavioral clusters for marketing, retention, and product strategy.

Highlights:

📌 Key Findings

| Segment | Description | Strategy |
|---|---|---|
| 💬 Voice-Dominant Users | High voice & intl use, short tenure | Add voice bundles, retention plans |
| 📱 High-Usage Streamers | Heavy data/streaming, higher churn | Promote unlimited/entertainment perks |
| 💸 Low-Value Starters | Low usage, low tenure | Grow via onboarding & upselling |
| 🧭 Loyal Minimalists | Long tenure, low usage, least churn | Reward loyalty, protect margin |

Tech Stack: Python, pandas, scikit-learn, matplotlib, seaborn
Core Skills Demonstrated: Customer analytics, unsupervised learning, PCA, strategic interpretation, stakeholder communication

👉 View Full Project


Customer Churn Predictor

Goal: Predict whether a telecom customer is likely to churn using an end-to-end machine learning pipeline.

Description:
This interactive app allows users to input customer features (e.g., tenure, contract type, monthly charges) and receive a real-time churn prediction. It includes data preprocessing, feature engineering, model training, cloud deployment, and live user interaction.

Screenshot:
Churn Prediction App Screenshot

⚙️ Tech Stack

| Purpose | Tool |
|---|---|
| Language | Python 3 |
| ML Library | scikit-learn |
| Visualization | seaborn, matplotlib |
| Data Handling | pandas, NumPy |
| Deployment | GitHub Pages |

📶 Telecom Engagement Monitoring with Fractional Logistic Regression

This project builds a full monitoring pipeline to track postpaid customer engagement over time using simulated telecom data. The model uses fractional logistic regression to predict monthly engagement as a proportion and evaluates its stability across development and monitoring datasets.

👉 View Full Project Notebook


🧰 Tech Stack

| Component | Library / Tool |
|---|---|
| Modeling | statsmodels (GLM: Binomial family with logit link) |
| Data Handling | pandas, numpy |
| Evaluation Metrics | sklearn.metrics |
| Stability Analysis | Custom PSI logic |
| Visualization | matplotlib |

📌 Highlights & Findings


This project demonstrates how to proactively monitor engagement models using interpretable statistics and custom stability metrics, with outputs ready for integration into model governance workflows.
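The custom PSI logic referenced above follows the standard Population Stability Index formula over pre-binned score distributions; the bin shares below are illustrative.

```python
import math

def psi(expected_pcts, actual_pcts, eps=1e-6):
    """Population Stability Index over matching bins:
    PSI = sum over bins of (actual - expected) * ln(actual / expected).
    Common rule of thumb: < 0.1 stable, 0.1-0.25 watch, > 0.25 shifted."""
    total = 0.0
    for e, a in zip(expected_pcts, actual_pcts):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total

dev = [0.25, 0.25, 0.25, 0.25]      # development-period bin shares
stable = [0.24, 0.26, 0.25, 0.25]   # monitoring period, little movement
shifted = [0.10, 0.15, 0.25, 0.50]  # monitoring period, large movement
```

Because PSI is computed per monitoring window, it slots naturally into a governance report: one number per period, with the same thresholds reviewers already know.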

Fraud Detection with XGBoost & SHAP

A simulated end-to-end machine learning pipeline that predicts fraudulent transactions using XGBoost and interprets the model with SHAP values.

Objective

Detect fraudulent transactions using synthetic data with engineered features such as transaction type, amount, time, and customer behavior patterns.

Key Steps

⚙️ Tech Stack

| Purpose | Tool |
|---|---|
| Language | Python |
| ML Library | XGBoost, scikit-learn |
| Explainability | SHAP |
| Data Simulation | NumPy, pandas |
| Visualization | matplotlib, seaborn |
| Deployment | Local / GitHub |

📈 Sample Output

📎 View on GitHub


Airline Flight Delay Prediction with Python

A full machine learning pipeline that predicts flight delays using simulated airline data enriched with real U.S. airport codes and weather features. The project explores exploratory analysis, model training, and practical recommendations for airport operations.

Objective

Predict whether a flight will be delayed based on features like carrier, origin, departure time, distance, and simulated weather patterns.

Key Steps

⚙️ Tech Stack

| Purpose | Tool |
|---|---|
| Language | Python 3 |
| ML Library | scikit-learn |
| Visualization | matplotlib, seaborn |
| Simulation | NumPy, pandas |
| Mapping (EDA) | Plotly, geopandas |
| Deployment | GitHub Pages (Markdown) |

📂 Read the Full Report

📎 View Full Project

🛠️ In Progress

🗺️ Geospatial Risk Dashboard (Tableau)

Building an interactive Tableau dashboard to visualize public health and economic risk indicators across Texas counties.

Will be added soon…


What’s Next


For more details, view my full portfolio homepage or connect via LinkedIn.