AI-in-the-Cloud Knowledge Shootout (Perplexity and NotebookLM)

This project continues the Cross-Cloud Shootout series by comparing Perplexity and NotebookLM. It set out to test whether AI copilots could act as cloud knowledge orchestrators, producing reliable guidance on architecture, cost, and governance. Instead of benchmarking AWS and GCP directly, the experiment compared how each tool answered the same six cloud prompts. NotebookLM was tied to a curated corpus of AWS/GCP docs and my Cross-Cloud behaviour, while Perplexity searched the open web in real time. The shootout revealed two complementary roles: NotebookLM excels at structured, policy-level synthesis, while Perplexity delivers concise, actionable answers.

RiskBench AI Coding Shootout (Claude Code, Cursor, GitHub Copilot)

This project set out to pit three leading AI coding assistants (GitHub Copilot, Claude Code, and Cursor) against each other in a controlled “shootout,” with each tool tasked to build out the same end-to-end machine learning pipeline. Across four sprints, the tools generated synthetic datasets, trained and tuned XGBoost models, explored data quality and feature engineering, and ultimately deployed a serving API with SHAP-based interpretability. By holding the repo, prompts, and acceptance tests constant, the project revealed not just raw coding differences, but how each tool shapes data quality, model credibility, and the path to a production-ready ML system.

Cross-Cloud AutoML Shootout - Lessons from AWS, GCP, and BigQuery

When I kicked off the Cross-Cloud AutoML Shootout, the idea was simple: put AWS and GCP side by side, train on the same dataset, and see which delivered the better model with less friction. What started as a straightforward benchmark quickly turned into something bigger: a case study in how different cloud philosophies shape the experience of doing machine learning. Just like in banking, where model development often collides with regulatory guardrails, this project revealed how quotas, hidden constraints, and pricing structures can be as important as the algorithms themselves.

SignalGraph 5G - Anomaly Detection & Forecasts (PySpark + Postgres/Teradata + Prophet)

SignalGraph 5G is a demo system that ingests synthetic 4G/5G KPI data, processes it through a Spark-based lakehouse pipeline, and exposes an analyst-friendly UI in Streamlit. The project was designed for anomaly detection, large-scale data engineering, data warehouse/lakehouse integration, and applied ML/forecasting in the network domain. It is deployed as a live Streamlit web app on Render, connected to a Neon Postgres warehouse.

NetworkIQ - Incident Risk Monitor (Render, Google Cloud, AWS)

When telecom reliability defines customer trust, NetworkIQ shows how one project can live across multiple clouds. NetworkIQ predicts congestion and visualizes incidents on Render, GCP Cloud Run, and AWS, completing the One Project, Three Clouds vision. Built with PySpark preprocessing, XGBoost prediction, and Streamlit dashboards, NetworkIQ demonstrates that portability, scalability, and explainability can be baked into a single AI-first system, no matter the platform.

AI-Augmented BNPL Risk Dashboard with Intelligent Override System (Scikit-learn/XGBoost, Streamlit, Render)

In the fast-growing Buy Now Pay Later market, consumers face hidden risks from fragmented credit visibility and rapid lending decisions that can spiral into unmanageable debt. This project tackles the problem by providing a Streamlit-based dashboard with real-time monitoring, anomaly detection, policy simulations, and an intelligent override system that allows immediate intervention when risk thresholds are breached. The result is a tool that balances speed with safety, giving risk teams clear insights, actionable controls, and the confidence to manage BNPL risk responsibly in a space where regulation has not yet caught up.

Credit Risk Model Deployment & Monitoring (AWS + PySpark + CatBoost)

In a world where credit decisions must stay reliable, explainable, and scalable, this project addresses the challenge of deploying a credit risk model in a cloud-native, data-intensive environment. It builds a synthetic telecom-inspired credit dataset, then uses PySpark for scalable preprocessing, CatBoost for powerful categorical modeling, Amazon S3 for seamless cloud storage, and SHAP for insight into feature impact. The result is a scalable, explainable pipeline that supports segment-level business insights and aligns with real-world credit risk workflows, giving teams confidence in automation and clarity in decision-making for postpaid lending.

Telecom Churn Modeling & Retention Strategy

Customer churn erodes revenue and undermines growth in competitive telecom markets, and preventing it requires early and reliable signals. This project delivers a complete churn modeling pipeline that combines Python, Pandas, scikit-learn, and XGBoost to predict at-risk customers, SHAP for clear interpretability, and CLTV simulations to quantify revenue exposure. It also incorporates model monitoring through Population Stability Index and customer segmentation to guide retention strategies. The outcome is a system that not only predicts churn but also explains it, monitors its stability, and translates insights into actionable business decisions.
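As a rough illustration of the Population Stability Index monitoring mentioned above, here is a minimal sketch in plain Python. The function name, the equal-width binning over the baseline range, and the `eps` floor for empty bins are all assumptions made for this example; the project itself may bin differently (for instance by quantiles).

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a baseline sample and a monitoring sample of scores.

    Hypothetical helper: equal-width bins are derived from the baseline.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp values outside the baseline range into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        total = len(values)
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / total, eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    # Sum of (actual share - expected share) * ln(actual share / expected share).
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical distributions give a PSI of zero, and the value grows as the monitoring sample drifts away from the baseline. Any cutoff used to flag drift is a convention to be chosen per use case, not something this sketch decides.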

Customer Segmentation Using Statistical Clustering

Telecom companies must understand who their customers are to tailor marketing and retention strategies effectively. This project simulates a diverse base of 5,000 postpaid customers, then applies preprocessing and K-Means clustering to reveal four distinct personas. Visualization through PCA aids interpretation, while segment profiling by usage patterns, tenure, churn risk, and payment behavior drives targeted strategic actions. The result is an operationally intuitive segmentation model that supports personalization, retention, and plan design using a realistic, scalable methodology.

Agentic AI, Natural Language Email & Calendar Assistant (LangChain + Streamlit + ChatGPT + Google API)

This project demonstrates the implementation of an AI-augmented assistant built with Streamlit and powered by LangChain. It connects to Gmail and Google Calendar via OAuth, interprets natural language commands using an LLM agent, and executes intelligent actions such as sending emails or retrieving upcoming events.

Telecom Engagement Monitoring using Fractional Logistic Regression

This project implements a fractional logistic regression monitoring pipeline for tracking customer engagement in a telecom environment. It simulates realistic development and monitoring datasets to evaluate how well the model generalizes over time using key metrics such as RMSE, MAE, PSI, and calibration curves.

Customer Churn Prediction App (Deployed on Render)

Customer churn can quietly erode business value, so this project builds a real-time prediction engine designed to surface risk before it materializes. It constructs realistic telecom-style churn data, preprocesses it with ColumnTransformer, trains a RandomForestClassifier, and packages both the model and the preprocessing steps using joblib. The user interacts via an intuitive Streamlit interface that signals churn likelihood in real time. Hosted serverlessly on Render, the app bridges data science with operational readiness.

Credit Bureau Sandbox; Governance Gate & Dashboard Hook (AWS + Tableau)

This project demonstrates hands-on work with bureau-style sandbox data, a credit reporting dashboard hook, and an AWS-based governance gate for model productionization. The repo shows all three in a lightweight, auditable way, without committing secrets or spinning up heavy compute. It provides concrete artifacts, one-liners to reproduce behavior, and clear pointers to files.

Fraud Detection with XGBoost and scikit-learn

This project demonstrates a full machine learning workflow for detecting fraudulent transactions using simulated data, with XGBoost, SMOTE for class imbalance, RandomizedSearchCV for hyperparameter tuning, and threshold optimization to improve performance.
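The threshold optimization step can be sketched as a simple scan over candidate cutoffs. This is only an illustration: the function name is hypothetical, and maximizing F1 is an assumption, since the description above does not say which metric the project optimizes.

```python
def best_f1_threshold(y_true, scores, thresholds=None):
    """Return (threshold, f1) for the cutoff that maximizes F1.

    y_true holds 0/1 labels; scores holds predicted fraud probabilities.
    """
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 0.99.
        thresholds = [i / 100 for i in range(1, 100)]
    best_t, best_f1 = 0.5, -1.0
    for t in thresholds:
        # Predict fraud when the score is at or above the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # ties keep the lowest threshold found first
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

With imbalanced fraud data, the default 0.5 cutoff is rarely the best operating point, which is why a scan like this is typically run on a held-out validation set rather than on the training data.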

Airline Flight Delay Prediction with Python

This project aims to predict whether a flight will be significantly delayed (15+ minutes) using flight metadata, weather, and carrier information. Understanding delay drivers is essential for airlines and airports to improve operations and passenger experience.

Analyzing A/B Test Impact on Marketplace Conversions with Uplift Modeling

This project simulates and analyzes an A/B pricing test in a marketplace context. Using Python, I simulate customer behavior, estimate the causal impact of a price change on conversion rates, and apply uplift modeling to identify heterogeneous treatment effects across cities. The project demonstrates key skills in experimental design, causal inference, uplift modeling, and data visualization.
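To make the idea of heterogeneous treatment effects across cities concrete, here is a minimal sketch that computes the per-city difference in conversion rate between treated and control users. It is a segment-level difference in means, not a full uplift model, and the function name and record layout are assumptions for illustration only.

```python
from collections import defaultdict


def uplift_by_city(records):
    """records: iterable of (city, treated, converted) tuples with 0/1 flags.

    Returns {city: treated conversion rate - control conversion rate}.
    """
    # counts[city][arm] = [conversions, visitors], where arm 1 is treated.
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for city, treated, converted in records:
        counts[city][treated][0] += converted
        counts[city][treated][1] += 1

    uplift = {}
    for city, arms in counts.items():
        (c_conv, c_n), (t_conv, t_n) = arms[0], arms[1]
        if c_n and t_n:  # skip cities that are missing one arm
            uplift[city] = t_conv / t_n - c_conv / c_n
    return uplift
```

Because assignment in an A/B test is randomized, these per-city differences can be read as estimates of the treatment effect within each city, though small cities will produce noisy values.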

Forecasting Monthly Insurance Claim Payouts Using ARIMA

This project simulates realistic monthly insurance claim payouts over a 5-year period (2020–2024), applies ARIMA modeling for time series forecasting, and evaluates model performance on a simulated 2025 out-of-sample dataset.

Telecom Customer Churn Prediction with Python

This project focuses on predicting customer churn in the telecommunications industry. Customer churn occurs when a user stops using a company’s services. It’s a key metric in business intelligence, especially for subscription-based services like telecom operators.

Data Visualization and Animation with R

This example visualizes data in several ways with ggplot2 and adds animation with gganimate.

Spatial Data With ArcMap and R

This example uses foreign direct investment (FDI) data geocoded in ArcMap to analyze and control for spatial autocorrelation in R.

Webscraping with R

This example scrapes web data and cleans it using R's rvest and the tidyverse. The target is the Wikipedia table "List of countries by external debt."

Twitter Data

This example scrapes Twitter data, visualizes it, and reports some descriptive statistics.

Paulo Cavallo, PhD

