NetworkIQ - Incident Risk Monitor (Render, Google Cloud, AWS)

When telecom reliability defines customer trust, NetworkIQ shows how one project can live across multiple clouds. NetworkIQ predicts congestion and visualizes incidents on Render, GCP Cloud Run, and AWS, completing the One Project, Three Clouds vision. Built with PySpark preprocessing, XGBoost prediction, and Streamlit dashboards, NetworkIQ demonstrates that portability, scalability, and explainability can be baked into a single AI-first system, no matter the platform.

Live App: GCP Deployment โ€“ Render Deployment โ€“ AWS Deployment


๐Ÿ”น Project Overview

NetworkIQ is a telecom-aligned MVP that transforms network telemetry into faster incident detection (MTTDโ†“), better customer experience (NPS proxyโ†‘), and leaner cost per GB.
It demonstrates how AI-first system design can turn raw performance data into actionable insights for network operators. The dataset used in NetworkIQ is synthetic and was generated to mimic realistic telecom KPIs such as signal strength (RSRP, RSRQ, SINR), throughput, latency, jitter, and drop rates. Values were modeled on publicly available ranges from industry specifications and research datasets, with synthetic generation used to create plausible correlations (e.g., poor SINR leading to lower throughput and higher drop rates).


๐Ÿ”น Why This Matters

  • Detect Incidents Earlier (MTTDโ†“): Spot congestion and outages from KPI anomalies.
  • Improve Customer Experience (NPS proxyโ†‘): Reduce dropped/slow sessions and clearly communicate impact.
  • Optimize Cost (Cost/GBโ†“): Enable smarter capacity planning and parameter tuning.

๐Ÿ”น Tech Stack & Architecture

Architecture:

CSV โ†’ PySpark (ETL) โ†’ Parquet โ†’ XGBoost (prediction) โ†’ Streamlit Dashboard โ†’ Multi-cloud Deployment (Render + GCP + AWS roadmap).

  • Data Pipeline: PySpark CSV โ†’ Parquet ingestion
  • Modeling: Logistic Regression, Random Forest, XGBoost (selected as best-performing)
  • Dashboard: Streamlit app deployed multi-cloud
  • Visualization: Interactive map overlays predictions with intuitive cell-site visuals
  • CI/CD: GitHub Actions workflow for GCP Cloud Run deployment
  • Secrets: Managed securely via Google Secret Manager

๐Ÿ”น EDA & Key KPIs

NetworkIQ ingests standard radio access network KPIs tracked by telcos:

  • Throughput (Mbps): Data delivery performance.
  • Latency (ms): Network responsiveness.
  • Packet Loss (%): Connection stability.
  • Dropped Session Rate: Customer experience proxy.

NetworkIQ

EDA Example (PySpark Snippet):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("NetworkIQ").getOrCreate()

df = spark.read.csv("data/raw/sample_cells.csv", header=True, inferSchema=True)
df = df.withColumnRenamed("cell_id", "cell").withColumn("throughput_mbps", df["bytes"]/df["duration"])

df.show(5)

๐Ÿ”น Model Development

Model Performance

Model AUC KS Notes
Logistic Regression 0.74 0.28 Interpretable but weaker baseline
Random Forest 0.81 0.36 Higher complexity, moderate gains
XGBoost 0.86 0.42 Best performance, robust & scalable

๐Ÿ‘‰ XGBoost outperformed alternatives, identifying up to 20% more high-risk accounts at the same approval rate.

Training Example (Python):

from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score

xgb = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1, random_state=42)
xgb.fit(X_train, y_train)
preds = xgb.predict_proba(X_test)[:,1]

auc = roc_auc_score(y_test, preds)
print("AUC:", auc)

NetworkIQ


๐Ÿ”น AI Interpretation (Executive Briefings)

NetworkIQ includes an AI Interpretation module powered by LLMs (Gemini API). With a single click, it generates:

  • Executive Summaries โ€” highlights network-wide issues and trends.
  • Actionable Recommendations โ€” suggests where intervention should be prioritized.
  • Per-Cell Explanations โ€” natural language explanations for why each site is at risk.

This ensures both technical and non-technical stakeholders can understand the model outputs without digging into raw metrics.
In production, this feature would run automatically as filters change, but in our demo we keep it manual to manage token usage.

NetworkIQ

How it works:
I integrated the Gemini API into the Streamlit app through a lightweight wrapper. When a user clicks Generate AI Briefing, the app sends a structured prompt containing the filtered network metrics (latency, throughput, drop rate, predicted risk) for the selected cells. Gemini then returns a natural-language summary, which I format into an executive briefing, action recommendations, or per-cell explanations depending on the userโ€™s choice.

To keep the demo efficient and cost-aware, the feature is set to manual execution rather than auto-refresh. In a production system, this workflow would run automatically as filters change, ensuring stakeholders always have up-to-date AI-generated insights.


๐Ÿ”น Interactive Risk Map

The Predicted Risk Map overlays model outputs onto cell-site locations:

  • Circle size = risk magnitude (larger means higher predicted probability).
  • Color = risk level (amber โ†’ red as risk increases).
  • Hover tooltips display cell ID and predicted probability.

This visualization makes it easy to spot geographic clusters of risk and prioritize field resources.

NetworkIQ


๐Ÿ”น Deployment & Validation

โœ… Multi-Cloud Deployment: Live dashboards on GCP and Render
โœ… Predictive Engine: XGBoost congestion prediction integrated into dashboard
โœ… AI Integration: Translates predictions into natural-language insights for non-technical users
โœ… Industry Validation: Demoed to a telecom professional, who highlighted predictive accuracy, intuitive mapping, and accessibility

CI/CD Workflow (Excerpt):

name: Deploy to Cloud Run

on:
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Authenticate
      uses: google-github-actions/auth@v1
      with:
        credentials_json: $
    - name: Deploy
      run: gcloud run deploy networkiq --source . --region us-central1 --allow-unauthenticated

๐Ÿ”น Responsible AI & Monitoring

  • Explainability: SHAP values for feature contribution.
  • Stability: PSI to monitor population drift.
  • Transparency: Model card skeleton included in /docs.
  • Ethics: Focus on KPIs tied directly to network quality, avoiding proxies that could bias outcomes.

๐Ÿ”น Roadmap

  • AWS App Runner deployment to complete the โ€œOne Project, Three Cloudsโ€ strategy.
  • Feedback Loops for adaptive retraining.
  • Advanced Forecasting (ARIMA/Prophet baselines).
  • Expanded KPI Coverage for richer incident monitoring.

๐Ÿ”น Lessons Learned

  • Spark Version Mismatch: Fixed by aligning PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to Python 3.10 environment.
  • API Latency: Optimized Streamlit queries to reduce dashboard lag.
  • Secrets Management: Ensured API keys were never exposed in code or logs.

How I Deployed NetworkIQ on AWS (Free Tier)

Goal: make the app easy to reach on the public internet while keeping cost โ‰ˆ $0 on a new AWS account.

What I deployed

  • The Streamlit app is packaged in a Docker image (see Dockerfile).
  • I launched one free-tier EC2 instance (Amazon Linux 2023, t2.micro).
  • I ran the container so the site is reachable on port 80 (HTTP).

Simple architecture

  1. EC2 (t3.micro) โ€” tiny virtual machine in AWS.
  2. Docker โ€” runs the app the same way everywhere.
  3. Streamlit โ€” serves the UI.
  4. Security group โ€” allows web traffic in.

Live Demo (AWS)

AWS (EC2) Live Demo

The EC2 instance serves the Dockerized Streamlit app on HTTP 80 (host 80 โ†’ container 8080).
Note: The demo may be offline outside demo hours and will be retired at the end of my AWS Free Tier window.

๐Ÿ”น License

MIT ยฉ 2025 Paulo Cavallo.

Written on August 22, 2025