Customer Churn Prediction App (Deployed on Render)

A customer churn risk can quietly erode business value, so this project builds a real-time prediction engine designed to surface risk before it materializes. It constructs and preprocesses realistic telecom-style churn data using ColumnTransformer, trains a RandomForestClassifier, and packages both the model and preprocessing steps using joblib. The user interacts via an intuitive Streamlit interface that signals churn likelihood in real time. Hosted serverlessly on Render, the app bridges data science with operational readiness.

🔗 Live App: https://churn-prediction-app-dxft.onrender.com

App Preview

Churn Prediction App Screenshot

Project Overview

This app:

Trains a RandomForestClassifier to predict churn
Encodes/preprocesses input features using ColumnTransformer
Uses joblib to save/load model artifacts
Provides a user-friendly interface using Streamlit
Is deployed serverlessly using Render

✅ Step 1: Generate Simulated Churn Data (`data/generate_data.py`)

I’ll simulate a telecom dataset with realistic churn behavior.

Highlights:

1000 samples with features like tenure, charges, contract type
Binary churn outcome (Yes / No)
Noise-injected churn probabilities
CSV output: data/telco_churn.csv

Concepts:

Simulated structured data with dependencies
Controlled randomness
Binary classification labels

Code

import pandas as pd
import numpy as np

np.random.seed(42)
n = 1000

gender = np.random.choice(['Male', 'Female'], size=n)
senior_citizen = np.random.choice([0, 1], size=n, p=[0.85, 0.15])
partner = np.random.choice(['Yes', 'No'], size=n)
dependents = np.random.choice(['Yes', 'No'], size=n)
contract = np.random.choice(['Month-to-month', 'One year', 'Two year'], size=n)
payment_method = np.random.choice(['Electronic check', 'Mailed check', 'Bank transfer', 'Credit card'], size=n)

tenure = np.random.randint(0, 72, size=n)
monthly_charges = np.round(np.random.normal(loc=70, scale=20, size=n), 2)
monthly_charges = np.clip(monthly_charges, 20, 130)
total_charges = tenure * monthly_charges

churn_prob = (
    0.3 * (contract == 'Month-to-month').astype(int) +
    0.2 * (monthly_charges > 80).astype(int) +
    0.1 * (senior_citizen == 1).astype(int)
)
churn_prob = np.clip(churn_prob + np.random.normal(0, 0.1, n), 0, 1)
churn = np.where(churn_prob > 0.5, 'Yes', 'No')

df = pd.DataFrame({
    'gender': gender,
    'SeniorCitizen': senior_citizen,
    'Partner': partner,
    'Dependents': dependents,
    'tenure': tenure,
    'MonthlyCharges': monthly_charges,
    'TotalCharges': total_charges,
    'Contract': contract,
    'PaymentMethod': payment_method,
    'Churn': churn
})

df.to_csv('data/telco_churn.csv', index=False)

# Quick sanity check on churn distribution
print("Churn distribution:", df['Churn'].value_counts(normalize=True).round(2))

Model Training and Preprocessing (Python)

1. Load and Preprocess Data

Concepts:

Feature pipelines using scikit-learn
Avoiding data leakage by fitting only on training data

Code

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer

def load_data(path):
    df = pd.read_csv(path)
    return df

def preprocess_and_split(df):
    X = df.drop('Churn', axis=1)
    y = df['Churn'].apply(lambda x: 1 if x == 'Yes' else 0)

    categorical = X.select_dtypes(include='object').columns.tolist()
    numerical = X.select_dtypes(include=['int64', 'float64']).columns.tolist()

    preprocessor = ColumnTransformer(transformers=[
        ('num', StandardScaler(), numerical),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical)
    ])

    X_transformed = preprocessor.fit_transform(X)
    X_train, X_test, y_train, y_test = train_test_split(X_transformed, y, test_size=0.2, random_state=42)
    return X_train, X_test, y_train, y_test, preprocessor

2. Train and Save the Model

I’ll train a RandomForestClassifier on the transformed data and save it.

Code

from sklearn.ensemble import RandomForestClassifier
import joblib

def train_and_save_model(X_train, y_train, preprocessor):
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    joblib.dump(model, 'model/churn_model.pkl')
    joblib.dump(preprocessor, 'model/preprocessor.pkl')

    return model

The target Churn is binary encoded.
Final model and preprocessor are saved as .pkl files for use in the web app.

3. Execute Training Script

if __name__ == "__main__":
    df = load_data("data/telco_churn.csv")
    X_train, X_test, y_train, y_test, preprocessor = preprocess_and_split(df)
    train_and_save_model(X_train, y_train, preprocessor)

🧾 Decision Notes

I chose a RandomForestClassifier for its interpretability and robustness on synthetic churn data. Streamlit was selected for its speed in building prototypes, and Render for seamless, cost-effective cloud deployment without complex infrastructure.

4: Streamlit App for Render

Code

import streamlit as st
import pandas as pd
import numpy as np
import joblib

model = joblib.load("model/churn_model.pkl")
preprocessor = joblib.load("model/preprocessor.pkl")

st.title("📉 Customer Churn Predictor")
st.markdown("Enter customer details below to predict the likelihood of churn.")

gender = st.selectbox("Gender", ["Male", "Female"])
senior = st.selectbox("Senior Citizen", [0, 1])
partner = st.selectbox("Has a Partner?", ["Yes", "No"])
dependents = st.selectbox("Has Dependents?", ["Yes", "No"])
tenure = st.slider("Tenure (months)", 0, 72, 12)
monthly_charges = st.slider("Monthly Charges", 20, 130, 70)
contract = st.selectbox("Contract Type", ["Month-to-month", "One year", "Two year"])
payment_method = st.selectbox("Payment Method", ["Electronic check", "Mailed check", "Bank transfer", "Credit card"])

total_charges = tenure * monthly_charges

if st.button("Predict Churn"):
    input_df = pd.DataFrame([{
        "gender": gender,
        "SeniorCitizen": senior,
        "Partner": partner,
        "Dependents": dependents,
        "tenure": tenure,
        "MonthlyCharges": monthly_charges,
        "TotalCharges": total_charges,
        "Contract": contract,
        "PaymentMethod": payment_method
    }])

    X_input = preprocessor.transform(input_df)
    prediction = model.predict(X_input)[0]
    probability = model.predict_proba(X_input)[0][1]

    label = "🚫 Will Not Churn" if prediction == 0 else "⚠️ Will Churn"
    st.subheader(f"Prediction: {label}")
    st.write(f"Churn Probability: **{probability:.2%}**")

⚙️ Local Setup

✅ 1. Clone the Repo

git clone https://github.com/pmcavallo/churn-prediction-app.git
cd churn-prediction-app

✅ 2. Install Dependencies

pip install -r requirements.txt

📄 Requirements (`requirements.txt`)

streamlit
pandas
numpy
scikit-learn
joblib

✅ 3. Train the Model

python model/train_model.py

This saves:

model/churn_model.pkl
model/preprocessor.pkl

✅ 4. Launch the Streamlit App

streamlit run app/app.py

🌐 Deploying to Render

Render is a free serverless platform that supports Python + Streamlit.

✅ Setup Steps

Push all files to my GitHub repo
Go to https://render.com
Click “New Web Service”
Connect the GitHub repo
Configure the following:
- Environment: Python
- Build Command: pip install -r requirements.txt
- Start Command: streamlit run app/app.py --server.port $PORT
Done! 🎉 The app is live.

📌 Add a render.yaml like this to automate config:

services:
  - type: web
    name: churn-prediction-app
    env: python
    plan: free
    buildCommand: pip install -r requirements.txt
    startCommand: streamlit run app/app.py --server.port $PORT
    autoDeploy: true

Considered Alternatives:

Logistic Regression: simpler but sacrificed predictive accuracy.
Flask/Dash: more boilerplate; Streamlit offered quicker iteration.
AWS/Heroku: suitable but rendered deployment heavier; Render provided instant streaming of updates.

Tech Stack

Purpose	Tool
Language	Python 3
ML Library	scikit-learn
Web UI	Streamlit
Deployment	Render
Model Storage	joblib
Dataset	Simulated Telco Churn

Paulo Cavallo, PhD

Customer Churn Prediction App (Deployed on Render)

App Preview

Project Overview

✅ Step 1: Generate Simulated Churn Data (`data/generate_data.py`)

Highlights:

Concepts:

Code

Model Training and Preprocessing (Python)

1. Load and Preprocess Data

Concepts:

Code

2. Train and Save the Model

Code

3. Execute Training Script

🧾 Decision Notes

4: Streamlit App for Render

Code

⚙️ Local Setup

✅ 1. Clone the Repo

✅ 2. Install Dependencies

📄 Requirements (`requirements.txt`)

✅ 3. Train the Model

✅ 4. Launch the Streamlit App

🌐 Deploying to Render

✅ Setup Steps

Tech Stack

Author

📄 License

This project is available under the MIT License.

Customer Churn Prediction App (Deployed on Render)

App Preview

Project Overview

✅ Step 1: Generate Simulated Churn Data (data/generate_data.py)

Highlights:

Concepts:

Code

Model Training and Preprocessing (Python)

1. Load and Preprocess Data

Concepts:

Code

2. Train and Save the Model

Code

3. Execute Training Script

🧾 Decision Notes

4: Streamlit App for Render

Code

⚙️ Local Setup

✅ 1. Clone the Repo

✅ 2. Install Dependencies

📄 Requirements (requirements.txt)

✅ 3. Train the Model

✅ 4. Launch the Streamlit App

🌐 Deploying to Render

✅ Setup Steps

Tech Stack

Author

📄 License

This project is available under the MIT License.

✅ Step 1: Generate Simulated Churn Data (`data/generate_data.py`)

📄 Requirements (`requirements.txt`)