Lending Club Credit Risk — AWS ML Showcase (Governance + Cost Control, under $25)
This project builds an end‑to‑end ML capability on AWS under tight cost and quota constraints, emphasizing governance, reproducibility, leakage control, and cost‑aware training on a real credit‑risk problem.
Executive Story
We built a leak‑safe, application‑time credit default model with SageMaker XGBoost 1.7‑1 trained on Managed Spot. Because quota and UI restrictions hid parts of the console, we registered the model programmatically (Model Registry via SDK) and attached a metrics JSON in S3. We validated inference from the model.tar.gz rather than leaving an endpoint running, keeping idle cost at $0.
Validation metrics (n=176): AUC ≈ 0.844, KS ≈ 0.604, PR-AUC ≈ 0.617, F1@Top20% ≈ 0.531
Training billable time: ~57 s (135 s total) — ~58% Spot savings
Artifacts: model tarball in S3 + metrics JSON + Approved ModelPackage (ARN)
This project demonstrates mastery of AWS and ML: thoughtful feature governance, cost management, SDK‑first reliability when the UI gets in the way, and clean teardown.
Architecture (minimal, production‑minded)
S3 (raw → curated)                          SageMaker Studio (Notebook)
│                                           │
├── data/hpo/train_v_app.csv ──────────────▶│ Feature screening (AUC/KS/|corr|)
├── data/hpo/validation_v_app.csv ─────────▶│ → whitelist (15 numeric features)
│                                           │
└── model-metrics/app_metrics.json ◀────────┘ Exported evaluation (AUC, KS, PR-AUC, F1@Top20%)

SageMaker Training (Managed Spot, built-in XGBoost 1.7-1)
└── output/model.tar.gz → S3

SageMaker Model Registry (SDK)
└── ModelPackageGroup: credit-risk-xgb-app
    └── Version 1 (Approved) with ModelQuality metrics
Why this design? It shows cost‑aware training, governance (leakage control + registry), and reproducibility without always‑on endpoints.
What to Showcase (and why it matters)
1) Data layout & governance
- Curated CSVs in S3 (data/hpo/train_v_app.csv, data/hpo/validation_v_app.csv).
- Separate model-metrics/app_metrics.json used as ModelQuality evidence in the registry.
Screenshot placeholder
Caption: proves disciplined storage and a clear separation of data from evaluation artifacts.
2) Leakage-aware feature selection (application-time whitelist)
- We computed univariate AUC/KS and |corr|, removed leaky/high-risk fields, and exported a whitelist of 15 numeric features available at application time (a minimal screening sketch follows this list).
- This is classic governance: features are safe to use before outcomes are known.
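A minimal sketch of that screening pass, assuming a local pandas copy of the training split with a binary target column; the cutoffs and file name here are illustrative, not the project's exact values:

import numpy as np
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.metrics import roc_auc_score

df = pd.read_csv("train_raw.csv")  # hypothetical local copy of the raw training split
y = df["target"].to_numpy()
candidates = df.select_dtypes(include=np.number).drop(columns=["target"])

whitelist = []
for col in candidates:
    x = candidates[col].fillna(candidates[col].median())
    auc = roc_auc_score(y, x)
    auc = max(auc, 1 - auc)                       # direction-agnostic univariate AUC
    ks = ks_2samp(x[y == 1], x[y == 0]).statistic
    corr = abs(np.corrcoef(x, y)[0, 1])
    # Suspiciously strong univariate signal usually means leakage (outcome-era fields).
    if auc < 0.80 and corr < 0.60:                # illustrative cutoffs
        whitelist.append(col)

print(f"{len(whitelist)} application-time features kept")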
3) Cost-aware SageMaker training on Spot
- Trained with built-in XGBoost 1.7-1 on Managed Spot for ~58% savings (SDK sketch below).
- No endpoints were left running; batch inference was validated locally from the model.tar.gz.
Screenshot placeholder
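For reference, a sketch of what the Managed Spot training call looks like with the SageMaker Python SDK; the instance type, hyperparameters, and time caps are assumptions, not the exact job config:

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = "credit-risk-flagship-dev"

est = Estimator(
    image_uri=image_uris.retrieve("xgboost", region=sess.boto_region_name, version="1.7-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",   # assumed; "smallest practical instance"
    output_path=f"s3://{bucket}/output/",
    use_spot_instances=True,       # Managed Spot: pay only for interruptible capacity
    max_run=900,                   # cap on billable training seconds
    max_wait=1800,                 # must be >= max_run when Spot is enabled
    sagemaker_session=sess,
)
est.set_hyperparameters(objective="binary:logistic", num_round=200, eval_metric="auc")
# Built-in XGBoost expects headerless CSV with the label in the first column.
est.fit({
    "train": TrainingInput(f"s3://{bucket}/data/hpo/train_v_app.csv", content_type="text/csv"),
    "validation": TrainingInput(f"s3://{bucket}/data/hpo/validation_v_app.csv", content_type="text/csv"),
})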
4) Governance via Model Registry (SDK-first)
- Created a ModelPackageGroup and an Approved ModelPackage via SDK.
- Attached our S3 metrics JSON as ModelQuality.Statistics.
- UI variability/quota limits were handled gracefully: programmatic control > manual clicks.
Screenshot placeholder
Caption: proves there is a registered package with metrics evidence even when the console menu is hidden.
5) Honest evaluation & calibration
- Reported AUC/KS/PR-AUC/F1@Top20% on holdout validation (n=176).
- Calibrated probabilities; thresholds chosen for business targeting (Top20%); see the metric sketch below.
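A minimal sketch of how those four numbers can be computed, assuming y_true and p are NumPy arrays holding the n=176 validation labels and calibrated scores:

import numpy as np
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score

def evaluate(y_true: np.ndarray, p: np.ndarray) -> dict:
    auc = roc_auc_score(y_true, p)
    pr_auc = average_precision_score(y_true, p)
    # KS = max separation between the score CDFs of defaulters and non-defaulters.
    order = np.argsort(p)
    y_sorted = y_true[order]
    cdf_pos = np.cumsum(y_sorted) / y_sorted.sum()
    cdf_neg = np.cumsum(1 - y_sorted) / (1 - y_sorted).sum()
    ks = np.max(np.abs(cdf_pos - cdf_neg))
    # F1@Top20%: flag the riskiest 20% of applications as positives.
    cutoff = np.quantile(p, 0.80)
    f1_top20 = f1_score(y_true, (p >= cutoff).astype(int))
    return {"auc": auc, "ks": ks, "pr_auc": pr_auc, "f1_top20": f1_top20}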
6) Spend governance and teardown
- AWS Budgets configured; stayed within a tight cap.
- Teardown checklist: no endpoints, no Batch jobs, no warm pools, idle kernels shut down (auditable with the sketch below).
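A read-only audit like the following (a hypothetical helper, not part of the project code) makes "zero idle cost" verifiable rather than asserted:

import boto3

sm = boto3.client("sagemaker")
endpoints = sm.list_endpoints()["Endpoints"]
training = sm.list_training_jobs(StatusEquals="InProgress")["TrainingJobSummaries"]
apps = sm.list_apps()["Apps"]  # Studio kernels/apps also accrue cost while running

assert not endpoints, f"Endpoints still up: {[e['EndpointName'] for e in endpoints]}"
assert not training, "Training jobs still in progress"
print("Idle-cost check passed;", len(apps), "Studio app(s) to review")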
Results (Validation)
- AUC: 0.844
- KS: 0.604
- PR-AUC: 0.617
- F1@Top20%: 0.531
Interpretation: With strong leakage controls, the calibrated XGBoost model is credible on this small holdout (n=176), and the governance trail (Approved ModelPackage + metrics JSON) makes it review‑ready.
Reproducibility: register the package programmatically
Works even if the console hides the “Model Registry” menu.
import boto3, sagemaker
from botocore.exceptions import ClientError
from sagemaker import image_uris

region = sagemaker.Session().boto_region_name
bucket = "credit-risk-flagship-dev"
mpg = "credit-risk-xgb-app"

# Resolve the same built-in XGBoost 1.7-1 image that was used for training.
image = image_uris.retrieve("xgboost", region=region, version="1.7-1")
artifact_s3 = f"s3://{bucket}/output/calibration_app/sagemaker-xgboost-2025-09-22-22-54-33-406/output/model.tar.gz"
metrics_s3 = f"s3://{bucket}/model-metrics/app_metrics.json"

sm = boto3.client("sagemaker")

# Create the group once; tolerate "already exists" errors on re-runs.
try:
    sm.create_model_package_group(
        ModelPackageGroupName=mpg,
        ModelPackageGroupDescription="App-safe 15-feature XGBoost; leakage-controlled; Spot-trained.",
    )
except ClientError as e:
    if e.response["Error"]["Code"] not in ("ValidationException", "ConflictException"):
        raise

resp = sm.create_model_package(
    ModelPackageGroupName=mpg,
    ModelApprovalStatus="Approved",
    ModelPackageDescription="Application-time feature whitelist; validation AUC≈0.844, KS≈0.604.",
    InferenceSpecification={
        "Containers": [{"Image": image, "ModelDataUrl": artifact_s3}],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv", "application/json"],
    },
    # Attach the exported evaluation JSON as ModelQuality evidence.
    ModelMetrics={
        "ModelQuality": {"Statistics": {"ContentType": "application/json", "S3Uri": metrics_s3}}
    },
)
print("MODEL PACKAGE ARN:", resp["ModelPackageArn"])
Describe it later (proof):
import boto3

sm = boto3.client("sagemaker")
arn = "arn:aws:sagemaker:us-east-1:678804053923:model-package/credit-risk-xgb-app/1"  # replace if re-created
desc = sm.describe_model_package(ModelPackageName=arn)
print(desc["ModelApprovalStatus"])
print(desc["InferenceSpecification"]["Containers"][0]["Image"])
print(desc["InferenceSpecification"]["Containers"][0]["ModelDataUrl"])
print(desc.get("ModelMetrics", {}))
Cost & Governance Controls (at-a-glance)
- IAM: single execution role; least privilege + Registry API access.
- Managed Spot training: ~58% savings; smallest practical instance.
- Zero idle cost: no endpoints left up; batch inference validated from the tarball (see the sketch after this list).
- Budgets: alert configured; stayed inside a ~$25 envelope.
- Tags: package/group tagged for Project/Stage/Governance/Cost.
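Sketch of the tarball-based validation referenced above, assuming the file layout the built-in XGBoost container writes (a Booster saved as xgboost-model) and a hypothetical local features file:

import tarfile

import boto3
import numpy as np
import xgboost as xgb

s3 = boto3.client("s3")
s3.download_file(
    "credit-risk-flagship-dev",
    "output/calibration_app/sagemaker-xgboost-2025-09-22-22-54-33-406/output/model.tar.gz",
    "model.tar.gz",
)
with tarfile.open("model.tar.gz") as tar:
    tar.extractall("model")

booster = xgb.Booster()
booster.load_model("model/xgboost-model")  # file name written by the built-in algorithm (assumed)

X_val = np.loadtxt("validation_features.csv", delimiter=",")  # hypothetical label-free feature file
print(booster.predict(xgb.DMatrix(X_val))[:5])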
Key AWS Lessons (turning constraints into strengths)
- UI variability & quotas are common in real accounts; SDK‑first workflows keep you moving and leave a better audit trail.
- Leakage control is not optional in credit risk; feature availability timing is part of MLOps governance.
- Costs accumulate at the edges (idle endpoints, forgotten jobs). A teardown checklist is part of “done.”
Next Steps (low-cost polish)
- Add a Serverless Inference one‑shot demo (invoke once, screenshot latency, delete).
- Wrap training + registry into a single train_register.py and wire it to CI.
- Add a monthly drift/KS check as a SageMaker Processing job (trigger on demand); a minimal sketch follows.
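A minimal sketch of that KS drift check, assuming reference and recent score arrays are available; the 0.10 threshold is illustrative, not a tuned value:

import numpy as np
from scipy.stats import ks_2samp

def ks_drift(reference_scores: np.ndarray, recent_scores: np.ndarray, threshold: float = 0.10) -> bool:
    """Flag drift when the KS statistic between score distributions exceeds the threshold."""
    stat = ks_2samp(reference_scores, recent_scores).statistic
    print(f"KS = {stat:.3f} (threshold {threshold})")
    return stat > threshold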