LoanCreditRisk

This project asks a blunt question: if we had to overhaul a consumer lending stack today using only the LendingClub public book, what truly moves default risk, how far can lightweight models push discriminatory power out-of-time, and what policy shifts would actually change business outcomes? The work centers on 36‑month loans issued before mid-2017 (about 1.06M records, 14.6% default after censoring to resolved statuses), treating default vs fully paid as the target and stripping away unfinished loans to avoid naïve underreporting of risk.

Data, curation, and modeling stance

Raw Kaggle CSVs were cleaned into Parquet with type fixes (percent strings to floats, dates parsed, high-missing/leaky columns dropped). Only ex-ante variables were kept: loan size and term, affordability (DTI, income), credit quality (FICO bands, inquiries, delinquencies), utilization/depth (revolving balance, open-to-buy, months since oldest tradeline), and lightly scrubbed text fields (loan and employment titles). Time-aware splits reserve the most recent months for validation/test to mimic production drift. Two lean baselines were trained: ElasticNet logistic regression with standardized numeric inputs and sparse encodings, and XGBoost with minimal scaling and capped text cardinality. Out-of-time AUC, quarterly stability, and calibration were the main yardsticks; interpretability leaned on SHAP and logistic coefficients.
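The curation steps above can be sketched with pandas. This is a minimal illustration, not the project's actual pipeline: column names (`int_rate`, `issue_d`, `loan_status`) follow the LendingClub CSV schema, and the status list and cutoff date are assumptions for the example.

```python
import pandas as pd

def clean_lending_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the type fixes described above: percent strings to floats,
    dates parsed, and a default flag built only from resolved statuses."""
    df = df.copy()
    # "13.56%" -> 13.56
    df["int_rate"] = df["int_rate"].str.rstrip("%").astype(float)
    # "Dec-2016" -> Timestamp
    df["issue_d"] = pd.to_datetime(df["issue_d"], format="%b-%Y")
    # Censor to resolved statuses, then binarize the target.
    resolved = df["loan_status"].isin(["Fully Paid", "Charged Off"])
    df = df.loc[resolved].copy()
    df["default"] = (df["loan_status"] == "Charged Off").astype(int)
    return df

def time_aware_split(df: pd.DataFrame, cutoff: pd.Timestamp):
    """Reserve the most recent vintages for out-of-time evaluation,
    mimicking production drift rather than shuffling randomly."""
    train = df[df["issue_d"] < cutoff]
    holdout = df[df["issue_d"] >= cutoff]
    return train, holdout
```

Writing the cleaned frame to Parquet (`df.to_parquet(...)`) then preserves these dtypes across sessions, which is the point of the CSV-to-Parquet step.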

What the data says

Risk climbs monotonically across the usual suspects: top-decile FICO loans default near 6% while the bottom decile approaches 20%; DTI and revolving utilization follow a similar 11→20% climb. Purpose and housing matter more than the headline suggests: small_business (~22%) and medical (~17%) run hotter than debt_consolidation, and renters default ~5 p.p. more than mortgage holders. Amounts cluster around $10k (p50) with a long tail; incomes center near $62k, making log scaling practical. Text is mostly present, enabling simple embeddings without heroic NLP.
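The decile analysis behind the FICO numbers above is a simple group-by over quantile bins. The sketch below uses synthetic data (the coefficients are invented purely so that higher FICO means lower default, echoing the reported 6%-to-20% spread) to show the mechanics:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50_000

# Synthetic stand-in for the loan book: default probability declines
# with FICO. The slope and intercept here are illustrative only.
fico = rng.normal(700, 40, n)
p_default = np.clip(0.45 - 0.0005 * fico, 0.02, 0.5)
default = rng.random(n) < p_default

df = pd.DataFrame({"fico": fico, "default": default})
# Decile 0 = lowest-FICO tenth, decile 9 = highest.
df["fico_decile"] = pd.qcut(df["fico"], 10, labels=False)
decile_rates = df.groupby("fico_decile")["default"].mean()
```

The same `qcut`/`groupby` pattern applied to DTI or revolving utilization yields the monotone curves described above.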

Model results and behavior

Despite their simplicity, the baselines work: the ElasticNet model lands at validation AUC 0.6738, and XGBoost adds a modest but meaningful lift to 0.6873 even without hyperparameter tuning. Quarterly AUC curves stay flat enough to trust; XGB holds a ~0.01 edge across cohorts. SHAP values and logistic coefficients tell a consistent story: affordability (DTI), credit quality (FICO), utilization, and credit depth/recency dominate. Calibration is already decent in the low-to-mid risk bands; Venn–Abers calibration nudges probabilities into tighter alignment.

Policy lens and economic signal

The historical grade/subgrade ladder is internally coherent: higher rates track higher default, almost linearly. The challenger scores reshuffle risk within each grade, improving lift and KS on the approved population. Once we make pricing burden explicit, the perceived gap between LR and XGB compresses, hinting that some “lift” is really repricing rather than newfound risk signal. Segment-level error checks surface intuitive hot spots (large requests, high utilization, thinner credit histories) without revealing brittle drivers, which argues for safe shadow deployment.
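The KS metric mentioned above measures how far apart the score distributions of defaulters and non-defaulters sit. A minimal implementation, assuming higher scores mean higher risk:

```python
import numpy as np

def ks_statistic(scores: np.ndarray, labels: np.ndarray) -> float:
    """Kolmogorov-Smirnov separation: the maximum gap between the
    empirical CDFs of defaulter and non-defaulter scores, evaluated
    along the sorted score axis. 0 = no separation, 1 = perfect."""
    order = np.argsort(scores)
    labels = labels[order].astype(float)
    cum_bad = np.cumsum(labels) / labels.sum()
    cum_good = np.cumsum(1.0 - labels) / (1.0 - labels).sum()
    return float(np.max(np.abs(cum_bad - cum_good)))
```

Comparing KS for the challenger score against the incumbent grade ladder, within each grade, is one way to quantify the "reshuffling" described above.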

Constraints and uncertainties

Selection bias looms large: outcomes reflect historic approvals and prices, not randomized experiments. We assume the latest issue date approximates the data extract date; 60‑month loans and post-2017 vintages are out of scope for now. Feature engineering is intentionally light, text is basic, and macro-shock robustness is untested. Early-delinquency proxies for live monitoring are not yet folded in.

Where to push next

Three tracks unlock the most value quickly.

Coverage and features: extend to 60‑month and post-2017 vintages with proper censoring; add richer engineered ratios/interactions, better text embeddings, and a v2 that explicitly tracks potentially unstable characteristics with monitoring hooks.

Productization and experimentation: move notebooks into scripts/CLI (data prep → feature store → training → scoring), wire MLflow for hyperparameter sweeps and registries, and start logging "first 2–3 payments" as early warning signals.

Causal leverage: design lightweight A/B or Bayesian sequential tests in mid-bands to isolate pricing and limit effects; treat payment-burden elasticity as an explicit assumption until randomized data arrives.

In parallel, deploy the current XGB score in shadow alongside the grade policy, recalibrating monthly with recent vintages and watching PSI/stability on the selected features.
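The PSI monitoring mentioned for shadow deployment is straightforward to implement. A common sketch, using quantile bins from the training population (the 0.1/0.25 alert thresholds in the comment are the usual industry rule of thumb, not project-specific):

```python
import numpy as np

def population_stability_index(expected: np.ndarray,
                               actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a reference (training) distribution and a recent
    scoring population. Rule of thumb: < 0.1 stable, 0.1-0.25 watch,
    > 0.25 investigate/recalibrate."""
    # Quantile bin edges from the reference, with open-ended tails so
    # out-of-range recent values still land in a bin.
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Floor the shares to avoid log(0) on empty bins.
    e_pct = np.clip(e_counts / e_counts.sum(), 1e-6, None)
    a_pct = np.clip(a_counts / a_counts.sum(), 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

Running this monthly per selected feature (and on the score itself) against the training reference gives the stability watch the shadow deployment needs.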

Credits

This project uses the public LendingClub loan dataset, made available by LendingClub through its public data releases. The data is used strictly for research and educational purposes, and LendingClub is not affiliated with or responsible for the analyses and conclusions presented here.
