This project predicts financial risk for companies, focusing on their ability to meet debt obligations without defaulting. It analyzes financial data from 2015 and predicts default, which is defined from each company's net worth in 2016. Key methods include data preprocessing, feature engineering, machine learning, and visualization.
Problem Statement
Defaults in companies can lead to lower credit ratings, higher borrowing costs, and challenges in raising capital. The objective is to predict the likelihood of default using historical financial data to help stakeholders make informed decisions.
Datasets
Primary Dataset
Description: Contains 67 columns representing financial metrics like Net Worth, Total Debt, Revenue, and Profit.
Target Variable: Networth_Next_Year, used to derive the binary default variable.
Stock Price Data
Description: Weekly stock prices for companies from 2014 to 2020.
Project Workflow
1. Data Cleaning and Preprocessing
Column Renaming: Standardized column names (e.g., replaced spaces and special characters with underscores).
Outlier Treatment: Capped outliers using the 5th and 95th percentiles.
Missing Value Imputation: Used median imputation for filling missing values.
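The three preprocessing steps can be sketched in pandas as below. This is a minimal illustration under assumptions, not the project's code: the helper names `clean_column_name` and `preprocess`, the renaming regex, and the order (capping before imputation, as listed above) are all hypothetical choices. The optional `exclude` argument is also an assumption; it lets you keep a column such as `Networth_Next_Year` out of the capping step, because capping the raw target before deriving `default` could push negative values above zero when defaults are rare.

```python
import re

import numpy as np
import pandas as pd


def clean_column_name(name: str) -> str:
    """Hypothetical helper: collapse runs of spaces/special characters into one underscore."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name.strip()).strip("_")


def preprocess(df: pd.DataFrame, exclude=(), lower_q=0.05, upper_q=0.95) -> pd.DataFrame:
    """Hypothetical helper: rename columns, cap outliers at percentiles, then median-impute."""
    df = df.rename(columns=clean_column_name)
    # `exclude` uses the cleaned names (an assumption of this sketch).
    num_cols = [c for c in df.select_dtypes(include="number").columns if c not in exclude]
    # Per-column percentile bounds; pandas ignores NaN when computing quantiles.
    lower = df[num_cols].quantile(lower_q)
    upper = df[num_cols].quantile(upper_q)
    # axis=1 aligns the bound Series (indexed by column name) with the columns.
    df[num_cols] = df[num_cols].clip(lower=lower, upper=upper, axis=1)
    # Fill remaining gaps with each column's median, computed after capping.
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    return df


raw = pd.DataFrame({
    "Net worth": [1.0, 2.0, np.nan, 100.0],
    "Total Debt (Rs)": [5.0, np.nan, 7.0, 8.0],
})
clean = preprocess(raw)
print(clean)
```

On this toy frame the columns become `Net_worth` and `Total_Debt_Rs`, the extreme value 100.0 is capped at the 95th percentile, and each missing value is replaced by its column's median.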
2. Feature Engineering
Created the binary target variable default:
1: Networth_Next_Year < 0 (Defaulted).
0: Networth_Next_Year >= 0 (Non-Defaulted).
Addressed multicollinearity using Variance Inflation Factor (VIF).
Selected features based on univariate and bivariate analysis.
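The target derivation and the VIF screen can be sketched as follows. This is an illustrative version, not the project's implementation: `add_default_flag`, `vif`, and `drop_high_vif` are hypothetical names, VIF is computed from scratch with NumPy least squares (VIF_j = 1 / (1 - R²_j), regressing column j on the others plus an intercept), and dropping the single worst column per round is an assumed elimination strategy. The threshold of 5 matches the cutoff used in Approach A below.

```python
import numpy as np
import pandas as pd


def add_default_flag(df: pd.DataFrame, col: str = "Networth_Next_Year") -> pd.DataFrame:
    """Hypothetical helper: default = 1 when next-year net worth is negative, else 0."""
    out = df.copy()
    out["default"] = (out[col] < 0).astype(int)
    return out


def vif(X: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R^2_j), regressing column j on all other columns plus an intercept."""
    arr = X.to_numpy(dtype=float)
    n, k = arr.shape
    scores = {}
    for j in range(k):
        y = arr[:, j]
        A = np.column_stack([np.ones(n), np.delete(arr, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        scores[X.columns[j]] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(scores)


def drop_high_vif(X: pd.DataFrame, threshold: float = 5.0) -> list:
    """Repeatedly drop the single worst column until every VIF <= threshold."""
    cols = list(X.columns)
    while len(cols) > 1:
        scores = vif(X[cols])
        worst = scores.idxmax()
        if scores[worst] <= threshold:
            break
        cols.remove(worst)
    return cols
```

Removing one column per round, rather than every column above the threshold at once, matters because VIFs change after each removal: dropping one member of a collinear group usually brings the others back under the cutoff.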
3. Exploratory Data Analysis
Key Insights
Boxplots and Heatmaps:
Variables like Networth and Capital_Employed showed significant separation between default and non-default groups.
Correlation Matrix:
Highlighted multicollinearity among independent variables like Gross_Block, PBIDT, and Total_Debt.
| Variable | Correlation with Target |
|---|---|
| Networth | 0.85 |
| Capital_Employed | 0.78 |
| PBIDT | 0.72 |
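A ranking like the one above can be produced with pandas' `DataFrame.corrwith`. The sketch below is an assumption about how such a table might be computed (Pearson correlation against the binary `default` column); `target_correlations` is a hypothetical helper name, and sorting by absolute value is a design choice so that strong negative correlations rank alongside positive ones.

```python
import numpy as np
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str = "default") -> pd.Series:
    """Hypothetical helper: Pearson correlation of each numeric feature with the target, strongest first."""
    features = df.select_dtypes(include="number").drop(columns=[target])
    corr = features.corrwith(df[target])
    # Order by magnitude but keep the sign in the returned values.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```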
4. Model Building
4.1 Logistic Regression
Approach A: Removed highly correlated variables using VIF > 5.
Approach B: Used all variables, iteratively removing those with p-values > 0.05.
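Approach B (backward elimination by p-value) can be sketched as below. To stay self-contained, this version fits the logistic regression with Newton-Raphson and computes two-sided Wald p-values as erfc(|z| / √2); in practice a library such as statsmodels (`Logit(...).fit()`, whose results expose `pvalues`) would typically be used instead. `fit_logit` and `backward_eliminate` are hypothetical names, and the 0.05 cutoff is the one stated above.

```python
import math

import numpy as np
import pandas as pd


def fit_logit(X: np.ndarray, y: np.ndarray, n_iter: int = 50, tol: float = 1e-8):
    """Hypothetical helper: Newton-Raphson logistic fit; returns coefficients and standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        hessian = X.T @ (X * W[:, None])  # X^T W X
        step = np.linalg.solve(hessian, X.T @ (y - p))  # gradient is X^T (y - p)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def backward_eliminate(df: pd.DataFrame, features: list, target: str = "default",
                       alpha: float = 0.05) -> list:
    """Drop the least significant feature until every Wald p-value <= alpha."""
    kept = list(features)
    y = df[target].to_numpy(dtype=float)
    while kept:
        X = np.column_stack([np.ones(len(df)), df[kept].to_numpy(dtype=float)])
        beta, se = fit_logit(X, y)
        z = beta[1:] / se[1:]  # skip the intercept
        pvals = [math.erfc(abs(v) / math.sqrt(2.0)) for v in z]
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break
        kept.pop(worst)
    return kept
```

Refitting after every single removal mirrors the "iteratively removing" wording: a variable's p-value can change once a correlated variable leaves the model.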
4.2 Random Forest Classifier
Built a base model with default parameters.
Tuned hyperparameters using GridSearchCV.
Applied SMOTE to address class imbalance.
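SMOTE balances the classes by creating synthetic minority rows on line segments between a minority sample and one of its k nearest minority neighbours. The from-scratch NumPy sketch below illustrates that interpolation step only; `smote` is a hypothetical helper name, and a project like this one would more typically use `imblearn.over_sampling.SMOTE` and its `fit_resample` method.

```python
import numpy as np


def smote(X_minority: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: n_new synthetic rows interpolated toward random k-nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    k = min(k, n - 1)  # needs at least two minority samples
    # Pairwise Euclidean distances among minority samples only.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # random minority "parent" rows
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X[base] + gap * (X[nbr] - X[base])
```

Whichever implementation is used, oversampling is normally applied to the training split only, so the test metrics reflect the real class balance; when tuning with `GridSearchCV`, an `imblearn.pipeline.Pipeline` keeps the resampling inside each cross-validation fold.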
4.3 Linear Discriminant Analysis (LDA)
Explored LDA for classification but noted weaker performance compared to Random Forest.
5. Model Evaluation
Metrics Used:
Recall: Prioritized to minimize false negatives.
Precision: Evaluated to avoid false positives.
Accuracy: Measured overall correctness across both classes.
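The sketch below computes these metrics from confusion-matrix counts for the positive class (default = 1) and shows how the decision threshold moves the recall/precision trade-off. `classification_metrics` and the toy labels and scores are hypothetical; the project's actual threshold is not stated.

```python
import numpy as np


def classification_metrics(y_true, scores, threshold: float = 0.5) -> dict:
    """Hypothetical helper: accuracy, precision, recall and F1 for the positive class."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false alarms
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # missed defaults
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


y = [1, 1, 1, 0, 0, 0, 0, 0]
p = [0.9, 0.6, 0.3, 0.55, 0.2, 0.1, 0.4, 0.05]
for t in (0.5, 0.25):
    print(t, classification_metrics(y, p, t))
```

On this toy data, lowering the threshold from 0.5 to 0.25 raises recall from 2/3 to 1.0 while precision falls from 2/3 to 0.6, which is the kind of trade-off a recall-first evaluation accepts.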
Best Model: Random Forest with SMOTE
| Metric | Train Data | Test Data |
|---|---|---|
| Accuracy | 98% | 94% |
| Recall | 93% | 91% |
| Precision | 87% | 84% |
| F1-Score | 90% | 87% |
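The F1-score is the harmonic mean of precision P and recall R. Plugging the rounded precision and recall from the table back in reproduces the reported F1 values to within rounding:

```latex
F_1 = \frac{2PR}{P + R}

F_1^{\text{train}} = \frac{2(0.87)(0.93)}{0.87 + 0.93} = \frac{1.6182}{1.80} \approx 0.899 \approx 90\%

F_1^{\text{test}} = \frac{2(0.84)(0.91)}{0.84 + 0.91} = \frac{1.5288}{1.75} \approx 0.874 \approx 87\%
```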
6. Stock Price Analysis
Visualization:
Weekly stock price trends for companies like Infosys and SAIL.
Highlighted volatility using boxplots.
Returns and Risk Analysis
| Stock | Mean Return | Standard Deviation (Risk) |
|---|---|---|
| Shree Cement | 5.2% | 2.4% |
| Infosys | 4.8% | 1.8% |
| Idea Vodafone | -3.4% | 5.8% |
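A return/risk summary of this kind can be sketched from the weekly price table with pandas' `pct_change`. The source does not say exactly how returns were computed, so this is an assumption: simple period-over-period percentage returns, with the mean as "return" and the sample standard deviation as "risk". `return_risk` is a hypothetical helper, and each column is assumed to hold one company's prices.

```python
import pandas as pd


def return_risk(prices: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mean and std of period-over-period % returns, one row per stock."""
    returns = prices.pct_change().dropna(how="all")  # first row has no prior price
    summary = pd.DataFrame({
        "mean_return_pct": returns.mean() * 100,
        "std_dev_pct": returns.std() * 100,  # sample std (ddof=1)
    })
    return summary.sort_values("mean_return_pct", ascending=False)
```

Applied to weekly closing prices, both statistics are weekly figures; annualizing them would need an extra, explicitly stated convention.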
7. Key Results
Logistic Regression
Model B (Approach B: all variables, iteratively retaining only those with p-values < 0.05) outperformed Model A.
Random Forest
GridSearchCV-tuned model with SMOTE showed the highest performance.
Stock Analysis
High-risk stocks like Idea Vodafone and Jet Airways showed negative returns and high volatility.
Shree Cement and Infosys emerged as high-return, low-risk stocks.
Recommendations
Investment Strategy:
Focus on high-return, low-volatility stocks (e.g., Shree Cement, Infosys).
Avoid high-risk stocks with low returns and high volatility.
Model Deployment:
Use the Random Forest model with SMOTE for default prediction.
Regularly update the model with new data.
About
This project applies machine learning techniques to predict company defaults, optimizing the trade-off between recall (minimizing false negatives) and precision (minimizing false positives). Logistic Regression and Random Forest models were trained, with an emphasis on recall so that high-risk companies are reliably identified.