Financial Risk Analysis Project

Overview

This project predicts financial risk for companies, focusing on whether they will default on their debt obligations. It analyzes financial data from 2015 and predicts default based on each company's net worth in 2016. Key methods include data preprocessing, feature engineering, machine learning, and visualization.

Problem Statement

Defaults in companies can lead to lower credit ratings, higher borrowing costs, and challenges in raising capital. The objective is to predict the likelihood of default using historical financial data to help stakeholders make informed decisions.

Datasets

Primary Dataset

Description: Contains 67 columns representing financial metrics like Net Worth, Total Debt, Revenue, and Profit.
Target Variable: Networth_Next_Year, which is used to derive the binary default variable.

Stock Price Data

Description: Weekly stock prices for companies from 2014 to 2020.

Project Workflow

1. Data Cleaning and Preprocessing

Column Renaming: Standardized column names (e.g., replaced spaces and special characters with underscores).
Outlier Treatment: Capped outliers using the 5th and 95th percentiles.
Missing Value Imputation: Used median imputation for filling missing values.
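
The three cleaning steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the helper names `clean_column_name` and `preprocess`, the regular expression, and the toy data are all assumptions made for the example.

```python
import re

import numpy as np
import pandas as pd


def clean_column_name(name: str) -> str:
    """Replace runs of spaces and special characters with one underscore."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")


def preprocess(df: pd.DataFrame, lower: float = 0.05, upper: float = 0.95) -> pd.DataFrame:
    """Rename columns, cap numeric outliers, then impute medians (hypothetical helper)."""
    out = df.rename(columns=clean_column_name)
    numeric_cols = out.select_dtypes(include="number").columns
    # Cap each numeric column at its own 5th and 95th percentiles.
    low = out[numeric_cols].quantile(lower)
    high = out[numeric_cols].quantile(upper)
    out[numeric_cols] = out[numeric_cols].clip(lower=low, upper=high, axis=1)
    # Fill the remaining gaps with each column's median.
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


# Toy data for illustration only (not taken from the project dataset).
raw = pd.DataFrame({
    "Net Worth": [10.0, 12.0, np.nan, 11.0, 500.0],
    "Total Debt (Rs.)": [5.0, 6.0, 7.0, np.nan, 8.0],
})
clean = preprocess(raw)
```

Capping before imputing keeps the order listed above; because the median is robust to extreme values, either order would give similar results on most columns.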

2. Feature Engineering

Created the binary target variable default:
- 1: Networth_Next_Year < 0 (Defaulted).
- 0: Networth_Next_Year > 0 (Non-Defaulted).
Addressed multicollinearity using Variance Inflation Factor (VIF).
Selected features based on univariate and bivariate analysis.
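
As a rough sketch of how the default flag could be derived from Networth_Next_Year, assuming a pandas DataFrame: the sample values are invented, and treating an exact zero as non-default is an assumption, since the rule above only covers strictly negative and strictly positive net worth.

```python
import pandas as pd

# Invented sample values, for illustration only.
df = pd.DataFrame({"Networth_Next_Year": [120.5, -30.2, 0.0, 45.0, -1.1]})

# 1 = defaulted (negative net worth next year), 0 = otherwise.
# Assumption: an exact zero is treated as non-default here.
df["default"] = (df["Networth_Next_Year"] < 0).astype(int)

# Share of defaulters; useful for spotting class imbalance early.
default_rate = df["default"].mean()
```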

3. Exploratory Data Analysis

Key Insights

Boxplots and Heatmaps:
- Variables like Networth and Capital_Employed showed significant separation between default and non-default groups.
Correlation Matrix:
- Highlighted multicollinearity among independent variables like Gross_Block, PBIDT, and Total_Debt.

Variable	Correlation with Target
Networth	0.85
Capital_Employed	0.78
PBIDT	0.72

4. Model Building

4.1 Logistic Regression

Approach A: Removed highly collinear variables, dropping those with VIF > 5.
Approach B: Used all variables, iteratively removing those with p-values > 0.05.
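
A VIF filter of the kind used in Approach A might look like the sketch below. It computes each VIF as the corresponding diagonal entry of the inverse correlation matrix, which is equivalent to 1 / (1 - R²) from regressing that variable on the others, rather than calling a statistics library. The function names, the one-at-a-time dropping strategy, and the synthetic data are assumptions; the notebook may proceed differently.

```python
import numpy as np
import pandas as pd


def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=X.columns)


def drop_high_vif(X: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """Drop the column with the largest VIF until every VIF is <= threshold."""
    X = X.copy()
    while X.shape[1] > 1:
        vif = vif_table(X)
        if vif.max() <= threshold:
            break
        X = X.drop(columns=vif.idxmax())
    return X


# Synthetic data: Total_Debt is built to be nearly collinear with Gross_Block.
rng = np.random.default_rng(0)
n = 200
gross_block = rng.normal(size=n)
demo = pd.DataFrame({
    "Gross_Block": gross_block,
    "Total_Debt": gross_block + rng.normal(scale=0.1, size=n),
    "PBIDT": rng.normal(size=n),
})
reduced = drop_high_vif(demo)
```

Dropping one variable at a time and recomputing matters because removing a single collinear column usually lowers the VIF of its partners.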

4.2 Random Forest Classifier

Built a base model with default parameters.
Tuned hyperparameters using GridSearchCV.
Applied SMOTE to address class imbalance.

4.3 Linear Discriminant Analysis (LDA)

Explored LDA for classification but noted weaker performance compared to Random Forest.

5. Model Evaluation

Metrics Used:
- Recall: Prioritized to minimize false negatives.
- Precision: Evaluated to avoid false positives.
- Accuracy: Provided overall performance.
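
To make the three metrics concrete, the sketch below computes them from confusion-matrix counts in plain Python. The function name and the ten example labels are hypothetical, with 1 standing for default.

```python
def classification_metrics(y_true, y_pred):
    """Return recall, precision, and accuracy for binary labels (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        # Share of actual defaulters that were caught (penalizes false negatives).
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of predicted defaulters that were real (penalizes false positives).
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical labels: 3 actual defaulters among 10 companies.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case accuracy is 0.8 even though one in three defaulters is missed, which is why recall is the more informative metric on imbalanced default data.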

Best Model: Random Forest with SMOTE

Metric	Train Data	Test Data
Accuracy	98%	94%
Recall	93%	91%
Precision	87%	84%
F1-Score	90%	87%
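
The F1-scores in the table are the harmonic mean of the precision and recall reported alongside them, which can be checked directly. The function name below is chosen for the example only.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the table above.
train_f1 = f1_from(0.87, 0.93)
test_f1 = f1_from(0.84, 0.91)
```

Rounded to whole percentages, these come to 90% and 87%, matching the reported values.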

6. Stock Price Analysis

Visualization:
- Weekly stock price trends for companies like Infosys and SAIL.
- Highlighted volatility using boxplots.

Returns and Risk Analysis

Stock	Mean Return	Standard Deviation (Risk)
Shree Cement	5.2%	2.4%
Infosys	4.8%	1.8%
Idea Vodafone	-3.4%	5.8%
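
A table like the one above can be produced from weekly prices by taking the mean and standard deviation of weekly returns. The sketch below uses log returns, which is an assumption since the return definition is not stated here; the tickers and prices are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented weekly closing prices for two hypothetical stocks.
prices = pd.DataFrame(
    {
        "Stock_A": [100.0, 102.0, 101.0, 105.0],
        "Stock_B": [50.0, 45.0, 47.0, 40.0],
    },
    index=pd.date_range("2014-01-06", periods=4, freq="W-MON"),
)

# Weekly log returns: ln(P_t / P_{t-1}); the first row has no prior price.
returns = np.log(prices).diff().dropna()

# Mean return and standard deviation (risk) per stock.
summary = pd.DataFrame({
    "mean_return": returns.mean(),
    "risk": returns.std(),
})
```

Sorting `summary` by `risk`, or plotting `mean_return` against `risk`, gives a quick view of which stocks pair higher returns with lower volatility.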

7. Key Results

Logistic Regression

Model B (all variables, pruned to those with p-values < 0.05) outperformed Model A (VIF-filtered variables).

Random Forest

GridSearchCV-tuned model with SMOTE showed the highest performance.

Stock Analysis

High-risk stocks like Idea Vodafone and Jet Airways showed negative returns and high volatility.
Shree Cement and Infosys emerged as high-return, low-risk stocks.

Recommendations

Investment Strategy:
- Focus on high-return, low-volatility stocks (e.g., Shree Cement, Infosys).
- Avoid stocks that pair low or negative returns with high volatility (e.g., Idea Vodafone).
Model Deployment:
- Use the Random Forest model with SMOTE for default prediction.
- Regularly update the model with new data.
