GitHub - Finterly/Machine-Learning-Classification-of-Mutation: Machine Learning classification models

Classifying Disease Mutation data

Created machine learning models in R and Python to classify curves of disease data as mutation or no mutation.

The data

The data set has 7 predictor variables and a binary response variable: True (mutation) or False (no mutation).

R workflow

R results

All models show promising mean accuracy of over 90%.
Random Forest performs best, with a mean accuracy of 0.99.
The Yeo-Johnson transformation gave the best results.
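For readers following the Python side, the combination of a Yeo-Johnson transformation and a Random Forest can be sketched with scikit-learn. This is a minimal illustration, not the R code used for the results above: the data is synthetic (generated with `make_classification` as a stand-in for the 7 predictors), and the hyperparameters and 5-fold setup are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Synthetic stand-in data: 7 predictors and a binary response.
# This is NOT the project's data set.
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=2, random_state=0
)

# Yeo-Johnson transform followed by a Random Forest. Wrapping both in a
# pipeline means the transform is re-fitted on each training fold only.
pipeline = make_pipeline(
    PowerTransformer(method="yeo-johnson"),
    RandomForestClassifier(n_estimators=200, random_state=0),
)

# Mean accuracy over 5 cross-validation folds (fold count is an assumption).
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"mean accuracy: {mean_accuracy:.3f}")
```

Keeping the transformation inside the pipeline avoids leaking information from the held-out fold into the preprocessing step.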

Python workflow

Exploratory analysis in Python

In both the training and validation sets, some variables appear to be correlated.
Variation is greater in certain variables.
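Checks of this kind can be sketched with pandas. The column names (`v1`, `v2`, `v3`), the synthetic values, and the 0.8 correlation threshold below are all hypothetical and only illustrate the idea; they are not taken from the project's notebooks.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=200)

# Hypothetical data frame: v2 is built from v1 so the two are strongly
# correlated, and v3 has a much larger spread than the others.
df = pd.DataFrame({
    "v1": base,
    "v2": base * 2.0 + rng.normal(scale=0.1, size=200),
    "v3": rng.normal(scale=5.0, size=200),
})

corr = df.corr()        # pairwise Pearson correlation matrix
variances = df.var()    # per-variable sample variance

# List each pair of variables whose absolute correlation exceeds the
# (assumed) threshold, visiting every pair once.
threshold = 0.8
columns = list(corr.columns)
pairs = [
    (a, b)
    for i, a in enumerate(columns)
    for b in columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]

print(pairs)
print(variances.sort_values(ascending=False))
```

Running the same check separately on the training and validation sets shows whether both exhibit the same correlated pairs.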

Preliminary model comparison Python
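A preliminary comparison of several classifiers might look like the sketch below. It uses scikit-learn with default or near-default settings on synthetic data; the choice of models, the 5-fold cross-validation, and the data itself are assumptions for illustration, not the notebook's actual code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 7 predictors (not the project's data).
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=2, random_state=0
)

# Candidate models, keyed by a display name.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbor": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Mean cross-validated accuracy per model.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print models from highest to lowest mean accuracy.
for name, accuracy in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {accuracy:.3f}")
```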

Final results (Python and R)

Visualized final predictions as a binary heat map. Of the 100 test samples, 99 curves were classified correctly, with the models in close agreement.
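A binary heat map of this kind can be built by stacking each model's predictions into a models-by-samples matrix. The sketch below uses three hypothetical models and six made-up predictions purely for illustration; the plotting helper assumes matplotlib is installed and is not necessarily how the original figure was produced.

```python
import numpy as np

# Hypothetical predictions (True = mutation) from three models on six samples.
predictions = {
    "model_a": np.array([1, 0, 1, 1, 0, 0], dtype=bool),
    "model_b": np.array([1, 0, 1, 0, 0, 0], dtype=bool),
    "model_c": np.array([1, 0, 1, 1, 0, 0], dtype=bool),
}

model_names = list(predictions)
# Rows are models, columns are test samples, values are 0 or 1.
matrix = np.vstack([predictions[name] for name in model_names]).astype(int)

# A sample is unanimous when every model predicts the same class for it.
unanimous = matrix.min(axis=0) == matrix.max(axis=0)
agreement_rate = unanimous.mean()
print(f"models agree on {unanimous.sum()} of {matrix.shape[1]} samples")


def plot_binary_heatmap(matrix, model_names):
    """Draw the 0/1 prediction matrix as a two-colour heat map."""
    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap

    fig, ax = plt.subplots()
    ax.imshow(matrix, cmap=ListedColormap(["white", "black"]),
              aspect="auto", vmin=0, vmax=1)
    ax.set_yticks(list(range(len(model_names))))
    ax.set_yticklabels(model_names)
    ax.set_xlabel("test sample")
    return fig
```

Columns that are not a single solid colour mark the samples on which the models disagree.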

Best performing models (in no particular order):

Random Forest
Naive Bayes
XGBoost Classifier
Gradient Boosting Classifier
AdaBoost
K-Nearest Neighbor
Voting Classifier
Bagging Classifier
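As an illustration of how a Voting Classifier combines several of the models listed above, here is a minimal scikit-learn sketch. The choice of base models, soft voting, the synthetic data, and the 100-sample hold-out split are assumptions; the project's actual ensemble configuration is not shown in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data with 7 predictors (not the project's data).
X, y = make_classification(
    n_samples=400, n_features=7, n_informative=5, n_redundant=2, random_state=0
)

# Hold out 100 samples for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=100, random_state=0, stratify=y
)

# Soft voting averages the predicted class probabilities of the base models.
voter = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",
)
voter.fit(X_train, y_train)

test_accuracy = voter.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")
```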

Lower-performing models (better results may be possible with more tuning):

Decision Tree (would have been nice and interpretable)
Support Vector Machine Classifier
Logistic Regression

Folders and files

img
Exploratory_Data_Analysis.ipynb
ML_Model_Building.ipynb
README.md
my_utils.ipynb
r_code.R
r_notebook.Rmd
