Using Machine Learning to predict cervix cancer risk
ML is on its way to becoming a powerful diagnostic tool. Until then, it can be used to decide where to focus screening efforts. In this project I try to predict the risk of a positive cervical cancer biopsy using survey information about potential risk factors. The dataset comprises demographic information, habits, and historical medical records of 858 Venezuelan patients.
There are two directories: Code and Data.
The Code directory contains four Jupyter notebooks:
cervix_project_1. Data_preparation. Data splitting, data cleaning, and a LogisticRegression model
cervix_project_2.Model_design_1. Testing a neural network model to predict the results of a biopsy
cervix_project_2.Model_design_2. Setting the initial bias before training the model
cervix_project_2.Model_design_3. Using SMOTE to balance the dataset
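The first notebook starts with data splitting. Because positive biopsies are much rarer than negative ones, a stratified split keeps the positive rate the same in the training and test sets, so the rare class is not accidentally concentrated in one of them. Below is a minimal NumPy sketch of that idea; the function name `stratified_split` and the toy label counts are hypothetical, and the notebook itself may well use a library routine instead.

```python
import numpy as np


def stratified_split(y, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps its share in both parts."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    train_parts, test_parts = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)  # row indices belonging to this class
        rng.shuffle(idx)  # random order within the class
        n_test = int(round(test_fraction * len(idx)))
        test_parts.append(idx[:n_test])
        train_parts.append(idx[n_test:])
    return np.sort(np.concatenate(train_parts)), np.sort(np.concatenate(test_parts))


# Hypothetical labels: 50 positive and 808 negative biopsies (858 rows in total)
y = np.array([1] * 50 + [0] * 808)
train_idx, test_idx = stratified_split(y, test_fraction=0.2)
print(len(train_idx), len(test_idx))
print(y[train_idx].mean(), y[test_idx].mean())
```

With scikit-learn, the equivalent is `train_test_split(X, y, test_size=0.2, stratify=y)` from `sklearn.model_selection`.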
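The second model notebook sets an initial bias. A common recipe for imbalanced binary classification (used, for example, in TensorFlow's imbalanced-data tutorial; whether the notebook follows it exactly is an assumption) initializes the bias of the sigmoid output unit so that the untrained model already predicts the base rate instead of 0.5. Requiring sigmoid(b0) = pos / (pos + neg) gives b0 = log(pos / neg). The sketch below checks this; the class counts and function names are hypothetical.

```python
import math


def initial_output_bias(n_pos, n_neg):
    """Bias b0 such that sigmoid(b0) equals the positive rate n_pos / (n_pos + n_neg)."""
    return math.log(n_pos / n_neg)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical class counts in the training set
n_pos, n_neg = 40, 646
b0 = initial_output_bias(n_pos, n_neg)
print(f"initial bias b0 = {b0:.3f}")
print(f"untrained output = {sigmoid(b0):.4f}, positive rate = {n_pos / (n_pos + n_neg):.4f}")
```

If the network is built with Keras (an assumption), the value can be passed to the final `Dense(1, activation="sigmoid")` layer as `bias_initializer=tf.keras.initializers.Constant(b0)`. Starting from the base rate avoids spending the first epochs learning that most biopsies are negative.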
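The third model notebook uses SMOTE (Synthetic Minority Over-sampling Technique), which creates new minority-class rows by picking a minority sample, choosing one of its k nearest minority-class neighbours, and placing a synthetic point at a random position on the segment between them. The sketch below is a from-scratch NumPy version for illustration only; the notebook most likely relies on a library implementation, and the helper names, the assumption that class 1 is the minority, and the toy data are all hypothetical. Oversampling is normally applied to the training split only, so that synthetic copies of patients never leak into the test set.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows from minority-class rows X_min (SMOTE-style)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority rows only
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest minority neighbours per row
    base = rng.integers(0, n, size=n_new)  # random minority row for each synthetic sample
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]  # one of its k neighbours
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def balance_with_smote(X, y, k=5, seed=0):
    """Oversample class 1 (assumed minority) until both classes have equal counts."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_new = int((y == 0).sum() - (y == 1).sum())
    X_syn = smote(X[y == 1], n_new, k=k, seed=seed)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return X_bal, y_bal


# Toy training data: 20 positive and 100 negative rows with 3 numeric features
rng = np.random.default_rng(1)
X_train = rng.normal(size=(120, 3))
y_train = np.array([1] * 20 + [0] * 100)
X_bal, y_bal = balance_with_smote(X_train, y_train)
print(X_bal.shape, np.bincount(y_bal))
```

With imbalanced-learn, the same step is `SMOTE(k_neighbors=5).fit_resample(X_train, y_train)`. Because many risk-factor columns are 0/1 indicators, plain interpolation produces fractional values for them; imbalanced-learn's `SMOTENC` is a variant designed for mixed categorical and numeric features.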
The Data directory contains:
kag_risk_factors_cervical_cancer.csv. CSV file downloaded from Kaggle (https://www.kaggle.com/loveall/cervical-cancer-risk-classification). The data were collected in Venezuela and originally published in the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Cervical+cancer+%28Risk+Factors%29)
Cervix_cancer_data_profiling_report.html. Data profiling report of the training set
Cervix_cancer_data_profiling_report_post_processing.html. Data profiling report of the training set after data cleaning
Intermediate output files (CSV files)
