Machine Learning IPython Notebooks

  1. Classify spam/ham messages with data from the machine learning repository using scikit-learn's TfidfVectorizer
    • Load the data and preprocess it by removing stopwords and punctuation and lowercasing all characters.
    • Check the actual spam and ham counts in the data, and get the top words associated with spam and with ham.
    • Vectorize the text with TfidfVectorizer, since it performs better than CountVectorizer.
    • Fit RandomForestClassifier and MultinomialNB on the vectorized matrix and compare the results.
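
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the six toy messages, the helper names `preprocess` and `top_words`, and the classifier settings are all assumptions made for the example, and scikit-learn's built-in English stopword list stands in for whatever list the notebook uses.

```python
import string
from collections import Counter

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the real spam/ham dataset.
messages = [
    ("spam", "WINNER!! Claim your FREE prize now, call today."),
    ("spam", "Free entry: text WIN to claim a cash prize!"),
    ("spam", "Urgent! Your free prize is waiting, call now."),
    ("ham", "Are we still meeting for lunch today?"),
    ("ham", "I'll call you when I get home."),
    ("ham", "Can you send me the notes from class?"),
]


def preprocess(text):
    """Lowercase, strip punctuation, and drop English stopwords."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in text.split() if w not in ENGLISH_STOP_WORDS)


labels = [label for label, _ in messages]
texts = [preprocess(text) for _, text in messages]

# Step 2: class balance and the most frequent words per class.
label_counts = Counter(labels)


def top_words(label, n=3):
    """Return the n most common preprocessed words for one class."""
    words = Counter()
    for lab, text in zip(labels, texts):
        if lab == label:
            words.update(text.split())
    return words.most_common(n)


# Step 3: turn the cleaned text into a TF-IDF matrix.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Step 4: fit both classifiers and collect a score for each.
# Assumption: scoring on the training data only keeps the sketch short;
# a real comparison needs a held-out test split or cross-validation.
models = {
    "MultinomialNB": MultinomialNB(),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X, labels)
    scores[name] = accuracy_score(labels, model.predict(X))

print(label_counts)
print(top_words("spam"), top_words("ham"))
print(scores)
```

To try CountVectorizer instead, swap it in for `TfidfVectorizer` on the same `texts` and rerun the loop; both classes live in `sklearn.feature_extraction.text`.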