A comprehensive repository dedicated to recreating popular machine learning models from scratch in Python, designed to facilitate deep understanding through practical examples.
Explore the Documentation » · View Demo · Report Bug · Request Feature
This repository is aimed at providing an educational resource for those seeking to understand the inner workings of machine learning algorithms. While these implementations may not match the efficiency or completeness of mature libraries like scikit-learn, the focus here is on simplicity and clarity, allowing users to see the mechanics behind each algorithm.
Each subfolder contains a demo.ipynb notebook that demonstrates a practical application of the corresponding algorithm. This includes loading a dataset, performing basic exploratory data analysis (EDA), and fitting the model to the data.
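
As a rough illustration of that notebook workflow (load data, explore it, fit a model), here is a minimal sketch in plain Python. The `KNNClassifier` class, the synthetic two-blob dataset, and all variable names are assumptions made for this example; they are not taken from the repository's notebooks, which may be organized differently.

```python
import math
import random
from collections import Counter


class KNNClassifier:
    """Hypothetical from-scratch k-nearest-neighbors classifier (illustrative only)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: "fitting" only stores the training data.
        self.X_train = list(X)
        self.y_train = list(y)
        return self

    def predict(self, X):
        return [self._predict_one(x) for x in X]

    def _predict_one(self, x):
        # Sort training points by Euclidean distance to x, then vote among the k closest.
        distances = sorted(
            (math.dist(x, x_train), label)
            for x_train, label in zip(self.X_train, self.y_train)
        )
        k_labels = [label for _, label in distances[: self.k]]
        return Counter(k_labels).most_common(1)[0][0]


# 1. Load a dataset (assumption: two synthetic, well-separated 2-D blobs).
random.seed(0)
X = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(20)]
X += [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(20)]
y = [0] * 20 + [1] * 20

# 2. Basic EDA: class balance and per-feature means.
class_counts = Counter(y)
feature_means = [sum(col) / len(col) for col in zip(*X)]
print("class counts:", dict(class_counts))
print("feature means:", feature_means)

# 3. Fit the model and predict on unseen points.
model = KNNClassifier(k=3).fit(X, y)
predictions = model.predict([(0.1, -0.2), (4.8, 5.3)])
print("predictions:", predictions)
```

The class follows the familiar `fit`/`predict` shape only because that convention keeps the three workflow stages easy to tell apart.
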
In addition to the code, a series of articles has been published that provides detailed explanations of each model. These articles cover the mathematical foundations, use cases, assumptions, advantages, and limitations of each algorithm, as well as a thorough breakdown of the corresponding Python implementation. For a deeper understanding, I highly recommend referring to these articles.
Here is a collection of articles detailing the theory and implementation of each model in this repository:
Foundational algorithms for classification and regression.
| Model | Description | Article |
|---|---|---|
| KNN | K-Nearest Neighbors algorithm for classification and regression. | Read |
| Naive Bayes | Probabilistic classifier based on applying Bayes' theorem. | Read |
| ID3 Decision Tree | Iterative Dichotomiser 3 algorithm for decision tree construction. | Read |
| CART | Classification and Regression Trees for predictive modeling. | Read |
| SVC | Support Vector Classifier for finding optimal hyperplanes. | Read |
Powerful techniques combining multiple models.
| Model | Description | Article |
|---|---|---|
| Random Forest | Ensemble learning method using multiple decision trees. | Read |
| AdaBoost | Adaptive Boosting ensemble method focusing on difficult samples. | Read |
| XGBoost | Gradient boosting of decision trees, in the style of the XGBoost library. | Read |
Discovering hidden patterns in unlabeled data.
| Model | Description | Article |
|---|---|---|
| K-Means Clustering | Unsupervised learning algorithm for partitioning data into k clusters. | Read |
| PCA | Dimensionality reduction using Principal Component Analysis. | Read |
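
To give a flavor of what "from scratch" can mean for the K-Means entry above, here is a minimal sketch of Lloyd's algorithm in plain Python. The function names, the random initialization from data points, and the toy dataset are assumptions for this example, not the repository's implementation.

```python
import math
import random


def assign_labels(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(n_iters):
        labels = assign_labels(points, centroids)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, assign_labels(points, centroids)


# Toy data (assumption): two tight groups of three points each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.
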
The mathematical engines behind successful model training.
| Model | Description | Article |
|---|---|---|
| SGD | Stochastic Gradient Descent optimization algorithm. | Read |
| Adam Optimizer | Adaptive Moment Estimation optimization algorithm. | Read |
| Nadam Optimizer | Nesterov-accelerated Adaptive Moment Estimation. | Read |
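
As a small illustration of the optimizers listed above, the following sketch applies the standard Adam update rule to a one-dimensional quadratic. The function name `adam_minimize`, the objective, and the learning rate of 0.1 are assumptions chosen for this example; `beta1`, `beta2`, and `eps` use the commonly cited default values. It is not the repository's implementation.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=500):
    """Minimize a scalar function given its gradient, using the Adam update rule."""
    x = x0
    m = 0.0  # first-moment estimate (moving average of gradients)
    v = 0.0  # second-moment estimate (moving average of squared gradients)
    for t in range(1, n_steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction compensates for m and v being initialized at zero.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Objective (assumption): f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

Plain SGD would replace the whole update with `x -= lr * g`; Adam differs by scaling each step with running estimates of the gradient's first and second moments.
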
The building blocks of modern AI.
| Model | Description | Article |
|---|---|---|
| Neural Networks | Foundational architecture of deep learning models. | Read |
| Batch Normalization | Technique to improve training stability by normalizing layer inputs. | Read |
| Fine-Tuning DNNs | Strategies for adapting pre-trained deep neural networks. | Read |
Specialized architectures for vision, sequence, and recursive processing.
| Model | Description | Article |
|---|---|---|
| CNN | Convolutional Neural Networks for processing grid-like data (images). | Read |
| Deep CNN (AlexNet) | Deep Convolutional Neural Network architecture. | Read |
| RNN | Recurrent Neural Networks for sequential data processing. | Read |
| LSTM | Long Short-Term Memory networks for processing sequential data. | Read |
| Recursive Language Models | Novel approach to maintain infinite context via recursion. | Read |
To get started with the project locally, follow these steps:
Ensure you have Python installed on your machine. You can install the necessary packages using either pip or conda.
1. Clone the repository:

   ```sh
   git clone https://github.com/cristianleoo/models-from-scratch-python.git
   cd models-from-scratch-python
   ```

2. Create and activate a virtual environment. It is recommended to use a virtual environment to manage dependencies.

   ```sh
   # Create virtual environment
   python3 -m venv venv
   # Activate it (Mac/Linux)
   source venv/bin/activate
   # Or on Windows:
   # venv\Scripts\activate
   ```

3. Install required packages:

   ```sh
   pip install -r requirements.txt
   ```

4. Register the Jupyter kernel. To ensure your notebooks use the correct environment, register it as a kernel:

   ```sh
   python -m ipykernel install --user --name=models-from-scratch --display-name "Python (Models From Scratch)"
   ```

   Now, when you open a notebook, select Python (Models From Scratch) as your kernel.

5. (Optional) Run tests. Validate the setup by running the unit tests:

   ```sh
   python -m unittest discover
   ```
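
For context on the optional test step: `python -m unittest discover` collects `unittest.TestCase` subclasses from files matching `test*.py` by default. The sketch below shows the general shape of such a test. The `euclidean_distance` helper and the test class are hypothetical stand-ins written for this example; the repository's actual tests and module layout may differ.

```python
import unittest


def euclidean_distance(a, b):
    """Hypothetical helper standing in for a function under test."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class TestEuclideanDistance(unittest.TestCase):
    def test_known_triangle(self):
        # A 3-4-5 right triangle gives a distance of exactly 5.
        self.assertAlmostEqual(euclidean_distance((0, 0), (3, 4)), 5.0)

    def test_identical_points(self):
        # The distance from a point to itself is zero.
        self.assertEqual(euclidean_distance((1, 2), (1, 2)), 0.0)
```

Saved as a file such as `test_distance.py` (a hypothetical name), this class would be picked up by the discovery command without any extra configuration.
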
Now you're ready to explore the repository and learn how these machine learning models work under the hood!
Feel free to reach out via GitHub issues for any bugs, feature requests, or suggestions.