https://vizmachinelearning.streamlit.app/
This project is an interactive, educational dashboard designed to help beginners understand how the Naive Bayes algorithm works. It visually guides you through the process of feeding data into the algorithm and exploring its predictions, making complex machine learning concepts more approachable and intuitive. Essentially, we’ve built a way to see Naive Bayes in action!
- Interactive Pipeline Stepper: Step through each preprocessing stage that prepares data for the Naive Bayes model.
- Dynamic Hyperparameter Tuning: Adjust key settings (such as variance smoothing and alpha) and observe their impact on model performance in real time.
- Real-Time Graph Updates: See how the Gaussian distribution changes as you tune hyperparameters, and analyze the model's accuracy and confusion matrix.
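For reference, the Gaussian curves and the variance-smoothing setting correspond to the Gaussian Naive Bayes decision rule below. This formulation follows scikit-learn's `GaussianNB`, where $\operatorname{Var}(x_k)$ is the variance of feature $k$ over the whole training set. Whether the app uses exactly this model is an assumption.

$$
\hat{y} = \arg\max_{c}\left[\log P(y=c) + \sum_{j=1}^{d}\log\mathcal{N}\!\left(x_j;\ \mu_{c,j},\ \sigma^2_{c,j} + \epsilon\right)\right],
\qquad
\epsilon = \texttt{var\_smoothing}\cdot\max_{k}\operatorname{Var}(x_k)
$$

$$
\mathcal{N}(x;\ \mu,\ \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$

Because $\epsilon$ is added to every class-conditional variance, raising `var_smoothing` widens and flattens each fitted bell curve.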
- Python
- Scikit-Learn
- Pandas
- Plotly
- Streamlit
- Clone the repository:
    git clone https://github.com/your-username/your-repository.git
    cd your-repository
- Create a virtual environment:
    python -m venv venv
- Activate the virtual environment:
  - On Windows:
      venv\Scripts\activate
  - On macOS/Linux:
      source venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Run the app:
    streamlit run app.py

- Load Data: Upload your own CSV dataset or select one of the built-in datasets.
- Tune Hyperparameters: Use the sliders in the left panel to adjust the variance smoothing and alpha parameters.
- View Insights: Observe the changes in the accuracy, Gaussian bell curve, and confusion matrix on the right panel as you adjust the hyperparameters.
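The slider workflow can be approximated offline by sweeping `var_smoothing` and recording test accuracy. This is a minimal sketch, not the app's code: it assumes scikit-learn's `GaussianNB` and uses the bundled Iris data as a stand-in for the app's built-in datasets; the split settings and swept values are arbitrary choices for illustration.

```python
# Hypothetical sketch: sweep var_smoothing the way the sidebar slider does.
# Iris is a stand-in dataset (assumption); the app's own datasets may differ.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)


def accuracy_for(var_smoothing):
    """Fit GaussianNB with the given var_smoothing and return test accuracy."""
    model = GaussianNB(var_smoothing=var_smoothing)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Evaluate a handful of slider positions, from almost no smoothing to heavy smoothing.
results = {v: accuracy_for(v) for v in (1e-9, 1e-3, 1e-1, 1.0, 10.0)}
for v, acc in results.items():
    print(f"var_smoothing={v:g}: accuracy={acc:.3f}")
```

Very large values inflate every variance until the per-feature likelihoods carry little information, so accuracy typically degrades at the high end of the sweep.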
- Functionality: The preprocessing pipeline is the core of the application. It takes raw data and transforms it into a format suitable for the Naive Bayes model.
- Stages: The pipeline includes the following stages:
- Feature Selection: Automatically selects relevant features from the input data.
- Categorical Encoding: Converts categorical features into numerical representations using One-Hot Encoding.
- Numerical Scaling: Standardizes numerical features to zero mean and unit variance (for example, with StandardScaler) so that features with larger raw values do not dominate the model.
- Under the Hood: Uses scikit-learn's `OneHotEncoder` and `StandardScaler` classes.
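The encoding and scaling stages can be chained with a scikit-learn `ColumnTransformer` inside a `Pipeline`. This is a sketch under stated assumptions: the column names (`age`, `income`, `city`, `label`) and the toy CSV are invented for illustration, and the feature-selection stage is omitted because its method is not specified.

```python
# Hypothetical sketch of the preprocessing pipeline described above.
# Column names and data are invented; feature selection is omitted.
from io import StringIO

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Stand-in for an uploaded CSV file.
csv_text = """age,income,city,label
25,40000,paris,0
32,52000,london,1
47,81000,paris,1
51,60000,berlin,0
38,58000,london,1
29,45000,berlin,0
"""
df = pd.read_csv(StringIO(csv_text))
X = df.drop(columns="label")
y = df["label"]

preprocess = ColumnTransformer(
    transformers=[
        # Standardize numeric columns to zero mean and unit variance.
        ("num", StandardScaler(), ["age", "income"]),
        # One-hot encode the categorical column; unseen categories become all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0.0,  # always return a dense array; GaussianNB requires dense input
)

model = Pipeline([
    ("preprocess", preprocess),
    ("nb", GaussianNB(var_smoothing=1e-9)),
])
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Wrapping both steps in one `Pipeline` keeps the scaler and encoder fitted only on training data whenever the model is refit, which matters once a train/test split is introduced.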
- Gaussian Distribution Plot: Shows the fitted Gaussian (bell) curve for the data and how `var_smoothing` changes its shape. Larger `var_smoothing` values add more variance, producing a wider, flatter curve.
- Accuracy Metrics: A numerical display of the model's accuracy, providing immediate feedback on the impact of hyperparameter tuning.
- Confusion Matrix: A table visualizing the model's classification performance by counting how many samples of each true class were assigned to each predicted class (for a binary task: true positives, true negatives, false positives, and false negatives).
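Both the accuracy figure and the confusion matrix are derived from the same pairs of true and predicted labels. The minimal sketch below computes them in plain Python; the labels are made-up example data, and the helper names are hypothetical rather than taken from the app.

```python
# Minimal sketch: accuracy and a confusion matrix from true/predicted labels.
# The label lists below are made-up example data.
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
acc = accuracy(y_true, y_pred)
print(cm)   # [[2, 1], [1, 2]]
print(acc)
```

With labels `[0, 1]` and class 1 treated as positive, the cells read `[[TN, FP], [FN, TP]]`, the same layout scikit-learn's `confusion_matrix` uses. Accuracy is the diagonal sum divided by the total, here 4/6.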
- Hyperparameter Sliders: Allows the user to directly adjust the values of `var_smoothing` and `alpha`.
- Real-Time Updates: Changes to the hyperparameters are reflected immediately in the visualizations and accuracy metrics.
- Step Details: Expanding a specific step in the pipeline displays more granular information about its function.
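The two sliders act on different parts of Naive Bayes. `var_smoothing` belongs to Gaussian Naive Bayes, while `alpha` is the additive (Laplace/Lidstone) smoothing used by count-based variants such as scikit-learn's `MultinomialNB` and `CategoricalNB`; `GaussianNB` has no `alpha`. The sketch below assumes the app follows those scikit-learn formulas, and its helper names are hypothetical.

```python
# Minimal sketch of what the two sliders control, assuming scikit-learn-style
# formulas. Helper names are hypothetical, not taken from the app.
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def smoothed_variance(var, var_smoothing, max_feature_var):
    """GaussianNB-style smoothing: add var_smoothing times the largest feature variance."""
    return var + var_smoothing * max_feature_var


def smoothed_probability(count, total, n_categories, alpha):
    """Additive smoothing: (count + alpha) / (total + alpha * n_categories)."""
    return (count + alpha) / (total + alpha * n_categories)


# var_smoothing: a larger value widens the curve, so its peak density drops.
for vs in (0.0, 0.5, 2.0):
    var = smoothed_variance(1.0, vs, max_feature_var=1.0)
    peak = gaussian_pdf(0.0, 0.0, var)
    print(f"var_smoothing={vs}: variance={var}, peak density={peak:.3f}")

# alpha: a category never seen in training still gets a nonzero probability.
for a in (0.0, 1.0):
    p = smoothed_probability(count=0, total=10, n_categories=3, alpha=a)
    print(f"alpha={a}: P(unseen category)={p:.3f}")
```

With `alpha=0`, a single unseen category drives the whole product of likelihoods to zero. Any positive `alpha` avoids this while keeping the smoothed probabilities summing to 1.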
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please see the CONTRIBUTING.md file for more information.