diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 00000000..b41fc756
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,14 @@
+repos:
+  - repo: https://github.com/JakobKlotz/md-snakeoil
+    rev: v0.1.7
+    hooks:
+      - id: snakeoil
+
+  - repo: https://github.com/hukkin/mdformat
+    rev: 1.0.0
+    hooks:
+      - id: mdformat
+        additional_dependencies:
+          - mdformat-mkdocs
+        args: [--wrap, "79"]
+          
\ No newline at end of file
diff --git a/docs/data-science/algorithms/index.md b/docs/data-science/algorithms/index.md
index d7d2eb5a..b1bf07a0 100644
--- a/docs/data-science/algorithms/index.md
+++ b/docs/data-science/algorithms/index.md
@@ -1,37 +1,36 @@
 # Introduction
 
-With extensive data preparation knowledge, we can tackle the next
-big part of the course: algorithms. An algorithm is a
+With extensive data preparation knowledge, we can tackle the next big part of
+the course: algorithms. An algorithm is
 
 > a set of mathematical instructions or rules that, especially if given to a
 > computer, will help to calculate an answer to a problem.
-> 
+>
 > [Cambridge Dictionary](https://dictionary.cambridge.org/de/worterbuch/englisch/algorithm)
 
-In data science/machine learning, algorithms are used to solve problems, 
-such as modelling data to make predictions for unseen data, or clustering data 
-to find patterns.
+In data science/machine learning, algorithms are used to solve problems, such
+as modelling data to make predictions for unseen data, or clustering data to
+find patterns.
 
-The consecutive chapters will introduce you to common algorithms, like 
-linear and logistic regression, decision trees and k-means clustering. We 
-will explore the theory as well as practical examples. First, we establish two 
-main concepts in machine learning: supervised and unsupervised learning.
+The following chapters will introduce you to common algorithms, like linear
+and logistic regression, decision trees and k-means clustering. We will explore
+the theory as well as practical examples. First, we establish two main concepts
+in machine learning: supervised and unsupervised learning.
 
 ## Supervised Learning
 
-Supervised learning is a type of machine learning where algorithms learn from 
-^^labeled^^ training data to make predictions on new, unseen data. The term 
-"supervised" comes from the idea that the algorithm is guided by a 
-"supervisor" (the labeled data) that provides the correct answers during
-training.
+Supervised learning is a type of machine learning where algorithms learn from
+^^labeled^^ training data to make predictions on new, unseen data. The term
+"supervised" comes from the idea that the algorithm is guided by a "supervisor"
+(the labeled data) that provides the correct answers during training.
 
 In supervised learning, each training example consists of:
 
-- Input features (\(X\)): The characteristics or attributes we use to make 
+- Input features (\(X\)): The characteristics or attributes we use to make
     predictions
 - Target variable (\(y\)): The correct output we want to predict
 
-The algorithm learns the relationship between inputs (\(X\)) and outputs 
+The algorithm learns the relationship between inputs (\(X\)) and outputs
 (\(y\)), creating a model that can then (hopefully!) generalize to new data.
 
 ### Example
@@ -60,49 +59,49 @@ new_apartment = [[150, 5]]
 predicted_price = model.predict(new_apartment)
 ```
 
-1. Underscores can be used as visual separators in numeric literals
-   to improve readability. They have no effect on the value of the number. For
-   example, `#!python 500_000` is the same as `#!python 500000`.
+1. Underscores can be used as visual separators in numeric literals to improve
+    readability. They have no effect on the value of the number. For example,
+    `#!python 500_000` is the same as `#!python 500000`.
 
 For each new observation, we can use the trained model to predict the price.
-The apartment with 150m² and 5 rooms has a predicted price of `#!python 
-775000`.
+The apartment with 150m² and 5 rooms has a predicted price of
+`#!python 775000`.
 
 ???+ info
 
-    Whether this estimate is actually close to reality depends on the
-    quality of the model and its underlying data. Later, we will 
-    discuss how to measure a model's quality.
+    Whether this estimate is actually close to reality depends on the quality of
+    the model and its underlying data. Later, we will discuss how to measure a
+    model's quality.
 
----
+______________________________________________________________________
 
 ### Classification vs. Regression
 
 Supervised learning encapsulates ^^both^^ classification and regression tasks.
 
-``` mermaid
+```mermaid
 graph LR
   A[Supervised Learning] --> B[Classification];
   A --> C[Regression];
 ```
 
----
+______________________________________________________________________
 
 #### Classification
 
 Classification problems involve predicting discrete categories or labels. The
 output is always one of a fixed set of classes. For instance, in binary
-classification, the model decides between two possibilities. 
+classification, the model decides between two possibilities.
 
-For example, the Portuguese retail bank data can be used to predict 
-whether a customer would subscribe to a term deposit. The target variable is 
-binary: yes or no.
+For example, the Portuguese retail bank data can be used to predict whether a
+customer would subscribe to a term deposit. The target variable is binary: yes
+or no.
 
-On the other hand, multiclass classification handles three or more categories 
-(like classifying animals in photos :fontawesome-solid-arrow-right: dog, 
-cat, dolphin, tiger, elephant, etc.).
+On the other hand, multiclass classification handles three or more categories
+(like classifying animals in photos :fontawesome-solid-arrow-right: dog, cat,
+dolphin, tiger, elephant, etc.).
 
----
+______________________________________________________________________
 
 #### Regression
 
@@ -112,18 +111,19 @@ numerical value along a continuous spectrum. These models work by finding
 patterns in the data to estimate a mathematical function that best describes
 the relationship between input features and the target variable.
 
-For instance the example, predicting the price of an apartment based on 
-its size and the number of rooms is a regression task.
+For instance, predicting the price of an apartment based on its size and the
+number of rooms (as in the example above) is a regression task.
 
----
+______________________________________________________________________
 
 #### Examples
 
 <div class="grid cards" markdown>
 
--   __Classification__
+- __Classification__
+
+    ______________________________________________________________________
 
-    ---
     Predicting a ^^categorical^^ target variable:
 
     - Spam or not spam
@@ -133,11 +133,12 @@ its size and the number of rooms is a regression task.
     - Image classification (cat, dog, dolphin, etc.)
     - ...
 
--   __Regression__
+- __Regression__
+
+    ______________________________________________________________________
 
-    ---
     Predicting a ^^continuous^^ target variable:
-  
+
     - Apartment prices (like in the example above)
     - Temperature
     - Sales revenue
@@ -146,18 +147,18 @@ its size and the number of rooms is a regression task.
 </div>
 
 ???+ info
-  
-    No matter if you're dealing with a classification or regression task, the 
-    key to successful supervised learning lies in having high-quality labeled
-    data and selecting appropriate features (variables) that have predictive 
-    power for the target variable.
+
+    No matter if you're dealing with a classification or regression task, the key
+    to successful supervised learning lies in having high-quality labeled data and
+    selecting appropriate features (variables) that have predictive power for the
+    target variable.
 
 ## Unsupervised Learning
 
-Contrary, unsupervised learning deals with ^^unlabeled^^ data to discover 
-hidden patterns and structures. Unlike supervised learning, there is no 
-"supervisor" providing correct answers. The algorithm tries to find
-meaningful patterns on its own.
+In contrast, unsupervised learning deals with ^^unlabeled^^ data to discover
+hidden patterns and structures. Unlike supervised learning, there is no
+"supervisor" providing correct answers. The algorithm tries to find meaningful
+patterns on its own.
 
 In unsupervised learning, we solely have:
 
@@ -174,13 +175,7 @@ Let's say we want to segment customers based on their shopping behavior:
 from sklearn.cluster import KMeans
 
 # customer data [annual_spending, avg_basket_size]
-X = [
-    [1200, 50],
-    [5000, 150],
-    [800, 30],
-    [4500, 140],
-    [1000, 45]
-]
+X = [[1200, 50], [5000, 150], [800, 30], [4500, 140], [1000, 45]]
 
 # use k-means to find customer segments
 model = KMeans(n_clusters=2, random_state=42)  # (1)!
@@ -189,22 +184,22 @@ segments = model.fit_predict(X)
 print(segments)
 ```
 
-1. Setting the `random_state` parameter ensures that you always get the same 
-    results when executing the code repeatedly. Reproducibility is discussed 
+1. Setting the `random_state` parameter ensures that you always get the same
+    results when executing the code repeatedly. Reproducibility is discussed
     more in-depth in upcoming chapters.
 
 ```title=">>> Output"
 [1 0 1 0 1]
 ```
 
-The variable `segments` contains the cluster assignments for each customer. 
-The cluster assignment is simply an `#!python int` indicating which group the 
-customer belongs to. In this example, we have two clusters with the first 
-customer (`#!python [1200, 50]`) belonging to cluster 1 and the second 
-customer (`#!python [5000, 150]`) to cluster 0 and so on.
+The variable `segments` contains the cluster assignments for each customer. The
+cluster assignment is simply an `#!python int` indicating which group the
+customer belongs to. In this example, we have two clusters with the first
+customer (`#!python [1200, 50]`) belonging to cluster 1 and the second customer
+(`#!python [5000, 150]`) to cluster 0 and so on.
 
-The following plot visualizes the input data as scatter plot 
-colored by the cluster assignments:
+The following plot visualizes the input data as a scatter plot colored by the
+cluster assignments:
 
 <div style="text-align: center;">
     <iframe src="/assets/data-science/algorithms/clusters.html" width="600" height="450">
@@ -219,28 +214,28 @@ colored by the cluster assignments:
 The algorithm will group similar customers together without being told what
 these groups should be, it discovers the patterns based on attributes.
 
----
+______________________________________________________________________
 
 ### Clustering & Dimensionality Reduction
 
 Unsupervised learning can be further divided into two main categories:
 
-``` mermaid
+```mermaid
 graph LR
   A[Unsupervised Learning] --> B[Clustering];
   A --> C[Dimensionality Reduction];
 ```
 
----
+______________________________________________________________________
 
 #### Clustering
 
 Clustering algorithms group similar data points together based on their
-features. The goal is to find cluster/groups in the data without any 
-prior knowledge of the groups just like in the previous customer segmentation
+features. The goal is to find clusters/groups in the data without any prior
+knowledge of the groups, just like in the previous customer segmentation
 example.
 
----
+______________________________________________________________________
 
 #### Dimensionality Reduction
 
@@ -248,15 +243,16 @@ Dimensionality reduction techniques aim to reduce the number of input features
 while preserving the most important information. This can help to simplify
 complex data, speed up algorithms and improve model performance.
 
----
+______________________________________________________________________
 
 ### Examples
 
 <div class="grid cards" markdown>
 
--   __Clustering__
+- __Clustering__
+
+    ______________________________________________________________________
 
-    ---
     Clustering/grouping of similar data points:
 
     - Customer segmentation in marketing (like in the example above)
@@ -264,11 +260,12 @@ complex data, speed up algorithms and improve model performance.
     - Product recommendations
     - ...
 
--   __Dimensionality Reduction__
+- __Dimensionality Reduction__
+
+    ______________________________________________________________________
 
-    ---
     Reducing data complexity:
-  
+
     - Feature extraction from high-dimensional data
     - Visualization of complex datasets
     - Noise reduction in signals
@@ -283,26 +280,26 @@ complex data, speed up algorithms and improve model performance.
     answers to compare against. The value of the results often depends on how
     meaningful the discovered patterns are for the specific application.
 
----
+______________________________________________________________________
 
 ???+ tip "Domain knowledge"
 
-    No matter if you're dealing with supervised or unsupervised learning,
-    domain knowledge is crucial. Understanding the data and the problem you're
-    trying to solve will help you select the right algorithms, features and 
-    interpret the results.
+    No matter if you're dealing with supervised or unsupervised learning, domain
+    knowledge is crucial. Understanding the data and the problem you're trying to
+    solve will help you select the right algorithms and features, and interpret
+    the results.
 
 ## Recap
 
-This chapter introduced two fundamental concepts in machine learning, 
+This chapter introduced two fundamental concepts in machine learning,
 supervised and unsupervised learning:
 
 | Concept                   | Data                   | Task                     | Goal                      |
-|---------------------------|------------------------|--------------------------|---------------------------|
+| ------------------------- | ---------------------- | ------------------------ | ------------------------- |
 | **Supervised Learning**   | Labeled (\(X\), \(y\)) | Regression               | Predict continuous values |
 |                           |                        | Classification           | Predict categories        |
 | **Unsupervised Learning** | Unlabeled (\(X\))      | Clustering               | Group similar data        |
 |                           |                        | Dimensionality Reduction | Reduce data complexity    |
 
-The following chapters will cover algorithms for each task with theory and 
-practical examples.
\ No newline at end of file
+The following chapters will cover algorithms for each task with theory and
+practical examples.
diff --git a/docs/data-science/algorithms/supervised/classification.md b/docs/data-science/algorithms/supervised/classification.md
index a4b9a818..58d11582 100644
--- a/docs/data-science/algorithms/supervised/classification.md
+++ b/docs/data-science/algorithms/supervised/classification.md
@@ -4,10 +4,10 @@
 
 While linear regression helps us predict continuous values, other real-world
 problems require predicting categorical outcomes: Will a customer subscribe to
-a term deposit? Is an email spam? Is a transaction fraudulent? 
-Logistic regression addresses these binary classification problems by extending
-the concepts we learned in linear regression to predict probabilities between 
-0 and 1.
+a term deposit? Is an email spam? Is a transaction fraudulent? Logistic
+regression addresses these binary classification problems by extending the
+concepts we learned in linear regression to predict probabilities between
+0 and 1.
 
 We will cover the theory and apply logistic regression to the breast cancer
 dataset to predict whether a tumor is malignant or benign.
@@ -18,22 +18,21 @@ dataset to predict whether a tumor is malignant or benign.
 
     The theoretical part is adapted from:
 
-    ^^Daniel Jurafsky and James H. Martin. 2025. Speech and Language 
-    Processing: *An Introduction to Natural Language Processing, Computational 
-    Linguistics, and Speech Recognition with Language Models*[^1]^^
+    ^^Daniel Jurafsky and James H. Martin. 2025. Speech and Language Processing:
+    *An Introduction to Natural Language Processing, Computational Linguistics, and
+    Speech Recognition with Language Models*[^1]^^
 
-    [^1]:
-        3rd edition. Online manuscript released January 12, 2025.
-        [https://web.stanford.edu/~jurafsky/slp3](https://web.stanford.edu/~jurafsky/slp3)
+    [^1]: 3rd edition. Online manuscript released January 12, 2025.
+    [https://web.stanford.edu/~jurafsky/slp3](https://web.stanford.edu/~jurafsky/slp3)
 
 #### Deja vu: Linear regression
 
-Just like in linear regression, we have a set of features \(x_1, x_2, ..., x_n\)
-describing an outcome \(y\). But instead of predicting a continuous value, \(y\)
-is binary: 0 or 1.
+Just like in linear regression, we have a set of features
+\(x_1, x_2, ..., x_n\) describing an outcome \(y\). But instead of predicting a
+continuous value, \(y\) is binary: 0 or 1.
 
 Similar to linear regression, logistic regression uses a linear combination of
-the features to predict the outcome. I.e., each feature is assigned a 
+the features to predict the outcome. I.e., each feature is assigned a
 **weight**, and a **bias term** is added at the end.
 
 ???+ defi "Linear combination"
@@ -42,18 +41,18 @@ the features to predict the outcome. I.e., each feature is assigned a
     z = b_1 \cdot x_1 + b_2 \cdot x_2 + ... + b_n \cdot x_n + a
     \]
 
-with \(a\) being the bias term and \(b_1, b_2, ..., b_n\) the weights. 
-"The resulting single number \(z\) expresses the weighted sum
-of the evidence for the class." (Jurafsky & Martin, 2025 p. 79)
-Bias, weights and the intercept are all real numbers.
+with \(a\) being the bias term and \(b_1, b_2, ..., b_n\) the weights. "The
+resulting single number \(z\) expresses the weighted sum of the evidence for
+the class." (Jurafsky & Martin, 2025, p. 79) The bias and the weights are all
+real numbers.
 
-So far, logistic regression is the same as linear regression with the sole 
-difference that in [linear regression](regression.md#linear-regression) we 
-referred to the bias \(a\) as the intercept, and the 
-weights \(b_1, b_2, ..., b_n\) as coefficients or slope.
+So far, logistic regression is the same as linear regression with the sole
+difference that in [linear regression](regression.md#linear-regression) we
+referred to the bias \(a\) as the intercept, and the weights
+\(b_1, b_2, ..., b_n\) as coefficients or slope.
 
-However, \(z\) is not the final prediction, since it can take real values 
-and in fact ranges from \(-\infty\) to \(+\infty\). Thus, \(z\) needs to be 
+However, \(z\) is not the final prediction, since it can take real values and
+in fact ranges from \(-\infty\) to \(+\infty\). Thus, \(z\) needs to be
 transformed to a probability between 0 and 1. This is where the sigmoid
 function comes into play.
 
@@ -68,8 +67,8 @@ uses the sigmoid (or logistic) function to transform \(z\) into a probability
     \sigma(z) = \frac{1}{1 + e^{-z}}
     \]
 
-    The sigmoid function takes the real number \(z\) and transforms it to the 
-    range (0,1).
+    The sigmoid function takes the real number \(z\) and transforms it to the range
+    (0,1).
 
 <div style="text-align: center;">
     <iframe src="/assets/data-science/algorithms/classification/sigmoid.html" width="600" height="450">
@@ -80,36 +79,34 @@ uses the sigmoid (or logistic) function to transform \(z\) into a probability
     </figcaption>
 </div>
 
-For given input features \(x_1, x_2, ..., x_n\), we can calculate the 
-linear combination \(z\) and then apply the sigmoid function to get the 
-probability of the outcome.
-To compute the probability of the outcome being 1 
-:fontawesome-solid-arrow-right: \(P(y=1|x)\), for example 
-if an email is spam, we have to set a decision boundary.
+For given input features \(x_1, x_2, ..., x_n\), we can calculate the linear
+combination \(z\) and then apply the sigmoid function to get the probability of
+the outcome being 1 :fontawesome-solid-arrow-right: \(P(y=1|x)\), for example
+the probability that an email is spam. To turn this probability into a class
+prediction, we have to set a decision boundary.
 
 ???+ defi "Decision boundary"
 
     If \(\sigma(z) \gt 0.5\), we predict \(y=1\), otherwise \(y=0\).
 
-For instance, if the probability of an email being spam is 0.7, we predict
-that the email is spam \((0.7 \gt 0.5)\). With a probability of 0.4, we 
-predict that the email is *not* spam \((0.4 \le 0.5)\).
+For instance, if the probability of an email being spam is 0.7, we predict that
+the email is spam \((0.7 \gt 0.5)\). With a probability of 0.4, we predict that
+the email is *not* spam \((0.4 \le 0.5)\).
 
 #### The optimization problem
 
-But how do we find the best parameter combination (weights and bias) for our 
-logistic regression model?
-Unlike linear regression, which uses ordinary least squares, logistic
-regression typically uses Maximum Likelihood Estimation (MLE), i.e., the best
-parameters (weights and bias) that maximize the likelihood of the observed
-data.
+But how do we find the best parameter combination (weights and bias) for our
+logistic regression model? Unlike linear regression, which uses ordinary least
+squares, logistic regression typically uses Maximum Likelihood Estimation
+(MLE), i.e., it searches for the parameters (weights and bias) that maximize
+the likelihood of the observed data.
 
 <div style="text-align: center;">
     <iframe src="https://giphy.com/embed/3ohs7KViF6rA4aan5u" width="480" height="355" style="" frameBorder="0" class="giphy-embed" allowFullScreen></iframe>
     <figcaption>Lo and behold, even more math...</figcaption>
 </div>
 
-For optimization purposes we use the negative log-likelihood as our loss 
+For optimization purposes we use the negative log-likelihood as our loss
 function:
 
 ???+ defi "Negative log-likelihood"
@@ -119,7 +116,7 @@ function:
     \]
 
     with:
-    
+
     - \(m\) as the number of training examples
     - \(y_i\) being the the actual class (0 or 1)
     - \(\sigma(z_i)\) is the predicted probability using the sigmoid function
@@ -127,43 +124,40 @@ function:
 
 ???+ tip
 
-    Intuitively speaking, the loss function penalizes the model for making 
-    wrong predictions. If the model predicts a probability of 0.9 for a 
-    spam email, and the email is actually spam (\(y=1\)), the loss is small.
-    On the other hand, if the model predicts a probability of 0.1 for a 
-    spam email, and the email is spam (\(y=1\)), the loss will be high. 
+    Intuitively speaking, the loss function penalizes the model for making wrong
+    predictions. If the model predicts a spam probability of 0.9 and the email is
+    actually spam (\(y=1\)), the loss is small. On the other hand, if the model
+    predicts a spam probability of 0.1 and the email is actually spam
+    (\(y=1\)), the loss will be high.
+
+    The weights are gradually adjusted to minimize the loss. Think of it like
+    turning knobs slowly until we get better predictions.
 
-    The weights are gradually adjusted to minimize the loss.
-    Think of it like turning knobs slowly until we get better predictions.
-    
-    Gradually adjusting these knobs to minimize the loss is referred to as
-    gradient descent.
+    Gradually adjusting these knobs to minimize the loss is referred to as gradient
+    descent.
 
-Conveniently, `scikit-learn` provides a logistic regression implementation
-that takes care of the optimization for us. Finally, we look at a 
-practical example to see logistic regression in action.
+Conveniently, `scikit-learn` provides a logistic regression implementation that
+takes care of the optimization for us. Finally, we look at a practical example
+to see logistic regression in action.
 
 ## Example
 
 Let's apply logistic regression to the breast cancer dataset, a classic binary
-classification problem where we need to predict whether a tumor is *malignant 
+classification problem where we need to predict whether a tumor is *malignant
 or benign* based on various features.
 
 With class labels \(y\) being 0 (malignant) or 1 (benign), we can use logistic
-regression to predict the probability of a tumor being benign. The features 
+regression to predict the probability of a tumor being benign. The features
 were calculated from digitized images of a breast mass.
 
 ???+ info
 
-    See the [UCI Machine Learning Repository](https://doi.org/10.24432/C5DW2B)
-    for more information on the data set.[^2]
-
-    [^2]:
-        Wolberg, W., Mangasarian, O., Street, N., & Street, W. (1993). 
-        Breast Cancer Wisconsin (Diagnostic) [Dataset]. UCI Machine 
-        Learning Repository. 
-        [https://doi.org/10.24432/C5DW2B](https://doi.org/10.24432/C5DW2B).
+    See the [UCI Machine Learning Repository](https://doi.org/10.24432/C5DW2B) for
+    more information on the data set.[^2]
 
+    [^2]: Wolberg, W., Mangasarian, O., Street, N., & Street, W. (1993). Breast
+    Cancer Wisconsin (Diagnostic) [Dataset]. UCI Machine Learning Repository.
+    [https://doi.org/10.24432/C5DW2B](https://doi.org/10.24432/C5DW2B).
 
 ### Load the data
 
@@ -190,10 +184,9 @@ tumors.
 
 ???+ tip
 
-    Just like in the previous chapter, the data is divided into `X`, containing 
-    the attributes and `y` holding the corresponding labels. Having attributes 
-    and labels separated, makes life a bit easier when training and testing the 
-    model.
+    Just like in the previous chapter, the data is divided into `X`, containing the
+    attributes, and `y`, holding the corresponding labels. Having attributes and
+    labels separated makes life a bit easier when training and testing the model.
 
 ???+ question "Number of features"
 
@@ -206,13 +199,14 @@ How many features (attributes) does the breast cancer dataset have?
 - [ ] 32
 
 `X.shape` reveals that we are dealing with 30 features.
+
 </p>
 </quiz>
 
 ### Split the data
 
 Before training our model, we want to split our data into two parts. Just like
-in the previous chapter, we perform a 80/20 split, i.e., we use 80% to train 
+in the previous chapter, we perform an 80/20 split, i.e., we use 80% to train
 the model and evaluate it on the remaining 20%.
 
 ```python
@@ -225,9 +219,9 @@ X_train, X_test, y_train, y_test = train_test_split(
 
 ???+ tip
 
-    If you need a refresh on the parameters used in `train_test_split()` 
-    revisit, the [Split the data](regression.md#split-the-data) section from 
-    the previous chapter.
+    If you need a refresher on the parameters used in `train_test_split()`, revisit
+    the [Split the data](regression.md#split-the-data) section from the previous
+    chapter.
 
 ### Train the model
 
@@ -240,20 +234,20 @@ model = LogisticRegression(random_state=42, max_iter=5_000)  # (1)!
 model.fit(X_train, y_train)
 ```
 
-1. The `random_state` parameter ensures reproducibility, while
-    `max_iter` specifies the maximum number of iterations taken for the solver 
-    to converge (i.e., solving the optimization problem to find the best 
+1. The `random_state` parameter ensures reproducibility, while `max_iter`
+    specifies the maximum number of iterations taken for the solver to
+    converge (i.e., solving the optimization problem to find the best
     parameter combination).
 
 `#!python model=LogisticRegression(...)` creates an instance of the logistic
-regression model. Only after calling the `fit()` method, the `model` is 
-actually trained. Since we separated attributes and labels into `X_train` and 
-`y_train` respectively, we can directly call the method without any 
-further data handling.
+regression model. The `model` is only actually trained once the `fit()` method
+is called. Since we separated attributes and labels into `X_train` and
+`y_train` respectively, we can directly call the method without any further
+data handling.
 
 #### Weights and bias
 
-With a trained model at hand, we can look at the weights \((b_1, b_2, ..., 
+With a trained model at hand, we can look at the weights \((b_1, b_2, ...,
 b_n)\) and bias \((a)\).
 
 ```python
@@ -264,43 +258,43 @@ print(f"Model weights: {model.coef_}")
 Model weights: [[ 0.98208299  0.22519686 -0.36688444  0.0262268 ... ]]
 ```
 
-The `coef_` attribute contains the weight for each feature. 
+The `coef_` attribute contains the weight for each feature.
 [As discussed](#deja-vu-linear-regression), the weights are real numbers.
 
 ???+ warning "You might not have the exact same results"
 
-    Your model weights might differ slightly from the ones shown above. 
-    This is completely normal and happens because:
+    Your model weights might differ slightly from the ones shown above. This is
+    completely normal and happens because:
 
-    **Numerical precision**: The default optimization solver 
-    (`#!python "lbfgs"`) behind `LogisticRegression` encounters tiny 
-    hardware-specific variations. The underlying libraries handle 
-    floating-point arithmetic differently across hardware platforms. During the
-    iterative optimization, these tiny rounding differences  accumulate, 
-    causing the solver to converge to slightly different solutions.
+    **Numerical precision**: The default optimization solver (`#!python "lbfgs"`)
+    behind `LogisticRegression` encounters tiny hardware-specific variations. The
+    underlying libraries handle floating-point arithmetic differently across
+    hardware platforms. During the iterative optimization, these tiny rounding
+    differences accumulate, causing the solver to converge to slightly different
+    solutions.
 
-    :fontawesome-solid-lightbulb: These small differences don't affect your 
-    model's predictions or accuracy.
+    :fontawesome-solid-lightbulb: These small differences typically have no
+    noticeable effect on your model's predictions or accuracy.
 
 Now, it's your turn to look at the bias.
 
 ???+ question "Model bias"
 
     1. Open the `scikit-learn` docs on the
-       [`LogisticRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)
-       class.
-    2. Find out how to access the bias term of the model.
-    3. Simply print the bias term of the model.
+        [`LogisticRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)
+        class.
+    1. Find out how to access the bias term of the model.
+    1. Simply print the bias term of the model.
 
-    :fontawesome-solid-lightbulb: Remember, the bias is often referred to as 
+    :fontawesome-solid-lightbulb: Remember, the bias is often referred to as
     intercept.
 
 ### Predictions
 
-Since, the main purpose of a machine learning model is to make predictions,
-we will do just that.
+Since the main purpose of a machine learning model is to make predictions, we
+will do just that.
 
-Predicting, is as simple as using the `predict()` method. We will use the 
+Predicting is as simple as using the `predict()` method. We will use the
 patient measurements of the test set - `X_test`.
 
 ```python
@@ -314,25 +308,24 @@ print(y_pred[:5])
 [1 0 0 1 1]
 ```
 
-Congratulations, you just build a machine learning model to predict breast 
-cancer. But how good is the model? To conclude the chapter, we will briefly 
+Congratulations, you just built a machine learning model to predict breast
+cancer. But how good is the model? To conclude the chapter, we will briefly
 evaluate the model's performance.
 
 ### Evaluate the model
 
-Surely, we could just manually compare the predictions (`y_pred`) with the 
-actual labels (`y_test`) and evaluate how often the model was correct. Or 
+Of course, we could manually compare the predictions (`y_pred`) with the
+actual labels (`y_test`) and evaluate how often the model was correct. Or
 instead, we can leverage another method called `score()`.
 
 ```python
 score = model.score(X_test, y_test)
 ```
 
-First, the `score()` method takes `X_test` and makes the corresponding 
-predictions and programmatically compares the predictions with the actual 
-labels `y_test`. `score()` returns the accuracy 
-:fontawesome-solid-arrow-right: the proportion of correctly
-classified instances. 
+First, the `score()` method takes `X_test` and makes the corresponding
+predictions, then programmatically compares them with the actual labels
+`y_test`. `score()` returns the accuracy :fontawesome-solid-arrow-right: the
+proportion of correctly classified instances.
 
 ```python
 print(f"Model accuracy: {round(score, 4)}")
@@ -342,33 +335,33 @@ print(f"Model accuracy: {round(score, 4)}")
 Model accuracy: 0.9561
 ```
 
-In our case, the model correctly classified 95.61% of the test set. In 
-other words, in 95.61% of instances, the model was able to correctly predict 
-if a tumor is malignant or benign.
+In our case, the model correctly classified 95.61% of the test set. In other
+words, in 95.61% of instances, the model was able to correctly predict if a
+tumor is malignant or benign.
 
 ???+ tip
 
     As the test set (both attributes and labels) were never used to train the
-    model, the accuracy is a good indicator of how well the model generalizes
-    to unseen data.
+    model, the accuracy is a good indicator of how well the model generalizes to
+    unseen data.
 
 ## Recap
 
 We covered logistic regression, a popular algorithm for binary classification.
 
-Upon discussing the theory, we discovered similarities to linear regression 
-in regard to the linear combination of features. With the help of the 
-sigmoid function, we transformed the linear combination into probabilities
-between 0 and 1.
+Upon discussing the theory, we discovered similarities to linear regression in
+regard to the linear combination of features. With the help of the sigmoid
+function, we transformed the linear combination into probabilities between 0
+and 1.
 
-Subsequently, we trained a logistic regression model on the breast cancer
-data to predict whether a tumor is malignant or benign. To evaluate the 
-model we split the data and finally calculated the accuracy.
+Subsequently, we trained a logistic regression model on the breast cancer data
+to predict whether a tumor is malignant or benign. To evaluate the model we
+split the data and finally calculated the accuracy.
 
 ???+ info
 
-    In subsequent chapters we will explore more sophisticated ways to split 
-    data and evaluate models.
+    In subsequent chapters we will explore more sophisticated ways to split data
+    and evaluate models.
 
 Next up, we will dive into algorithms, like decision trees and random forest,
 that can handle both regression and classification problems.
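The accuracy that `score()` reports in the logistic regression chapter above can also be reproduced by hand. The following is a minimal sketch in plain Python, not the `scikit-learn` implementation; the helper name `accuracy` and the label values are made up for illustration and are not taken from the breast cancer data set.

```python
def accuracy(y_true, y_pred):
    """Return the proportion of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of an empty sequence")
    # Count the positions where prediction and label agree.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative labels only (hypothetical values).
y_test_example = [1, 0, 0, 1, 1]
y_pred_example = [1, 0, 1, 1, 1]
print(f"Model accuracy: {round(accuracy(y_test_example, y_pred_example), 4)}")
```

For a classifier, comparing `y_pred` with `y_test` this way should give the same number as `model.score(X_test, y_test)`, since both measure the share of correctly classified instances.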
diff --git a/docs/data-science/algorithms/supervised/regression.md b/docs/data-science/algorithms/supervised/regression.md
index d005d629..87b35cbf 100644
--- a/docs/data-science/algorithms/supervised/regression.md
+++ b/docs/data-science/algorithms/supervised/regression.md
@@ -2,14 +2,14 @@
 
 ## Linear Regression
 
-In machine learning, we often want to predict continuous numerical values, like 
-house prices, temperatures or sales figures. Linear regression also knows as 
-Ordinary Least Squares (OLS) provides a foundational approach to this problem 
-by modeling the relationship between input variables and a target variable 
+In machine learning, we often want to predict continuous numerical values, like
+house prices, temperatures or sales figures. Linear regression, also known as
+Ordinary Least Squares (OLS), provides a foundational approach to this problem
+by modeling the relationship between input variables and a target variable
 using a straight line.
 
-This chapter introduces linear regression through a hands-on example.
-You'll learn to:
+This chapter introduces linear regression through a hands-on example. You'll
+learn to:
 
 - Build and train a linear regression model
 - Interpret model parameters (intercept and coefficients)
@@ -17,30 +17,21 @@ You'll learn to:
 - Evaluate model performance using the coefficient of determination (\(R^2\))
 - Get familiar with the `scikit-learn` workflow to train and evaluate models
 
----
+______________________________________________________________________
 
 ???+ info
 
     This chapter adapts and expands upon:
 
-    ^^scikit-learn: *Ordinary Least Squares and Ridge Regression*[^1]^^
-
-    ^^scikit-learn: *Linear Models*[^2]^^ 
-
-    ^^scikit-learn: *Metrics and scoring: quantifying the quality of predictions*[^3]^^
-
-    [^1]:
-        [https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols_ridge.html](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols_ridge.html)
-    [^2]:
-        [https://scikit-learn.org/stable/modules/linear_model.html](https://scikit-learn.org/stable/modules/linear_model.html)
-    [^3]:
-        [https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score-the-coefficient-of-determination](https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score-the-coefficient-of-determination)
+    - ^^scikit-learn: *[Ordinary Least Squares and Ridge Regression](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols_ridge.html)*^^
+    - ^^scikit-learn: *[Linear Models](https://scikit-learn.org/stable/modules/linear_model.html)*^^
+    - ^^scikit-learn: *[Metrics and scoring: quantifying the quality of predictions](https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score-the-coefficient-of-determination)*^^
 
 ## Theory
 
-Linear regression, also known as Ordinary Least Squares (OLS), models the 
-relationship between a continuous target variable \(y\) and one or more input 
-variables \(X\). The goal is to find the best linear function that predicts 
+Linear regression, also known as Ordinary Least Squares (OLS), models the
+relationship between a continuous target variable \(y\) and one or more input
+variables \(X\). The goal is to find the best linear function that predicts
 \(\hat{y}\) from \(X\).
 
 ???+ defi "Linear combination"
@@ -50,15 +41,15 @@ variables \(X\). The goal is to find the best linear function that predicts
     \]
 
     where:
-    
+
     - \(w_0\) is the **intercept** (bias term)
     - \(w_1, w_2, ..., w_n\) are the **coefficients** (weights)
     - \(x_1, x_2, ..., x_n\) are the input features
 
-The term "Ordinary Least Squares" refers to the optimization objective, 
-finding the weights \(w_0, w_1, ..., w_n\) that minimize the sum of squared 
-differences called residuals between the actual values \(y\) and predicted 
-values \(\hat{y}\).
+The term "Ordinary Least Squares" refers to the optimization objective: finding
+the weights \(w_0, w_1, ..., w_n\) that minimize the sum of squared residuals,
+i.e. the squared differences between the actual values \(y\) and the predicted
+values \(\hat{y}\).
 
 ???+ defi "Cost function"
 
@@ -68,28 +59,27 @@ values \(\hat{y}\).
 
     where \(n\) is the number of observations.
 
-This minimization ensures that our model makes the smallest possible errors 
-on average when predicting the training data. Let's look at an example.
+This minimization ensures that our model makes the smallest possible errors on
+average when predicting the training data. Let's look at an example.
 
 ## Example
 
-`scikit-learn` provides a couple of data sets for download. To fit a linear 
+`scikit-learn` provides a couple of data sets for download. To fit a linear
 regression on a real-world example, we choose the California housing data set.
-More information about the California Housing data set can be found 
+More information about the California Housing data set can be found
 [here](https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset).
 
 ???+ info
 
     Data reference:
 
-    ^^Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions, 
-    Statistics and Probability Letters, 33:291-297, 1997^^
+    ^^Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions, Statistics
+    and Probability Letters, 33:291-297, 1997^^
 
-Our objective is to model the target variable \(y\) using input variables 
-\(X\). In this case, \(y\) corresponds to the median house value, expressed in 
-hundreds of thousands of dollars ($100,000).
-Below figure shows all houses in California colored by their median value 
-\(y\).
+Our objective is to model the target variable \(y\) using input variables
+\(X\). In this case, \(y\) corresponds to the median house value, expressed in
+hundreds of thousands of dollars ($100,000). The figure below shows all houses
+in California colored by their median value \(y\).
 
 <figure markdown="span">
     <img 
@@ -114,36 +104,37 @@ X, y = fetch_california_housing(return_X_y=True, as_frame=True)
 print(X.head())
 ```
 
-Conveniently, by setting `#!python return_X_y=True`, the function splits the 
+Conveniently, by setting `#!python return_X_y=True`, the function splits the
 input variables \(X\) and the target \(y\). Note, \(X\) contains multiple input
-variables such as such as house age *HouseAge* and average bedrooms *AveBedrms*.
-However, some are not needed.
+variables such as house age *HouseAge* and average bedrooms *AveBedrms*.
+However, some are not needed.
 
 ???+ question
 
-    The data frame `X` contains the variables `#!python "Latitude"` and 
-    `#!python "Longitude"`, remove them from the data frame. Tip: Consult the 
-    pandas [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html).
+    The data frame `X` contains the variables `#!python "Latitude"` and
+    `#!python "Longitude"`, remove them from the data frame. Tip: Consult the
+    pandas
+    [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html).
 
 ### Split the data
 
-Before training our OLS model, we need to split our data into two distinct sets:
+Before training our OLS model, we need to split our data into two distinct
+sets:
 
-- **Training set**: Used to fit the model and learn the optimal weights 
-    (80% of data)
-- **Test set**: Used to evaluate the model's performance on unseen data 
-    (20% of data)
+- **Training set**: Used to fit the model and learn the optimal weights (80% of
+    data)
+- **Test set**: Used to evaluate the model's performance on unseen data (20% of
+    data)
 
-Think of this like preparing for an exam: you study from practice problems 
-(training set) and then test your knowledge with new questions (test set). 
-This separation allows us to assess whether our model can accurately predict 
-house prices it hasn't seen before, rather than just memorizing the training 
-data.
+Think of this like preparing for an exam: you study from practice problems
+(training set) and then test your knowledge with new questions (test set). This
+separation allows us to assess whether our model can accurately predict house
+prices it hasn't seen before, rather than just memorizing the training data.
 
 ???+ info
 
-    The 80/20 split is a common convention, but not a strict rule. Depending on
-    the data set size, other split ratios might be a better fit.
+    The 80/20 split is a common convention, but not a strict rule. Depending on the
+    data set size, other split ratios might be a better fit.
 
 ```python
 from sklearn.model_selection import train_test_split
@@ -156,28 +147,28 @@ X_train, X_test, y_train, y_test = train_test_split(
 Let's break the code snippet down:
 
 1. `train_test_split()` takes the complete data set (`X` and `y`) as input
-2. Splits off 20% for testing (`#!python test_size=0.2`)
-3. Randomly shuffles the data (`#!python shuffle=True`) to remove any inherent
+1. Splits off 20% for testing (`#!python test_size=0.2`)
+1. Randomly shuffles the data (`#!python shuffle=True`) to remove any inherent
     ordering
-4. Set a seed (`#!python random_state=42`) which ensures the same
-    outcome every time the code snippet is executed. Since the shuffle 
-    operation is stochastic, we aim for reproducibility.
+1. Sets a seed (`#!python random_state=42`), ensuring the same outcome every
+    time the code snippet is executed. Since the shuffle operation is
+    stochastic, a fixed seed makes the split reproducible.
 
 ???+ info "Why shuffle?"
-    
+
     Some data sets may have inherent order (e.g., the houses could be sorted by
-    location). Shuffling ensures that both training and test sets are 
+    location). Shuffling ensures that both training and test sets are
     representative of the entire data distribution.
 
-After splitting, we put our test data (`X_test` and `y_test`) aside and use it 
+After splitting, we put our test data (`X_test` and `y_test`) aside and use it
 at the very end to measure the model's performance.
 
 ### Intuition
 
-For the first OLS model, we use a single input variable \(X\) as it allows us 
-to easily visualize and interpret the results. The choice falls on the median 
-income at the house location, referred to as *MedInc*. 
-Visualize the target and input variable in a scatter plot:
+For the first OLS model, we use a single input variable \(X\) as it allows us
+to easily visualize and interpret the results. The choice falls on the median
+income at the house location, referred to as *MedInc*. Visualize the target and
+input variable in a scatter plot:
 
 ```python
 import matplotlib.pyplot as plt
@@ -194,41 +185,41 @@ plt.show()
 
 <div class="grid cards" markdown>
 
--   __Scatter Plot__ 
-
-    --- 
-    
-    Looking at the scatter plot, you might intuitively imagine drawing a straight 
-    line through the points that best captures the trend. This intuition is 
-    exactly what OLS does mathematically, it finds the optimal line that minimizes 
-    the distance between the line and all data points. :point_down:
-
--   <figure markdown="span">
-        <img 
-            src="/assets/data-science/algorithms/regression/scatter-dark.png#only-dark"
-        >
-        <img 
-            src="/assets/data-science/algorithms/regression/scatter-light.png#only-light"
-        >
+- __Scatter Plot__
+
+    ______________________________________________________________________
+
+    Looking at the scatter plot, you might intuitively imagine drawing a straight
+    line through the points that best captures the trend. This intuition is
+    exactly what OLS does mathematically: it finds the line that minimizes the
+    squared vertical distances between itself and all data points. :point_down:
+
+- <figure markdown="span">
+    <img
+    src="/assets/data-science/algorithms/regression/scatter-dark.png#only-dark"
+    >
+    <img
+    src="/assets/data-science/algorithms/regression/scatter-light.png#only-light"
+    >
     </figure>
 
--   <figure markdown="span">
-        <img 
-            src="/assets/data-science/algorithms/regression/regression-dark.png#only-dark"
-        >
-        <img 
-            src="/assets/data-science/algorithms/regression/regression-light.png#only-light"
-        >
+- <figure markdown="span">
+    <img
+    src="/assets/data-science/algorithms/regression/regression-dark.png#only-dark"
+    >
+    <img
+    src="/assets/data-science/algorithms/regression/regression-light.png#only-light"
+    >
     </figure>
 
--   __Best-Fit Line__
+- __Best-Fit Line__
+
+    ______________________________________________________________________
 
-    ---
+    The OLS model finds the line that minimizes the sum of squared residuals, the
+    vertical distances between each point and the line. Recall from the theory
+    section that this is exactly what the cost function measures:
 
-    The OLS model finds the line that minimizes the sum of squared residuals,
-    the vertical distances between each point and the line. Recall from the 
-    theory section that this is exactly what the cost function measures:
-    
     \[
     \text{min} \quad \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
     \]
@@ -237,8 +228,8 @@ plt.show()
 
 ### Train the model
 
-Our next step is to train an OLS model to automatically find this "best-fit" 
-line. Remember, since we have one input variable, the linear combination 
+Our next step is to train an OLS model to automatically find this "best-fit"
+line. Remember, since we have one input variable, the linear combination
 simplifies to:
 
 \[
@@ -260,8 +251,8 @@ from sklearn.linear_model import LinearRegression
 model = LinearRegression()
 ```
 
-At this point, the model is not trained, however that can be easily done
-using the `fit()` method. Remember, to use the training set 
+At this point, the model is not trained. However, that can easily be done using
+the `fit()` method. Remember to use the training set:
 
 ```python
 model.fit(X=X_train[["MedInc"]], y=y_train)
@@ -269,7 +260,7 @@ model.fit(X=X_train[["MedInc"]], y=y_train)
 
 #### Intercept and coefficient
 
-After training, we can inspect the model's learned parameters. The intercept 
+After training, we can inspect the model's learned parameters: the intercept
 and coefficient that define the best-fit line:
 
 ```python
@@ -290,15 +281,15 @@ These values tell us that our linear model is:
 
 **Interpretation:**
 
-- **Intercept (0.4446)**: The baseline house value (when *MedInc* is zero) 
-    ~ $44,460
+- **Intercept (0.4446)**: The baseline house value (when *MedInc* is zero)
+    is around $44,460
 - **Coefficient (0.4193)**: For each unit increase in *MedInc*, the house value
     increases by ~ $41,930
 
 ### Predictions
 
-Now that the model is trained, we can predict house prices for new observations. 
-Let's predict the price \(\hat{y}\) for a house in an area where 
+Now that the model is trained, we can predict house prices for new
+observations. Let's predict the price \(\hat{y}\) for a house in an area where
 *MedInc* is `#!python 3.5`:
 
 ```python
@@ -319,7 +310,8 @@ The model predicts a house value of approximately **$191,230**.
 
 #### Manual validation
 
-We can verify this prediction using our linear equation. Substituting \(x_1 = 3.5\):
+We can verify this prediction using our linear equation. Substituting
+\(x_1 = 3.5\):
 
 \[
 \begin{align}
@@ -332,21 +324,21 @@ This matches our model's prediction!
 
 ???+ question "Practice: Make your own prediction"
 
-    Calculate the predicted house price for an area where *MedInc* is 
+    Calculate the predicted house price for an area where *MedInc* is
     `#!python 5.0`.
-    
+
     1. Use `#!python model.predict()` to get the prediction.
-    2. Validate it by hand using the linear equation.
-    3. Do the results match?
+    1. Validate it by hand using the linear equation.
+    1. Do the results match?
 
 ### Evaluate the model
 
-Now we can make predictions, but we don't know how accurate they actually are. 
-We need to quantify the model's performance to determine if it generalizes 
-well to new, unseen data.
+Now we can make predictions, but we don't know how accurate they actually are.
+We need to quantify the model's performance to determine if it generalizes well
+to new, unseen data.
 
-Remember we set aside our test set earlier? This is where we use it. By 
-evaluating on data the model hasn't seen during training, we get an honest 
+Remember we set aside our test set earlier? This is where we use it. By
+evaluating on data the model hasn't seen during training, we get an honest
 assessment of its predictive power.
 
 To measure the model's performance, we'll use the coefficient of determination.
@@ -357,7 +349,7 @@ To measure the model's performance, we'll use the coefficient of determination.
 
     This section focuses on the definition implemented by `scikit-learn`.
 
-The coefficient of determination, known as the \(R^2\) score, measures the 
+The coefficient of determination, known as the \(R^2\) score, measures the
 proportion of variance in the target variable that is explained by the model.
 
 ???+ defi "\(R^2\) Score"
@@ -391,48 +383,47 @@ r2 = r2_score(y_true=y_test, y_pred=y_pred)
 print(f"R² Score: {round(r2, 4)}")
 ```
 
-``` title=">>> Output"
+```title=">>> Output"
 R² Score: 0.4589
 ```
 
 ???+ tip "Understanding \(R^2\)"
 
-    An \(R^2\) score of 0.4589 means the model explains 45.89% of the variance 
-    in house prices using only median income. While this is informative, it's
-    not great. It suggests that other factors (location, house size, etc.) 
+    An \(R^2\) score of 0.4589 means the model explains 45.89% of the variance in
+    house prices using only median income. While this is informative, it's not
+    great. It suggests that other factors (location, house size, etc.)
     significantly influence house prices.
 
 ???+ question "Find a better model"
 
-    Can you improve the \(R^2\) score? Fit new models and experiment with the 
+    Can you improve the \(R^2\) score? Fit new models and experiment with the
     following:
 
     **Model variations:**
-    
-    - Use different individual input variables (e.g., *HouseAge*, *AveRooms*, 
+
+    - Use different individual input variables (e.g., *HouseAge*, *AveRooms*,
         *AveBedrms*)
     - Use a combination of multiple input variables
     - Compare single-variable vs. multi-variable models
-    
+
     **Data preparation:**
-    
+
     - Adjust the train-test split ratio
     - Remember to use `#!python random_state` for reproducibility
-    
+
     **Analysis:**
-    
+
     - Calculate and compare \(R^2\) scores for each model
     - Inspect the intercept and coefficients for multi-variable models
     - Make predictions with your best-performing model
     - Manually verify one prediction using the linear equation
-    
-    Which combination gives you the highest \(R^2\) score? What does this 
-    tell you about which features are most important for predicting house 
-    prices?
+
+    Which combination gives you the highest \(R^2\) score? What does this tell you
+    about which features are most important for predicting house prices?
 
 ## Detour: Model workflow
 
-The workflow you practiced here forms the foundation for all supervised 
+The workflow you practiced here forms the foundation for all supervised
 learning algorithms in `scikit-learn`:
 
 ```python
@@ -452,17 +443,17 @@ y_pred = model.predict(X_test)
 score = model.score(X_test, y_test)
 ```
 
-This consistent pattern applies to all upcoming chapters, whether you're 
+This consistent pattern applies to all upcoming chapters, whether you're
 building regression or classification models.
 
 ## Recap
 
-In this chapter, you learned the fundamentals of linear regression through a 
+In this chapter, you learned the fundamentals of linear regression through a
 practical example. The key takeaways:
 
-- **Linear regression** models the relationship between input variables and a 
+- **Linear regression** models the relationship between input variables and a
     target variable using a linear combination. Find the best-fit line by
     minimizing the sum of squared residuals.
-- **\(R^2\) score** quantifies how well the model explains variance in the 
+- **\(R^2\) score** quantifies how well the model explains variance in the
     target variable
 - **`scikit-learn` workflow** allows to easily train and evaluate model
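To make the regression chapter's theory concrete, here is a minimal sketch of single-feature OLS and the \(R^2\) score in plain Python. It uses the textbook closed-form solution for one input variable and the definition \(R^2 = 1 - SS_{res} / SS_{tot}\). The function names and the toy data are assumptions made for illustration, and this is not how `scikit-learn` implements `LinearRegression` internally.

```python
def fit_ols_1d(x, y):
    """Return (intercept, coefficient) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form solution for a single input variable.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must contain at least two distinct values")
    w1 = s_xy / s_xx
    w0 = mean_y - w1 * mean_x
    return w0, w1


def predict(x, w0, w1):
    """Apply the linear combination y_hat = w0 + w1 * x to each value."""
    return [w0 + w1 * xi for xi in x]


def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


# Toy data (made up for illustration, not the California housing data).
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [0.9, 1.3, 1.7, 2.2, 2.4]

w0, w1 = fit_ols_1d(x_train, y_train)
y_hat = predict(x_train, w0, w1)
print(f"intercept: {round(w0, 4)}, coefficient: {round(w1, 4)}")
print(f"R² Score: {round(r2_score_manual(y_train, y_hat), 4)}")
```

For brevity, the sketch scores the model on the same data it was fitted on. The chapter itself evaluates on the held-out test set, which gives a more honest picture of how well the model generalizes.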
diff --git a/docs/data-science/algorithms/supervised/tree-based/cart.md b/docs/data-science/algorithms/supervised/tree-based/cart.md
index fb53e97d..a44a7101 100644
--- a/docs/data-science/algorithms/supervised/tree-based/cart.md
+++ b/docs/data-science/algorithms/supervised/tree-based/cart.md
@@ -1,15 +1,15 @@
 # Decision Tree
 
-So far we have covered linear regression and logistic regression which are 
-limited to linear relationships. In contrast, decision trees are non-linear 
-models able to capture complex relationships in the data. They are easy to 
+So far we have covered linear regression and logistic regression, which are
+limited to linear relationships. In contrast, decision trees are non-linear
+models able to capture complex relationships in the data. They are easy to
 interpret and visualize, making them a popular choice for many applications.
 
 Moreover, decision trees can be used for both regression ^^*and*^^
 classification!
 
 In this chapter, we will explore the theory behind decision trees followed by
-practical examples. As always we will use `scikit-learn` for hands-on 
+practical examples. As always we will use `scikit-learn` for hands-on
 experience.
 
 ## Basic intuition
@@ -35,13 +35,12 @@ graph TD
 Depending on the answers, you can decide whether to go skiing or not.
 
 A decision tree resembles a flowchart where each internal node represents a
-decision based on a feature (e.g., Is there any snow?), each branch represents 
-the outcome of that decision, and each leaf node represents a final 
-prediction (either a class label for classification or a continuous value 
-for regression). 
+decision based on a feature (e.g., Is there any snow?), each branch represents
+the outcome of that decision, and each leaf node represents a final prediction
+(either a class label for classification or a continuous value for regression).
 
-To get a better understanding of the terms node, branch and leaf, consider 
-the illustration of a (rotated) tree.
+To get a better understanding of the terms node, branch and leaf, consider the
+illustration of a (rotated) tree.
 
 <figure markdown="span">
     ![Decision tree illustration](../../../../assets/data-science/algorithms/tree-based/tree.png)
@@ -50,9 +49,9 @@ the illustration of a (rotated) tree.
     </figcaption>
 </figure>
 
-In the skiing example, the nodes are the questions you ask yourself. With 
-branches being a simple binary split (the answers to the question).
-The leaf nodes are the final predictions, in our case whether to go skiing.
+In the skiing example, the nodes are the questions you ask yourself, and the
+branches are simple binary splits (the answers to each question). The leaf
+nodes are the final predictions, in our case whether to go skiing.
 
 <quiz>
 Given the skiing decision tree, what kind of supervised learning task is this?
@@ -77,129 +76,125 @@ which is a classic binary classification task.
 
 ???+ info
 
-    This theoretical section on decision trees follows: ^^Christopher M. 
-    Bishop. 2006. *Pattern Recognition and Machine Learning*[^1]^^
-    
-    We focus on a particular algorithm called CART 
-    (=**C**lassification **A**nd **R**egression **T**rees).
-    The theoretical foundations of CART were developed by:
-    ^^Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. 1984.
+    This theoretical section on decision trees follows: ^^Christopher M. Bishop.
+    2006\. *Pattern Recognition and Machine Learning*[^1]^^
+
+    We focus on a particular algorithm called CART (=**C**lassification **A**nd
+    **R**egression **T**rees). The theoretical foundations of CART were developed
+    by: ^^Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. 1984.
     *Classification and Regression Trees*[^2]^^
-    
-    [^1]:
-        Christopher M. Bishop. Pattern Recognition and Machine Learning. 
-        Springer, 2006. [Link](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf)
-    [^2]:
-        Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. 
-        Classification and Regression Trees. Chapman and Hall/CRC, 1984.
-        [https://doi.org/10.1201/9781315139470](https://doi.org/10.1201/9781315139470)
 
----
+    [^1]: Christopher M. Bishop. Pattern Recognition and Machine Learning.
+    Springer, 2006.
+    [Link](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf)
+    [^2]: Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone.
+    Classification and Regression Trees. Chapman and Hall/CRC, 1984.
+    [https://doi.org/10.1201/9781315139470](https://doi.org/10.1201/9781315139470)
+
+______________________________________________________________________
 
 When building a decision tree a couple of questions arise:
 
 <div class="grid cards" markdown>
 
--   :fontawesome-solid-question:{ .lg .middle } __Question__
+- :fontawesome-solid-question:{ .lg .middle } __Question__
 
-    ---
+    ______________________________________________________________________
 
     1. How do we pick the right feature for a split?
-    2. What's the decision criteria at each node?
-    3. How large do we grow the tree?
-
+    1. What's the decision criterion at each node?
+    1. How large do we grow the tree?
 
--   :fontawesome-solid-lightbulb:{ .lg .middle } __Intuition__
+- :fontawesome-solid-lightbulb:{ .lg .middle } __Intuition__
 
-    ---
+    ______________________________________________________________________
 
-    1. Which questions do we ask? Why did we ask "Can I 
-       get to a skiing resort?" and "Is there any snow?"?
-    2. It does not have to be a simple yes/no question. It can be a
-       threshold for continuous values as well. E.g., "Is there more than 
-       10cm of fresh snow?" But how do we choose the threshold?
-    3. How many questions do we ask? Why only 2 and not more?
+    1. Which questions do we ask? Why did we ask "Can I get to a skiing resort?"
+        and "Is there any snow?"?
+    1. It does not have to be a simple yes/no question. It can be a threshold for
+        continuous values as well. E.g., "Is there more than 10cm of fresh
+        snow?" But how do we choose the threshold?
+    1. How many questions do we ask? Why only 2 and not more?
 
 </div>
 
-With these questions in mind, let's dive into the theory of decision trees 
-in order to tackle them.
+With these questions in mind, let's dive into the theory of decision trees in
+order to tackle them.
 
----
+______________________________________________________________________
 
 ### Greedy optimization
 
 As a decision tree is a supervised learning algorithm, the goal is to predict
 the target variable \(y\) with a set of features \(x_1, x_2, ..., x_n\).
 
-With the data at hand, the CART algorithm finds the optimal tree 
-structure that minimizes the prediction error. In turn, the 
-optimal tree structure depends on the chosen splits. 
+With the data at hand, the CART algorithm searches for a tree structure that
+minimizes the prediction error. In turn, the resulting tree structure depends
+on the chosen splits.
 
 ???+ info
-    
+
     A split in CART is a binary decision rule that divides the dataset into two
     subsets based on a specific feature and threshold.
 
-    Imagine if we extend our skiing example with the split "Is there more than 
-    10cm of fresh snow?". The split divides the data into two subsets: one 
-    where observations have more than 10cm of fresh snow and another where
-    observations don't. With *amount of fresh snow* being the feature and *10cm* 
-    the threshold.
+    Imagine we extend our skiing example with the split "Is there more than 10cm
+    of fresh snow?". The split divides the data into two subsets: one where
+    observations have more than 10cm of fresh snow and another where observations
+    don't. Here, *amount of fresh snow* is the feature and *10cm* the threshold.
 
-However, given large data sets, there are simply too many splitting 
-possibilities to consider at once. Hence, the tree is grown in a greedy fashion.
+However, given large data sets, there are simply too many splitting
+possibilities to consider at once. Hence, the tree is grown in a greedy
+fashion.
 
-The greedy optimization starts with a single root node splitting the data 
-into two partitions and adds additional nodes one at a time. At each step, the
+The greedy optimization starts with a single root node splitting the data into
+two partitions and adds additional nodes one at a time. At each step, the
 algorithm chooses a split using exhaustive search. The best split is determined
-by a criterion. Remember, that decision trees can deal with regression and 
+by a criterion. Remember that decision trees can deal with regression and
 classification problems. Hence, the criterion differs for the two tasks.
 
----
+______________________________________________________________________
 
 #### Regression
 
-For regression trees, the best split (feature threshold combination) at each 
-node is determined by minimizing the *residual sum-of-squares error (RSS)*, 
+For regression trees, the best split (feature-threshold combination) at each
+node is determined by minimizing the *residual sum-of-squares error (RSS)*,
 defined as:
 
 ???+ defi "Residual sum-of-squares (RSS)"
 
-    \[ 
-        RSS = \sum_{i \in t_L} (y_i - \bar{y}_L)^2 + \sum_{i \in t_R} (y_i -
-        \bar{y}_R)^2 
+    \[
+    RSS = \sum_{i \in t_L} (y_i - \bar{y}_L)^2 + \sum_{i \in t_R} (y_i -
+                \bar{y}_R)^2
     \]
 
 where \(t_L\) and \(t_R\) are the left and right child nodes after the split,
 and \(\bar{y}_L\) and \(\bar{y}_R\) are the mean target values in the
 respective nodes.
 
-The algorithm searches through all possible splits to find the one that 
+The algorithm searches through all possible splits to find the one that
 minimizes this RSS criterion.
 
 ???+ info
 
-    Since each split separates the input data into two partitions, the
-    prediction is the mean of the target variable \(y\) in the respective 
-    partition.
-    
-    Hence, intuitively speaking, we do not optimize the entire tree at once 
-    but rather optimize each split locally.
+    Since each split separates the input data into two partitions, the prediction
+    is the mean of the target variable \(y\) in the respective partition.
+
+    Hence, intuitively speaking, we do not optimize the entire tree at once but
+    rather optimize each split locally.
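+
+The exhaustive search over one feature can be sketched in a few lines. This is
+a minimal illustration, not the `scikit-learn` implementation: the helper
+`best_split` and the toy data are made up for this example, and candidate
+thresholds are assumed to be the midpoints between consecutive feature values.
+
+```python
+import numpy as np
+
+
+def best_split(x, y):
+    """Return (threshold, rss) of the RSS-minimizing split on one feature."""
+    best_threshold, best_rss = None, np.inf
+    values = np.unique(x)
+    # candidate thresholds: midpoints between consecutive unique values
+    for threshold in (values[:-1] + values[1:]) / 2:
+        left, right = y[x <= threshold], y[x > threshold]
+        rss_left = np.sum((left - left.mean()) ** 2)
+        rss_right = np.sum((right - right.mean()) ** 2)
+        rss = rss_left + rss_right
+        if rss < best_rss:
+            best_threshold, best_rss = threshold, rss
+    return best_threshold, best_rss
+
+
+x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
+y = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])
+threshold, rss = best_split(x, y)
+print(threshold, rss)
+```
+
+For this toy data, the best threshold is 6.5, which separates the low from the
+high target values. A full tree repeats this search over every feature at every
+node.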
 
 #### Classification
 
-For classification tasks, the best split at each node is determined by minimizing 
-the *Gini impurity*. 
+For classification tasks, the best split at each node is determined by
+minimizing the *Gini impurity*.
 
 ???+ defi "Gini impurity"
 
     For a node \(t\) with \(K\) classes, the Gini impurity is defined as:
 
     \[
-       Gini(t) = \sum_{k=1}^K p_{k}(1-p_{k}) = 1 - \sum_{k=1}^K p_{k}^2
+    Gini(t) = \sum_{k=1}^K p_{k}(1-p_{k}) = 1 - \sum_{k=1}^K p_{k}^2
     \]
-    
+
     where \(p_k\) is the proportion of class \(k\) observations.
 
 The Gini impurity (sometimes referred to as Gini index) encourages leaf nodes
@@ -207,64 +202,63 @@ where the majority of observations belong to a single class.
 
 ???+ info
 
-    The prediction at each leaf node is the majority class among the training 
+    The prediction at each leaf node is the majority class among the training
     observations in that node.
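+
+To make the Gini formula concrete, here is a small sketch that computes the
+impurity from an array of class labels. The helper `gini` is a hypothetical
+name used only for illustration.
+
+```python
+import numpy as np
+
+
+def gini(labels):
+    """Gini impurity 1 - sum_k p_k^2 of an array of class labels."""
+    _, counts = np.unique(labels, return_counts=True)
+    proportions = counts / counts.sum()
+    return 1.0 - np.sum(proportions**2)
+
+
+pure = gini(np.array([1, 1, 1, 1]))  # all observations in a single class
+mixed = gini(np.array([0, 0, 1, 1]))  # two classes, evenly split
+print(pure, mixed)
+```
+
+A pure node has an impurity of 0, while an even split between two classes
+gives 0.5, the maximum for \(K = 2\).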
 
----
+______________________________________________________________________
 
 #### TLDR
 
-No matter the task (regression or classification), with a greedy optimization 
-strategy, the CART algorithm searches for the best split using an exhaustive 
+No matter the task (regression or classification), with a greedy optimization
+strategy, the CART algorithm searches for the best split using an exhaustive
 search at each node to ultimately minimize the prediction error. Thus answering
-the first two questions, *a* (How do we pick the right feature for a split?) 
+the first two questions, *a* (How do we pick the right feature for a split?)
 and *b* (What's the decision criteria at each node?).
 
-A CART can be seen as a piecewise-constant model, as it partitions the feature 
-space into regions and assigns a constant prediction (either the mean of a 
+A CART can be seen as a piecewise-constant model, as it partitions the feature
+space into regions and assigns a constant prediction (either the mean of a
 continuous value or a label) to each region.
 
 ### Tree size
 
-Lastly, we answer question, *c* (How large do we grow the tree?).
-Put differently, when should we stop adding nodes? 
+Lastly, we answer question *c* (How large do we grow the tree?). Put
+differently, when should we stop adding nodes?
 
-First, the tree is grown as large as possible until a stopping criterion is 
-met. This criterion can be the maximum tree depth or a minimum number of 
-observations per leaf. Second, the tree is pruned back. Pruning is the process 
-of removing nodes that do not improve the model's performance. It balances the 
+First, the tree is grown as large as possible until a stopping criterion is
+met. This criterion can be the maximum tree depth or a minimum number of
+observations per leaf. Second, the tree is pruned back. Pruning is the process
+of removing nodes that do not improve the model's performance. It balances the
 RSS error or Gini impurity against model complexity.
 
 ???+ info
 
-    If you want to dive deeper into tree pruning, we recommend reading page 665
-    of Bishop's book *Pattern Recognition and Machine Learning*[^1]
+    If you want to dive deeper into tree pruning, we recommend reading page 665 of
+    Bishop's book *Pattern Recognition and Machine Learning*[^1].
 
----
+______________________________________________________________________
 
 ## Advantages and Limitations
 
-Decision trees offer several significant advantages, but they also have their 
+Decision trees offer several significant advantages, but they also have their
 limitations:
 
 <div class="grid cards" markdown>
 
--   :fontawesome-regular-thumbs-up:{ .lg .middle } __Advantages__
+- :fontawesome-regular-thumbs-up:{ .lg .middle } __Advantages__
 
-    ---
+    ______________________________________________________________________
 
     - Easy to interpret and visualize
     - Can capture non-linear relationships
 
+- :fontawesome-regular-thumbs-down:{ .lg .middle } __Limitations__
 
--   :fontawesome-regular-thumbs-down:{ .lg .middle } __Limitations__
-
-    ---
+    ______________________________________________________________________
 
-    - Prone to overfitting, i.e., building a model that perfectly fits the 
-      training data but fails to generalize on new (unseen) data.
-    - Sensitive to data, i.e., small changes in the data can lead to 
-      significantly different trees.
+    - Prone to overfitting, i.e., building a model that perfectly fits the
+        training data but fails to generalize on new (unseen) data.
+    - Sensitive to data, i.e., small changes in the data can lead to
+        significantly different trees.
 
 </div>
 
@@ -276,14 +270,14 @@ As mentioned earlier, we will use `scikit-learn` for hands-on experience.
 [^3]:
     `scikit-learn` documentation: [Decision Trees](https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart)
 
-Functionalities around decision trees are all part of the 
+Functionalities around decision trees are all part of the
 [`tree` module](https://scikit-learn.org/stable/api/sklearn.tree.html) in
 `scikit-learn`.
 
 ### Regression
 
-First, we start with a regression task. We will use the California housing
-data to predict house prices using a decision tree regressor.
+First, we start with a regression task. We will use the California housing data
+to predict house prices using a decision tree regressor.
 
 #### Load data
 
@@ -303,7 +297,7 @@ X_train, X_test, y_train, y_test = train_test_split(
 )
 ```
 
-As always, a seed is set for reproducibility (`#!python random_state=42`). It 
+As always, a seed is set for reproducibility (`#!python random_state=42`). It
 can be any integer, you can simply pick any number.
 
 #### Fit and evaluate the model
@@ -330,17 +324,17 @@ print(f"Model performance (R²): {round(score, 2)}")
 Model performance (R²): 0.61
 ```
 
-The `score()` method returns the coefficient of determination \(R^2\). 
-You should be already familiar with \(R^2\), as it was first introduced 
-in the [Regression chapter](../regression.md#coefficient-of-determination) to 
-evaluate the fit of a linear regression.
+The `score()` method returns the coefficient of determination \(R^2\). You
+should already be familiar with \(R^2\), as it was first introduced in the
+[Regression chapter](../regression.md#coefficient-of-determination) to evaluate
+the fit of a linear regression.
 
-The decision tree model achieved an \(R^2\) of 0.61 on the test set, which 
+The decision tree model achieved an \(R^2\) of 0.61 on the test set, which
 leaves room for improvement.
 
 ???+ info
 
-    On a side note: Although we fitted a decision tree on `#!python 16512` 
+    On a side note: Although we fitted a decision tree on `#!python 16512`
     observations, the process of actually training the model is quite fast!
 
 #### Plot the tree
@@ -352,8 +346,8 @@ We can easily visualize the tree using the `plot_tree` function.
 
 ???+ tip
 
-    This is the first time that we discourage you from running the code 
-    snippet below. Soon you will know why.
+    This is the first time that we discourage you from running the code snippet
+    below. Soon you will know why.
 
 ```python
 import matplotlib.pyplot as plt
@@ -369,14 +363,14 @@ plt.show()  # use matplotlib to show the plot
     </figcaption>
 </figure>
 
-Though we can't read any of the information present, the plot hints at a huge 
+Though we can't read any of the information present, the plot hints at a huge
 tree. Due to its complexity, the model does not add much value to the
 understanding of the data (it's simply not interpretable).
 
-Actually visualizing this particular tree takes some time, hence we 
-discouraged you from executing the code.
+Actually visualizing this particular tree takes some time, hence we discouraged
+you from executing the code.
 
-But why do we get such a huge tree? By default, the CART implementation in 
+But why do we get such a huge tree? By default, the CART implementation in
 `scikit-learn` grows the tree as large as possible and does *not* prune it.
 
 ##### ... to fix
@@ -395,9 +389,9 @@ model = DecisionTreeRegressor(
 model.fit(X_train, y_train)
 ```
 
-The `max_depth` parameter limits the depth of the tree, while `min_samples_leaf`
-sets the minimum number of samples (observations) required to be in a leaf 
-node. Both prevent the tree from growing too large.
+The `max_depth` parameter limits the depth of the tree, while
+`min_samples_leaf` sets the minimum number of samples (observations) required
+to be in a leaf node. Both prevent the tree from growing too large.
 
 ???+ info
 
@@ -412,28 +406,28 @@ import matplotlib.pyplot as plt
 from sklearn.tree import plot_tree
 
 plot_tree(
-    model, 
-    filled=True,   # (1)!
+    model,
+    filled=True,  # (1)!
     feature_names=X.columns,  # (2)!
-    proportion=True  # (3)!
+    proportion=True,  # (3)!
 )
 plt.show()
 ```
 
-1. `#!python filled=True` colors nodes according to prediction values. 
-   A stronger color indicating a higher value.
-2. The parameter `feature_names` is used to label the features in the tree.
-3. `proportion=True` displays the proportion of samples in each node.
+1. `#!python filled=True` colors nodes according to prediction values. A
+    stronger color indicates a higher value.
+1. The parameter `feature_names` is used to label the features in the tree.
+1. `proportion=True` displays the proportion of samples in each node.
 
 ???+ info
-    
-    Generally, it is always good practice to consult the documentation, if 
-    you are unsure about the usage of a function/class.
 
-    Regarding `plot_tree()`, you might find some useful information in the 
+    Generally, it is good practice to consult the documentation if you are
+    unsure about the usage of a function/class.
+
+    Regarding `plot_tree()`, you might find some useful information in the
     [docs](https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html)
-    that can help you customize the plot to your liking.
-    So don't shy away from reading the documentation!
+    that can help you customize the plot to your liking. So don't shy away from
+    reading the documentation!
 
 <figure markdown="span">
     ![A small tree](../../../../assets/data-science/algorithms/tree-based/small-tree.png)
@@ -446,25 +440,23 @@ plt.show()
 ???+ tip
 
     The nodes are quite easy to read:
-    
-    Starting with the root node, the feature `MedInc` performs 
-    the first split. If the median income is less than 5.086, we follow the 
-    left branch else the right branch. The resulting `squared_error` of the 
-    split is shown as well. At the root node, the `squared_error` (sum of the 
-    squared differences between the actual values and the predicted value)
-    is 1.337. The lower the `squared_error`, the better the split. A "perfect
-    split" would result in a `squared_error` of 0.
-
-    The root node splits the data into two subsets, the left branch results 
-    in a subest containing 79.3% of the training data and the right branch 
-    20.7%. Compared to the root node, both additional splits lead to a 
-    decrease of the `squared_error` and thus increase the predictive power. 
-    After two more splits, we reach the leaf nodes. Each leaf node contains 
-    a value, the final prediction.
-
-Now we have a pruned tree, which reduced the risk of overfitting. However, at 
-the cost of model performance. The \(R^2\) decreased from 0.61 to 0.42 which 
-might indicate that such a simple tree might not capture the complexity of the 
+
+    Starting with the root node, the feature `MedInc` performs the first split. If
+    the median income is less than or equal to 5.086, we follow the left branch,
+    otherwise the right branch. Each node also shows its `squared_error`, the mean
+    of the squared differences between the actual values and the node's predicted
+    value. At the root node, it is 1.337. The lower the `squared_error`, the better
+    the split. A "perfect split" would result in a `squared_error` of 0.
+
+    The root node splits the data into two subsets: the left branch results in a
+    subset containing 79.3% of the training data and the right branch 20.7%.
+    Compared to the root node, both additional splits lead to a decrease of the
+    `squared_error` and thus increase the predictive power. After two more splits,
+    we reach the leaf nodes. Each leaf node contains a value, the final prediction.
+
+Now we have a pruned tree, which reduces the risk of overfitting, but at the
+cost of model performance. The \(R^2\) decreased from 0.61 to 0.42, which
+indicates that such a simple tree might not capture the complexity of the
 data well.
 
 <div style="text-align: center">
@@ -476,26 +468,26 @@ data well.
 </div>
 
 In practice, you have to find the right parameters to balance model complexity
-and performance. Unfortunately, there is no one-size-fits-all solution. You 
+and performance. Unfortunately, there is no one-size-fits-all solution. You
 have to tune the parameters based on the data and the task at hand.
 
 ???+ question "Parameter tuning"
 
-    Try some different combinations of `max_depth` and `min_samples_leaf`.
-    Use the same train test split, we defined earlier.
-    
+    Try some different combinations of `max_depth` and `min_samples_leaf`. Use the
+    same train-test split we defined earlier.
+
     1. Manually change the values.
-    2. Fit the model.
-    3. Evaluate the model.
-    4. Plot the model.
-    5. Repeat! :repeat:
+    1. Fit the model.
+    1. Evaluate the model.
+    1. Plot the model.
+    1. Repeat! :repeat:
 
     Can you get an \(R^2\) higher than `#!python 0.7`?
 
 ### Classification
 
-Next, we switch to a classification task. We will re-use the breast cancer 
-data set introduced in the previous Classification chapter.
+Next, we switch to a classification task. We will re-use the breast cancer data
+set introduced in the previous Classification chapter.
 
 #### Load data
 
@@ -510,7 +502,7 @@ X_train, X_test, y_train, y_test = train_test_split(
 
 #### Fit and evaluate the model
 
-For classification trees, `scikit-learn` provides the class 
+For classification trees, `scikit-learn` provides the class
 `DecisionTreeClassifier`.
 
 ```python hl_lines="1"
@@ -518,21 +510,23 @@ from sklearn.tree import DecisionTreeClassifier
 
 model = DecisionTreeClassifier(
     # again, set max_depth and min_samples_leaf to prevent growing a huge tree
-    random_state=784, max_depth=7, min_samples_leaf=5
+    random_state=784,
+    max_depth=7,
+    min_samples_leaf=5,
 )
 ```
 
 ???+ question "Fit and evaluate the model"
 
     Now it is your time to fit and evaluate the model. Although, you have never
-    used an instance of `DecisionClassifier` before, you can use the same 
-    methods as with other models in `scikit-learn`. Simply refer to the 
-    previous regression example.
-    
+    used an instance of `DecisionTreeClassifier` before, you can use the same
+    methods as with other models in `scikit-learn`. Simply refer to the previous
+    regression example.
+
     1. Fit the model on `X_train` and `y_train`.
-    2. Evaluate the model on `X_test` and `y_test`.
-    3. Print the model's performance.
-    4. Plot the tree.
+    1. Evaluate the model on `X_test` and `y_test`.
+    1. Print the model's performance.
+    1. Plot the tree.
 
     Lastly answer following quiz question to evaluate your result.
 
@@ -550,14 +544,14 @@ from the logistic regression.
 
 ## Recap
 
-We comprehensively explored decision trees, focusing on the CART algorithm. 
-The theory section illuminated its core mechanisms, while practical 
-examples demonstrated building and evaluating decision trees for regression and
+We comprehensively explored decision trees, focusing on the CART algorithm. The
+theory section illuminated its core mechanisms, while practical examples
+demonstrated building and evaluating decision trees for regression and
 classification tasks. Key takeaways include:
 
 - Algorithm insights into tree construction
-- Practical implementation skills 
+- Practical implementation skills
 - Understanding of decision trees' interpretability and overfitting risks
 
-Next, we'll extend our knowledge to Random Forests, an ensemble method 
+Next, we'll extend our knowledge to Random Forests, an ensemble method
 combining multiple decision trees to enhance predictive performance.
diff --git a/docs/data-science/algorithms/supervised/tree-based/forest.md b/docs/data-science/algorithms/supervised/tree-based/forest.md
index 4e270cfa..2afe8c3f 100644
--- a/docs/data-science/algorithms/supervised/tree-based/forest.md
+++ b/docs/data-science/algorithms/supervised/tree-based/forest.md
@@ -12,65 +12,65 @@ CART (Classification and Regression Trees) algorithm, we can dive right in.
 
 ???+ info
 
-    Random forests were introduced by Leo Breiman in 2001. The following 
-    section closely follows the original paper.
+    Random forests were introduced by Leo Breiman in 2001. The following section
+    closely follows the original paper.
 
     ^^Breiman, L. Random Forests. *Machine Learning 45*, 5–32 (2001).^^
     [https://doi.org/10.1023/A:1010933404324](https://doi.org/10.1023/A:1010933404324)
 
 A random forest combines multiple decision trees to create an ensemble model.
-The idea is to grow multiple trees and average their predictions. Thus, 
+The idea is to grow multiple trees and average their predictions, thus
 resulting in a more robust model that improves generalization and reduces
 overfitting.
 
 The randomness in a random forest stems from two techniques:
 
 1. Bootstrap sampling
-2. Random feature selection
+1. Random feature selection
 
 ### Bootstrap sampling
 
-The first technique is known as **bootstrap sampling**. Given a
-training set of size $N$, we draw $N$ samples ==with replacement==. This means 
-that some samples may be repeated, while others may not be included at all. 
-This results in a new training set of the same size as the original, but with 
-some samples missing and others duplicated.
+The first technique is known as **bootstrap sampling**. Given a training set of
+size $N$, we draw $N$ samples ==with replacement==. This means that some
+samples may be repeated, while others may not be included at all. This results
+in a new training set of the same size as the original, but with some samples
+missing and others duplicated.
 
-Each tree is fit on a different bootstrap sample. Intuitively speaking, this 
+Each tree is fit on a different bootstrap sample. Intuitively speaking, this
 means that each tree sees a slightly different "version" of the training data.
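+
+Drawing a bootstrap sample takes only a few lines with NumPy. The sketch below
+is illustrative (the seed and the sample size are arbitrary choices); it also
+identifies the observations that were never drawn.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(42)
+n = 1000
+indices = np.arange(n)
+
+# draw n indices with replacement
+bootstrap = rng.choice(indices, size=n, replace=True)
+
+# observations that never made it into the bootstrap sample
+left_out = np.setdiff1d(indices, bootstrap)
+print(f"Unique observations drawn: {np.unique(bootstrap).size}")
+print(f"Observations left out: {left_out.size}")
+```
+
+On average, roughly 63% of the original observations end up in a bootstrap
+sample, while the remaining ones are left out.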
 
 ### Random feature selection
 
-The second technique is **random feature selection**. 
-Remember, that a CART is grown by selecting the best split at each node.
-This is done by considering all features. Contrary when growing trees for a 
-random forest, we only consider a random subset of features at each split. 
+The second technique is **random feature selection**. Remember that a CART is
+grown by selecting the best split at each node. This is done by considering all
+features. In contrast, when growing trees for a random forest, we only consider
+a random subset of features at each split.
 
----
+______________________________________________________________________
 
 ### Putting it all together
 
 Each tree in a random forest is fit on a bootstrap sample and uses a random
-subset of features at each split.
-In case of regression, the predictions of all trees are simply averaged. In 
-case of classification, the majority vote is taken. The majority vote in a 
-random forest classification means that the class predicted most frequently by
-the individual trees is selected as the final prediction.
-
-No matter the task, classification or regression: it was observed that 
-introducing randomness in the tree-growing process improves the model 
+subset of features at each split. In case of regression, the predictions of all
+trees are simply averaged. In case of classification, the majority vote is
+taken. The majority vote in a random forest classification means that the class
+predicted most frequently by the individual trees is selected as the final
+prediction.
+
+No matter the task, classification or regression: it was observed that
+introducing randomness in the tree-growing process improves the model
 performance.
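+
+The whole procedure can be sketched by hand for a classification task. This is
+a simplified illustration under a few assumptions, not how `scikit-learn`
+builds its forests: the synthetic data, the number of trees (25) and the use of
+`max_features="sqrt"` are choices made for this example.
+
+```python
+import numpy as np
+from sklearn.datasets import make_classification
+from sklearn.tree import DecisionTreeClassifier
+
+X, y = make_classification(n_samples=300, random_state=42)
+rng = np.random.default_rng(42)
+
+trees = []
+for seed in range(25):
+    # bootstrap sample: draw row indices with replacement
+    idx = rng.integers(0, len(X), size=len(X))
+    # max_features="sqrt": consider a random feature subset at each split
+    tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
+    trees.append(tree.fit(X[idx], y[idx]))
+
+# majority vote across all trees (the labels are 0 and 1)
+votes = np.array([tree.predict(X) for tree in trees])
+prediction = (votes.mean(axis=0) > 0.5).astype(int)
+print(f"Training accuracy: {(prediction == y).mean():.2f}")
+```
+
+An odd number of trees avoids ties in the vote. Note that `scikit-learn`'s
+`RandomForestClassifier` averages the predicted class probabilities of the
+trees instead of counting hard votes.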
 
 ???+ info
 
-    Contrary to the classic CART, random forests do not constrain the tree 
-    growth. I.e., trees are fully grown and not pruned.
+    Contrary to the classic CART, random forests do not constrain the tree growth.
+    I.e., trees are fully grown and not pruned.
 
 ## Examples
 
-With a basic understanding of random forests we take a look at some 
-examples. As always, we'll use our favorite machine learning package 
-`scikit-learn` (at least that of the author :wink:).
+With a basic understanding of random forests, we take a look at some examples.
+As always, we'll use our favorite machine learning package `scikit-learn` (at
+least that of the author :wink:).
 
 In order to focus on the random forest implementation and its parameters, we'll
 reuse the California housing data (for regression) and the breast cancer data
@@ -82,8 +82,8 @@ Let's start with building a random forest to predict California housing prices.
 
 #### Load data
 
-As usual, we load the data and split it into a training and test set in 
-order to evaluate the model later on.
+As usual, we load the data and split it into a training and test set in order
+to evaluate the model later on.
 
 ```python
 from sklearn.datasets import fetch_california_housing
@@ -98,8 +98,8 @@ X_train, X_test, y_train, y_test = train_test_split(
 
 #### Fit the model
 
-Just like with decision trees, `scikit-learn` provides two separate classes 
-for regression and classification, namely `RandomForestRegressor` and 
+Just like with decision trees, `scikit-learn` provides two separate classes for
+regression and classification, namely `RandomForestRegressor` and
 `RandomForestClassifier`. Both are part of the `ensemble` module.
 
 ```python
@@ -109,8 +109,8 @@ model = RandomForestRegressor(random_state=784)  # (1)!
 model.fit(X_train, y_train)
 ```
 
-1. As a random forest is well random :sweat_smile:, we set the 
-   `random_state` to ensure the reproducibility of our results.
+1. As a random forest is, well, random :sweat_smile:, we set the
+    `random_state` to ensure the reproducibility of our results.
 
 Depending on your setup, the fitting process might take a couple of seconds.
 
@@ -127,18 +127,18 @@ Model performance (R²): 0.81
 
 ???+ info
 
-    Remember, that the `score()` method of a decision tree regressor 
-    (`DecisionTreeRegressor`) returned the coefficient of determination 
-    \(R^2\). The same applies to random forests regressors.
+    Remember that the `score()` method of a decision tree regressor
+    (`DecisionTreeRegressor`) returned the coefficient of determination \(R^2\).
+    The same applies to random forest regressors.
 
 Compared to a single tree with an \(R^2\) of 0.61, the random forest performs
-considerably better with an \(R^2\) of 0.81. You can re-visit the according 
+considerably better with an \(R^2\) of 0.81. You can revisit the corresponding
 section [here](cart.md#fit-and-evaluate-the-model).
 
 ???+ question "How many trees are in the forest?"
-    
-    Consult the `scikit-learn` docs to find out how many trees are in the 
-    forest by default. Use the following question for self-assessment.
+
+    Consult the `scikit-learn` docs to find out how many trees are in the forest by
+    default. Use the following question for self-assessment.
 
 <quiz>
 How many trees form a forest by default?
@@ -152,16 +152,13 @@ The parameter `n_estimators` defaults to 100 trees.
 
 ???+ info
 
-    If you want to get closer to the original definition of a random forest 
-    regressor by Breiman, you have to set the `max_features` parameter. 
-    Specifically, with \(m\) features, the number of features considered at 
-    each split should be \(\frac{m}{3}\) for regression.
+    If you want to get closer to the original definition of a random forest
+    regressor by Breiman, you have to set the `max_features` parameter.
+    Specifically, with \(m\) features, the number of features considered at each
+    split should be \(\frac{m}{3}\) for regression.
 
     ```python hl_lines="2"
-    RandomForestRegressor(
-        max_features=len(X_train.columns) // 3,
-        random_state=784
-    )
+    RandomForestRegressor(
+        max_features=len(X_train.columns) // 3,
+        random_state=784,
+    )
     ```
 
     By default, `scikit-learn` considers \(m\) features for each split.
@@ -169,9 +166,9 @@ The parameter `n_estimators` defaults to 100 trees.
 ???+ tip
 
     If you're unsure how to set parameters of a model (such as `max_features`),
-    stick to the defaults. `scikit-learn` provides sensible defaults 
-    that work well. In later chapters, we will explore methods to 
-    automatically tune these hyperparameters.
+    stick to the defaults. `scikit-learn` provides sensible defaults that work
+    well. In later chapters, we will explore methods to automatically tune these
+    hyperparameters.
 
 ### Classification
 
@@ -180,14 +177,14 @@ Next, we switch to a classification task.
 ???+ question
 
     Load the breast cancer data, fit and evaluate a random forest.
-    
+
     1. Load the data and split it into a training and test set.
-    2. Load the appropriate random forest class.
-    3. Fit the model.
-    4. Evaluate the model on the test set.
+    1. Load the appropriate random forest class.
+    1. Fit the model.
+    1. Evaluate the model on the test set.
 
-    Hint: This and the previous chapter should provide all necessary
-    information, to solve the tasks.
+    Hint: This and the previous chapter should provide all necessary information
+    to solve the tasks.
 
 #### Inspecting the forest
 
@@ -211,24 +208,20 @@ print(model.estimators_)  # (1)!
 `estimators_` is a list of individual tree instances. If you're dealing with a
 `RandomForestRegressor`, `estimators_` is a list of `DecisionTreeRegressor`.
 
-In most cases, you won't need to inspect the individual trees. Nevertheless,
-we can utilize this information to solidify our understanding of random 
-forests.
+In most cases, you won't need to inspect the individual trees. Nevertheless, we
+can utilize this information to solidify our understanding of random forests.
 
----
+______________________________________________________________________
 
 ### Stronger together
 
-We fit a random forest classifier on a synthetic data set to 
-==literally== illustrate the different trees. First, we generate the data.
+We fit a random forest classifier on a synthetic data set to ==literally==
+illustrate the different trees. First, we generate the data.
 
 ```python
 from sklearn.datasets import make_classification
 
-X, y = make_classification(
-    random_state=42,
-    n_clusters_per_class=1
-)
+X, y = make_classification(random_state=42, n_clusters_per_class=1)
 ```
 
 Next, we initialize and fit a random forest classifier.
@@ -240,13 +233,13 @@ classifier = RandomForestClassifier(
 classifier.fit(X, y)
 ```
 
-Note, that we set the number of trees to `#!python 4`. We keep the number 
-small as we visualize them later on. The `max_depth` parameter limits the 
-depth of each tree to `#!python 3`. This is done to perform pruning and thus 
-keep the trees simple and easier to plot.
+Note that we set the number of trees to `#!python 4`. We keep the number small
+as we visualize them later on. The `max_depth` parameter limits the depth of
+each tree to `#!python 3`. This constrains the tree growth and thus keeps the
+trees simple and easier to plot.
 
 Finally, we visualize all trees. We access the trees via the `estimators_`
-attribute and plot them using the familiar `plot_tree()` function. Everything 
+attribute and plot them using the familiar `plot_tree()` function. Everything
 else is just plot customization.
 
 ```python hl_lines="5 7"
@@ -276,26 +269,26 @@ plt.show()
     </figcaption>
 </figure>
 
-Although there is a lot of information cramped inside one figure, at first 
-glance it is obvious that all four trees are different. Each of them differs
-in splits (feature and threshold), number of nodes and predictions.
+Although there is a lot of information crammed into one figure, at first
+glance it is obvious that all four trees are different. Each of them differs in
+splits (feature and threshold), number of nodes and predictions.
 
 Each one of these trees on their own might not generalize well, hence they are
-often referred to as weak learners. However, when combined, they form a 
+often referred to as weak learners. However, when combined, they form a
 "strong" model. That's the essence of an ensemble method!
 
 ### Feature importance
 
-One of the most powerful attribute of random forests is their ability to 
-assess feature importance: measuring how much each input variable contributes 
-to predicting the target variable.
+One of the most powerful features of random forests is their ability to assess
+feature importance: measuring how much each input variable contributes to
+predicting the target variable.
 
-Remember that trees are fitted on a [bootstrap](forest.md#bootstrap-sampling) 
-training set. Since some samples are left out during this process, we can use 
-these to measure the importance of each feature. These unused observations are 
-called "out-of-bag" (OOB) samples. For each feature, the OOB samples are 
-randomly permuted (shuffled) and the increase in prediction error is measured. 
-Features that lead to larger increases in error when permuted are considered 
+Remember that trees are fitted on a [bootstrap](forest.md#bootstrap-sampling)
+training set. Since some samples are left out during this process, we can use
+these to measure the importance of each feature. These unused observations are
+called "out-of-bag" (OOB) samples. For each feature, the OOB samples are
+randomly permuted (shuffled) and the increase in prediction error is measured.
+Features that lead to larger increases in error when permuted are considered
 more important.
 
 Let's examine feature importance using the breast cancer dataset:
@@ -318,27 +311,26 @@ print(rf.feature_importances_)
 
     To keep the example concise, we did not perform a train test split.
 
-Feature importance values are a `#!python list` of `#!python float`s. 
-Each value corresponds to a feature in the order they were passed to the
-model. The values are normalized and sum to `#!python 1.0`. 
-A higher value indicates that the feature contributes more to making correct 
-predictions.
+Feature importance values are a NumPy array of `#!python float`s. Each
+value corresponds to a feature in the order they were passed to the model. The
+values are normalized and sum to `#!python 1.0`. A higher value indicates that
+the feature contributes more to making correct predictions.
 
 Feature importance can help with:
 
 1. Feature selection: Identifying which features are most relevant for
-   predictions
-2. Model interpretation: Understanding which features drive the model's
-   decisions
-3. Data collection: Guiding future data collection efforts by highlighting
-   important measurements
+    predictions
+1. Model interpretation: Understanding which features drive the model's
+    decisions
+1. Data collection: Guiding future data collection efforts by highlighting
+    important measurements
 
 ???+ question "Visualize the feature importance"
 
-    Generate a bar plot to visualize the feature importance.
-    Use any package of your choice. For convenience, you can use the 
-    following code snippet to get started.
-    
+    Generate a bar plot to visualize the feature importance. Use any package of
+    your choice. For convenience, you can use the following code snippet to get
+    started.
+
     ```python
     import pandas as pd
 
@@ -367,6 +359,6 @@ sensitivity to data changes. While slightly less interpretable than single
 trees, random forests provide better generalization, more robust predictions,
 and useful insights through feature importance measures.
 
-With `scikit-learn`, you are now able to build a random forest for regression 
+With `scikit-learn`, you are now able to build a random forest for regression
 and classification tasks. You have also learned how to inspect individual trees
 and assess feature importance.
diff --git a/docs/data-science/algorithms/unsupervised/clustering.md b/docs/data-science/algorithms/unsupervised/clustering.md
index 1e006a7d..789e300b 100644
--- a/docs/data-science/algorithms/unsupervised/clustering.md
+++ b/docs/data-science/algorithms/unsupervised/clustering.md
@@ -1,16 +1,16 @@
 # Clustering
 
-In this section, we will start to explore unsupervised learning, where we work 
-with data that isn't accompanied by labels. One of the primary techniques 
-within this realm is clustering, which aims to uncover patterns or structures 
-in the data by grouping similar data points together. A popular method for 
-achieving this is k-means clustering, which aims to identify clusters of 
+In this section, we will start to explore unsupervised learning, where we work
+with data that isn't accompanied by labels. One of the primary techniques
+within this realm is clustering, which aims to uncover patterns or structures
+in the data by grouping similar data points together. A popular method for
+achieving this is k-means clustering, which aims to identify clusters of
 similar observations.
 
 ## K-means
 
-K-means was briefly introduced in the [Introduction](../index.md#example_1) to 
-Supervised vs. Unsupervised Learning and used to segment customers based on 
+K-means was briefly introduced in the [Introduction](../index.md#example_1) to
+Supervised vs. Unsupervised Learning and used to segment customers based on
 their annual spending and average basket size.
 
 <div style="text-align: center;">
@@ -22,37 +22,36 @@ their annual spending and average basket size.
 </div>
 
 The algorithm groups similar data points together based on their attributes
-without being told what these groups should be. 
+without being told what these groups should be.
 
 To get a better understanding of k-means, we will explore the theory behind it
-and employ the algorithm to cluster data from Spotify and a semiconductor 
+and employ the algorithm to cluster data from Spotify and a semiconductor
 manufacturer.
 
 ### Theory
 
 ???+ info
 
-    The theoretical part is adapted from:
-    ^^Christopher M. Bishop. 2006. *Pattern Recognition and Machine 
-    Learning*[^1]^^
+    The theoretical part is adapted from: ^^Christopher M. Bishop. 2006. *Pattern
+    Recognition and Machine Learning*[^1]^^
 
-    [^1]:
-    Christopher M. Bishop. Pattern Recognition and Machine Learning. 
-    Springer, 2006. [Link](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf)
+    [^1]: Christopher M. Bishop. Pattern Recognition and Machine Learning.
+    Springer, 2006.
+    [Link](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf)
 
 Assume a set of data points \(x_1, x_2, ..., x_N\). K-means partitions the data
-into \(K\) number of clusters. Each cluster is represented by \(\mu_k\), 
-which can be seen as the center of a cluster \(k\).
+into \(K\) number of clusters. Each cluster is represented by \(\mu_k\), which
+can be seen as the center of a cluster \(k\).
 
-Intuitively speaking, the goal is to assign each data point \(x_n\) to the 
-cluster with the closest center \(\mu_k\). 
+Intuitively speaking, the goal is to assign each data point \(x_n\) to the
+cluster with the closest center \(\mu_k\).
 
 #### The objective
 
 Since the optimal assignment of data points to specific clusters is not known,
-the objective is to minimize the sum of squared distances between data 
-points and their assigned cluster centers.
-This is known as the **distortion measure**:
+the objective is to minimize the sum of squared distances between data points
+and their assigned cluster centers. This is known as the **distortion
+measure**:
 
 ???+ defi "Distortion measure"
 
@@ -61,33 +60,33 @@ This is known as the **distortion measure**:
     \]
 
     where:
-    
+
     - \(N\) is the number of data points,
     - \(K\) being the number of clusters,
-    - \(r_{nk}\) is a binary indicator of whether data point \(x_n\) is 
-      assigned to cluster \(k\),
+    - \(r_{nk}\) is a binary indicator of whether data point \(x_n\) is assigned to
+        cluster \(k\),
     - \(\mu_k\) representing the cluster center.
 
-In short, we want to find the optimal \(r_{nk}\) and \(\mu_k\) that minimize 
+In short, we want to find the optimal \(r_{nk}\) and \(\mu_k\) that minimize
 the distortion measure \(J\).
 
-\(J\) is minimized in an iterative process. First, we initialize \(\mu_k\) 
-with some random values. Then we alternate between two steps:
+\(J\) is minimized in an iterative process. First, we initialize \(\mu_k\) with
+some random values. Then we alternate between two steps:
 
-1. **Assignment step**: Keep \(\mu_k\) fixed. Minimize \(J\) with respect 
-    to \(r_{nk}\). This is done by assigning each data point to the closest 
+1. **Assignment step**: Keep \(\mu_k\) fixed. Minimize \(J\) with respect to
+    \(r_{nk}\). This is done by assigning each data point to the closest
     cluster center.
-2. **Update step**: Keep \(r_{nk}\) fixed. Minimize \(J\) with respect to 
-    \(\mu_k\). This is done by updating the cluster centers to the mean of 
-    the data points assigned to the cluster.
+1. **Update step**: Keep \(r_{nk}\) fixed. Minimize \(J\) with respect to
+    \(\mu_k\). This is done by updating the cluster centers to the mean of the
+    data points assigned to the cluster.
 
 Step 1 can be seen as re-assigning the data points to clusters, while step 2
 re-computes the cluster centers.
 
 ???+ info
 
-    Since \(\mu_k\) is the mean of the data points assigned to cluster \(k\),
-    we speak of the k-means algorithm.
+    Since \(\mu_k\) is the mean of the data points assigned to cluster \(k\), we
+    speak of the k-means algorithm.
 
 The optimization of \(J\) is guaranteed to converge, but it might not find the
 global minimum. The final solution depends on the initial cluster centers.
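+
+The alternation between the two steps can be sketched in a few lines of NumPy.
+This is a minimal illustration, not the `scikit-learn` implementation: the
+function name `kmeans_sketch` is hypothetical, and initializing the centers
+with randomly chosen data points is an assumption made for simplicity.
+
+```python
+import numpy as np
+
+
+def kmeans_sketch(X, k, n_iter=100, seed=42):
+    """Alternate assignment and update steps until the centers settle."""
+    rng = np.random.default_rng(seed)
+    # Assumption: start from k distinct, randomly chosen data points.
+    centers = X[rng.choice(len(X), size=k, replace=False)]
+    for _ in range(n_iter):
+        # Assignment step: index of the closest center for every point.
+        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
+        labels = dist.argmin(axis=1)
+        # Update step: mean of the assigned points (empty clusters stay put).
+        new_centers = np.array([
+            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
+            for j in range(k)
+        ])
+        if np.allclose(new_centers, centers):
+            break
+        centers = new_centers
+    # Final assignment so that labels match the returned centers.
+    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
+    labels = dist.argmin(axis=1)
+    # Distortion measure J: sum of squared distances to assigned centers.
+    distortion = ((X - centers[labels]) ** 2).sum()
+    return labels, centers, distortion
+
+
+# Usage on synthetic 2D data (purely illustrative).
+X = np.random.default_rng(0).normal(size=(200, 2))
+labels, centers, distortion = kmeans_sketch(X, k=3)
+print(centers.shape, distortion)
+```
+
+Running the sketch with different `seed` values may yield different centers
+and distortions, which illustrates the dependence on the initialization.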
@@ -95,11 +94,11 @@ global minimum. The final solution depends on the initial cluster centers.
 ???+ question "Get a better understanding"
 
     To improve your understanding of the k-means algorithm, either watch the
-    following video or visit the interactive visualization.
-    Both variants illustrate the iterative process of k-means.
+    following video or visit the interactive visualization. Both variants
+    illustrate the iterative process of k-means.
 
 === "Option 1: :fontawesome-brands-youtube: Video"
-    
+
     <div style="text-align: center;">
         <iframe width="560" height="315" 
             src="https://www.youtube.com/embed/R2e3Ls9H_fc?si=Lz4jq8Fbxjr1BmeL" 
@@ -111,25 +110,26 @@ global minimum. The final solution depends on the initial cluster centers.
     </div>
 
 === "Option 2: :fontawesome-solid-globe: Website"
-    
-    Visit the site [clustering-visualizer.web.app/kmeans](https://clustering-visualizer.web.app/kmeans).
+
+    Visit the site
+    [clustering-visualizer.web.app/kmeans](https://clustering-visualizer.web.app/kmeans).
     Use mouse clicks to draw data points. Click on "START".
-    
-    The web app illustrates the iterative algorithm. You can watch the 
-    data points being assigned to clusters and the update of cluster centers 
-    which are denoted in the app as \(C_1, C_2, ... , C_N\).
+
+    The web app illustrates the iterative algorithm. You can watch the data points
+    being assigned to clusters and the update of cluster centers which are denoted
+    in the app as \(C_1, C_2, ... , C_N\).
 
 #### Elbow method :flexed_biceps:
 
 So far we have not discussed the number of clusters \(K\) in depth. Since the
-algorithm requires the number of clusters as an input, it is crucial to choose 
+algorithm requires the number of clusters as an input, it is crucial to choose
 \(K\) wisely.
 
-One common approach to determine the optimal number of clusters is the
-**elbow method**. The idea is to plot the distortion measure \(J\) (inertia)
-for different values of \(K\). The plot will show a sharp decrease in \(J\) 
-as \(K\) increases. The optimal number of clusters is the point where the 
-decrease flattens out, resembling an elbow.
+One common approach to determine the optimal number of clusters is the **elbow
+method**. The idea is to plot the distortion measure \(J\) (inertia) for
+different values of \(K\). The plot will show a sharp decrease in \(J\) as
+\(K\) increases. The optimal number of clusters is the point where the decrease
+flattens out, resembling an elbow.
 
 <figure markdown="span">
     ![Elbow method](../../../assets/data-science/algorithms/clustering/elbow-method.png)
@@ -148,40 +148,39 @@ clustering semiconductor data.
 
 ### Recommendation system
 
-If you're using a music streaming service, you're familiar with listening to 
-playlist. At the end of a playlist, the service recommends you similar songs 
+If you're using a music streaming service, you're familiar with listening to
+playlists. At the end of a playlist, the service recommends similar songs
 based on the previous songs.
 
-We will build such a recommendation system (a rudimentary one) with 
-k-means. The goal is to cluster songs based on their audio features and 
-recommend similar songs to the user.
+We will build such a recommendation system (a rudimentary one) with k-means.
+The goal is to cluster songs based on their audio features and recommend
+similar songs to the user.
 
-To build our own recommendation system, we will use a modified 
-Spotify dataset.
+To build our own recommendation system, we will use a modified Spotify dataset.
 
 ???+ info
 
-    The original data can be found on 
+    The original data can be found on
     [Kaggle](https://www.kaggle.com/datasets/asaniczka/top-spotify-songs-in-73-countries-daily-updated?resource=download).
 
-    The modified data we are using, contains songs from 2024 up until now 
-    (time of writing: January 31, 2025).
-    
----
+    The modified data we are using contains songs from 2024 up until now (time of
+    writing: January 31, 2025).
+
+______________________________________________________________________
 
 ???+ question "Download and read data"
 
     1. Download the data set.
-    2. Read it with `pandas` and for convenience assign it to a variable called
-       `data`. Then you will be able to use the following code snippets more
-       easily.
-    3. Print the first rows of `data`.
+    1. Read it with `pandas` and for convenience assign it to a variable called
+        `data`. Then you will be able to use the following code snippets more
+        easily.
+    1. Print the first rows of `data`.
 
 <div class="center-button" markdown>
 [Download Spotify tracks :fontawesome-solid-download:](../../../assets/data-science/algorithms/clustering/spotify.csv){ .md-button }
 </div>
 
----
+______________________________________________________________________
 
 With the data set loaded, we pick the following audio features for clustering:
 
@@ -205,9 +204,9 @@ X = data[features]
 ???+ question "Have a look at the data"
 
     1. Look at the first couple of rows of the `DataFrame` `X`.
-    2. Check for potential missing values.
+    1. Check for potential missing values.
 
-    Hint: If you need a refresh on missing values, visit the 
+    Hint: If you need a refresh on missing values, visit the
     [Data preprocessing](../../data/preprocessing.md#missing-values) chapter.
 
 You might have noticed that all features are numerical. In fact, k-means
@@ -215,12 +214,12 @@ You might have noticed that all features are numerical. In fact, k-means
 
 ???+ danger
 
-    K-means clustering relies on Euclidean distances, which ==only make 
-    sense for numerical data==.
+    K-means clustering relies on Euclidean distances, which ==only make sense for
+    numerical data==.
 
-    :warning: Never use k-means for categorical data, even if you encode the 
-    categories as numbers or labels. Distances between categorical values 
-    are not meaningful!
+    :warning: Never use k-means for categorical data, even if you encode the
+    categories as numbers or labels. Distances between categorical values are not
+    meaningful!
 
     For clustering categorical data, use specialized algorithms like k-modes or
     other appropriate methods.
@@ -243,15 +242,16 @@ min        0.093900      0.001740  ...      0.000010     46.999000
 max        0.988000      0.998000  ...      0.989000    236.089000
 ```
 
-These basic statistics reveal that the features have different scales.
-For example, compare `tempo` and `danceability`. Tempo ranges from 
-`#!python 46` to `#!python 236`, while danceability ranges from 
-`#!python 0.0939` to `#!python 0.988`.
+These basic statistics reveal that the features have different scales. For
+example, compare `tempo` and `danceability`. Tempo ranges from `#!python 46` to
+`#!python 236`, while danceability ranges from `#!python 0.0939` to
+`#!python 0.988`.
 
-Thus, we apply a [Z-Score normalization](../../data/preprocessing.md#z-score-normalization)
-to all features (to have a mean of `0` and a standard deviation of `1`). 
-This prevents k-means to disproportionately weigh features like `tempo` and 
-ensures each feature contributes equally to the distance calculations.
+Thus, we apply a
+[Z-Score normalization](../../data/preprocessing.md#z-score-normalization) to
+all features (to have a mean of `0` and a standard deviation of `1`). This
+prevents k-means from disproportionately weighting features like `tempo` and
+ensures each feature contributes equally to the distance calculations.
 
 ```python
 from sklearn.preprocessing import StandardScaler
@@ -276,7 +276,7 @@ print(cluster_indices)
 array([4, 0, 3, ..., 1, 1, 2], dtype=int32)
 ```
 
-The `n_clusters` parameter specifies the number of clusters. We set it to 
+The `n_clusters` parameter specifies the number of clusters. We set it to
 `#!python 5` for now. The `random_state` parameter ensures reproducibility.
 Using the `fit_predict()` method, we obtain the cluster indices for each data
 point. In this case, these indices range from `#!python 0` to `#!python 4`.
@@ -288,17 +288,18 @@ This is where the elbow method comes into play. :flexed_biceps:
 
 #### Elbow method
 
-With the attribute `inertia_`, we can access the distortion measure \(J\).
-From the k-means docs:
+With the attribute `inertia_`, we can access the distortion measure \(J\). From
+the k-means docs:
 
 > `inertia_`:
-> 
+>
 > Sum of squared distances of samples to their closest cluster center,...
-> 
-> -- <cite>[KMeans docs](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html)</cite>
+>
+>
+> <cite>[KMeans docs](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html)</cite>
 
 In a loop we fit the k-means algorithm for different numbers of clusters \(K\)
-and store the corresponding distortion measure (`inertia_`). Then we plot the 
+and store the corresponding distortion measure (`inertia_`). Then we plot the
 results.
 
 We define a function to apply the elbow method:
@@ -321,19 +322,21 @@ def elbow_method(X, max_clusters=15):
     return distortions
 ```
 
-By default, the function `elbow_method()` tries values for \(K\) from 
-`#!python 1` to `#!python 15` and stores the corresponding distortion measure 
-in a `DataFrame`. 
+By default, the function `elbow_method()` tries values for \(K\) from
+`#!python 1` to `#!python 15` and stores the corresponding distortion measure
+in a `DataFrame`.
 
----
+______________________________________________________________________
 
 ???+ question "Apply the elbow method"
 
     1. Apply the `elbow_method()` on our scaled data `X`.
-    2. Create a line plot with the number of clusters (K) on the x-axis and 
-        the distortion measure on the y-axis.
 
-        Hint: Use the [`plot()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html)
+    1. Create a line plot with the number of clusters (K) on the x-axis and the
+        distortion measure on the y-axis.
+
+        Hint: Use the
+        [`plot()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html)
         method of the resulting `DataFrame`.
 
 Expand the section below to see a plot as a possible solution.
@@ -350,14 +353,14 @@ Expand the below section to see a plot as possible solution.
 #### Choice paralysis
 
 Like in our example, it is not always obvious how many clusters to pick,
-because the "elbow" can sometimes be subtle or ambiguous. Ideally, 
-you choose the point where the distortion/inertia sharply decreases and then 
-levels off, forming an elbow-like bend in the plot.
+because the "elbow" can sometimes be subtle or ambiguous. Ideally, you choose
+the point where the distortion/inertia sharply decreases and then levels off,
+forming an elbow-like bend in the plot.
 
-In this example, possible candidates for the number of clusters \(K\) are 
+In this example, possible candidates for the number of clusters \(K\) are
 `#!python 5`, `#!python 6` or `#!python 7`. As we have to make a choice, we
-choose `#!python 6` clusters.
-Now, we have to simply fit the k-means algorithm with `#!python n_clusters=6`.
+choose `#!python 6` clusters. Now we simply have to fit the k-means algorithm
+with `#!python n_clusters=6`.
 
 ```python
 kmeans = KMeans(n_clusters=6, random_state=42)
@@ -374,13 +377,13 @@ cluster_indices = kmeans.fit_predict(X)
     </iframe>
 </div>
 
-The goal of this exercise is to recommend a song based on a previous
-track. The idea is to pick a song as recommendation that is in the same 
-cluster as the previous one. To do so, we can use the `cluster_indices` to 
-recommend similar songs.
+The goal of this exercise is to recommend a song based on a previous track. The
+idea is to pick a song as recommendation that is in the same cluster as the
+previous one. To do so, we can use the `cluster_indices` to recommend similar
+songs.
 
-Since the `cluster_indices` are in the same order as our initial `data`, we 
-can simply assign them as a new column.
+Since the `cluster_indices` are in the same order as our initial `data`, we can
+simply assign them as a new column.
 
 ```python
 data["cluster"] = cluster_indices
@@ -396,12 +399,12 @@ print(data.head())
 4  6dOtVTDdiauQNBQEDOtlAB  BIRDS OF A FEATHER          Billie Eilish  ...   0.438  104.978        4
 ```
 
-Now, that we assigned a cluster to all `#!python 11320` tracks, we can easily 
-recommend a song based on a given `spotify_id` (the unique identifier of a 
-song on the platform).
+Now that we have assigned a cluster to all `#!python 11320` tracks, we can
+easily recommend a song based on a given `spotify_id` (the unique identifier of
+a song on the platform).
 
+Use the functions below to see your recommender system in action. Don't worry
-worry about the details of these functions.
+Use the below functions to see your recommender system in action. Don't worry
+about the details of these functions.
 
 ```python
 def print_track_info(track):
@@ -463,57 +466,52 @@ Cluster index: 4
     recommendation. Try it out!
 
     1. Pick another `spotify_id` and recommend a song.
-    2. Repeat the process a couple of times.
-
+    1. Repeat the process a couple of times.
 
 #### Are the recommendations good?
 
 As you've tried the recommender system a couple of times, you might have
-wondered if the recommendations are actually good?! 
-:thinking_face:
+wondered if the recommendations are actually good?! :thinking_face:
 
-Simply put, you have to be the judge if we were actually able to cluster 
+Simply put, you have to be the judge if we were actually able to cluster
 similar songs together and build a good recommendation system.
 
-In this application, it's quite intuitive: If you as a user like the 
-recommendations and keep listening to the recommended songs, the system is 
+In this application, it's quite intuitive: If you as a user like the
+recommendations and keep listening to the recommended songs, the system is
 successful.
 
-
 ???+ info
-    
-    When talking about supervised tasks, we were able to measure the 
-    performance of our models. However, in unsupervised learning, like 
-    clustering, we do not have labels to compare our results to. Thus, 
-    evaluating the performance of unsupervised learning methods is challenging.
-    
-    In practice, you have to rely on domain knowledge to interpret the 
-    results and assess the quality of the model.
 
----
+    When talking about supervised tasks, we were able to measure the performance of
+    our models. However, in unsupervised learning, like clustering, we do not have
+    labels to compare our results to. Thus, evaluating the performance of
+    unsupervised learning methods is challenging.
+
+    In practice, you have to rely on domain knowledge to interpret the results and
+    assess the quality of the model.
+
+______________________________________________________________________
 
 ### Semiconductor data
 
-K-means is not only useful for recommendation systems, but also for
-anomaly detection. The idea is to form clusters which in turn can be used to
-detect the outliers/anomalies.
+K-means is not only useful for recommendation systems, but also for anomaly
+detection. The idea is to form clusters which in turn can be used to detect the
+outliers/anomalies.
 
 ???+ info
 
     The data is adapted from the UCI Machine Learning Repository.[^2]
-    
-    [^2]:
-        McCann, M. & Johnston, A. (2008). SECOM [Dataset]. 
-        UCI Machine Learning Repository. 
-        [https://doi.org/10.24432/C54305](https://doi.org/10.24432/C54305)
+
+    [^2]: McCann, M. & Johnston, A. (2008). SECOM [Dataset]. UCI Machine Learning
+    Repository. [https://doi.org/10.24432/C54305](https://doi.org/10.24432/C54305)
 
 In this example, you will apply k-means to semiconductor data.
 
 ???+ question "Download and read data"
 
     1. Download the below data set.
-    2. Read it with `pandas`.
-    3. Have a look at the data.
+    1. Read it with `pandas`.
+    1. Have a look at the data.
 
 <div class="center-button" markdown>
 [Download semiconductor data :fontawesome-solid-download:](../../../assets/data-science/algorithms/clustering/semiconductor.csv){ .md-button }
@@ -522,47 +520,48 @@ In this example, you will apply k-means to semiconductor data.
 Each row in the data set
 
 > represents a single production entity with associated measured features [...]
-> 
-> -- <cite>UCI Machine Learning Repository</cite>
+>
+> <cite>UCI Machine Learning Repository</cite>
 
 ???+ question "Apply k-means"
 
     Solve the following tasks to apply k-means to the semiconductor data:
 
     1. Are there any missing values in the data?
-    2. Deal with potential missing values; choose any suitable strategy. We 
-        recommend to utilize the [`SimpleImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html) with your chosen strategy. The application 
-        of the `SimpleImputer` should be straightforward as it implements the
-        methods you already know, e.g., `fit_transform()`.
-    3. Do you need to scale the features? If so, apply a `StandardScaler`.
-    4. Use the elbow method to determine the number of clusters.
-    5. Fit the k-means algorithm with the optimal number of clusters.
-
-    Hint: You can reuse the functions and code snippets from the Spotify
-    example.
+    1. Deal with potential missing values; choose any suitable strategy. We
+        recommend using the
+        [`SimpleImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html)
+        with your chosen strategy. The application of the `SimpleImputer` should
+        be straightforward as it implements the methods you already know, e.g.,
+        `fit_transform()`.
+    1. Do you need to scale the features? If so, apply a `StandardScaler`.
+    1. Use the elbow method to determine the number of clusters.
+    1. Fit the k-means algorithm with the optimal number of clusters.
+
+    Hint: You can reuse the functions and code snippets from the Spotify example.
 
 ??? info
 
-    If you have solved the above tasks, you might wonder how to interpret 
-    your clustering results. Moreover, how can you detect potential anomalies?
+    If you have solved the above tasks, you might wonder how to interpret your
+    clustering results. Moreover, how can you detect potential anomalies?
 
-    Again, it all depends on domain knowledge. If you're a expert in the 
-    semiconductor industry you might be able to tell if the clusters 
-    make sense and if there are any anomalies in the data. Otherwise, 
-    interpretation can be quite challenging.
+    Again, it all depends on domain knowledge. If you're an expert in the
+    semiconductor industry you might be able to tell if the clusters make sense and
+    if there are any anomalies in the data. Otherwise, interpretation can be quite
+    challenging.
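+
+One possible heuristic for flagging candidates, sketched under assumptions, is
+to treat samples that lie far away from their closest cluster center as
+potential anomalies. The synthetic data from `make_blobs` merely stands in for
+the scaled semiconductor features, and the 99 % quantile threshold is an
+arbitrary illustrative choice, not a recommendation.
+
+```python
+import numpy as np
+from sklearn.cluster import KMeans
+from sklearn.datasets import make_blobs
+
+# Assumption: synthetic stand-in for the imputed and scaled features.
+X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
+
+kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
+kmeans.fit(X)
+
+# transform() returns the distance of each sample to every cluster center;
+# the row-wise minimum is the distance to the assigned (closest) center.
+distance_to_center = kmeans.transform(X).min(axis=1)
+
+# Assumption: flag the samples beyond the 99 % quantile of that distance.
+threshold = np.quantile(distance_to_center, 0.99)
+anomaly_indices = np.where(distance_to_center > threshold)[0]
+print(anomaly_indices)
+```
+
+Whether the flagged rows are real anomalies still has to be judged with
+domain knowledge; the distance only ranks how atypical a sample is.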
 
 ## Recap
 
-In this chapter, we introduced k-means clustering. We covered the theory 
+In this chapter, we introduced k-means clustering. We covered the theory
 followed by two practical examples: building a recommendation system for
 Spotify tracks and clustering semiconductor data.
 
 We employed the elbow method to determine the optimal number of clusters and
 discussed the challenges of evaluating clustering results.
 
-In the upcoming chapter, we introduce another unsupervised method, 
-namely Principal Component Analysis (PCA) to reduce the dimensionality of data.
-PCA can be useful in various ways:
+In the upcoming chapter, we introduce another unsupervised method, namely
+Principal Component Analysis (PCA), to reduce the dimensionality of data. PCA
+can be useful in various ways:
 
 - reducing the computational complexity of algorithms
 - visualizing high-dimensional data in a 2D or 3D space
diff --git a/docs/data-science/algorithms/unsupervised/dim-reduction.md b/docs/data-science/algorithms/unsupervised/dim-reduction.md
index 8a6e4b07..fc581eb9 100644
--- a/docs/data-science/algorithms/unsupervised/dim-reduction.md
+++ b/docs/data-science/algorithms/unsupervised/dim-reduction.md
@@ -2,73 +2,73 @@
 
 ## Principal Component Analysis (PCA)
 
-In data science and machine learning, we often encounter data sets with 
-hundreds or even thousands of features. We speak of high-dimensional data 
-sets. While these features may contain valuable information, working with 
-such high-dimensional data can be computationally expensive, prone to 
-overfitting, and difficult to visualize. This is where another 
-unsupervised method, dimensionality reduction comes in — a technique used to 
-simplify data sets, while retaining much of the critical information.
-
-One of the most widely used methods for dimensionality reduction is 
-Principal Component Analysis (PCA). PCA transforms a high-dimensional (= 
-lots of features) data set into a smaller set of features (components). In
-practice, PCA can reduce hundreds of features down to just 2 or 3 
-features, making PCA an ideal tool for visualization, preprocessing, and 
-feature extraction.
-
-In this section, we will explain the inner workings of PCA and apply it to
-the semiconductor data set.
+In data science and machine learning, we often encounter data sets with
+hundreds or even thousands of features. We speak of high-dimensional data sets.
+While these features may contain valuable information, working with such
+high-dimensional data can be computationally expensive, prone to overfitting,
+and difficult to visualize. This is where another unsupervised method,
+dimensionality reduction, comes in: a technique used to simplify data sets
+while retaining much of the critical information.
+
+One of the most widely used methods for dimensionality reduction is Principal
+Component Analysis (PCA). PCA transforms a high-dimensional (= lots of
+features) data set into a smaller set of features (components). In practice,
+PCA can reduce hundreds of features down to just 2 or 3 features, making PCA an
+ideal tool for visualization, preprocessing, and feature extraction.
+
+In this section, we will explain the inner workings of PCA and apply it to the
+semiconductor data set.
 
 ### What is PCA?
 
-PCA is a **linear transformation technique** that identifies the directions 
-(also called **principal components**) in which the data varies the most. 
-These principal components capture as much variance as possible. PCA has a 
-variety of applications, such as:
+PCA is a **linear transformation technique** that identifies the directions
+(also called **principal components**) in which the data varies the most. These
+principal components capture as much variance as possible. PCA has a variety of
+applications, such as:
 
 - **Data visualization**: Plot a dimensionality reduced data set in 2D.
 - **Preprocessing**: Removing noise or redundant features while retaining the
-  essential patterns in data.
+    essential patterns in data.
 - **Feature engineering**: Summarizing high-dimensional data into a smaller set
-  of meaningful features.
+    of meaningful features.
 
 ### How does it work?
 
 PCA follows these essential steps:
 
 1. **Compute the covariance matrix**: PCA captures relationships between
-   features by calculating the covariance between them.
+    features by calculating the covariance between them.
 
     ???+ info
-    
-        Think of the covariance matrix as the "spread" of the data. PCA looks 
-        at the interaction :fontawesome-solid-arrow-right: the correlation of 
-        features with each other. Visit the 
+
+        Think of the covariance matrix as the "spread" of the data. PCA looks at the
+        interaction :fontawesome-solid-arrow-right: the correlation of features with
+        each other. Visit the
         [correlation chapter](../../../statistics/bivariate/Correlation.md#covariance)
         in the statistics course to learn more about covariance.
 
-2. **Eigen decomposition**: Identify the eigenvalues and eigenvectors of the
-   covariance matrix. The eigenvectors represent the directions of the
-   principal components, while the eigenvalues represent the amount of variance
-   captured by each component.
+1. **Eigen decomposition**: Identify the eigenvalues and eigenvectors of the
+    covariance matrix. The eigenvectors represent the directions of the
+    principal components, while the eigenvalues represent the amount of
+    variance captured by each component.
 
     ???+ info
-    
-        If you want to know more about eigenvalues and eigenvectors, check out
-        this [site](https://www.mathsisfun.com/algebra/eigenvalue.html).
 
-3. **Rank components**: Components are ranked by their eigenvalues. The first
-   principal component captures the most variance, the second captures the
-   next-most, and so on.
-4. **Transform the data**: Project the original data onto the top principal
-   components to reduce its dimensionality.
+        If you want to know more about eigenvalues and eigenvectors, check out this
+        [site](https://www.mathsisfun.com/algebra/eigenvalue.html).
+
+1. **Rank components**: Components are ranked by their eigenvalues. The first
+    principal component captures the most variance, the second captures the
+    next-most, and so on.
+
+1. **Transform the data**: Project the original data onto the top principal
+    components to reduce its dimensionality.
 
 ### The mathematical objective
 
-Let’s assume we have a data set \(X\) with \(p\) features (dimensions). We
-aim to transform \(X\) into a new matrix \(Z\) with \(k\) features such
-that \(k < p\), while retaining as much variance as possible.
+Let’s assume we have a data set \(X\) with \(p\) features (dimensions). We aim
+to transform \(X\) into a new matrix \(Z\) with \(k\) features such that
+\(k < p\), while retaining as much variance as possible.
 
 The transformation (described previously under point 4) is defined as:
 
@@ -79,24 +79,23 @@ The transformation (described previously under point 4) is defined as:
     \]
 
     Where:
-    
+
     - \(Z\) is the transformed data set in the lower-dimensional space,
     - \(W\) is a matrix whose columns are the top \(k\) eigenvectors of the
         covariance matrix of \(X\).
 
 ???+ tip
 
-    Dimensionality reduction helps in combating the *curse of dimensionality*, 
-    a phenomenon where the performance of algorithms deteriorates with an 
-    increase in the number of features. Algorithms like clustering 
-    often struggle to find meaningful patterns when working with a 
-    high-dimensional data set.
+    Dimensionality reduction helps in combating the *curse of dimensionality*, a
+    phenomenon where the performance of algorithms deteriorates with an increase in
+    the number of features. Algorithms like clustering often struggle to find
+    meaningful patterns when working with a high-dimensional data set.
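
The four steps and the projection described above can be sketched with plain
NumPy. This is only a minimal illustration, assuming a small synthetic data set
(the array `X` and the choice `k = 2` are made up for the example); it is not
how scikit-learn implements PCA internally.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (assumption for illustration): 200 observations, p = 5 features
X = rng.normal(size=(200, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]  # make two features strongly correlated

k = 2  # number of components to keep (k < p)

# Center the data so the covariance is computed around the mean
X_centered = X - X.mean(axis=0)

# Step 1: covariance matrix of the features (p x p)
cov = np.cov(X_centered, rowvar=False)

# Step 2: eigendecomposition (eigh is suited to symmetric matrices)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Step 3: rank components by eigenvalue, largest first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Step 4: project the data onto the top k eigenvectors (the columns of W)
W = eigenvectors[:, :k]
Z = X_centered @ W

# Share of the total variance captured by each kept component
explained_variance_ratio = eigenvalues[:k] / eigenvalues.sum()

print(Z.shape)
print(explained_variance_ratio)
```

The variance of each column of `Z` equals the corresponding eigenvalue, which
is why the eigenvalues can be read as "variance captured per component".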
 
 ## Example
 
-It’s time to apply PCA to real-world data. We'll revisit the semiconductor
-data set that we used in the previous clustering chapter. The first goal 
-is to use PCA to reduce the data set's dimensions and visualize them.
+It’s time to apply PCA to real-world data. We'll revisit the semiconductor data
+set that we used in the previous clustering chapter. The first goal is to use
+PCA to reduce the data set's dimensions and visualize them.
 
 ### Prepare the data
 
@@ -132,8 +131,8 @@ scaled_data = scaler.fit_transform(data)
 
 ### Apply PCA
 
-We now apply PCA to reduce the dimensions. First, we fit the PCA model on
-the `scaled_data`:
+We now apply PCA to reduce the dimensions. First, we fit the PCA model on the
+`scaled_data`:
 
 ```python
 from sklearn.decomposition import PCA
@@ -142,11 +141,11 @@ pca = PCA(n_components=2, random_state=42)  # (1)!
 components = pca.fit_transform(scaled_data)
 ```
 
-1. Although the above definition of PCA is deterministic, the actual 
-   implementation can be stochastic (depending on the solver used). Since
-   `svd_solver` is set to `#!python "auto"` by default, the results can 
-   vary slightly. Long story short, setting `random_state` ensures 
-   reproducibility in all cases.
+1. Although the above definition of PCA is deterministic, the actual
+    implementation can be stochastic (depending on the solver used). Since
+    `svd_solver` is set to `#!python "auto"` by default, the results can vary
+    slightly. Long story short, setting `random_state` ensures reproducibility
+    in all cases.
 
 `n_components=2` specifies that we want to reduce the data set to 2 dimensions.
 
@@ -167,7 +166,7 @@ plt.show()
 ```
 
 1. The `alpha` parameter controls the transparency of the points. A value of
-   `#!python 0.5` makes the points semi-transparent.
+    `#!python 0.5` makes the points semi-transparent.
 
 <figure markdown="span">
     ![PCA visualized](../../../assets/data-science/algorithms/dim-reduction/pca.svg)
@@ -178,21 +177,20 @@ plt.show()
     </figcaption>
 </figure>
 
-To quickly recap so far:
-We were able to reduce the semiconductor data set from `#!python 590` 
-features to just `#!python 2`.
+To quickly recap so far: We were able to reduce the semiconductor data set from
+`#!python 590` features to just `#!python 2`.
 
 #### Plot interpretation
 
-The scatter plot shows the data set in a 2D space with each observation as
-a point. Additionally, we can observe clusters. Since, principal 
-components are ranked by the amount of variance they capture, the first
-component (PC1) is "more important" than the second component (PC2).
+The scatter plot shows the data set in a 2D space with each observation as a
+point. Additionally, we can observe clusters. Since principal components are
+ranked by the amount of variance they capture, the first component (PC1) is
+"more important" than the second component (PC2).
 
 Therefore, differences along the x-axis (PC1) are more significant than
-differences along the y-axis (PC2). As we are interested in potential 
-anomalies in semiconductor products, we can detect some observations that might
-be well worth some further investigation:
+differences along the y-axis (PC2). As we are interested in potential anomalies
+in semiconductor products, we can detect some observations that might be well
+worth some further investigation:
 
 <figure markdown="span">
     ![Potential anomalies](../../../assets/data-science/algorithms/dim-reduction/potential-anomalies.png)
@@ -201,31 +199,31 @@ be well worth some further investigation:
     </figcaption>
 </figure>
 
-A majority of the data points are clustered in the upper left corner. 
-Contrary, these single observations with a high difference on the x-axis 
-(PC1) might be anomalies (annotated by these arrows). Although, samples 
-within the encircled area have their differences on the y-axis (PC2),
-they are still worth investigating.
+A majority of the data points are clustered in the upper left corner. In
+contrast, the isolated observations that differ strongly along the x-axis
+(PC1) might be anomalies (annotated by arrows). Although samples within the
+encircled area differ mainly along the y-axis (PC2), they are still worth
+investigating.
 
 ???+ question "Re-apply PCA on unscaled data"
 
     What would happen if you apply PCA to the unscaled data?
-    
+
     1. Create a new PCA instance with `n_components=2`.
-    2. Fit the PCA model on the `data` (unscaled) and transform it.
-    3. Visualize the new components in a 2D scatter plot.
-    4. Compare the results with the previous PCA visualization.
+    1. Fit the PCA model on the `data` (unscaled) and transform it.
+    1. Visualize the new components in a 2D scatter plot.
+    1. Compare the results with the previous PCA visualization.
 
 ???+ tip
 
     PCA is sensitive to the scale of the data. Thus, the scaled data nicely
-    separates the clusters, while the unscaled data does not. So be sure to 
-    pick the right preprocessing steps for your data.
+    separates the clusters, while the unscaled data does not. So be sure to pick
+    the right preprocessing steps for your data.
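
The effect of scaling can also be seen on a toy example. The sketch below
assumes two independent synthetic features on very different scales (the array
`toy` is invented for illustration and is unrelated to the semiconductor data)
and compares the explained variance ratio with and without standardization.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Two independent synthetic features; the second has a much larger scale
toy = np.column_stack(
    [
        rng.normal(0, 1, 500),
        rng.normal(0, 1000, 500),
    ]
)

# PCA on the raw data: the large-scale feature dominates the first component
unscaled_ratio = PCA(n_components=2).fit(toy).explained_variance_ratio_

# PCA on standardized data: both features contribute on equal footing
toy_scaled = StandardScaler().fit_transform(toy)
scaled_ratio = PCA(n_components=2).fit(toy_scaled).explained_variance_ratio_

print("unscaled:", unscaled_ratio)
print("scaled:  ", scaled_ratio)
```

Without scaling, nearly all variance lands on the first component simply
because one feature has larger numbers; after scaling, the two components share
the variance roughly evenly.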
 
 ### Explained variance
 
 When evaluating a PCA model, it is crucial to understand how much variance is
-captured by each principal component. Simply access the 
+captured by each principal component. Simply access the
 `explained_variance_ratio_` attribute:
 
 ```python
@@ -244,38 +242,37 @@ capture roughly `10%` of the variance.
 
 ???+ tip
 
-    Put simply, our two principal components capture `10%` of the variance
-    of the original `#!python 590` features which is not that great. 
+    Put simply, our two principal components capture `10%` of the variance of the
+    original `#!python 590` features, which is not that great.
     :slightly_frowning_face:
 
 Unfortunately, when dealing with real world data, results may not be as
-promising as expected. In this case, we might need to consider more
-components to capture a higher percentage of the variance.
+promising as expected. In this case, we might need to consider more components
+to capture a higher percentage of the variance.
 
 ???+ info "Choosing the number of components"
-    
+
     It is essential to choose the right number of components. For example, you
-    could use the components as features for another machine learning model,
-    hence you want to retain as much information as possible.
-    
-    However, the choice of how many components to keep is subjective. 
-    A common approach is to retain enough components to explain 90-95% of 
-    the variance.
+    could use the components as features for another machine learning model, in
+    which case you want to retain as much information as possible.
+
+    However, the choice of how many components to keep is subjective. A common
+    approach is to retain enough components to explain 90-95% of the variance.
 
-???+ question "Number of components to exceed 95% variance" 
+???+ question "Number of components to exceed 95% variance"
 
     Using the *scaled* semiconductor dataset:
-    
+
     1. Create a PCA model to analyze the variance in the data
-    2. Determine the minimum number of principal components needed to explain 
-       at least 95% of the total variance
-    
+    1. Determine the minimum number of principal components needed to explain at
+        least 95% of the total variance
+
     Solution approaches:
 
     - You can use the `explained_variance_ratio_` attribute, OR
-    - There is an alternative approach that requires only 3 lines of code 
-      maximum (hint: google and check the PCA documentation)
-    
+    - There is an alternative approach that requires at most 3 lines of code
+        (hint: search online and check the PCA documentation)
+
     Use the following quiz question to evaluate your answer.
 
 <quiz>
@@ -303,17 +300,17 @@ solution.
     def elbow_method(X, max_clusters=15):
         inertia = []
         K = range(1, max_clusters + 1)
-    
+
         for k in K:
             model = KMeans(n_clusters=k, random_state=42)
             model.fit(X)
             inertia.append(model.inertia_)
-    
+
         # for convenience store in a DataFrame
         distortions = pd.DataFrame(
             {"k (number of cluster)": K, "inertia (J)": inertia}
         )
-    
+
         return distortions
     ```
 
@@ -371,12 +368,12 @@ components.plot(
 plt.show()
 ```
 
-To summarize, we applied the same preprocessing steps, reduced the data to
-2 dimensions using PCA. Afterward, we called the elbow method on the 2 
-components to determine the optimal number of clusters. Then we applied
-k-means with `#!python n_clusters=5`. Finally, we plot the 2 components and 
-color the observations according to their corresponding clusters. Have a look 
-at the resulting plots.
+To summarize, we applied the same preprocessing steps and reduced the data to
+2 dimensions using PCA. Afterward, we called the elbow method on the 2
+components to determine the optimal number of clusters. Then we applied k-means
+with `#!python n_clusters=5`. Finally, we plotted the 2 components and colored
+the observations according to their corresponding clusters. Have a look at the
+resulting plots.
 
 === "Clustered components"
 
@@ -387,12 +384,12 @@ at the resulting plots.
         </figcaption>
     </figure>
 
-    The plot shows the semiconductor data set clustered into 5 groups. 
-    Each color represents a different cluster. The clusters are well 
-    separated in the 2D space.
+    The plot shows the semiconductor data set clustered into 5 groups. Each color
+    represents a different cluster. The clusters are well separated in the 2D
+    space.
 
 === "Elbow method"
-    
+
     <figure markdown="span">
         ![Elbow method on 2 principal components](../../../assets/data-science/algorithms/dim-reduction/elbow-pca-kmeans.svg)
         <figcaption>
@@ -400,21 +397,19 @@ at the resulting plots.
         </figcaption>
     </figure>
 
-    The plot shows the distortion (inertia) for different numbers of 
-    clusters. This time around, we can distinctly see an elbow at `k=5` 
-    clusters. :flexed_biceps:
+    The plot shows the distortion (inertia) for different numbers of clusters. This
+    time around, we can distinctly see an elbow at `k=5` clusters. :flexed_biceps:
 
----
+______________________________________________________________________
 
 ## Recap
 
-In this chapter, we concluded the Supervised vs. Unsupervised Learning 
-portion of this course and introduced **Principal Component Analysis 
-(PCA)**, a linear technique for dimensionality reduction.
+In this chapter, we concluded the Supervised vs. Unsupervised Learning portion
+of this course and introduced **Principal Component Analysis (PCA)**, a linear
+technique for dimensionality reduction.
 
-We discussed the inner workings of PCA and applied it to the semiconductor 
-data set, where we could identify potential anomalies in the data. We also
+We discussed the inner workings of PCA and applied it to the semiconductor data
+set, where we could identify potential anomalies in the data. We also
 visualized the data set in a 2D space, making it easier to interpret and
-analyze.
-Lastly, a combination of PCA and k-means revealed distinct clusters in the 
-semiconductor data set.
+analyze. Lastly, a combination of PCA and k-means revealed distinct clusters in
+the semiconductor data set.
diff --git a/docs/data-science/basics/intro.md b/docs/data-science/basics/intro.md
index 4d0e6178..4404f062 100644
--- a/docs/data-science/basics/intro.md
+++ b/docs/data-science/basics/intro.md
@@ -6,21 +6,19 @@ The terms data science and machine learning are often used interchangeably.
 Let's explore them to get a better understanding of this course's content.
 
 === ":bar_chart: Data Science"
-    
-    **Data Science** is an interdisciplinary field that combines statistics, 
-    programming and domain knowledge to extract insights from data. As a data 
-    scientist, you could work in vastly different domains, from healthcare and 
-    finance to manufacturing and entertainment. The core skills remain the 
-    same, but the questions you answer and the data you work with vary greatly.
 
+    **Data Science** is an interdisciplinary field that combines statistics,
+    programming and domain knowledge to extract insights from data. As a data
+    scientist, you could work in vastly different domains, from healthcare and
+    finance to manufacturing and entertainment. The core skills remain the same,
+    but the questions you answer and the data you work with vary greatly.
 
 === ":robot: Machine Learning"
 
-    **Machine Learning (ML)** is a subset of Data Science that focuses on 
-    building algorithms that learn patterns from data to make predictions or 
-    decisions.
+    **Machine Learning (ML)** is a subset of Data Science that focuses on building
+    algorithms that learn patterns from data to make predictions or decisions.
 
----
+______________________________________________________________________
 
 <div style="text-align: center">
     <i>The primary focus of this course is the data science workflow, from 
@@ -29,13 +27,13 @@ Let's explore them to get a better understanding of this course's content.
     </i>
 </div>
 
----
+______________________________________________________________________
 
 ## What to Expect
 
 Before diving into examples and workflows, let's set realistic expectations.
 
-Data science is fundamentally about **understanding and insight**, not 
+Data science is fundamentally about **understanding and insight**, not
 perfection. You won't find models that are 100% accurate and that's okay - it's
 not the goal. Instead, data science helps us:
 
@@ -48,27 +46,27 @@ not the goal. Instead, data science helps us:
 
 Chances are you've already used services built by data scientists today:
 
-- :material-currency-usd: **Dynamic Pricing**: Airlines and concert platforms 
+- :material-currency-usd: **Dynamic Pricing**: Airlines and concert platforms
     adjust prices based on demand, time and user behavior
-- :material-movie: **Recommendation Systems**: Netflix suggests movies based 
-    on your viewing history; Instagram curates your feed
-- :material-email: **Spam Detection**: Your email provider filters unwanted 
+- :material-movie: **Recommendation Systems**: Netflix suggests movies based on
+    your viewing history; Instagram curates your feed
+- :material-email: **Spam Detection**: Your email provider filters unwanted
     messages automatically
 
 In this course, we'll build models for tasks like:
 
-- :material-home: **Price Prediction**: Estimating house prices based on 
+- :material-home: **Price Prediction**: Estimating house prices based on
     features like size and location
 - :material-hospital: **Medical Diagnosis**: Classifying tumors as malignant or
     benign
-- :material-alert: **Anomaly Detection**: Identifying faulty products in 
+- :material-alert: **Anomaly Detection**: Identifying faulty products in
     manufacturing data
 
 ## Building blocks
 
-A typical data science project includes several stages, from collecting raw 
-data to deploying models in production. This course focuses on the 
-**core workflow**:
+A typical data science project includes several stages, from collecting raw
+data to deploying models in production. This course focuses on the **core
+workflow**:
 
 <div style="text-align: center">
 
@@ -84,23 +82,22 @@ data to deploying models in production. This course focuses on the
 </div>
 
 | Stage                  | What You'll Learn                              |
-|------------------------|------------------------------------------------|
+| ---------------------- | ---------------------------------------------- |
 | **Data Preparation**   | Inspect, clean and structure datasets          |
 | **Data Preprocessing** | Transform features (encoding, scaling, etc., ) |
 | **Modeling**           | Train different machine learning algorithms    |
 | **Evaluation**         | Measure performance and interpret results      |
 
-
 ???+ tip "Iterative Process"
 
-    Data science is rarely linear. You’ll repeatedly cycle through collecting
-    data, preparing it, training models and evaluating results. Each evaluation
-    highlights new issues (e.g., missing data or unrealistic assumptions) that 
-    send you back to earlier stages to improve your approach.
+    Data science is rarely linear. You’ll repeatedly cycle through collecting data,
+    preparing it, training models and evaluating results. Each evaluation
+    highlights new issues (e.g., missing data or unrealistic assumptions) that send
+    you back to earlier stages to improve your approach.
 
----
+______________________________________________________________________
 
-Throughout the course, we'll use hands-on Python examples. By the end, you'll 
+Throughout the course, we'll use hands-on Python examples. By the end, you'll
 apply these skills to a complete project from start to finish.
 
 Let's start by setting up your computer for the data science journey.
diff --git a/docs/data-science/basics/setup.md b/docs/data-science/basics/setup.md
index cf5e0493..db481c14 100644
--- a/docs/data-science/basics/setup.md
+++ b/docs/data-science/basics/setup.md
@@ -1,20 +1,20 @@
 # Setup
 
-To get started, we setup the programming environment. Follow these couple
-of steps to get ready, no prerequisites needed.
+To get started, we set up the programming environment. Follow these few steps
+to get ready; no prerequisites are needed.
 
 ## Visual Studio Code
 
-First, install a code editor. We urge you to instal Visual Studio Code 
-(VS Code) a free and open-source editor developed by Microsoft 
+First, install a code editor. We urge you to install Visual Studio Code (VS
+Code), a free and open-source editor developed by Microsoft
 :fontawesome-brands-windows:.
 
-If you don't have Visual Studio Code already installed, download it from their 
+If you don't have Visual Studio Code already installed, download it from their
 website: <https://code.visualstudio.com/>.
 
 ### Profile
 
-To quickstart your VS Code setup, download our profile that includes essential 
+To quickstart your VS Code setup, download our profile that includes essential
 plugins and convenient settings tailored for data science work.
 
 <div class="center-button" markdown>
@@ -29,20 +29,20 @@ The profile comes with the following essential extensions:
 - **Python Debugger** - Debug your Python code
 - **Jupyter** - Work with Jupyter Notebooks directly in VS Code
 
-Additionally, stylistic plugins are included for a more pleasant coding 
-experience and auto-save is enabled by default so you never lose your work. 
+Additionally, stylistic plugins are included for a more pleasant coding
+experience and auto-save is enabled by default so you never lose your work.
 :rocket:
 
 ## `uv`
 
 From the Python course you should already be familiar with the package manager
-`pip`. That background will help you quickly understand `uv`, a modern tool 
-that not only replaces `pip` for package management but also handles Python 
+`pip`. That background will help you quickly understand `uv`, a modern tool
+that not only replaces `pip` for package management but also handles Python
 installations.
 
-**Why the switch?** While `pip` remains widely used and important to 
-understand, this course aims to prepare you for modern real-world projects. 
-`uv` has become a popular, state-of-the-art tool in modern Python development 
+**Why the switch?** While `pip` remains widely used and important to
+understand, this course aims to prepare you for modern real-world projects.
+`uv` has become a popular, state-of-the-art tool in modern Python development
 and learning it now will give you a competitive advantage.
 
 ???+ tip "No prior Python install necessary"
@@ -53,28 +53,30 @@ and learning it now will give you a competitive advantage.
 
 === ":fontawesome-brands-windows: Windows"
 
-    Open Windows Powershell. Visit the `uv` documentation under under 
-    "Standalone installer" [link](https://docs.astral.sh/uv/getting-started/installation/#__tabbed_1_2).
+    Open Windows PowerShell. Visit the `uv` documentation under "Standalone
+    installer":
+    [link](https://docs.astral.sh/uv/getting-started/installation/#__tabbed_1_2).
     Make sure the Windows tab is selected.
-    
+
     Return to PowerShell and paste the installer command shown in the docs.
 
     ![uv standalone installation](../../assets/data-science/basics/setup/uv-win-install.png)
 
 === ":fontawesome-brands-apple: MacOS / :fontawesome-brands-linux: Linux"
 
-    On macOS or Linux, open Terminal. Visit the `uv` documentation under 
-    "Standalone installer", [link](https://docs.astral.sh/uv/getting-started/installation/). 
-    Make sure the macOS or Linux tab is selected.
-    
+    On macOS or Linux, open Terminal. Visit the `uv` documentation under
+    "Standalone installer",
+    [link](https://docs.astral.sh/uv/getting-started/installation/). Make sure the
+    macOS or Linux tab is selected.
+
     Return to your terminal and paste the installer command.
 
 Press ++enter++ to execute the command
 
----
+______________________________________________________________________
 
-Regardless of your operating system, upon completion you should see 
-something like:
+Regardless of your operating system, upon completion you should see something
+like:
 
 ```
 Downloading uv
@@ -84,14 +86,14 @@ Downloading uv
 everything's installed!
 ```
 
-You can now close the Terminal (:fontawesome-brands-apple: macOS / 
-:fontawesome-brands-linux: Linux) or PowerShell (:fontawesome-brands-windows: 
+You can now close the Terminal (:fontawesome-brands-apple: macOS /
+:fontawesome-brands-linux: Linux) or PowerShell (:fontawesome-brands-windows:
 Windows).
 
 ???+ info
 
-    The following steps are OS-agnostic; they are the same for Windows, macOS 
-    and Linux.
+    The following steps are OS-agnostic; they are the same for Windows, macOS and
+    Linux.
 
 ### 1. Create a project
 
@@ -99,19 +101,18 @@ Now, we will cover a typical workflow to set up and initialize a new project.
 
 ???+ info
 
-    A project is a folder that contains all scripts, configuration and data 
-    files that belong together. Everything for the project lives in that 
-    folder.
+    A project is a folder that contains all scripts, configuration and data files
+    that belong together. Everything for the project lives in that folder.
 
-Create a new folder named `data-science` in an easy-to-find location you’ll 
-use throughout this course.
+Create a new folder named `data-science` in an easy-to-find location you’ll use
+throughout this course.
 
-Open VS Code. Go to File → Open Folder…, select the `data-science` folder. 
-VS Code will open a new window.
+Open VS Code. Go to File → Open Folder…, select the `data-science` folder. VS
+Code will open a new window.
 
 ???+ tip
 
-    For more on navigating VS Code, see the Python course chapter: 
+    For more on navigating VS Code, see the Python course chapter:
     [link](../../python-extensive/ide.md)
 
 ### 2. Initialize the project
@@ -123,11 +124,11 @@ uv init --vcs none  # (1)!
 ```
 
 1. With the `--vcs` flag a **v**ersion **c**ontrol **s**ystem can be specified.
-    By default `--vcs git` is set, which initializes a git repository. Since 
+    By default `--vcs git` is set, which initializes a git repository. Since
     git is not within the scope of this project, we set `--vcs` to none.
 
-This initializes the project. `uv` creates a few files in your folder. 
-Your workspace should look like this:
+This initializes the project. `uv` creates a few files in your folder. Your
+workspace should look like this:
 
 <figure markdown="span">
     <img 
@@ -150,16 +151,16 @@ With the project structure:
 
 Click through these new files:
 
-- `.python-version` Contains the Python version used by your virtual 
+- `.python-version` Contains the Python version used by your virtual
     environment.
 - `main.py` An entry script to verify the setup (we’ll revisit this later).
 - `pyproject.toml` Project metadata such as name and version.
-- `README.md` An empty README for a project description; you can ignore it for 
+- `README.md` An empty README for a project description; you can ignore it for
     now.
 
 ### 3. Virtual Environment
 
-With an initialized project we can easily set up a virtual environment. To do 
+With an initialized project we can easily set up a virtual environment. To do
 so simply run:
 
 ```bash
@@ -175,23 +176,24 @@ uv sync
 
 ???+ tip "Virtual Environments?"
 
-    If you need a refresh on virtual environments, what they do and their 
-    purpose, read through the corresponding section in the Python course: 
+    If you need a refresher on virtual environments, what they do and their purpose,
+    read through the corresponding section in the Python course:
     [link](../../python-extensive/packages.md#virtual-environments)
 
 #### What happens during `uv sync`?
 
 When you run `uv sync`, three things happen automatically:
 
-1. **Python installation**: `uv` checks the `.python-version` file and installs 
-   the specified Python version if it's not already available on your machine.
+1. **Python installation**: `uv` checks the `.python-version` file and installs
+    the specified Python version if it's not already available on your
+    machine.
 
-2. **Virtual environment**: A `.venv` folder is created at the root of your 
-   project, containing an isolated Python environment for your project.
+1. **Virtual environment**: A `.venv` folder is created at the root of your
+    project, containing an isolated Python environment for your project.
 
-3. **Dependency locking**: A `uv.lock` file is generated. This file pins all 
-   package versions used in your project, ensuring anyone else can faithfully 
-   recreate the exact same environment.
+1. **Dependency locking**: A `uv.lock` file is generated. This file pins all
+    package versions used in your project, ensuring anyone else can faithfully
+    recreate the exact same environment.
 
 ???+ danger "No manual edits"
 
@@ -199,14 +201,14 @@ When you run `uv sync`, three things happen automatically:
 
 #### Test your setup
 
-Let's verify everything works by running the `main.py` script that was created 
+Let's verify everything works by running the `main.py` script that was created
 during initialization:
 
 ```bash
 uv run main.py
 ```
 
-If you have a similar output, you've successfully created your first project. 
+If you have a similar output, you've successfully created your first project.
 :tada:
 
 ```title=">>> Output"
@@ -215,22 +217,22 @@ Hello from data-science!
 
 ???+ info "No activation needed"
 
-    Notice that the `run` command automatically invokes the project's virtual 
-    environment, meaning you do not have to activate the environment 
-    beforehand. In practice that means you create your scripts and simply 
-    execute them without an activated environment.
+    Notice that the `run` command automatically invokes the project's virtual
+    environment, meaning you do not have to activate the environment beforehand. In
+    practice that means you create your scripts and simply execute them without an
+    activated environment.
 
 ### 4. Packages
 
-Since, we will be working with a couple of different packages, we have to 
+Since we will be working with a couple of different packages, we have to
 discuss commands for installing and removing packages.
 
 ???+ info "Again, no activation needed"
 
-    Once again, you don't have to activate your environment to install and 
-    remove packages. With `uv`, you can manage dependencies directly from any 
-    terminal in your project folder, the virtual environment is "handled" 
-    automatically in the background.
+    Once again, you don't have to activate your environment to install and remove
+    packages. With `uv`, you can manage dependencies directly from any terminal in
+    your project folder; the virtual environment is "handled" automatically in the
+    background.
 
 To install packages use the `add` command:
 
@@ -244,7 +246,7 @@ replace `<package-name>` for example with `pandas`:
 uv add pandas
 ```
 
-After a successful installation, take some time to open the `pyproject.toml` 
+After a successful installation, take some time to open the `pyproject.toml`
 file. Under dependencies you should find the `pandas` package.
 
 ```toml title="pyproject.toml" hl_lines="7-9" linenums="1"
@@ -259,17 +261,17 @@ dependencies = [
 ]
 ```
 
-The content of `uv.lock` was changed as well, the file contains more info on 
-the installed packages such as `pandas` and its dependencies as well 
-(i.e., `numpy`, `python-dateutil`, `six` and `tzdata`).
+The content of `uv.lock` changed as well. The file contains more information
+on the installed packages, such as `pandas` and its dependencies (i.e.,
+`numpy`, `python-dateutil`, `six` and `tzdata`).
 
 ???+ tip "Share a project"
 
     If you share your project, be sure to include the files `.python-version`,
-    `pyproject.toml` and `uv.lock`. These allow for a recreation of your 
-    virtual environment.
+    `pyproject.toml` and `uv.lock`. These allow for a recreation of your virtual
+    environment.
 
----
+______________________________________________________________________
 
 Let's remove the package with the `remove` command:
 
@@ -277,18 +279,20 @@ Let's remove the package with the `remove` command:
 uv remove pandas
 ```
 
-Again, you can check both `pyproject.toml` and `uv.lock` which are 
+Again, you can check both `pyproject.toml` and `uv.lock`, which are
 automatically updated accordingly.
 
 ???+ question "Get a script running"
 
     1. Create a new script called `plot.py`
-    2. Paste following example (taken from [matplotlib docs](https://matplotlib.org/stable/gallery/lines_bars_and_markers/curve_error_band.html)) within your script:
+
+    1. Paste the following example (taken from
+        [matplotlib docs](https://matplotlib.org/stable/gallery/lines_bars_and_markers/curve_error_band.html))
+        within your script:
 
         ```python title="plot.py" linenums="1"
         import matplotlib.pyplot as plt
         import numpy as np
-
         from matplotlib.patches import PathPatch
         from matplotlib.path import Path
 
@@ -302,22 +306,23 @@ automatically updated accordingly.
         ax.set(aspect=1)
         plt.show()
         ```
-    3. Determine necessary packages to get this script running and install 
-        them with `uv`.
-    4. Lastly, the script with `uv`.
 
+    1. Determine necessary packages to get this script running and install them
+        with `uv`.
+
+    1. Lastly, run the script with `uv`.
 
 ## Python Scripts or Jupyter Notebooks?
 
-For this course, you can work with Python scripts (`.py` files) and/or 
-Jupyter Notebooks (`.ipynb` files). Both are supported in VS Code and each has
-its strengths.
+For this course, you can work with Python scripts (`.py` files) and/or Jupyter
+Notebooks (`.ipynb` files). Both are supported in VS Code and each has its
+strengths.
 
 <div class="grid cards" markdown>
 
--   :fontawesome-brands-python:{ .lg .middle } __Python Scripts__
+- :fontawesome-brands-python:{ .lg .middle } __Python Scripts__
 
-    ---
+    ______________________________________________________________________
 
     :fontawesome-regular-thumbs-up: Advantages
 
@@ -326,7 +331,7 @@ its strengths.
     - Runs faster without cell-by-cell overhead
     - Cleaner debugging with standard tools
 
-    ---
+    ______________________________________________________________________
 
     :fontawesome-regular-thumbs-down: Disadvantages
 
@@ -334,9 +339,9 @@ its strengths.
     - Need to rerun entire script for changes
     - Harder to visualize intermediate results
 
--   :simple-jupyter:{ .lg .middle } __Jupyter Notebooks__
+- :simple-jupyter:{ .lg .middle } __Jupyter Notebooks__
 
-    ---
+    ______________________________________________________________________
 
     :fontawesome-regular-thumbs-up: Advantages
 
@@ -345,7 +350,7 @@ its strengths.
     - Combines documentation and code
     - Easier to share findings with non-programmers
 
-    ---
+    ______________________________________________________________________
 
     :fontawesome-regular-thumbs-down: Disadvantages
 
@@ -358,38 +363,38 @@ its strengths.
 
 ???+ tip "Our recommendation"
 
-    Many data scientists use both: notebooks for exploration, scripts for 
-    production. Simply experiment with both. For quick prototyping lean towards
-    a :simple-jupyter: Jupyter Notebook. For more refined code switch to 
-    :fontawesome-brands-python: Python scripts. 
+    Many data scientists use both: notebooks for exploration, scripts for
+    production. Simply experiment with both. For quick prototyping lean towards a
+    :simple-jupyter: Jupyter Notebook. For more refined code switch to
+    :fontawesome-brands-python: Python scripts.
 
----
+______________________________________________________________________
 
 ## Wrap-Up
 
-You've successfully set up your development environment! Throughout this 
-course, you'll create multiple projects using the workflow covered in 
-sections 1-4. Don't worry about memorizing every step—just refer back to this 
-page when needed.
+You've successfully set up your development environment! Throughout this
+course, you'll create multiple projects using the workflow covered in sections
+1-4. Don't worry about memorizing every step—just refer back to this page when
+needed.
 
 For quick reference, here's a cheat sheet:
 
 ???+ note "Cheat Sheet - Project Setup"
 
     1. Create a new folder for your project
-    2. Open the folder in VS Code
-    3. In the terminal, run:
-       ```bash
-       uv init --vcs none
-       uv sync
-       ```
-    4. Install packages as needed:
-       ```bash
-       uv add <package-name>
-       ```
-    5. Run your code:
-       ```bash
-       uv run <script-name>.py
-       ```
-    
+    1. Open the folder in VS Code
+    1. In the terminal, run:
+        ```bash
+        uv init --vcs none
+        uv sync
+        ```
+    1. Install packages as needed:
+        ```bash
+        uv add <package-name>
+        ```
+    1. Run your code:
+        ```bash
+        uv run <script-name>.py
+        ```
+
     **Need help?** Run `uv --help` for more commands and options.
diff --git a/docs/data-science/data/basics.md b/docs/data-science/data/basics.md
index 7ef23f93..9274e0b8 100644
--- a/docs/data-science/data/basics.md
+++ b/docs/data-science/data/basics.md
@@ -1,48 +1,47 @@
 # Data Basics
 
-This chapter kicks off the foundational building blocks of a data science 
-pipeline. We start by taking a closer look at data itself. Understanding 
-different attribute types is crucial for choosing appropriate visualizations, 
+This chapter kicks off the foundational building blocks of a data science
+pipeline. We start by taking a closer look at data itself. Understanding
+different attribute types is crucial for choosing appropriate visualizations,
 preprocessing techniques and machine learning algorithms.
 
 ???+ question "Create a new project"
 
-    1. For this chapter create a new project. Revisit the 
+    1. For this chapter create a new project. Revisit the
         [wrap-up](../basics/setup.md#wrap-up) section from the setup guide.
-    2. Install the packages `seaborn` and `pandas`
+    1. Install the packages `seaborn` and `pandas`
 
 ## Tabular Data
 
-Throughout this course, we will primarily work with **tabular data**, simply 
-think of spreadsheets. Tabular data is organized in a rectangular 
-format with:
+Throughout this course, we will primarily work with **tabular data**; simply
+think of spreadsheets. Tabular data is organized in a rectangular format with:
 
 - **Rows**: Individual observations or samples (e.g., one student)
 - **Columns**: Attributes or features describing each observation (e.g., name,
-  age, average grade)
+    age, average grade)
 
 | Name    | Age | Average Grade |
-|---------|-----|---------------|
+| ------- | --- | ------------- |
 | Claudia | 19  | 1.45          |
 | Stefan  | 22  | 3.4           |
 | Max     | 20  | 2.12          |
 
-Each row represents one student, while each column contains a specific 
+Each row represents one student, while each column contains a specific
 attribute about that student.
 
-Understanding the structure of tabular data is essential because most machine 
-learning algorithms expect data in this format. Now let's explore what types 
-of information each column can contain.
+Understanding the structure of tabular data is essential because most machine
+learning algorithms expect data in this format. Now let's explore what types of
+information each column can contain.
 
 ## Attribute Types
 
-Not all data is created equal. The type of data in each column determines 
-what operations we can perform and which visualizations make sense. We 
-distinguish between two main categories: numerical and categorical data.
+Not all data is created equal. The type of data in each column determines what
+operations we can perform and which visualizations make sense. We distinguish
+between two main categories: numerical and categorical data.
 
 ### Numerical (Quantitative)
 
-Numerical data represents measurable quantities, i.e., values you can perform 
+Numerical data represents measurable quantities, i.e., values you can perform
 mathematical operations on.
 
 ```python
@@ -61,20 +60,20 @@ Maximum temperature: 25.1°C
 
 Numerical data comes in two types:
 
-**Continuous**: Can take any value within a range, including decimals. 
-Examples include temperature (22.5°C), body mass (3750.5g) or height (1.75m).
+**Continuous**: Can take any value within a range, including decimals. Examples
+include temperature (22.5°C), body mass (3750.5g) or height (1.75m).
 
-**Discrete**: Can only take specific, countable values, typically integers. 
+**Discrete**: Can only take specific, countable values, typically integers.
 Examples include number of students (5) or age (22).
 
 ???+ tip
 
-    A simple rule of thumb: If you can meaningfully have fractional values, 
-    it's continuous. If counting whole units makes more sense, it's discrete.
+    A simple rule of thumb: If you can meaningfully have fractional values, it's
+    continuous. If counting whole units makes more sense, it's discrete.
 
 ### Categorical (Qualitative)
 
-Categorical data represents qualities or characteristics that place 
+Categorical data represents qualities or characteristics that place
 observations into groups or categories.
 
 ```python
@@ -84,7 +83,7 @@ print(f"Unique colors: {colors.nunique()}")
 print(f"Most common: {colors.mode().squeeze()}")  # (1)!
 ```
 
-1. The `mode()` method returns a `pd.Series` with a single value, hence we 
+1. The `mode()` method returns a `pd.Series` with a single value, hence we
     `squeeze()` the value.
 
 ```title=">>> Output"
@@ -96,24 +95,24 @@ Categorical data can be further divided into two types:
 
 #### Nominal
 
-Nominal data has no inherent order, the categories are just different names 
-or labels. Examples include colors or country names.
+Nominal data has no inherent order; the categories are just different names or
+labels. Examples include colors or country names.
 
 #### Ordinal
 
-Ordinal data has a meaningful order or ranking between categories, but the 
-distance between categories isn't necessarily equal. Examples include t-shirt 
-sizes (XS, S, M, L, XL) or education levels (High School, Bachelor's, Master's, 
+Ordinal data has a meaningful order or ranking between categories, but the
+distance between categories isn't necessarily equal. Examples include t-shirt
+sizes (XS, S, M, L, XL) or education levels (High School, Bachelor's, Master's,
 PhD).
 
----
+______________________________________________________________________
 
-Now that we understand different data types, let's see them in action with 
-real data.
+Now that we understand different data types, let's see them in action with real
+data.
 
 ## Penguins
 
-We'll use the Palmer Penguins dataset, which contains measurements of three 
+We'll use the Palmer Penguins dataset, which contains measurements of three
 penguin species observed on islands in the Palmer Archipelago, Antarctica.
 
 <figure markdown="span">
@@ -128,13 +127,13 @@ penguin species observed on islands in the Palmer Archipelago, Antarctica.
 
 ???+ info
 
-    The Palmer Penguins dataset was collected and made available by 
-    Dr. Kristen Gorman and the Palmer Station, Antarctica LTER.[^1] It's 
-    become a popular dataset for education.
+    The Palmer Penguins dataset was collected and made available by Dr. Kristen
+    Gorman and the Palmer Station, Antarctica LTER.[^1] It's become a popular
+    dataset for education.
 
-    [^1]:
-        Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. 
-        R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/. doi: 10.5281/zenodo.3960218.
+    [^1]: Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago
+    (Antarctica) penguin data. R package version 0.1.0.
+    https://allisonhorst.github.io/palmerpenguins/. doi: 10.5281/zenodo.3960218.
 
 <div style="text-align: center">
     <iframe src="https://www.google.com/maps/embed?pb=!1m18!1m12!1m3!1d33687766.0689931!2d-46.851737808150574!3d-43.213299436835!2m3!1f0!2f0!3f0!3m2!1i1024!2i768!4f13.1!3m3!1m2!1s0xbc78dd6dc38c572b%3A0xe609367aeed33087!2sPalmer-Archipel!5e0!3m2!1sde!2sat!4v1770285077736!5m2!1sde!2sat" 
@@ -194,9 +193,9 @@ max      6300.000000
 Name: body_mass_g, dtype: float64
 ```
 
-The mean body mass is roughly 4200g (about 4.2kg or 9.3 pounds), with values 
-ranging from 2700g to 6300g. This variation is quite substantial, the heaviest 
-penguins are more than twice the weight of the lightest ones! The standard 
+The mean body mass is roughly 4200g (about 4.2kg or 9.3 pounds), with values
+ranging from 2700g to 6300g. This variation is quite substantial: the heaviest
+penguins are more than twice the weight of the lightest ones! The standard
 deviation of 802g indicates considerable variability in penguin sizes.
 
 ???+ info "Missing values"
@@ -205,8 +204,8 @@ deviation of 802g indicates considerable variability in penguin sizes.
     `#!python "body_mass_g"`, resulting again in 344 penguins.
 
     For now, we don't worry about missing values as pandas excludes them when
-    applying methods such as the `describe()` method above. The subsequent 
-    chapters will dive into missing values.
+    applying methods such as the `describe()` method above. The subsequent chapters
+    will dive into missing values.
 
 ### Categorical attributes
 
@@ -225,26 +224,28 @@ freq       168
 ```
 
 Notice how pandas automatically infers the data type and calculates appropriate
-metrics. Unlike numerical data, calculating mean, min or max would be 
+metrics. Unlike numerical data, calculating mean, min or max would be
 meaningless for categorical data.
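
To make the contrast concrete, here is a minimal sketch on made-up values (not the actual penguin data; `masses` and `sexes` are hypothetical names). It shows that `describe()` reports a different set of metrics depending on the type of the `Series`:

```python
import pandas as pd

# Made-up values standing in for a numerical and a categorical column.
masses = pd.Series([3750.0, 3800.0, 3250.0, None])
sexes = pd.Series(["Male", "Female", "Female", None])

numeric_summary = masses.describe()  # count, mean, std, min, quartiles, max
categorical_summary = sexes.describe()  # count, unique, top, freq

print(numeric_summary.index.tolist())
print(categorical_summary.index.tolist())
```

In both cases the missing value is excluded from `count`.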
 
 ### Visualizing different attribute types
 
 A key component of data science is visualization, which helps us understand
-patterns and distributions in our data. Different attribute types require 
+patterns and distributions in our data. Different attribute types require
 different visualization approaches.
 
 #### Numerical
 
-For numerical attributes like `#!python "body_mass_g"`, we can create a 
-boxplot which shows the median, quartiles and outliers.
+For numerical attributes like `#!python "body_mass_g"`, we can create a boxplot
+which shows the median, quartiles and outliers.
 
 ???+ tip "Plotting with pandas"
 
-    Both `pandas.DataFrame` and `pandas.Series` objects have a built-in 
-    `plot()` method that provides quick access to various plot types. Check out
-    the documentation for [DataFrame.plot()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html)
-    and [Series.plot()](https://pandas.pydata.org/docs/reference/api/pandas.Series.plot.html)
+    Both `pandas.DataFrame` and `pandas.Series` objects have a built-in `plot()`
+    method that provides quick access to various plot types. Check out the
+    documentation for
+    [DataFrame.plot()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html)
+    and
+    [Series.plot()](https://pandas.pydata.org/docs/reference/api/pandas.Series.plot.html)
     to see which plots are supported via the `kind` argument.
 
 ```python
@@ -261,14 +262,14 @@ plt.show()
     >
 </figure>
 
-For numerical data, other suitable plots include histograms 
-(`#!python kind="hist"`) for showing distribution patterns, or scatter plots 
+For numerical data, other suitable plots include histograms
+(`#!python kind="hist"`) for showing distribution patterns, or scatter plots
 (`#!python kind="scatter"`) for revealing relationships between two numerical
 variables (like `#!python "flipper_length_mm"` vs. `#!python "body_mass_g"`).
 
 #### Categorical
 
-For categorical data like penguin `#!python "sex"`, a bar chart or pie chart 
+For categorical data like penguin `#!python "sex"`, a bar chart or pie chart
 displays the frequency of each category.
 
 ```python
@@ -285,28 +286,28 @@ plt.show()
     >
 </figure>
 
-The visualization reveals that male and female penguins are nearly equally 
+The visualization reveals that male and female penguins are nearly equally
 distributed in the dataset.
 
 ???+ tip "Choosing the right plot for categorical data"
 
-    While pie charts work well for showing proportions, bar charts are often 
-    preferred when comparing more than 3-4 categories or when precise comparison 
-    of values is important. Try `#!python kind="bar"` to see the difference!
+    While pie charts work well for showing proportions, bar charts are often
+    preferred when comparing more than 3-4 categories or when precise comparison of
+    values is important. Try `#!python kind="bar"` to see the difference!
 
 #### Exercises
 
 ???+ question "Exercise 1: Explore bill length"
 
     1. Calculate basic statistics for `#!python "bill_length_mm"`
-    2. Create a histogram to visualize its distribution
+    1. Create a histogram to visualize its distribution
 
     What's the median bill length? Do you notice any patterns?
 
 ???+ question "Exercise 2: Island distribution"
 
     1. Count how many penguins were observed on each island
-    2. Create a bar chart showing the distribution
+    1. Create a bar chart showing the distribution
 
     Which island has the most penguin observations?
 
@@ -315,7 +316,7 @@ distributed in the dataset.
 In this chapter, we established the foundation for understanding data:
 
 - Tabular data organizes information in rows (observations) and columns
-(attributes/features)
+    (attributes/features)
 - Numerical data represents measurable quantities (continuous or discrete)
 - Categorical data represents groups or categories (nominal or ordinal)
 - Different data types require different visualization approaches
@@ -324,10 +325,9 @@ In this chapter, we established the foundation for understanding data:
 
 Expand your knowledge with these related topics:
 
-- **[Plotting Guide](../../python-extensive/plotting.md)**:
-    Learn to configure plots, add styling, titles and customize visualizations
-- **[Distributions](../../statistics/univariate/Frequency.md)**:
-    Dive deeper into statistical distributions and advanced visualization techniques
+- **[Plotting Guide](../../python-extensive/plotting.md)**: Learn to configure
+    plots, add styling, titles and customize visualizations
+- **[Distributions](../../statistics/univariate/Frequency.md)**: Dive deeper
+    into statistical distributions and advanced visualization techniques
 - **[Pandas Documentation](https://pandas.pydata.org/docs/user_guide/index.html)**:
     Comprehensive guide to data manipulation with pandas
-
diff --git a/docs/data-science/data/preparation.md b/docs/data-science/data/preparation.md
index d582a116..868dd491 100644
--- a/docs/data-science/data/preparation.md
+++ b/docs/data-science/data/preparation.md
@@ -5,40 +5,43 @@
 ???+ info
 
     Starting with this chapter, we will work with ^^adapted data^^ from:
-    
-    ^^S. Moro, P. Cortez and P. Rita (2014). *A Data-Driven Approach to 
-    Predict the Success of Bank Telemarketing*[^1]^^
-    
-    [^1]:
-        Decision Support Systems, Volume 62, June 2014, Pages 22-31:
-        [https://doi.org/10.1016/j.dss.2014.03.001](https://doi.org/10.1016/j.dss.2014.03.001)
-    
-    The publicly available dataset is from a Portuguese retail bank 
-    and houses information on direct marketing campaigns (phone calls). Bank 
-    customers were contacted and asked to subscribe to a term deposit. Using 
-    this practical example, we will explore the realms of:
-    
+
+    ^^S. Moro, P. Cortez and P. Rita (2014). *A Data-Driven Approach to Predict the
+    Success of Bank Telemarketing*[^1]^^
+
+    [^1]: Decision Support Systems, Volume 62, June 2014, Pages 22-31:
+    [https://doi.org/10.1016/j.dss.2014.03.001](https://doi.org/10.1016/j.dss.2014.03.001)
+
+    The publicly available dataset is from a Portuguese retail bank and houses
+    information on direct marketing campaigns (phone calls). Bank customers were
+    contacted and asked to subscribe to a term deposit. Using this practical
+    example, we will explore the realms of:
+
     - Data merging
+
     - Data cleaning
+
     - Data transformation
+
     - Machine learning (with selected algorithms)
+
     - Comparison of model performance
-    - Model persistence (practical guide on how to save and load machine 
-      learning models)
-    
-      Eventually, you will end up with a model that predicts whether a customer
-      will subscribe to a term deposit or not.
+
+    - Model persistence (practical guide on how to save and load machine learning
+        models)
+
+        Eventually, you will end up with a model that predicts whether a customer
+        will subscribe to a term deposit or not.
 
 ## Obtaining the data
 
 ???+ tip "Set up a project"
 
-    Once again, set up a new project with `uv`. Call this one `bank_marketing`.
-    We will perform all steps from data merging to saving the model in this 
-    project.
+    Once again, set up a new project with `uv`. Call this one `bank_marketing`. We
+    will perform all steps from data merging to saving the model in this project.
 
-    If you are having trouble setting up a virtual environment, please refer 
-    to the [`uv` wrap-up](../basics/setup.md#wrap-up) section.
+    If you are having trouble setting up a virtual environment, please refer to the
+    [`uv` wrap-up](../basics/setup.md#wrap-up) section.
 
 Let's dive right in and download both files:
 
@@ -47,7 +50,7 @@ Let's dive right in and download both files:
 [Bank Marketing Social Features :fontawesome-solid-download:](../../assets/data-science/data/bank-social.csv){ .md-button }
 </div>
 
-Place the files in a new folder called `data`. Your project now should look 
+Place the files in a new folder called `data`. Your project now should look
 like this:
 
 ```
@@ -60,28 +63,27 @@ like this:
 
 ???+ question "Open the files"
 
-    Before we start, simply open the files with a text editor.
-    Scroll through both files and read a couple of rows to get acquainted with 
-    the data.
+    Before we start, simply open the files with a text editor. Scroll through both
+    files and read a couple of rows to get acquainted with the data.
 
 ## Read the files
 
-Since we are obviously dealing with two rather large files, we opt to read 
-them with `Python` :fontawesome-brands-python:. At the end of this section
-we end up with a single (clean!) data set.
+Since we are obviously dealing with two rather large files, we opt to read them
+with `Python` :fontawesome-brands-python:. At the end of this section we end up
+with a single (clean!) data set.
 
 ???+ info
-    
-    Conveniently, in our case the data was already collected, saving us hours 
-    and hours of work. Thus, we can focus on the data preparation step. 
-    Since data is commonly obtained from different sources and in various 
-    different formats, both data sets we have at hand (`bank.tsv` and `bank-social.csv`)
-    will mimic theses scenarios.
-
-To start, we are using `pandas` for reading and manipulating data. If you 
-haven't already, install the package within your environment. 
-Assuming your Jupyter Notebook or script is located at the project's root, we
-start by reading the first file :fontawesome-solid-arrow-right: `bank.tsv`.
+
+    Conveniently, in our case the data was already collected, saving us hours and
+    hours of work. Thus, we can focus on the data preparation step. Since data is
+    commonly obtained from different sources and in various formats, both data
+    sets we have at hand (`bank.tsv` and `bank-social.csv`) will mimic these
+    scenarios.
+
+To start, we are using `pandas` for reading and manipulating data. If you
+haven't already, install the package within your environment. Assuming your
+Jupyter Notebook or script is located at the project's root, we start by
+reading the first file :fontawesome-solid-arrow-right: `bank.tsv`.
 
 ```python
 import pandas as pd
@@ -89,20 +91,18 @@ import pandas as pd
 data = pd.read_csv("data/bank.tsv", sep="\t")
 ```
 
-Although, we can use a simple single-liner to read the file, there are a 
-couple of things to break down:
-
-1. We are dealing with a tab-separated file, meaning values within the file 
-   are separated by a tab character (`\t`). The fact that we are dealing 
-   with a tab-separated file is indicated by the file extension `.tsv` and 
-   the space surrounding the values within the file.
-2. Although we do not have a `csv` file at hand, `pandas` is versatile enough 
-   to handle different separators. 
-   Thus, we can utilize the `#!python pd.read_csv()` function to read the 
-   file. ==Tip==: All sorts of text files can be usually read with 
-   `#!python pd.read_csv()`.
-3. Lastly, the `sep` parameter is set to `\t` to indicate the tab 
-   separation.
+Although we can use a simple one-liner to read the file, there are a couple
+of things to break down:
+
+1. We are dealing with a tab-separated file, meaning values within the file are
+    separated by a tab character (`\t`). The fact that we are dealing with a
+    tab-separated file is indicated by the file extension `.tsv` and the
+    whitespace separating the values within the file.
+1. Although we do not have a `csv` file at hand, `pandas` is versatile enough
+    to handle different separators. Thus, we can utilize the
+    `#!python pd.read_csv()` function to read the file. ==Tip==: All sorts of
+    text files can usually be read with `#!python pd.read_csv()`.
+1. Lastly, the `sep` parameter is set to `\t` to indicate the tab separation.
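
The points above can be tried out without any files on disk. The following sketch feeds made-up, in-memory text to `pd.read_csv()` via `io.StringIO`; the contents and the pipe separator are assumptions chosen purely for illustration:

```python
import io

import pandas as pd

# Made-up contents standing in for files on disk.
tab_text = "id\tage\n1\t30\n2\t39\n"
pipe_text = "id|job\n1|technician\n2|blue-collar\n"

# The same function handles both; only the `sep` argument changes.
tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t")
pipe_df = pd.read_csv(io.StringIO(pipe_text), sep="|")

print(tab_df.shape, pipe_df.shape)
```

If `sep` does not match the file, pandas typically reads each line into a single column, which is a quick hint that the separator is wrong.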
 
 Let's read the second file :fontawesome-solid-arrow-right: `bank-social.csv`.
 
@@ -115,56 +115,55 @@ Which separator is used in the file?
 - [x] ; (semicolon)
 - [ ] , (comma)
 
-Values are separated by a semicolon.
-</quiz>
+Values are separated by a semicolon. </quiz>
 
 ???+ question "Read the second file"
-    
-    Simply read the second file (`bank-social.csv`) with `pd.read_csv()` 
-    and specify the appropriate separator. Store the `DataFrame` in a 
-    variable called `data_social`.
+
+    Simply read the second file (`bank-social.csv`) with `pd.read_csv()` and
+    specify the appropriate separator. Store the `DataFrame` in a variable called
+    `data_social`.
 
 ## Duplicated data
 
-Now, with both files in memory, let's examine them closer in order to 
-perform a merge.
+Now, with both files in memory, let's examine them more closely in order to
+perform a merge.
 
 === "`print(data.head())`"
 
-       | id | age | default | housing | ... | cons.conf.idx | euribor3m | nr.employed | y  |
-       |----|-----|---------|---------|-----|---------------|-----------|-------------|----|
-       | 1  | 30  | no      | yes     | ... | -46.2         | 1.313     | 5099.1      | no |
-       | 2  | 39  | no      | no      | ... | -36.4         | 4.855     | 5191.0      | no |
-       | 3  | 25  | no      | yes     | ... | -41.8         | 4.962     | 5228.1      | no |
-       | 4  | 38  | no      | unknown | ... | -41.8         | 4.959     | 5228.1      | no |
-       | 5  | 47  | no      | yes     | ... | -42.0         | 4.191     | 5195.8      | no |
+    | id  | age | default | housing | ... | cons.conf.idx | euribor3m | nr.employed | y   |
+    | --- | --- | ------- | ------- | --- | ------------- | --------- | ----------- | --- |
+    | 1   | 30  | no      | yes     | ... | -46.2         | 1.313     | 5099.1      | no  |
+    | 2   | 39  | no      | no      | ... | -36.4         | 4.855     | 5191.0      | no  |
+    | 3   | 25  | no      | yes     | ... | -41.8         | 4.962     | 5228.1      | no  |
+    | 4   | 38  | no      | unknown | ... | -41.8         | 4.959     | 5228.1      | no  |
+    | 5   | 47  | no      | yes     | ... | -42.0         | 4.191     | 5195.8      | no  |
 
-      The rows represent customers and the columns are features of the 
-      customers. The column `y` indicates whether a customer subscribed to a 
-      term deposit or not. Customers are uniquely identified by the `id` 
-      column. Later on, we will have a closer look at the attributes when 
-      modelling the data.
+    The rows represent customers and the columns are features of the customers. The
+    column `y` indicates whether a customer subscribed to a term deposit or not.
+    Customers are uniquely identified by the `id` column. Later on, we will have a
+    closer look at the attributes when modelling the data.
 
 === "`print(data_social.head())`"
 
-      | id   | job           | marital | education           |
-      |------|---------------|---------|---------------------|
-      | 2178 | technician    | married | professional.course |
-      | 861  | blue-collar   | single  | professional.course |
-      | 3020 | technician    | married | professional.course |
-      | 2129 | self-employed | married | basic.9y            |
-      | 3201 | blue-collar   | married | basic.9y            |
+    | id   | job           | marital | education           |
+    | ---- | ------------- | ------- | ------------------- |
+    | 2178 | technician    | married | professional.course |
+    | 861  | blue-collar   | single  | professional.course |
+    | 3020 | technician    | married | professional.course |
+    | 2129 | self-employed | married | basic.9y            |
+    | 3201 | blue-collar   | married | basic.9y            |
 
-      Again, each row represents a customer (uniquely identified with `id`).
-      The remaining columns `job`, `marital`, and `education` are social
-      attributes.
+    Again, each row represents a customer (uniquely identified with `id`). The
+    remaining columns `job`, `marital`, and `education` are social attributes.
 
----
+______________________________________________________________________
 
 Let's examine the shape of both `DataFrame`s as well.
 
 ```python
-print(f"Shape of data: {data.shape}; Shape of data_social: {data_social.shape}")
+print(
+    f"Shape of data: {data.shape}; Shape of data_social: {data_social.shape}"
+)
 ```
 
 ```title=">>> Output"
@@ -172,8 +171,8 @@ Shape of data: (4530, 18); Shape of data_social: (4304, 4)
 ```
 
 The output indicates that `data` contains more observations (customers) than
-`data_social`. However, first and foremost it is good practice to check 
-for duplicated data.
+`data_social`. However, first and foremost it is good practice to check for
+duplicated data.
 
 ```python
 # check for duplicated rows
@@ -191,7 +190,7 @@ data = data.drop_duplicates()
 ```
 
 ???+ question "Check for duplicates"
-    
+
     Check for duplicates in `data_social` and remove them if necessary.
 
 <quiz>
@@ -201,16 +200,15 @@ How many duplicates are present in `data_social`?
 - [ ] 411
 - [x] 376
 
-`data_social` has 376 duplicated rows.
-</quiz>
+`data_social` has 376 duplicated rows. </quiz>
 
 ???+ info "A note on `#!python pd.DataFrame.drop_duplicates()`"
-        
-    By default, the method `#!python pd.DataFrame.drop_duplicates()` removes
-    all duplicated rows. However, you can pass an argument to `subset` in 
-    order to remove duplicates based on specific columns. For example, if we 
-    want to drop duplicates based on the `id` column, we can do so by:
-    
+
+    By default, the method `#!python pd.DataFrame.drop_duplicates()` removes all
+    duplicated rows. However, you can pass an argument to `subset` in order to
+    remove duplicates based on specific columns. For example, if we want to drop
+    duplicates based on the `id` column, we can do so by:
+
     ```python
     data_social = data_social.drop_duplicates(subset=["id"])
     ```
@@ -219,45 +217,45 @@ How many duplicates are present in `data_social`?
 
 ## Merge methods
 
-To combine both data sets we will use the `#!python pd.DataFrame.merge()` 
+To combine both data sets we will use the `#!python pd.DataFrame.merge()`
 method to
 
 > Merge DataFrame or named Series objects with a database-style join
-> 
-> -- <cite>[pandas `merge()`docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html)</cite>
+>
+>
+> <cite>[pandas `merge()`docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html)</cite>
 
-Looking at the `how` parameter we are presented with 5 (!) different options 
-to perform a merge. The most common ones are:
+Looking at the `how` parameter we are presented with 5 (!) different options to
+perform a merge. The most common ones are:
 
 - `#!python "left"`
 - `#!python "right"`
 - `#!python "inner"`
 - `#!python "outer"`
 
-In order to be able to choose the appropriate method, we need to break them 
+In order to be able to choose the appropriate method, we need to break them
 down:
 
 ![Merge types](../../assets/data-science/data/merge-types.png)
 
-- **Left join**: The resulting `DataFrame` will contain all rows from the 
-  left `DataFrame` (data 1) and the matched rows from the right `DataFrame` 
-  (data 2).
-- **Right join**: The resulting `DataFrame` will contain all rows from the 
-  right `DataFrame` (data 2) and the matched rows from the left `DataFrame` 
-  (data 1).
-- **Inner join**: The resulting `DataFrame` will contain only the rows that 
-  have matching values in both `DataFrame`s.
-- **Outer join**: The resulting `DataFrame` will contain all rows from both 
-  `DataFrame`s.
-
+- **Left join**: The resulting `DataFrame` will contain all rows from the left
+    `DataFrame` (data 1) and the matched rows from the right `DataFrame` (data
+    2).
+- **Right join**: The resulting `DataFrame` will contain all rows from the
+    right `DataFrame` (data 2) and the matched rows from the left `DataFrame`
+    (data 1).
+- **Inner join**: The resulting `DataFrame` will contain only the rows that
+    have matching values in both `DataFrame`s.
+- **Outer join**: The resulting `DataFrame` will contain all rows from both
+    `DataFrame`s.
 
 ### Perform merges
 
-To get further acquainted with merge methods, we simply perform them all.
-But first, we need to pick a column which uniquely identifies a row (customer)
-in both data sets. Conveniently, we have the `id` column. Regardless
-of the merge we perform, the parameter `on` requires a column to match the 
-rows (in our case we set `#!python on="id"`).
+To get further acquainted with merge methods, we simply perform them all. But
+first, we need to pick a column which uniquely identifies a row (customer) in
+both data sets. Conveniently, we have the `id` column. Regardless of the merge
+we perform, the parameter `on` requires a column to match the rows (in our case
+we set `#!python on="id"`).
 
 ```python
 left_join = data.merge(data_social, on="id", how="left")
@@ -268,7 +266,7 @@ outer_join = data.merge(data_social, on="id", how="outer")
 
 #### A closer look
 
-To comprehend on of these merges, have a look at the resulting shape and the 
+To comprehend one of these merges, have a look at the resulting shape and the
 identifiers the `DataFrame` contains. Let's examine the `right_join`:
 
 ```python
@@ -286,64 +284,61 @@ All identifiers from data_social are present? True
 
 A breakdown of the code snippet:
 
-1. `equal_nrows` indicates that all rows from `data_social` are present in 
-   `right_join`.
+1. `equal_nrows` indicates that all rows from `data_social` are present in
+    `right_join`.
 
     ???+ info
-    
-        `#!python len(data_social)` is equivalent to 
+
+        `#!python len(data_social)` is equivalent to
         `#!python data_social.shape[0]`.
 
-2. To verify that `right_join` contains all identifiers of `data_social`, 
-   we make use of the `#!python pd.Series.isin()` method. This method checks 
-   whether each element of a `Series` is contained in another `Series`.
+1. To verify that `right_join` contains all identifiers of `data_social`, we
+    make use of the `#!python pd.Series.isin()` method. This method checks
+    whether each element of a `Series` is contained in another `Series`.
 
     ???+ info
-    
-        `#!python pd.Series.all()` returns `#!python True` if all elements in 
-        the `Series` are `#!python True`.
-    
+
+        `#!python pd.Series.all()` returns `#!python True` if all elements in the
+        `Series` are `#!python True`.
+
 ???+ question "Counter check"
 
     Extend, the previous code snippet:
 
     1. Do the number of rows from `data` and `right_join` match?
-    2. Are all identifiers from `data` present in `right_join`?
+    1. Are all identifiers from `data` present in `right_join`?
 
-Our examinations should conclude that `right_join` contains all 
-rows/customer data from `data_social` and solely the matching records 
-from `data`.
+Our examinations should conclude that `right_join` contains all rows/customer
+data from `data_social` and solely the matching records from `data`.
 
----
+______________________________________________________________________
 
 ???+ question "Examine remaining merges"
 
-    Look at the shapes of the remaining merges (`left_join`, 
-    `inner_join`, `outer_join`) to get a better understanding of the
-    different merge methods and its results.
-    
-    Compare the shape of each merge with the shapes of `data` and 
-    `data_social`.
-    
+    Look at the shapes of the remaining merges (`left_join`, `inner_join`,
+    `outer_join`) to get a better understanding of the different merge methods and
+    their results.
+
+    Compare the shape of each merge with the shapes of `data` and `data_social`.
+
 ### Final merge (application)
 
-With a solid understanding of different merge methods, we revisit our 
-initial task to join both data sets. But how can we choose the appropriate 
-method? The short answer; it depends on your data and task at hand. 
-There is no method that fits all scenarios.
+With a solid understanding of different merge methods, we revisit our initial
+task to join both data sets. But how can we choose the appropriate method? The
+short answer: it depends on your data and task at hand. There is no method that
+fits all scenarios.
 
 <blockquote class="reddit-embed-bq" style="height:500px" data-embed-height="740"><a href="https://www.reddit.com/r/ProgrammerHumor/comments/13oxtqs/sql_explained/">SQL Explained</a><br> by<a href="https://www.reddit.com/user/UnorthodoxPrimitive/">u/UnorthodoxPrimitive</a> in<a href="https://www.reddit.com/r/ProgrammerHumor/">ProgrammerHumor</a></blockquote><script async="" src="https://embed.reddit.com/widgets.js" charset="UTF-8"></script>
 
-Since we are eventually fitting a machine learning model on the data, we are 
-interested in customer data that is present in both data sets. I.e., we 
-want to prevent the introduction of additional missing values that would 
-result from an outer join. Furthermore, a left join would leave us with missing
-attributes from `social_data`. The same applies to a right join, just vice 
-versa.
+Since we are eventually fitting a machine learning model on the data, we are
+interested in customer data that is present in both data sets. I.e., we want to
+prevent the introduction of additional missing values that would result from an
+outer join. Furthermore, a left join would leave us with missing attributes
+from `data_social`. The same applies to a right join, just vice versa.
 
-Long story short, we opt for an `#!python "inner"` merge (or join) which 
-leaves us with only the customers that are present in both data sets. The 
-final merge is as simple as:
+Long story short, we opt for an `#!python "inner"` merge (or join) which leaves
+us with only the customers that are present in both data sets. The final merge
+is as simple as:
 
 ```python
 data_merged = data.merge(data_social, on="id", how="inner")
@@ -360,7 +355,7 @@ Shape of data_merged: (3928, 21)
 ```
 
 We end up with `#!python 3928` customers that are present in both data sets.
-Lastly, we can write the merged data set to a new file. Let's use a common 
+Lastly, we can write the merged data set to a new file. Let's use a common
 format :fontawesome-solid-arrow-right: `csv` with the default `,` as separator.
 
 ```python
@@ -370,8 +365,9 @@ data_merged.to_csv("data/bank-merged.csv", index=False)
 With `#!python index=False`, we do ==not==
 
 > Write row names (index).
-> 
-> -- <cite>[pandas `to_csv()` docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html)</cite>
+>
+>
+> <cite>[pandas `to_csv()` docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html)</cite>
 
 <div style="text-align: center;">
 
@@ -384,8 +380,8 @@ With `#!python index=False`, we do ==not==
 
 ## Recap
 
-Using the bank marketing data, we have seen how to find and remove duplicated 
+Using the bank marketing data, we have seen how to find and remove duplicated
 data, explored different merge methods and ended up with a single data set.
 
-In the next chapter, we will explore this data further, look for missing 
-values and perform some basic data transformations.
+In the next chapter, we will explore this data further, look for missing values
+and perform some basic data transformations.
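As a compact illustration of the recap above, the following sketch runs the duplicate removal and all four merge types on two toy `DataFrame`s. The frames `left` and `right` and their values are made up for illustration and are not the bank data; only `drop_duplicates()` and `merge()` are taken from the chapter.

```python
import pandas as pd

# Hypothetical toy frames standing in for `data` and `data_social`
left = pd.DataFrame({"id": [1, 2, 2, 3], "age": [30, 41, 41, 25]})
right = pd.DataFrame(
    {"id": [2, 3, 4], "job": ["technician", "blue-collar", "admin."]}
)

# Remove fully duplicated rows before merging (drops the second id=2 row)
left = left.drop_duplicates()

# Perform every merge type on the shared key and collect the resulting shapes
shapes = {
    how: left.merge(right, on="id", how=how).shape
    for how in ["left", "right", "inner", "outer"]
}
print(shapes)
```

In this toy setup, the inner join keeps only the two identifiers present in both frames, while the outer join keeps all four and fills the gaps with missing values.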
diff --git a/docs/data-science/data/preprocessing.md b/docs/data-science/data/preprocessing.md
index 6838c776..18babe7f 100644
--- a/docs/data-science/data/preprocessing.md
+++ b/docs/data-science/data/preprocessing.md
@@ -1,30 +1,31 @@
 # Data Preprocessing
 
 ![Continue your quest!](../../assets/data-science/data/continue-quest.png)
+
 <figcaption style="text-align: center;">
     Continue your data preprocessing quest! 🧙‍♂️
 </figcaption>
 
 ???+ info "`scikit-learn`"
-    
+
     This chapter introduces the package
-    [`scikit-learn`](https://scikit-learn.org/stable/), the swiss-army knife
-    for data preprocessing, transformation and machine learning.
+    [`scikit-learn`](https://scikit-learn.org/stable/), the swiss-army knife for
+    data preprocessing, transformation and machine learning.
 
-    We will continue to work with the Portuguese retail bank data 
-    set[^1] and preprocess it further. Alongside we start to explore 
-    `scikit-learn`'s functionalities.
+    We will continue to work with the Portuguese retail bank data set[^1] and
+    preprocess it further. Alongside we start to explore `scikit-learn`'s
+    functionalities.
 
-    [^1]:
-        Decision Support Systems, Volume 62, June 2014, Pages 22-31:
-        [https://doi.org/10.1016/j.dss.2014.03.001](https://doi.org/10.1016/j.dss.2014.03.001)
+    [^1]: Decision Support Systems, Volume 62, June 2014, Pages 22-31:
+    [https://doi.org/10.1016/j.dss.2014.03.001](https://doi.org/10.1016/j.dss.2014.03.001)
 
-    Check-out the excellent [scikit-learn documentation](https://scikit-learn.org/stable/).
+    Check out the excellent
+    [scikit-learn documentation](https://scikit-learn.org/stable/).
 
 ## Prerequisites
 
-If you have followed the previous chapter closely, your project structure 
-looks like this:
+If you have followed the previous chapter closely, your project structure looks
+like this:
 
 ```plaintext hl_lines="2 5"
 📁 bank_marketing/
@@ -35,12 +36,12 @@ looks like this:
 └───── 📄 bank-social.csv
 ```
 
-With `bank-merged.csv` being the `#!python "inner"` join of `bank.csv` and 
-`social.csv`, minus all duplicated customer data. 
+With `bank-merged.csv` being the `#!python "inner"` join of `bank.tsv` and
+`bank-social.csv`, minus all duplicated customer data.
 
-If you are missing the file `bank-merged.csv`, we strongly recommend you to 
-go back and complete the previous chapter. For the sake of completeness, 
-we provide a distilled version of the code from 
+If you are missing the file `bank-merged.csv`, we strongly recommend that you
+go back and complete the previous chapter. For the sake of completeness, we
+provide a distilled version of the code from
 [Data preparation](preparation.md):
 
 ??? info "Merge the data sets"
@@ -48,33 +49,32 @@ we provide a distilled version of the code from
     ```python linenums="1"
     # Steps from the Data preparation chapter
     import pandas as pd
-    
+
     data = pd.read_csv("bank.tsv", sep="\t")
     data_social = pd.read_csv("bank-social.csv", sep=";")
-    
+
     data = data.drop_duplicates()
     data_social = data_social.drop_duplicates()
-    
+
     data_merged = data.merge(data_social, on="id", how="inner")
     data_merged.to_csv("data/bank-merged.csv", index=False)
     ```
-    
+
     <div class="center-button" markdown>
     [Merged data :fontawesome-solid-download:](../../assets/data-science/data/bank-merged.csv){ .md-button }
     </div>
 
-Again, we urge you to use a virtual environment which by now, should be second 
+Again, we urge you to use a virtual environment, which by now should be second
 nature anyway.
 
 ???+ info
 
-    To follow along, create a new script or Jupyter notebook within your 
-    project.
+    To follow along, create a new script or Jupyter notebook within your project.
 
 ## Missing values
 
-After dropping duplicates and merging the data, the next step is to check 
-for missing values. First, we read the data.
+After dropping duplicates and merging the data, the next step is to check for
+missing values. First, we read the data.
 
 ```python
 import pandas as pd
@@ -88,16 +88,16 @@ We chain a couple of methods to count the missing values in each column.
 print(data.isna().sum())
 ```
 
-The `#!python isna()` method checks each element and whether it is a missing 
-value or not. The result is a `DataFrame` with boolean values of the same 
-shape as the initial `DataFrame` (in our case `data`), with `#!python True` 
-being a missing value. With the chaining of `#!python sum()` we simply sum the 
+The `#!python isna()` method checks whether each element is a missing value
+or not. The result is a `DataFrame` with boolean values of the same shape
+as the initial `DataFrame` (in our case `data`), with `#!python True` being a
+missing value. With the chaining of `#!python sum()` we simply sum the
 `#!python True` values (missing values) for each column.
 
 A truncated version of the output is shown below:
 
 | Column    | Missing Values |
-|-----------|----------------|
+| --------- | -------------- |
 | id        | 0              |
 | age       | 0              |
 | default   | 0              |
@@ -108,16 +108,16 @@ A truncated version of the output is shown below:
 | education | 0              |
 | ...       | ...            |
 
-It seems like the columns have no missing values. To sum missing values of 
-the whole `DataFrame`, we can chain another `#!python sum()`.
+It seems like the columns have no missing values. To sum missing values of the
+whole `DataFrame`, we can chain another `#!python sum()`.
 
 ```python
 print(data.isna().sum().sum())
 ```
 
-The output once more indicates that the whole data set has `#!python 0` 
-missing values. So far so good, but this is not the end of the story (who 
-saw that coming 🤯).
+The output once more indicates that the whole data set has `#!python 0` missing
+values. So far so good, but this is not the end of the story (who saw that
+coming 🤯).
 
 <div style="text-align: center;">
     <iframe src="https://giphy.com/embed/aWPGuTlDqq2yc" width="480" height="254" style="" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/celebrity-reshuffle-aWPGuTlDqq2yc"></a></p>
@@ -126,45 +126,45 @@ saw that coming 🤯).
     </figcaption>
 </div>
 
-Although, it seems like we don't have to bother with missing values, they 
-are simply a bit more hidden.
+Although it seems like we don't have to bother with missing values, they are
+simply a bit more hidden.
 
 ### Missing values in disguise
 
-`pandas` considers types like `#!python None` or `#!python np.nan` as 
-missing. However in practice, missing values are encoded in various ways.
-For instance, strings like `#!python "NA"` or integers like `#!python -999` 
-are used. Consequently, we can't detect these ways of encoding with 
-simply calling `#!python isna()`.
+`pandas` considers types like `#!python None` or `#!python np.nan` as missing.
+However, in practice, missing values are encoded in various ways. For instance,
+strings like `#!python "NA"` or integers like `#!python -999` are used.
+Consequently, we can't detect these encodings by simply calling
+`#!python isna()`.
 
-Since we have to manually detect these encoded missing values, it is 
-essential to have a good understanding of the data. Let's get more 
-familiarized with the data.
+Since we have to manually detect these encoded missing values, it is essential
+to have a good understanding of the data. Let's get more familiar with the
+data.
 
-Visit the UCI Machine Learning Repository 
-[here](https://archive.ics.uci.edu/dataset/222/bank+marketing) which also 
-hosts the data set and some additional information. Interestingly, the section 
+Visit the UCI Machine Learning Repository
+[here](https://archive.ics.uci.edu/dataset/222/bank+marketing) which also hosts
+the data set and some additional information. Interestingly, the section
 *Dataset Information* states:
 
 > **Has Missing Values?**
 >
 > No
 
-Although that might be technical correct (the data contains no empty values), 
+Although that may be technically correct (the data contains no empty values),
 we have to dig deeper.
 
 ???+ question "Detect the encoding of missing values"
-    
-    Open the [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/222/bank+marketing).
-    Look at the *Variables Table*. How are the missing values encoded in the 
-    data set?
+
+    Open the
+    [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/222/bank+marketing).
+    Look at the *Variables Table*. How are the missing values encoded in the data
+    set?
 
     Use the following quiz question to validate your answer.
 
-    Remember, the bigger picture :fontawesome-solid-arrow-right:
-    by getting more familiar with the data, we can train a better fitting 
-    model to predict the target variable `y` (subscribed to term deposit or 
-    not).
+    Remember the bigger picture :fontawesome-solid-arrow-right: by getting more
+    familiar with the data, we can train a better fitting model to predict the
+    target variable `y` (subscribed to term deposit or not).
 
 <quiz>
 How are missing values encoded in this specific data set?
@@ -180,16 +180,16 @@ education).
 
 ### Missing values uncovered
 
-Now that we uncovered the encoding of missing values, we replace them with 
+Now that we uncovered the encoding of missing values, we replace them with
 `#!python None` to properly detect them and handle them more easily.
 
 ???+ question "Replace encoding with `#!python None`"
-    
-    Since, you've detected the particular encoding of missing values, replace 
-    them with `#!python None` across the whole data frame.
-    
-    Use the `DataFrame.replace()` method and read the 
-    [docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html), 
+
+    Since you've detected the particular encoding of missing values, replace them
+    with `#!python None` across the whole data frame.
+
+    Use the `DataFrame.replace()` method and read the
+    [docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html),
     especially the *Examples* section for usage guidance.
 
 After solving the question, we (again) sum up the missing values per column.
@@ -201,7 +201,7 @@ print(data.isna().sum())
 A truncated version of the output:
 
 | Column    | Missing Values |
-|-----------|----------------|
+| --------- | -------------- |
 | id        | 0              |
 | age       | 0              |
 | default   | 760            |
@@ -212,8 +212,8 @@ A truncated version of the output:
 | education | 161            |
 | ...       | ...            |
 
-At first glance, a lot of columns contain missing values. Let's calculate 
-the ratio to get a better feeling.
+At first glance, a lot of columns contain missing values. Let's calculate the
+ratio to get a better feeling.
 
 ```python hl_lines="4"
 count_missing = data.isna().sum()
@@ -224,7 +224,7 @@ print(missing_ratio.round(2))
 ```
 
 | Column    | Missing Values (%) |
-|-----------|--------------------|
+| --------- | ------------------ |
 | id        | 0.00               |
 | age       | 0.00               |
 | default   | 19.35              |
@@ -235,28 +235,28 @@ print(missing_ratio.round(2))
 | education | 4.10               |
 | ...       | ...                |
 
-Compared to the initial observation where we found `#!python 0` 
-missing values across the whole data set, it's a stark contrast.
+Compared to the initial observation where we found `#!python 0` missing values
+across the whole data set, it's a stark contrast.
 
-Looking at the attribute *default*, nearly a fifth of the observations are 
-missing (19.35 %). Other attributes contain less missing values, yet we still 
-need to handle them. Therefore, we explore different strategies to deal 
-with missing values.
+Looking at the attribute *default*, nearly a fifth of the observations are
+missing (19.35 %). Other attributes contain fewer missing values, yet we still
+need to handle them. Therefore, we explore different strategies to deal with
+missing values.
 
 ???+ info
 
-    Though it might not seem much, being able to detect these missing values 
-    will prove invaluable in the future.
+    Though it might not seem like much, being able to detect these missing values
+    will prove invaluable in the future.
 
-    By identifying and properly handling these gaps, we might be able to 
-    train a better fitting model as unaddressed missing values can lead to 
-    biased predictions. Most importantly, most algorithms can't handle 
-    missing values at all.
+    By identifying and properly handling these gaps, we might be able to train a
+    better fitting model as unaddressed missing values can lead to biased
+    predictions. Most importantly, most algorithms can't handle missing values at
+    all.
 
 ### Sources
 
-We have extensively covered how to detect missing values but have not 
-talked about their possible origins.
+We have extensively covered how to detect missing values but have not talked
+about their possible origins.
 
 The reasons for missing values can be manifold:
 
@@ -273,8 +273,8 @@ The reasons for missing values can be manifold:
 
 ### Drop columns/rows
 
-One simple way to handle missing values is to drop (i.e. remove) the 
-respective columns which contain any missing values.
+One simple way to handle missing values is to drop (i.e. remove) the respective
+columns which contain any missing values.
 
 ```python
 data_dropped = data.dropna(axis=1)
@@ -282,33 +282,35 @@ data_dropped = data.dropna(axis=1)
 
 `#!python axis=1` specified the columns to be dropped.
 
-To comprehend the impact of this operation, we calculate the number of 
-columns that were removed.
+To comprehend the impact of this operation, we calculate the number of columns
+that were removed.
 
 ```python
 print(data.shape[1] - data_dropped.shape[1])
 ```
+
 This operation removed `#!python 6` out of `#!python 21` columns/attributes.
 
 ???+ question "Remove rows with missing values"
 
-    Contrary, we can leave all columns and instead drop the rows containing 
-    missing values.
+    Conversely, we can keep all columns and instead drop the rows containing
+    missing values.
 
-    1. Use the [`DataFrame.dropna()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html)
-    method to remove rows with missing values.
-    2. Calculate the number of rows that were removed.
+    1. Use the
+        [`DataFrame.dropna()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html)
+        method to remove rows with missing values.
+    1. Calculate the number of rows that were removed.
 
 #### Set a threshold
 
-Instead of dropping all rows/columns with gaps, we can set a threshold to only 
+Instead of dropping all rows/columns with gaps, we can set a threshold to only
 drop columns/rows with a certain amount of missing values.
 
 To specify a threshold, make use of the `thresh` parameter, which takes an
-`#!python int` value of ^^non-missing^^ values that a column/row must have, 
-to ^^not^^ be dropped.
+`#!python int` value of ^^non-missing^^ values that a column/row must have, to
+^^not^^ be dropped.
 
-As an example, we would like to remove all columns holding more than 10 % 
+As an example, we would like to remove all columns holding more than 10 %
 missing values.
 
 ```python hl_lines="7 10"
@@ -326,8 +328,8 @@ diff = data.shape[1] - data_dropped_threshold.shape[1]
 print(f"Number of columns dropped: {diff}")
 ```
 
-1.  The `#!python math.ceil()` function is used to round up the threshold 
-    value to the next integer.
+1. The `#!python math.ceil()` function is used to round up the threshold value
+    to the next integer.
 
 ```title=">>> Output"
 3535.2000000000003
@@ -335,34 +337,35 @@ print(f"Number of columns dropped: {diff}")
 Number of columns dropped: 1
 ```
 
-A single column was dropped and therefore exceeded the 10 % threshold of 
+A single column was dropped because it exceeded the 10 % threshold of
 missing values.
 
----
+______________________________________________________________________
 
-Depending on the data at hand, dropping rows or columns might be a valid 
-option, if you're dealing with a small number of missing values. However, in 
+Depending on the data at hand, dropping rows or columns might be a valid
+option, if you're dealing with a small number of missing values. However, in
 other cases these operations might lead to a significant loss of information.
-Since, we are dealing with a substantial amount of missing values, we are 
+Since we are dealing with a substantial amount of missing values, we are
 looking for more sophisticated ways to handle them.
 
 ### Imputation techniques
 
-What about filling in the missing values? The process of replacing missing 
+What about filling in the missing values? The process of replacing missing
 values is called imputation.
 
 ![](../../assets/data-science/data/imputation.gif)
+
 <figcaption style="text-align: center;">
     Data imputation
 </figcaption>
 
-There are various imputation techniques available, each with its own
-advantages and disadvantages.
+There are various imputation techniques available, each with its own advantages
+and disadvantages.
 
 ##### Fill manually
 
-Of course, there is always the option to fill the values manually which 
-could be time-consuming and infeasible for large data sets.
+Of course, there is always the option to fill the values manually, which could
+be time-consuming and infeasible for large data sets.
 
 ##### Global constant
 
@@ -373,17 +376,17 @@ constant, i.e., filling gaps across ^^all^^ columns with the same value.
 data_filled = data.fillna("no")
 ```
 
-This method is straightforward and easy to implement. However, there are 
-some drawbacks:
+This method is straightforward and easy to implement. However, there are some
+drawbacks:
 
 - how to choose the global constant?
-- introduces further challenges with mixed attributes (i.e., 
-  nominal/ordinal and numerical attributes)
+- introduces further challenges with mixed attributes (i.e., nominal/ordinal
+    and numerical attributes)
 
 ##### Central tendency
 
-Another common approach is to replace missing values with the mean, median,
-or mode of the respective column.
+Another common approach is to replace missing values with the mean, median, or
+mode of the respective column.
 
 Fill a nominal attribute with the mode:
 
@@ -413,31 +416,31 @@ np.float64(40.1433299389002)
 
 ???+ info
 
-    Since the bank data does not contain any numerical attribute with 
-    missing values, the above code snippet assumed gaps in *age*. As there 
-    are none, the operation did not change the data. 
+    Since the bank data does not contain any numerical attribute with missing
+    values, the above code snippet assumed gaps in *age*. As there are none, the
+    operation did not change the data.
 
 #### Machine Learning
 
 Lastly, we can use machine learning algorithms to predict the missing values.
-The idea is to estimate the missing values based on the other attributes. 
+The idea is to estimate the missing values based on the other attributes.
 Linear regression, k-nearest neighbors, or decision trees are common choices.
 
 ???+ info
 
-    As we have not covered machine learning yet, we won't get into the details.
-    But feel free to return to this section. Especially, 
+    As we have not covered machine learning yet, we won't get into the details. But
+    feel free to return to this section. Especially,
     [this](https://scikit-learn.org/stable/auto_examples/impute/plot_missing_values.html)
-    scikit-learn comparison of imputation techniques (including k-nearest 
+    scikit-learn comparison of imputation techniques (including k-nearest
     neighbors) is a good starting point for further exploration.
 
 ## Transformation
 
-Step by step, we are getting closer to actually training a machine learning 
+Step by step, we are getting closer to actually training a machine learning
 model. Beforehand, we introduce data transformations that are commonly applied
 to improve the fit of the model.
 
-For starters, install the `scikit-learn` package within your activated 
+For starters, install the `scikit-learn` package within your activated
 environment.
 
 ```bash
@@ -456,20 +459,20 @@ From now on, we will heavily use `scikit-learn`'s functionalities.
 
 ### Discretize numerical attributes
 
-When dealing with noisy data, it is often beneficial to discretize 
-numerical (continuous) attributes.
+When dealing with noisy data, it is often beneficial to discretize numerical
+(continuous) attributes.
 
 ???+ info "Noise in data"
 
-    Noise is a random error or variance in a measured variable. It is 
-    meaningless information that can distort the data.
-    
-    Noise can be identified using basic statistical methods and 
-    visualization techniques like boxplots or scatter plots.
+    Noise is a random error or variance in a measured variable. It is meaningless
+    information that can distort the data.
+
+    Noise can be identified using basic statistical methods and visualization
+    techniques like boxplots or scatter plots.
 
-The process of discretizing is called binning. I.e., the continuous data 
-is separated into intervals (bins).
-Bins can generally lead to a smoothing effect which in turn reduce the noise.
+The process of discretizing is called binning, i.e., the continuous data is
+separated into intervals (bins). Binning generally has a smoothing effect,
+which in turn reduces the noise.
 
 As an example, we pick the attribute *age* and visualize it with a boxplot.
 
@@ -481,21 +484,21 @@ As an example, we pick the attribute *age* and visualize it with a boxplot.
 ??? tip "Create a static boxplot"
 
     To create a static version of the boxplot, perfect for a quick overview:
-      
+
     ```python
     import matplotlib.pyplot as plt
-    
+
     data["age"].plot(kind="box")  # (1)!
     plt.show()
     ```
-    
-    1.  The `#!python plot()` method uses `matplotlib` as backend.
-  
+
+    1. The `#!python plot()` method uses `matplotlib` as backend.
+
     <div style="text-align: center;">
         <img src="/assets/data-science/data/age-boxplot.svg" alt="Age boxplot">
     </div>
 
-Since, *age* contains outliers, we discretize the attribute *age* into five 
+Since *age* contains outliers, we discretize the attribute *age* into five
 bins with the same width.
 
 ```python
@@ -506,12 +509,12 @@ bins.fit(data[["age"]])
 age_binned = bins.transform(data[["age"]])  # (1)!
 ```
 
-1.  The additional square brackets in `#!python data[["age"]]` are used to 
-    select the column *age* as a `DataFrame` (instead of a `Series`). 
-    This is necessary for the `#!python transform()` method as a 
-    two-dimensional input is required.
+1. The additional square brackets in `#!python data[["age"]]` are used to
+    select the column *age* as a `DataFrame` (instead of a `Series`). This is
+    necessary for the `#!python transform()` method as a two-dimensional input
+    is required.
 
-The above snippet returns 5 bins with a width of 14 years. Inspect the bin 
+The above snippet returns 5 bins with a width of 14 years. Inspect the bin
 edges with:
 
 ```python
@@ -522,34 +525,33 @@ print(bins.bin_edges_)
 [array([18., 32., 46., 60., 74., 88.])]
 ```
 
-Though the actual binning is just two three lines of code, we have a couple of 
+Though the actual binning is just two or three lines of code, we have a couple of
 things to dissect.
 
 ???+ tip "Working with `scikit-learn`"
 
-    Although the package is named `scikit-learn`, it is imported as 
-    `#!python import sklearn`. Package names on 
-    [PyPI (Python Package Index)](../../python/packages.md/#pypi)
-    can be different from the import name.
+    Although the package is named `scikit-learn`, it is imported as
+    `#!python import sklearn`. Package names on
+    [PyPI (Python Package Index)](../../python/packages.md/#pypi) can be different
+    from the import name.
 
-    ---
+    ______________________________________________________________________
 
-    `scikit-learn` frequently uses classes (e.g., `KBinsDiscretizer`)
-    to represent different models and preprocessing techniques. Two important 
-    methods that many of these classes implement are `fit` and `transform`.
+    `scikit-learn` frequently uses classes (e.g., `KBinsDiscretizer`) to represent
+    different models and preprocessing techniques. Two important methods that many
+    of these classes implement are `fit` and `transform`.
 
-    - `#!python fit(X)`: This method is used to learn the parameters from the 
-    data (referred to as `X`). 
-    
-    - `#!python transform(X)`: This method is used to apply the learned 
-    parameters to the data :fontawesome-solid-arrow-right: `X`.
+    - `#!python fit(X)`: This method is used to learn the parameters from the data
+        (referred to as `X`).
 
-    Put simply, think about the `#!python fit(X)` method as scikit-learn takes 
-    a look at the data and learns from it. The `#!python transform(X)` 
-    method then transfers this knowledge and applies it to the data.
+    - `#!python transform(X)`: This method is used to apply the learned parameters
+        to the data :fontawesome-solid-arrow-right: `X`.
 
-    The `#!python fit_transform()` method combines both of these steps in one.
+    Put simply, think of the `#!python fit(X)` method as scikit-learn taking a
+    look at the data and learning from it. The `#!python transform(X)` method then
+    applies this knowledge to the data.
 
+    The `#!python fit_transform()` method combines both of these steps in one.
 
 Alternatively, use `#!python strategy="quantile"` to bin the data based on
 quantiles and thus create bins with the same number of observations.
@@ -565,12 +567,13 @@ print(bins.bin_edges_)
 [array([18., 31., 36., 41., 50., 88.])]
 ```
 
-No matter the strategy `#!python "uniform"` or `#!python "quantile"`, a 
-matrix is returned with the
+No matter the strategy `#!python "uniform"` or `#!python "quantile"`, a matrix
+is returned with the
 
 > bin identifier encoded as an integer value.
-> 
-> [`KBinsDiscretizer` docs](https://scikit-learn.> org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html)
+>
+> [`KBinsDiscretizer`
+> docs](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html)
 
 ### Normalization
 
@@ -589,10 +592,10 @@ Min-Max normalization scales the data to a fixed range, usually [0, 1].
     X' = \frac{X - X_{min}}{X_{max} - X_{min}}
     \]
 
-    where \(X\) is the original value, \(X_{min}\) is the minimum value of the 
+    where \(X\) is the original value, \(X_{min}\) is the minimum value of the
     feature, and \(X_{max}\) is the maximum value of the feature.
 
-This technique is useful when you want to ensure that all features have the 
+This technique is useful when you want to ensure that all features have the
 same scale without distorting differences in the ranges of values.
 
 To illustrate the normalization, we use the attribute *euribor3m* (3 month
@@ -601,7 +604,7 @@ Euribor rate).
 > Euribor is short for Euro Interbank Offered Rate. The Euribor rates are based
 > on the average interest rates at which a large panel of European banks borrow
 > funds from one another.
-> 
+>
 > [euribor-rates.eu](https://www.euribor-rates.eu/en/)
 
 ```python
@@ -634,17 +637,17 @@ Min: 0.0, Max: 1.0
 ???+ question "Normalization of new data"
 
     Assume new data is added:
-    
+
     ```python
     new_data = pd.DataFrame({"euribor3m": [0.5, 5.0, 2.5]})
     ```
-    We would like to transform these three new interest rates using the Min 
-    Max normalization.
-    Remember that the `MinMaxScaler` was already fitted on the original 
-    data with \(X_{min}=0.635\) and \(X_{max}=4.97\).
 
-    Answer the following quiz question. Look at the formula again and try 
-    to answer the question without executing code.
+    We would like to transform these three new interest rates using Min-Max
+    normalization. Remember that the `MinMaxScaler` was already fitted on the
+    original data with \(X_{min}=0.635\) and \(X_{max}=4.97\).
+
+    Answer the following quiz question. Look at the formula again and try to answer
+    the question without executing code.
 
 <quiz>
 What happens if you call `#!python transform(new_data)`?
@@ -652,19 +655,20 @@ What happens if you call `#!python transform(new_data)`?
 - [ ] The new data is normalized.
 - [x] The normalization works, but the range [0, 1] is not preserved.
 
-Since the newly added Euribor rates of 0.5 and 5.0, are lower or
-higher than the previous minimum and maximum respectively, the normalization
-will not preserve the range [0, 1], i.e. resulting in the normalized values:
+Since the newly added Euribor rates of 0.5 and 5.0 are lower and higher than
+the previous minimum and maximum respectively, the normalization does not
+preserve the range [0, 1]. The resulting normalized values are:
 
 ```python
 [[-0.03114187], [1.00692042], [0.43021915]]
 ```
+
 </quiz>
 
 #### Z-Score Normalization
 
 Z-Score normalization, also known as standardization, scales the data based on
-the mean and standard deviation of an attribute. 
+the mean and standard deviation of an attribute.
 
 ???+ defi "Definition: Z-Score Normalization"
 
@@ -672,32 +676,32 @@ the mean and standard deviation of an attribute.
     X' = \frac{X - \mu}{\sigma}
     \]
 
-    where \(\mu\) is the mean of the feature and \(\sigma\) is the standard 
+    where \(\mu\) is the mean of the feature and \(\sigma\) is the standard
     deviation.
 
-This technique centers the data around zero with a standard deviation of one, 
+This technique centers the data around zero with a standard deviation of one,
 which is useful for algorithms assuming normally distributed data.
 
 ???+ question "Apply Z-Score normalization"
 
-    Use the [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
-    from `scikit-learn` to apply Z-Score normalization to the attribute 
-    *campaign* (number of times a customer was contacted).
+    Use the
+    [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
+    from `scikit-learn` to apply Z-Score normalization to the attribute *campaign*
+    (number of times a customer was contacted).
 
     1. Fit the `StandardScaler` on the data.
-    2. Transform the data.
-    3. Calculate and print the mean and standard deviation of the transformed 
-    data.
+    1. Transform the data.
+    1. Calculate and print the mean and standard deviation of the transformed data.
 
 ### One-Hot Encoding
 
-So far we have focused on numerical attributes. But what about 
-categorical variables? Since, many machine learning algorithms can't handle 
-categorical attributes directly, they need to be encoded. One common technique
-is to one-hot encode these attributes.
+So far we have focused on numerical attributes. But what about categorical
+variables? Since many machine learning algorithms can't handle categorical
+attributes directly, they need to be encoded. One common technique is to
+one-hot encode these attributes.
 
-Imagine the toy example below to illustrate the concept of one-hot encoding 
-on the feature *job*.
+Imagine the toy example below to illustrate the concept of one-hot encoding on
+the feature *job*.
 
 <div style="text-align: center;">
     <video width="100%" height="700" controls>
@@ -708,28 +712,28 @@ on the feature *job*.
 
 ???+ defi "Definition: One-Hot Encoding"
 
-    One-hot encoding is a technique to convert categorical attributes into 
-    numerical attributes. Each category is represented as a binary vector 
-    where only one bit is set to 1 (hot) and the rest are set to 0 (cold).
+    One-hot encoding is a technique to convert categorical attributes into
+    numerical attributes. Each category is represented as a binary vector where
+    only one bit is set to 1 (hot) and the rest are set to 0 (cold).
 
-The class [`OneHotEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html)
-from `scikit-learn` can be used to encode categorical attributes to a one-hot 
+The class
+[`OneHotEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html)
+from `scikit-learn` can be used to encode categorical attributes to a one-hot
 encoded representation.
 
 ???+ question "Apply One-Hot Encoding"
 
-    Use the `OneHotEncoder` to encode the the attribute *job* from the 
-    following toy `DataFrame` (same as in the video).
+    Use the `OneHotEncoder` to encode the attribute *job* from the following
+    toy `DataFrame` (same as in the video).
 
     ```python
     toy_data = pd.DataFrame(
         {"id": [1, 2, 3, 4], "job": ["engineer", "student", "teacher", "student"]}
     )
     ```
-    
-    1. Apply an instance of `#!python OneHotEncoder(sparse_output=False)` to 
-    *job*.
-    2. Check if the resulting matrix matches with the one in the video.
+
+    1. Apply an instance of `#!python OneHotEncoder(sparse_output=False)` to *job*.
+    1. Check if the resulting matrix matches with the one in the video.
 
 ### Label Encoding
 
@@ -737,12 +741,13 @@ Lastly, we introduce label encoding. Label encoding is another technique to
 encode categorical attributes. Instead of creating a binary vector for each
 category, label encoding assigns a unique integer to each category.
 
-`scikit-learn`'s [`LabelEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)
-is specifically designed to encode the target variable (i.e., the attribute we 
+`scikit-learn`'s
+[`LabelEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)
+is specifically designed to encode the target variable (i.e., the attribute we
 want to predict). In our case, we apply the `LabelEncoder` to the column named
 `#!python "y"`.
 
-`#!python "y"` :fontawesome-solid-arrow-right: represents if the client 
+`#!python "y"` :fontawesome-solid-arrow-right: represents whether the client
 subscribed to a term deposit or not.
 
 ```python
@@ -760,21 +765,21 @@ Unique values: ['no' 'yes']
 [0 0 0 ... 0 0 0]
 ```
 
-As `#!python "y"` contains the values `#!python "yes"` and `#!python "no"`, 
-we retrieve a binary representation with `#!python 0` and `#!python 1`.
+As `#!python "y"` contains the values `#!python "yes"` and `#!python "no"`, we
+retrieve a binary representation with `#!python 0` and `#!python 1`.
 
 ## Recap
 
-In this chapter, we have extensively covered missing values. The challenges to 
-detect them in the first place and how to properly encode them. We explored 
-different strategies to deal with missing values, from dropping columns/rows 
-to imputation techniques.
+In this chapter, we have extensively covered missing values: the challenges of
+detecting them in the first place and how to properly encode them. We explored
+different strategies to deal with missing values, from dropping columns/rows to
+imputation techniques.
 
-Using `scikit-learn` we were able to easily apply transformation to the 
+Using `scikit-learn` we were able to easily apply transformations to the
 Portuguese retail bank data set. We discretized (`KBinsDiscretizer`) numerical
-attributes, normalized them (`MinMaxScaler`, `StandardScaler`), and encoded 
-categorical features with one-hot encoding (`OneHotEncoder`). Lastly, we 
+attributes, normalized them (`MinMaxScaler`, `StandardScaler`), and encoded
+categorical features with one-hot encoding (`OneHotEncoder`). Lastly, we
 briefly covered the encoding of target variables with the `LabelEncoder`.
 
-With all these preprocessing steps, we are now well-equipped to dive into 
-the machine learning part and are closer to training our first model.
+With all these preprocessing steps, we are now well-equipped to dive into the
+machine learning part and are closer to training our first model.
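Reviewer note on the preprocessing chapter diffed above: the equal-width bin edges and the Min-Max quiz values can be checked by hand. The following is a minimal sketch in plain Python, not `scikit-learn`'s implementation. The helper names `equal_width_edges` and `min_max_scale` and the `ages` list are assumptions made up for illustration; only the minimum age (18), maximum age (88), and the bounds \(X_{min}=0.635\) and \(X_{max}=4.97\) come from the chapter.

```python
def equal_width_edges(values, n_bins):
    """Return n_bins + 1 equally spaced bin edges spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def min_max_scale(value, x_min, x_max):
    """Apply X' = (X - X_min) / (X_max - X_min) with already-learned bounds."""
    return (value - x_min) / (x_max - x_min)


# Hypothetical ages; only the minimum (18) and maximum (88) match the chapter.
ages = [18, 25, 37, 52, 88]
edges = equal_width_edges(ages, n_bins=5)
print(edges)

# Bounds as stated in the quiz (learned from the original data, not new_data).
x_min, x_max = 0.635, 4.97
scaled = [min_max_scale(x, x_min, x_max) for x in [0.5, 5.0, 2.5]]
print([round(s, 8) for s in scaled])
```

Because the bounds are reused rather than recomputed, values outside the original range land outside [0, 1], which is the point of the quiz.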
diff --git a/docs/data-science/index.md b/docs/data-science/index.md
index 1985bda9..72b65f53 100644
--- a/docs/data-science/index.md
+++ b/docs/data-science/index.md
@@ -14,19 +14,19 @@ examples!
 ## Course overview
 
 1. Basics: Introduction to various terms (data science, machine learning, etc.)
-   and data basics such as attribute types.
-2. Data Preparation & Preprocessing: Data cleaning, integration, and 
-   transformation.
-3. Supervised vs. Unsupervised Learning: Exploration of both terms, and 
-   coverage of various algorithms.
-4. Data Science in Practice: Step-by-step guide to a real-world data science 
-   project.
+    and data basics such as attribute types.
+1. Data Preparation & Preprocessing: Data cleaning, integration, and
+    transformation.
+1. Supervised vs. Unsupervised Learning: Exploration of both terms, and
+    coverage of various algorithms.
+1. Data Science in Practice: Step-by-step guide to a real-world data science
+    project.
 
 ## Tools
 
 As always, we use Python and these great packages :heart:
 
----
+______________________________________________________________________
 
 <div class="row" style="display: flex; justify-content: space-around;">
     <div class="col">
@@ -47,7 +47,7 @@ As always, we use Python and these great packages :heart:
     </div>
 </div>
 
----
+______________________________________________________________________
 
 ## Sneak peek
 
@@ -55,55 +55,55 @@ Here is a sneak peek of selected topics we cover in this course:
 
 <div class="grid cards" markdown>
 
--   __:deciduous_tree: Decision trees__ 
+- __:deciduous_tree: Decision trees__
+
+    ______________________________________________________________________
 
-    --- 
-    
     What are decision trees? How do they work?
 
--   <figure markdown="span">
-        <img src="/assets/data-science/algorithms/tree-based/tree.png">
+- <figure markdown="span">
+    <img src="/assets/data-science/algorithms/tree-based/tree.png">
     </figure>
 
--   <figure markdown="span">
-        <img 
-            src="/assets/data-science/index/spotify-snippet.png" 
-            style="border-radius: 15px;"
-        >
+- <figure markdown="span">
+    <img
+    src="/assets/data-science/index/spotify-snippet.png"
+    style="border-radius: 15px;"
+    >
     </figure>
 
--   __:musical_note: Recommender system__ 
+- __:musical_note: Recommender system__
+
+    ______________________________________________________________________
+
+    With a large Spotify data set, we build a recommender system. At the end you
+    will be able to recommend songs.
+
+- __:mechanical_arm: Elbow method__
 
-    --- 
-    
-    With a large Spotify data set, we build a recommender system. At the end
-    you will be able to recommend songs.
+    ______________________________________________________________________
 
--   __:mechanical_arm: Elbow method__ 
+    We introduce the elbow method and how it can help us to (for example) refine
+    the recommender system.
 
-    --- 
-    
-    We introduce the elbow method and how it can help us to (for example) 
-    refine the recommender system.
- 
--   <figure markdown="span">
-        <img src="/assets/data-science/algorithms/clustering/elbow-method.png">
+- <figure markdown="span">
+    <img src="/assets/data-science/algorithms/clustering/elbow-method.png">
     </figure>
 
--   <figure markdown="span">
-        <img 
-            src="https://static1.cbrimages.com/wordpress/wp-content/uploads/2023/12/split-images-of-transformers-anime.jpg" 
-            style="border-radius: 15px;"
-        >
+- <figure markdown="span">
+    <img
+    src="https://static1.cbrimages.com/wordpress/wp-content/uploads/2023/12/split-images-of-transformers-anime.jpg"
+    style="border-radius: 15px;"
+    >
     </figure>
 
--   __:chart_with_upwards_trend: Data Science in Practice__ 
+- __:chart_with_upwards_trend: Data Science in Practice__
 
-    ---
+    ______________________________________________________________________
 
-    We present a step-by-step guide to a real-world data science project. 
-    The project applies concepts, algorithms and techniques introduced up to 
-    this point.
+    We present a step-by-step guide to a real-world data science project. The
+    project applies concepts, algorithms and techniques introduced up to this
+    point.
 
     ... oh, and along the way we cover a different type of transformer.
 
@@ -116,10 +116,10 @@ science field. Additionally, you will be capable of preprocessing real-world
 data, selecting and applying appropriate algorithms to solve practical
 problems.
 
----
+______________________________________________________________________
 
 <div style="text-align: center">
     <h3>Let's get started! 🚀</h3>
 </div>
 
----
\ No newline at end of file
+______________________________________________________________________
diff --git a/docs/data-science/practice/bonus.md b/docs/data-science/practice/bonus.md
index b3bcc2f3..16e6392f 100644
--- a/docs/data-science/practice/bonus.md
+++ b/docs/data-science/practice/bonus.md
@@ -1,8 +1,8 @@
 ## Introduction
 
-This bonus chapter demonstrates the usage of a pipeline in conjunction with 
-a grid search to automate the modelling process. Again, we are utilizing 
-the bank marketing data. However, this time around we streamline the following:
+This bonus chapter demonstrates the usage of a pipeline in conjunction with a
+grid search to automate the modelling process. Again, we are utilizing the bank
+marketing data. However, this time around we streamline the following:
 
 - Data preprocessing
 - Model evaluation
@@ -10,22 +10,21 @@ the bank marketing data. However, this time around we streamline the following:
 - Model selection
 - Re-training the model on the entire dataset
 
-... basically every step we had taken in "Data Science in Practice" block. 
-Moreover, with a pipeline and grid search, we can easily evaluate additional 
+... basically every step we had taken in the "Data Science in Practice" block.
+Moreover, with a pipeline and grid search, we can easily evaluate additional
 model types and apply a more sophisticated way to evaluate their performance.
 
 ???+ tip
 
-    This chapter serves as an additional outlook for further topics you 
-    could explore, targeting your curiosity. Some concepts and techniques 
-    used in this chapter were not covered in this course. We won't explain 
-    them in much detail here, as they are beyond the scope of this course. 
-    Nonetheless, they could prove valuable for your future machine learning 
-    journey.
+    This chapter serves as an additional outlook for further topics you could
+    explore, targeting your curiosity. Some concepts and techniques used in this
+    chapter were not covered in this course. We won't explain them in much detail
+    here, as they are beyond the scope of this course. Nonetheless, they could
+    prove valuable for your future machine learning journey.
 
 If you're still around, great! Let's get started with some code. :rocket:
 
----
+______________________________________________________________________
 
 ## Quickstart
 
@@ -33,30 +32,31 @@ If you just need a blueprint for your next project, here's the whole thing.
 
 ??? code
 
-    ~~~python linenums="1"
+    ```python linenums="1"
     {% include "../../assets/data-science/practical/bonus.py" %}
-    ~~~
+
+    ```
 
 1. Open the `bank_model` project (from the Data Science in Practice block).
-2. Copy and execute the code.
-3. Done!
+1. Copy and execute the code.
+1. Done!
 
 If you want to know more about the individual parts, keep reading.
 
----
+______________________________________________________________________
 
 ## Plan of attack
 
 We start by defining a bunch of things:
 
 1. Custom transformer, for data imputation and returning a `DataFrame`
-2. `ColumnTransformer` for preprocessing the data
-3. Pipeline to combine all steps
-4. Grid defining all models and parameters to be evaluated
-5. Grid search to find the best model
+1. `ColumnTransformer` for preprocessing the data
+1. Pipeline to combine all steps
+1. Grid defining all models and parameters to be evaluated
+1. Grid search to find the best model
 
-Then we simply need to apply the pipeline and grid search to the data. 
-Finally, we save the best model.
+Then we simply need to apply the pipeline and grid search to the data. Finally,
+we save the best model.
 
 ## Implementation
 
@@ -88,23 +88,25 @@ class DataFrameImputer(BaseEstimator, TransformerMixin):
         )
 ```
 
-The custom transformer has implement the `fit()` and `transform()` methods. 
+The custom transformer has to implement the `fit()` and `transform()` methods.
 Since we are not passing the target variable `y` to the `fit` method, we
 "ignore" it by defining it as `#!python y=None`.
 
-If you want to know more about custom transformers or even custom estimators 
+If you want to know more about custom transformers or even custom estimators
 (models), check out these resources:
 
-- Custom transformer from a function [:octicons-link-external-16:](https://scikit-learn.org/stable/modules/preprocessing.html#custom-transformers)
-- `TransformerMixin` [:octicons-link-external-16:](https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html)
-- Custom estimator [:octicons-link-external-16:](https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator)
+- Custom transformer from a function
+    [:octicons-link-external-16:](https://scikit-learn.org/stable/modules/preprocessing.html#custom-transformers)
+- `TransformerMixin`
+    [:octicons-link-external-16:](https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html)
+- Custom estimator
+    [:octicons-link-external-16:](https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator)
 
 ???+ tip
 
-    `DataFrameImputer` returns a `pandas` `DataFrame` which allows us to 
-    easily chain the imputation step together with our trusted 
-    `ColumnTransformer` within a pipeline. In this case, that's the whole 
-    purpose of the custom transformer.
+    `DataFrameImputer` returns a `pandas` `DataFrame` which allows us to easily
+    chain the imputation step together with our trusted `ColumnTransformer` within
+    a pipeline. In this case, that's the whole purpose of the custom transformer.
 
 ### 2. `ColumnTransformer`
 
@@ -160,8 +162,8 @@ preprocessor = ColumnTransformer(
 
 ### 3. Pipeline
 
-A pipeline is a sequence of steps where each step is a tuple containing
-a name and a transformer/estimator.
+A pipeline is a sequence of steps where each step is a tuple containing a name
+and a transformer/estimator.
 
 ```python
 from sklearn.feature_selection import VarianceThreshold
@@ -180,16 +182,16 @@ pipe = Pipeline(
 Our pipeline consists of the following sequential steps:
 
 1. `#!python "imputer"` - Impute missing values
-2. `#!python "preprocessor"` - Apply all further preprocessing steps
-3. `#!python "variance"` - Remove features with zero variance (removes all 
-   constant features)
-4. `#!python "classifier"` - Apply a classifier (to be defined later)
+1. `#!python "preprocessor"` - Apply all further preprocessing steps
+1. `#!python "variance"` - Remove features with zero variance (removes all
+    constant features)
+1. `#!python "classifier"` - Apply a classifier (to be defined later)
 
 ???+ tip
 
-    You can modify pipelines to your liking. For example you could add 
-    another feature selection step. Or what about applying a PCA and then a 
-    classifier? The possibilities are endless!
+    You can modify pipelines to your liking. For example, you could add another
+    feature selection step. Or what about applying a PCA and then a classifier? The
+    possibilities are endless!
 
 ### 4. Grid
 
@@ -224,15 +226,15 @@ grid = [
 The grid contains four different models:
 
 1. Random Forest
-2. Support Vector Machine (not discussed in this course)
-3. Logistic Regression
-4. Multi-layer Perceptron (Neural Network - not discussed in this course)
+1. Support Vector Machine (not discussed in this course)
+1. Logistic Regression
+1. Multi-layer Perceptron (Neural Network - not discussed in this course)
 
 We will evaluate all these models and each hyperparameter combination.
 
 ???+ info
 
-    Names in the grid dictionary must match the names in the pipeline 
+    Names in the grid dictionary must match the names in the pipeline
     (`#!python "classifier"`). The double underscore `#!python "__"` is used to
     indicate that the parameter belongs to the classifier in the pipeline.
 
@@ -254,27 +256,28 @@ search = GridSearchCV(
 )
 ```
 
-1. Use all available CPU cores (`#!python n_jobs=-1`). This speeds up the 
-   process significantly.
-2. We use a stratified k-fold cross-validation with 5 splits. Each fold 
-   preserves the percentage of samples for each class.
+1. Use all available CPU cores (`#!python n_jobs=-1`). This speeds up the
+    process significantly.
+1. We use a stratified k-fold cross-validation with 5 splits. Each fold
+    preserves the percentage of samples for each class.
 
 Basically, we are evaluating all models and hyperparameters using a
-(stratified) k-fold cross-validation (read more about cross-validation 
+(stratified) k-fold cross-validation (read more about cross-validation
 [here](https://scikit-learn.org/stable/modules/cross_validation.html#k-fold)).
 The `StratifiedKFold` thus replaces our simple `train_test_split()`.
 
-To evaluate the models, we are calculating two performance metrics: balanced 
-accuracy and ROC AUC (`#!python scoring=["balanced_accuracy", "roc_auc"]`).
-The best model is selected based on the balanced accuracy (`#!python 
-refit="balanced_accuracy"`) and then ==retrained on the entire dataset==!
+To evaluate the models, we are calculating two performance metrics: balanced
+accuracy and ROC AUC (`#!python scoring=["balanced_accuracy", "roc_auc"]`). The
+best model is selected based on the balanced accuracy
+(`#!python refit="balanced_accuracy"`) and then ==retrained on the entire
+dataset==!
 
 ???+ info
 
-    The grid search eliminates the need to compare models manually, it performs 
-    hyperparameter tuning, and it selects the best model for us. Lastly, we 
-    won't even have to re-train it on the entire dataset, as the grid search
-    already does that for us! :exploding_head:
+    The grid search eliminates the need to compare models manually, it performs
+    hyperparameter tuning, and it selects the best model for us. Lastly, we won't
+    even have to re-train it on the entire dataset, as the grid search already does
+    that for us! :exploding_head:
 
 ## Application
 
@@ -289,8 +292,7 @@ X, y = data.drop(columns="y"), data["y"]
 search.fit(X, y)
 
 print(
-    f"Best score: {search.best_score_}\n"
-    f"Best estimator: {search.best_params_}"
+    f"Best score: {search.best_score_}\nBest estimator: {search.best_params_}"
 )
 ```
 
@@ -312,7 +314,7 @@ search.best_estimator_.predict()
 
 That's it! You've automated the whole modelling process. :tada:
 
----
+______________________________________________________________________
 
 <div style="text-align: center">
 
@@ -323,4 +325,5 @@ That's it! You've automated the whole modelling process. :tada:
 
 </div>
 
----
+______________________________________________________________________
+
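Reviewer note on the bonus chapter diffed above: the statement that each stratified fold "preserves the percentage of samples for each class" can be illustrated without `scikit-learn`. This is a minimal sketch of the idea, not how `StratifiedKFold` is implemented (which, among other things, supports shuffling). The function `stratified_folds` and the toy target `y` are hypothetical.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits):
    """Assign each sample index to a fold, dealing every class out round-robin."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            # Spreading each class evenly keeps the class ratio in every fold.
            folds[position % n_splits].append(idx)
    return folds


# Hypothetical imbalanced target: 80 % "no", 20 % "yes".
y = ["no"] * 40 + ["yes"] * 10
folds = stratified_folds(y, n_splits=5)

for number, fold in enumerate(folds):
    print(number, Counter(y[i] for i in fold))
```

With a plain (non-stratified) split of an imbalanced target, a fold could end up with very few positive samples, which would make metrics such as balanced accuracy unstable across folds.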
diff --git a/docs/data-science/practice/data-preparation.md b/docs/data-science/practice/data-preparation.md
index aabf7a16..fadf06f5 100644
--- a/docs/data-science/practice/data-preparation.md
+++ b/docs/data-science/practice/data-preparation.md
@@ -1,13 +1,13 @@
 ## Introduction
 
 To achieve our end goal we have to carefully analyze and preprocess the data.
-We will start by exploring the data set, handling missing values, identifying 
+We will start by exploring the data set, handling missing values, identifying
 attribute types, and then proceed to preprocessing techniques.
 
 ???+ info
 
-    To build a proper machine learning model for the bank marketing data set, 
-    we need to channel all our knowledge obtained so far!
+    To build a proper machine learning model for the bank marketing data set, we
+    need to channel all our knowledge obtained so far!
 
 Create a new notebook or script.
 
@@ -35,22 +35,21 @@ data = pd.read_csv("data/bank-merged.csv")
     It's always a good idea to take a look at the data before proceeding.
 
     1. Check the shape of the data.
-    2. Display the first few rows.
+    1. Display the first few rows.
 
 ## Missing values
 
-In the data preprocessing chapter we discussed missing values. Recall that 
-in this specific data set, the missing values are a bit more 
-[hidden](../data/preprocessing.md#missing-values-in-disguise).
-They are encoded as `#!python "unknown"`. So let's replace these values with
-`#!python None`.
+In the data preprocessing chapter we discussed missing values. Recall that in
+this specific data set, the missing values are a bit more
+[hidden](../data/preprocessing.md#missing-values-in-disguise). They are encoded
+as `#!python "unknown"`. So let's replace these values with `#!python None`.
 
 ```python
 data = data.replace("unknown", None)
 ```
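+
+As a quick sanity check, the replacement can be tried on a toy frame first.
+The sketch below is illustrative only: the frame `toy` and its values are made
+up, and it uses the dict form of `replace`, which has the same effect here.
+
+```python
+import pandas as pd
+
+# hypothetical toy data, not the bank marketing data set
+toy = pd.DataFrame(
+    {"job": ["admin.", "unknown", "services"], "age": [41, 35, 52]}
+)
+toy = toy.replace({"unknown": None})
+
+# count the missing values per column
+missing = toy.isna().sum()
+print(missing)
+```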
 
-With a cleaned data set, we can now proceed to the next step - data 
-exploration. 
+With a cleaned data set, we can now proceed to the next step - data
+exploration.
 
 ## Attribute types
 
@@ -58,8 +57,8 @@ Start by checking the attribute types.
 
 ???+ question "Attribute types"
 
-    Again, look at the data set. The task is to identify which attribute types 
-    are generally present in the dataset. Answer, the following quiz question.
+    Again, look at the data set. The task is to identify which attribute types are
+    generally present in the data set. Answer the following quiz question.
 
     If you need a refresher on attribute types, check out the appropriate
     [chapter](../data/basics.md).
@@ -74,8 +73,8 @@ Which attribute types are present in the data set?
 - [ ] nominal
 - [ ] nominal, numerical
 
-We are dealing with quite a mixed data set. All three different
-types (nominal, ordinal and numerical) are present. For example:
+We are dealing with quite a mixed data set. All three different types (nominal,
+ordinal and numerical) are present. For example:
 
 - **Job** - nominal
 - **Education** - ordinal
@@ -85,48 +84,48 @@ types (nominal, ordinal and numerical) are present. For example:
 
 #### Feature description
 
-With a broad overview, let's explore the different features/attributes more 
-in-depth. Since we are dealing with a couple of features, categories were 
-built. 
+With a broad overview in place, let's explore the different features/attributes
+in more depth. Since we are dealing with quite a few features, they are grouped
+into categories.
 
 <div class="grid cards" markdown>
 
--   :old_man: __Client Demographics__
+- :old_man: __Client Demographics__
 
-    ---
-    
-    Demographic information about each client such as the education level
-    (high school, university, etc.).
+    ______________________________________________________________________
+
+    Demographic information about each client such as the education level (high
+    school, university, etc.).
 
     | Variable  | Description                                       |
-    |-----------|---------------------------------------------------|
+    | --------- | ------------------------------------------------- |
     | id        | Client identifier (we will ignore the identifier) |
     | age       | Age                                               |
     | job       | Type of occupation                                |
     | marital   | Marital status                                    |
     | education | Education level                                   |
 
--   :money_mouth_face: __Financial Status__
+- :money_mouth_face: __Financial Status__
 
-    ---
+    ______________________________________________________________________
 
     Does the client have a housing or personal loan, etc.
 
     | Variable | Description           |
-    |----------|-----------------------|
+    | -------- | --------------------- |
     | default  | Credit default status |
     | housing  | Housing loan status   |
     | loan     | Personal loan status  |
 
--   :telephone: __Campaign Information__
+- :telephone: __Campaign Information__
+
+    ______________________________________________________________________
 
-    ---
-    
-    Remember, bank clients were contacted by phone. Some were contacted 
-    multiple times over the span of multiple campaigns.
+    Remember, bank clients were contacted by phone. Some were contacted multiple
+    times over the span of multiple campaigns.
 
     | Variable    | Description                                    |
-    |-------------|------------------------------------------------|
+    | ----------- | ---------------------------------------------- |
     | contact     | Contact type                                   |
     | month       | Last contact month                             |
     | day_of_week | Last contact day                               |
@@ -135,15 +134,15 @@ built.
     | previous    | Number of contacts before this campaign        |
     | poutcome    | Outcome of previous campaign                   |
 
--   :factory: __Economic Indicators__
+- :factory: __Economic Indicators__
 
-    ---
-    
-    Some economic indicators at the time of the contact like the current
-    interest rate ([Euribor rate](https://www.euribor-rates.eu/en/)).
+    ______________________________________________________________________
+
+    Some economic indicators at the time of the contact like the current interest
+    rate ([Euribor rate](https://www.euribor-rates.eu/en/)).
 
     | Variable       | Description               |
-    |----------------|---------------------------|
+    | -------------- | ------------------------- |
     | emp.var.rate   | Employment variation rate |
     | cons.price.idx | Consumer price index      |
     | cons.conf.idx  | Consumer confidence index |
@@ -154,24 +153,21 @@ built.
 
 ???+ info
 
-    Lastly, one column remains - `#!python "y"`. This column is the target, 
-    whether a customer subscribed to a term deposit (`#!python 1`) or not
-    (`#!python 0`).
+    Lastly, one column remains - `#!python "y"`. This column is the target, whether
+    a customer subscribed to a term deposit (`#!python 1`) or not (`#!python 0`).
 
-With a better understanding of the features at hand, we can proceed to the 
-next step, assigning attribute types to the columns. Doing so, will help us 
-later to pick the appropriate preprocessing steps.
+With a better understanding of the features at hand, we can proceed to the next
+step, assigning attribute types to the columns. Doing so will help us later to
+pick the appropriate preprocessing steps.
 
 #### Assigning attribute types
 
 ???+ question "Assigning attributes"
 
-    Assign an attribute type to each column. Look at the data and go 
-    over each column/attribute and add the column name to one of the three 
-    empty lists.
+    Assign an attribute type to each column. Look at the data and go over each
+    column/attribute and add the column name to one of the three empty lists.
 
-    Disregard the unique identifier `#!python "id"` and the target 
-    `#!python "y"`.
+    Disregard the unique identifier `#!python "id"` and the target `#!python "y"`.
 
     ```python
     nominal = []
@@ -179,17 +175,17 @@ later to pick the appropriate preprocessing steps.
     numerical = []
     ```
 
-    ---    
+    ______________________________________________________________________
+
+    For example (part of the solution):
 
-    For example (part of the solution): 
+    - `#!python "age"` is a "measurable" quantity and expressed as a number, thus
+        is a numerical attribute.
 
-    - `#!python "age"` is a "measurable" quantity and expressed as a number, 
-        thus is a numerical attribute.
+    - The next attribute `#!python "default"` is clearly categorical with its
+        unique values `#!python ["no", None, "yes"]`. But since the attribute has no
+        meaningful order, it is nominal.
 
-    - The next attribute `#!python "default"` is clearly categorical with 
-        its unique values `#!python ["no", None, "yes]`. But since the 
-        attribute has no meaningful order, it is nominal.
-    
     Resulting so far in:
 
     ```python
@@ -202,17 +198,14 @@ later to pick the appropriate preprocessing steps.
 
 ???+ danger
 
-    Since the attribute assignment is crucial, we strongly urge you to solve 
-    the task. It will help your understanding of the data set and the next 
-    steps.
+    Since the attribute assignment is crucial, we strongly urge you to solve the
+    task. It will help your understanding of the data set and the next steps.
 
-    Check your solution with the answer below and correct any mistakes you've
-    made.
+    Check your solution with the answer below and correct any mistakes you've made.
 
 ??? info
 
-    The solution is as follows (column names are ordered according to 
-    `data`):
+    The solution is as follows (column names are ordered according to `data`):
 
     ```python
     nominal = [
@@ -224,13 +217,13 @@ later to pick the appropriate preprocessing steps.
         "job",
         "marital",
     ]
-    
+
     ordinal = [
         "month",
         "day_of_week",
         "education",
     ]
-    
+
     numerical = [
         "age",
         "campaign",
@@ -246,15 +239,15 @@ later to pick the appropriate preprocessing steps.
 
 ### Visualizing the data
 
-To get an even better understanding of the data, we can visualize it. For 
-convenience, we will use `pandas` built-in plotting 
-[capabilities](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html). 
+To get an even better understanding of the data, we can visualize it. For
+convenience, we will use `pandas` built-in plotting
+[capabilities](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html).
 
 ???+ tip
 
-    If you want to know more on visualizing different attribute types, visit 
-    the [Frequency Distribution](../../statistics/univariate/Frequency.md) 
-    chapter of the Statistics course.
+    If you want to know more on visualizing different attribute types, visit the
+    [Frequency Distribution](../../statistics/univariate/Frequency.md) chapter of
+    the Statistics course.
 
 For example, we can plot numerical attributes like `#!python "campaign"` as a
 box plot.
@@ -281,7 +274,9 @@ Or how about a pie chart for nominal attributes like `#!python "marital"`?
 ```python
 # first, count the occurrence of each category
 marital_count = data["marital"].value_counts()
-marital_count.plot(kind="pie", autopct="%1.0f%%", title="Marital status")  # (1)!
+marital_count.plot(
+    kind="pie", autopct="%1.0f%%", title="Marital status"
+)  # (1)!
 plt.show()
 ```
 
@@ -296,20 +291,22 @@ plt.show()
     Pick two more attributes of your choice and plot them.
 
     1. Choose a numerical attribute and plot it as a histogram.
-    2. Select a nominal or ordinal attribute and plot it as a bar chart.
+    1. Select a nominal or ordinal attribute and plot it as a bar chart.
 
     Use these `pandas` resources, if you're having trouble:
 
-    - `DataFrame.plot()` docs [:octicons-link-external-16:](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html)
-    - Chart visualization [:octicons-link-external-16:](https://pandas.pydata.org/docs/user_guide/visualization.html)
+    - `DataFrame.plot()` docs
+        [:octicons-link-external-16:](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html)
+    - Chart visualization
+        [:octicons-link-external-16:](https://pandas.pydata.org/docs/user_guide/visualization.html)
 
 ???+ info
 
-    It's crucial to visualize your data before diving into further analysis. 
-    Visualizations can help you understand the distribution, identify patterns,
-    and detect anomalies or outliers in your data. This step ensures that you
-    have a clear understanding of your data, which is essential for making 
-    informed decisions in your analysis process.
+    It's crucial to visualize your data before diving into further analysis.
+    Visualizations can help you understand the distribution, identify patterns, and
+    detect anomalies or outliers in your data. This step ensures that you have a
+    clear understanding of your data, which is essential for making informed
+    decisions in your analysis process.
 
 ## Preprocessing
 
@@ -317,46 +314,46 @@ Now that we have a better understanding of the data, we can proceed to the
 preprocessing steps. Depending on the attribute type, we will apply different
 techniques.
 
-Since we are dealing with a mixed data set, we will keep things relatively 
+Since we are dealing with a mixed data set, we will keep things relatively
 simple and plan our approach accordingly:
 
 - For `nominal` attributes, we apply one-hot encoding.
 - For `ordinal` attributes, we use one-hot encoding as well.
 - For `numerical` attributes, we follow two strategies:
     - Create bins for `age`, `campaign`, `pdays`, and `previous`.
-    - Z-Score normalization for the remaining features: `emp.var.rate`, 
-      `cons.price.idx`, `cons.conf.idx`, `euribor3m`, and `nr.employed`.
+    - Z-Score normalization for the remaining features: `emp.var.rate`,
+        `cons.price.idx`, `cons.conf.idx`, `euribor3m`, and `nr.employed`.
 
 ???+ info
-    
-    `nominal` and `ordinal` attributes are categorical and require one-hot
-    encoding to be suitable for machine learning algorithms.
+
+    `nominal` and `ordinal` attributes are categorical and require one-hot encoding
+    to be suitable for machine learning algorithms.
 
     We are creating bins for `age`, `campaign`, `pdays`, and `previous`, since
-    these features have a large number of outliers. By binning these features,
-    we can try to reduce the impact of outliers and noise in the data.
+    these features have a large number of outliers. By binning these features, we
+    can try to reduce the impact of outliers and noise in the data.
 
-    Z-Score normalization is applied to the remaining numerical features to
-    ensures that features don't have a larger impact on the model just 
-    because of their larger magnitude.
+    Z-Score normalization is applied to the remaining numerical features to ensure
+    that features don't have a larger impact on the model just because of their
+    larger magnitude.
 
 To apply these preprocessing steps, we have to look for the corresponding
 `scikit-learn` classes.
 
 <div class="grid cards" markdown>
 
--   :toolbox: __Preprocessing technique__
+- :toolbox: __Preprocessing technique__
 
-    ---
+    ______________________________________________________________________
 
     - One hot encoding
     - Binning
     - Z-Score normalization (standardization)
 
--   :package: __Corresponding `scikit-learn` class__
+- :package: __Corresponding `scikit-learn` class__
+
+    ______________________________________________________________________
 
-    ---
-    
     - `OneHotEncoder`
     - `KBinsDiscretizer`
     - `StandardScaler`
@@ -365,14 +362,14 @@ To apply these preprocessing steps, we have to look for the corresponding
 
 ???+ tip
 
-    All these techniques and classes were previously introduced in the 
+    All these techniques and classes were previously introduced in the
     [Data preprocessing chapter](../data/preprocessing.md).
 
-Just like in the Data preprocessing chapter we could apply each technique
-one at a time, e.g.:
+Just like in the Data preprocessing chapter we could apply each technique one
+at a time, e.g.:
 
 ```python
-from sklearn.preprocessing import OneHotEncoder, KBinsDiscretizer
+from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder
 
 nominal_encoder = OneHotEncoder()
 nominal_encoder.fit_transform(data[nominal])
@@ -386,9 +383,9 @@ binning.fit_transform(data[["age", "campaign", "pdays", "previous"]])
 # and so on...
 ```
 
-... the above approach itself is perfectly fine, but we can do better!
-But first, we need to get the term information leakage out of the way, a 
-common pitfall in machine learning/data science projects.
+... the above approach itself is perfectly fine, but we can do better! But
+first, we need to get the term information leakage out of the way, a common
+pitfall in machine learning/data science projects.
 
 ### Information leakage
 
@@ -396,21 +393,19 @@ To explain the term information leakage, let's look at an example.
 
 ???+ danger "Information leakage"
 
-    Assume, we want to predict the target `#!python "y"` based on the features 
-    `#!python "emp.var.rate"` and `#!python "euribor3m"`. First, we apply 
-    Z-Score normalization to these features. 
+    Assume we want to predict the target `#!python "y"` based on the features
+    `#!python "emp.var.rate"` and `#!python "euribor3m"`. First, we apply Z-Score
+    normalization to these features.
 
     ```python
     from sklearn.preprocessing import StandardScaler
 
     scaler = StandardScaler()
-    features = scaler.fit_transform(
-        data[["emp.var.rate", "euribor3m"]]
-    )
+    features = scaler.fit_transform(data[["emp.var.rate", "euribor3m"]])
     ```
-    
-    As always, we are splitting the data into training and test set to 
-    later evaluate the model.
+
+    As always, we split the data into a training and a test set to later
+    evaluate the model.
 
     ```python
     from sklearn.model_selection import train_test_split
@@ -419,26 +414,25 @@ To explain the term information leakage, let's look at an example.
         features, data["y"], test_size=0.2, random_state=42
     )
     ```
-    
-    Now, we are already dealing with information leakage. Put simply - the
-    train set `X_train` already "knows" something about the test set `X_test`. 
-    
+
+    Now, we are already dealing with information leakage. Put simply - the train
+    set `X_train` already "knows" something about the test set `X_test`.
+
     Why?
-    
-    ---
 
-    Remember the definition of Z-Score normalization - it calculates the mean
-    and standard deviation of the data set. If we calculate these values on the
-    whole data set; in our case `data` - just like we did above, `X_train` 
-    contains information about `X_test`. Thus, the test set is no longer a 
-    good representation of unseen data, hence any scores calculated with the 
-    test set are no longer a good indicator of the model's performance.
+    ______________________________________________________________________
 
-    This is a common pitfall in machine learning! To prevent information 
-    leakage, we ==have to split the data before applying any preprocessing 
-    steps==.
+    Remember the definition of Z-Score normalization - it calculates the mean and
+    standard deviation of the data set. If we calculate these values on the whole
+    data set (in our case `data`), just like we did above, `X_train` contains
+    information about `X_test`. Thus, the test set is no longer a good
+    representation of unseen data, and any scores calculated with the test set
+    are no longer a good indicator of the model's performance.
 
-With information leakage in mind, we introduce a more elegant way to apply 
+    This is a common pitfall in machine learning! To prevent information leakage,
+    we ==have to split the data before applying any preprocessing steps==.
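+
+A leak-free variant of the example can be sketched as follows. This is a
+minimal illustration on synthetic data (the arrays `X` and `y` are made up):
+the data is split first, the scaler is fitted on the training part only, and
+the test part is merely transformed with the training statistics.
+
+```python
+import numpy as np
+from sklearn.model_selection import train_test_split
+from sklearn.preprocessing import StandardScaler
+
+# synthetic stand-in for two numerical features and a binary target
+rng = np.random.default_rng(42)
+X = rng.normal(loc=5.0, scale=2.0, size=(100, 2))
+y = rng.integers(0, 2, size=100)
+
+# 1. split first
+X_train, X_test, y_train, y_test = train_test_split(
+    X, y, test_size=0.2, random_state=42
+)
+
+# 2. fit the scaler on the training set only
+scaler = StandardScaler()
+X_train_scaled = scaler.fit_transform(X_train)
+
+# 3. reuse the training mean and standard deviation for the test set
+X_test_scaled = scaler.transform(X_test)
+```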
+
+With information leakage in mind, we introduce a more elegant way to apply
 multiple preprocessing steps.
 
 ### `ColumnTransformer`
@@ -452,10 +446,11 @@ multiple preprocessing steps.
     alt="Transformers">
 </div>
 
-Since we do not want to apply each preprocessing step one at a time, we 
-simply bundle them.
+Since we do not want to apply each preprocessing step one at a time, we simply
+bundle them.
 
-The [`ColumnTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html) 
+The
+[`ColumnTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html)
 is a class in `scikit-learn` that allows us to bundle our preprocessing steps
 together. This way, we can apply all transformations in one go.
 
@@ -463,65 +458,91 @@ First, we import all necessary classes:
 
 ```python hl_lines="1"
 from sklearn.compose import ColumnTransformer
-from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder, StandardScaler
+from sklearn.preprocessing import (
+    KBinsDiscretizer,
+    OneHotEncoder,
+    StandardScaler,
+)
 ```
 
-Next, we can already initiate our transformer. We define the exact same steps 
-as we did in written form at the beginning of this section. Note that the 
+Next, we can instantiate our transformer. We define the exact same steps
+as we did in written form at the beginning of this section. Note that the
 `ColumnTransformer` takes a `#!python list` of `#!python tuple`.
 
 ```python linenums="1"
 preprocessor = ColumnTransformer(
     transformers=[
-        ("nominal", OneHotEncoder(), 
-         ["default", "housing", "loan", "contact", "poutcome", "job", "marital"]),
-        
-        ("ordinal", OneHotEncoder(), 
-         ["month", "day_of_week", "education"]),
-        
-        ("binning", KBinsDiscretizer(n_bins=5, strategy="uniform", encode="onehot"),  # (1)!
-         ["age", "campaign", "pdays", "previous"]),
-        
-        ("zscore", StandardScaler(), 
-         ["emp.var.rate", "cons.price.idx", "cons.conf.idx", "euribor3m", "nr.employed"]),
+        (
+            "nominal",
+            OneHotEncoder(),
+            [
+                "default",
+                "housing",
+                "loan",
+                "contact",
+                "poutcome",
+                "job",
+                "marital",
+            ],
+        ),
+        ("ordinal", OneHotEncoder(), ["month", "day_of_week", "education"]),
+        (
+            "binning",
+            KBinsDiscretizer(
+                n_bins=5, strategy="uniform", encode="onehot"
+            ),  # (1)!
+            ["age", "campaign", "pdays", "previous"],
+        ),
+        (
+            "zscore",
+            StandardScaler(),
+            [
+                "emp.var.rate",
+                "cons.price.idx",
+                "cons.conf.idx",
+                "euribor3m",
+                "nr.employed",
+            ],
+        ),
     ]
 )
 ```
 
-1. Conveniently, we can create categories (bins) with the `KBinsDiscretizer` 
-   and directly apply one-hot encoding with `#!python encode="oneheot"`.
+1. Conveniently, we can create categories (bins) with the `KBinsDiscretizer`
+    and directly apply one-hot encoding with `#!python encode="onehot"`.
 
 Let's break it down:
 
-- Our instance `preprocessor` has 4 steps, named `nominal`, `ordinal`, 
-  `binning`, and `zscore`.
-- Each step is defined as a `#!python tuple`, with the first element being 
-  the name of the step, the second element the preprocessing technique, and 
-  the third element being a list of columns to apply the technique to.
-- By default, all columns which are not specified in the `ColumnTransformer` 
-  will be dropped! See the [`remainder`](https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html)
-  parameter in the docs.
+- Our instance `preprocessor` has 4 steps, named `nominal`, `ordinal`,
+    `binning`, and `zscore`.
+- Each step is defined as a `#!python tuple`, with the first element being the
+    name of the step, the second element the preprocessing technique, and the
+    third element being a list of columns to apply the technique to.
+- By default, all columns which are not specified in the `ColumnTransformer`
+    will be dropped! See the
+    [`remainder`](https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html)
+    parameter in the docs.
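+
+The effect of the default `remainder="drop"` can be seen in a small sketch. The
+frame `toy` and its values are made up for illustration, and
+`sparse_threshold=0` is only set to get a dense array that is easy to inspect.
+
+```python
+import pandas as pd
+from sklearn.compose import ColumnTransformer
+from sklearn.preprocessing import OneHotEncoder, StandardScaler
+
+# hypothetical toy data, not the bank marketing data set
+toy = pd.DataFrame(
+    {
+        "id": [1, 2, 3, 4],
+        "marital": ["married", "single", "divorced", "single"],
+        "euribor3m": [4.8, 1.3, 4.9, 0.7],
+    }
+)
+
+toy_preprocessor = ColumnTransformer(
+    transformers=[
+        ("nominal", OneHotEncoder(), ["marital"]),
+        ("zscore", StandardScaler(), ["euribor3m"]),
+    ],
+    sparse_threshold=0,  # always return a dense array
+)
+transformed = toy_preprocessor.fit_transform(toy)
+
+# 3 one-hot columns + 1 scaled column; "id" is not listed and gets dropped
+print(transformed.shape)
+```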
 
-So far we only defined the necessary preprocessing steps, but didn't apply 
-them just yet (that's part of the next chapter).
+So far we only defined the necessary preprocessing steps, but didn't apply them
+just yet (that's part of the next chapter).
 
 ### Detour: Didn't we forget something?
 
-We completely neglected the missing values in the data set. Thus, we still 
-need to handle them with an imputation technique.
+We completely neglected the missing values in the data set. Thus, we still need
+to handle them with an imputation technique.
 
 ???+ tip
 
-    During the development process of a data science project, you will often 
-    find yourself jumping back and forth between different steps. This is 
-    perfectly normal and part of the process. Seldom will you follow a 
-    linear path from start to finish.
+    During the development process of a data science project, you will often find
+    yourself jumping back and forth between different steps. This is perfectly
+    normal and part of the process. Seldom will you follow a linear path from start
+    to finish.
 
 ```python
 print(data.isna().sum())
 ```
 
-If you execute the above line, you will see that we still have many missing 
+If you execute the above line, you will see that we still have many missing
 values in a couple of columns. No worries, we can easily handle them with:
 
 ```python
@@ -530,8 +551,8 @@ from sklearn.impute import SimpleImputer
 impute = SimpleImputer(strategy="most_frequent", missing_values=None)
 ```
 
-The `SimpleImputer` lets us fill in missing values with the most frequent
-value in the respective column. But why did we choose this specific strategy?
+The `SimpleImputer` lets us fill in missing values with the most frequent value
+in the respective column. But why did we choose this specific strategy?
 
 <quiz>
 Why do we plan to fill missing values with the most frequent value (the mode) and not the mean or median?
@@ -546,30 +567,33 @@ Why do we plan to fill missing values with the most frequent value (the mode) an
 ???+ info
 
     You might wonder why we didn't include the imputation step in the
-    `ColumnTransformer`. The reason is that passing the same column to more 
-    than one step leads to issues. As the `ColumnTransformer` runs in 
-    parallel and does not apply the steps sequentially.
+    `ColumnTransformer`. The reason is that passing the same column to more than
+    one step leads to issues, as the `ColumnTransformer` runs its steps in
+    parallel and does not apply them sequentially.
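+
+A small sketch illustrates the point. The frame `toy` is made up, and the mean
+strategy is only chosen because the toy column is numerical. The column `x` is
+passed to two steps, and each step receives the original column.
+
+```python
+import numpy as np
+import pandas as pd
+from sklearn.compose import ColumnTransformer
+from sklearn.impute import SimpleImputer
+from sklearn.preprocessing import StandardScaler
+
+# hypothetical toy data with one missing value
+toy = pd.DataFrame({"x": [1.0, np.nan, 3.0]})
+
+both = ColumnTransformer(
+    transformers=[
+        ("impute", SimpleImputer(strategy="mean"), ["x"]),
+        ("zscore", StandardScaler(), ["x"]),
+    ]
+)
+result = both.fit_transform(toy)
+
+# two output columns: the imputed one and the scaled one; the scaled column
+# still contains the missing value, since it never saw the imputed data
+print(result)
+```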
 
 ## Recap
 
-In this chapter, we started our practical data science project by exploring 
-the bank marketing data set further. We handled missing values and identified 
-attribute types. We then visualized the data to get a better understanding of 
-the features. During our discussion of appropriate preprocessing methods, 
-we discovered the term information leakage and how to prevent it.
-Finally, we introduced the `ColumnTransformer` to bundle preprocessing
-steps together.
+In this chapter, we started our practical data science project by exploring the
+bank marketing data set further. We handled missing values and identified
+attribute types. We then visualized the data to get a better understanding of
+the features. During our discussion of appropriate preprocessing methods, we
+discovered the term information leakage and how to prevent it. Finally, we
+introduced the `ColumnTransformer` to bundle preprocessing steps together.
 
 ### Code recap
 
-This time around, we also do a code recap. The essential findings in this 
+This time around, we also do a code recap. The essential findings in this
 chapter can be distilled to:
 
 ```python linenums="1"
 import pandas as pd
 from sklearn.compose import ColumnTransformer
-from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder, StandardScaler
 from sklearn.impute import SimpleImputer
+from sklearn.preprocessing import (
+    KBinsDiscretizer,
+    OneHotEncoder,
+    StandardScaler,
+)
 
 data = pd.read_csv("data/bank-merged.csv")
 data = data.replace("unknown", None)
@@ -578,20 +602,39 @@ impute = SimpleImputer(strategy="most_frequent", missing_values=None)
 
 preprocessor = ColumnTransformer(
     transformers=[
-        ("nominal", OneHotEncoder(), 
-         ["default", "housing", "loan", "contact", "poutcome", "job", "marital"]),
-        
-        ("ordinal", OneHotEncoder(), 
-         ["month", "day_of_week", "education"]),
-        
-        ("binning", KBinsDiscretizer(n_bins=5, strategy="uniform", encode="onehot"),
-         ["age", "campaign", "pdays", "previous"]),
-        
-        ("zscore", StandardScaler(), 
-         ["emp.var.rate", "cons.price.idx", "cons.conf.idx", "euribor3m", "nr.employed"]),
+        (
+            "nominal",
+            OneHotEncoder(),
+            [
+                "default",
+                "housing",
+                "loan",
+                "contact",
+                "poutcome",
+                "job",
+                "marital",
+            ],
+        ),
+        ("ordinal", OneHotEncoder(), ["month", "day_of_week", "education"]),
+        (
+            "binning",
+            KBinsDiscretizer(n_bins=5, strategy="uniform", encode="onehot"),
+            ["age", "campaign", "pdays", "previous"],
+        ),
+        (
+            "zscore",
+            StandardScaler(),
+            [
+                "emp.var.rate",
+                "cons.price.idx",
+                "cons.conf.idx",
+                "euribor3m",
+                "nr.employed",
+            ],
+        ),
     ]
 )
 ```
 
-In the next chapter we will apply the preprocessing steps to a train and 
-test split. Subsequently, we fit the first model.
+In the next chapter we will apply the preprocessing steps to a train and test
+split. Subsequently, we fit the first model.
diff --git a/docs/data-science/practice/end-to-end.md b/docs/data-science/practice/end-to-end.md
index d2d1bf2d..98c17bb4 100644
--- a/docs/data-science/practice/end-to-end.md
+++ b/docs/data-science/practice/end-to-end.md
@@ -1,13 +1,13 @@
 ## Introduction
 
-We distill all relevant code blocks from the previous two chapters into
-one cohesive script/notebook. This file will be an end-to-end example to fit
-a machine learning model on the bank marketing data set. Lastly, we will
-save the model to disk.
+We distill all relevant code blocks from the previous two chapters into one
+cohesive script/notebook. This file will be an end-to-end example to fit a
+machine learning model on the bank marketing data set. Lastly, we will save the
+model to disk.
 
 ???+ tip
 
-    The script/notebook we will create, can serve as a reference point for your 
+    The script/notebook we will create can serve as a reference point for your
     further data science projects.
 
 So start by creating yet another script/notebook.
@@ -27,13 +27,13 @@ So start by creating yet another script/notebook.
 In the previous chapters, we:
 
 1. Loaded the data
-2. Defined techniques to impute (`SimpleImputer`) and preprocess the data
-   (`ColumnTransformer`)
-3. Split the data into train and test sets
-4. Applied imputation and preprocessing techniques to the data
-5. Evaluated different model types and concluded that a 
-   `RandomForestClassifier` is the best model (we found) for this task
-6. Fit and evaluated the random forest
+1. Defined techniques to impute (`SimpleImputer`) and preprocess the data
+    (`ColumnTransformer`)
+1. Split the data into train and test sets
+1. Applied imputation and preprocessing techniques to the data
+1. Evaluated different model types and concluded that a
+    `RandomForestClassifier` is the best model (we found) for this task
+1. Fit and evaluated the random forest
 
 Here are the bullet points distilled in one code block:
 
@@ -58,17 +58,36 @@ impute = SimpleImputer(strategy="most_frequent", missing_values=None)
 
 preprocessor = ColumnTransformer(
     transformers=[
-        ("nominal", OneHotEncoder(),
-         ["default", "housing", "loan", "contact", "poutcome", "job", "marital"]),
-
-        ("ordinal", OneHotEncoder(),
-         ["month", "day_of_week", "education"]),
-
-        ("binning", KBinsDiscretizer(n_bins=5, strategy="uniform", encode="onehot"),
-         ["age", "campaign", "pdays", "previous"]),
-
-        ("zscore", StandardScaler(),
-         ["emp.var.rate", "cons.price.idx", "cons.conf.idx", "euribor3m", "nr.employed"]),
+        (
+            "nominal",
+            OneHotEncoder(),
+            [
+                "default",
+                "housing",
+                "loan",
+                "contact",
+                "poutcome",
+                "job",
+                "marital",
+            ],
+        ),
+        ("ordinal", OneHotEncoder(), ["month", "day_of_week", "education"]),
+        (
+            "binning",
+            KBinsDiscretizer(n_bins=5, strategy="uniform", encode="onehot"),
+            ["age", "campaign", "pdays", "previous"],
+        ),
+        (
+            "zscore",
+            StandardScaler(),
+            [
+                "emp.var.rate",
+                "cons.price.idx",
+                "cons.conf.idx",
+                "euribor3m",
+                "nr.employed",
+            ],
+        ),
     ]
 )
 
@@ -105,7 +124,7 @@ forest = RandomForestClassifier(
     max_depth=10,
     min_samples_leaf=10,
     random_state=42,
-    class_weight="balanced"
+    class_weight="balanced",
 )
 forest.fit(X_train, y_train)
 
@@ -121,17 +140,16 @@ Balanced accuracy: 0.7445
 
 ???+ question "Copy and execute the block"
 
-    Since the code block is nothing new, simply copy and execute it.
-    If everything went smoothly, you should see the balanced accuracy score 
-    printed.
+    Since the code block is nothing new, simply copy and execute it. If everything
+    went smoothly, you should see the balanced accuracy score printed.
 
 ## Re-fit on whole data set
 
-Previously, we split our data into train and test sets. Using the test set 
-we were able to estimate the performance of our model. That's the whole 
-purpose of the test set.
+Previously, we split our data into train and test sets. Using the test set we
+were able to estimate the performance of our model. That's the whole purpose of
+the test set.
 
-Now, our goal is to save the trained model for future use. Therefore, in 
+Now, our goal is to save the trained model for future use. Therefore, in
 practice, we want to leverage the power of the whole data set. Thus, we re-fit
 the model on the whole data set to make use of all available data.
 
@@ -146,8 +164,8 @@ y = encoder.transform(y)
 ```
 
 To preprocess the whole data set, we can reuse the `impute` and `preprocessor`
-objects. We only need to transform the data and encode the target. Lastly,
-we fit the model on the whole data set. It's as simple as:
+objects. We only need to transform the data and encode the target. Lastly, we
+fit the model on the whole data set. It's as simple as:
 
 ```python
 forest.fit(X, y)
@@ -155,17 +173,17 @@ forest.fit(X, y)
 
 ???+ info
 
-    Note, we can simply call `fit()` again, this will "overwrite" the previous 
+    Note, we can simply call `fit()` again; this will "overwrite" the previous
     model and uses the whole data set to fit the model once-again.
 
-The `forest` is now fitted on the whole data set. That's it! We have our final 
+The `forest` is now fitted on the whole data set. That's it! We have our final
 model which we will save to disk. :party_popper:
 
 ## Model persistence
 
-To save the model to disk, we can use 
-[`pickle`](https://docs.python.org/3.12/library/pickle.html). It's a part of 
-base :fontawesome-brands-python: Python. With `pickle`, you can save any Python 
+To save the model to disk, we can use
+[`pickle`](https://docs.python.org/3.12/library/pickle.html). It's a part of
+base :fontawesome-brands-python: Python. With `pickle`, you can save any Python
 object and load it back later.
 
 <div style="text-align: center; border-radius: 15px;">
@@ -194,11 +212,11 @@ with open("list.pkl", "wb") as file:
 
 Let's break down the code block:
 
-1. We open a new file named `list.pkl`; `.pkl` is just a common extension 
-   for `pickle` files.
-2. The file is opened in write-binary mode (`"wb"`) - as pickle files are 
-   binary files.
-3. We use `pickle.dump()` to save the object `simple_list` to the file.
+1. We open a new file named `list.pkl`; `.pkl` is just a common extension for
+    `pickle` files.
+1. The file is opened in write-binary mode (`"wb"`) - as pickle files are
+    binary files.
+1. We use `pickle.dump()` to save the object `simple_list` to the file.
 
 ???+ info
 
@@ -207,7 +225,7 @@ Let's break down the code block:
 ### Save the model
 
 Let's extend this knowledge to save our model. Unfortunately, it's not just a
-matter of saving the `forest` object. First, we look at the steps we need to 
+matter of saving the `forest` object. First, we look at the steps we need to
 take to make a prediction for a new client:
 
 <div style="text-align: center;">
@@ -230,16 +248,16 @@ To get our prediction process working, we need to save all objects involved:
 
 ???+ warning "Critical: Save ALL preprocessing objects!"
 
-    **You must save every single object** used in the prediction pipeline, not
-    just the model!
-    
+    **You must save every single object** used in the prediction pipeline, not just
+    the model!
+
     Missing even one object will break your predictions:
-    
+
     - Missing `impute` → Cannot handle new missing values
     - Missing `preprocessor` → Cannot transform features correctly
     - Missing `encoder` → Cannot convert predictions back to original labels
     - Missing `forest` → Cannot make predictions
-    
+
     **The model is useless without its preprocessing pipeline!** :warning:
 
 We can save all these objects in one file using a simple `#!python dict`:
@@ -256,14 +274,14 @@ with open("bank-model.pkl", "wb") as file:
     pickle.dump(model, file)
 ```
 
-Bundling all objects in a dictionary ensures you never accidentally 
-forget a component. When you load `bank-model.pkl`, you have **everything** 
-needed for predictions in one place.
+Bundling all objects in a dictionary ensures you never accidentally forget a
+component. When you load `bank-model.pkl`, you have **everything** needed for
+predictions in one place.
 
 ???+ question "Load the model"
 
     Create a new script or notebook which we will use to test the saved model.
-    
+
     Use the following code block to load the `model` `#!python dict`.
 
     ```python
@@ -277,14 +295,13 @@ needed for predictions in one place.
 
 ???+ danger
 
-    Do not download and load `pickle` files from the internet, unless you 
-    trust the source. Since, `pickle` can execute arbitrary code, it can be 
-    a security risk.
+    Do not download and load `pickle` files from the internet, unless you trust the
+    source. Since `pickle` can execute arbitrary code, it can be a security risk.
 
 ## Predictions
 
-Let's run the prediction process. Assume the bank contacted another client 
-with following attributes:
+Let's run the prediction process. Assume the bank contacted another client with
+the following attributes:
 
 ```python
 import pandas as pd
@@ -311,7 +328,8 @@ client = pd.DataFrame(
         "job": "retired",
         "marital": "divorced",
         "education": "professional.course",
-    }, index=[0]
+    },
+    index=[0],
 )
 ```
 
@@ -322,41 +340,41 @@ Does the client subscribe to a term deposit? :thinking:
     Predict if the client will subscribe to a term deposit.
 
     1. Use the above code snippet to create a new observation `client`.
-    2. Use all objects in the dictionary `model` to make a prediction.
-    
-    Hint: To make a prediction, simply implement the above prediction process 
+    1. Use all objects in the dictionary `model` to make a prediction.
+
+    Hint: To make a prediction, simply implement the above prediction process
     illustrated as a graph.
 
-Try to solve the task on your own. For completeness, we provide one possible 
+Try to solve the task on your own. For completeness, we provide one possible
 solution.
 
 ??? info
-    
+
     ```python
     def predict(model, client):
         # preprocess the client data
         X = model["imputer"].transform(client)
         X = pd.DataFrame(X, columns=client.columns)
         X = model["preprocessor"].transform(X)
-    
+
         # make a prediction
         prediction = model["forest"].predict(X)
         # inverse transform (0, 1) to ("no", "yes")
         prediction = model["target-encoder"].inverse_transform(prediction)
-    
+
         return prediction
     ```
 
 ## Conclusion
 
-Across three chapters, we successfully reached our end goal: To build a 
-machine learning model on the bank marketing data set. We ended up with a 
-random forest model with a balanced accuracy of 74.45%.
+Across three chapters, we successfully reached our end goal: to build a machine
+learning model on the bank marketing data set. We ended up with a random forest
+model with a balanced accuracy of 74.45%.
 
 The saved model can be deployed in a production environment. The prediction
 process is straightforward and can be easily applied on new clients.
 
----
+______________________________________________________________________
 
 <div style="text-align: center;">
     <h3>
@@ -365,22 +383,22 @@ process is straightforward and can be easily applied on new clients.
     </h3>
 </div>
 
----
+______________________________________________________________________
 
 ## Outlook
 
-There are many more avenues to explore in the data science/machine learning 
+There are many more avenues to explore in the data science/machine learning
 landscape:
 
-
 ### :rocket: Model deployment
 
-Learn how to deploy a model in a production environment. This can be done
-with a REST API, a web application or a mobile application (among others).
+Learn how to deploy a model in a production environment. This can be done with
+a REST API, a web application or a mobile application (among others).
 
 Start with:
 
-- `fastapi` for building APIs [:octicons-link-external-16:](https://fastapi.tiangolo.com/)
+- `fastapi` for building APIs
+    [:octicons-link-external-16:](https://fastapi.tiangolo.com/)
 
 which is a great way to serve your model.
 
@@ -389,29 +407,32 @@ which is a great way to serve your model.
 The Open Neural Network Exchange (ONNX) format provides an interesting
 alternative to `pickle`. ONNX allows you to convert your trained models into a
 standardized format that can be run efficiently across different platforms and
-programming languages. 
+programming languages.
 
-For example, `onnx` allows you to build the model in 
-:fontawesome-brands-python: Python and deploy it with :fontawesome-brands-js: 
+For example, `onnx` allows you to build the model in
+:fontawesome-brands-python: Python and deploy it with :fontawesome-brands-js:
 JavaScript.
 
 Start with:
 
-- `onnx` documentation for Python [:octicons-link-external-16:](https://onnx.ai/onnx/intro/python.html)
-- `onnx` with `scikit-learn` [:octicons-link-external-16:](https://scikit-learn.org/stable/model_persistence.html#onnx)
+- `onnx` documentation for Python
+    [:octicons-link-external-16:](https://onnx.ai/onnx/intro/python.html)
+- `onnx` with `scikit-learn`
+    [:octicons-link-external-16:](https://scikit-learn.org/stable/model_persistence.html#onnx)
 
 ### :toolbox: Expand your model toolkit
 
-We covered a selection of different model types, yet there are many more to 
-explore. `scikit-learn` offers many more models for classification, regression, 
+We covered a selection of different model types, yet there are many more to
+explore. `scikit-learn` offers many more models for classification, regression,
 clustering or dimensionality reduction.
 
 Since you're already familiar with `scikit-learn`, applying these models is
-straightforward. 
+straightforward.
 
 Start with:
 
-- `scikit-learn` documentation. [:octicons-link-external-16:](https://scikit-learn.org/stable/index.html)
+- `scikit-learn` documentation
+    [:octicons-link-external-16:](https://scikit-learn.org/stable/index.html)
 
 ### :wrench: Advanced pipeline techniques
 
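Note on the model persistence chapter in the diff above: the save-and-load round trip it describes can be sketched with the standard library alone. This is a minimal sketch, not the chapter's own code. The bundle keys mirror the ones used in the chapter's `predict()` solution (`imputer`, `preprocessor`, `forest`, `target-encoder`); `REQUIRED_KEYS`, `save_bundle`, `load_bundle`, and the placeholder string values are assumptions made purely for illustration. In a real project the values would be the fitted `scikit-learn` objects.

```python
import pickle
import tempfile
from pathlib import Path

# Keys the prediction process relies on (taken from the chapter's predict()).
REQUIRED_KEYS = {"imputer", "preprocessor", "forest", "target-encoder"}


def check_bundle(bundle):
    """Raise a KeyError if any component of the pipeline is missing."""
    missing = REQUIRED_KEYS - set(bundle)
    if missing:
        raise KeyError(f"bundle is missing: {sorted(missing)}")


def save_bundle(bundle, path):
    """Pickle the whole bundle, refusing to save an incomplete one."""
    check_bundle(bundle)
    with open(path, "wb") as file:
        pickle.dump(bundle, file)


def load_bundle(path):
    """Load a bundle from a file you created yourself or fully trust."""
    with open(path, "rb") as file:
        bundle = pickle.load(file)
    check_bundle(bundle)
    return bundle


# Hypothetical placeholders standing in for the fitted objects.
bundle = {
    "imputer": "fitted SimpleImputer",
    "preprocessor": "fitted ColumnTransformer",
    "forest": "fitted RandomForestClassifier",
    "target-encoder": "fitted LabelEncoder",
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bank-model.pkl"
    save_bundle(bundle, path)
    restored = load_bundle(path)

print(sorted(restored))
```

Checking the keys on both save and load is one way to catch a forgotten component early, before a prediction fails halfway through the process.
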
diff --git a/docs/data-science/practice/index.md b/docs/data-science/practice/index.md
index 2afe4fae..9a838950 100644
--- a/docs/data-science/practice/index.md
+++ b/docs/data-science/practice/index.md
@@ -2,10 +2,10 @@
 
 ## Introduction
 
-In this course block and its subsequent chapters we will demonstrate how to 
-build a machine learning model in practice. In the end, you will have a 
-"ready-to-go" model that can predict whether a bank customer will subscribe 
-to a term deposit.
+In this course block and its subsequent chapters we will demonstrate how to
+build a machine learning model in practice. In the end, you will have a
+"ready-to-go" model that can predict whether a bank customer will subscribe to
+a term deposit.
 
 Along the way we will explore useful functionalities of `scikit-learn`, common
 pitfalls in data science projects, how to properly save a model, and conclude
@@ -13,32 +13,31 @@ with a bonus section on pipelines to automate the entire modelling process.
 
 Let's get started! :rocket:
 
----
+______________________________________________________________________
 
 Remember the bank marketing data set that we used to explore in the Data
 Preparation & Preprocessing portion and then completely abandoned in the last
 couple of chapters? Well, it's time to bring it back!
 
 ???+ info
-    
+
     The bank marketing data was adapted from:
 
-    ^^S. Moro, P. Cortez and P. Rita (2014). *A Data-Driven Approach to 
-    Predict the Success of Bank Telemarketing*[^1]^^
-    
-    [^1]:
-        Decision Support Systems, Volume 62, June 2014, Pages 22-31:
-        [https://doi.org/10.1016/j.dss.2014.03.001](https://doi.org/10.1016/j.dss.2014.03.001)
-    
-    The publicly available dataset is from a Portuguese retail bank 
-    and houses information on direct marketing campaigns (phone calls). Bank 
-    customers were contacted and asked to subscribe to a term deposit.
+    ^^S. Moro, P. Cortez and P. Rita (2014). *A Data-Driven Approach to Predict the
+    Success of Bank Telemarketing*[^1]^^
+
+    [^1]: Decision Support Systems, Volume 62, June 2014, Pages 22-31:
+    [https://doi.org/10.1016/j.dss.2014.03.001](https://doi.org/10.1016/j.dss.2014.03.001)
+
+    The publicly available dataset is from a Portuguese retail bank and houses
+    information on direct marketing campaigns (phone calls). Bank customers were
+    contacted and asked to subscribe to a term deposit.
 
 ## Prerequisites
 
 ### 0. :trophy: What's our goal?
 
-First, let's define the end goal: 
+First, let's define the end goal:
 
 <div style="text-align: center; margin-top: 1em;">
     <p>
@@ -49,16 +48,15 @@ First, let's define the end goal:
 
 ???+ tip
 
-    Put simply, a term deposit is a type of bank account where you agree to
-    lock away your money for a fixed period of time (the "term") in exchange 
-    for a guaranteed interest rate that's typically higher than a regular 
-    savings account.
+    Put simply, a term deposit is a type of bank account where you agree to lock
+    away your money for a fixed period of time (the "term") in exchange for a
+    guaranteed interest rate that's typically higher than a regular savings
+    account.
 
-Using information such as clients' demographic details, economic 
-indicators, and marketing campaign data, we aim to solve this binary 
-classification task.
+Using information such as clients' demographic details, economic indicators,
+and marketing campaign data, we aim to solve this binary classification task.
 
----
+______________________________________________________________________
 
 Before we dive in, you have to set up the project which will be used throughout
 the remainder of this course.
@@ -76,8 +74,8 @@ Start with creating the following project structure:
 
 ???+ danger
 
-    Since we want to make sure that everyone uses the same initial data set,
-    we urge you to re-download it and place it within your `data/` folder.
+    Since we want to make sure that everyone uses the same initial data set, we
+    urge you to re-download it and place it within your `data/` folder.
 
 
     <div class="center-button" markdown>
diff --git a/docs/data-science/practice/modelling.md b/docs/data-science/practice/modelling.md
index 3920fd66..38526b54 100644
--- a/docs/data-science/practice/modelling.md
+++ b/docs/data-science/practice/modelling.md
@@ -14,13 +14,13 @@ Create a new notebook or script. Your project should now look like this:
 ├── ...
 ```
 
-Copy the code block from the previous recap section to get started. 
+Copy the code block from the previous recap section to get started.
 [:octicons-link-external-16:](data-preparation.md#code-recap)
 
 ## Apply preprocessing
 
-Now it is time to actually apply all preprocessing steps. To prevent 
-information leakage, we will split the data into training and test sets first. 
+Now it is time to actually apply all preprocessing steps. To prevent
+information leakage, we will split the data into training and test sets first.
 
 ```python
 from sklearn.model_selection import train_test_split
@@ -36,15 +36,15 @@ X_train, X_test, y_train, y_test = train_test_split(
 
 ???+ tip
 
-    By default `train_test_split()` shuffles the data before splitting which
-    is a good practice. It helps to avoid any bias that might be present in the
-    order of the data.
+    By default, `train_test_split()` shuffles the data before splitting, which is
+    good practice. It helps to avoid any bias that might be present in the order of
+    the data.
 
-    However, if you are ever working with data that is dependent on the order
-    (time series data), you should not shuffle the data. In that case, set 
+    However, if you are ever working with data that is dependent on the order (time
+    series data), you should not shuffle the data. In that case, set
     `#!python shuffle=False`.
 
-Now apply the preprocessing steps (from the previous chapter) to the training 
+Now apply the preprocessing steps (from the previous chapter) to the training
 and test sets:
 
 ```python
@@ -65,15 +65,14 @@ X_test = preprocessor.transform(X_test)
 
 ???+ info
 
-    The `impute.transform()` method returns an array.
-    Since the `preprocessor` requires the column names of our data, we need to 
-    convert the array back to a `DataFrame`. Else the `preprocessor` will not
-    work!
+    The `impute.transform()` method returns an array. Since the `preprocessor`
+    requires the column names of our data, we need to convert the array back to a
+    `DataFrame`. Otherwise, the `preprocessor` will not work!
 
----
+______________________________________________________________________
 
 > The general rule is to never call `fit` on the test data.
-> 
+>
 > [`scikit-learn`: Common pitfalls and recommended practices](https://scikit-learn.org/stable/common_pitfalls.html#data-leakage)
 
 ???+ info
@@ -81,20 +80,20 @@ X_test = preprocessor.transform(X_test)
     For example, when `impute.fit(X_train)` is called, the `SimpleImputer`
     calculates the mode solely from the training data - `X_train`. When
     `impute.transform(X_test)` is called, the `SimpleImputer` uses the mode
-    calculated from `X_train` to fill in missing values in `X_test`. So we 
-    never use the test set to calculate the mode.
-    
-    The same principles apply to the `preprocessor` and thus we can prevent 
+    calculated from `X_train` to fill in missing values in `X_test`. So we never
+    use the test set to calculate the mode.
+
+    The same principles apply to the `preprocessor` and thus we can prevent
     information leakage.
 
----
+______________________________________________________________________
 
 ## Train a model
 
 For starters, we will train a simple decision tree. First, we need to encode
-our target `#!python "y"` as we are still dealing with `#!python str` 
-labels (`#!python "yes"` or `#!python "no"`). For this purpose, we can use
-the [`LabelEncoder`](../data/preprocessing.md#label-encoding).
+our target `#!python "y"` as we are still dealing with `#!python str` labels
+(`#!python "yes"` or `#!python "no"`). For this purpose, we can use the
+[`LabelEncoder`](../data/preprocessing.md#label-encoding).
 
 ```python
 from sklearn.preprocessing import LabelEncoder
@@ -105,9 +104,9 @@ y_train = encoder.transform(y_train)
 y_test = encoder.transform(y_test)
 ```
 
-Now, we fit the first model. We set the parameters `max_depth` and 
-`min_samples_leaf` to [prune](../algorithms/supervised/tree-based/cart.md#to-fix)
-the tree.
+Now, we fit the first model. We set the parameters `max_depth` and
+`min_samples_leaf` to
+[prune](../algorithms/supervised/tree-based/cart.md#to-fix) the tree.
 
 ```python
 from sklearn.tree import DecisionTreeClassifier
@@ -124,13 +123,13 @@ print(f"Accuracy: {round(score, 2)}")
 Accuracy: 0.89
 ```
 
-89 % accuracy seems like a perfect start! But don't get too excited yet, we 
+89% accuracy seems like a perfect start! But don't get too excited yet: we
 overlooked a small yet crucial detail.
 
 ### Detour: Class imbalance
 
-In classification, the accuracy is a good metric to evaluate the performance 
-of a model. However, it is not always the most appropriate one. Here's why:
+In classification, the accuracy is a good metric to evaluate the performance of
+a model. However, it is not always the most appropriate one. Here's why:
 
 ```python
 print(y.value_counts(normalize=True))
@@ -144,12 +143,12 @@ yes    0.10998
 
 The target variable `#!python "y"` is imbalanced. The class `#!python "no"`
 occurs in 89% of the cases, while `#!python "yes"` accounts for roughly 11%.
-This means that a model that constantly predicts `#!python "no"` for every 
+This means that a model that constantly predicts `#!python "no"` for every
 observation would achieve an accuracy of 89%.
 
-Thus, our decision tree is not as good as it seems, and we need to 
-consider other metrics to evaluate its performance. For our task, we pick the 
-balanced accuracy score.
+Thus, our decision tree is not as good as it seems, and we need to consider
+other metrics to evaluate its performance. For our task, we pick the balanced
+accuracy score.
 
 #### Confusion matrix
 
@@ -158,7 +157,7 @@ confusion matrix. Let's calculate it and explain it step by step.
 
 ```python
 import matplotlib.pyplot as plt
-from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
+from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
 
 y_pred = tree.predict(X_test)
 # calculate the matrix
@@ -178,36 +177,34 @@ plt.show()
 
 ???+ info
 
-    The labels in red were added by the author and their creation is not 
-    part of the code block. They are simply used to facilitate the 
-    following explanation.
+    The labels in red were added by the author and their creation is not part of
+    the code block. They are simply used to facilitate the following explanation.
 
 Put simply, the confusion matrix compares the actual values with the predicted
 values (of our test set). The matrix is divided into four quadrants:
 
-- **True Positives (TP)**: The model correctly predicted the positive class 
-  (yes - client subscribed to a term deposit). We have `#!python 20` true 
-  positive cases.
-- **True Negatives (TN)**: The model correctly predicted the negative class 
-  (no - client did not subscribe to a term deposit). In our instance, 
-  `#!python 683` true negative cases.
+- **True Positives (TP)**: The model correctly predicted the positive class
+    (yes - client subscribed to a term deposit). We have `#!python 20` true
+    positive cases.
+- **True Negatives (TN)**: The model correctly predicted the negative class
+    (no - client did not subscribe to a term deposit). We have `#!python 683`
+    true negative cases.
 
-With the first diagonal covered, we move on to the instances that were 
+With the first diagonal covered, we move on to the instances that were
 incorrectly predicted:
 
-- **False Positives (FP)**: The model predicted the positive class, but the 
-  actual class was negative. `#!python 16` to be specific.
-- **False Negatives (FN)**: The model predicted the negative class, but the 
-  actual class was positive. `#!python 67` to be exact.
+- **False Positives (FP)**: The model predicted the positive class, but the
+    actual class was negative. `#!python 16` to be specific.
+- **False Negatives (FN)**: The model predicted the negative class, but the
+    actual class was positive. `#!python 67` to be exact.
 
 ???+ tip
 
-    A perfect model would have 0 false positives and 0 false negatives.
-    Hence, we want to minimize the false positives and false negatives.
+    A perfect model would have 0 false positives and 0 false negatives. Hence, we
+    want to minimize the false positives and false negatives.
 
-    Generally, the confusion matrix is a great tool to understand the
-    performance of a model. It is especially useful when dealing with
-    imbalanced classes.
+    Generally, the confusion matrix is a great tool to understand the performance
+    of a model. It is especially useful when dealing with imbalanced classes.
 
 Regarding our first model, we can simply conclude that there is still a lot of
 room for improvement. With an understanding of the confusion matrix, we can
@@ -223,9 +220,9 @@ The balanced accuracy score is defined as:
     \text{Balanced accuracy} = \frac{1}{2} \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right)
     \]
 
-> In the binary case, balanced accuracy is equal to the arithmetic mean of 
+> In the binary case, balanced accuracy is equal to the arithmetic mean of
 > sensitivity (true positive rate) and specificity (true negative rate)
-> 
+>
 > [`scikit-learn`: Balanced accuracy score](https://scikit-learn.org/stable/modules/model_evaluation.html#balanced-accuracy-score)
 
 The balanced accuracy score ranges from 0 to 1. A score of 1 indicates a
@@ -236,8 +233,8 @@ perfect model.
     Calculate the balanced accuracy score for the decision tree model.
 
     Use the results from the above confusion matrix and the formula for the
-    balanced accuracy score. You can perform the calculation on a piece of 
-    paper or use simple arithmetic in Python.
+    balanced accuracy score. You can perform the calculation on a piece of paper or
+    use simple arithmetic in Python.
 
 Let's compare your result with the one calculated by `scikit-learn`.
 
@@ -256,25 +253,24 @@ Hopefully, your result matches the one calculated by `scikit-learn`.
 
 Compared to the accuracy of 89%, the balanced accuracy score of 60% gives a
 more realistic view of the model's performance. In turn, this means we have to
-improve our model. 
+improve our model.
 
 ???+ tip
 
-    If you want to know more about different metrics and scoring, check out 
-    this excellent guide. It not only covers classification metrics, but also 
-    multiple ways to score regression models (apart from the \(R^2\)).
+    If you want to know more about different metrics and scoring, check out this
+    excellent guide. It not only covers classification metrics, but also multiple
+    ways to score regression models (apart from the \(R^2\)).
 
-    
     [`scikit-learn`: Metrics and scoring: quantifying the quality of predictions](https://scikit-learn.org/stable/modules/model_evaluation.html)
 
 ## Detour: Reproducibility
 
-Since, we are already on detours, let's take another one. Up until now, we 
-have always set the `random_state` parameter (if available). As we have covered
-multiple times, this ensures the reproducibility of our results. We set it 
-when splitting the data, when initializing a model, etc.
+Since we are already on detours, let's take another one. Up until now, we have
+always set the `random_state` parameter (if available). As we have covered
+multiple times, this ensures the reproducibility of our results. We set it when
+splitting the data, when initializing a model, etc.
 
-But what happens if you forget to set the `random_state` parameter? To 
+But what happens if you forget to set the `random_state` parameter? To
 demonstrate the outcome, we generate a data set. In a loop we split the data,
 train a tree and calculate the balanced accuracy score. We repeat this process
 10 times:
@@ -327,28 +323,30 @@ As you can see the model's performance varies greatly!
 ???+ danger
 
     The code block illustrates the importance of the `random_state` parameter.
-    Without it, you won't be able to reproduce your own results and others 
-    won't be able to reproduce your results either.
+    Without it, you won't be able to reproduce your own results and others won't be
+    able to reproduce your results either.
 
-    Specifically, in a science setting and real-world applications, 
-    reproducibility is crucial to validate findings and conclusions. So 
-    ensure reproducibility!
+    Especially in scientific settings and real-world applications, reproducibility
+    is crucial to validate findings and conclusions. So ensure reproducibility!
 
 ## More modelling
 
-Let's get back on track and try out more models. We will compare their 
+Let's get back on track and try out more models. We will compare their
 performance with the balanced accuracy score.
 
 ### Random forest
 
-Naturally, since we started with a CART (decision tree), we try a random 
+Naturally, since we started with a CART (decision tree), we try a random
 forest.
 
 ```python
 from sklearn.ensemble import RandomForestClassifier
 
 forest = RandomForestClassifier(
-    n_estimators=100, max_depth=15, min_samples_leaf=10, random_state=42  # (1)!
+    n_estimators=100,
+    max_depth=15,
+    min_samples_leaf=10,
+    random_state=42,  # (1)!
 )
 forest.fit(X_train, y_train)
 
@@ -359,7 +357,7 @@ print(f"Forest balanced accuracy: {round(score_forest, 4)}")
 ```
 
 1. We adopt the values for `max_depth` and `min_samples_leaf` from the decision
-   tree.
+    tree.
 
 ```title=">>> Output"
 Forest balanced accuracy: 0.5927
@@ -370,7 +368,7 @@ has a balanced accuracy of 59.27%. Somehow, the performance got even worse!
 
 #### `class_weight` parameter
 
-We can try to improve the performance by setting the `class_weight` parameter 
+We can try to improve the performance by setting the `class_weight` parameter
 to `#!python balanced`. This takes the class imbalance into consideration.
 
 ```python hl_lines="6"
@@ -401,18 +399,19 @@ Now we were able to improve the performance significantly, namely to 73.37%.
 
     What about a logistic regression model? How does it perform?
 
-    1. Initialize and fit a `LogisticRegression` model with `#!python 
-        class_weight="balanced"`. Don't forget to set the `random_state`.
-    2. Calculate the balanced accuracy score for the test set.
-    3. Compare the results to the decision tree and random forest.
+    1. Initialize and fit a `LogisticRegression` model with
+        `#!python class_weight="balanced"`. Don't forget to set the
+        `random_state`.
+    1. Calculate the balanced accuracy score for the test set.
+    1. Compare the results to the decision tree and random forest.
 
 ??? info
 
-    Depending on the parameter settings, the logistic regression model 
-    achieves similar performance to the random forest. 
+    Depending on the parameter settings, the logistic regression model achieves
+    similar performance to the random forest.
 
-As you can see, with a preprocessed data set, we can now easily compare 
-different models. 
+As you can see, with a preprocessed data set, we can now easily compare
+different models.
 
 These are our results so far:
 
@@ -422,14 +421,14 @@ These are our results so far:
 
 ## Hyperparameter tuning
 
-So far, we've used arbitrary values for our model parameters 
-(`#!python max_depth=15, min_samples_leaf=10`, etc.). However, these
-might not be optimal for our specific problem. Therefore, we apply 
-hyperparameter tuning. Hyperparameter tuning is the process of finding the 
-best combination of model parameters to maximize performance.
+So far, we've used arbitrary values for our model parameters
+(`#!python max_depth=15, min_samples_leaf=10`, etc.). However, these might not
+be optimal for our specific problem. Therefore, we apply hyperparameter tuning.
+Hyperparameter tuning is the process of finding the best combination of model
+parameters to maximize performance.
 
-For starters, we will perform a manual hyperparameter tuning for the 
-maximum depth (`max_depth`) parameter. We will test the values
+For starters, we will perform a manual hyperparameter tuning for the maximum
+depth (`max_depth`) parameter. We will test the values
 `#!python [5, 10, 15, 20, 25]`.
 
 ```python hl_lines="1 6"
@@ -441,7 +440,7 @@ for n in max_depth:
         max_depth=n,
         min_samples_leaf=10,
         random_state=42,
-        class_weight="balanced"
+        class_weight="balanced",
     )
     forest.fit(X_train, y_train)
     y_pred = forest.predict(X_test)
@@ -457,45 +456,46 @@ max_depth=20: 0.733
 max_depth=25: 0.733
 ```
 
-The best performance is achieved with a `max_depth` of `#!python 10`. So the 
-initial value of `#!python 15` was not optimal. Next, we could try to 
-optimize the number of trees (`n_estimators`), the minimum number of samples
-required to be at a leaf node (`min_samples_leaf`), etc. You get the point ...
+The best performance is achieved with a `max_depth` of `#!python 10`. So the
+initial value of `#!python 15` was not optimal. Next, we could try to optimize
+the number of trees (`n_estimators`), the minimum number of samples required to
+be at a leaf node (`min_samples_leaf`), etc. You get the point ...
 
 ???+ tip
 
-    However, this manual tuning is time-consuming and not always feasible.
-    In the last (advanced) chapter of this course, we will introduce you to 
-    automated hyperparameter tuning.
+    However, this manual tuning is time-consuming and not always feasible. In the
+    last (advanced) chapter of this course, we will introduce you to automated
+    hyperparameter tuning.
 
 ???+ info
 
     You can spend hours tuning hyperparameters. So, don't get lost in the
     hyperparameter tuning process.
 
-    :warning: *Spoiler alert* :warning:: With this specific data set and 
-    hyperparameter tuning, you won't significantly surpass the results we 
-    have achieved so far.
+    :warning: *Spoiler alert* :warning:: With this specific data set and
+    hyperparameter tuning, you won't significantly surpass the results we have
+    achieved so far.
 
 ## The result
 
-We conclude that a 
+We conclude that a
 
----
+______________________________________________________________________
 
 ```python
 RandomForestClassifier(
-    n_estimators=100, 
+    n_estimators=100,
     max_depth=10,
-    min_samples_leaf=10, 
-    random_state=42, 
-    class_weight="balanced"
+    min_samples_leaf=10,
+    random_state=42,
+    class_weight="balanced",
 )
 ```
-is the best model we have found for our task. It achieves a balanced 
-accuracy of 74.45%.
 
----
+is the best model we have found for our task. It achieves a balanced accuracy
+of 74.45%.
+
+______________________________________________________________________
 
 <div style="text-align: center;">
     <h4>Here is the main takeaway:</h4>
@@ -503,28 +503,28 @@ accuracy of 74.45%.
 
 ???+ tip
 
-    Unfortunately, with real world data sets you won't always achieve 
-    astounding results. But that's okay! :blush:
+    Unfortunately, with real-world data sets you won't always achieve astounding
+    results. But that's okay! :blush:
 
-    If the performance does not meet your expectations, you can try 
-    following things:
+    If the performance does not meet your expectations, you can try the
+    following things:
 
     - Feature engineering: Create new features or modify existing ones.
     - Preprocessing: Try different preprocessing steps.
     - Model selection: Try different models.
-    
-    But sometimes, it is also a possibility that the features can't describe
-    the target variable well enough or you simply need more data.
+
+    Sometimes, though, the features can't describe the target variable well
+    enough, or you simply need more data.
 
 ## Recap
 
 We tried different models and evaluated their performance using the balanced
-accuracy score. Ultimately, we concluded that a random forest model 
-performed best. 
+accuracy score. Ultimately, we concluded that a random forest model performed
+best.
 
-Along the way, we introduced class imbalance, confusion matrix, balanced 
-accuracy and hyperparameter tuning. Another example illustrated the 
-importance of reproducibility.
+Along the way, we introduced class imbalance, the confusion matrix, balanced
+accuracy and hyperparameter tuning. Another example illustrated the importance
+of reproducibility.
 
-Next, we distill our findings in an end-to-end example and save the 
-final model to disk.
+Next, we distill our findings in an end-to-end example and save the final model
+to disk.