From 8530deddbff979b2ca4304fc6e0f4635f4d725df Mon Sep 17 00:00:00 2001
From: Jakob Klotz
Date: Mon, 9 Feb 2026 08:12:39 +0100
Subject: [PATCH 1/4] docs: emphasize need to pickle all objects

---
 docs/data-science/practice/end-to-end.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/data-science/practice/end-to-end.md b/docs/data-science/practice/end-to-end.md
index bac5751b..d2d1bf2d 100644
--- a/docs/data-science/practice/end-to-end.md
+++ b/docs/data-science/practice/end-to-end.md
@@ -228,6 +228,20 @@ To get our prediction process working, we need to save all objects involved:
 - `encoder`
 - `forest`
 
+???+ warning "Critical: Save ALL preprocessing objects!"
+
+    **You must save every single object** used in the prediction pipeline, not
+    just the model!
+
+    Missing even one object will break your predictions:
+
+    - Missing `impute` → Cannot handle new missing values
+    - Missing `preprocessor` → Cannot transform features correctly
+    - Missing `encoder` → Cannot convert predictions back to original labels
+    - Missing `forest` → Cannot make predictions
+
+    **The model is useless without its preprocessing pipeline!** :warning:
+
 We can save all these objects in one file using a simple `#!python dict`:
 
 ```python
@@ -242,6 +256,10 @@ with open("bank-model.pkl", "wb") as file:
     pickle.dump(model, file)
 ```
 
+Bundling all objects in a dictionary ensures you never accidentally
+forget a component. When you load `bank-model.pkl`, you have **everything**
+needed for predictions in one place.
+
 ???+ question "Load the model"
 
     Create a new script or notebook which we will use to test the saved model.
From b35896111ba44d81bf350ce77001c306e7bec198 Mon Sep 17 00:00:00 2001
From: Jakob Klotz
Date: Mon, 9 Feb 2026 08:17:06 +0100
Subject: [PATCH 2/4] docs: script or notebook

---
 docs/data-science/data/preprocessing.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/data-science/data/preprocessing.md b/docs/data-science/data/preprocessing.md
index 33b8703e..6838c776 100644
--- a/docs/data-science/data/preprocessing.md
+++ b/docs/data-science/data/preprocessing.md
@@ -66,9 +66,10 @@ we provide a distilled version of the code from
 Again, we urge you to use a virtual environment which by now, should be second
 nature anyway.
 
-???+ info "Create a new notebook"
+???+ info
 
-    To follow along, create a new Jupyter notebook within your project.
+    To follow along, create a new script or Jupyter notebook within your
+    project.
 
 ## Missing values
 
From a250eb32a1779a61be30d5924a5cef2f0366d2f5 Mon Sep 17 00:00:00 2001
From: Jakob Klotz
Date: Mon, 9 Feb 2026 08:33:13 +0100
Subject: [PATCH 3/4] docs: increase `max_iter` for convergence

---
 docs/data-science/algorithms/supervised/classification.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/data-science/algorithms/supervised/classification.md b/docs/data-science/algorithms/supervised/classification.md
index 5cc9bfb8..0497507e 100644
--- a/docs/data-science/algorithms/supervised/classification.md
+++ b/docs/data-science/algorithms/supervised/classification.md
@@ -236,7 +236,7 @@ Now that we have our training data, we can train the logistic regression model.
 ```python
 from sklearn.linear_model import LogisticRegression
 
-model = LogisticRegression(random_state=42, max_iter=3_000) # (1)!
+model = LogisticRegression(random_state=42, max_iter=5_000) # (1)!
 model.fit(X_train, y_train)
 ```
 
@@ -261,7 +261,7 @@ print(f"Model weights: {model.coef_}")
 ```
 
 ```title=">>> Output"
-Model weights: [[ 0.98293997 0.22667548 -0.36956971 0.02637225 ... ]]
+Model weights: [[ 0.98208299 0.22519686 -0.36688444 0.0262268 ... ]]
 ```
 
 The `coef_` attribute contains the weight for each feature.
From 3a6e68dbcda080759a109c9439fcaa43664b885a Mon Sep 17 00:00:00 2001
From: Jakob Klotz
Date: Mon, 9 Feb 2026 08:58:43 +0100
Subject: [PATCH 4/4] docs: note on `lbfgs` solver

---
 .../algorithms/supervised/classification.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/docs/data-science/algorithms/supervised/classification.md b/docs/data-science/algorithms/supervised/classification.md
index 0497507e..a4b9a818 100644
--- a/docs/data-science/algorithms/supervised/classification.md
+++ b/docs/data-science/algorithms/supervised/classification.md
@@ -267,6 +267,21 @@ Model weights: [[ 0.98208299 0.22519686 -0.36688444 0.0262268 ... ]]
 The `coef_` attribute contains the weight for each feature.
 [As discussed](#deja-vu-linear-regression), the weights are real numbers.
 
+???+ warning "You might not have the exact same results"
+
+    Your model weights might differ slightly from the ones shown above.
+    This is completely normal and happens because:
+
+    **Numerical precision**: The default optimization solver
+    (`#!python "lbfgs"`) behind `LogisticRegression` encounters tiny
+    hardware-specific variations. The underlying libraries handle
+    floating-point arithmetic differently across hardware platforms. During the
+    iterative optimization, these tiny rounding differences accumulate,
+    causing the solver to converge to slightly different solutions.
+
+    :fontawesome-solid-lightbulb: These small differences don't affect your
+    model's predictions or accuracy.
+
 Now, it's your turn to look at the bias.
 
 ???+ question "Model bias"