Merged
19 changes: 17 additions & 2 deletions docs/data-science/algorithms/supervised/classification.md
@@ -236,7 +236,7 @@ Now that we have our training data, we can train the logistic regression model.
```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(random_state=42, max_iter=3_000) # (1)!
model = LogisticRegression(random_state=42, max_iter=5_000) # (1)!
model.fit(X_train, y_train)
```

@@ -261,12 +261,27 @@ print(f"Model weights: {model.coef_}")
```

```title=">>> Output"
Model weights: [[ 0.98293997 0.22667548 -0.36956971 0.02637225 ... ]]
Model weights: [[ 0.98208299 0.22519686 -0.36688444 0.0262268 ... ]]
```

The `coef_` attribute contains the weight for each feature.
[As discussed](#deja-vu-linear-regression), the weights are real numbers.

???+ warning "You might not have the exact same results"

    Your model weights might differ slightly from the ones shown above.
    This is completely normal and happens because of:

    **Numerical precision**: The default optimization solver
    (`#!python "lbfgs"`) behind `LogisticRegression` is iterative, and the
    underlying numerical libraries handle floating-point arithmetic slightly
    differently across hardware platforms. These tiny rounding differences
    accumulate over the iterations, so the solver can converge to slightly
    different solutions.

    :fontawesome-solid-lightbulb: These small differences should have a
    negligible effect on your model's predictions and accuracy.
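
If you want to check whether your weights are "close enough" to a reference, comparing with a tolerance is usually more useful than exact equality. The following is a minimal sketch under stated assumptions: it uses synthetic data from `make_classification` (not the dataset from this chapter) and simulates platform-level rounding by adding tiny noise to the fitted weights. The noise scale and tolerance are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption: not the dataset used in this chapter).
X, y = make_classification(n_samples=500, n_features=5, random_state=42)

model = LogisticRegression(random_state=42, max_iter=5_000)
model.fit(X, y)

# Simulate platform-level rounding by nudging the weights slightly
# (assumption: real differences are of a similarly small magnitude).
rng = np.random.default_rng(0)
other_coef = model.coef_ + rng.normal(scale=1e-6, size=model.coef_.shape)

# Compare with a tolerance instead of exact equality.
weights_match = np.allclose(model.coef_, other_coef, atol=1e-4)

# Recompute predictions with the nudged weights and measure agreement.
scores = X @ other_coef.T + model.intercept_
other_pred = model.classes_[(scores.ravel() > 0).astype(int)]
agreement = np.mean(other_pred == model.predict(X))

print(f"Weights match within tolerance: {weights_match}")
print(f"Prediction agreement: {agreement:.3f}")
```

Only samples lying extremely close to the decision boundary could flip under such a small perturbation, which is why prediction agreement is expected to stay very high.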

Now, it's your turn to look at the bias.

???+ question "Model bias"
5 changes: 3 additions & 2 deletions docs/data-science/data/preprocessing.md
@@ -66,9 +66,10 @@ we provide a distilled version of the code from
Again, we urge you to use a virtual environment, which by now should be second
nature anyway.

???+ info "Create a new notebook"
???+ info

    To follow along, create a new Jupyter notebook within your project.
    To follow along, create a new script or Jupyter notebook within your
    project.

## Missing values

18 changes: 18 additions & 0 deletions docs/data-science/practice/end-to-end.md
@@ -228,6 +228,20 @@ To get our prediction process working, we need to save all objects involved:
- `encoder`
- `forest`

???+ warning "Critical: Save ALL preprocessing objects!"

    **You must save every single object** used in the prediction pipeline, not
    just the model!

    Missing even one object will break your predictions:

    - Missing `impute` → Cannot handle new missing values
    - Missing `preprocessor` → Cannot transform features correctly
    - Missing `encoder` → Cannot convert predictions back to original labels
    - Missing `forest` → Cannot make predictions

    **The model is useless without its preprocessing pipeline!** :warning:

We can save all these objects in one file using a simple `#!python dict`:

```python
@@ -242,6 +256,10 @@ with open("bank-model.pkl", "wb") as file:
    pickle.dump(model, file)
```

Bundling all objects in a dictionary ensures you never accidentally
forget a component. When you load `bank-model.pkl`, you have **everything**
needed for predictions in one place.
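
As a rough illustration of why the single dictionary helps, the sketch below saves a bundle, loads it again, and checks that no component is missing. It rests on assumptions: the dictionary keys mirror the object names listed above, the values are placeholder strings rather than fitted objects, and the file is written to a temporary directory instead of your project folder.

```python
import pickle
import tempfile
from pathlib import Path

# Placeholder stand-ins (assumption): in the real pipeline these would be
# the fitted imputer, preprocessor, label encoder, and random forest.
model = {
    "impute": "fitted imputer goes here",
    "preprocessor": "fitted preprocessor goes here",
    "encoder": "fitted label encoder goes here",
    "forest": "fitted random forest goes here",
}

# Components the prediction step relies on.
required = {"impute", "preprocessor", "encoder", "forest"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bank-model.pkl"

    # Save the whole bundle in one file.
    with open(path, "wb") as file:
        pickle.dump(model, file)

    # Load it back, as a separate prediction script would.
    with open(path, "rb") as file:
        loaded = pickle.load(file)

# Fail early if any component is missing from the bundle.
missing = required - set(loaded)
if missing:
    raise KeyError(f"Missing pipeline components: {sorted(missing)}")

print(f"Loaded components: {sorted(loaded)}")
```

A check like this at load time surfaces a forgotten component immediately, rather than at the first prediction. Keep in mind that `pickle` files should only be loaded from sources you trust.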

???+ question "Load the model"

    Create a new script or notebook which we will use to test the saved model.