diff --git a/docs/data-science/algorithms/supervised/classification.md b/docs/data-science/algorithms/supervised/classification.md
index 5cc9bfb8..a4b9a818 100644
--- a/docs/data-science/algorithms/supervised/classification.md
+++ b/docs/data-science/algorithms/supervised/classification.md
@@ -236,7 +236,7 @@ Now that we have our training data, we can train the logistic regression model.
 ```python
 from sklearn.linear_model import LogisticRegression
 
-model = LogisticRegression(random_state=42, max_iter=3_000) # (1)!
+model = LogisticRegression(random_state=42, max_iter=5_000) # (1)!
 model.fit(X_train, y_train)
 ```
 
@@ -261,12 +261,27 @@ print(f"Model weights: {model.coef_}")
 ```
 
 ```title=">>> Output"
-Model weights: [[ 0.98293997 0.22667548 -0.36956971 0.02637225 ... ]]
+Model weights: [[ 0.98208299 0.22519686 -0.36688444 0.0262268 ... ]]
 ```
 
 The `coef_` attribute contains the weight for each feature.
 [As discussed](#deja-vu-linear-regression), the weights are real numbers.
 
+???+ warning "You might not have the exact same results"
+
+    Your model weights might differ slightly from the ones shown above.
+    This is completely normal and happens because:
+
+    **Numerical precision**: The default optimization solver
+    (`#!python "lbfgs"`) behind `LogisticRegression` encounters tiny
+    hardware-specific variations. The underlying libraries handle
+    floating-point arithmetic differently across hardware platforms. During the
+    iterative optimization, these tiny rounding differences accumulate,
+    causing the solver to converge to slightly different solutions.
+
+    :fontawesome-solid-lightbulb: These small differences don't affect your
+    model's predictions or accuracy.
+
 Now, it's your turn to look at the bias.
 
 ???+ question "Model bias"
diff --git a/docs/data-science/data/preprocessing.md b/docs/data-science/data/preprocessing.md
index 33b8703e..6838c776 100644
--- a/docs/data-science/data/preprocessing.md
+++ b/docs/data-science/data/preprocessing.md
@@ -66,9 +66,10 @@ we provide a distilled version of the code from
 Again, we urge you to use a virtual environment which by now, should be second
 nature anyway.
 
-???+ info "Create a new notebook"
+???+ info
 
-    To follow along, create a new Jupyter notebook within your project.
+    To follow along, create a new script or Jupyter notebook within your
+    project.
 
 ## Missing values
 
diff --git a/docs/data-science/practice/end-to-end.md b/docs/data-science/practice/end-to-end.md
index bac5751b..d2d1bf2d 100644
--- a/docs/data-science/practice/end-to-end.md
+++ b/docs/data-science/practice/end-to-end.md
@@ -228,6 +228,20 @@ To get our prediction process working, we need to save all objects involved:
 - `encoder`
 - `forest`
 
+???+ warning "Critical: Save ALL preprocessing objects!"
+
+    **You must save every single object** used in the prediction pipeline, not
+    just the model!
+
+    Missing even one object will break your predictions:
+
+    - Missing `impute` → Cannot handle new missing values
+    - Missing `preprocessor` → Cannot transform features correctly
+    - Missing `encoder` → Cannot convert predictions back to original labels
+    - Missing `forest` → Cannot make predictions
+
+    **The model is useless without its preprocessing pipeline!** :warning:
+
 We can save all these objects in one file using a simple `#!python dict`:
 
 ```python
@@ -242,6 +256,10 @@ with open("bank-model.pkl", "wb") as file:
     pickle.dump(model, file)
 ```
 
+Bundling all objects in a dictionary ensures you never accidentally
+forget a component. When you load `bank-model.pkl`, you have **everything**
+needed for predictions in one place.
+
 ???+ question "Load the model"
 
     Create a new script or notebook which we will use to test the saved model.
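
The bundling idea from the last hunk could be made a little more defensive by checking the dictionary keys when saving and loading. The following is only a minimal sketch, not the code from the docs: the helper names `save_bundle` and `load_bundle` are hypothetical, the key names are assumed from the objects listed in the warning box, and placeholder strings stand in for the fitted objects.

```python
import pickle
import tempfile
from pathlib import Path

# Assumption: key names taken from the objects named in the warning box.
REQUIRED_KEYS = {"impute", "preprocessor", "encoder", "forest"}


def save_bundle(objects: dict, path: Path) -> None:
    """Pickle all pipeline objects into one file (hypothetical helper)."""
    missing = REQUIRED_KEYS - set(objects)
    if missing:
        raise KeyError(f"Refusing to save, missing objects: {sorted(missing)}")
    with open(path, "wb") as file:
        pickle.dump(objects, file)


def load_bundle(path: Path) -> dict:
    """Load the bundle and verify that nothing is missing (hypothetical helper)."""
    with open(path, "rb") as file:
        objects = pickle.load(file)
    missing = REQUIRED_KEYS - set(objects)
    if missing:
        raise KeyError(f"Bundle is incomplete, missing objects: {sorted(missing)}")
    return objects


# Placeholder strings stand in for the fitted imputer, preprocessor, etc.
bundle = {name: f"<fitted {name}>" for name in REQUIRED_KEYS}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bank-model.pkl"
    save_bundle(bundle, path)
    restored = load_bundle(path)
```

With a check like this, a forgotten object surfaces as a `KeyError` at save or load time instead of as a confusing failure in the middle of a prediction.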