Conversation

@Heiaha commented Oct 14, 2025

This PR adds LightGBM + Optuna optimization. It starts from an initial hyperparameter guess taken from long-running previous experiments, then runs ten trials of 5-fold stratified cross-validation to briefly explore other hyperparameter sets. Metrics and the final model artifact are still saved with MLflow.

@extreme4all (Contributor) commented:

The LGBM looks great, and Optuna certainly seems like a clean way to do model optimization; I'll have to learn about that library.

Currently the only deployed model is the multi_model, and to deploy it we defined a wrapper around the model. Could you make something similar?

https://github.com/Bot-detector/bot-detector-ml-training/blob/develop/src/multi_model/train.py#L154-L161

# Refit on all data with the best hyperparameters, then log params and artifact.
model = LGBMClassifier(random_state=42, **best_params)
model.fit(X, y)
mlflow.log_params(best_params)
mlflow.sklearn.log_model(model, artifact_path="refit_model")

Can you also add some metrics for the refit_best model?

@Heiaha (Author) commented Oct 16, 2025

Sounds good, I'll run some similar experiments on the multi_model and update the PR once done.

@Heiaha (Author) commented Oct 19, 2025

Introduced similar changes to the multi_model. I tried to keep it so the deployment pipeline needs minimal changes; let me know if anything here would break the larger deployment pipeline. @extreme4all
