---
template: main.html
---

# Blue-Green Rollout of an ML Model

This tutorial shows how Iter8 can be used to implement a blue-green rollout of ML models hosted in a KServe ModelMesh Serving environment. In a blue-green rollout, a percentage of inference requests are directed to a candidate version of the model. The remaining requests go to the primary, or initial, version of the model. Iter8 enables a blue-green rollout by automatically configuring the network to distribute inference requests.

After a one-time initialization step, the end user merely deploys candidate models, evaluates them, and either promotes or deletes them. Optionally, the end user can modify the percentage of inference requests being sent to the candidate model. Iter8 automatically handles all underlying network configuration.

In this tutorial, we use the Istio service mesh to distribute inference requests between different versions of a model.

???+ note "Before you begin"
    1. Ensure that you have the [kubectl CLI](https://kubernetes.io/docs/reference/kubectl/).
    2. Have access to a cluster running [KServe ModelMesh Serving](https://github.com/kserve/modelmesh-serving). For example, you can create a modelmesh-serving [Quickstart](https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md) environment.
    3. Install [Istio](https://istio.io). You can install the [demo profile](https://istio.io/latest/docs/setup/getting-started/).

## Install the Iter8 controller

--8<-- "docs/tutorials/integrations/kserve-mm/installiter8controller.md"

## Deploy a primary model

Deploy the primary version of a model using an `InferenceService`:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: wisdom-0
  labels:
    app.kubernetes.io/name: wisdom
    app.kubernetes.io/version: v1
    iter8.tools/watch: "true"
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secretKey: localMinIO
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib
EOF
```

??? note "About the primary `InferenceService`"
    Naming the model with the suffix `-0` (and the candidate with the suffix `-1`) simplifies the rollout initialization. However, any name can be specified.

    The label `iter8.tools/watch: "true"` lets Iter8 know that it should pay attention to changes to this `InferenceService`.

Inspect the deployed `InferenceService`:

```shell
kubectl get inferenceservice wisdom-0
```

When the `READY` field becomes `True`, the model is fully deployed.
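
Rather than polling with `kubectl get`, you can optionally block until the `InferenceService` reports ready; the timeout value below is an arbitrary choice:

```shell
# Wait for the InferenceService to report the Ready condition
kubectl wait --for=condition=Ready --timeout=600s inferenceservice/wisdom-0
```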

## Initialize the Blue-Green routing policy

Initialize the model rollout with a blue-green traffic pattern as follows:

```shell
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/hub traffic-templates -f - | kubectl apply -f -
templateName: initialize-rollout
targetEnv: kserve-modelmesh
trafficStrategy: blue-green
modelName: wisdom
EOF
```

The `initialize-rollout` template (with `trafficStrategy: blue-green`) configures the Istio service mesh to route all requests to the primary version of the model (`wisdom-0`). Further, it defines the routing policy that Iter8 will use when it observes changes in the models. By default, this routing policy splits inference requests 50-50 between the primary and candidate versions. For detailed configuration options, see the `traffic-templates` Helm chart.
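
If you want to review the routing resources before they are applied, you can render the same template without piping it to `kubectl`:

```shell
# Render the routing manifests to stdout for inspection (nothing is applied)
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/hub traffic-templates -f -
templateName: initialize-rollout
targetEnv: kserve-modelmesh
trafficStrategy: blue-green
modelName: wisdom
EOF
```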

## Verify network configuration

Inspect the network configuration created by Iter8:

```shell
kubectl get virtualservice wisdom -o yaml
```

To send inference requests to the model:

1. In a separate terminal, port-forward the ingress gateway:
    ```shell
    kubectl -n istio-system port-forward svc/istio-ingressgateway 8080:80
    ```

2. Download the proto file and a sample input:
    ```shell
    curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.13.18/samples/modelmesh-serving/kserve.proto
    curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.13.18/samples/modelmesh-serving/grpc_input.json
    ```

3. Send inference requests:
    ```shell
    cat grpc_input.json | \
    grpcurl -plaintext -proto kserve.proto -d @ \
      -authority wisdom.modelmesh-serving \
      localhost:8080 inference.GRPCInferenceService.ModelInfer
    ```

Note that the model version responding to each inference request can be determined from the `modelName` field of the response.
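
For example, the version can be extracted from the response with standard shell tools. The response fragment below is illustrative, not captured from a real cluster; it assumes ModelMesh reports names of the shape `wisdom-1__isvc-<hash>`:

```shell
# Illustrative response fragment; a real response comes from the grpcurl call above
response='{"modelName": "wisdom-1__isvc-1234567890", "outputs": []}'

# Strip everything but the InferenceService name from the modelName field
echo "$response" | sed -n 's/.*"modelName": *"\([^"_]*\).*/\1/p'
# prints: wisdom-1
```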

## Deploy a candidate model

Deploy a candidate model using a second `InferenceService`:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: wisdom-1
  labels:
    app.kubernetes.io/name: wisdom
    app.kubernetes.io/version: v2
    iter8.tools/watch: "true"
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secretKey: localMinIO
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib
EOF
```

??? note "About the candidate `InferenceService`"
    The model name (`wisdom`) and version (`v2`) are recorded using the labels `app.kubernetes.io/name` and `app.kubernetes.io/version`.

    In this tutorial, the model source (field `spec.predictor.model.storageUri`) is the same as for the primary version of the model. In a real-world example, this would be different.

## Verify network configuration changes

The deployment of the candidate model triggers an automatic reconfiguration by Iter8. Inspect the `VirtualService` to see that inference requests are now distributed between the primary model and the candidate model:

```shell
kubectl get virtualservice wisdom -o yaml
```

Send additional inference requests as described above.
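
To observe the traffic split empirically, you can send a batch of requests and tally the responding versions. This sketch assumes the port-forward from the previous section is still running:

```shell
# Send 100 inference requests and count responses per model version
for i in $(seq 100); do
  cat grpc_input.json | \
  grpcurl -plaintext -proto kserve.proto -d @ \
    -authority wisdom.modelmesh-serving \
    localhost:8080 inference.GRPCInferenceService.ModelInfer
done | grep '"modelName"' | sort | uniq -c
```

With the default policy, the counts should be roughly 50-50 between the two versions.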

## Modify weights (optional)

You can modify the weight distribution of inference requests using the Iter8 `traffic-templates` chart:

```shell
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/hub traffic-templates -f - | kubectl apply -f -
templateName: modify-weights
targetEnv: kserve-modelmesh
trafficStrategy: blue-green
modelName: wisdom
modelVersions:
  - weight: 20
  - weight: 80
EOF
```

Note that using the `modify-weights` template overrides the default traffic split for all future candidate deployments.
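
One quick way to confirm the new weights, assuming the Iter8-generated `VirtualService` keeps its routes under `.spec.http[0].route` (inspect the full YAML if the layout differs), is a `jsonpath` query:

```shell
# Print only the route weights from the VirtualService
kubectl get virtualservice wisdom \
  -o jsonpath='{.spec.http[0].route[*].weight}'
```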

As above, you can verify the network configuration changes.

## Promote the candidate model

Promoting the candidate involves redefining the primary `InferenceService` using the new model and deleting the candidate `InferenceService`.

### Redefine the primary `InferenceService`

```shell
cat <<EOF | kubectl replace -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: wisdom-0
  namespace: modelmesh-serving
  labels:
    app.kubernetes.io/name: wisdom
    app.kubernetes.io/version: v2
    iter8.tools/watch: "true"
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secretKey: localMinIO
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib
EOF
```

??? note "What is different?"
    The version label (`app.kubernetes.io/version`) was updated. In a real-world example, `spec.predictor.model.storageUri` would also be updated.

### Delete the candidate `InferenceService`

```shell
kubectl delete inferenceservice wisdom-1
```

### Verify network configuration changes

Inspect the `VirtualService` to see that it has been automatically reconfigured to send requests only to the primary model.
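
As before, the routing details can be inspected directly:

```shell
kubectl get virtualservice wisdom -o yaml
```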

## Clean up

Delete the candidate model:

```shell
kubectl delete --force isvc/wisdom-1
```

Delete routing artifacts:

```shell
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/hub traffic-templates -f - | kubectl delete --force -f -
templateName: initialize-rollout
targetEnv: kserve-modelmesh
trafficStrategy: blue-green
modelName: wisdom
EOF
```

Delete the primary model:

```shell
kubectl delete --force isvc/wisdom-0
```

Uninstall the Iter8 controller:

--8<-- "docs/tutorials/integrations/kserve-mm/deleteiter8controller.md"