Commit c50091f

KServe modelmesh traffic engineering use cases (#79)
* update workflow actions
* initial mm scenarios
* update blue-green
* update text
* add canary and mirror instructions
* consolidate todos
* fewer template calls
* new versions of mm traffic engineering tutorials
* updated docs
* update wordlist
* fix spelling
* minor updates
* helm or kustomize install/delete for controller
* blue-green picture
* canary, mirror pictures
* update images
* update images
* update canary picture
* fix spelling
* update artifact references
* fix links
* fix wordlist
* remove sleep pod
* wordsmithing

---------

Signed-off-by: Michael Kalantar <[email protected]>
1 parent 6988f2d commit c50091f

14 files changed

+6902
-2
lines changed

.github/wordlist.txt

Lines changed: 6 additions & 0 deletions
@@ -50,6 +50,7 @@ LitmusChaos
50 50
localhost
51 51
minikube
52 52
MLOps
53+
modelmesh
53 54
namespace
54 55
namespaces
55 56
NewRelic
@@ -89,8 +90,10 @@ auth
89 90
argoproj
90 91
custommetrics
91 92
ctx
93+
deleteiter
92 94
dev
93 95
encodedmetric
96+
execintosleep
94 97
expreport
95 98
failured
96 99
getRecommendation
@@ -102,7 +105,9 @@ golang
102 105
GoVersion
103 106
installbrewbins
104 107
installghaction
108+
installiter
105 109
ksvc
110+
kustomize
106 111
lastupdatetime
107 112
lifecycle
108 113
linenums
@@ -124,6 +129,7 @@ contentType
124 129
numRequests
125 130
payloadStr
126 131
payloadURL
132+
proto
127 133
qps
128 134
bool
129 135
payloadTemplateURL

.github/workflows/abn-sample.yaml

Lines changed: 2 additions & 2 deletions
@@ -32,7 +32,7 @@ jobs:
32 32
with:
33 33
username: ${{ secrets.DOCKERHUB_USERNAME }}
34 34
password: ${{ secrets.DOCKERHUB_SECRET }}
35-
- uses: docker/build-push-action@v3
35+
- uses: docker/build-push-action@v4
36 36
with:
37 37
platforms: linux/amd64,linux/arm64
38 38
tags: ${{ env.OWNER }}/abn-sample-backend:${{ env.VERSION }}-v1,${{ env.OWNER }}/abn-sample-backend:${{ env.VERSION }}-v2,${{ env.OWNER }}/abn-sample-backend:${{ env.MAJOR_MINOR_VERSION }}-v1,${{ env.OWNER }}/abn-sample-backend:${{ env.MAJOR_MINOR_VERSION }}-v2,${{ env.OWNER }}/abn-sample-backend:latest
@@ -67,7 +67,7 @@ jobs:
67 67
with:
68 68
username: ${{ secrets.DOCKERHUB_USERNAME }}
69 69
password: ${{ secrets.DOCKERHUB_SECRET }}
70-
- uses: docker/build-push-action@v3
70+
- uses: docker/build-push-action@v4
71 71
with:
72 72
platforms: linux/amd64,linux/arm64
73 73
tags: ${{ env.OWNER }}/abn-sample-frontend-${{ matrix.lang }}:${{ env.VERSION }},${{ env.OWNER }}/abn-sample-frontend-${{ matrix.lang }}:${{ env.MAJOR_MINOR_VERSION }},${{ env.OWNER }}/abn-sample-frontend-${{ matrix.lang }}:latest
Lines changed: 239 additions & 0 deletions
@@ -0,0 +1,239 @@
1+
---
2+
template: main.html
3+
---
4+
5+
# Blue-Green Rollout of an ML Model
6+
7+
This tutorial shows how Iter8 can be used to implement a blue-green rollout of ML models hosted in a KServe modelmesh serving environment. In a blue-green rollout, a percentage of inference requests are directed to a candidate version of the model. The remaining requests go to the primary, or initial, version of the model. Iter8 enables a blue-green rollout by automatically configuring the network to distribute inference requests.
8+
9+
After a one-time initialization step, the end user merely deploys candidate models, evaluates them, and either promotes or deletes them. Optionally, the end user can modify the percentage of inference requests being sent to the candidate model. Iter8 automatically handles all underlying network configuration.
10+
11+
![Blue-Green rollout](images/blue-green.png)
12+
13+
In this tutorial, we use the Istio service mesh to distribute inference requests between different versions of a model.
14+
15+
???+ "Before you begin"
16+
    1. Ensure that you have the [kubectl CLI](https://kubernetes.io/docs/reference/kubectl/).
17+
    2. Have access to a cluster running [KServe ModelMesh Serving](https://github.com/kserve/modelmesh-serving). For example, you can create a modelmesh-serving [Quickstart](https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md) environment.
18+
    3. Install [Istio](https://istio.io). You can install the [demo profile](https://istio.io/latest/docs/setup/getting-started/).
19+
20+
## Install the Iter8 controller
21+
22+
--8<-- "docs/tutorials/integrations/kserve-mm/installiter8controller.md"
23+
24+
## Deploy a primary model
25+
26+
Deploy the primary version of a model using an `InferenceService`:
27+
28+
```shell
29+
cat <<EOF | kubectl apply -f -
30+
apiVersion: "serving.kserve.io/v1beta1"
31+
kind: "InferenceService"
32+
metadata:
33+
  name: wisdom-0
34+
  labels:
35+
    app.kubernetes.io/name: wisdom
36+
    app.kubernetes.io/version: v1
37+
    iter8.tools/watch: "true"
38+
  annotations:
39+
    serving.kserve.io/deploymentMode: ModelMesh
40+
    serving.kserve.io/secretKey: localMinIO
41+
spec:
42+
  predictor:
43+
    model:
44+
      modelFormat:
45+
        name: sklearn
46+
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib
47+
EOF
48+
```
49+
50+
??? note "About the primary `InferenceService`"
51+
    Naming the model with the suffix `-0` (and the candidate with the suffix `-1`) simplifies the rollout initialization. However, any name can be specified.
52+
53+
    The label `iter8.tools/watch: "true"` lets Iter8 know that it should pay attention to changes to this `InferenceService`.
54+
55+
Inspect the deployed `InferenceService`:
56+
57+
```shell
58+
kubectl get inferenceservice wisdom-0
59+
```
60+
61+
When the `READY` field becomes `True`, the model is fully deployed.
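
Instead of re-running the command until the field changes, the wait can be scripted. The following is a minimal sketch, assuming the `InferenceService` exposes a `Ready` condition (which the `READY` column reflects). The helper name `wait_for_isvc` and the default timeout of 600 seconds are illustrative choices, not part of Iter8 or KServe.

```shell
# Hypothetical helper: block until the named InferenceService reports Ready.
# Assumes the resource exposes a Ready condition; the default timeout is arbitrary.
wait_for_isvc() {
  name="$1"
  timeout="${2:-600s}"
  kubectl wait --for=condition=Ready --timeout="$timeout" "inferenceservice/$name"
}

# Usage:
#   wait_for_isvc wisdom-0
```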
62+
63+
## Initialize the Blue-Green routing policy
64+
65+
Initialize model rollout with a blue-green traffic pattern as follows:
66+
67+
```shell
68+
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/hub traffic-templates -f - | kubectl apply -f -
69+
templateName: initialize-rollout
70+
targetEnv: kserve-modelmesh
71+
trafficStrategy: blue-green
72+
modelName: wisdom
73+
EOF
74+
```
75+
76+
The `initialize-rollout` template (with `trafficStrategy: blue-green`) configures the Istio service mesh to route all requests to the primary version of the model (`wisdom-0`). Further, it defines the routing policy that will be used by Iter8 when it observes changes in the models. By default, this routing policy splits inference requests 50-50 between the primary and candidate versions. For detailed configuration options, see the Helm chart.
77+
78+
## Verify network configuration
79+
80+
To verify the network configuration, inspect the Istio `VirtualService`:
81+
82+
```shell
83+
kubectl get virtualservice -o yaml wisdom
84+
```
85+
86+
To send inference requests to the model:
87+
88+
1. In a separate terminal, port-forward the ingress gateway:
89+
    ```shell
90+
    kubectl -n istio-system port-forward svc/istio-ingressgateway 8080:80
91+
    ```
92+
93+
2. Download the proto file and a sample input:
94+
    ```shell
95+
    curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.13.18/samples/modelmesh-serving/kserve.proto
96+
    curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.13.18/samples/modelmesh-serving/grpc_input.json
97+
    ```
98+
99+
3. Send inference requests:
100+
    ```shell
101+
    cat grpc_input.json | \
102+
    grpcurl -plaintext -proto kserve.proto -d @ \
103+
    -authority wisdom.modelmesh-serving \
104+
    localhost:8080 inference.GRPCInferenceService.ModelInfer
105+
    ```
106+
107+
Note that the model version responding to each inference request can be determined from the `modelName` field of the response.
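
To see how requests are distributed across versions, the `modelName` values from a batch of responses can be counted. This is a rough sketch, assuming `grpcurl` prints each response as JSON with a `"modelName"` string field; the helper `tally_model_names` and the batch size of 20 are hypothetical.

```shell
# Hypothetical helper: read grpcurl JSON responses on stdin and count how many
# responses carry each distinct "modelName" value.
tally_model_names() {
  grep -o '"modelName"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/' |
    sort | uniq -c
}

# Usage (assumes the port-forward from step 1 is running and the files from
# step 2 are in the current directory):
#   for i in $(seq 1 20); do
#     grpcurl -plaintext -proto kserve.proto -d @ \
#       -authority wisdom.modelmesh-serving \
#       localhost:8080 inference.GRPCInferenceService.ModelInfer < grpc_input.json
#   done | tally_model_names
```

With only the primary deployed, every response should report the same model; once a candidate is added, the counts give a quick view of the observed split.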
108+
109+
## Deploy a candidate model
110+
111+
Deploy a candidate model using a second `InferenceService`:
112+
113+
```shell
114+
cat <<EOF | kubectl apply -f -
115+
apiVersion: "serving.kserve.io/v1beta1"
116+
kind: "InferenceService"
117+
metadata:
118+
  name: wisdom-1
119+
  labels:
120+
    app.kubernetes.io/name: wisdom
121+
    app.kubernetes.io/version: v2
122+
    iter8.tools/watch: "true"
123+
  annotations:
124+
    serving.kserve.io/deploymentMode: ModelMesh
125+
    serving.kserve.io/secretKey: localMinIO
126+
spec:
127+
  predictor:
128+
    model:
129+
      modelFormat:
130+
        name: sklearn
131+
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib
132+
EOF
133+
```
134+
135+
??? note "About the candidate `InferenceService`"
136+
    The model name (`wisdom`) and version (`v2`) are recorded using the labels `app.kubernetes.io/name` and `app.kubernetes.io/version`.
137+
138+
    In this tutorial, the model source (field `spec.predictor.model.storageUri`) is the same as for the primary version of the model. In a real-world example, this would be different.
139+
140+
## Verify network configuration changes
141+
142+
The deployment of the candidate model triggers an automatic reconfiguration by Iter8. Inspect the `VirtualService` to see that inference requests are now distributed between the primary model and the candidate model:
143+
144+
```shell
145+
kubectl get virtualservice wisdom -o yaml
146+
```
147+
148+
Send additional inference requests as described above.
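
To look at the traffic split alone, the weights can be filtered out of the `VirtualService` YAML. This is a minimal sketch, assuming the split appears as `weight:` fields in the resource, as in standard Istio route destinations; the helper `route_weights` is hypothetical and makes no other assumption about the layout Iter8 generates.

```shell
# Hypothetical helper: print every integer weight found in VirtualService YAML
# read from stdin, one per line, in document order.
route_weights() {
  sed -n 's/^[[:space:]-]*weight:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Usage:
#   kubectl get virtualservice wisdom -o yaml | route_weights
```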
149+
150+
## Modify weights (optional)
151+
152+
You can modify the weight distribution of inference requests using the Iter8 `traffic-templates` chart:
153+
154+
```shell
155+
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/hub traffic-templates -f - | kubectl apply -f -
156+
templateName: modify-weights
157+
targetEnv: kserve-modelmesh
158+
trafficStrategy: blue-green
159+
modelName: wisdom
160+
modelVersions:
161+
- weight: 20
162+
- weight: 80
163+
EOF
164+
```
165+
166+
Note that using the `modify-weights` template overrides the default traffic split for all future candidate deployments.
167+
168+
As above, you can verify the network configuration changes.
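
For repeated adjustments, the values passed to the `modify-weights` template can be generated by a small helper. The sketch below reuses the values shown earlier in this section and substitutes two weights in the same list order. The function name `modify_weights_values` and the integer check are additions for illustration, not part of the chart.

```shell
# Hypothetical helper: print values for the modify-weights template.
# The two weights are emitted in the order given, matching the list order used
# in the modify-weights command in this section.
modify_weights_values() {
  for w in "$1" "$2"; do
    case "$w" in
      ''|*[!0-9]*)
        echo "weight must be a non-negative integer: '$w'" >&2
        return 1
        ;;
    esac
  done
  cat <<EOF
templateName: modify-weights
targetEnv: kserve-modelmesh
trafficStrategy: blue-green
modelName: wisdom
modelVersions:
- weight: $1
- weight: $2
EOF
}

# Usage:
#   modify_weights_values 60 40 |
#     helm template traffic --repo https://iter8-tools.github.io/hub traffic-templates -f - |
#     kubectl apply -f -
```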
169+
170+
## Promote the candidate model
171+
172+
Promoting the candidate involves redefining the primary `InferenceService` using the new model and deleting the candidate `InferenceService`.
173+
174+
### Redefine the primary `InferenceService`
175+
176+
```shell
177+
cat <<EOF | kubectl replace -f -
178+
apiVersion: "serving.kserve.io/v1beta1"
179+
kind: "InferenceService"
180+
metadata:
181+
  name: wisdom-0
182+
  namespace: modelmesh-serving
183+
  labels:
184+
    app.kubernetes.io/name: wisdom
185+
    app.kubernetes.io/version: v2
186+
    iter8.tools/watch: "true"
187+
  annotations:
188+
    serving.kserve.io/deploymentMode: ModelMesh
189+
    serving.kserve.io/secretKey: localMinIO
190+
spec:
191+
  predictor:
192+
    model:
193+
      modelFormat:
194+
        name: sklearn
195+
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib
196+
EOF
197+
```
198+
199+
??? note "What is different?"
200+
    The version label (`app.kubernetes.io/version`) was updated. In a real-world example, `spec.predictor.model.storageUri` would also be updated.
201+
202+
### Delete the candidate `InferenceService`
203+
204+
```shell
205+
kubectl delete inferenceservice wisdom-1
206+
```
207+
208+
### Verify network configuration changes
209+
210+
Inspect the `VirtualService` to see that it has been automatically reconfigured to send requests only to the primary model.
211+
212+
## Clean up
213+
214+
Delete the candidate model, if it was not already deleted during promotion:
215+
216+
```shell
217+
kubectl delete --force isvc/wisdom-1
218+
```
219+
220+
Delete routing artifacts:
221+
222+
```shell
223+
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/hub traffic-templates -f - | kubectl delete --force -f -
224+
templateName: initialize-rollout
225+
targetEnv: kserve-modelmesh
226+
trafficStrategy: blue-green
227+
modelName: wisdom
228+
EOF
229+
```
230+
231+
Delete the primary model:
232+
233+
```shell
234+
kubectl delete --force isvc/wisdom-0
235+
```
236+
237+
Uninstall the Iter8 controller:
238+
239+
--8<-- "docs/tutorials/integrations/kserve-mm/deleteiter8controller.md"
