Commit b184306

kserve traffic scenarios (#98)
* abn updates
* reviwer comments, kustomize, python
* bg and canary for kserve
* wording changes
* update kserve docs
* update kserver-modelmesh use cases
* fix spelling
* fix spelling
* use chart repo
* fix links to values file
* change version in install commands
* update links
* make link explicit

Signed-off-by: Michael Kalantar <[email protected]>
1 parent 857ee38 commit b184306

20 files changed

+3648
-150
lines changed

.github/wordlist.txt

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -168,3 +168,7 @@ gz
168168
xvf
169169
IMG
170170
mv
171+
appName
172+
src
173+
appType
174+
appVersions

docs/getting-started/install.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
11
Install the latest stable release of the Iter8 CLI as follows.
22

33
```shell
4-
go install github.com/iter8-tools/iter8@v0.14
4+
go install github.com/iter8-tools/iter8@v0.15
55
```

docs/getting-started/installghaction.md

Lines changed: 0 additions & 7 deletions
This file was deleted.

docs/getting-started/installgoinstall.md

Lines changed: 0 additions & 5 deletions
This file was deleted.

docs/tutorials/abn/abn.md

Lines changed: 8 additions & 12 deletions
@@ -92,7 +92,7 @@ data:
9292
EOF
9393
```
9494

95-
In this definition, each version of the application is composed of a `Service` and a `Deployment`. In the primary version, both are named `backend`. In any candidate version they are named `backend-candidate-1`. Iter8 uses this definition to identify when any of the versions of the application are available. It can then respond appropriate to `Lookup()` requests.
95+
In this definition, each version of the application is composed of a `Service` and a `Deployment`. In the primary version, both are named `backend`. In any candidate version they are named `backend-candidate-1`. Iter8 uses this definition to identify when any of the versions of the application are available. It can then respond appropriately to `Lookup()` requests.
9696

9797
## Generate load
9898

@@ -116,8 +116,8 @@ kubectl label deployment backend-candidate-1 iter8.tools/watch="true"
116116
kubectl expose deployment backend-candidate-1 --name=backend-candidate-1 --port=8091
117117
```
118118

119-
Until the candidate version is ready; that is, until all expected resources are deployed and available, calls to `Lookup()` will return only the index 0; the existing version.
120-
Once the candidate version is ready, `Lookup()` will return both indices (0 and 1) so that requests can be distributed across versions.
119+
Until the candidate version is ready (that is, until all expected resources are deployed and available), calls to `Lookup()` return only the version number `0`, which identifies the existing version.
120+
Once the candidate version is ready, `Lookup()` will return both version numbers (`0` and `1`) so that requests can be distributed across versions.
121121

122122
## Compare versions using Grafana
123123

@@ -127,24 +127,20 @@ Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port-
127127
kubectl port-forward service/grafana 3000:3000
128128
```
129129

130-
Open Grafana in a browser:
131-
132-
```shell
133-
http://localhost:3000/
134-
```
130+
Open Grafana in a browser by going to [http://localhost:3000](http://localhost:3000).
135131

136132
[Add a JSON API data source](http://localhost:3000/connections/datasources/marcusolsson-json-datasource) `Iter8` with:
137133

138-
- URL `http://iter8.default:8080/metrics` and
139-
- query string `application=default%2Fbackend`
134+
- URL: `http://iter8.default:8080/metrics`
135+
- Query string: `application=default%2Fbackend`
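The query string in the list above appears to be the application's namespace and name joined by a URL-encoded slash (`%2F`). A minimal sketch for building it for other applications, assuming that interpretation holds; the function name `iter8_query_string` is illustrative and not part of Iter8:

```shell
# Sketch: build the data source query string for an application.
# Assumes the value is "<namespace>/<name>" with the slash encoded as %2F,
# matching the example `application=default%2Fbackend`.
iter8_query_string() {
  printf 'application=%s%%2F%s\n' "$1" "$2"
}

iter8_query_string default backend
```

For the application in this tutorial, the call above reproduces the query string given in the list.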
140136

141-
[Create a new dashboard](http://localhost:3000/dashboards) by *import*. Do so by pasting the contents of this [JSON definition](https://gist.githubusercontent.com/Alan-Cha/aa4ba259cc4631aafe9b43500502c60f/raw/034249f24e2c524ee4e326e860c06149ae7b2677/gistfile1.txt) into the box and *load* it. Associate it with the JSON API data source defined above.
137+
[Create a new dashboard](http://localhost:3000/dashboards) by *import*. Copy and paste the contents of this [JSON definition](https://gist.githubusercontent.com/Alan-Cha/aa4ba259cc4631aafe9b43500502c60f/raw/034249f24e2c524ee4e326e860c06149ae7b2677/gistfile1.txt) into the text box and *load* it. Associate it with the JSON API data source above.
142138

143139
The Iter8 dashboard allows you to compare the behavior of the two versions of the backend component against each other and select a winner. Since user requests are being sent by the load generation script, the values in the report may change over time. The Iter8 dashboard may look like the following:
144140

145141
![A/B dashboard](images/dashboard.png)
146142

147-
Once a winner is identified, the winner can be promoted, and the candidate version deleted.
143+
Once you identify a winner, you can promote it and delete the candidate version.
148144

149145
## Promote candidate version
150146

docs/tutorials/deleteiter8controller.md

Lines changed: 0 additions & 4 deletions
@@ -1,13 +1,9 @@
11
=== "Helm"
2-
Delete the Iter8 controller using `helm` as follows.
3-
42
```shell
53
helm delete iter8
64
```
75

86
=== "Kustomize"
9-
Delete the Iter8 controller using `kustomize` as follows.
10-
117
=== "namespace scoped"
128
```shell
139
kubectl delete -k 'https://github.com/iter8-tools/iter8.git/kustomize/iter8/namespaceScoped?ref=v0.15.3'

docs/tutorials/installiter8controller.md

Lines changed: 0 additions & 4 deletions
@@ -1,6 +1,4 @@
11
=== "Helm"
2-
Install the Iter8 controller using `helm` as follows.
3-
42
=== "namespace scoped"
53
```shell
64
helm install --repo https://iter8-tools.github.io/iter8 iter8 traffic
@@ -13,8 +11,6 @@
1311
```
1412

1513
=== "Kustomize"
16-
Install the Iter8 controller using `kustomize` as follows.
17-
1814
=== "namespace scoped"
1915
```shell
2016
kubectl apply -k 'https://github.com/iter8-tools/iter8.git/kustomize/iter8/namespaceScoped?ref=v0.15.3'

docs/tutorials/integrations/ghactions.md

Lines changed: 2 additions & 2 deletions
@@ -8,11 +8,11 @@ There are two ways that you can use Iter8 with GitHub Actions. You can [run Iter
88

99
# Use Iter8 in a GitHub Actions workflow
1010

11-
Install the latest version of the Iter8 CLI using `iter8-tools/iter8@v0.14`. Once installed, the Iter8 CLI can be used as documented in various tutorials. For example:
11+
Install the latest version of the Iter8 CLI using `iter8-tools/iter8@v0.15`. Once installed, the Iter8 CLI can be used as documented in various tutorials. For example:
1212

1313
```yaml linenums="1"
1414
- name: Install Iter8
15-
run: GOBIN=/usr/local/bin go install github.com/iter8-tools/iter8@v0.14
15+
run: GOBIN=/usr/local/bin go install github.com/iter8-tools/iter8@v0.15
1616

1717
# Launch an experiment inside Kubernetes
1818
# This assumes that your Kubernetes cluster is accessible from the GitHub Actions pipeline

docs/tutorials/integrations/kserve-mm/blue-green.md

Lines changed: 64 additions & 58 deletions
@@ -4,26 +4,31 @@ template: main.html
44

55
# Blue-Green Rollout of an ML Model
66

7-
This tutorial shows how Iter8 can be used to implement a blue-green rollout of ML models hosted in a KServe modelmesh serving environment. In a blue-green rollout, a percentage of inference requests are directed to a candidate version of the model. The remaining requests go to the primary, or initial, version of the model. Iter8 enables a blue-green rollout by automatically configuring the network to distribute inference requests.
7+
This tutorial shows how Iter8 can be used to implement a blue-green rollout of ML models hosted in a KServe modelmesh serving environment. In a blue-green rollout, a percentage of inference requests are directed to a candidate version of the model. The remaining requests go to the primary, or initial, version of the model. Iter8 enables a blue-green rollout by automatically configuring routing resources to distribute inference requests.
88

9-
After a one time initialization step, the end user merely deploys candidate models, evaluates them, and either promotes or deletes them. Optionally, the end user can modify the percentage of inference requests being sent to the candidate model. Iter8 automatically handles all underlying network configuration.
9+
After a one-time initialization step, the end user merely deploys candidate models, evaluates them, and either promotes or deletes them. Optionally, the end user can modify the percentage of inference requests being sent to the candidate model. Iter8 automatically handles all underlying routing configuration.
1010

1111
![Blue-Green rollout](images/blue-green.png)
1212

1313
In this tutorial, we use the Istio service mesh to distribute inference requests between different versions of a model.
1414

1515
???+ "Before you begin"
1616
1. Ensure that you have the [kubectl CLI](https://kubernetes.io/docs/reference/kubectl/).
17-
2. Have access to a cluster running [KServe ModelMesh Serving](https://github.com/kserve/modelmesh-serving). For example, you can create a modelmesh-serving [Quickstart](https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md) environment.
17+
2. Have access to a cluster running [KServe ModelMesh Serving](https://github.com/kserve/modelmesh-serving). For example, you can create a modelmesh-serving [Quickstart](https://github.com/kserve/modelmesh-serving/blob/release-0.11/docs/quickstart.md) environment. If using the Quickstart environment, change your default namespace to `modelmesh-serving`:
18+
```shell
19+
kubectl config set-context --current --namespace=modelmesh-serving
20+
```
1821
3. Install [Istio](https://istio.io). You can install the [demo profile](https://istio.io/latest/docs/setup/getting-started/).
1922

20-
## Install the Iter8 controller
23+
## Install Iter8
2124

2225
--8<-- "docs/tutorials/installiter8controller.md"
2326

24-
## Deploy a primary model
27+
## Initialize primary
2528

26-
Deploy the primary version of a model using an `InferenceService`:
29+
### Application
30+
31+
Deploy the primary version of the application. In this tutorial, the application is an ML model. Initialize the resources for the primary version of the model (`v0`) by deploying an `InferenceService` as follows:
2732

2833
```shell
2934
cat <<EOF | kubectl apply -f -
@@ -48,36 +53,36 @@ EOF
4853
```
4954

5055
??? note "About the primary `InferenceService`"
51-
Naming the model with the suffix `-0` (and the candidate with the suffix `-1`) simplifies the rollout initialization. However, any name can be specified.
56+
The base name (`wisdom`) and version (`v0`) are identified using the labels `app.kubernetes.io/name` and `app.kubernetes.io/version`, respectively. These labels are not required.
57+
58+
Naming the instance with the suffix `-0` (and the candidate with the suffix `-1`) simplifies the routing initialization (see below). However, any name can be specified.
5259

53-
The label `iter8.tools/watch: "true"` lets Iter8 know that it should pay attention to changes to this `InferenceService`.
60+
The label `iter8.tools/watch: "true"` is required. It lets Iter8 know that it should pay attention to changes to this application resource.
5461

55-
Inspect the deployed `InferenceService`:
62+
You can inspect the deployed `InferenceService`. When the `READY` field becomes `True`, the model is fully deployed.
5663

5764
```shell
5865
kubectl get inferenceservice wisdom-0
5966
```
60-
61-
When the `READY` field becomes `True`, the model is fully deployed.
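Instead of polling `kubectl get` by hand, the readiness check can be scripted. The following is a minimal sketch, assuming the `InferenceService` exposes a standard `Ready` condition behind the `READY` column; the function name `wait_for_isvc` and the default timeout are illustrative choices, not part of Iter8 or KServe:

```shell
# Sketch: block until an InferenceService reports Ready, or time out.
# Assumes a `Ready` status condition exists on the resource.
wait_for_isvc() {
  name="$1"
  timeout="${2:-300s}"   # hypothetical default; tune for your cluster
  kubectl wait --for=condition=Ready "inferenceservice/${name}" --timeout="${timeout}"
}

# Example: wait_for_isvc wisdom-0 120s
```

`kubectl wait` exits with a non-zero status on timeout, so the function can gate later steps in a script.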
6267

63-
## Initialize the Blue-Green routing policy
68+
### Routing
6469

6570
Initialize model rollout with a blue-green traffic pattern as follows:
6671

6772
```shell
68-
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/iter8 traffic-templates -f - | kubectl apply -f -
69-
templateName: initialize-rollout
70-
targetEnv: kserve-modelmesh
71-
trafficStrategy: blue-green
72-
modelName: wisdom
73+
cat <<EOF | helm template routing --repo https://iter8-tools.github.io/iter8 routing-actions -f - | kubectl apply -f -
74+
appType: kserve-modelmesh
75+
appName: wisdom
76+
action: initialize
77+
strategy: blue-green
7378
EOF
7479
```
7580

76-
The `initialize-rollout` template (with `trafficStrategy: blue-green`) configures the Istio service mesh to route all requests to the primary version of the model (`wisdom-0`). Further, it defines the routing policy that will be used by Iter8 when it observes changes in the models. By default, this routing policy splits inference requests 50-50 between the primary and candidate versions. For detailed configuration options, see the Helm chart.
81+
The `initialize` action (with strategy `blue-green`) configures the (Istio) service mesh to route all requests to the primary version of the application (`wisdom-0`). It further defines the routing policy that will be used when changes are observed in the application resources. By default, this routing policy splits requests 50-50 between the primary and candidate versions. For detailed configuration options, see the [Helm chart](https://github.com/iter8-tools/iter8/blob/v0.15.5/charts/routing-actions/values.yaml).
7782

78-
## Verify network configuration
83+
## Verify routing
7984

80-
To verify the network configuration, you can inspect the network configuration:
85+
To verify the routing configuration, you can inspect the `VirtualService`:
8186

8287
```shell
8388
kubectl get virtualservice -o yaml wisdom
@@ -88,7 +93,7 @@ To send inference requests to the model:
8893
=== "From within the cluster"
8994
1. Create a "sleep" pod in the cluster from which requests can be made:
9095
```shell
91-
curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.14.3/samples/modelmesh-serving/sleep.sh | sh -
96+
curl -s https://raw.githubusercontent.com/iter8-tools/docs/v0.15.2/samples/modelmesh-serving/sleep.sh | sh -
9297
```
9398

9499
2. exec into the sleep pod:
@@ -111,21 +116,22 @@ To send inference requests to the model:
111116

112117
2. Download the proto file and a sample input:
113118
```shell
114-
curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.13.18/samples/modelmesh-serving/kserve.proto
115-
curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.13.18/samples/modelmesh-serving/grpc_input.json
119+
curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.15.1/samples/modelmesh-serving/kserve.proto
120+
curl -sO https://raw.githubusercontent.com/iter8-tools/docs/v0.15.1/samples/modelmesh-serving/grpc_input.json
116121
```
117122

118123
3. Send inference requests:
119124
```shell
120125
cat grpc_input.json | \
121-
grpcurl -plaintext -proto kserve.proto -d @ \
126+
grpcurl -vv -plaintext -proto kserve.proto -d @ \
122127
-authority wisdom.modelmesh-serving \
123-
localhost:8080 inference.GRPCInferenceService.ModelInfer
128+
localhost:8080 inference.GRPCInferenceService.ModelInfer \
129+
| grep -e app-version
124130
```
125131

126-
Note that the model version responding to each inference request can be determined from the `modelName` field of the response.
132+
The model version responding to each inference request is reported in the response header `app-version`. The commands above display only this header.
127133

128-
## Deploy a candidate model
134+
## Deploy candidate
129135

130136
Deploy a candidate model using a second `InferenceService`:
131137

@@ -152,45 +158,43 @@ EOF
152158
```
153159

154160
??? note "About the candidate `InferenceService`"
155-
The model name (`wisdom`) and version (`v1`) are recorded using the labels `app.kubernets.io/name` and `app.kubernets.io.version`.
156-
157161
In this tutorial, the model source (field `spec.predictor.model.storageUri`) is the same as for the primary version of the model. In a real world example, this would be different.
158162

159-
## Verify network configuration changes
163+
## Verify routing changes
160164

161-
The deployment of the candidate model triggers an automatic reconfiguration by Iter8. Inspect the `VirtualService` to see that inference requests are now distributed between the primary model and the secondary model:
165+
The deployment of the candidate model triggers an automatic reconfiguration by Iter8. Inspect the `VirtualService` to see that the routing has been changed. Requests are now distributed between the primary and candidate:
162166

163167
```shell
164168
kubectl get virtualservice wisdom -o yaml
165169
```
166170

167-
Send additional inference requests as described above.
171+
You can send additional inference requests as described above. They will be handled by both versions of the model.
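To see how requests are spread across versions, the `app-version` header lines from repeated requests can be tallied. This is a rough sketch, assuming each header line ends with the version name (for example `app-version: wisdom-1`); the exact `grpcurl` output format is an assumption and the function name `tally_versions` is illustrative:

```shell
# Sketch: count responses per version from lines containing `app-version`.
# Assumes the version name is the last whitespace-separated field on the line.
tally_versions() {
  grep -i 'app-version' \
    | awk '{ count[$NF]++ } END { for (v in count) print v, count[v] }' \
    | sort
}

# Example (hypothetical loop): run the grpcurl command from above N times
# and pipe the combined output into tally_versions.
```

With the default blue-green policy, the two counts should be roughly equal over a large enough number of requests.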
168172

169173
## Modify weights (optional)
170174

171-
You can modify the weight distribution of inference requests using the Iter8 `traffic-template` chart:
175+
You can modify the weight distribution of inference requests as follows:
172176

173177
```shell
174-
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/iter8 traffic-templates -f - | kubectl apply -f -
175-
templateName: modify-weights
176-
targetEnv: kserve-modelmesh
177-
trafficStrategy: blue-green
178-
modelName: wisdom
179-
modelVersions:
178+
cat <<EOF | helm template routing --repo https://iter8-tools.github.io/iter8 routing-actions -f - | kubectl apply -f -
179+
appType: kserve-modelmesh
180+
appName: wisdom
181+
action: modify-weights
182+
strategy: blue-green
183+
appVersions:
180184
- weight: 20
181185
- weight: 80
182186
EOF
183187
```
184188

185-
Note that using the `modify-weights` overrides the default traffic split for all future candidate deployments.
189+
Note that using the `modify-weights` action overrides the default traffic split for all future candidate deployments.
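When weights are adjusted repeatedly, the values passed to the chart can be generated rather than typed. A minimal sketch that emits the same fields used in the `modify-weights` example; the function name `routing_weights_values` is hypothetical, and the output should be checked against the chart's values file before use:

```shell
# Sketch: print values for the modify-weights action.
# First argument is the application name; remaining arguments are weights,
# listed in version order (primary first), as in the example above.
routing_weights_values() {
  app_name="$1"; shift
  printf 'appType: kserve-modelmesh\n'
  printf 'appName: %s\n' "$app_name"
  printf 'action: modify-weights\n'
  printf 'strategy: blue-green\n'
  printf 'appVersions:\n'
  for w in "$@"; do
    printf -- '- weight: %s\n' "$w"
  done
}

# Example:
#   routing_weights_values wisdom 20 80 \
#     | helm template routing --repo https://iter8-tools.github.io/iter8 routing-actions -f - \
#     | kubectl apply -f -
```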
186190

187-
As above, you can verify the network configuration changes.
191+
As above, you can verify the routing changes.
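Rather than reading the full YAML, the route weights can be extracted directly. The sketch below assumes the standard Istio `VirtualService` layout (`spec.http[].route[]` with `destination.host` and `weight`); the function name `show_route_weights` is illustrative, and routes without an explicit weight will print an empty value:

```shell
# Sketch: print destination host and weight for every route
# in the named VirtualService.
show_route_weights() {
  kubectl get virtualservice "$1" \
    -o jsonpath='{range .spec.http[*].route[*]}{.destination.host}{"\t"}{.weight}{"\n"}{end}'
}

# Example: show_route_weights wisdom
```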
188192

189-
## Promote the candidate model
193+
## Promote candidate
190194

191-
Promoting the candidate involves redefining the primary `InferenceService` using the new model and deleting the candidate `InferenceService`.
195+
Promoting the candidate involves redefining the primary version of the application and deleting the candidate version.
192196

193-
### Redefine the primary `InferenceService`
197+
### Redefine primary
194198

195199
```shell
196200
cat <<EOF | kubectl replace -f -
@@ -217,41 +221,43 @@ EOF
217221
??? note "What is different?"
218222
The version label (`app.kubernetes.io/version`) was updated. In a real world example, `spec.predictor.model.storageUri` would also be updated.
219223

220-
### Delete the candidate `InferenceService`
224+
### Delete candidate
225+
226+
Once the primary `InferenceService` has been redeployed, delete the candidate:
221227

222228
```shell
223229
kubectl delete inferenceservice wisdom-1
224230
```
225231

226-
### Verify network configuration changes
232+
### Verify routing changes
227233

228234
Inspect the `VirtualService` to see that it has been automatically reconfigured to send requests only to the primary model.
229235

230-
## Clean up
236+
## Cleanup
231237

232-
Delete the candidate model:
238+
If not already deleted, delete the candidate:
233239

234240
```shell
235-
kubectl delete --force isvc/wisdom-1
241+
kubectl delete isvc/wisdom-1
236242
```
237243

238-
Delete routing artifacts:
244+
Delete routing:
239245

240246
```shell
241-
cat <<EOF | helm template traffic --repo https://iter8-tools.github.io/iter8 traffic-templates -f - | kubectl delete --force -f -
242-
templateName: initialize-rollout
243-
targetEnv: kserve-modelmesh
244-
trafficStrategy: blue-green
245-
modelName: wisdom
247+
cat <<EOF | helm template routing --repo https://iter8-tools.github.io/iter8 routing-actions -f - | kubectl delete -f -
248+
appType: kserve-modelmesh
249+
appName: wisdom
250+
action: initialize
251+
strategy: blue-green
246252
EOF
247253
```
248254

249-
Delete the primary model:
255+
Delete primary:
250256

251257
```shell
252-
kubectl delete --force isvc/wisdom-0
258+
kubectl delete isvc/wisdom-0
253259
```
254260

255-
Uninstall the Iter8 controller:
261+
Uninstall Iter8:
256262

257263
--8<-- "docs/tutorials/deleteiter8controller.md"
