Commit a707c2c

Rewards (#93)
* experiment with reward
  Signed-off-by: Michael Kalantar <[email protected]>

* update index
  Signed-off-by: Michael Kalantar <[email protected]>

* wordsmith and spelling
  Signed-off-by: Michael Kalantar <[email protected]>

* wordsmith and spelling
  Signed-off-by: Michael Kalantar <[email protected]>

* mockoon configuration
  Signed-off-by: Michael Kalantar <[email protected]>

* udpate reference in rewards
  Signed-off-by: Michael Kalantar <[email protected]>

* update links
  Signed-off-by: Michael Kalantar <[email protected]>

* add explanation
  Signed-off-by: Michael Kalantar <[email protected]>

---------
Signed-off-by: Michael Kalantar <[email protected]>
1 parent b405009 commit a707c2c

4 files changed: 121 additions and 1 deletion

.github/wordlist.txt

Lines changed: 2 additions & 0 deletions
@@ -50,7 +50,9 @@ LitmusChaos
 localhost
 minikube
 MLOps
+mockoon
 modelmesh
+msec
 namespace
 namespaces
 NewRelic

docs/tutorials/abn/rewards.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
---
template: main.html
---

# A/B/n Experiments with Rewards

This tutorial describes how to use Iter8 to evaluate two or more versions of an application or ML model to identify the "best" version according to one or more reward metrics.

A reward metric is a metric that measures the benefit or profit of a version of an application or ML model. Reward metrics are usually application or model specific. User engagement, sales, and net profit are examples.

## Assumptions

We assume that you have deployed multiple versions of an application (or ML model) with the following characteristics:

- There is a way to route user traffic to the deployed versions. This might be done using the Iter8 SDK, the Iter8 traffic control features, or some other mechanism.
- Metrics, including reward metrics, are being exported to a metrics store such as Prometheus.
- Metrics can be retrieved from the metrics store by application (model) version.

In this tutorial, we mock a Prometheus service and demonstrate how to write an Iter8 experiment that evaluates reward metrics.

## Mock Prometheus

For simplicity, we use [mockoon](https://mockoon.com/) to create a mocked Prometheus service instead of deploying Prometheus itself:

```shell
kubectl create deploy prometheus-mock \
  --image mockoon/cli:latest \
  --port 9090 \
  -- mockoon-cli start --daemon-off \
  --port 9090 \
  --data https://raw.githubusercontent.com/kalantar/docs/rewards/samples/abn/model-prometheus-abn-tutorial.json
kubectl expose deploy prometheus-mock --port 9090
```
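
One way to check that the mock is answering is to forward the service port locally (for example with `kubectl port-forward svc/prometheus-mock 9090:9090`) and send it a Prometheus-style instant query. The sketch below is a minimal illustration, not part of the tutorial's required steps. The local base URL and the helper names are assumptions; the `api/v1/query` path comes from the Mockoon configuration added in this commit.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the mock is reachable locally, e.g. after
#   kubectl port-forward svc/prometheus-mock 9090:9090
BASE_URL = "http://localhost:9090"


def build_query_url(promql: str, base_url: str = BASE_URL) -> str:
    """Return the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


def fetch(promql: str, base_url: str = BASE_URL) -> dict:
    """Send the query and decode the JSON body (needs the port-forward)."""
    url = build_query_url(promql, base_url)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

The mock selects a response by matching regular expressions against the `query` parameter, so a hypothetical call such as `fetch('sum(profit_sum{mm_vmodel_id="wisdom-1"})')` only needs to mention the metric name and the version.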

## Define template

Create a [_provider specification_](../../user-guide/tasks/custommetrics.md#provider-spec) that describes how Iter8 should fetch each metric value from the metrics store. The specification provides information about the provider URL, the HTTP method to be used, and any common headers. Furthermore, for each metric, there is:

- metadata, such as name, type and description,
- HTTP query parameters, and
- a jq expression describing how to extract the metric value from the response.
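
The role of the jq expression is easiest to see against a concrete response. The sketch below applies the Python equivalent of `.data.result[0].value[1] | tonumber`, an expression suited to Prometheus instant-query results, to a sample response. The sample numbers and the function name are made up for illustration; Iter8 itself evaluates the jq expression, not Python.

```python
import json

# A response in the shape of a Prometheus instant-query result.
# The timestamp and value are invented for this example.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {}, "value": [1700000000, "42.5"]}
    ]
  }
}
"""


def extract_value(response_text: str) -> float:
    """Mirror the jq expression `.data.result[0].value[1] | tonumber`."""
    body = json.loads(response_text)
    # value is a [timestamp, "number-as-string"] pair; take the string
    # and convert it, which is what `tonumber` does in jq.
    return float(body["data"]["result"][0]["value"][1])


print(extract_value(SAMPLE_RESPONSE))
```

Prometheus returns sample values as strings, which is why the conversion step is needed.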

For example, a specification for the mean latency metric from Prometheus can look like the following:

```yaml
metric:
- name: latency-mean
  type: gauge
  description: |
    Mean latency
  params:
  - name: query
    value: |
      (sum(last_over_time(revision_app_request_latencies_sum{
        {{- template "labels" . }}
      }[{{ .elapsedTimeSeconds }}s])) or on() vector(0))/(sum(last_over_time(revision_app_request_latencies_count{
        {{- template "labels" . }}
      }[{{ .elapsedTimeSeconds }}s])) or on() vector(0))
  jqExpression: .data.result[0].value[1] | tonumber
```

Note that the template is parameterized: values such as the labels and `elapsedTimeSeconds` are provided by the Iter8 experiment at run time.
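
Parameterization can be pictured as plain string substitution. The sketch below uses Python's `string.Template` as a simplified stand-in for the Go template syntax in the query shown above; it is not how Iter8 renders templates. The output format of the `labels` template is an assumption, and the label names and values are the ones passed in this tutorial's launch command.

```python
from string import Template

# Simplified stand-in for the Go template: $labels replaces the
# `labels` template and ${elapsed} replaces .elapsedTimeSeconds.
QUERY = Template(
    "(sum(last_over_time(revision_app_request_latencies_sum"
    "{$labels}[${elapsed}s])) or on() vector(0))"
    "/(sum(last_over_time(revision_app_request_latencies_count"
    "{$labels}[${elapsed}s])) or on() vector(0))"
)


def render_labels(labels: dict) -> str:
    """Format labels as a PromQL selector body (assumed output format)."""
    return ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))


def render_query(common: dict, per_version: dict, elapsed_seconds: int) -> str:
    """Merge common and per-version labels, then fill in the query."""
    labels = {**common, **per_version}
    return QUERY.substitute(labels=render_labels(labels), elapsed=elapsed_seconds)


# One query per version: common labels plus the version-specific label.
for version in ({"mm_vmodel_id": "wisdom-1"}, {"mm_vmodel_id": "wisdom-2"}):
    print(render_query({"model_name": "wisdom"}, version, 60))
```

Each version gets its own rendered query, which is what lets the metrics be retrieved per version.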

A sample provider specification for Prometheus is provided [here](https://gist.githubusercontent.com/kalantar/80c9efc0fd4cc34572d893cc82bdc4d2/raw/f3629aa62cdc9fd7e39ee2b6b113a8bf7b6b4463/model-prometheus-abn-tutorial.tpl).

It describes the following metrics:

- request-count
- latency-mean
- profit-mean

## Launch experiment

```shell
iter8 k launch \
  --set "tasks={custommetrics,assess}" \
  --set custommetrics.templates.model-prometheus="https://gist.githubusercontent.com/kalantar/80c9efc0fd4cc34572d893cc82bdc4d2/raw/f3629aa62cdc9fd7e39ee2b6b113a8bf7b6b4463/model-prometheus-abn-tutorial.tpl" \
  --set custommetrics.values.labels.model_name=wisdom \
  --set 'custommetrics.versionValues[0].labels.mm_vmodel_id=wisdom-1' \
  --set 'custommetrics.versionValues[1].labels.mm_vmodel_id=wisdom-2' \
  --set assess.SLOs.upper.model-prometheus/latency-mean=50 \
  --set "assess.rewards.max={model-prometheus/profit-mean}" \
  --set runner=cronjob \
  --set cronjobSchedule="*/1 * * * *"
```

This experiment executes in a [loop](../../user-guide/topics/parameters.md), once every minute. It uses the [`custommetrics` task](../../user-guide/tasks/custommetrics.md) to read metrics from the (mocked) Prometheus provider. Finally, the [`assess` task](../../user-guide/tasks/assess.md) verifies that the `latency-mean` is below 50 msec and identifies which version provides the greatest reward; that is, the greatest mean profit.
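
The two checks described above can be sketched in a few lines. This is only an illustration of the idea, not Iter8's implementation: the metric values are invented, the function names are hypothetical, and treating the upper limit as inclusive is an assumption.

```python
# Hypothetical metric values per version, keyed the way the launch
# command names them (provider/metric).
METRICS = {
    "wisdom-1": {
        "model-prometheus/latency-mean": 31.0,
        "model-prometheus/profit-mean": 44.0,
    },
    "wisdom-2": {
        "model-prometheus/latency-mean": 47.5,
        "model-prometheus/profit-mean": 61.0,
    },
}
UPPER_SLOS = {"model-prometheus/latency-mean": 50}
MAX_REWARDS = ["model-prometheus/profit-mean"]


def slos_satisfied(values: dict, upper: dict) -> bool:
    """True when every metric is at or below its upper limit.

    Whether the limit is inclusive in Iter8 is an assumption here.
    """
    return all(values[name] <= limit for name, limit in upper.items())


def best_by_reward(metrics: dict, reward: str) -> str:
    """Return the version with the largest value of the reward metric."""
    return max(metrics, key=lambda version: metrics[version][reward])


for version, values in METRICS.items():
    print(version, "SLOs satisfied:", slos_satisfied(values, UPPER_SLOS))
for reward in MAX_REWARDS:
    print(reward, "best version:", best_by_reward(METRICS, reward))
```

The sketch reports the SLO check and the reward comparison independently; how the experiment report combines them is best read from the report itself.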

## Inspect experiment report

=== "Text"
    ```shell
    iter8 k report
    ```
=== "HTML"
    ```shell
    iter8 k report -o html > report.html # view in a browser
    ```

Because the experiment loops, the reported results will change over time.

***

## Cleanup

Delete the experiment:

```shell
iter8 k delete
```

Terminate the mocked Prometheus service:

```shell
kubectl delete deploy/prometheus-mock svc/prometheus-mock
```

mkdocs.yml

Lines changed: 3 additions & 1 deletion
@@ -122,7 +122,9 @@ nav:
   - Load test gRPC with SLOs: tutorials/load-test-grpc.md
   - Load test multiple gRPC methods: tutorials/load-test-grpc-multiple.md
   - Chaos injection with SLOs: tutorials/chaos/slo-validation-chaos.md
-  - A/B experiments: tutorials/abn/abn.md
+  - A/B experiments:
+    - Iter8 SDK: tutorials/abn/abn.md
+    - Evaluating rewards: tutorials/abn/rewards.md
   - Automated experiments: tutorials/autox/autox.md
   - Custom metrics:
     - One version: tutorials/custom-metrics/one-version.md
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
{"uuid":"010a623b-dcbe-499c-a964-5501b725e663","lastMigration":25,"name":"Prometheus (model)","endpointPrefix":"api/v1/","latency":0,"port":9090,"hostname":"0.0.0.0","folders":[],"routes":[{"uuid":"387e3484-79f3-4844-8228-4cc2700a24d6","documentation":"","method":"get","endpoint":"query","responses":[{"uuid":"dc1c57ee-fe48-47f3-846e-8f67a9ac38e8","body":"{\n \"response\": \"wisdom-1: request-count\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 0 100 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-1: request-count","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_count","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-1","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"fa57be05-b2b1-4284-bf21-7d7a8fc3c779","body":"{\n \"response\": \"wisdom-1: request-count\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 0 100 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-2: request-count","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_count","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-2","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"2e36070b-162b-4af5-81c6-0df83ab2503c","body":"{\n \"response\": \"v1: latency-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n 
\"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ float 0 50 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-1: latency-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"model_request_count","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"\\)\\s*/\\s*\\(","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-1","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"9e7e7ef3-7aad-46bd-a469-2bed8c90917f","body":"{\n \"response\": \"v2: latency-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ float 0 50 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-2: latency-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"model_request_latencies_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"model_request_latencies_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"\\)\\s*/\\s*\\(","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-2","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"00e55214-d6f6-414a-8b52-10b202fef479","body":"{\n \"response\": \"v1: profit-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 10 80 }}\"\n ]\n }]\n 
}\n}","latency":0,"statusCode":200,"label":"wisdom-1: profit-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"profit_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-1","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"e2a07264-2c5e-4877-993b-750296a31dab","body":"{\n \"response\": \"v2: profit-mean\",\n \"status\":\"success\",\n \"data\": {\n \"resultType\": \"vector\",\n \"result\": [\n {\n \"metric\":{},\n \"value\": [\n {{ divide (now 'T') 1000 }},\n \"{{ int 5 100 }}\"\n ]\n }]\n }\n}","latency":0,"statusCode":200,"label":"wisdom-2: profit-mean","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[{"target":"query","modifier":"query","value":"profit_sum","invert":false,"operator":"regex"},{"target":"query","modifier":"query","value":"wisdom-2","invert":false,"operator":"regex"}],"rulesOperator":"AND","disableTemplating":false,"fallbackTo404":false,"default":false},{"uuid":"785190e8-3e45-4e7f-9352-fe8e06a4928b","body":"{\n \"response\": \"unable to identify query\"\n \"query\": \"{{ queryParam 'query' }}\",\n}","latency":0,"statusCode":400,"label":"unmatched 
query","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[],"rulesOperator":"OR","disableTemplating":false,"fallbackTo404":false,"default":true},{"uuid":"566f29dc-0bff-4fa9-8449-fb4b37e8f6df","body":"{}","latency":0,"statusCode":200,"label":"","headers":[],"bodyType":"INLINE","filePath":"","databucketID":"","sendFileAsBody":false,"rules":[],"rulesOperator":"OR","disableTemplating":false,"fallbackTo404":false,"default":false}],"enabled":true,"responseMode":null}],"rootChildren":[{"type":"route","uuid":"387e3484-79f3-4844-8228-4cc2700a24d6"}],"proxyMode":false,"proxyHost":"","proxyRemovePrefix":false,"tlsOptions":{"enabled":false,"type":"CERT","pfxPath":"","certPath":"","keyPath":"","caPath":"","passphrase":""},"cors":true,"headers":[{"key":"Content-Type","value":"application/json"}],"proxyReqHeaders":[{"key":"","value":""}],"proxyResHeaders":[{"key":"","value":""}],"data":[]}
