docs: add enterprise dashboard webhook trigger docs #2592
Merged
c1e8df4 Docs: add webhook triggers documentation and cross-references (v0y4g3r)
aa64a67 Document webhook triggers require dashboard v0.2.0-alpha.10 or later (v0y4g3r)
affc921 Docs: add webhook triggers to v1.1 sidebar (v0y4g3r)
faf38ee Docs: fix v1.1 console links (v0y4g3r)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| --- | ||
| keywords: [enterprise, management console, webhook triggers, resource usage, alerting, CPU, memory] | ||
| description: Configure enterprise dashboard webhook triggers to send notifications based on GreptimeDB cluster resource usage metrics. | ||
| --- | ||
|
|
||
| # Webhook Triggers | ||
|
|
||
| Webhook triggers monitor cluster resource usage metrics and send HTTP notifications when a configured threshold is reached. This enterprise-only feature requires an enterprise dashboard deployment running dashboard version `v0.2.0-alpha.10` or later. | ||
|
|
||
| Configure webhook triggers per provisioned instance under `settings.monitoring.webhook_triggers` in the dashboard apiserver configuration. Enabled webhook triggers require a metrics source, either `settings.monitoring.greptimedb.url` or `settings.monitoring.metrics.prometheus`. | ||
|
|
||
| ```yaml | ||
| provisionedInstances: | ||
| - name: mycluster | ||
| settings: | ||
| monitoring: | ||
| greptimedb: | ||
| url: http://monitoring-greptimedb:4000 | ||
| # Or use a Prometheus-compatible metrics source: | ||
| # metrics: | ||
| # prometheus: http://prometheus:9090 | ||
| webhook_triggers: | ||
| - name: high-datanode-memory | ||
| enabled: true | ||
| roles: [datanode] | ||
| metric: memory_usage_percent | ||
| operator: ">=" | ||
| threshold: 90 | ||
| cooldown_seconds: 300 | ||
| url: https://alerts.example.com/datanode-memory | ||
| headers: | ||
| Authorization: Bearer token | ||
| - name: high-frontend-cpu | ||
| enabled: true | ||
| roles: [frontend] | ||
| metric: cpu_usage_millicores | ||
| threshold: 1000 | ||
| cooldown_seconds: 600 | ||
| url: https://alerts.example.com/frontend-cpu | ||
| ``` | ||
|
|
||
| Webhook trigger configuration items: | ||
|
|
||
| - `name`: Trigger name. Required when `enabled` is `true`. The name must be unique within the instance and must not contain `/`. | ||
| - `enabled`: Enables or disables the trigger. | ||
| - `roles`: Optional role filter. Omit it or leave it empty to match all roles. Supported roles are `frontend`, `metasrv`, `datanode`, and `flownode`. | ||
| - `metric`: Resource usage metric to evaluate. Supported metrics are `memory_usage_percent`, `memory_usage_bytes`, `cpu_usage_percent`, and `cpu_usage_millicores`. | ||
| - `operator`: Comparison operator. The default and only supported value is `>=`. | ||
| - `threshold`: Threshold value. It must be greater than `0`. For percentage metrics, the threshold must also be less than or equal to `100`. | ||
| - `cooldown_seconds`: Minimum interval between repeated `firing` notifications for the same active alert. The default is `300` seconds. | ||
| - `url`: Webhook endpoint. Required when `enabled` is `true`; the URL must use `http://` or `https://`. | ||
| - `headers`: Optional custom HTTP headers, for example `Authorization`. The webhook client always sends `Content-Type: application/json`. | ||
|
|
||
| When a matching component's metric reaches or exceeds the threshold, the dashboard apiserver sends a `firing` payload. While the alert remains active, repeated `firing` payloads for the same instance, trigger, pod, and process start time are suppressed until `cooldown_seconds` elapses. When the metric drops below the threshold, the dashboard apiserver sends a `resolved` payload. | ||
|
|
||
| Webhook payloads use a fixed JSON schema and cannot be templated. A representative `firing` payload looks like this: | ||
|
|
||
| ```json | ||
| { | ||
| "status": "firing", | ||
| "trigger_name": "high-datanode-memory", | ||
| "metric": "memory_usage_percent", | ||
| "operator": ">=", | ||
| "threshold": 90, | ||
| "value": 91.2, | ||
| "instance": "ns_demo", | ||
| "cluster": "demo", | ||
| "namespace": "ns", | ||
| "pod": "demo-datanode-0", | ||
| "role": "datanode", | ||
| "app": "greptime-datanode", | ||
| "component_instance": "datanode-0", | ||
| "endpoint": "http://demo-datanode-0:4000", | ||
| "process_start_time_seconds": 1760000000, | ||
| "starts_at": "2026-06-23T10:00:00Z", | ||
| "ends_at": null, | ||
| "sent_at": "2026-06-23T10:00:00Z" | ||
| } | ||
| ``` | ||
|
|
||
| For `resolved` payloads, `status` is `resolved`, `value` is the below-threshold value that resolved the alert, and `ends_at` is set. | ||
|
|
||
| :::note | ||
| Webhook trigger state is kept in dashboard apiserver memory and is not durable. There is no durable retry queue. If the dashboard apiserver restarts, or if an instance, trigger, or pod series disappears, the corresponding state may be forgotten without sending `resolved`. Receivers that need hard guarantees should expire alerts on their side. | ||
| ::: | ||
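
The configuration rules listed in the file above can be checked before a config is deployed. The Python below is a minimal sketch of such a pre-flight check, not part of the dashboard: `validate_triggers` and its messages are hypothetical names, the input is assumed to be the already-parsed `webhook_triggers` list, and treating a missing `enabled` as disabled is an assumption the documentation does not state.

```python
from urllib.parse import urlparse

SUPPORTED_ROLES = {"frontend", "metasrv", "datanode", "flownode"}
SUPPORTED_METRICS = {
    "memory_usage_percent",
    "memory_usage_bytes",
    "cpu_usage_percent",
    "cpu_usage_millicores",
}


def validate_triggers(triggers):
    """Return a list of problems found; an empty list means no rule was violated."""
    problems = []
    seen_names = set()
    for index, trigger in enumerate(triggers):
        name = trigger.get("name") or ""
        label = name or f"trigger #{index}"
        enabled = bool(trigger.get("enabled"))  # assumption: missing means disabled

        if enabled and not name:
            problems.append(f"{label}: name is required when enabled")
        if "/" in name:
            problems.append(f"{label}: name must not contain '/'")
        if name and name in seen_names:
            problems.append(f"{label}: name must be unique within the instance")
        seen_names.add(name)

        for role in trigger.get("roles") or []:
            if role not in SUPPORTED_ROLES:
                problems.append(f"{label}: unsupported role {role!r}")

        metric = trigger.get("metric")
        if metric not in SUPPORTED_METRICS:
            problems.append(f"{label}: unsupported metric {metric!r}")

        # ">=" is the default and the only supported operator.
        if trigger.get("operator", ">=") != ">=":
            problems.append(f"{label}: operator must be '>='")

        threshold = trigger.get("threshold")
        if not isinstance(threshold, (int, float)) or threshold <= 0:
            problems.append(f"{label}: threshold must be greater than 0")
        elif str(metric).endswith("_percent") and threshold > 100:
            problems.append(f"{label}: percentage threshold must be <= 100")

        url = trigger.get("url") or ""
        if enabled and not url:
            problems.append(f"{label}: url is required when enabled")
        elif url and urlparse(url).scheme not in ("http", "https"):
            problems.append(f"{label}: url must use http:// or https://")
    return problems
```

Calling `validate_triggers` on the two triggers from the YAML example should return an empty list, since the second trigger relies on the default operator.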
85 changes: 85 additions & 0 deletions
...ocusaurus-plugin-content-docs/current/enterprise/console-ui/webhook-triggers.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| --- | ||
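| keywords: [enterprise, management console, webhook triggers, resource usage, alerting, CPU, memory] | ||
| description: Configure webhook triggers in the enterprise dashboard to send notifications based on GreptimeDB cluster resource usage metrics. | ||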
| --- | ||
|
|
||
| # Webhook Triggers | ||
|
|
||
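| Webhook triggers monitor cluster resource usage metrics and send HTTP notifications when a configured threshold is reached. This feature is enterprise-only and is available only when the deployed enterprise dashboard runs dashboard version `v0.2.0-alpha.10` or later. | ||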
|
|
||
| In the dashboard apiserver configuration, configure webhook triggers per provisioned instance under `settings.monitoring.webhook_triggers`. Enabling webhook triggers requires a metrics data source, namely `settings.monitoring.greptimedb.url` or `settings.monitoring.metrics.prometheus`. | ||
|
|
||
| ```yaml | ||
| provisionedInstances: | ||
| - name: mycluster | ||
| settings: | ||
| monitoring: | ||
| greptimedb: | ||
| url: http://monitoring-greptimedb:4000 | ||
| # Or use a Prometheus-compatible metrics source: | ||
| # metrics: | ||
| # prometheus: http://prometheus:9090 | ||
| webhook_triggers: | ||
| - name: high-datanode-memory | ||
| enabled: true | ||
| roles: [datanode] | ||
| metric: memory_usage_percent | ||
| operator: ">=" | ||
| threshold: 90 | ||
| cooldown_seconds: 300 | ||
| url: https://alerts.example.com/datanode-memory | ||
| headers: | ||
| Authorization: Bearer token | ||
| - name: high-frontend-cpu | ||
| enabled: true | ||
| roles: [frontend] | ||
| metric: cpu_usage_millicores | ||
| threshold: 1000 | ||
| cooldown_seconds: 600 | ||
| url: https://alerts.example.com/frontend-cpu | ||
| ``` | ||
|
|
||
| Webhook trigger configuration items: | ||
|
|
||
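| - `name`: Trigger name. Required when `enabled` is `true`. The name must be unique within an instance and must not contain `/`. | ||
| - `enabled`: Enables or disables the trigger. | ||
| - `roles`: Optional role filter. Omit it or leave it empty to match all roles. Supported roles are `frontend`, `metasrv`, `datanode`, and `flownode`. | ||
| - `metric`: Resource usage metric to check. Supported metrics are `memory_usage_percent`, `memory_usage_bytes`, `cpu_usage_percent`, and `cpu_usage_millicores`. | ||
| - `operator`: Comparison operator. The default, and currently the only supported value, is `>=`. | ||
| - `threshold`: Threshold value. It must be greater than `0`. For percentage metrics, the threshold must be less than or equal to `100`. | ||
| - `cooldown_seconds`: Minimum interval between repeated `firing` notifications for the same active alert. The default is `300` seconds. | ||
| - `url`: Webhook endpoint. Required when `enabled` is `true`, and it must use `http://` or `https://`. | ||
| - `headers`: Optional custom HTTP headers, for example `Authorization`. The webhook client always sends `Content-Type: application/json`. | ||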
|
|
||
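| When a matching component's metric crosses the threshold, the dashboard apiserver sends a `firing` payload. While the alert remains active, repeated `firing` payloads for the same instance, trigger, pod, and process start time are suppressed until `cooldown_seconds` has elapsed. When the metric drops below the threshold, the dashboard apiserver sends a `resolved` payload. | ||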
|
|
||
| Webhook payloads use a fixed JSON structure and do not yet support custom templates. Here is a representative `firing` payload: | ||
|
|
||
| ```json | ||
| { | ||
| "status": "firing", | ||
| "trigger_name": "high-datanode-memory", | ||
| "metric": "memory_usage_percent", | ||
| "operator": ">=", | ||
| "threshold": 90, | ||
| "value": 91.2, | ||
| "instance": "ns_demo", | ||
| "cluster": "demo", | ||
| "namespace": "ns", | ||
| "pod": "demo-datanode-0", | ||
| "role": "datanode", | ||
| "app": "greptime-datanode", | ||
| "component_instance": "datanode-0", | ||
| "endpoint": "http://demo-datanode-0:4000", | ||
| "process_start_time_seconds": 1760000000, | ||
| "starts_at": "2026-06-23T10:00:00Z", | ||
| "ends_at": null, | ||
| "sent_at": "2026-06-23T10:00:00Z" | ||
| } | ||
| ``` | ||
|
|
||
| For `resolved` payloads, `status` is `resolved`, `value` is the below-threshold metric value that resolved the alert, and `ends_at` is set. | ||
|
|
||
| :::note | ||
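| Webhook trigger state is kept in dashboard apiserver memory; it is not durable, and there is no durable retry queue. If the dashboard apiserver restarts, or if an instance, trigger, or pod metric series disappears, the corresponding state may be forgotten without a `resolved` being sent. Receivers that need strong guarantees should implement their own alert expiry. | ||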
| ::: |
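
The firing, cooldown, and resolved behavior described in this file can be pictured as a small in-memory state machine. The following Python is a sketch of that description only, not the dashboard apiserver's implementation: the `TriggerState` class is hypothetical, and whether a repeat is sent exactly at `cooldown_seconds` or strictly after it is an assumption.

```python
class TriggerState:
    """Tracks active alerts keyed by (instance, trigger, pod, process start time)."""

    def __init__(self):
        # key -> timestamp (seconds) of the last "firing" notification sent
        self._last_firing = {}

    def evaluate(self, key, value, threshold, cooldown_seconds, now):
        """Return "firing", "resolved", or None (nothing to send)."""
        if value >= threshold:
            last = self._last_firing.get(key)
            if last is None or now - last >= cooldown_seconds:
                self._last_firing[key] = now
                return "firing"
            return None  # still active, but inside the cooldown window
        if key in self._last_firing:
            # The metric dropped below the threshold for an active alert.
            del self._last_firing[key]
            return "resolved"
        return None
```

Because the state lives only in the `_last_firing` dictionary, losing the process loses every active alert, which mirrors the durability caveat in the note above.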
85 changes: 85 additions & 0 deletions
...aurus-plugin-content-docs/version-1.1/enterprise/console-ui/webhook-triggers.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| --- | ||
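| keywords: [enterprise, management console, webhook triggers, resource usage, alerting, CPU, memory] | ||
| description: Configure webhook triggers in the enterprise dashboard to send notifications based on GreptimeDB cluster resource usage metrics. | ||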
| --- | ||
|
|
||
| # Webhook Triggers | ||
|
|
||
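| Webhook triggers monitor cluster resource usage metrics and send HTTP notifications when a configured threshold is reached. This feature is enterprise-only and is available only when the deployed enterprise dashboard runs dashboard version `v0.2.0-alpha.10` or later. | ||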
|
|
||
| In the dashboard apiserver configuration, configure webhook triggers per provisioned instance under `settings.monitoring.webhook_triggers`. Enabling webhook triggers requires a metrics data source, namely `settings.monitoring.greptimedb.url` or `settings.monitoring.metrics.prometheus`. | ||
|
|
||
| ```yaml | ||
| provisionedInstances: | ||
| - name: mycluster | ||
| settings: | ||
| monitoring: | ||
| greptimedb: | ||
| url: http://monitoring-greptimedb:4000 | ||
| # Or use a Prometheus-compatible metrics source: | ||
| # metrics: | ||
| # prometheus: http://prometheus:9090 | ||
| webhook_triggers: | ||
| - name: high-datanode-memory | ||
| enabled: true | ||
| roles: [datanode] | ||
| metric: memory_usage_percent | ||
| operator: ">=" | ||
| threshold: 90 | ||
| cooldown_seconds: 300 | ||
| url: https://alerts.example.com/datanode-memory | ||
| headers: | ||
| Authorization: Bearer token | ||
| - name: high-frontend-cpu | ||
| enabled: true | ||
| roles: [frontend] | ||
| metric: cpu_usage_millicores | ||
| threshold: 1000 | ||
| cooldown_seconds: 600 | ||
| url: https://alerts.example.com/frontend-cpu | ||
| ``` | ||
|
|
||
| Webhook trigger configuration items: | ||
|
|
||
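| - `name`: Trigger name. Required when `enabled` is `true`. The name must be unique within an instance and must not contain `/`. | ||
| - `enabled`: Enables or disables the trigger. | ||
| - `roles`: Optional role filter. Omit it or leave it empty to match all roles. Supported roles are `frontend`, `metasrv`, `datanode`, and `flownode`. | ||
| - `metric`: Resource usage metric to check. Supported metrics are `memory_usage_percent`, `memory_usage_bytes`, `cpu_usage_percent`, and `cpu_usage_millicores`. | ||
| - `operator`: Comparison operator. The default, and currently the only supported value, is `>=`. | ||
| - `threshold`: Threshold value. It must be greater than `0`. For percentage metrics, the threshold must be less than or equal to `100`. | ||
| - `cooldown_seconds`: Minimum interval between repeated `firing` notifications for the same active alert. The default is `300` seconds. | ||
| - `url`: Webhook endpoint. Required when `enabled` is `true`, and it must use `http://` or `https://`. | ||
| - `headers`: Optional custom HTTP headers, for example `Authorization`. The webhook client always sends `Content-Type: application/json`. | ||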
|
|
||
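| When a matching component's metric crosses the threshold, the dashboard apiserver sends a `firing` payload. While the alert remains active, repeated `firing` payloads for the same instance, trigger, pod, and process start time are suppressed until `cooldown_seconds` has elapsed. When the metric drops below the threshold, the dashboard apiserver sends a `resolved` payload. | ||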
|
|
||
| Webhook payloads use a fixed JSON structure and do not yet support custom templates. Here is a representative `firing` payload: | ||
|
|
||
| ```json | ||
| { | ||
| "status": "firing", | ||
| "trigger_name": "high-datanode-memory", | ||
| "metric": "memory_usage_percent", | ||
| "operator": ">=", | ||
| "threshold": 90, | ||
| "value": 91.2, | ||
| "instance": "ns_demo", | ||
| "cluster": "demo", | ||
| "namespace": "ns", | ||
| "pod": "demo-datanode-0", | ||
| "role": "datanode", | ||
| "app": "greptime-datanode", | ||
| "component_instance": "datanode-0", | ||
| "endpoint": "http://demo-datanode-0:4000", | ||
| "process_start_time_seconds": 1760000000, | ||
| "starts_at": "2026-06-23T10:00:00Z", | ||
| "ends_at": null, | ||
| "sent_at": "2026-06-23T10:00:00Z" | ||
| } | ||
| ``` | ||
|
|
||
| For `resolved` payloads, `status` is `resolved`, `value` is the below-threshold metric value that resolved the alert, and `ends_at` is set. | ||
|
|
||
| :::note | ||
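| Webhook trigger state is kept in dashboard apiserver memory; it is not durable, and there is no durable retry queue. If the dashboard apiserver restarts, or if an instance, trigger, or pod metric series disappears, the corresponding state may be forgotten without a `resolved` being sent. Receivers that need strong guarantees should implement their own alert expiry. | ||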
| ::: |
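
Since the payload schema is fixed, a receiver can parse it with plain field access. The Python below is a minimal sketch of the receiving side, assuming the field names shown in the representative payload; `summarize` and `parse_timestamp` are hypothetical helpers, and the sketch deliberately leaves out the HTTP server so it can be dropped into any handler.

```python
import json
from datetime import datetime


def parse_timestamp(text):
    """Parse an RFC 3339 UTC timestamp such as 2026-06-23T10:00:00Z, or pass None through."""
    if text is None:
        return None
    # Replace the trailing "Z" so older Python versions accept the string too.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def summarize(body):
    """Turn a webhook request body (bytes or str) into a one-line summary."""
    payload = json.loads(body)
    status = payload["status"]
    if status not in ("firing", "resolved"):
        raise ValueError(f"unexpected status: {status!r}")

    line = (
        f'[{status}] {payload["trigger_name"]} on {payload["pod"]}: '
        f'{payload["metric"]}={payload["value"]} '
        f'(threshold {payload["operator"]} {payload["threshold"]})'
    )
    started = parse_timestamp(payload["starts_at"])
    ended = parse_timestamp(payload.get("ends_at"))
    if ended is not None:
        # Only resolved payloads carry ends_at, so the duration is known here.
        line += f", lasted {int((ended - started).total_seconds())}s"
    return line
```

A real receiver would also want to check whatever custom header the trigger sends (for example `Authorization`) before trusting the body.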
85 changes: 85 additions & 0 deletions
versioned_docs/version-1.1/enterprise/console-ui/webhook-triggers.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| --- | ||
| keywords: [enterprise, management console, webhook triggers, resource usage, alerting, CPU, memory] | ||
| description: Configure enterprise dashboard webhook triggers to send notifications based on GreptimeDB cluster resource usage metrics. | ||
| --- | ||
|
|
||
| # Webhook Triggers | ||
|
|
||
| Webhook triggers monitor cluster resource usage metrics and send HTTP notifications when a configured threshold is reached. This enterprise-only feature requires an enterprise dashboard deployment running dashboard version `v0.2.0-alpha.10` or later. | ||
|
|
||
| Configure webhook triggers per provisioned instance under `settings.monitoring.webhook_triggers` in the dashboard apiserver configuration. Enabled webhook triggers require a metrics source, either `settings.monitoring.greptimedb.url` or `settings.monitoring.metrics.prometheus`. | ||
|
|
||
| ```yaml | ||
| provisionedInstances: | ||
| - name: mycluster | ||
| settings: | ||
| monitoring: | ||
| greptimedb: | ||
| url: http://monitoring-greptimedb:4000 | ||
| # Or use a Prometheus-compatible metrics source: | ||
| # metrics: | ||
| # prometheus: http://prometheus:9090 | ||
| webhook_triggers: | ||
| - name: high-datanode-memory | ||
| enabled: true | ||
| roles: [datanode] | ||
| metric: memory_usage_percent | ||
| operator: ">=" | ||
| threshold: 90 | ||
| cooldown_seconds: 300 | ||
| url: https://alerts.example.com/datanode-memory | ||
| headers: | ||
| Authorization: Bearer token | ||
| - name: high-frontend-cpu | ||
| enabled: true | ||
| roles: [frontend] | ||
| metric: cpu_usage_millicores | ||
| threshold: 1000 | ||
| cooldown_seconds: 600 | ||
| url: https://alerts.example.com/frontend-cpu | ||
| ``` | ||
|
|
||
| Webhook trigger configuration items: | ||
|
|
||
| - `name`: Trigger name. Required when `enabled` is `true`. The name must be unique within the instance and must not contain `/`. | ||
| - `enabled`: Enables or disables the trigger. | ||
| - `roles`: Optional role filter. Omit it or leave it empty to match all roles. Supported roles are `frontend`, `metasrv`, `datanode`, and `flownode`. | ||
| - `metric`: Resource usage metric to evaluate. Supported metrics are `memory_usage_percent`, `memory_usage_bytes`, `cpu_usage_percent`, and `cpu_usage_millicores`. | ||
| - `operator`: Comparison operator. The default and only supported value is `>=`. | ||
| - `threshold`: Threshold value. It must be greater than `0`. For percentage metrics, the threshold must also be less than or equal to `100`. | ||
| - `cooldown_seconds`: Minimum interval between repeated `firing` notifications for the same active alert. The default is `300` seconds. | ||
| - `url`: Webhook endpoint. Required when `enabled` is `true`; the URL must use `http://` or `https://`. | ||
| - `headers`: Optional custom HTTP headers, for example `Authorization`. The webhook client always sends `Content-Type: application/json`. | ||
|
|
||
| When a matching component's metric reaches or exceeds the threshold, the dashboard apiserver sends a `firing` payload. While the alert remains active, repeated `firing` payloads for the same instance, trigger, pod, and process start time are suppressed until `cooldown_seconds` elapses. When the metric drops below the threshold, the dashboard apiserver sends a `resolved` payload. | ||
|
|
||
| Webhook payloads use a fixed JSON schema and cannot be templated. A representative `firing` payload looks like this: | ||
|
|
||
| ```json | ||
| { | ||
| "status": "firing", | ||
| "trigger_name": "high-datanode-memory", | ||
| "metric": "memory_usage_percent", | ||
| "operator": ">=", | ||
| "threshold": 90, | ||
| "value": 91.2, | ||
| "instance": "ns_demo", | ||
| "cluster": "demo", | ||
| "namespace": "ns", | ||
| "pod": "demo-datanode-0", | ||
| "role": "datanode", | ||
| "app": "greptime-datanode", | ||
| "component_instance": "datanode-0", | ||
| "endpoint": "http://demo-datanode-0:4000", | ||
| "process_start_time_seconds": 1760000000, | ||
| "starts_at": "2026-06-23T10:00:00Z", | ||
| "ends_at": null, | ||
| "sent_at": "2026-06-23T10:00:00Z" | ||
| } | ||
| ``` | ||
|
|
||
| For `resolved` payloads, `status` is `resolved`, `value` is the below-threshold value that resolved the alert, and `ends_at` is set. | ||
|
|
||
| :::note | ||
| Webhook trigger state is kept in dashboard apiserver memory and is not durable. There is no durable retry queue. If the dashboard apiserver restarts, or if an instance, trigger, or pod series disappears, the corresponding state may be forgotten without sending `resolved`. Receivers that need hard guarantees should expire alerts on their side. | ||
| ::: |
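
The note's advice that receivers expire alerts on their side could look roughly like this. The Python below is a hedged sketch, not a recommended design: `AlertStore` and `ttl_seconds` are hypothetical names, the alert key reuses the fields the documentation lists for suppression, and choosing a TTL somewhat larger than the trigger's `cooldown_seconds` is an assumption based on repeated `firing` payloads arriving about once per cooldown while an alert is active.

```python
class AlertStore:
    """Receiver-side view of active alerts that forgets ones that go quiet."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._last_seen = {}  # alert key -> time the last "firing" payload arrived

    @staticmethod
    def _key(payload):
        return (
            payload["instance"],
            payload["trigger_name"],
            payload["pod"],
            payload["process_start_time_seconds"],
        )

    def handle(self, payload, now):
        """Record a firing payload or clear the alert on a resolved one."""
        key = self._key(payload)
        if payload["status"] == "firing":
            self._last_seen[key] = now
        elif payload["status"] == "resolved":
            self._last_seen.pop(key, None)

    def expire(self, now):
        """Drop alerts not refreshed within the TTL and return their keys."""
        stale = [k for k, seen in self._last_seen.items()
                 if now - seen > self.ttl_seconds]
        for key in stale:
            del self._last_seen[key]
        return stale

    def active(self):
        return sorted(self._last_seen)
```

Running `expire` on a timer covers the case where the apiserver restarts and the matching `resolved` payload never arrives.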