2 changes: 1 addition & 1 deletion docs/enterprise/console-ui.md
@@ -90,4 +90,4 @@ The **Metrics** page shows cluster operational metrics in a single built-in moni

### CPU and Memory Profiling

- The sidebar provides **Memory Profile** and **CPU Profile** entries for continuous profiling of GreptimeDB components. For configuration and usage, see [Continuous Profiling](./console-ui/continuous-profiling.md).
+ The sidebar provides **Memory Profile** and **CPU Profile** entries for continuous profiling of GreptimeDB components. For configuration and usage, see [Continuous Profiling](./console-ui/continuous-profiling.md). The enterprise dashboard also supports [Webhook Triggers](./console-ui/webhook-triggers.md) for sending notifications based on cluster resource usage metrics.
85 changes: 85 additions & 0 deletions docs/enterprise/console-ui/webhook-triggers.md
@@ -0,0 +1,85 @@
---
keywords: [enterprise, management console, webhook triggers, resource usage, alerting, CPU, memory]
description: Configure enterprise dashboard webhook triggers to send notifications based on GreptimeDB cluster resource usage metrics.
---

# Webhook Triggers

Webhook triggers monitor cluster resource usage metrics and send HTTP notifications when a configured threshold is reached. This feature is enterprise-only and requires enterprise dashboard version `v0.2.0-alpha.10` or later.

Configure webhook triggers per provisioned instance under `settings.monitoring.webhook_triggers` in the dashboard apiserver configuration. Enabled webhook triggers require a metrics source, either `settings.monitoring.greptimedb.url` or `settings.monitoring.metrics.prometheus`.

```yaml
provisionedInstances:
  - name: mycluster
    settings:
      monitoring:
        greptimedb:
          url: http://monitoring-greptimedb:4000
        # Or use a Prometheus-compatible metrics source:
        # metrics:
        #   prometheus: http://prometheus:9090
        webhook_triggers:
          - name: high-datanode-memory
            enabled: true
            roles: [datanode]
            metric: memory_usage_percent
            operator: ">="
            threshold: 90
            cooldown_seconds: 300
            url: https://alerts.example.com/datanode-memory
            headers:
              Authorization: Bearer token
          - name: high-frontend-cpu
            enabled: true
            roles: [frontend]
            metric: cpu_usage_millicores
            threshold: 1000
            cooldown_seconds: 600
            url: https://alerts.example.com/frontend-cpu
```

Webhook trigger configuration items:

- `name`: Trigger name. Required when `enabled` is `true`. The name must be unique within the instance and must not contain `/`.
- `enabled`: Enables or disables the trigger.
- `roles`: Optional role filter. Omit it or leave it empty to match all roles. Supported roles are `frontend`, `metasrv`, `datanode`, and `flownode`.
- `metric`: Resource usage metric to evaluate. Supported metrics are `memory_usage_percent`, `memory_usage_bytes`, `cpu_usage_percent`, and `cpu_usage_millicores`.
- `operator`: Comparison operator. The default and only supported value is `>=`.
- `threshold`: Threshold value. It must be greater than `0`. For percentage metrics, the threshold must also be less than or equal to `100`.
- `cooldown_seconds`: Minimum interval between repeated `firing` notifications for the same active alert. The default is `300` seconds.
- `url`: Webhook endpoint. Required when `enabled` is `true`; the URL must use `http://` or `https://`.
- `headers`: Optional custom HTTP headers, for example `Authorization`. The webhook client always sends `Content-Type: application/json`.
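
The rules in the list above can be checked before a configuration is deployed. The following Python sketch is a minimal, unofficial illustration: `validate_triggers` and its messages are hypothetical names, not part of the dashboard apiserver. It assumes that only enabled triggers are validated and that a missing `enabled` counts as disabled, neither of which the documentation confirms.

```python
SUPPORTED_ROLES = {"frontend", "metasrv", "datanode", "flownode"}
SUPPORTED_METRICS = {
    "memory_usage_percent",
    "memory_usage_bytes",
    "cpu_usage_percent",
    "cpu_usage_millicores",
}


def validate_triggers(triggers):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper that mirrors the documented rules. It is not the
    dashboard apiserver's own validation code.
    """
    problems = []
    seen_names = set()
    for index, trigger in enumerate(triggers):
        # Assumption: disabled triggers (or ones without `enabled`) are skipped.
        if not trigger.get("enabled", False):
            continue
        name = trigger.get("name") or ""
        label = name or f"trigger #{index}"
        if not name:
            problems.append(f"{label}: name is required")
        elif "/" in name:
            problems.append(f"{label}: name must not contain '/'")
        elif name in seen_names:
            problems.append(f"{label}: name must be unique within the instance")
        seen_names.add(name)

        # An omitted or empty role list matches all roles, so only listed roles are checked.
        for role in trigger.get("roles") or []:
            if role not in SUPPORTED_ROLES:
                problems.append(f"{label}: unsupported role {role!r}")

        metric = trigger.get("metric")
        if metric not in SUPPORTED_METRICS:
            problems.append(f"{label}: unsupported metric {metric!r}")

        # `>=` is the default and the only supported operator.
        if trigger.get("operator", ">=") != ">=":
            problems.append(f"{label}: operator must be '>='")

        threshold = trigger.get("threshold")
        is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
        if not is_number or threshold <= 0:
            problems.append(f"{label}: threshold must be greater than 0")
        elif metric in ("memory_usage_percent", "cpu_usage_percent") and threshold > 100:
            problems.append(f"{label}: percentage threshold must be <= 100")

        url = trigger.get("url") or ""
        if not url.startswith(("http://", "https://")):
            problems.append(f"{label}: url must use http:// or https://")
    return problems
```

Passing the parsed `webhook_triggers` list of one provisioned instance to `validate_triggers` returns an empty list when every enabled trigger satisfies the documented constraints.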

When a matching component reaches or exceeds the threshold, the dashboard apiserver sends a `firing` payload. While the alert remains active, repeated `firing` payloads for the same instance, trigger, pod, and process start time are suppressed until `cooldown_seconds` elapses. When the metric drops below the threshold, the dashboard apiserver sends a `resolved` payload.
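
The notification behavior described above can be pictured as a small state machine keyed by instance, trigger, pod, and process start time. The Python sketch below is an illustration under stated assumptions, not the apiserver's implementation: the `evaluate` function, the in-memory `state` dictionary, and the choice to re-send exactly when the elapsed time is at least `cooldown_seconds` are all hypothetical.

```python
def evaluate(state, key, value, threshold, now, cooldown_seconds=300):
    """Decide what to send for one metric sample.

    `state` maps an alert key to the time (in seconds) of the last `firing`
    notification. Returns "firing", "resolved", or None when nothing is sent.
    """
    last_sent = state.get(key)
    if value >= threshold:
        # Active alert: notify on first breach, then again once the cooldown has elapsed.
        if last_sent is None or now - last_sent >= cooldown_seconds:
            state[key] = now
            return "firing"
        return None
    if last_sent is not None:
        # The metric dropped below the threshold for a previously firing alert.
        del state[key]
        return "resolved"
    return None
```

Because the key includes the process start time, a restarted pod would be tracked as a new alert in this sketch.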

Webhook payloads use a fixed JSON schema and cannot be templated. A representative `firing` payload looks like this:

```json
{
  "status": "firing",
  "trigger_name": "high-datanode-memory",
  "metric": "memory_usage_percent",
  "operator": ">=",
  "threshold": 90,
  "value": 91.2,
  "instance": "ns_demo",
  "cluster": "demo",
  "namespace": "ns",
  "pod": "demo-datanode-0",
  "role": "datanode",
  "app": "greptime-datanode",
  "component_instance": "datanode-0",
  "endpoint": "http://demo-datanode-0:4000",
  "process_start_time_seconds": 1760000000,
  "starts_at": "2026-06-23T10:00:00Z",
  "ends_at": null,
  "sent_at": "2026-06-23T10:00:00Z"
}
```

For `resolved` payloads, `status` is `resolved`, `value` is the below-threshold value that resolved the alert, and `ends_at` is set.
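
On the receiving side, the fixed schema makes the payload straightforward to consume. The following Python sketch shows one way a receiver might turn a request body into a one-line summary; `summarize_payload` is a hypothetical helper, and it relies only on the fields shown in the representative payload.

```python
import json


def summarize_payload(body):
    """Build a short human-readable summary from a webhook request body."""
    payload = json.loads(body)
    status = payload["status"]
    if status not in ("firing", "resolved"):
        raise ValueError(f"unexpected status: {status!r}")
    summary = (
        f"[{status}] {payload['trigger_name']} on {payload['pod']}: "
        f"{payload['metric']}={payload['value']} "
        f"(threshold {payload['operator']} {payload['threshold']})"
    )
    if status == "resolved":
        # `ends_at` is only set on resolved payloads.
        summary += f", ended at {payload['ends_at']}"
    return summary
```

For the representative `firing` payload, this returns `[firing] high-datanode-memory on demo-datanode-0: memory_usage_percent=91.2 (threshold >= 90)`.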

:::note
Webhook trigger state is kept in dashboard apiserver memory and is not durable. There is no durable retry queue. If the dashboard apiserver restarts, or if an instance, trigger, or pod series disappears, the corresponding state may be forgotten without sending `resolved`. Receivers that need hard guarantees should expire alerts on their side.
:::
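
Since a `resolved` payload can be lost, a receiver may choose to expire alerts that have gone quiet. The Python sketch below assumes the receiver records the `sent_at` of the latest `firing` payload per alert and treats an alert as stale after a silence longer than some multiple of the trigger's `cooldown_seconds`. The helper names and the idea of keying by a receiver-chosen alert key are hypothetical, and the sketch assumes `firing` payloads keep arriving roughly once per cooldown while the alert is active, which the cooldown description suggests but does not guarantee.

```python
from datetime import datetime, timedelta


def parse_timestamp(text):
    """Parse a UTC timestamp such as `sent_at` into a timezone-aware datetime."""
    # Replace the trailing Z so older Python versions of fromisoformat accept it.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def expire_stale_alerts(active, now, max_silence):
    """Remove and return the keys whose last `firing` is older than max_silence.

    `active` maps a receiver-chosen alert key to the `sent_at` time of the
    most recent `firing` payload seen for that alert.
    """
    stale = [key for key, last_seen in active.items() if now - last_seen > max_silence]
    for key in stale:
        del active[key]
    return stale
```

With the default cooldown of `300` seconds, a `max_silence` of `timedelta(seconds=900)` would tolerate two missed notifications before an alert is expired.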
@@ -90,4 +90,4 @@ The GreptimeDB Enterprise Management Console, on top of the open-source [GreptimeDB Console](/getting-sta

### CPU and Memory Profiling

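- The sidebar provides **Memory Profile** and **CPU Profile** entries for continuous profiling of GreptimeDB components. For configuration and usage, see [Continuous Profiling](./console-ui/continuous-profiling.md).
+ The sidebar provides **Memory Profile** and **CPU Profile** entries for continuous profiling of GreptimeDB components. For configuration and usage, see [Continuous Profiling](./console-ui/continuous-profiling.md). The enterprise dashboard also supports [Webhook Triggers](./console-ui/webhook-triggers.md), which send notifications based on cluster resource usage metrics.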
@@ -0,0 +1,85 @@
---
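keywords: [enterprise, management console, webhook triggers, resource usage, alerting, CPU, memory]
description: Configure webhook triggers in the enterprise dashboard to send notifications based on GreptimeDB cluster resource usage metrics.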
---

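# Webhook Triggers

Webhook triggers monitor cluster resource usage metrics and send HTTP notifications when a configured threshold is reached. This feature is enterprise-only and is available only when the enterprise dashboard is deployed at dashboard version `v0.2.0-alpha.10` or later.

In the dashboard apiserver configuration, configure webhook triggers per provisioned instance under `settings.monitoring.webhook_triggers`. Enabled webhook triggers require a metrics source, either `settings.monitoring.greptimedb.url` or `settings.monitoring.metrics.prometheus`.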

```yaml
provisionedInstances:
  - name: mycluster
    settings:
      monitoring:
        greptimedb:
          url: http://monitoring-greptimedb:4000
        # Or use a Prometheus-compatible metrics source:
        # metrics:
        #   prometheus: http://prometheus:9090
        webhook_triggers:
          - name: high-datanode-memory
            enabled: true
            roles: [datanode]
            metric: memory_usage_percent
            operator: ">="
            threshold: 90
            cooldown_seconds: 300
            url: https://alerts.example.com/datanode-memory
            headers:
              Authorization: Bearer token
          - name: high-frontend-cpu
            enabled: true
            roles: [frontend]
            metric: cpu_usage_millicores
            threshold: 1000
            cooldown_seconds: 600
            url: https://alerts.example.com/frontend-cpu
```

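Webhook trigger configuration items:

- `name`: Trigger name. Required when `enabled` is `true`. The name must be unique within the instance and must not contain `/`.
- `enabled`: Enables or disables the trigger.
- `roles`: Optional role filter. Omit it or leave it empty to match all roles. Supported roles are `frontend`, `metasrv`, `datanode`, and `flownode`.
- `metric`: Resource usage metric to evaluate. Supported metrics are `memory_usage_percent`, `memory_usage_bytes`, `cpu_usage_percent`, and `cpu_usage_millicores`.
- `operator`: Comparison operator. The default and currently the only supported value is `>=`.
- `threshold`: Threshold value. It must be greater than `0`. For percentage metrics, the threshold must be less than or equal to `100`.
- `cooldown_seconds`: Minimum interval between repeated `firing` notifications for the same active alert. The default is `300` seconds.
- `url`: Webhook endpoint. Required when `enabled` is `true`; the URL must use `http://` or `https://`.
- `headers`: Optional custom HTTP headers, for example `Authorization`. The webhook client always sends `Content-Type: application/json`.

When a matching component's metric crosses the threshold, the dashboard apiserver sends a `firing` payload. While the alert remains active, repeated `firing` payloads for the same instance, trigger, pod, and process start time are suppressed until `cooldown_seconds` has elapsed. When the metric drops below the threshold, the dashboard apiserver sends a `resolved` payload.

Webhook payloads use a fixed JSON structure; custom templates are not currently supported. A representative `firing` payload looks like this: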

```json
{
  "status": "firing",
  "trigger_name": "high-datanode-memory",
  "metric": "memory_usage_percent",
  "operator": ">=",
  "threshold": 90,
  "value": 91.2,
  "instance": "ns_demo",
  "cluster": "demo",
  "namespace": "ns",
  "pod": "demo-datanode-0",
  "role": "datanode",
  "app": "greptime-datanode",
  "component_instance": "datanode-0",
  "endpoint": "http://demo-datanode-0:4000",
  "process_start_time_seconds": 1760000000,
  "starts_at": "2026-06-23T10:00:00Z",
  "ends_at": null,
  "sent_at": "2026-06-23T10:00:00Z"
}
```

For `resolved` payloads, `status` is `resolved`, `value` is the below-threshold value that resolved the alert, and `ends_at` is set.

:::note
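Webhook trigger state is kept in dashboard apiserver memory and is not durable, and there is no durable retry queue. If the dashboard apiserver restarts, or if an instance, trigger, or pod metric series disappears, the corresponding state may be forgotten without a `resolved` payload being sent. Receivers that need hard guarantees should implement their own alert expiry.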
:::
@@ -59,4 +59,4 @@ The GreptimeDB Enterprise Management Console is an enhanced version of the standard GreptimeDB dashboard

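## Continuous Profiling

- The Management Console supports [Continuous Profiling](./continuous-profiling) for collecting and analyzing CPU and memory profiles of GreptimeDB components.
+ The Management Console supports [Continuous Profiling](./console-ui/continuous-profiling.md) for collecting and analyzing CPU and memory profiles of GreptimeDB components. It also supports [Webhook Triggers](./console-ui/webhook-triggers.md), which send notifications based on cluster resource usage metrics.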
@@ -0,0 +1,85 @@
---
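keywords: [enterprise, management console, webhook triggers, resource usage, alerting, CPU, memory]
description: Configure webhook triggers in the enterprise dashboard to send notifications based on GreptimeDB cluster resource usage metrics.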
---

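# Webhook Triggers

Webhook triggers monitor cluster resource usage metrics and send HTTP notifications when a configured threshold is reached. This feature is enterprise-only and is available only when the enterprise dashboard is deployed at dashboard version `v0.2.0-alpha.10` or later.

In the dashboard apiserver configuration, configure webhook triggers per provisioned instance under `settings.monitoring.webhook_triggers`. Enabled webhook triggers require a metrics source, either `settings.monitoring.greptimedb.url` or `settings.monitoring.metrics.prometheus`.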

```yaml
provisionedInstances:
  - name: mycluster
    settings:
      monitoring:
        greptimedb:
          url: http://monitoring-greptimedb:4000
        # Or use a Prometheus-compatible metrics source:
        # metrics:
        #   prometheus: http://prometheus:9090
        webhook_triggers:
          - name: high-datanode-memory
            enabled: true
            roles: [datanode]
            metric: memory_usage_percent
            operator: ">="
            threshold: 90
            cooldown_seconds: 300
            url: https://alerts.example.com/datanode-memory
            headers:
              Authorization: Bearer token
          - name: high-frontend-cpu
            enabled: true
            roles: [frontend]
            metric: cpu_usage_millicores
            threshold: 1000
            cooldown_seconds: 600
            url: https://alerts.example.com/frontend-cpu
```

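Webhook trigger configuration items:

- `name`: Trigger name. Required when `enabled` is `true`. The name must be unique within the instance and must not contain `/`.
- `enabled`: Enables or disables the trigger.
- `roles`: Optional role filter. Omit it or leave it empty to match all roles. Supported roles are `frontend`, `metasrv`, `datanode`, and `flownode`.
- `metric`: Resource usage metric to evaluate. Supported metrics are `memory_usage_percent`, `memory_usage_bytes`, `cpu_usage_percent`, and `cpu_usage_millicores`.
- `operator`: Comparison operator. The default and currently the only supported value is `>=`.
- `threshold`: Threshold value. It must be greater than `0`. For percentage metrics, the threshold must be less than or equal to `100`.
- `cooldown_seconds`: Minimum interval between repeated `firing` notifications for the same active alert. The default is `300` seconds.
- `url`: Webhook endpoint. Required when `enabled` is `true`; the URL must use `http://` or `https://`.
- `headers`: Optional custom HTTP headers, for example `Authorization`. The webhook client always sends `Content-Type: application/json`.

When a matching component's metric crosses the threshold, the dashboard apiserver sends a `firing` payload. While the alert remains active, repeated `firing` payloads for the same instance, trigger, pod, and process start time are suppressed until `cooldown_seconds` has elapsed. When the metric drops below the threshold, the dashboard apiserver sends a `resolved` payload.

Webhook payloads use a fixed JSON structure; custom templates are not currently supported. A representative `firing` payload looks like this: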

```json
{
  "status": "firing",
  "trigger_name": "high-datanode-memory",
  "metric": "memory_usage_percent",
  "operator": ">=",
  "threshold": 90,
  "value": 91.2,
  "instance": "ns_demo",
  "cluster": "demo",
  "namespace": "ns",
  "pod": "demo-datanode-0",
  "role": "datanode",
  "app": "greptime-datanode",
  "component_instance": "datanode-0",
  "endpoint": "http://demo-datanode-0:4000",
  "process_start_time_seconds": 1760000000,
  "starts_at": "2026-06-23T10:00:00Z",
  "ends_at": null,
  "sent_at": "2026-06-23T10:00:00Z"
}
```

For `resolved` payloads, `status` is `resolved`, `value` is the below-threshold value that resolved the alert, and `ends_at` is set.

:::note
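Webhook trigger state is kept in dashboard apiserver memory and is not durable, and there is no durable retry queue. If the dashboard apiserver restarts, or if an instance, trigger, or pod metric series disappears, the corresponding state may be forgotten without a `resolved` payload being sent. Receivers that need hard guarantees should implement their own alert expiry.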
:::
1 change: 1 addition & 0 deletions sidebars.ts
@@ -506,6 +506,7 @@ const sidebars: SidebarsConfig = {
items: [
'enterprise/console-ui',
'enterprise/console-ui/continuous-profiling',
'enterprise/console-ui/webhook-triggers',
],
},
{
2 changes: 1 addition & 1 deletion versioned_docs/version-1.1/enterprise/console-ui.md
@@ -59,4 +59,4 @@ Displays long-running SQL and PromQL queries with detailed execution time and qu

## Continuous Profiling

- The Management Console supports [Continuous Profiling](./continuous-profiling) for capturing and analyzing GreptimeDB component CPU and memory profiles.
+ The Management Console supports [Continuous Profiling](./console-ui/continuous-profiling.md) for capturing and analyzing GreptimeDB component CPU and memory profiles. It also supports [Webhook Triggers](./console-ui/webhook-triggers.md) for sending notifications based on cluster resource usage metrics.
@@ -0,0 +1,85 @@
---
keywords: [enterprise, management console, webhook triggers, resource usage, alerting, CPU, memory]
description: Configure enterprise dashboard webhook triggers to send notifications based on GreptimeDB cluster resource usage metrics.
---

# Webhook Triggers

Webhook triggers monitor cluster resource usage metrics and send HTTP notifications when a configured threshold is reached. This feature is enterprise-only and requires enterprise dashboard version `v0.2.0-alpha.10` or later.

Configure webhook triggers per provisioned instance under `settings.monitoring.webhook_triggers` in the dashboard apiserver configuration. Enabled webhook triggers require a metrics source, either `settings.monitoring.greptimedb.url` or `settings.monitoring.metrics.prometheus`.

```yaml
provisionedInstances:
  - name: mycluster
    settings:
      monitoring:
        greptimedb:
          url: http://monitoring-greptimedb:4000
        # Or use a Prometheus-compatible metrics source:
        # metrics:
        #   prometheus: http://prometheus:9090
        webhook_triggers:
          - name: high-datanode-memory
            enabled: true
            roles: [datanode]
            metric: memory_usage_percent
            operator: ">="
            threshold: 90
            cooldown_seconds: 300
            url: https://alerts.example.com/datanode-memory
            headers:
              Authorization: Bearer token
          - name: high-frontend-cpu
            enabled: true
            roles: [frontend]
            metric: cpu_usage_millicores
            threshold: 1000
            cooldown_seconds: 600
            url: https://alerts.example.com/frontend-cpu
```

Webhook trigger configuration items:

- `name`: Trigger name. Required when `enabled` is `true`. The name must be unique within the instance and must not contain `/`.
- `enabled`: Enables or disables the trigger.
- `roles`: Optional role filter. Omit it or leave it empty to match all roles. Supported roles are `frontend`, `metasrv`, `datanode`, and `flownode`.
- `metric`: Resource usage metric to evaluate. Supported metrics are `memory_usage_percent`, `memory_usage_bytes`, `cpu_usage_percent`, and `cpu_usage_millicores`.
- `operator`: Comparison operator. The default and only supported value is `>=`.
- `threshold`: Threshold value. It must be greater than `0`. For percentage metrics, the threshold must also be less than or equal to `100`.
- `cooldown_seconds`: Minimum interval between repeated `firing` notifications for the same active alert. The default is `300` seconds.
- `url`: Webhook endpoint. Required when `enabled` is `true`; the URL must use `http://` or `https://`.
- `headers`: Optional custom HTTP headers, for example `Authorization`. The webhook client always sends `Content-Type: application/json`.

When a matching component reaches or exceeds the threshold, the dashboard apiserver sends a `firing` payload. While the alert remains active, repeated `firing` payloads for the same instance, trigger, pod, and process start time are suppressed until `cooldown_seconds` elapses. When the metric drops below the threshold, the dashboard apiserver sends a `resolved` payload.

Webhook payloads use a fixed JSON schema and cannot be templated. A representative `firing` payload looks like this:

```json
{
  "status": "firing",
  "trigger_name": "high-datanode-memory",
  "metric": "memory_usage_percent",
  "operator": ">=",
  "threshold": 90,
  "value": 91.2,
  "instance": "ns_demo",
  "cluster": "demo",
  "namespace": "ns",
  "pod": "demo-datanode-0",
  "role": "datanode",
  "app": "greptime-datanode",
  "component_instance": "datanode-0",
  "endpoint": "http://demo-datanode-0:4000",
  "process_start_time_seconds": 1760000000,
  "starts_at": "2026-06-23T10:00:00Z",
  "ends_at": null,
  "sent_at": "2026-06-23T10:00:00Z"
}
```

For `resolved` payloads, `status` is `resolved`, `value` is the below-threshold value that resolved the alert, and `ends_at` is set.

:::note
Webhook trigger state is kept in dashboard apiserver memory and is not durable. There is no durable retry queue. If the dashboard apiserver restarts, or if an instance, trigger, or pod series disappears, the corresponding state may be forgotten without sending `resolved`. Receivers that need hard guarantees should expire alerts on their side.
:::