Add storage.wal.failover.write_and_sync.latency
#20566
base: main
Fixes DOC-13184

Summary of changes:

- Add a mention of the `storage.wal.failover.write_and_sync.latency` metric to the `wal-failover-metrics.md` include file, which pulls it into the 'WAL failover' and 'cockroach start' pages.
- We're also opening a cockroachdb/cockroach PR (cockroach#155395) to mark this metric as 'essential', so that it appears in the list of Storage essential metrics at, e.g., https://www.cockroachlabs.com/docs/v25.3/essential-metrics-self-hosted.html#storage
This change marks the `storage.wal.failover.write_and_sync.latency` metric as "Essential" so that it is automatically pulled into the 'Essential Metrics' documentation at, e.g., https://www.cockroachlabs.com/docs/stable/essential-metrics-self-hosted.html#storage. This is necessary because we are adding some words about this metric to the docs via cockroachdb/docs#20566. We would then like to backport this change to all supported versions of CockroachDB that have WAL failover (i.e., v24.1 and later).
@sumeerbhola reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)
src/current/_includes/v25.4/wal-failover-metrics.md
line 7 at r1 (raw file):
- `storage.wal.failover.switch.count`: Count of the number of times WAL writing has switched from primary to secondary store, and vice versa.
- `storage.wal.fsync.latency` monitors the latencies of WAL files. If you have WAL failover enabled and are failing over, `storage.wal.fsync.latency` will include the latency of the stalled primary.
- `storage.wal.failover.write_and_sync.latency` metric is up one level from `storage.wal.fsync.latency`, and during the failover will report the latency actually observed by higher levels (which should be ~equivalent to the latency of the secondary).
It is not just during the failover. We should say something like:

> When WAL failover is configured in a cluster, the operator should monitor this metric, which shows the effective latency observed by the higher layer writing to the WAL. This metric is expected to stay low in a healthy system, regardless of whether WAL files are being written to the primary or secondary.
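To make the suggested guidance concrete, here is a minimal Python sketch of how an operator might check that this metric "stays low". It rests on several assumptions that are not confirmed in this PR: that the metric is scraped in Prometheus text format with dots replaced by underscores (`storage_wal_failover_write_and_sync_latency`), that its histogram bucket bounds are in nanoseconds, and that only one store is present. The sample scrape text and the 100 ms `THRESHOLD_NS` are made up for illustration. The sketch estimates a coarse p99 from cumulative histogram buckets and compares it to the threshold.

```python
import math
import re

# Assumption: exported in Prometheus text format with dots replaced by
# underscores, and bucket bounds in nanoseconds. Verify against your version.
METRIC = "storage_wal_failover_write_and_sync_latency"

# Made-up sample scrape for a single store; not real CockroachDB output.
SAMPLE = """\
storage_wal_failover_write_and_sync_latency_bucket{store="1",le="1e+06"} 900
storage_wal_failover_write_and_sync_latency_bucket{store="1",le="1e+07"} 990
storage_wal_failover_write_and_sync_latency_bucket{store="1",le="1e+08"} 1000
storage_wal_failover_write_and_sync_latency_bucket{store="1",le="+Inf"} 1000
"""

# Hypothetical alert threshold: 100 ms expressed in nanoseconds.
THRESHOLD_NS = 100_000_000


def parse_buckets(text: str, metric: str) -> list[tuple[float, float]]:
    """Return (upper_bound, cumulative_count) pairs for one histogram, sorted by bound."""
    pattern = re.compile(re.escape(metric) + r'_bucket\{[^}]*le="([^"]+)"[^}]*\}\s+(\S+)')
    buckets = []
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            # float() accepts "+Inf", so the overflow bucket sorts last.
            buckets.append((float(match.group(1)), float(match.group(2))))
    return sorted(buckets)


def quantile_upper_bound(buckets: list[tuple[float, float]], q: float) -> float:
    """Coarse quantile: smallest bucket bound whose cumulative count covers fraction q."""
    if not buckets:
        raise ValueError("no buckets found for metric")
    target = q * buckets[-1][1]  # the last (+Inf) bucket holds the total count
    for bound, count in buckets:
        if count >= target:
            return bound
    return math.inf


def is_healthy(p99_ns: float, threshold_ns: float = THRESHOLD_NS) -> bool:
    """Effective WAL write latency should stay low whether writing to primary or secondary."""
    return p99_ns <= threshold_ns


if __name__ == "__main__":
    p99 = quantile_upper_bound(parse_buckets(SAMPLE, METRIC), 0.99)
    print(f"p99 <= {p99 / 1e6:.0f} ms, healthy={is_healthy(p99)}")
```

Bucket-based quantiles are coarse (the result is a bucket boundary, not an exact latency); in practice a monitoring system's own histogram functions, such as PromQL's `histogram_quantile`, would likely be used instead of hand-parsing.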
Thanks @sumeerbhola, I've updated this in 998c744. PTAL.

Once we're happy with this change and it's had docs review, I would like to backport it to the WAL failover docs for all versions where this metric is supported. Which previous versions have this metric available? Is it everything v24.1+, or only a subset?
155395: storage: mark add'l WAL latency metric essential r=rmloveland a=rmloveland

This change marks the `storage.wal.failover.write_and_sync.latency` metric as "Essential" so that it is automatically pulled into the 'Essential Metrics' documentation at, e.g., https://www.cockroachlabs.com/docs/stable/essential-metrics-self-hosted.html#storage. This is necessary because we are adding some words about this metric to the docs via cockroachdb/docs#20566. We would then like to backport this change to all supported versions of CockroachDB that have WAL failover (i.e., v24.1 and later).

Addresses part of DOC-13184

Co-authored-by: Rich Loveland <[email protected]>