@rmloveland rmloveland commented Oct 14, 2025

Fixes DOC-13184

Summary of changes:

- Add a mention of the `storage.wal.failover.write_and_sync.latency`
  metric to the `wal-failover-metrics.md` include file, which will pull
  it into the 'WAL failover' and 'cockroach start' pages.

- We're also opening a cockroachdb/cockroach PR to mark this metric as
  'essential', so that it shows up in the list of Storage essential
  metrics, e.g., at
  https://www.cockroachlabs.com/docs/v25.3/essential-metrics-self-hosted.html#storage

netlify bot commented Oct 14, 2025

Deploy Preview for cockroachdb-api-docs canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | 998c744 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cockroachdb-api-docs/deploys/68eff61e36673300087ddbe5 |

netlify bot commented Oct 14, 2025

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | 998c744 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/68eff61e68a781000871eca2 |

rmloveland added a commit to rmloveland/cockroach that referenced this pull request Oct 14, 2025
This change marks the `storage.wal.failover.write_and_sync.latency`
metric as "Essential" so it gets automatically pulled into the
'Essential Metrics' documentation at e.g.,
https://www.cockroachlabs.com/docs/stable/essential-metrics-self-hosted.html#storage

This is necessary since we are adding some words about this metric to
the docs via cockroachdb/docs#20566

We would like to then backport this change to all supported versions of
CockroachDB which have WAL failover (i.e., v24.1 and later).

netlify bot commented Oct 14, 2025

Deploy Preview for cockroachdb-docs failed.

| Name | Link |
|---|---|
| 🔨 Latest commit | 998c744 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cockroachdb-docs/deploys/68eff61e0e63600008264cd2 |

@sumeerbhola sumeerbhola left a comment

@sumeerbhola reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)


src/current/_includes/v25.4/wal-failover-metrics.md line 7 at r1 (raw file):

- `storage.wal.failover.switch.count`: Count of the number of times WAL writing has switched from primary to secondary store, and vice versa.
- `storage.wal.fsync.latency` monitors the latencies of WAL files. If you have WAL failover enabled and are failing over, `storage.wal.fsync.latency` will include the latency of the stalled primary. 
- `storage.wal.failover.write_and_sync.latency` metric is up one level from `storage.wal.fsync.latency`, and during the failover will report the latency actually observed by higher levels (which should be ~equivalent to the latency of the secondary).

It is not just during the failover. We should say something like:

When WAL failover is configured in a cluster, the operator should monitor this metric, which shows the effective latency observed by the higher layer writing to the WAL. This metric is expected to stay low in a healthy system, regardless of whether WAL files are being written to the primary or secondary.
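
To make the monitoring suggestion above concrete, here is a minimal Python sketch of how an operator might check this metric from a Prometheus-format scrape. It is illustrative only and rests on several assumptions that are not stated in this PR: that the metric is exported as a histogram with the dots in its name replaced by underscores, that bucket bounds are in nanoseconds, and that the scrape contains a single store's series. The sample text, the `store` label, and the 100 ms threshold are all hypothetical.

```python
import re

METRIC = "storage.wal.failover.write_and_sync.latency"

# Hypothetical alert threshold (100 ms), assuming bucket bounds are nanoseconds.
P99_THRESHOLD_NS = 100e6

# Hypothetical scrape output for one store; not taken from a real cluster.
SAMPLE = """
storage_wal_failover_write_and_sync_latency_bucket{store="1",le="1e+06"} 900
storage_wal_failover_write_and_sync_latency_bucket{store="1",le="1e+07"} 995
storage_wal_failover_write_and_sync_latency_bucket{store="1",le="1e+08"} 1000
storage_wal_failover_write_and_sync_latency_bucket{store="1",le="+Inf"} 1000
"""


def prometheus_name(metric):
    """Assumed naming rule: dots and dashes become underscores."""
    return re.sub(r"[.\-]", "_", metric)


def parse_buckets(text, name):
    """Return sorted cumulative (upper_bound, count) pairs for name_bucket lines."""
    pattern = re.compile(
        r"^" + re.escape(name)
        + r'_bucket\{(?:[^}]*,)?le="([^"]+)"[^}]*\}\s+(\S+)$'
    )
    buckets = []
    for line in text.splitlines():
        match = pattern.match(line.strip())
        if match:
            buckets.append((float(match.group(1)), float(match.group(2))))
    return sorted(buckets)


def quantile_upper_bound(buckets, q):
    """Smallest bucket bound whose cumulative count covers quantile q."""
    if not buckets or buckets[-1][1] == 0:
        return None
    target = q * buckets[-1][1]
    for bound, count in buckets:
        if count >= target:
            return bound
    return buckets[-1][0]


buckets = parse_buckets(SAMPLE, prometheus_name(METRIC))
p99 = quantile_upper_bound(buckets, 0.99)
healthy = p99 is not None and p99 <= P99_THRESHOLD_NS
print(f"p99 upper bound: {p99} ns, healthy: {healthy}")
```

The check does not distinguish between primary and secondary, which matches the point of the comment: the value should stay low either way. A real deployment would more likely express this as an alerting rule in its monitoring system than as a script.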

@cockroach-teamcity
Member

This change is Reviewable

@rmloveland
Contributor Author

thanks @sumeerbhola, i've updated in 998c744 - PTAL

once we're happy with this change and it's had docs review I would like to backport it to the WAL failover docs for all versions where this metric is supported

which previous versions have this metric available? is it everything v24.1+ or only a subset?

craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Oct 15, 2025
155395: storage: mark add'l WAL latency metric essential r=rmloveland a=rmloveland

This change marks the `storage.wal.failover.write_and_sync.latency` metric as "Essential" so it gets automatically pulled into the 'Essential Metrics' documentation at e.g.,
https://www.cockroachlabs.com/docs/stable/essential-metrics-self-hosted.html#storage

This is necessary since we are adding some words about this metric to the docs via cockroachdb/docs#20566

We would like to then backport this change to all supported versions of CockroachDB which have WAL failover (i.e., v24.1 and later).

Addresses part of DOC-13184

Co-authored-by: Rich Loveland <[email protected]>
rmloveland added a commit to rmloveland/cockroach that referenced this pull request Oct 16, 2025
This change marks the `storage.wal.failover.write_and_sync.latency`
metric as "Essential" so it gets automatically pulled into the
'Essential Metrics' documentation at e.g.,
https://www.cockroachlabs.com/docs/stable/essential-metrics-self-hosted.html#storage

This is necessary since we are adding some words about this metric to
the docs via cockroachdb/docs#20566

We would like to then backport this change to all supported versions of
CockroachDB which have WAL failover (i.e., v24.1 and later).