Refactor connectionId, nodeId, clusterId#375
Conversation
Signed-off-by: Argus Li <argus@argusli.dev>
Signed-off-by: Argus Li <argus@argusli.dev>
Signed-off-by: Argus Li <argus@argusli.dev>
Signed-off-by: Argus Li <argus@argusli.dev>
Signed-off-by: Argus Li <argus@argusli.dev>
Signed-off-by: Argus Li <argus@argusli.dev>
Signed-off-by: Argus Li <argus@argusli.dev>
Signed-off-by: Argus Li <argus@argusli.dev>
| return | ||
| if (typeof clusterId === "string") { | ||
| const nodeIds = Object.keys(clusterNodesRegistry[clusterId] ?? {}).filter((id) => metricsServerMap.has(id)) | ||
| const responses = await Promise.all( |
There was a problem hiding this comment.
a node may be in a semi-alive state like when it's a failover or something, and it will fail the promise, so maybe worth switching to allSettled instead?
There was a problem hiding this comment.
in other words, if one node is to be killed anyway — do we need to fail the operation that succeeded in all alive nodes?
There was a problem hiding this comment.
I think that's a good change. For this PR, my focus was on retaining existing logic and just ensuring my changes don't break anything. Considering #373 is waiting for this PR so that the bug where errors are not propagated to the right ID is fixed, I would prefer to not add this and the additional handling in this PR, and add it in its own PR right after.
| const responses = await Promise.all( | ||
| nodeIds.map((nodeId) => postConfigToNode(metricsServerMap.get(nodeId)?.metricsURI, config)), | ||
| ) | ||
| const firstFailure = responses.find((r) => !r.success) |
There was a problem hiding this comment.
subsequently, just getting the entire list of failing nodes instead of re-sending an update just to discover another failing node?
and then use filter instead of findFirst?
| sendUpdateError(ws, { clusterId }, firstFailure) | ||
| } else { | ||
| // All nodes responses are the same so we use the first. | ||
| sendUpdateFulfilled(ws, { clusterId }, responses[0] ?? { success: true, message: "", data: {} }) |
There was a problem hiding this comment.
updating a config is a rarely used operation so we don't have to optimize for network traffic. I'd prefer to have an explicit node: status response
| const nodes = | ||
| typeof clusterId === "string" | ||
| ? clusterNodesRegistry[clusterId] | ||
| typeof clusterId === "string" |
There was a problem hiding this comment.
did lint add trailing spaces?
There was a problem hiding this comment.
It appears to have.
| // `targetId` keys node-level metrics state: `clusterId` for a cluster, else | ||
| // the db-less `nodeId`. (Connection-scoped state below still uses `id`, the | ||
| // db-suffixed connectionId.) | ||
| const nodeId = toNodeId(id!) |
There was a problem hiding this comment.
Does it make sense to just store nodeId in frontend state instead of stripping the db suffixed ID every time we need it?
There was a problem hiding this comment.
We actually are using targetId as the key in the frontend state for node level metrics state. This can be seen in the following functions in the code i.e. selectCommandLogs()(), selectHotKeys()()...
The reason for keeping the connectionId key state is that we still have things that need to be scoped by db such as the connection details, keys etc.
There was a problem hiding this comment.
I'm more so referring to storing nodeId in connection state, so components can read it directly instead of calling toNodeId every time. But this isn't a blocker if you don't think its worth the time.
There was a problem hiding this comment.
Adding nodeId in connection state means that if the connectionId gets changed, we have to ensure nodeId stays consistent, which is an additional thing to manage. toNodeId() is also cheap operation so there's no real performance benefit from caching it directly in connection state.
So in my mind it's not worth gaining that really small performance gain in exchange for an additional state we have to manage.
Description
Previously the metrics state was keyed by the db-suffixed
connectionId(--db) even though the metrics process is unique per Valkey node. That mismatch caused state to be stored and looked up under inconsistent keys.This branch establishes a single id space for metrics and re-keys all node-level state to the db-less
nodeId(-).Bug fixes
Tooling
eslint.config.jstoeslint.config.mjsso the ESM-only config loads (the plugins are ESM-only and the root package is CommonJS).npm run lintautomatically fix.