rabbit_quorum_queue: Shrink batches of QQs in parallel #15081
With this change and the default […]
This looks fine to me, at least for now. It would be quite possible to get much higher throughput on this and use command pipelining instead of spawning a bunch of processes just to exercise the WAL more. We'd need to add that as an option to the Ra API however.
Ah yeah, with pipelining we could use the WAL much more efficiently. That shouldn't be too bad to add to Ra - just a new function in […] I'm actually more worried about the […] In the meantime making this parallel seems like an easy improvement since we can continue using the […]
Shrinking a member node off of a QQ can be parallelized. The operation involves

* removing the node from the QQ's cluster membership (appending a command to the log and committing it) with `ra:remove_member/3`
* updating the metadata store to remove the member from the QQ type state with `rabbit_amqqueue:update/2`
* deleting the queue data from the node with `ra:force_delete_server/2` if the node can be reached

All of these operations are I/O bound. Updating the cluster membership and metadata store involves appending commands to those logs and replicating them. Writing commands to Ra synchronously in serial is fairly slow - sending many commands in parallel is much more efficient. By parallelizing these steps we can write larger chunks of commands to WAL(s).

`ra:force_delete_server/2` benefits from parallelizing if the node being shrunk off is no longer reachable, for example in some hardware failures. The underlying `rpc:call/4` will attempt to auto-connect to the node and this can take some time to time out. By parallelizing this, each `rpc:call/4` reuses the same underlying distribution entry and all calls fail together once the connection fails to establish.
Discussed in #15057
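The batching idea in the description can be illustrated with a minimal sketch that uses only the Erlang standard library. It is not the code from this PR: the module name `shrink_sketch`, the functions `pmap_batched/3` and `chunk/2`, and any batch size passed to them are hypothetical, and the real change in `rabbit_quorum_queue` may be structured differently. `Fun` stands in for the per-queue shrink work (the `ra:remove_member/3`, `rabbit_amqqueue:update/2` and `ra:force_delete_server/2` steps).

```erlang
%% Hypothetical illustration only; not part of RabbitMQ or Ra.
-module(shrink_sketch).
-export([pmap_batched/3, chunk/2]).

%% Apply Fun to every item of List. Items within a batch run
%% concurrently; the next batch starts only after the previous one
%% has finished. Results keep the order of List and are tagged
%% {ok, Result} or {error, ExitReason}.
pmap_batched(Fun, List, BatchSize) ->
    lists:append([run_batch(Fun, Batch) || Batch <- chunk(List, BatchSize)]).

%% Split List into sublists of at most N items, preserving order.
chunk([], _N) ->
    [];
chunk(List, N) when is_integer(N), N > 0 ->
    {Batch, Rest} = take(N, List, []),
    [Batch | chunk(Rest, N)].

take(0, Rest, Acc) -> {lists:reverse(Acc), Rest};
take(_N, [], Acc) -> {lists:reverse(Acc), []};
take(N, [H | T], Acc) -> take(N - 1, T, [H | Acc]).

run_batch(Fun, Batch) ->
    %% One monitored worker per item. The worker exits with its result,
    %% so the 'DOWN' message carries the result back to the caller.
    Workers = [spawn_monitor(fun() -> exit({result, Fun(Item)}) end)
               || Item <- Batch],
    %% Pid and Ref are already bound here, so each receive selects
    %% only the 'DOWN' message of that particular worker.
    [receive
         {'DOWN', Ref, process, Pid, {result, Res}} -> {ok, Res};
         {'DOWN', Ref, process, Pid, Reason} -> {error, Reason}
     end || {Pid, Ref} <- Workers].
```

With a sketch like this, a call such as `shrink_sketch:pmap_batched(ShrinkOne, Queues, BatchSize)` would run `ShrinkOne` for up to `BatchSize` queues at a time, where `ShrinkOne`, `Queues` and `BatchSize` are placeholders supplied by the caller. Because the workers in a batch run concurrently, a slow step such as a connection attempt to an unreachable node is waited on once per batch rather than once per queue.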