[Questions] create quorum queue failed #15099

dormanze · 2025-12-10T07:22:30Z

Community Support Policy

I have read RabbitMQ's Community Support Policy
I run RabbitMQ 4.x, the only series currently covered by community support
I promise to provide all relevant information (versions, logs from all nodes, rabbitmq-diagnostics output, detailed reproduction steps)

RabbitMQ version used

4.1.2

Erlang version used

26.2.x

Operating system (distribution) used

linux

How is RabbitMQ deployed?

Generic binary package

rabbitmq-diagnostics status output

See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics

# PASTE OUTPUT HERE, BETWEEN BACKTICKS

Logs from node 1 (with sensitive values edited out)

See https://www.rabbitmq.com/docs/logging to learn how to collect logs

# PASTE LOG HERE, BETWEEN BACKTICKS

Logs from node 2 (if applicable, with sensitive values edited out)

See https://www.rabbitmq.com/docs/logging to learn how to collect logs

# PASTE LOG HERE, BETWEEN BACKTICKS

Logs from node 3 (if applicable, with sensitive values edited out)

See https://www.rabbitmq.com/docs/logging to learn how to collect logs

# PASTE LOG HERE, BETWEEN BACKTICKS

rabbitmq.conf

See https://www.rabbitmq.com/docs/configure#config-location to learn how to find rabbitmq.conf file location

# PASTE rabbitmq.conf HERE, BETWEEN BACKTICKS

Steps to deploy RabbitMQ cluster

Configure the cluster and start it.

Steps to reproduce the behavior in question

Create a very large number of quorum queues.

advanced.config

See https://www.rabbitmq.com/docs/configure#config-location to learn how to find advanced.config file location

# PASTE advanced.config HERE, BETWEEN BACKTICKS

Application code

# PASTE CODE HERE, BETWEEN BACKTICKS

Kubernetes deployment file

# Relevant parts of K8S deployment that demonstrate how RabbitMQ is deployed
# PASTE YAML HERE, BETWEEN BACKTICKS

What problem are you trying to solve?

When my clients restart in batches (rolling upgrade), a large number of queue declaration requests are generated (10,000+ quorum queues and 50,000+ exclusive queues), and I occasionally receive error messages. At the same time, we periodically call the API /api/health/checks/port-listener/5673 to check whether the server is healthy. During client startup, this API also frequently reports errors.

./[email protected]:2025-12-10 14:29:11.855670+08:00 [error] <0.4797544.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-O' because the metadata store operation timed out"
./[email protected]:2025-12-10 14:29:42.090122+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-O' because the metadata store operation timed out"
./[email protected]:2025-12-10 14:29:51.902431+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-O' because the metadata store operation timed out"
./[email protected]:2025-12-10 14:29:51.902638+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-O' because the metadata store operation timed out"
./[email protected]:2025-12-10 14:29:51.903377+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-O' because the metadata store operation timed out"
./[email protected]:2025-12-10 14:30:41.995924+08:00 [error] <0.5425325.0> quorum:queryConsumer:ConsumerTag '<<"mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum-adc-bpm-adc-bpm-79dcd7548c-rf5lb-172.18.0.37-196745786">>' , ChannelID '<0.5425325.0>'
./[email protected]:2025-12-10 14:30:41.998024+08:00 [error] <0.5374829.0> operation basic.consume caused a connection exception internal_error: "failed consuming from quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/': noproc"
./[email protected]:2025-12-10 14:31:22.275683+08:00 [error] <0.5637570.0> quorum:queryConsumer:ConsumerTag '<<"mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum-adc-bpm-adc-bpm-79dcd7548c-rf5lb-172.18.0.37-27639187">>' , ChannelID '<0.5637570.0>'
./[email protected]:2025-12-10 14:31:22.277879+08:00 [error] <0.5620055.0> operation basic.consume caused a connection exception internal_error: "failed consuming from quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/': noproc"
./[email protected]:2025-12-10 14:33:13.044177+08:00 [error] <0.6146092.0> quorum:queryConsumer:ConsumerTag '<<"mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum-adc-bpm-adc-bpm-79dcd7548c-rf5lb-172.18.0.37--797603050">>' , ChannelID '<0.6146092.0>'
./[email protected]:2025-12-10 14:33:13.110614+08:00 [error] <0.6120239.0> operation basic.consume caused a connection exception internal_error: "failed consuming from quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/': noproc"

It seems that some I/O timeout errors occurred when creating the queues, and that registering consumers afterwards produced a large number of noproc exceptions.
After the client restart completed, some of my queues were left in an abnormal state, and the management UI indicated that one of my instances was experiencing an issue.

I did not encounter this issue in the same environment when using 4.0.x.
I have seen the related PR #15003. Could it explain why my consumer registration failed? Is my queue status abnormal because of slow local I/O? Are there any parameters I can adjust to change this timeout?

kjnilsson (Maintainer) · 2025-12-10T09:32:11Z

Around the time you restarted there will most likely be a stack trace with the full error and reason for why some processes did not start. Those are the logs I need to investigate further.

1 reply

dormanze Dec 13, 2025
Author

Could you tell me how to enable logging for this? I noticed that only my temporary exclusive queue is experiencing timeouts; the quorum queue does not.

michaelklishin (Maintainer) · 2025-12-10T09:34:49Z

@dormanze inject a random delay in the 1-15s range into your clients so that they do not all run their declarations at once.

The upcoming 4.2.2 release includes several efficiency improvements around Khepri.
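
The random delay suggested above could look roughly like the following. This is only a minimal sketch and not code from this thread: `jittered_delay`, `declare_with_jitter` and the `declare` callback are hypothetical names, and `declare` stands in for whatever function in the client performs its queue.declare, queue.bind and basic.consume calls.

```python
import random
import time
from typing import Callable, Optional


def jittered_delay(min_s: float = 1.0, max_s: float = 15.0,
                   rng: Optional[random.Random] = None) -> float:
    """Pick a uniformly random delay, in seconds, between min_s and max_s."""
    if min_s < 0 or max_s < min_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    if rng is None:
        rng = random.Random()
    return rng.uniform(min_s, max_s)


def declare_with_jitter(declare: Callable[[], None],
                        min_s: float = 1.0, max_s: float = 15.0,
                        sleep: Callable[[float], None] = time.sleep,
                        rng: Optional[random.Random] = None) -> float:
    """Wait for a random delay, then run the client's declarations once.

    `declare` is a hypothetical callback supplied by the application;
    it is assumed to declare queues and bindings and register consumers.
    Returns the delay that was used, which is handy for logging.
    """
    delay = jittered_delay(min_s, max_s, rng)
    sleep(delay)   # spread client start-up over the [min_s, max_s] window
    declare()      # the actual topology declarations happen here
    return delay
```

Each client instance would call `declare_with_jitter(my_declare_function)` once at start-up, so that instances restarted in the same batch do not all hit the metadata store at the same moment. The `sleep` and `rng` parameters exist only to make the helper easy to test.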

1 reply

dormanze Dec 13, 2025
Author

Thank you for your response. Unfortunately, we cannot do that: we have a large number of queues, and we need to keep the upgrade duration under control when performing rolling upgrades of the clients.

dormanze (Author) · 2025-12-13T08:36:58Z

In my logs I noticed that it is not only queue creation that sometimes times out; adding bindings times out as well.

crasher:
  initial call: rabbit_reader:init/3
  pid: <0.5125499.0>
  registered_name: []
  exception exit: channel_termination_timeout
    in function  rabbit_reader:wait_for_channel_termination/3 (src/rabbit_reader.erl, line 808)
    in call from rabbit_reader:send_error_on_channel0_and_close/4 (src/rabbit_reader.erl, line 1720)
    in call from rabbit_reader:mainloop/4 (src/rabbit_reader.erl, line 548)
    in call from rabbit_reader:run/1 (src/rabbit_reader.erl, line 469)
    in call from rabbit_reader:start_connection/5 (src/rabbit_reader.erl, line 340)
  ancestors: [<0.5125478.0>,<0.614.0>,<0.613.0>,<0.612.0>,<0.610.0>,
              <0.609.0>,rabbit_sup,<0.248.0>]
  message_queue_len: 11
  messages: [{channel_exit,4,
              {amqp_error,internal_error,
               "Could not add binding due to timeout",
               'queue.bind'}},
             {channel_exit,7,
              {amqp_error,internal_error,
               "Could not add binding due to timeout",
               'queue.bind'}},
             {channel_exit,8,
              {amqp_error,internal_error,
               "Could not add binding due to timeout",
               'queue.bind'}},

0 replies
