Hello Paul,
I have an Erlang cluster of 3 nodes.
A few seconds after startup, my code calls GenServer.call({:via, :swarm, "echo-be"}, ...), where the "echo-be" process does not exist yet.
In 24% of new node startups, GenServer.call hangs forever (the hang is actually inside Swarm.whereis_name, which the :via lookup calls internally), and Swarm never becomes functional on this node.
I'm using Swarm version 3.4.0.
Attached is a full :erlang.dbg trace of the Swarm.Tracker process where the hang happens:
repro.log
I would appreciate it if you could investigate this issue and come up with a fix or a workaround. We were very close to adopting Swarm before we discovered this issue.
Please let me know if you need any additional information or traces. I can reproduce this reliably.
Thank you, Dmitry.
P.S. The 24% figure comes from 910 test runs, of which only 691 were successful (no hang); the remaining 219 hung (219/910 ≈ 24%).
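
For readers hitting the same hang, one possible stopgap is to run the name lookup in a separate process so the caller can give up after a timeout. This is only a sketch under stated assumptions: `BoundedLookup`, its `whereis/3` function, and the 5-second default are hypothetical names and values chosen for illustration and are not part of Swarm. It does not repair the stuck tracker; it only keeps the calling process from blocking indefinitely.

```elixir
defmodule BoundedLookup do
  @moduledoc """
  Hypothetical helper (not part of Swarm): runs a name lookup in a separate
  process so the caller can stop waiting after a timeout.
  """

  # `lookup` is any 1-arity function that resolves a name, for example
  # &Swarm.whereis_name/1. The default timeout is an arbitrary illustration.
  def whereis(lookup, name, timeout_ms \\ 5_000) when is_function(lookup, 1) do
    task = Task.async(fn -> lookup.(name) end)

    # Task.yield/2 returns nil on timeout. Task.shutdown/2 then unlinks and
    # kills the task, returning a reply only if one arrived in the meantime.
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      {:exit, reason} -> {:error, {:lookup_exited, reason}}
      nil -> {:error, :timeout}
    end
  end
end
```

With Swarm loaded, the call described above could be guarded as `BoundedLookup.whereis(&Swarm.whereis_name/1, "echo-be")`, issuing the `GenServer.call` only when the result is `{:ok, pid}` with an actual pid rather than `:undefined`.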