Add Rails comparison: Action Cable / Solid Cable / AsyncCable / AnyCable #2

Open: irinanazarova wants to merge 18 commits into add-socketioxide from rails-comparison

irinanazarova (Collaborator) commented Jun 29, 2026

Adds a fourth comparison to the bench: Rails WebSocket adapters (Action Cable, Solid Cable, Async::Cable, AnyCable), all speaking the Action Cable API, measured on the same sharded fleet as the Node.js suite.

Targets

  • cable-bench/ (Puma): one image, three modes via BENCH_MODE (actioncable = Redis adapter, solidcable = DB polling, anycable = gRPC RPC backend + anycable-go gateway).
  • cable-bench-falcon/ (Falcon): Async::Cable via async-cable + actioncable-next.

Adapter fixes to measure each fairly

  • Falcon fastlane: actioncable-next fastlane broadcasts call Socket#raw_transmit, which async-cable 0.3.1 lacks (0% delivery). Pin async-cable to the commit that adds it, and vendor the fiber-based Async::Cable::Executor (its released form requires edge Rails 8.2) so broadcast dispatch stays on the reactor instead of thread-hopping.
  • Puma workers: stock puma.rb omits the workers directive, so WEB_CONCURRENCY was ignored and Puma ran a single worker. Honor it, so process count matches Falcon's --count for a matched comparison.

Harness: Rails support in the coordinators

  • jitter-multi / throughput-multi / avalanche-multi: forward channel + acProtocol + cableUrl so the drivers hit a real Rails BenchmarkChannel over the base or extended protocol (not only anycable-go $pubsub).
  • New avalanche-multi-anycable: sharded deploy-survival (many runners generate the reconnect storm, one serviceInstanceRedeploy, aggregate time-to-95%), so the recovery number is not load-generator-limited.
  • Fix throughput-multi publish-rate key (interval -> intervalMs).

Realistic client per server

  • Per-adapter JS client: @rails/actioncable (the official Rails client, base protocol, native poll-based reconnect, no resume) for Action Cable / Solid Cable / Async::Cable; @anycable/core (extended protocol, resume) for AnyCable. Driving Action Cable with AnyCable's client flatters its reconnect; using each server's real client is the honest comparison. Includes a small Node shim (WebSocket adapter + browser-global stubs).
  • Configurable reconnect backoff (reconnectBaseMs) on the @anycable/core client.
  • Standard jitter outage: the jitter loop now holds a fixed-length network outage (clean disconnect()/connect() for @anycable/core; native socket drop + native monitor recovery for @rails/actioncable), so delivery reflects a real ~2s drop rather than the client's backoff.

Headline findings (matched, sharded)

Raw data: backend/results/rails-sharded-2026-06-28.json and backend/results/rails-capacity-break-2026-06-28.json. Full write-up in docs/rails-comparison.md.

  • Latency p50 / p99 in ms (5K, steady network): AnyCable 7 / 31, Action Cable 13 / 57, Async::Cable 20 / 71, Solid Cable 74 / 164.
  • Reliability under ~2s jitter (5K): AnyCable 99.9%, the Action Cable family 78.1% (base protocol, no resume, so each offline window drops broadcasts for good). AnyCable's resume backfills the missed messages; its jitter p99 (~6s) is that replayed history landing a beat late but delivered.
  • Deploy survival (5K, real app redeploy): AnyCable 0s (the anycable-go gateway holds connections across a Rails RPC-backend redeploy); the in-process trio is down ~7.5–8s before ~96% of clients have reconnected.
  • Idle capacity to break (identical 32GB boxes, 8-worker config): AnyCable 600K+ (0 failures, ~47 KB/conn, not broken: the 50-runner load fleet maxed out, so treat it as a floor), Action Cable and Solid Cable ~52K each (file-descriptor ceiling at ~2.5 GB, not memory-bound), Async::Cable ~97K (memory-bound, ~290 KB/conn).
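
As a back-of-envelope check on the capacity bullet, the sketch below multiplies the quoted connection counts by the quoted per-connection footprints. It assumes decimal units (1 GB = 1,000,000 KB) and that the per-connection figures are averages over the whole process set; the helper names are illustrative and not part of the harness.

```typescript
// Total footprint in GB for a given connection count and per-connection cost in KB.
function footprintGB(connections: number, kbPerConn: number): number {
  return (connections * kbPerConn) / 1e6;
}

// Per-connection cost in KB implied by a total footprint in GB.
function kbPerConnection(totalGB: number, connections: number): number {
  return (totalGB * 1e6) / connections;
}

// Figures taken from the bullet above.
const anycableGB = footprintGB(600000, 47);          // about 28.2 GB
const asyncCableGB = footprintGB(97000, 290);        // about 28.1 GB
const actionCableKB = kbPerConnection(2.5, 52000);   // about 48 KB per connection
```

Under these assumptions, the AnyCable floor and the Async::Cable break point both land near 28 GB on a 32 GB box, while Action Cable and Solid Cable stop at roughly 48 KB per connection with most of the memory unused, which is consistent with a file-descriptor ceiling rather than a memory limit.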

Note: the jitter row above (78.1% for the Action Cable family) is from the 2026-06-28 sharded run, which drove every target with @anycable/core. The per-adapter native-client change (@rails/actioncable for the Action Cable family) landed after that capture; a native-client re-run may shift the Action Cable family number. Drop the new result file into backend/results/ and the prose will be re-synced to it.

Target apps and harness for the Rails WebSocket adapter comparison behind
anycable.io/compare/rails-actioncable.

Targets:
- cable-bench/         Rails 8.1 app, BENCH_MODE selects Action Cable (Redis)
                       or Solid Cable (database); also the AnyCable RPC backend
- cable-bench-falcon/  same app booted on Falcon via actioncable-next +
                       async-cable (the AsyncCable target)

Harness:
- idle-multi.ts: forward CHANNEL/AC_PROTOCOL and send the bench-runner auth
  token, so idle/capacity runs can target a real Rails channel
- jitter-multi.ts: forward CHANNEL + AC_PROTOCOL for Rails targets
- idle-runner.ts / jitter-runners.ts / server.ts: channel + acProtocol params
- tests-manifest.ts: Rails latency/jitter/idle/avalanche/capacity specs

Results (sharded, one shared-tenant Railway window):
- backend/results/rails-sharded-2026-06-28.json   latency/jitter/10K/idle/avalanche
- backend/results/rails-capacity-break-2026-06-28.json  idle-to-break per adapter

Deep dive in docs/rails-comparison.md; summary in README.

Encode each stream payload once per channel identifier instead of once
per subscriber (~2x faster broadcasts). Stock Action Cable has no
equivalent, so this measures the optimized actioncable-next path on the
Async::Cable/Falcon target.

actioncable-next fastlane sends pre-encoded frames via Socket#raw_transmit,
which async-cable 0.3.1's Socket does not implement; without this shim every
fastlane broadcast raises NoMethodError and delivery is 0%.
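
To illustrate why encoding once per channel identifier is cheaper, here is a minimal TypeScript sketch of the idea. It is not the actioncable-next implementation (which is Ruby): the `RawSocket` interface, its `rawTransmit` method, and both function names are hypothetical stand-ins, and the envelope shape is simplified.

```typescript
// Hypothetical minimal socket surface that accepts an already-encoded frame.
interface RawSocket {
  rawTransmit(frame: string): void;
}

// Subscribers grouped by the channel identifier they subscribed with.
type SubscriberMap = Map<string, RawSocket[]>;

// Naive path: serialize the envelope once per subscriber.
function broadcastPerSubscriber(subs: SubscriberMap, message: unknown): number {
  let encodes = 0;
  subs.forEach((sockets, identifier) => {
    for (const socket of sockets) {
      socket.rawTransmit(JSON.stringify({ identifier, message }));
      encodes++;
    }
  });
  return encodes;
}

// Fastlane-style path: serialize once per identifier and reuse the frame.
function broadcastPerIdentifier(subs: SubscriberMap, message: unknown): number {
  let encodes = 0;
  subs.forEach((sockets, identifier) => {
    const frame = JSON.stringify({ identifier, message });
    encodes++;
    for (const socket of sockets) socket.rawTransmit(frame);
  });
  return encodes;
}
```

Both functions return the number of serializations performed, so the saving scales with subscribers per identifier. The second path only works if the socket can send a pre-encoded frame, which is the role Socket#raw_transmit plays in the commit above.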

throughput.ts / bench-runner /bench-throughput-anycable / throughput-multi.ts
now accept channel + acProtocol, mirroring the jitter path, so the throughput
suite can target a Rails BenchmarkChannel over the base protocol instead of
only anycable-go $pubsub over the extended protocol.

The runner reads req.query.intervalMs; throughput-multi sent 'interval', so
every run silently used the 100ms default and ignored the requested rate.
Coordinator-only fix; no runner rebuild needed.
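
The failure mode is easy to reproduce in isolation. The sketch below is an assumption-laden stand-in for the real code: it uses `URLSearchParams` instead of the runner's `req.query`, and the function names are made up for illustration. Only the key names and the 100ms default come from the commit message.

```typescript
const DEFAULT_INTERVAL_MS = 100; // default named in the commit message

// Runner side: reads intervalMs and falls back to the default when it is absent or invalid.
function readIntervalMs(query: URLSearchParams): number {
  const raw = query.get("intervalMs");
  const parsed = raw === null ? NaN : Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_INTERVAL_MS;
}

// Coordinator side before the fix: wrong key, so the runner never sees the value.
function buildQueryBeforeFix(intervalMs: number): URLSearchParams {
  return new URLSearchParams({ interval: String(intervalMs) });
}

// Coordinator side after the fix: the key matches what the runner reads.
function buildQueryAfterFix(intervalMs: number): URLSearchParams {
  return new URLSearchParams({ intervalMs: String(intervalMs) });
}
```

Because the fallback is silent, a mismatched key yields a valid-looking run at the wrong publish rate, which is why the bug did not surface as an error.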

Pin async-cable to @27181dff1 (native Socket#raw_transmit, Rails 8.1
compatible) and drop the raw_transmit shim. Vendor Async::Cable::Executor
(from async-cable dddef54c, whose released form requires edge Rails 8.2) and
install it via ActionCable::Server::Base#executor, so broadcast-delivery
callbacks (SubscriberMap::Async#invoke_callback -> executor.post) run on the
reactor instead of bouncing through Action Cable's thread pool. This is the
documented fix for Falcon broadcast latency; re-measure vs the 0.3.1 numbers.

Stock Rails puma.rb omits the workers directive, so WEB_CONCURRENCY was
ignored and the Action Cable target ran a single Puma process regardless of
the env var. Set it explicitly so Puma's process count matches the Falcon
target's falcon --count for a matched WS-engine comparison.

…/cableUrl)

avalanche-multi was hardcoded to /bench-avalanche-socketio with no Rails param
passthrough. It now selects /bench-avalanche-<protocol> (defaults to anycable
for rails-* services) and forwards channel/acProtocol/cableUrl, so the
deploy-survival test can drive Action Cable / Solid Cable / Async::Cable /
AnyCable.

Adapts avalanche-multi-uws.ts to the /bench-avalanche-anycable endpoint so the
post-redeploy reconnect storm is generated across many bench-runners (~250
clients each) instead of one Node process, removing the load-generator limit
on the deploy-survival test. Fires one serviceInstanceRedeploy, aggregates
time-to-95%-reconnect across shards. prearm/recovery tuned under Railway's
5-min proxy timeout.
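
One way to aggregate time-to-95% across shards is to pool every runner's reconnect timestamps and read off the time at which the target fraction of all clients is back. The sketch below shows that approach only; the coordinator's actual aggregation is not spelled out here, and the `ShardReport` shape and `timeToFraction` name are hypothetical. It also assumes all timestamps are measured from the same redeploy instant.

```typescript
// Hypothetical per-shard report from one bench-runner.
interface ShardReport {
  clients: number;           // clients this runner held before the redeploy
  reconnectedAtMs: number[]; // ms after the redeploy, one entry per client that came back
}

// Time at which `fraction` of all clients had reconnected, or null if never reached.
function timeToFraction(shards: ShardReport[], fraction = 0.95): number | null {
  let total = 0;
  const times: number[] = [];
  for (const shard of shards) {
    total += shard.clients;
    for (const t of shard.reconnectedAtMs) times.push(t);
  }
  const needed = Math.ceil(total * fraction);
  if (needed === 0 || times.length < needed) return null;
  times.sort((a, b) => a - b);
  return times[needed - 1];
}
```

Pooling rather than averaging per-shard results keeps one slow shard from being hidden, and returning null distinguishes "never reached the threshold" from a slow recovery.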

The @anycable/core client's default reconnect backoff is multi-second, so the
resume-tail p99 after a transient drop is dominated by reconnect wait, not
server delivery. Add reconnectBaseMs (job param + RECONNECT_BASE_MS env): the
first reconnect fires in ~base ms, then the delay doubles up to a 5s cap. Set
it to ~200 to collapse the tail.
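
The schedule described above (base, then doubling, capped at 5s) can be written out as a small sketch. The function names are illustrative, and any randomization the real client applies to these delays is deliberately left out.

```typescript
// Delay before reconnect attempt n (0-based): base * 2^n, capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs: number, maxMs = 5000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

// The first `attempts` delays for a given base.
function backoffSchedule(baseMs: number, attempts: number, maxMs = 5000): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) delays.push(reconnectDelayMs(i, baseMs, maxMs));
  return delays;
}

// With reconnectBaseMs = 200 the first six delays are 200, 400, 800, 1600, 3200, 5000 ms.
const schedule200 = backoffSchedule(200, 6);
```

With a 200ms base a client that reconnects on the first or second attempt waits well under a second, which is why a small base shortens the resume tail.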

The AnyCable jitter loop force-closed the socket then waited jitterDurationMs,
during which the client's Monitor reconnected on its backoff -> the offline
period was the backoff delay, not a fixed outage, so delivery depended on the
client's reconnect config. Now re-terminate any reconnect until the window
elapses, so it measures a standard 2s drop; the client backoff only governs
recovery speed after the outage.

Re-terminating fought the Monitor (flapping reconnects + backoff escalation).
Instead cable.disconnect() emits close -> Monitor cancels reconnect, client
stays cleanly offline for the outage window (sid retained), then connect()
reconnects once and AnyCable resumes. Outage length is now fixed and
backoff-independent, a true standard network drop.
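
The disconnect-then-connect pattern can be sketched as follows. This is a simplified illustration rather than the jitter runner's code: `Reconnectable` is a hypothetical minimal surface covering only the two calls named above, and the injectable `Scheduler` exists so the timing can be exercised without real timers.

```typescript
// Hypothetical minimal client surface: only the two calls the outage needs.
interface Reconnectable {
  disconnect(): void;
  connect(): void;
}

type Scheduler = (fn: () => void, delayMs: number) => void;

// Hold a fixed-length outage: one clean disconnect, one reconnect after outageMs.
function holdOutage(
  conn: Reconnectable,
  outageMs: number,
  schedule: Scheduler = (fn, ms) => { setTimeout(fn, ms); },
): void {
  conn.disconnect();                                 // clean close, so no reconnect attempts during the window
  schedule(() => { conn.connect(); }, outageMs);     // single reconnect once the window elapses
}
```

Because nothing reconnects during the window, the outage length depends only on `outageMs`, not on the client's backoff settings.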

Add @rails/actioncable and a clientLib=actioncable path in the jitter runner so
Action Cable / Solid Cable / Async::Cable are driven by the official Rails
client (base protocol, its own reconnect monitor, no resume), while AnyCable
keeps @anycable/core (extended protocol, resume). Realistic per-server client
instead of using @anycable/core for everything.

The @rails/actioncable ConnectionMonitor calls addEventListener/
removeEventListener and reads document.visibilityState, none of which exist in
Node -> ReferenceError. Provide no-op stubs before the client loads.
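
A minimal sketch of such stubs is shown below. It covers only the browser globals named above, not the WebSocket adapter, and `installBrowserStubs` is an illustrative name rather than the shim's actual export. The `"visible"` default is an assumption.

```typescript
// Install no-op browser globals so a browser-oriented client can load under Node.
// Only fills gaps: anything the target already defines is left alone.
function installBrowserStubs(
  target: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): void {
  const noop = (): void => {};
  if (typeof target.addEventListener !== "function") target.addEventListener = noop;
  if (typeof target.removeEventListener !== "function") target.removeEventListener = noop;
  if (typeof target.document !== "object" || target.document === null) {
    // Assumption: report the page as visible so visibility checks never pause the monitor.
    target.document = { visibilityState: "visible" };
  }
}
```

The call has to run before the client module is loaded, since the globals are looked up when the monitor starts. Passing an explicit target keeps the function testable without mutating the real global object.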

… Node shim

Jitter: Action Cable family now drops the socket uncleanly and recovers on the
official client's own poll-based monitor (native, seconds) rather than a forced
immediate reconnect. Avalanche: same clientLib branch so deploy-survival uses
each server's real client. Extract the Node WebSocket+globals shim to a shared
module.

Strip the half-committed Centrifugo work that leaked into this Rails PR:
server.ts imported ../lib/centrifugo-runners.js (never committed), so the
bench-runner failed to build. Remove the centrifugo endpoints, env config,
and centrifugoUrls helper from server.ts, the centrifugo protocol wiring
from jitter-multi/throughput-multi, and the dev:centrifugo/smoke:centrifugo
scripts from package.json.

Keep the curated Rails result files in the repo (README/docs cite them):
un-ignore backend/results/rails-*.json and socketioxide-*.json, and add
rails-capacity-break-2026-06-28.json (previously referenced but untracked).
Raw per-run dumps stay ignored.

- jitter-runners: add destroy() to the JitterConn surface and use it at
  teardown. The @rails/actioncable path's disconnect() only closes the
  socket (leaving the ConnectionMonitor polling), which is correct for the
  in-run outage but orphaned a reconnecting consumer in the long-lived
  bench-runner at end of run. destroy() calls consumer.disconnect(), which
  also stops the monitor.
- jitter-runners: document the outage asymmetry explicitly: @anycable/core
  gets a fixed jitterDurationMs offline window; @rails/actioncable stays down
  for jitterDurationMs + its native monitor's reconnect latency, so its
  delivery reflects both no-resume and real client recovery time.
- avalanche-anycable-runner: track per-connection up/down state so a single
  drop that fires both "disconnect" and "close" (or any repeat event) counts
  once, instead of double-incrementing disconnected.
- README: results/ note now reflects tracked published files vs ignored dumps.
- async_cable_executor: note the dependency on ActionCable::Server::Base's
  @mutex/@executor ivars.
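
The avalanche-anycable-runner item in the list above (count each drop once) comes down to counting state transitions instead of events. A minimal sketch, assuming each connection has a stable string id; the `ConnectionTracker` class is illustrative, not the runner's actual code:

```typescript
type ConnState = "up" | "down";

class ConnectionTracker {
  private readonly states = new Map<string, ConnState>();
  disconnected = 0; // number of up -> down transitions seen so far

  markUp(id: string): void {
    this.states.set(id, "up");
  }

  // Safe to call from both "disconnect" and "close" handlers:
  // only the first call after a connection was up is counted.
  markDown(id: string): void {
    if (this.states.get(id) !== "up") return;
    this.states.set(id, "down");
    this.disconnected++;
  }

  // Number of connections currently marked up.
  get currentlyUp(): number {
    let n = 0;
    this.states.forEach((state) => { if (state === "up") n++; });
    return n;
  }
}
```

A second event for the same drop finds the connection already down and is ignored, while a later reconnect followed by another drop is counted again.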