Bound Session.inmemory_storage with LRU eviction #347
Fix unbounded growth of WebConsole::Session.inmemory_storage that leaks
a full Rails route tree generation per dev-mode exception.
## Problem
WebConsole::Session.inmemory_storage is a class-level Hash with no eviction
or TTL. Every captured exception adds a new entry that retains all
stack-frame Binding objects in `@exception_mappers`. A Binding pins the
entire lexical scope at capture time -- including `self` for the failing
controller action, which transitively reaches `Rails.application.routes`.
In a long-running development process this means every captured exception
leaks:
* one ActionDispatch::Routing::RouteSet (or Rails::Engine::LazyRouteSet)
per error,
* the full Journey route AST hanging off it
(Journey::Nodes::{Cat,Slash,Literal,...}),
* the per-request HashWithIndifferentAccess objects reachable from the
captured `self`,
* everything else in scope at the throw site.
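To make the retention mechanism concrete, here is a minimal plain-Ruby sketch showing that a captured Binding keeps its receiver (and therefore every instance variable reachable from it) and its locals alive. `FakeController` and `@routes` are hypothetical stand-ins for a controller and its route set, and calling `Kernel#binding` inside a `rescue` is only a stand-in for however the error console actually captures per-frame Bindings.

```ruby
# Hypothetical stand-in for a controller whose `self` reaches a large route table.
class FakeController
  def initialize
    # Stands in for Rails.application.routes: a large object graph hanging off `self`.
    @routes = Array.new(10_000) { |i| "/route/#{i}" }
  end

  def failing_action
    params = { "id" => "42" } # a local that is in scope at the throw site
    raise ArgumentError, "boom"
  rescue ArgumentError
    binding # capture this frame's Binding, as a stand-in for per-frame capture
  end
end

captured = FakeController.new.failing_action

# Holding the Binding keeps `self` alive, and through it every instance variable.
controller = captured.receiver
puts controller.class                                # FakeController
puts controller.instance_variable_get(:@routes).size # 10000

# Locals that were in scope at capture time are retained as well.
puts captured.local_variable_get(:params)["id"]      # 42
```

As long as anything (such as a class-level Hash) references `captured`, none of these objects can be garbage-collected.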
A real-world reproduction in a Rails 8.1 app saw Puma workers grow from
~470 MB at boot to >2 GB after a few hours of normal development, and to
~25 GB per worker over a day. Heap-dump analysis traced the retention to:
    Binding <- Array (frame list)
            <- WebConsole::ExceptionMapper / Evaluator
            <- WebConsole::Session
            <- WebConsole::Session.inmemory_storage
with 14 retained LazyRouteSet generations and ~1300 of 1900 live Bindings
reaching a RouteSet within four hops.
## Fix
Bound the storage with a configurable LRU-style cap. Ruby Hashes preserve
insertion order, so on overflow the oldest (first-inserted) entry is
evicted with `inmemory_storage.shift`. The default is 5 sessions -- enough
to keep the in-browser REPL usable for the most recent few errors while
keeping the retained set small and deterministic.
A new `config.web_console.max_sessions` knob (Integer or `nil`) lets apps
opt into a higher ceiling, or restore the legacy unbounded behaviour.
Reads and writes are wrapped in a Mutex because web-console serves both
the application thread (storing on error) and AJAX requests from the
console UI (looking up by id).
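A minimal sketch of the Mutex-guarded variant with the `nil` opt-out, assuming a class-level store like the one described above. `SessionStore`, `LOCK`, `store`, `find` and the `max_sessions` accessor are hypothetical names for illustration, not web-console's API. The point of the lock is that insert-plus-evict runs as one critical section relative to lookups coming from the console UI.

```ruby
# Hypothetical class-level session store guarded by a Mutex, with an optional cap.
class SessionStore
  LOCK = Mutex.new
  @storage = {}
  @max_sessions = 5 # Integer cap, or nil for the legacy unbounded behaviour

  class << self
    attr_accessor :max_sessions

    # Called from the application thread when an error is captured.
    def store(id, session)
      LOCK.synchronize do
        @storage[id] = session
        evict_oldest unless max_sessions.nil?
      end
    end

    # Called from console UI requests looking a session up by id.
    def find(id)
      LOCK.synchronize { @storage[id] }
    end

    def size
      LOCK.synchronize { @storage.size }
    end

    private

    # Caller must already hold LOCK (Mutex is not reentrant).
    def evict_oldest
      @storage.shift while @storage.size > max_sessions
    end
  end
end

# Writers simulate captured errors; readers simulate console lookups.
writers = 4.times.map do |t|
  Thread.new { 25.times { |i| SessionStore.store("#{t}-#{i}", Object.new) } }
end
readers = 2.times.map { Thread.new { 100.times { SessionStore.find("0-0") } } }
(writers + readers).each(&:join)
puts SessionStore.size # 5

# Opting out of the cap restores unbounded growth.
SessionStore.max_sessions = nil
10.times { |i| SessionStore.store("legacy-#{i}", Object.new) }
puts SessionStore.size # 15
```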
## Verification
* Existing test suite: 76 runs, 1641 assertions, 0 failures.
* New tests cover both eviction at the limit and `max_sessions = nil`
preserving legacy behaviour.
* `test/leak/reproduce_session_leak.rb` is a standalone repro that
measures RSS growth as a function of captured exceptions:
$ N_ERRORS=50 MAX_SESSIONS=nil ruby test/leak/reproduce_session_leak.rb
baseline sessions=0 rss= 15 MB
after 50 sessions=50 rss=544 MB
$ N_ERRORS=50 ruby test/leak/reproduce_session_leak.rb # default = 5
baseline sessions=0 rss= 14 MB
after 5 sessions=5 rss= 68 MB
after 50 sessions=5 rss=118 MB # plateau
Unbounded RSS grows by ~10 MB per captured error with no ceiling; bounded
RSS plateaus once the cap is reached.
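The per-error growth rates can be worked out directly from the two repro runs above; the figures below are copied from that output and nothing else is assumed.

```ruby
# RSS figures (MB) copied from the repro output.
unbounded      = { baseline: 15, errors: 50, rss: 544 }
bounded_at_cap = { errors: 5, rss: 68 }
bounded_end    = { errors: 50, rss: 118 }

# Average growth per captured error across the whole unbounded run.
per_error_unbounded =
  (unbounded[:rss] - unbounded[:baseline]).fdiv(unbounded[:errors])

# Average growth per error in the bounded run after the cap of 5 was reached.
per_error_after_cap =
  (bounded_end[:rss] - bounded_at_cap[:rss])
    .fdiv(bounded_end[:errors] - bounded_at_cap[:errors])

printf("unbounded: %.1f MB per error\n", per_error_unbounded)             # 10.6
printf("bounded, past the cap: %.1f MB per error\n", per_error_after_cap) # 1.1
```

Between the 5-error and 50-error samples the bounded run grew by about a tenth of the unbounded rate.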
This sounds reasonable. A few technical nits: with the current eviction implementation we depend on Hash entry order, which is stable and well defined in CRuby, but are we guaranteed the same order in other Ruby implementations? Should we depend on that?
Also, we first assign the binding and then delete the excess entries. If Hash entry order changes in a future Ruby version, we may end up deleting the binding we just stored. I'm fine with being off by one here, but I'm not fine with wiping out the binding we just stored. If we first evict old entries and then add the current one, we can guarantee the binding will be present in the in-memory storage. We can even fix the off-by-one by evicting down to N - 1 entries instead of N.
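For reference on the ordering question: Ruby's core Hash documentation describes entry order as part of Hash's semantics (entries are presented in the order they were created, since Ruby 1.9), not as a CRuby implementation detail, and `Hash#shift` removes the first entry in that order. The short sketch below exercises those documented behaviours, and also shows that reassigning an existing key does not move it, which is why a true LRU would need delete-and-reinsert on access.

```ruby
h = {}
h[:a] = 1
h[:b] = 2
h[:c] = 3

# Reassigning an existing key keeps its original position.
h[:a] = 10
p h.keys  # [:a, :b, :c]

# shift removes and returns the oldest-inserted pair.
p h.shift # [:a, 10]

# Deleting and reinserting moves a key to the end, which is how a
# true LRU would "touch" an entry when it is read.
h.delete(:b)
h[:b] = 2
p h.keys  # [:c, :b]
```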

    -  inmemory_storage[id] = self
    +  INMEMORY_STORAGE_MUTEX.synchronize do
    +    inmemory_storage[id] = self
    +    evict_oldest_sessions

IMO, first evict the oldest sessions, then add the current one. We can evict down to `limit - 1` sessions here to account for the newly added one. We also have to account for the edge cases around the configuration.
That way, even if Ruby's Hash entry order changes, we are guaranteed that the binding we just added is not evicted.
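A minimal sketch of the evict-first ordering suggested in this review, in plain Ruby. `EvictFirstStore` is a hypothetical name, and the edge-case choices are assumptions for illustration: `nil` means unbounded, limits below 1 are clamped to 1 so the just-stored entry is always kept, and re-storing an existing id moves it to the newest position instead of counting twice.

```ruby
# Hypothetical store that makes room first, then inserts.
class EvictFirstStore
  def initialize(limit)
    # nil keeps unbounded behaviour; clamping values below 1 up to 1 is an assumption.
    @limit = limit.nil? ? nil : [Integer(limit), 1].max
    @entries = {}
  end

  def store(id, value)
    @entries.delete(id) # re-storing an id should not be counted twice
    unless @limit.nil?
      # Keep at most limit - 1 existing entries to leave room for the new one.
      @entries.shift while @entries.size > @limit - 1
    end
    @entries[id] = value # inserted after eviction, so it can never be the one evicted
    value
  end

  def [](id)
    @entries[id]
  end

  def ids
    @entries.keys
  end
end

store = EvictFirstStore.new(3)
%w[a b c d].each { |id| store.store(id, id.upcase) }
p store.ids # ["b", "c", "d"]

tiny = EvictFirstStore.new(0) # clamped to 1
tiny.store(:x, 1)
tiny.store(:y, 2)
p tiny.ids  # [:y]
```

With this ordering, the entry just stored survives regardless of which entry `shift` picks, and the store still never exceeds the configured limit.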