Bound Session.inmemory_storage with LRU eviction #347

Open
brauliobo wants to merge 1 commit into rails:main from brauliobo:fix/bounded-session-storage

Conversation

@brauliobo

Fix unbounded growth of WebConsole::Session.inmemory_storage that leaks a full Rails route tree generation per dev-mode exception.

Problem

WebConsole::Session.inmemory_storage is a class-level Hash with no eviction or TTL. Every captured exception adds a new entry that retains all stack-frame Binding objects in @exception_mappers. A Binding pins the entire lexical scope at capture time -- including self for the failing controller action, which transitively reaches Rails.application.routes.
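The retention mechanism described above can be seen in isolation with plain Ruby. This is a minimal sketch, not web-console code: `FakeController` and the `routes` placeholder are hypothetical stand-ins, and only the core `Binding#receiver` and `Binding#local_variable_get` methods are used.

```ruby
# Hypothetical stand-in for a controller; not part of web-console or Rails.
class FakeController
  attr_reader :routes

  def initialize(routes)
    @routes = routes
  end

  # Returns a Binding captured inside the method, much as a stored
  # stack frame would hold one.
  def failing_action
    params = { id: 42 }
    binding
  end
end

routes = Object.new # stands in for a large route set
frame  = FakeController.new(routes).failing_action

# The Binding keeps both `self` and the locals of the frame reachable.
captured_self   = frame.receiver
captured_params = frame.local_variable_get(:params)

puts captured_self.routes.equal?(routes)
puts captured_params.inspect
```

As long as `frame` is referenced from a long-lived structure, everything reachable from `captured_self` stays alive too.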

In a long-running development process this means every captured exception leaks:

  • one ActionDispatch::Routing::RouteSet (or Rails::Engine::LazyRouteSet) per error,
  • the full Journey route AST hanging off it (Journey::Nodes::{Cat,Slash,Literal,...}),
  • the per-request HashWithIndifferentAccess objects reachable from the captured self,
  • everything else in scope at the throw site.

A real-world reproduction in a Rails 8.1 app saw Puma workers grow from ~470 MB at boot to >2 GB after a few hours of normal development, and to ~25 GB per worker over a day. Heap-dump analysis traced the retention to:

Binding  <-  Array (frame list)
         <-  WebConsole::ExceptionMapper / Evaluator
         <-  WebConsole::Session
         <-  WebConsole::Session.inmemory_storage

with 14 retained LazyRouteSet generations and ~1300 of 1900 live Bindings reaching a RouteSet within four hops.

Fix

Bound the storage with a configurable LRU-style cap. Hashes preserve insertion order in Ruby, so on overflow the oldest entry is removed with inmemory_storage.shift. The default is 5 sessions -- enough for the in-browser REPL to remain usable for the most recent few errors, while keeping the retained set deterministic.
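A small illustration of the Hash behaviour this relies on, using only core `Hash` methods. The keys and values are made up for the example. Note that re-assigning an existing key does not move it, so a plain `shift` evicts by insertion order rather than by last access.

```ruby
storage = {}
storage["a"] = :first
storage["b"] = :second
storage["c"] = :third

# Re-assigning an existing key keeps its original position.
storage["a"] = :first_updated

# Hash#shift removes and returns the first (oldest-inserted) pair.
oldest_key, oldest_value = storage.shift

puts oldest_key            # a
puts storage.keys.inspect  # ["b", "c"]
```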

A new config.web_console.max_sessions knob (Integer or nil) lets apps opt into a higher ceiling, or restore the legacy unbounded behaviour.

Reads and writes are wrapped in a Mutex because web-console serves both the application thread (storing on error) and AJAX requests from the console UI (looking up by id).
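The shape of the change can be sketched as follows. This is an illustrative approximation under stated assumptions, not the code in this PR: `BoundedStore` and its method names are hypothetical, and it mirrors the insert-then-evict order described here.

```ruby
# Hypothetical stand-in for the session storage; class and method names
# are illustrative, not web-console's real API.
class BoundedStore
  def initialize(max_sessions)
    @max_sessions = max_sessions # Integer, or nil for unbounded
    @entries = {}
    @mutex = Mutex.new
  end

  def store(id, session)
    @mutex.synchronize do
      @entries[id] = session
      # Drop the oldest entries until the cap is respected.
      @entries.shift while @max_sessions && @entries.size > @max_sessions
    end
    session
  end

  def find(id)
    @mutex.synchronize { @entries[id] }
  end

  def size
    @mutex.synchronize { @entries.size }
  end
end

store = BoundedStore.new(5)
1.upto(50) { |n| store.store("session-#{n}", Object.new) }

puts store.size                     # 5
puts store.find("session-1").nil?   # true
puts store.find("session-50").nil?  # false
```

Passing `nil` as the cap skips eviction entirely, which corresponds to the legacy unbounded behaviour.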

Verification

  • Existing test suite: 76 runs, 1641 assertions, 0 failures.

  • New tests cover both eviction at the limit and max_sessions = nil preserving legacy behaviour.

  • test/leak/reproduce_session_leak.rb is a standalone repro that measures RSS growth as a function of captured exceptions:

    $ N_ERRORS=50 MAX_SESSIONS=nil ruby test/leak/reproduce_session_leak.rb
    baseline   sessions=0   rss= 15 MB
    after 50   sessions=50  rss=544 MB

    $ N_ERRORS=50 ruby test/leak/reproduce_session_leak.rb   # default = 5
    baseline   sessions=0   rss= 14 MB
    after 5    sessions=5   rss= 68 MB
    after 50   sessions=5   rss=118 MB   # plateau

    Unbounded grows ~10 MB per captured error indefinitely; bounded plateaus once the cap is reached and stays flat.
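A quick arithmetic check of the figures above, using only the RSS values printed by the two runs. The variable names are illustrative.

```ruby
# Unbounded run: growth from baseline to 50 captured errors.
unbounded_per_error = (544 - 15) / 50.0

# Bounded run: growth between the 5th and the 50th captured error,
# i.e. after the cap of 5 sessions has been reached.
bounded_tail_per_error = (118 - 68) / (50 - 5).to_f

puts unbounded_per_error.round(1)     # 10.6
puts bounded_tail_per_error.round(1)  # 1.1
```

With only two samples in the bounded run, the small residual slope cannot be separated from allocator or GC noise; more sample points would be needed to characterise the plateau.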

@gsamokovarov
Collaborator

gsamokovarov commented May 18, 2026

This sounds reasonable. A few technical nits:

With the current eviction implementation, we depend on Hash entry order, which is stable and well defined in CRuby. Are we guaranteed the same order in other Ruby implementations? Should we depend on that?

Moreover, we first assign the binding, then delete the excess ones. If Hash entry order changes in a future Ruby version, we may end up deleting the binding we just stored. I'm fine if we are off by one here, but I'm not fine if we wipe out the binding we just stored. If we first evict the old entries and then add the current one, we can guarantee the binding will be present in the in-memory storage. We can even fix the off-by-one if we evict down to N-1 records instead of N, the configured limit.

    inmemory_storage[id] = self
    INMEMORY_STORAGE_MUTEX.synchronize do
      inmemory_storage[id] = self
      evict_oldest_sessions

IMO, first evict the oldest sessions, then add the current one. We can evict down to limit - 1 sessions here to account for the newly added one. We also have to account for the edge cases around the configuration.

That way, if the Ruby Hash Entry Order changes, we are guaranteed that the binding we just added is not evicted.
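One way to read this suggestion, as a minimal sketch: `store_session` is a hypothetical helper, not web-console's API, and the handling of `nil` and non-positive limits is an assumption about the configuration edge cases mentioned above. In the real code this would run inside the mutex.

```ruby
# Hypothetical helper; names are illustrative, not web-console's real API.
# Evicts first, then stores, so the new entry can never be the one removed.
def store_session(storage, id, session, limit)
  unless limit.nil? # nil is assumed to mean "unbounded"
    keep = [limit - 1, 0].max # room left for the entry about to be added
    storage.shift while storage.size > keep
  end
  storage[id] = session
end

storage = {}
1.upto(10) { |n| store_session(storage, "session-#{n}", Object.new, 3) }

puts storage.keys.inspect  # ["session-8", "session-9", "session-10"]
puts storage.size          # 3
```

Under this reading, a limit of 0 or less still keeps the session that was just stored, since eviction happens before the insert.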
