RAM cache implementation - part II #10779
base: master
Conversation
Hey, is this PR in a state where it can be taken off draft + reviewed, or is it still in the oven?
Hey, we are stuck on draft as it conflicts with async offloading and I will need to do a small rebase and retest. Feel free to review though.
647d88a to ff66d8a (Compare)
comfy_extras/nodes_custom_sampler.py (Outdated)
outputs=[io.Sigmas.Output()]
)

@classmethod
This should probably be in the v3 schema stuff instead of on every node.
@Kosinkadink What do you think?
I think I would be able to make it so that in v3, the check_lazy_status function would be autogenerated if at least one input is marked as lazy and no override was created. For this PR, I think having it be per node is fine, and I can edit these when I add that autogeneration.
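For reference, a rough sketch of what that autogenerated default could behave like, written against the existing v1-style check_lazy_status convention; the node, input names, and the v3 hook shape here are illustrative assumptions, not part of this PR:

```python
# Illustrative sketch only: a default check_lazy_status that requests every
# lazy input which has not been evaluated yet. Node and input names are made up.
class ExampleLazyNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "sigmas_a": ("SIGMAS", {"lazy": True}),
                "sigmas_b": ("SIGMAS", {"lazy": True}),
            }
        }

    def check_lazy_status(self, sigmas_a=None, sigmas_b=None):
        # An autogenerated override could simply request every lazy input
        # whose value is still None (i.e. not yet computed upstream).
        needed = []
        if sigmas_a is None:
            needed.append("sigmas_a")
        if sigmas_b is None:
            needed.append("sigmas_b")
        return needed
```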
I've backed out the lazy changes so we can get the loader awareness in, in its own right. There's some ongoing discussion about how to do the lazy stuff.
58267a8 to ac4378e (Compare)
Move the headroom logic into the RAM cache so it is easier to call as "free me some RAM". Rename the API to free_ram(). Split the clean_list creation off into a completely separate function to avoid any stray strong reference to the content-to-be-freed lingering on the stack.
Add the free_ram() API and a means to install implementations of the freer (i.e. the RAM cache).
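A minimal sketch of that shape, assuming made-up names (register_ram_freer, the psutil headroom check); the real hook lives in comfy.model_management and may differ:

```python
import psutil

# Hypothetical sketch: freer implementations (e.g. the RAM cache) register a
# callback, and free_ram() asks them to release host RAM until enough is free.
_ram_freers = []

def register_ram_freer(freer):
    """Install a callable that can release host RAM on request."""
    _ram_freers.append(freer)

def free_ram(bytes_needed):
    """Ask installed freers for headroom until bytes_needed RAM is available."""
    for freer in _ram_freers:
        if psutil.virtual_memory().available >= bytes_needed:
            break
        freer(bytes_needed)
```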
Currently this hard-assumes that the caller of model_unload will keep current_loaded_models in sync. With RAMPressureCache it's possible for garbage collection to occur in the middle of the model-free process, which can split these two steps.
This is currently put together as a list of indexes, assuming current_loaded_models doesn't change. However, we might need to purge a model as part of the offload process, which means this list can change in the middle of the freeing process. Handle this by taking independent references to the LoadedModel objects and doing safe by-value deletion from current_loaded_models.
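A simplified sketch of the by-value deletion, reusing the existing current_loaded_models / LoadedModel / model_unload() names; the body is illustrative, not the actual implementation:

```python
# Take independent references first; an index list would go stale if the RAM
# cache purges an entry from current_loaded_models while we are offloading.
def free_loaded_models(current_loaded_models, indexes_to_free):
    to_free = [current_loaded_models[i] for i in indexes_to_free]
    for loaded_model in to_free:
        loaded_model.model_unload()
        # By-value removal tolerates the entry having already been dropped by
        # a garbage-collection pass that ran mid-free.
        if loaded_model in current_loaded_models:
            current_loaded_models.remove(loaded_model)
```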
RAMPressure caching may need to purge the same model that is currently being offloaded to free VRAM. In this case the RAMPressure cache takes priority and needs to be able to pull the trigger on dumping the whole model and freeing the ModelPatcher in question. To do this, defer the actual transfer of model weights from GPU to RAM to model_management state rather than handling it inside ModelPatcher. This is done as a list of weakrefs. If the RAM cache decides to free the model currently being unloaded, the ModelPatcher and the refs simply disappear in the middle of the unloading process, and both RAM and VRAM are freed.

The unpatcher now queues the individual leaf modules to be offloaded one by one so that RAM levels can be monitored. Note that the UnloadPartially that is potentially done as part of a load will not be freeable this way, but it shouldn't be anyway: that is the currently active model, and the RAM cache cannot save you if you can't even fit the one model you are currently trying to use.
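A rough sketch of the weakref-based deferred offload, with assumed names (offload_queue, queue_offload, drain_offload_queue) and a torch.nn.Module model; the real bookkeeping lives in model_management:

```python
import weakref

# Leaf modules are queued as weak references; if the RAM cache frees the whole
# ModelPatcher mid-unload, the refs go dead and both copies are released.
offload_queue = []

def queue_offload(model):
    for module in model.modules():
        if len(list(module.children())) == 0:  # leaf modules only
            offload_queue.append(weakref.ref(module))

def drain_offload_queue(free_ram, headroom_bytes):
    while offload_queue:
        module = offload_queue.pop(0)()
        if module is None:
            continue  # the model was purged; nothing left to transfer
        free_ram(headroom_bytes)   # keep RAM headroom before each transfer
        module.to("cpu")           # move this leaf's weights from GPU to RAM
```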
87568ba to e65e642 (Compare)

This PR improves the robustness of the RAM cache implementation. It makes the RAM cache much friendlier to use and avoids users needing to size the cache specifically for their workflow. It also avoids OOMs in more cases, especially flows with multiple large models. There are three key changes:
1: Loosening the executor's pre-emptive cache pin on models, so that cached models late in a workflow can be freed to make space for earlier ones
2: Pre-emptively freeing space for large models on load (see the sketch after this list)
3: Freeing space on demand during the GPU -> RAM weight offload process
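As a caller-side illustration of point 2; the names here, including load_with_headroom and the 1 GiB margin, are assumptions rather than the PR's actual code:

```python
# Hypothetical use of free_ram() before pulling a large model into host memory:
# ask the RAM cache to evict cached models until there is room plus some margin.
def load_with_headroom(model_patcher, free_ram, extra_headroom=1 << 30):
    needed = model_patcher.model_size() + extra_headroom
    free_ram(needed)
    return model_patcher  # continue with the normal load path
```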
Example test conditions:
Linux, RTX5090, swapoff, 96GB RAM
Workflow: Flux FP16 -> Qwen FP16 -> Wan 2.2 FP16
giant-flow.json
In the screenshot it's executing Wan. The RAM trace shows it dropping from 95% to make space for Wan after Qwen.
On rerun it still has all the text encodings for re-use.