Give Firecracker forks their own mem-file instead of deferring the copy#298
Give Firecracker forks their own mem-file instead of deferring the copy#298sjmiller609 wants to merge 3 commits into
Conversation
…the copy Standby forks previously skipped the mem-file copy and stored a path to the source's memory file, serving UFFD faults from it and copying it only at the fork's next standby. That left forks silently dependent on the source snapshot or instance: deleting or stopping the source stranded standby forks, and the source's next diff snapshot mutated the shared file in place under running forks. Fork copies now always include the mem-file via the existing reflink-first directory copy, so a fork owns its memory from creation and the source can be deleted immediately. UFFD one-shot restore is unchanged except that the pager serves from the fork-local file; the shared page cache still works because forks inherit the source's cache key and the cloned files are byte-identical. Removes the deferred-path machinery: FirecrackerDeferredSnapshotMemoryPath, materialize-on-standby, base/latest alternate path resolution, snapshot source locks, the repoint step after running-source forks, and SnapshotOptions.
Benchmarks showed per-fork reflinked mem-files regress concurrent fanout: reflink shares disk extents but not the kernel page cache, so each fork's pager misses re-read the same pages from cold disk instead of hitting the cache warmed by sibling forks. Standby forks now hardlink the source's snapshot mem-file: fanout does no memory I/O and all forks of a snapshot fault against one inode, restoring the shared read path. Independence is preserved by inode refcount (deleting the source only unlinks a name) plus one invariant: the standby path never diff-writes into a mem-file with nlink > 1 — it replaces it with a private reflink/sparse copy first. That guard also covers a source instance re-entering standby while forks still share its retained base, and moves the one-time copy cost to each fork's first standby, off the fanout burst. Falls back from hardlink to copy with a warning when linking fails.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit a43a9ba. Configure here.
| if shareMemFile { | ||
| if _, err := os.Stat(srcMem); err != nil { | ||
| shareMemFile = false | ||
| } |
There was a problem hiding this comment.
Stat errors silently disable mem hardlink
Medium Severity
In cloneGuestDirectoryForFork, os.Stat errors on the snapshot memory file (like permission or I/O issues) are treated as if the file is missing. This incorrectly disables memory file sharing, causing the file to be copied instead of hardlinked, or leading to later failures during restore.
Triggered by learned rule: Fork source os.Stat must propagate non-IsNotExist errors
Reviewed by Cursor Bugbot for commit a43a9ba. Configure here.


Summary
Firecracker standby forks (fork-from-standby-instance and fork-from-standby-snapshot) previously skipped copying the mem-file and stored a path to the source's memory file (
FirecrackerDeferredSnapshotMemoryPath), serving UFFD faults from it while running and copying it only at the fork's next standby. Until that first standby completed, every fork silently depended on a file it didn't own:DeleteSnapshot/DeleteInstance/StopInstanceon the source made standby-parked forks permanently unrestorable and broke running forks' next standby. Nothing tracked or guarded these dependencies.This PR removes the deferral. Standby forks now hardlink the source's snapshot mem-file into the fork's guest dir (fallback to reflink/sparse copy with a warning if linking fails), and the standby path enforces one invariant: never diff-write into a mem-file with
nlink > 1— it replaces a shared mem-file with a private reflink/sparse copy before Firecracker writes the diff.Why hardlink (not a per-fork copy or reflink)
Benchmarked on a 25-fork burst: per-fork reflinked copies regressed fork p50 ~4x. Reflink shares disk extents but the kernel page cache is per-inode, so each fork's pager cache misses re-read the same pages from cold encrypted disk instead of hitting pages warmed by sibling forks (5.7s aggregate backing reads vs ~0s). Hardlinks keep all forks on one inode:
FirecrackerSnapshotCacheKey) plus kernel page cache are both shared across forks.DeleteSnapshotis safe the moment the fork API returns.Firecracker mmaps the mem-file MAP_PRIVATE, so guest writes never reach the file; the standby diff merge is the only file writer and is covered by the guard.
Removed (now dead)
FirecrackerDeferredSnapshotMemoryPathmetadata field and all wiring in fork/snapshot-fork pathsmaterializeDeferredSnapshotMemory+ base/latest alternate-path resolution (both copies)repointForkDeferredSnapshotMemoryToSourceBaseafter running-source forkslockFirecrackerSnapshotSource/snapshotSourceLockshypervisor.SnapshotOptions(existed only to carry the deferred path;Snapshot()drops the param across all hypervisors)Behavior changes / caveats
lib/instances/firecracker_uffd.gochanges).Tests
ensureExclusiveSnapshotMemoryOwnershipunshares hardlinked mem-files (source bytes untouched) and no-ops on private/missing ones. Passing locally withlib/forkvm,lib/hypervisor/...,lib/guestmemory.TestFCUFFDOneShotLifecycle(currently skip-gated, KERNEL-1354) additionally assertsDeleteSnapshotsucceeds while a fork is running from it. KVM/UFFD integration not validated locally (sandbox has no KVM images/reflink fs) — needs the CI run.TestForkCloudHypervisorFromRunningNetwork/TestStandbyAndRestorefail in my sandbox on image-pull timeouts — both reproduce identically without these changes (environmental).🤖 Generated with Claude Code
Note
High Risk
Changes Firecracker fork/snapshot memory sharing, standby diff writes, and UFFD restore paths—areas where bugs cause silent guest memory corruption or broken restores; upgrade breaks in-flight deferred forks.
Overview
Replaces deferred Firecracker snapshot memory (forks pointed at the source mem-file and copied only at standby) with hardlinked mem-files at fork time for standby instance/snapshot forks, falling back to reflink/sparse copy when
linkfails.Safety: Before any in-place diff snapshot write,
ensureExclusiveSnapshotMemoryOwnershipclones the mem-file whennlink > 1so forks and sources never corrupt each other's memory. Standby invokes this guard; UFFD restore reads the fork's localsnapshot-latest/memorypath.Removed:
FirecrackerDeferredSnapshotMemoryPath, deferred materialization in FirecrackerSnapshot, snapshot-source locks,repointForkDeferredSnapshotMemoryToSourceBase, andhypervisor.SnapshotOptions. UFFD pager version bumped to0.1.4.Caveat: Standby forks created on older builds with only a deferred path may not restore after upgrade.
Reviewed by Cursor Bugbot for commit a43a9ba. Bugbot is set up for automated code reviews on this repo. Configure here.