
Give Firecracker forks their own mem-file instead of deferring the copy#298

Open
sjmiller609 wants to merge 3 commits into main from hypeship/fc-fork-local-memfile


sjmiller609 (Collaborator) commented Jul 1, 2026

Summary

Firecracker standby forks (fork-from-standby-instance and fork-from-standby-snapshot) previously skipped copying the mem-file and stored a path to the source's memory file (FirecrackerDeferredSnapshotMemoryPath), serving UFFD faults from it while running and copying it only at the fork's next standby. Until that first standby completed, every fork silently depended on a file it didn't own:

  • DeleteSnapshot / DeleteInstance / StopInstance on the source made standby-parked forks permanently unrestorable and broke running forks' next standby. Nothing tracked or guarded these dependencies.
  • Worse, when a source instance was restored and re-entered standby, its diff snapshot wrote into the same inode that forks were reading (snapshot-base is promoted by rename, then Firecracker writes dirty pages in place), silently corrupting the dependents' memory.

This PR removes the deferral. Standby forks now hardlink the source's snapshot mem-file into the fork's guest dir (falling back to a reflink/sparse copy with a warning if linking fails), and the standby path enforces one invariant: never diff-write into a mem-file with nlink > 1. Before Firecracker writes the diff, a shared mem-file is replaced with a private reflink/sparse copy.

Why hardlink (not a per-fork copy or reflink)

Benchmarked on a 25-fork burst: per-fork reflinked copies regressed fork p50 ~4x. Reflink shares disk extents but the kernel page cache is per-inode, so each fork's pager cache misses re-read the same pages from cold encrypted disk instead of hitting pages warmed by sibling forks (5.7s aggregate backing reads vs ~0s). Hardlinks keep all forks on one inode:

  • Fanout is zero-I/O — no mem-file read or write at fork time, on every filesystem.
  • Runtime matches the old shared-file profile — pager cache (keyed by the inherited FirecrackerSnapshotCacheKey) plus kernel page cache are both shared across forks.
  • Independence via inode refcount — deleting the source snapshot/instance only unlinks a name; forks keep the inode. DeleteSnapshot is safe the moment the fork API returns.
  • Correctness via the unshare guard — a fork's first standby (and a source re-entering standby while forks share its base) copies the mem-file to a private inode before the in-place diff merge. The one-time copy cost lands off the fanout burst, and is a ~free reflink on FICLONE-capable filesystems.

Firecracker mmaps the mem-file MAP_PRIVATE, so guest writes never reach the file; the standby diff merge is the only file writer and is covered by the guard.

Removed (now dead)

  • FirecrackerDeferredSnapshotMemoryPath metadata field and all wiring in fork/snapshot-fork paths
  • materializeDeferredSnapshotMemory + base/latest alternate-path resolution (both copies)
  • repointForkDeferredSnapshotMemoryToSourceBase after running-source forks
  • lockFirecrackerSnapshotSource / snapshotSourceLocks
  • hypervisor.SnapshotOptions (existed only to carry the deferred path; Snapshot() drops the param across all hypervisors)

Behavior changes / caveats

  • Upgrade: standby forks created by an older build that still carry a deferred path will not restore after this change (the field is gone and the fork has no local mem-file). Accepted intentionally; drain or restore+standby existing deferred forks before rolling out if any exist.
  • A source instance that re-enters standby while forks still share its retained base pays the unshare copy (reflink on xfs, sparse copy otherwise). Central-snapshot fanout never hits this — snapshot store files are never diff-written.
  • Bumps the UFFD pager version, as required by the CI gate on lib/instances/firecracker_uffd.go changes.

Tests

  • Unit: forks assert the mem-file shares the source's inode (instance-fork and snapshot-fork); compressed-source forks still fall back to a real copy; ensureExclusiveSnapshotMemoryOwnership unshares hardlinked mem-files (source bytes untouched) and no-ops on private/missing ones. Passing locally with lib/forkvm, lib/hypervisor/..., lib/guestmemory.
  • Integration: the fork-isolation test now asserts hardlink-at-fork + unshare-at-standby + source bytes/inode unchanged through the fork's full lifecycle; TestFCUFFDOneShotLifecycle (currently skip-gated, KERNEL-1354) additionally asserts DeleteSnapshot succeeds while a fork is running from it. KVM/UFFD integration not validated locally (sandbox has no KVM images/reflink fs) — needs the CI run.
  • TestForkCloudHypervisorFromRunningNetwork / TestStandbyAndRestore fail in my sandbox on image-pull timeouts — both reproduce identically without these changes (environmental).
  • Perf validation: re-run the 25-burst cold+warm fork bench; expected fork p50 back to main's baseline with hit-rate ~100% and ~0 backing-read time.

🤖 Generated with Claude Code


Note

High Risk
Changes Firecracker fork/snapshot memory sharing, standby diff writes, and UFFD restore paths—areas where bugs cause silent guest memory corruption or broken restores; upgrade breaks in-flight deferred forks.

Overview
Replaces deferred Firecracker snapshot memory (forks pointed at the source mem-file and copied only at standby) with hardlinked mem-files at fork time for standby instance/snapshot forks, falling back to reflink/sparse copy when link fails.

Safety: Before any in-place diff snapshot write, ensureExclusiveSnapshotMemoryOwnership clones the mem-file when nlink > 1 so forks and sources never corrupt each other's memory. Standby invokes this guard; UFFD restore reads the fork's local snapshot-latest/memory path.

Removed: FirecrackerDeferredSnapshotMemoryPath, deferred materialization in Firecracker Snapshot, snapshot-source locks, repointForkDeferredSnapshotMemoryToSourceBase, and hypervisor.SnapshotOptions. UFFD pager version bumped to 0.1.4.

Caveat: Standby forks created on older builds with only a deferred path may not restore after upgrade.

Reviewed by Cursor Bugbot for commit a43a9ba. Bugbot is set up for automated code reviews on this repo. Configure here.

…the copy

Standby forks previously skipped the mem-file copy and stored a path to the
source's memory file, serving UFFD faults from it and copying it only at the
fork's next standby. That left forks silently dependent on the source snapshot
or instance: deleting or stopping the source stranded standby forks, and the
source's next diff snapshot mutated the shared file in place under running
forks.

Fork copies now always include the mem-file via the existing reflink-first
directory copy, so a fork owns its memory from creation and the source can be
deleted immediately. UFFD one-shot restore is unchanged except that the pager
serves from the fork-local file; the shared page cache still works because
forks inherit the source's cache key and the cloned files are byte-identical.

Removes the deferred-path machinery: FirecrackerDeferredSnapshotMemoryPath,
materialize-on-standby, base/latest alternate path resolution, snapshot source
locks, the repoint step after running-source forks, and SnapshotOptions.

Benchmarks showed per-fork reflinked mem-files regress concurrent fanout:
reflink shares disk extents but not the kernel page cache, so each fork's
pager misses re-read the same pages from cold disk instead of hitting the
cache warmed by sibling forks.

Standby forks now hardlink the source's snapshot mem-file: fanout does no
memory I/O and all forks of a snapshot fault against one inode, restoring
the shared read path. Independence is preserved by inode refcount (deleting
the source only unlinks a name) plus one invariant: the standby path never
diff-writes into a mem-file with nlink > 1 — it replaces it with a private
reflink/sparse copy first. That guard also covers a source instance
re-entering standby while forks still share its retained base, and moves the
one-time copy cost to each fork's first standby, off the fanout burst.

Falls back from hardlink to copy with a warning when linking fails.

sjmiller609 marked this pull request as ready for review July 3, 2026 14:12

cursor bot left a comment


Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.


    if shareMemFile {
        if _, err := os.Stat(srcMem); err != nil {
            shareMemFile = false
        }
    }


Stat errors silently disable mem hardlink

Medium Severity

In cloneGuestDirectoryForFork, os.Stat errors on the snapshot memory file (e.g. permission or I/O errors) are treated as if the file were missing. This incorrectly disables mem-file sharing, so the file is copied instead of hardlinked, or the fork fails later during restore.


Triggered by learned rule: Fork source os.Stat must propagate non-IsNotExist errors

