Run Podman containers in Firecracker microVMs with fast cloning via UFFD memory sharing and btrfs CoW snapshots.
Features
- Run OCI containers in isolated Firecracker microVMs
- ~6x faster startup with container image cache (540ms vs 3100ms)
- VM cloning via UFFD memory server + btrfs reflinks (~10ms restore, ~610ms with exec)
- Multiple VMs share memory via kernel page cache (50 VMs = ~512MB, not 25GB)
- Dual networking: bridged (iptables) or rootless (slirp4netns)
- Port forwarding for both regular VMs and clones
- FUSE-based host directory mapping via fuse-pipe
- Container exit code forwarding
- Interactive shell support (-it) with full TTY (vim, editors, colors)
- HTTP API server (fcvm serve): ComputeSDK-compatible gateway for programmatic sandbox management
Hardware
- Linux with /dev/kvm (bare-metal or nested virtualization)
- For AWS: c6g.metal (ARM64) or c5.metal (x86_64) - NOT regular instances
Runtime Dependencies
- Rust 1.83+ with cargo (rustup.rs)
- musl target: rustup target add $(uname -m)-unknown-linux-musl
- Firecracker binary in PATH
- For bridged networking: sudo, iptables, iproute2
- For rootless networking: slirp4netns
- For building rootfs: qemu-utils, e2fsprogs
Storage
- btrfs filesystem at /mnt/fcvm-btrfs (native btrfs used directly; loopback created on non-btrfs hosts)
- Kernel auto-downloaded from Kata Containers release on first run
Container Testing (Recommended) - All dependencies bundled:
make container-test # All tests in container (just needs podman + /dev/kvm)
See CLAUDE.md for all Makefile targets.
Native Testing - Additional dependencies required:
| Category | Packages |
|---|---|
| FUSE | fuse3, libfuse3-dev |
| pjdfstest build | autoconf, automake, libtool |
| pjdfstest runtime | perl |
| bindgen (userfaultfd-sys) | libclang-dev, clang |
| VM tests | iproute2, iptables, slirp4netns |
| Rootfs build | qemu-utils, e2fsprogs |
| User namespaces | uidmap (for newuidmap/newgidmap) |
pjdfstest Setup (for POSIX compliance tests):
git clone --depth 1 https://github.com/pjd/pjdfstest /tmp/pjdfstest-check
cd /tmp/pjdfstest-check && autoreconf -ifs && ./configure && make
Ubuntu/Debian Install:
sudo apt-get update && sudo apt-get install -y \
fuse3 libfuse3-dev \
autoconf automake libtool perl \
libclang-dev clang \
iproute2 iptables slirp4netns \
qemu-utils e2fsprogs \
uidmap
See Containerfile for the full dependency list used in CI.
Host system configuration:
# KVM access
sudo chmod 666 /dev/kvm
# Userfaultfd for snapshot cloning
sudo mknod /dev/userfaultfd c 10 126 2>/dev/null || true
sudo chmod 666 /dev/userfaultfd
sudo sysctl -w vm.unprivileged_userfaultfd=1
# FUSE allow_other
echo "user_allow_other" | sudo tee -a /etc/fuse.conf
# Ubuntu 24.04+: allow unprivileged user namespaces
sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0
# IP forwarding for container networking (e.g., podman builds)
sudo sysctl -w net.ipv4.conf.all.forwarding=1
sudo sysctl -w net.ipv4.conf.default.forwarding=1
# Bridged networking only (not needed for --network rootless):
sudo mkdir -p /var/run/netns
sudo iptables -P FORWARD ACCEPT
# NAT rule is set up automatically by fcvm
# If running fcvm inside a container, set NAT on the HOST (container iptables don't persist):
# sudo iptables -t nat -A POSTROUTING -s 172.30.0.0/16 -o eth0 -j MASQUERADE
fcvm runs containers inside Firecracker microVMs:
You → fcvm → Firecracker VM → Podman → Container
Each podman run boots a VM, pulls the image, and starts the container in an isolated microVM.
First run is ~3s. Cached runs with the same image are ~540ms.
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
# Install musl toolchain (for static linking fc-agent binary)
sudo apt install musl-tools
rustup target add $(uname -m)-unknown-linux-musl
# Clone and build fcvm + fc-agent binaries (~2 min)
git clone https://github.com/ejc3/fcvm
cd fcvm
make build
# → "Finished release profile [optimized] target(s)"
# Create symlink for convenience (works with sudo)
ln -sf target/release/fcvm ./fcvm
# Download kernel + build rootfs (~5 min first time, then cached)
sudo ./fcvm setup
# → "Setup complete"
# One-shot command (runs, prints output, exits)
./fcvm podman run --name hello alpine:latest -- echo "Hello from microVM"
# → Hello from microVM
# Run a long-lived service (stays in foreground, or add & to background)
./fcvm podman run --name web nginx:alpine
# → Logs show VM booting, then "healthy" when nginx is ready
# In another terminal:
./fcvm ls
# → Shows "web" with PID, health status, network info
./fcvm exec --name web -- cat /etc/os-release
# → Shows Alpine Linux info
# Bridged networking (for full network access, requires sudo)
sudo ./fcvm podman run --name web-bridged --network bridged nginx:alpine
fcvm caches container images after the first pull. Subsequent runs with the same image are ~6x faster (540ms vs 3100ms).
# First run: pulls image, creates cache (~3s)
./fcvm podman run --name web1 nginx:alpine
# → Cache created for nginx:alpine
# Second run: restores from cache (~540ms)
./fcvm podman run --name web2 nginx:alpine
# → Restored from snapshot
# Disable snapshot for testing
./fcvm podman run --name web3 --no-snapshot nginx:alpine
How it works:
- First run: fc-agent pulls image, host takes Firecracker snapshot
- Cache key: SHA256 of (image, tag, cmd, env, config)
- Subsequent runs: Restore snapshot, fc-agent starts container (image already pulled)
The snapshot captures VM state after image pull and before container start.
On restore, fc-agent runs podman run with the already-pulled image, so pull/export is skipped.
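The cache key described above can be pictured as a hash over the inputs that define a run. The following is a minimal sketch, assuming the fields are joined one per line before hashing. The cache_key helper, its field order, and the sample values are hypothetical; fcvm's actual serialization may differ.

```shell
# Hypothetical helper: derive a cache key from the inputs that define a run.
# Assumption: fields are joined one per line; fcvm's real encoding may differ.
cache_key() {
  printf '%s\n' "$@" | sha256sum | cut -d' ' -f1
}

# Same inputs give the same key; changing any field gives a different key.
key_a=$(cache_key nginx alpine "" "" default)
key_b=$(cache_key nginx alpine "" "NGINX_PORT=80" default)
echo "$key_a"
echo "$key_b"
```

Because every field feeds the hash, changing the command or environment of a run yields a new key, which is consistent with such runs not reusing an existing snapshot.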
fcvm uses a two-tier snapshot system to reduce startup time:
| Snapshot | When Created | Content | Size |
|---|---|---|---|
| Pre-start | After image pull, before container runs | VM with image loaded | Full (~1GB) |
| Startup | After HTTP health check passes | VM with container fully initialized | Diff (only dirty pages since pre-start) |
The pre-start snapshot is a full memory dump. The startup snapshot uses the pre-start as a
diff base, capturing only pages dirtied during container initialization. The diff is
automatically merged onto the base to produce a self-contained memory.bin.
The startup snapshot is triggered by --health-check <url>.
After the check passes, fcvm creates a diff snapshot of the initialized app and merges it
onto the pre-start base. The URL hostname is sent as the Host header. Use --health-check-timeout <seconds> to adjust the per-request timeout (default: 5s).
Second run restores from that snapshot and skips container initialization.
# First run: Creates pre-start (full) + startup (diff, merged onto base)
./fcvm podman run --name web --health-check http://localhost/ nginx:alpine
# → Pre-start snapshot: ~1024MB (full)
# → Startup snapshot: diff of dirty pages, merged onto pre-start base
# Second run: Restores from startup snapshot
./fcvm podman run --name web2 --health-check http://localhost/ nginx:alpine
# → Restored from startup snapshot (application already running)
Clone snapshots use their source as parent for diff-based optimization across the clone chain.
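To see by hand roughly what the HTTP health probe sends, the Host header can be derived from the URL and passed to curl. This is a sketch only: url_host is a hypothetical helper (not part of fcvm), and it assumes a plain hostname or IPv4 address with no IPv6 literal or user:pass@ prefix.

```shell
# Hypothetical helper: extract the hostname from a URL such as http://localhost/.
# Assumption: no IPv6 literal and no user:pass@ prefix in the URL.
url_host() {
  rest=${1#*://}          # drop the scheme
  hostport=${rest%%/*}    # drop the path
  echo "${hostport%%:*}"  # drop an optional :port
}

url_host http://localhost/
url_host http://example.com:8080/health

# A manual probe with the same Host header and a 5s timeout might look like:
#   curl -fsS --max-time 5 -H "Host: $(url_host "$URL")" "http://<vm_ip>/"
# where <vm_ip> is a placeholder for the address the VM is reachable at.
```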
fcvm supports three modes for delivering container images to VMs:
| Mode | Flag | Description | Speed |
|---|---|---|---|
| Overlay (default) | --image-mode overlay | Pre-built ext4 image, mounted read-only as additionalImageStore | Fast (~540ms cached) |
| Btrfs | --image-mode btrfs | Pre-built btrfs image with native subvolumes, reflink-copied per VM | Fast (~540ms cached) |
| Archive | --image-mode archive | Docker tar archive, podman load at boot | Slow (~3s) |
The mode is auto-detected from the kernel profile (btrfs profile → btrfs mode) or can be set explicitly:
# Auto-detect from kernel profile
./fcvm podman run --name web --kernel-profile btrfs nginx:alpine
# Explicit mode
./fcvm podman run --name web --image-mode btrfs nginx:alpine
# Archive mode (fallback, no pre-built image needed)
./fcvm podman run --name web --image-mode archive nginx:alpine
Btrfs mode is designed for rootless podman with --user. It builds the btrfs storage image as the target user on the host, so file ownership matches what rootless podman expects and no chown is needed at boot:
# Rootless podman in VM with btrfs storage (no sudo needed)
./fcvm podman run --name app \
--user 1000:1000 \
--kernel-profile btrfs \
localhost/myimage
How btrfs mode works:
- Host exports container as Docker archive (podman save)
- Docker archive is attached to the VM as a read-only block device
- VM rootfs is natively btrfs; fc-agent runs podman load directly on the btrfs root filesystem (no loopback)
- Snapshot caches the post-load state for instant subsequent boots
Requirements for btrfs mode:
- A btrfs kernel profile with CONFIG_BTRFS_FS=y: ./fcvm setup --kernel-profile btrfs --build-kernels
# Port forwarding (8080 on host -> 80 in container)
./fcvm podman run --name web --publish 8080:80 nginx:alpine
# In rootless: curl the assigned loopback IP (e.g., curl 127.0.0.2:8080)
# In bridged: curl the veth host IP (see ./fcvm ls --json)
# Mount host directory into container
./fcvm podman run --name app --map /host/data:/data alpine:latest
# Custom CPU/memory
./fcvm podman run --name big --cpu 4 --mem 4096 alpine:latest
# Interactive shell (-it like docker/podman)
./fcvm podman run --name shell -it alpine:latest sh
# JSON output for scripting
./fcvm ls --json
./fcvm ls --pid 12345 # Filter by PID
# Execute in guest VM instead of container
./fcvm exec --name web --vm -- hostname
# Interactive shell in container
./fcvm exec --name web -it -- sh
# TTY for colors (no stdin)
./fcvm exec --name web -t -- ls -la --color=always
Two modes for restoring from snapshots:
- UFFD mode (--pid): Memory served on-demand via UFFD server. Use for many concurrent clones.
- Direct mode (--snapshot): Memory loaded directly from file. Use for simpler single-clone flows.
# 1. Start baseline VM (using bridged, or omit --network for rootless)
sudo ./fcvm podman run --name baseline --network bridged public.ecr.aws/nginx/nginx:alpine
# 2. Create snapshot (pauses VM briefly, then resumes)
sudo ./fcvm snapshot create baseline --tag nginx-warm
# === Direct Mode (simpler, for single clones) ===
# Clone directly from snapshot files - no server needed
sudo ./fcvm snapshot run --snapshot nginx-warm --name clone1 --network bridged
# === UFFD Mode (for multiple concurrent clones) ===
# 3. Start UFFD memory server (serves pages on-demand, memory shared via page cache)
sudo ./fcvm snapshot serve nginx-warm
# 4. Clone from snapshot (~10ms restore, ~610ms with exec)
sudo ./fcvm snapshot run --pid <serve_pid> --name clone1 --network bridged
sudo ./fcvm snapshot run --pid <serve_pid> --name clone2 --network bridged
# 5. Clone with port forwarding (each clone can have unique ports)
sudo ./fcvm snapshot run --pid <serve_pid> --name web1 --network bridged --publish 8081:80
sudo ./fcvm snapshot run --pid <serve_pid> --name web2 --network bridged --publish 8082:80
# Get the host IP from fcvm ls --json, then curl it:
# curl $(./fcvm ls --json | jq -r '.[] | select(.name=="web1") | .config.network.host_ip'):8081
# 6. Clone and execute command (auto-cleans up after)
sudo ./fcvm snapshot run --pid <serve_pid> --network bridged --exec "curl localhost"
# Or in direct mode:
sudo ./fcvm snapshot run --snapshot nginx-warm --network bridged --exec "curl localhost"
Use --hugepages to back VM memory with 2MB hugepages instead of 4KB pages.
Hugepages reduce TLB pressure for memory-intensive workloads by mapping guest memory
with 2MB Stage 2 block entries instead of 4KB page table entries.
# Allocate hugepage pool (must cover VM memory)
sudo sh -c 'echo 1200 > /proc/sys/vm/nr_hugepages' # 2400MB for a 2GB VM
# Run with hugepages
./fcvm podman run --name web --hugepages --health-check http://localhost/ nginx:alpine
# Release hugepage pool when done
sudo sh -c 'echo 0 > /proc/sys/vm/nr_hugepages'
How it works with snapshots:
The snapshot cache flow creates two snapshots on the initial VM:
- Pre-start (Full): After container image import
- Startup (Diff): After container is healthy (uses pre-start as base, then merged)
The startup snapshot captures only the pages dirtied since pre-start and merges them onto the base. Clones restored without hugepages enable KVM dirty page tracking, which supports clone-of-clone diff snapshots.
| VM role | Dirty tracking | Stage 2 mapping | Runs |
|---|---|---|---|
| Initial (cache creation) | Disabled | 2MB blocks | Once |
| Clone (hugepage) | Disabled | 2MB blocks (full TLB benefit) | Many times |
| Clone (non-hugepage) | Enabled | 4KB (no hugepages anyway) | Many times |
Benchmark results (c7gd.metal ARM64, 2GB VM, 256MB dirty data):
Phase Standard (4KB) Hugepages (2MB) Ratio
--------------------------------------------------------------------
First Run (cold) 10.1s 8.6s 0.85x
Diff Size (dirty) 95 MB 98 MB 1.03x
Clone Restore 0.5s 1.0s 1.99x
Limitations:
- Hugepage pool must be pre-allocated (/proc/sys/vm/nr_hugepages)
- Hugepage VMs require UFFD for snapshot restore (file-based restore is not supported by Firecracker with hugepages)
- Clone restore is ~2x slower with hugepages due to UFFD page fault handling at 2MB granularity
- Diff snapshots from hugepage clones use mincore(2) fallback (reports in-core pages, not dirty pages), so they may be larger than necessary
When to use hugepages: Workloads where the TLB benefit of 2MB mappings on clones outweighs the ~2x slower clone restore. Best for long-running VMs with large working sets where TLB misses dominate performance.
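The pool size in the hugepages example (1200 pages for a 2GB VM) can be sanity-checked with a little arithmetic. The sketch below uses two hypothetical helpers. It only shows that 1024 pages is the bare minimum for 2048MB and that 1200 pages is a 2400MB pool; that the extra pages are headroom is an assumption, since the reason is not stated here.

```shell
# Number of 2MB hugepages needed to back a VM with the given memory in MB.
hugepages_for_vm() {
  mem_mb=$1
  echo $(( (mem_mb + 1) / 2 ))   # round up to whole 2MB pages
}

# Size in MB of a pool holding the given number of 2MB pages.
pool_mb() {
  echo $(( $1 * 2 ))
}

hugepages_for_vm 2048   # bare minimum page count for a 2GB VM
pool_mb 1200            # pool size, in MB, used in the example
```

On a Linux host the current pool can be inspected with grep HugePages_ /proc/meminfo before and after starting a VM.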
| Example | Purpose |
|---|---|
| Clone Speed | ~10ms memory restore, ~610ms full cycle |
| Memory Sharing | 10 clones use ~1.5GB extra, not 20GB |
| Scale-Out | 50+ VMs with ~7GB memory, not 100GB |
| Privileged Container | mknod and device access work |
| Multiple Ports | Comma-separated port mappings |
| Multiple Volumes | Comma-separated volume mappings with :ro |
Clone timing measured on c7g.metal ARM64 with RUST_LOG=debug:
| Step | Time | Description |
|---|---|---|
| State lookup | ~1ms | Find serve process |
| Namespace spawn | ~6ms | unshare --user --net + UID/GID mappings |
| CoW disk reflink | ~31ms | btrfs instant copy |
| Network setup | ~35ms | TAP device, iptables rules |
| Firecracker spawn | ~6ms | Start VM process |
| Snapshot load (UFFD) | ~9ms | Load memory from server |
| Disk patch | <1ms | Point to CoW disk |
| VM resume | <1ms | Resume vCPUs |
| fc-agent recovery | ~100ms | ARP flush, kill stale TCP |
| Exec connect | ~20ms | Connect to guest vsock |
| Command + cleanup | ~300ms | Run echo + shutdown |
| Total | ~610ms | Full clone cycle with exec |
Core VM restore (snapshot load + resume) is ~10ms. Remaining time is network setup, agent recovery, and cleanup. 10 parallel clones complete in ~1s wall clock. See PERFORMANCE.md for detailed benchmarks.
Demo: Time a clone cycle
# Setup: Create baseline and snapshot (rootless mode)
./fcvm podman run --name baseline nginx:alpine
./fcvm snapshot create baseline --tag nginx-warm
./fcvm snapshot serve nginx-warm # Note the serve PID
# Time a clone startup (includes exec and cleanup)
time ./fcvm snapshot run --pid <serve_pid> --exec "echo ready"
# real 0m0.610s ← 610ms total, ~10ms for VM restore
Show that multiple clones share memory via kernel page cache:
# Check baseline memory
free -m | grep Mem
# Start 10 clones from same snapshot
for i in {1..10}; do
./fcvm snapshot run --pid <serve_pid> --name clone$i &
done
wait
# Memory increased only slightly. Clones share pages through the kernel page cache.
free -m | grep Mem
Start 50 web servers in parallel:
# Create warm nginx snapshot (one-time, in another terminal)
./fcvm podman run --name baseline --publish 8080:80 nginx:alpine
# Once healthy, in another terminal:
./fcvm snapshot create baseline --tag nginx-warm
./fcvm snapshot serve nginx-warm # Note serve PID
# Spin up 50 nginx instances in parallel
time for i in {1..50}; do
./fcvm snapshot run --pid <serve_pid> --name web$i --publish $((8080+i)):80 &
done
wait
# real 0m3.1s ← 50 VMs in ~3 seconds
# Verify all running
./fcvm ls | wc -l # 51 (50 clones + 1 baseline)
# Test a clone (use loopback IP from ./fcvm ls --json)
curl -s 127.0.0.10:8090 | head -5
Run containers that need mknod or device access:
# Privileged mode allows mknod, /dev access, etc.
sudo ./fcvm podman run --name dev --privileged \
--cmd "sh -c 'mknod /dev/null2 c 1 3 && ls -la /dev/null2'" \
public.ecr.aws/docker/library/alpine:latest
# Output: crw-r--r-- 1 root root 1,3 /dev/null2
Expose multiple ports and mount multiple volumes in one command:
# Multiple port mappings (comma-separated)
./fcvm podman run --name multi-port \
--publish 8080:80,8443:443 \
nginx:alpine
# Multiple volume mappings (comma-separated, with read-only)
./fcvm podman run --name multi-vol \
--map /tmp/logs:/logs,/tmp/data:/data:ro \
nginx:alpine
# Combined
./fcvm podman run --name full \
--publish 8080:80,8443:443 \
--map /tmp/html:/usr/share/nginx/html:ro \
--env NGINX_HOST=localhost,NGINX_PORT=80 \
nginx:alpine
Use --portable-volumes to enable deterministic inode numbering for FUSE volumes. This allows snapshots with mounted volumes to be restored on a different machine:
# Standard mode (host inodes, same-machine clones only)
./fcvm podman run --name app --map /data:/data:ro alpine:latest
# Portable mode (path-hash inodes, cross-machine snapshot/restore)
./fcvm podman run --name app --portable-volumes --map /data:/data:ro alpine:latest
When --portable-volumes is set, volumes use a RemapFs wrapper that translates between stable path-based inodes and host-specific inodes. The flag applies to all --map volumes on the VM and is persisted in snapshot metadata so clones inherit it automatically.
fcvm supports interactive terminal sessions, matching docker/podman's -i and -t flags:
| Flag | Meaning | Use Case |
|---|---|---|
| -i | Keep stdin open | Pipe data to container |
| -t | Allocate pseudo-TTY | Colors, line editing |
| -it | Both | Interactive shell |
# Run interactive shell in container
./fcvm podman run --name shell -it alpine:latest sh
# Run vim (full TTY - arrow keys, escape sequences work)
./fcvm podman run --name editor -it alpine:latest vi /tmp/test.txt
# Run shell in existing VM
./fcvm exec --name web1 -it -- sh
# Pipe data (use -i without -t)
echo "hello" | ./fcvm podman run --name pipe -i alpine:latest cat
- Host side: Sets terminal to raw mode, captures all input
- Protocol: Binary framed protocol over vsock (handles escape sequences, control chars)
- Guest side: Allocates PTY, connects container stdin/stdout
Supported:
- Escape sequences (colors, cursor movement)
- Control characters (Ctrl+C, Ctrl+D, Ctrl+Z)
- Line editing in shells
- Full-screen apps (vim, htop, less)
Not yet implemented:
- Window resize (SIGWINCH) - terminal size is fixed at session start
fcvm supports VMs inside VMs using ARM64 FEAT_NV2. Host → L1 → L2 works. L3+ is currently limited by FUSE-over-FUSE latency (~5x per level).
| Requirement | Details |
|---|---|
| Hardware | ARM64 with FEAT_NV2 (Graviton3+: c7g.metal) |
| Host kernel | 6.18+ with kvm-arm.mode=nested |
| Nested kernel | fcvm setup --kernel-profile nested |
# Setup host kernel (one-time)
sudo ./fcvm setup --kernel-profile nested --install-host-kernel
sudo reboot
# Start outer VM with nested kernel
sudo ./fcvm podman run \
--name outer --network bridged \
--kernel-profile nested --privileged \
--map /mnt/fcvm-btrfs:/mnt/fcvm-btrfs \
nginx:alpine
# Run inner VM (inside outer)
./fcvm exec --pid <outer_pid> --vm -- \
/opt/fcvm/fcvm podman run --name inner --network bridged alpine:latest echo "nested!"
Performance: L2 has ~5-7x FUSE overhead, and local disk is ~4x slower. L2 VMs are limited to one vCPU due to NV2 multi-vCPU interrupt issues. See PERFORMANCE.md and NESTED.md.
make test-root FILTER=kvm # Run nested virtualization tests
fcvm/
├── src/ # Host CLI (fcvm binary)
├── fc-agent/ # Guest agent (runs inside VM)
├── fuse-pipe/ # FUSE passthrough library
└── tests/ # Integration tests (16 files)
See DESIGN.md for detailed structure.
Run fcvm --help or fcvm <command> --help for full options.
| Command | Description |
|---|---|
| fcvm setup | Download kernel (~15MB) and create rootfs (~10GB). Takes 5-10 min first run |
| fcvm podman run | Run container in Firecracker VM |
| fcvm exec | Execute command in running VM/container |
| fcvm ls | List running VMs (--json for JSON output) |
| fcvm snapshot create | Create snapshot from running VM |
| fcvm snapshot serve | Start UFFD memory server for cloning |
| fcvm snapshot run | Clone from snapshot (--pid for UFFD, --snapshot for direct) |
| fcvm serve | Start HTTP API server (ComputeSDK gateway) |
| fcvm snapshots | List available snapshots |
See DESIGN.md for architecture and design decisions.
fcvm podman run - Essential options:
--name <NAME> VM name (required)
--network <MODE> rootless (default) or bridged (needs sudo)
--publish <H:G> Port forward host:guest (e.g., 8080:80)
--map <H:G[:ro]> Volume mount host:guest (optional :ro for read-only)
--env <K=V> Environment variable
-i, --interactive Keep stdin open (for piping input)
-t, --tty Allocate pseudo-TTY (for vim, colors, etc.)
--setup Auto-setup if kernel/rootfs missing (rootless only)
--no-snapshot Disable automatic snapshot creation (for testing)
--hugepages Use 2MB hugepages for VM memory (requires pre-allocated pool)
--forward-localhost <PORTS> Forward localhost ports to host (e.g., 1421,9099)
--rootfs-size <SIZE> Minimum free space on rootfs (default: 10G)
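The --publish and --map formats listed above (host:guest pairs, optional :ro suffix, comma-separated when there are several) can be illustrated with a small parser. This is a sketch of the documented syntax only; split_specs and parse_map are hypothetical helpers, not fcvm's own parsing code, and the rw label for the default mode is an assumption.

```shell
# Hypothetical helpers illustrating the documented spec formats; not fcvm's parser.

# Print each comma-separated spec on its own line.
split_specs() {
  old_ifs=$IFS
  IFS=,
  for spec in $1; do
    echo "$spec"
  done
  IFS=$old_ifs
}

# Parse one --map spec (H:G or H:G:ro) into "host guest mode".
parse_map() {
  host=${1%%:*}   # everything before the first colon
  rest=${1#*:}    # everything after the first colon
  case $rest in
    *:ro) echo "$host ${rest%:ro} ro" ;;
    *)    echo "$host $rest rw" ;;
  esac
}

split_specs /tmp/logs:/logs,/tmp/data:/data:ro | while read -r spec; do
  parse_map "$spec"
done
```

The loop prints one line per volume, with the mode defaulting to rw when no :ro suffix is given.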
fcvm exec - Execute in VM/container:
./fcvm exec --name my-vm -- cat /etc/os-release # In container
./fcvm exec --name my-vm --vm -- curl -s ifconfig.me # In guest OS
./fcvm exec --name my-vm -it -- bash # Interactive shell
fcvm serve starts an HTTP server that implements the ComputeSDK gateway + sandbox daemon protocol.
Use it from the TypeScript computesdk package or any HTTP client.
# Start the API server
./fcvm serve --port 8090
import { ComputeSDK } from 'computesdk';
const sdk = new ComputeSDK({
provider: 'fcvm',
apiKey: 'local',
gatewayUrl: 'http://localhost:8090'
});
const sandbox = await sdk.sandbox.create({ runtime: 'python' });
const result = await sandbox.runCode('print("hello")');
console.log(result.output); // "hello\n"
await sandbox.destroy();
Gateway (sandbox lifecycle):
| Method | Path | Description |
|---|---|---|
| POST | /v1/sandboxes | Create sandbox ({ runtime: "python" }) |
| GET | /v1/sandboxes | List all sandboxes |
| GET | /v1/sandboxes/{id} | Get sandbox details |
| DELETE | /v1/sandboxes/{id} | Destroy sandbox |
Sandbox daemon (per-sandbox operations):
| Method | Path | Description |
|---|---|---|
| GET | /s/{id}/health | Health check |
| GET | /s/{id}/ready | Readiness check |
| POST | /s/{id}/run/code | Run code ({ code, language? }) |
| POST | /s/{id}/run/command | Run shell command ({ command, cwd?, env? }) |
| GET | /s/{id}/files?path= | List directory |
| POST | /s/{id}/files | Create file ({ path, content }) |
| GET | /s/{id}/files/*path | Read file |
| HEAD | /s/{id}/files/*path | Check file exists |
| DELETE | /s/{id}/files/*path | Delete file |
| POST | /s/{id}/terminals | Create terminal session |
| GET | /s/{id} | WebSocket terminal connection |
# Create a Python sandbox
curl -s -X POST localhost:8090/v1/sandboxes \
-H 'Content-Type: application/json' \
-d '{"runtime":"python"}' | jq .
# Run code (use sandboxId from create response)
curl -s -X POST localhost:8090/s/<id>/run/code \
-H 'Content-Type: application/json' \
-d '{"code":"print(42)"}' | jq .
# Run a shell command
curl -s -X POST localhost:8090/s/<id>/run/command \
-H 'Content-Type: application/json' \
-d '{"command":"ls -la /"}' | jq .
# Write and read a file
curl -s -X POST localhost:8090/s/<id>/files \
-H 'Content-Type: application/json' \
-d '{"path":"/tmp/hello.txt","content":"hello world"}'
curl -s localhost:8090/s/<id>/files/tmp/hello.txt | jq .
# Destroy sandbox
curl -s -X DELETE localhost:8090/v1/sandboxes/<id> | jq .
| Runtime | Image |
|---|---|
| python | python:3.12-slim |
| node | node:22-slim |
| ruby | ruby:3.3-slim |
| go | golang:1.23-alpine |
| Custom | Pass any image name directly |
| Mode | Flag | Root | Notes |
|---|---|---|---|
| Rootless | --network rootless (default) | No | slirp4netns with bridge, IPv6 support |
| Bridged | --network bridged | Yes | iptables NAT, better performance |
Rootless architecture: Uses a Linux bridge (br0) for L2 forwarding between slirp4netns and Firecracker. The bridge preserves MAC addresses for proper ARP/NDP learning, enabling IPv6 support.
In rootless mode, VMs can reach services on the host via slirp4netns gateways:
| Host Address | VM Uses | Description |
|---|---|---|
| 127.0.0.1 | 10.0.2.2 | IPv4 loopback gateway |
| ::1 | fd00::2 | IPv6 loopback gateway |
VMs have full IPv6 support via slirp4netns. To reach host services bound to ::1:
# From inside the VM/container, use fd00::2 to reach host's ::1
wget http://[fd00::2]:8080/ # Reaches host's [::1]:8080
curl http://[fd00::2]:3000/ # Reaches host's [::1]:3000
The VM's internal IPv6 address is fd00:1::2 on the fd00:1::/64 network.
fcvm forwards http_proxy and https_proxy from host to VM via MMDS:
# Set proxy on host - fcvm passes it to VM automatically
export http_proxy=http://[fd00::2]:8080
export https_proxy=http://[fd00::2]:8080
fcvm podman run --name myvm alpine:latest
# Image pulls inside VM will use the proxy
Manual configuration (proxy on host loopback, VM connects via gateway):
# On host: start proxy listening on ::1:8080 (or 127.0.0.1:8080)
# Inside VM: configure proxy using gateway address
export http_proxy=http://[fd00::2]:8080 # For IPv6 proxy
export http_proxy=http://10.0.2.2:8080 # For IPv4 proxy
# Now HTTP requests go through the proxy
wget http://example.com/
Note: The VM uses fd00::2 or 10.0.2.2 (gateway addresses), not ::1 or 127.0.0.1
(which would be the VM's own loopback).
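The loopback-to-gateway substitution can be written as a small rewrite step, for example when turning a host-side proxy URL into the one a rootless VM should use. The to_gateway_url helper is hypothetical and only follows the host-address table earlier in this section; it does not touch hostnames such as localhost.

```shell
# Hypothetical helper: rewrite a host-loopback URL to the gateway address a
# rootless VM would use (127.0.0.1 -> 10.0.2.2, [::1] -> [fd00::2]).
to_gateway_url() {
  printf '%s\n' "$1" | sed \
    -e 's#//127\.0\.0\.1#//10.0.2.2#' \
    -e 's#//\[::1]#//[fd00::2]#'
}

to_gateway_url http://127.0.0.1:8080
to_gateway_url 'http://[::1]:3000/'
```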
See DESIGN.md for architecture details.
- Exit codes: Container exit code forwarded to host via vsock
- Logs: Container stdout goes to host stdout, stderr to host stderr (clean output for scripting)
- Health: Default uses vsock ready signal; optional --health-check for HTTP (timeout configurable via --health-check-timeout, default 5s)
See DESIGN.md for details.
| Variable | Default | Description |
|---|---|---|
| FCVM_BASE_DIR | /mnt/fcvm-btrfs | Base directory for all data |
| RUST_LOG | warn | Logging level (quiet by default; use info or debug for verbose) |
| FCVM_NO_SNAPSHOT | unset | Set to 1 to disable automatic snapshot creation (same as --no-snapshot flag) |
| FCVM_NO_WRITEBACK_CACHE | unset | Set to 1 to disable FUSE writeback cache (see below) |
| FCVM_SNAPSHOT_CONCURRENCY | 10 | Max concurrent snapshot creations (prevents dirty_ratio throttling) |
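Wrapper scripts that need the same defaults as the table above can resolve them with ordinary shell parameter expansion. The helpers below are hypothetical and simply mirror the documented defaults; they are not taken from fcvm's source.

```shell
# Hypothetical helpers mirroring the documented defaults; not fcvm's own code.
fcvm_base_dir() {
  echo "${FCVM_BASE_DIR:-/mnt/fcvm-btrfs}"
}

# Succeeds unless FCVM_NO_SNAPSHOT is set to 1.
snapshots_enabled() {
  [ "${FCVM_NO_SNAPSHOT:-}" != "1" ]
}

fcvm_base_dir
if snapshots_enabled; then
  echo "snapshots: on"
else
  echo "snapshots: off"
fi
```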
FUSE writeback cache is enabled by default for ~9x write performance. The kernel batches writes and flushes them asynchronously.
Known POSIX edge cases (disabled in pjdfstest):
| Test | Issue | Workaround |
|---|---|---|
| open (3/144 fail) | O_WRONLY promoted to O_RDWR, requires read permission | Use 0644 instead of 0200 for write-only files |
| utimensat (1/122 fail) | Needs kernel patch with default_permissions | Use nested kernel profile which has the patch |
To disable writeback cache for debugging:
FCVM_NO_WRITEBACK_CACHE=1 ./fcvm podman run --name test alpine:latest
CI covers the full stack:
| Metric | Count |
|---|---|
| Total Tests | 9,290 |
| Nextest Functions | 501 |
| POSIX Compliance (pjdfstest) | 8,789 |
| VMs Spawned | 331 (92 base + 239 clones) |
| UFFD Memory Servers | 28 |
| pjdfstest Categories | 17 |
Performance (on c7g.metal ARM64):
- Clone to healthy: 0.67s average (see Clone Speed Breakdown)
- Snapshot creation: 40.7s average
- Total test time: ~13 minutes (parallel jobs)
| Category | Description | VMs | Tests |
|---|---|---|---|
| Unit Tests | CLI parsing, state manager, protocol serialization | 0 | ~50 |
| FUSE Tests | fuse-pipe passthrough, permissions, mount/unmount | 0 | ~80 |
| VM Sanity | Basic VM lifecycle, networking, exec | ~20 | ~30 |
| Snapshot/Clone | UFFD memory sharing, btrfs reflinks, 100-clone scaling | ~230 | ~20 |
| pjdfstest | POSIX filesystem compliance in VMs | 17 | 8,789 |
| Egress/Port Forward | Network connectivity, port mapping | ~30 | ~40 |
| Disk Mounts | RO/RW disks, directory mapping, NFS | ~10 | ~15 |
| Nested KVM | L1→L2 virtualization (ARM64 NV2) | 2 | ~5 |
Tests are organized into tiers by privilege requirements:
make test-unit # Unit tests only (no VMs, no sudo)
make test-fast # + quick VM tests (rootless, no sudo)
make test-all # + slow VM tests (rootless, no sudo)
make test-root # + privileged tests (bridged, pjdfstest, sudo)
make test # Alias for test-root
make test-fc-mock # Container mode tests (no KVM required, uses fc-mock)
Container equivalents:
make container-test-unit # Unit tests in container
make container-test # All tests in container (recommended)
# Build first
make build
# Run all tests (requires sudo + KVM)
make test-root
# Filter by name pattern
make test-root FILTER=exec
# Live output (stream as tests run)
make test-root FILTER=sanity STREAM=1
# Single test with debug logging
RUST_LOG=debug make test-root FILTER=test_exec_basic STREAM=1
Tests run automatically on PRs and pushes to main:
| Job | Runner | Tests |
|---|---|---|
| Host | Self-hosted ARM64 | Unit tests, quick VM tests (rootless) |
| Host-Root-SnapshotDisabled | Self-hosted ARM64 | Privileged tests with FCVM_NO_SNAPSHOT=1 |
| Host-Root-SnapshotEnabled | Self-hosted ARM64 | Privileged tests run twice to verify snapshot hit |
| Container | Self-hosted ARM64 | All tests in container |
The SnapshotEnabled job runs the same suite twice on one runner:
- Run 1: Creates snapshots (cache miss path)
- Run 2: Uses existing snapshots (cache hit path - should be faster)
This validates snapshot creation, persistence, and restore paths.
Latest results: CI Workflow → Actions tab
Analyze any CI run locally:
python3 scripts/analyze_ci_vms.py # Latest run
python3 scripts/analyze_ci_vms.py <run_id> # Specific run
Enable tracing:
RUST_LOG="passthrough=debug,fuse_pipe=info" sudo -E cargo test ...
Check running VMs:
./fcvm ls
Manual cleanup:
# Kill test VMs
ps aux | grep fcvm | grep test | awk '{print $2}' | xargs sudo kill 2>/dev/null
# Remove test directories
rm -rf /tmp/fcvm-test-*
# Force unmount stale FUSE mounts
sudo fusermount3 -u /tmp/fuse-*-mount*
All data is stored under /mnt/fcvm-btrfs/ (btrfs CoW reflinks). See DESIGN.md.
# Setup btrfs (done automatically by make setup-btrfs)
make setup-btrfs
make setup-fcvm # Download kernel, create rootfs
fcvm uses a config-driven approach for kernels and base images. All configuration is in rootfs-config.toml.
The default kernel is from Kata Containers:
| Property | Value |
|---|---|
| Version | 6.12.47 |
| Source | Kata 3.24.0 release |
| Key Config | CONFIG_FUSE_FS=y (required for volume mounts) |
| Architectures | arm64, amd64 |
The kernel is downloaded during fcvm setup and cached by URL hash. Changing the URL in config triggers a re-download.
The guest OS is Ubuntu 24.04 LTS (Noble Numbat):
| Property | Value |
|---|---|
| Version | 24.04 LTS |
| Source | Ubuntu cloud images |
| Packages | podman, crun, fuse-overlayfs, skopeo, fuse3, haveged, chrony |
The rootfs is built during fcvm setup and cached by script SHA.
Changing packages, services, or files in config triggers a rebuild.
fcvm supports custom kernel profiles for advanced use cases (for example nested virtualization).
Profiles define kernel config, optional Firecracker binary, and boot args.
Current profile: nested (arm64, CONFIG_KVM=y).
./fcvm setup --kernel-profile nested # Download pre-built
./fcvm setup --kernel-profile nested --build-kernels # Or build locally
sudo ./fcvm podman run --name vm1 --kernel-profile nested --privileged nginx:alpine
To add custom profiles or customize the base image, edit rootfs-config.toml.
See DESIGN.md for the profile config reference.
- Build fcvm first: make build
- Or set PATH: export PATH=$PATH:./target/release
- Check VM logs: ./fcvm ls --json
- Verify kernel and rootfs exist: ls -la /mnt/fcvm-btrfs/
- Check networking: VMs use host DNS servers directly (no dnsmasq needed)
- VMs may not be cleaning up properly
- Manual cleanup: ps aux | grep fcvm | grep test | awk '{print $2}' | xargs sudo kill
Spawn quick one-off VMs with inline commands to diagnose network problems:
# Test connectivity incrementally: gateway → DNS → external
./target/release/fcvm podman run --name net-debug-$(date +%s) --privileged alpine:latest sh -c "
echo '=== Network config ==='
ip addr show eth0
ip route
cat /etc/resolv.conf
echo ''
echo '=== Gateway ==='
ping -c 2 -W 3 10.0.2.2 || echo 'gateway failed'
echo ''
echo '=== DNS ==='
nslookup example.com || echo 'DNS failed'
echo ''
echo '=== External ==='
wget -q -O - --timeout=10 http://ifconfig.me || echo 'external failed'
" 2>&1 &
sleep 60 # Wait for VM to boot and run commands
Inspect namespace for running VM:
HOLDER_PID=$(cat /mnt/fcvm-btrfs/state/*.json | jq -r '.holder_pid')
sudo nsenter --net=/proc/$HOLDER_PID/ns/net ip addr
sudo nsenter --net=/proc/$HOLDER_PID/ns/net bridge link # Show bridge ports
- Firecracker requires /dev/kvm
- On AWS: use c6g.metal or c5.metal (NOT c5.large or other regular instances)
- On other clouds: use bare-metal instances or hosts with nested virtualization
- DESIGN.md - Architecture, configuration reference, design decisions
- PERFORMANCE.md - Benchmarks, tuning, and tracing
- NESTED.md - Nested virtualization setup and details
- .claude/CLAUDE.md - Development notes, debugging tips
- LICENSE - MIT License
CI runs on self-hosted ARM64 runners (c7g.metal spot instances) managed by ejc3/aws-setup.
- Auto-scaling: Runners launch on demand, stop after 30 mins idle
- Hardware: c7g.metal with /dev/kvm for VM tests
- Cost: ~$0.50/hr spot pricing, $0 when idle
PRs are reviewed automatically by Claude. Findings prefixed with BLOCKING: fail the check.
| Trigger | Description |
|---|---|
| Auto | PRs from org members are reviewed automatically |
| /claude-review | Comment on any PR to trigger manual review |
| @claude ... | Ask Claude questions in PR comments |
Reviews check for security issues, bugs, and breaking changes.
MIT License - see LICENSE for details.