Document the storage stack: disk backends and the frontend-to-backend pipeline
Summary
PR #2932 adds documentation for the VTL2 storage translation and settings model — how OpenHCL maps backing devices onto guest-visible controllers. That covers the outer shell: what the guest sees, what OpenHCL is offered, and how the configuration surface connects them.
What it does not cover is the inside of that shell: the disk backend abstraction, the concrete disk backends, the layered disk model, and the path from a storage frontend (NVMe, StorVSP, IDE) through the SCSI adapter down to a disk backend. That is the scope of this issue.
This should be written so it is useful for both OpenVMM and OpenHCL contexts, since the same DiskIo trait, the same backends, and the same frontend implementations are shared.
What should be documented
The DiskIo trait and the Disk wrapper
The central abstraction is the DiskIo trait in vm/devices/storage/disk_backend/src/lib.rs. Every disk backend implements it. The key operations are:
- read_vectored / write_vectored (async, scatter-gather)
- sync_cache (flush)
- unmap (TRIM / deallocate)
- capacity and sector size queries
- optional persistent reservation support
The Disk struct wraps Arc<dyn DynDisk> for cheap concurrent cloning. This is what frontends hold.
The doc should explain the trait, the wrapper, and the design choices (async, scatter-gather, FUA, sector-aligned I/O).
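To make the shape of the abstraction concrete, here is a deliberately simplified sketch. The real DiskIo trait is async and operates on scatter-gather request buffers; the trait, type names, and error type below are illustrative stand-ins, not the disk_backend API.

```rust
// Hypothetical, simplified sketch of a DiskIo-style backend trait.
// The real trait is async and uses scatter-gather buffers; this
// synchronous version only illustrates the shape of the contract.
trait SimpleDiskIo {
    /// Total capacity in sectors.
    fn sector_count(&self) -> u64;
    /// Sector size in bytes (commonly 512 or 4096).
    fn sector_size(&self) -> u32;
    /// Read whole sectors starting at `sector` into `buf`.
    fn read(&self, sector: u64, buf: &mut [u8]) -> Result<(), String>;
    /// Write whole sectors starting at `sector` from `buf`.
    fn write(&mut self, sector: u64, buf: &[u8]) -> Result<(), String>;
    /// Flush any cached writes to durable storage.
    fn sync_cache(&mut self) -> Result<(), String>;
}

/// A trivial in-memory backend, analogous in spirit to a RAM disk.
struct MemDisk {
    sectors: Vec<u8>,
    sector_size: u32,
}

impl SimpleDiskIo for MemDisk {
    fn sector_count(&self) -> u64 {
        (self.sectors.len() / self.sector_size as usize) as u64
    }
    fn sector_size(&self) -> u32 {
        self.sector_size
    }
    fn read(&self, sector: u64, buf: &mut [u8]) -> Result<(), String> {
        let off = sector as usize * self.sector_size as usize;
        buf.copy_from_slice(&self.sectors[off..off + buf.len()]);
        Ok(())
    }
    fn write(&mut self, sector: u64, buf: &[u8]) -> Result<(), String> {
        let off = sector as usize * self.sector_size as usize;
        self.sectors[off..off + buf.len()].copy_from_slice(buf);
        Ok(())
    }
    fn sync_cache(&mut self) -> Result<(), String> {
        Ok(()) // nothing is buffered for an in-memory disk
    }
}
```

Note that all I/O is expressed in whole sectors; the sector-aligned contract is what lets frontends translate guest block protocols onto any backend uniformly.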
Storage frontends
How each frontend consumes a Disk:
| Frontend | Protocol | Transport | Crate |
|---|---|---|---|
| NVMe | NVMe 2.0 | PCI MMIO + MSI-X | nvme/ |
| StorVSP | SCSI CDB over VMBus | VMBus ring buffers | storvsp/ |
| IDE | ATA/ATAPI | PCI/ISA I/O ports + DMA | ide/ |
The doc should cover the data flow from guest I/O to DiskIo method call. The SCSI path is the most interesting because it goes through two layers:
- StorVSP: dequeues SCSI requests from the VMBus ring
- SimpleScsiDisk (scsidisk/): parses CDB opcodes and translates them to DiskIo calls
NVMe is simpler: the NVMe controller's namespace directly holds a Disk and calls into it.
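The CDB-parsing step is the crux of the SCSI path. The following sketch shows the kind of translation SimpleScsiDisk performs; the opcode values come from the SCSI spec, but the enum, function, and error handling are illustrative, not the scsidisk/ implementation.

```rust
// Hypothetical sketch of the SCSI-to-disk translation step: inspect
// the CDB opcode and turn it into the disk-backend operation it
// implies. Opcode constants are from the SCSI command set; the rest
// of the names here are illustrative.
const READ_10: u8 = 0x28;
const WRITE_10: u8 = 0x2a;
const SYNCHRONIZE_CACHE_10: u8 = 0x35;
const UNMAP: u8 = 0x42;

enum DiskOp {
    Read { lba: u64, blocks: u32 },
    Write { lba: u64, blocks: u32 },
    Flush,
    Unmap,
}

/// Translate a READ(10)/WRITE(10)-class CDB into a disk operation.
fn translate_cdb(cdb: &[u8]) -> Result<DiskOp, String> {
    match cdb[0] {
        READ_10 | WRITE_10 => {
            // READ(10)/WRITE(10): 32-bit big-endian LBA at bytes 2..6,
            // 16-bit transfer length (in blocks) at bytes 7..9.
            let lba = u32::from_be_bytes(cdb[2..6].try_into().unwrap()) as u64;
            let blocks = u16::from_be_bytes(cdb[7..9].try_into().unwrap()) as u32;
            Ok(if cdb[0] == READ_10 {
                DiskOp::Read { lba, blocks }
            } else {
                DiskOp::Write { lba, blocks }
            })
        }
        SYNCHRONIZE_CACHE_10 => Ok(DiskOp::Flush),
        UNMAP => Ok(DiskOp::Unmap),
        op => Err(format!("unsupported opcode {op:#x}")),
    }
}
```

In the real adapter, each DiskOp-like result becomes a call on the Disk handle (read_vectored, write_vectored, sync_cache, unmap), with the guest's data buffers passed through as scatter-gather lists.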
Concrete disk backends
All the backends that implement DiskIo:
| Backend | What it wraps | Crate |
|---|---|---|
| FileDisk | Host file (read-write-at) | disk_file/ |
| Vhd1Disk | VHD1 fixed format | disk_vhd1/ |
| VhdmpDisk | Windows VHD via vhdmp driver | disk_vhdmp/ |
| BlobDisk | Remote HTTP/Azure blob (read-only) | disk_blob/ |
| BlockDeviceDisk | Linux block device (io_uring) | disk_blockdevice/ |
| NvmeDisk | Physical NVMe via VFIO | disk_nvme/ |
| StripedDisk | Data striped across multiple backends | disk_striped/ |
Wrapping backends (decorators)
Backends that wrap another Disk and transform I/O:
| Wrapper | Transform | Crate |
|---|---|---|
| CryptDisk | XTS-AES256 encryption | disk_crypt/ |
| DelayDisk | Injected I/O latency | disk_delay/ |
| DiskWithReservations | In-memory persistent reservations | disk_prwrap/ |
The wrapping pattern is important to document because it is how features compose without modifying backends.
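A minimal sketch of the pattern, under stated assumptions: the Disk trait here is a stand-in for DiskIo, and the XOR "cipher" is a stand-in for the XTS-AES256 transform in disk_crypt. The point is only the shape: the wrapper implements the same trait as a concrete backend and delegates to an inner disk, transforming data in transit.

```rust
// Hypothetical sketch of the wrapping/decorator pattern. `Disk` and
// the XOR transform are illustrative stand-ins; the real CryptDisk
// wraps the DiskIo trait and uses XTS-AES256.
use std::collections::HashMap;

trait Disk {
    fn read(&self, sector: u64) -> Vec<u8>;
    fn write(&mut self, sector: u64, data: &[u8]);
}

/// A trivial concrete backend for the example.
struct RamDisk {
    sectors: HashMap<u64, Vec<u8>>,
}

impl Disk for RamDisk {
    fn read(&self, sector: u64) -> Vec<u8> {
        self.sectors
            .get(&sector)
            .cloned()
            .unwrap_or_else(|| vec![0; 512])
    }
    fn write(&mut self, sector: u64, data: &[u8]) {
        self.sectors.insert(sector, data.to_vec());
    }
}

/// Decorator: implements the same trait, transforms data, delegates
/// everything else to `inner`.
struct XorCryptDisk<D: Disk> {
    inner: D,
    key: u8,
}

impl<D: Disk> Disk for XorCryptDisk<D> {
    fn read(&self, sector: u64) -> Vec<u8> {
        // Decrypt on the way out.
        self.inner.read(sector).iter().map(|b| b ^ self.key).collect()
    }
    fn write(&mut self, sector: u64, data: &[u8]) {
        // Encrypt on the way in.
        let enc: Vec<u8> = data.iter().map(|b| b ^ self.key).collect();
        self.inner.write(sector, &enc);
    }
}
```

Because the wrapper and the backend are interchangeable behind the same trait, wrappers compose freely: encryption over delay injection over any concrete backend, with no backend aware it is wrapped.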
The layered disk model
disk_layered/ is its own subsystem. A layered disk stacks multiple layers with read-through and optional write-through semantics:
- Each layer implements LayerIo (similar to DiskIo but tracks sector presence via a bitmap)
- Reads check layers top-to-bottom; the first layer with the requested sectors wins
- Writes go to the topmost writable layer
- Layers can optionally cache read-miss data from below (read_cache)
- Layers can optionally write through to the next layer (write_through)
The two concrete layer implementations today are:
- RamDiskLayer (disklayer_ram/) — ephemeral, fast
- SqliteDiskLayer (disklayer_sqlite/) — persistent, portable
This is what powers the memdiff: disk configuration in the CLI. It deserves its own section because the bitmap-based presence tracking and the layer configuration model are not obvious from reading a single file.
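The read-through and write semantics can be sketched as follows. This is an illustrative model, not the disk_layered API: the real LayerIo tracks presence with a bitmap and supports read_cache/write_through, while here a HashSet stands in for the bitmap and those options are omitted.

```rust
// Hypothetical sketch of the layered read/write path. A HashSet
// stands in for the real per-layer presence bitmap; names are
// illustrative, not the disk_layered API.
use std::collections::{HashMap, HashSet};

struct Layer {
    present: HashSet<u64>,       // sectors this layer holds
    data: HashMap<u64, Vec<u8>>, // sector -> contents
}

struct LayeredDisk {
    /// layers[0] is the topmost (writable) layer.
    layers: Vec<Layer>,
}

impl LayeredDisk {
    /// Read-through: the first layer (top-to-bottom) that has the
    /// sector wins.
    fn read(&self, sector: u64) -> Vec<u8> {
        for layer in &self.layers {
            if layer.present.contains(&sector) {
                return layer.data[&sector].clone();
            }
        }
        vec![0; 512] // no layer has it: reads as zeros
    }

    /// Writes land in the topmost layer and mark the sector present
    /// there, shadowing any copy held by lower layers.
    fn write(&mut self, sector: u64, data: Vec<u8>) {
        let top = &mut self.layers[0];
        top.present.insert(sector);
        top.data.insert(sector, data);
    }
}
```

This is exactly the memdiff shape: a RAM layer on top absorbs writes, while unwritten sectors keep resolving to the backing layer below.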
Resolver integration
The doc should show how the resolver pattern connects configuration to concrete backends. The storage resolver chain is the best example of recursive resolution in the codebase:
- NVMe controller → resolves each namespace's disk
- Layered disk → resolves each layer in parallel
- Each layer or backend → resolves to a concrete DiskIo implementation
This ties back to the resolver documentation (separate issue) but deserves a storage-specific section showing the concrete resolution flow.
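The recursive shape can be sketched with a toy configuration tree. All names here (DiskHandle, ResolvedDisk, resolve) are illustrative stand-ins for the resolver machinery, not its API; the point is that composite handles resolve their children before producing a concrete disk.

```rust
// Hypothetical sketch of recursive storage resolution: a handle
// (configuration) resolves to a concrete disk, and composite handles
// (wrapped, layered) recurse into their children first. This mirrors
// the shape of the resolver chain, not its actual types.
enum DiskHandle {
    File { path: String },
    Ram { len: u64 },
    Crypt { inner: Box<DiskHandle> },
    Layered { layers: Vec<DiskHandle> },
}

/// Stand-in for a resolved, concrete backend.
#[derive(Debug, PartialEq)]
enum ResolvedDisk {
    File(String),
    Ram(u64),
    Crypt(Box<ResolvedDisk>),
    Layered(Vec<ResolvedDisk>),
}

fn resolve(handle: &DiskHandle) -> ResolvedDisk {
    match handle {
        // Leaf handles resolve directly to a backend.
        DiskHandle::File { path } => ResolvedDisk::File(path.clone()),
        DiskHandle::Ram { len } => ResolvedDisk::Ram(*len),
        // Composite handles recurse into their children.
        DiskHandle::Crypt { inner } => {
            ResolvedDisk::Crypt(Box::new(resolve(inner)))
        }
        DiskHandle::Layered { layers } => {
            ResolvedDisk::Layered(layers.iter().map(resolve).collect())
        }
    }
}
```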
Online disk resize
Online disk resize is an interesting cross-cutting concern because the behavior differs by frontend, backend, and OpenHCL vs standalone context.
Frontend notification mechanisms:
| Frontend | Resize notification | How it works |
|---|---|---|
| NVMe | AEN (Asynchronous Event Notification) | Background task calls disk.wait_resize() per namespace; on change, completes a queued AER command with CHANGED_NAMESPACE_LIST |
| StorVSP/SCSI | UNIT_ATTENTION sense key | On the next SCSI command after a resize, SimpleScsiDisk detects the capacity change and returns UNIT_ATTENTION; guest retries and re-reads capacity |
| IDE | Not supported | IDE has no standardized capacity-change notification |
Backend wait_resize support:
The DiskIo trait has a wait_resize method that defaults to pending() (never completes). Only backends that can detect runtime capacity changes override it:
| Backend | wait_resize | How |
|---|---|---|
| disk_blockdevice | ✅ Event-driven | Linux uevent listener for block device resize events |
| disk_nvme | ✅ Event-driven | NVMe driver monitors AENs from the physical controller; rescans namespaces to detect capacity changes |
| disk_file | ❌ Default (pending) | No file-change monitoring |
| disk_vhd1 | ❌ Default (pending) | Fixed-size format |
| disk_blob | ❌ Default (pending) | Remote blob, no resize |
| disk_layered | ✅ Delegates | Delegates to the bottom-most layer |
| Wrappers (crypt, delay, prwrap) | ✅ Delegates | Forward to the inner disk |
OpenHCL vs standalone:
In OpenHCL, the resize path is the same as standalone: disk_blockdevice detects the uevent from the host, wait_resize completes, and the NVMe or SCSI frontend notifies the VTL0 guest through the standard mechanism. There is no special paravisor-level resize interception.
The doc should explain this end-to-end flow and make clear which backends actually support it, since a contributor attaching a disk_file backend and expecting runtime resize will be confused when nothing happens.
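The contract can be illustrated with a synchronous, polled stand-in for the real async method (whose default is a future that stays pending forever). The trait and type names below are hypothetical; only the default-never behavior and the override-on-capable-backends split mirror the source.

```rust
// Hypothetical, polled sketch of the wait_resize contract. The real
// method is async and defaults to pending() (never completes); a
// "Never" result here stands in for that forever-pending future.
#[derive(Debug, PartialEq)]
enum Resize {
    /// Backend cannot observe capacity changes (the default).
    Never,
    /// Capacity changed; carries the new sector count.
    Changed(u64),
}

trait ResizeWatch {
    fn poll_resize(&mut self) -> Resize {
        Resize::Never // default: equivalent to a forever-pending future
    }
}

/// A fixed-format backend (e.g. a VHD1-style disk) keeps the default.
struct FixedDisk;
impl ResizeWatch for FixedDisk {}

/// A block-device-style backend that can observe resize events
/// (modeled here as a capacity field updated out of band).
struct GrowableDisk {
    last_seen: u64,
    current: u64,
}
impl ResizeWatch for GrowableDisk {
    fn poll_resize(&mut self) -> Resize {
        if self.current != self.last_seen {
            self.last_seen = self.current;
            Resize::Changed(self.current)
        } else {
            Resize::Never
        }
    }
}
```

A frontend's background task is the consumer of this contract: when the wait completes with a new capacity, it raises the protocol-appropriate notification (NVMe AEN, SCSI UNIT_ATTENTION) toward the guest.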
RAM disk (mem:<len>)
The CLI supports standalone RAM disks via mem:<len> (e.g., --disk mem:1G). This is distinct from memdiff:<disk>, which stacks a RAM layer on top of a backing disk.
Under the hood, mem:<len> creates a RamDiskLayerHandle { len: Some(len) } wrapped in a single-layer LayeredDiskHandle. So even a "standalone" RAM disk is actually the layered disk machinery with one layer.
memdiff:<disk> creates a RamDiskLayerHandle { len: None } (sized from the backing disk) stacked on top of the inner disk. Writes go to the RAM layer; reads fall through to the backing disk for sectors not yet written.
The doc should explain this because the CLI surface (mem: vs memdiff:) hides the underlying layered disk model, and contributors reading the code will see RamDiskLayerHandle in both cases and wonder what the difference is.
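The difference boils down to two configurations of the same machinery. The struct names below echo the real handles, but their definitions and the helper functions are illustrative stand-ins for how the CLI might assemble them.

```rust
// Hypothetical sketch of how mem:<len> and memdiff:<disk> differ only
// in how the RAM layer is configured. Struct names echo the real
// handles; the definitions and helpers here are stand-ins.
#[derive(Debug, PartialEq)]
struct RamDiskLayerHandle {
    /// Some(len): standalone RAM disk with an explicit size.
    /// None: size is inherited from the layer/disk below.
    len: Option<u64>,
}

#[derive(Debug, PartialEq)]
enum LayerHandle {
    Ram(RamDiskLayerHandle),
    /// Stand-in for a resolved backing disk (e.g. a file path).
    Backing(String),
}

#[derive(Debug, PartialEq)]
struct LayeredDiskHandle {
    /// Topmost layer first.
    layers: Vec<LayerHandle>,
}

/// mem:<len> -> a single RAM layer with an explicit size.
fn mem(len: u64) -> LayeredDiskHandle {
    LayeredDiskHandle {
        layers: vec![LayerHandle::Ram(RamDiskLayerHandle { len: Some(len) })],
    }
}

/// memdiff:<disk> -> an unsized RAM layer stacked over the backing disk.
fn memdiff(backing: &str) -> LayeredDiskHandle {
    LayeredDiskHandle {
        layers: vec![
            LayerHandle::Ram(RamDiskLayerHandle { len: None }),
            LayerHandle::Backing(backing.to_string()),
        ],
    }
}
```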
Virtual optical / DVD
The storage stack supports virtual DVD/CD-ROM drives, which have a different model from disk devices.
How it works:
- SimpleScsiDvd (in scsidisk/src/scsidvd/) implements AsyncScsiDisk and handles optical-specific SCSI commands: GET_EVENT_STATUS_NOTIFICATION, GET_CONFIGURATION, START_STOP_UNIT (eject), media change events, and the standard read path.
- The GuestMedia enum (in ide_resources/) distinguishes GuestMedia::Dvd from GuestMedia::Disk. DVD wraps a SimpleScsiDvdHandle which holds a Resource<ScsiDeviceHandleKind>, while Disk wraps a Resource<DiskHandleKind>.
- Eject is supported: the DiskIo trait has an eject() method (defaults to UnsupportedEject), and SimpleScsiDvd handles the SCSI START_STOP_UNIT command with the load/eject flag. Once ejected, the media is permanently removed for the lifetime of the VM.
Frontend support:
| Frontend | DVD support |
|---|---|
| StorVSP/SCSI | ✅ Via SimpleScsiDvd |
| IDE | ✅ Via AtapiDrive wrapping SimpleScsiDvd through ATAPI |
| NVMe | ❌ Explicitly rejected ("dvd not supported with nvme") |
CLI surface:
DVD is specified with the dvd flag on --disk or --ide:
- --disk file:my.iso,dvd → SCSI optical drive
- --ide file:my.iso,dvd → IDE optical drive (ATAPI)
The dvd flag implicitly sets read_only = true.
The doc should cover the DVD model because it is a common source of confusion: the guest media enum, the SCSI-vs-ATAPI layering, why NVMe rejects DVD, and how eject works.
Where this belongs
This is architecture reference content. I think it belongs as a new page or set of pages under the architecture section, near the existing OpenVMM and OpenHCL architecture pages. It should cross-link to:
- the storage translation page from PR #2932 ("Guide: add OpenHCL VMBus relay and storage architecture pages") for the OpenHCL settings model
- the resolver docs (for how backends are wired up)
- the NVMe and StorVSP device pages (once those exist)
Possible locations:
- Guide/src/reference/architecture/openvmm/storage.md — for the shared storage pipeline
- Or a section under Guide/src/reference/devices/ if we want it closer to device docs
I lean toward the architecture section since this is about the internal pipeline, not about a single device.
What should be rustdoc vs Guide
| Content | Location |
|---|---|
| DiskIo trait semantics, method contracts, scatter-gather model | Rustdoc on disk_backend |
| wait_resize method contract and default behavior | Rustdoc on disk_backend |
| LayerIo trait, bitmap semantics, layer configuration | Rustdoc on disk_layered |
| Per-backend implementation notes (e.g., VHD1 format details, io_uring usage) | Rustdoc on each backend crate |
| Architecture overview: how frontends, the SCSI adapter, and backends connect | Guide page |
| Data flow diagrams (guest I/O → frontend → SCSI → DiskIo → backend) | Guide page |
| Layered disk model explanation with examples | Guide page |
| RAM disk vs memdiff CLI semantics | Guide page |
| Online disk resize: which backends support it, how frontends notify the guest | Guide page |
| Virtual optical / DVD: GuestMedia model, eject, frontend support matrix | Guide page |
| Backend catalog (what exists, when to use each) | Guide page |
| Wrapping/decorator pattern explanation | Guide page |
| Resolver integration for storage | Guide page |
Goals
- A contributor can understand how a guest I/O request flows from the frontend to the disk backend
- A contributor adding a new disk backend knows what to implement and how it gets wired up
- The layered disk model is explained clearly enough that someone can reason about layer composition without reading the implementation
- The backend catalog gives a quick reference for what exists and when each is appropriate
Non-goals
- Documenting the VTL2 storage translation settings model (covered by PR #2932, "Guide: add OpenHCL VMBus relay and storage architecture pages")
- Documenting every SCSI CDB opcode that SimpleScsiDisk handles
- Redesigning the storage stack
- Performance tuning guidance (stack futures, buffer reuse, etc. are implementation details for now)
Rough implementation plan
- Write a Guide page covering the storage pipeline: DiskIo trait, frontends, the SCSI adapter, backends, wrappers, layered disks
- Include data flow diagrams (mermaid) for the NVMe path and the StorVSP/SCSI path
- Add a backend catalog table
- Add a section on the layered disk model with a worked example
- Expand rustdoc on disk_backend and disk_layered crate-level docs
- Cross-link to the storage translation page from PR #2932 and to the resolver docs