Zero-intrusion OS runtime visualization #2

@zevorn

Summary

Implement a visual observation system that renders guest OS component runtime state in real time, with zero modification to the guest OS. All data is extracted externally by the emulator inspecting guest CPU state, memory, page tables, and MMIO traffic.

Motivation

Traditional OS debugging requires instrumenting the kernel (printk, ftrace, kprobes, eBPF). This creates a fundamental problem: observation changes the observed system — timing shifts, memory layout changes, and some bugs become unreproducible.

Since machina controls the entire execution environment (CPU, memory, devices), it can observe everything from the outside. By loading the guest kernel's symbol table (vmlinux / System.map), machina can interpret raw memory as kernel data structures, providing a god's-eye view of the OS without touching a single line of guest code.

This is uniquely valuable for:

  • OS course teaching: students see scheduler, page tables, and interrupts visually
  • Kernel debugging: observe lock contention, memory leaks, and races without printk
  • Security research: monitor privilege transitions and syscall patterns transparently
  • Performance analysis: identify hot paths and idle time without guest-side profiling

Design Principle: Zero Intrusion

All observation is performed by machina reading guest state from the emulator side:

+-------------------------------------------+
|              Machina Emulator             |
|                                           |
|  +-----------+  +---------+  +----------+ |      +----------------+
|  | CPU State |  | Guest   |  | Device   | |----->| Visualization  |
|  | (regs,    |  | Memory  |  | State    | |      | Dashboard      |
|  |  CSRs,    |  | (phys)  |  | (MMIO)   | |      | (TUI / Web)    |
|  |  priv)    |  |         |  |          | |      +----------------+
|  +-----------+  +---------+  +----------+ |
|        ^             ^            ^       |
|        |             |            |       |
|  [ vmlinux symbol table loaded ]          |
|  [ struct offsets auto-derived ]          |
+-------------------------------------------+
|                                           |
|         Guest OS (UNMODIFIED)             |
|                                           |
+-------------------------------------------+

The guest kernel binary is never patched. No agent runs inside the guest. No hypercalls are added.

Proposed Features

P0 — Foundation: Symbol-Driven Memory Introspection

  • vmlinux / System.map loader: Parse ELF symbols + DWARF info to resolve kernel addresses and struct field offsets
  • Guest virtual memory reader: Given a GVA, walk Sv39 page tables (from satp CSR) to read guest memory from the emulator side
  • Struct overlay engine: Given a kernel struct name + address, extract fields by offset (e.g., read task_struct->pid, task_struct->comm, and task_struct->state, which is named __state since Linux 5.14, from guest memory)
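
To make the guest virtual memory reader concrete, here is a minimal sketch of an Sv39 walk done from the emulator side. The `GuestPhys` trait and the `HashMap` backing store are hypothetical stand-ins, not machina APIs; a real implementation would presumably reuse the existing MMU code. Permission checks and A/D bits are deliberately skipped, since an observer only needs the mapping.

```rust
use std::collections::HashMap;

/// Hypothetical read-only view of guest physical memory (not a machina API).
pub trait GuestPhys {
    /// Read a little-endian u64 at a guest physical address, if backed.
    fn read_u64(&self, gpa: u64) -> Option<u64>;
}

/// Toy backing store for the example: a sparse map of 8-byte words.
impl GuestPhys for HashMap<u64, u64> {
    fn read_u64(&self, gpa: u64) -> Option<u64> {
        self.get(&gpa).copied()
    }
}

const PTE_V: u64 = 1 << 0;
const PTE_R: u64 = 1 << 1;
const PTE_W: u64 = 1 << 2;
const PTE_X: u64 = 1 << 3;

/// Translate a guest virtual address under Sv39 without touching guest state.
pub fn sv39_translate<M: GuestPhys>(mem: &M, satp: u64, gva: u64) -> Option<u64> {
    // satp.MODE (bits 63:60) must be 8 for Sv39.
    if satp >> 60 != 8 {
        return None;
    }
    // Bits 63:39 of the address must all equal bit 38.
    let upper = gva >> 38;
    if upper != 0 && upper != (1 << 26) - 1 {
        return None;
    }
    // satp.PPN (bits 43:0) is the physical page number of the root table.
    let mut table = (satp & ((1u64 << 44) - 1)) << 12;
    for level in (0..3).rev() {
        let vpn = (gva >> (12 + 9 * level)) & 0x1ff;
        let pte = mem.read_u64(table + vpn * 8)?;
        // Invalid entry, or the reserved encoding W=1 with R=0.
        if pte & PTE_V == 0 || (pte & PTE_R == 0 && pte & PTE_W != 0) {
            return None;
        }
        let ppn = (pte >> 10) & ((1u64 << 44) - 1);
        if pte & (PTE_R | PTE_X) != 0 {
            // Leaf PTE: level 2 maps 1 GiB, level 1 maps 2 MiB, level 0 maps 4 KiB.
            let mask = (1u64 << (12 + 9 * level)) - 1;
            if (ppn << 12) & mask != 0 {
                return None; // misaligned superpage
            }
            return Some((ppn << 12) | (gva & mask));
        }
        // Non-leaf entry: descend to the next-level table.
        table = ppn << 12;
    }
    None // a pointer entry at level 0 is not a valid mapping
}
```

The struct overlay engine would sit on top of such a function: resolve a symbol to a GVA, translate it, then read fields at DWARF-derived offsets.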

P1 — Process & Scheduler Visualization

  • Process list view: Walk the task_struct linked list in guest memory, display PID, name, state, priority — equivalent to ps but from outside
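  • Scheduler timeline: Detect context switches by watching for a change of the current task pointer, with satp writes as a secondary signal (on Linux, satp changes only on address-space switches, so it misses switches between kernel threads or between threads of one process), and render a per-hart Gantt chart showing which process runs when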
  • CPU mode distribution: Track time spent in M-mode / S-mode / U-mode / WFI-idle per hart, render as stacked bar or pie chart
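
As a rough illustration of the process list view, the sketch below walks a Linux-style circular task list through a read callback. The `TaskLayout` offsets, the `read` callback shape, and the `max` bound are all assumptions made for the example; real offsets would come from the guest's DWARF info.

```rust
/// Field offsets inside the guest's task_struct. Hypothetical: in practice
/// these would be derived from DWARF, not hard-coded.
pub struct TaskLayout {
    pub tasks: u64, // offset of `struct list_head tasks`
    pub pid: u64,   // offset of `pid_t pid` (32-bit)
    pub comm: u64,  // offset of `char comm[16]`
}

#[derive(Debug, PartialEq)]
pub struct TaskInfo {
    pub pid: i32,
    pub comm: String,
}

/// Walk the circular task list starting at `init_task`.
/// `read` fills `buf` from a guest virtual address and returns false on
/// failure (for example an unmapped page).
pub fn list_tasks<F>(read: F, layout: &TaskLayout, init_task: u64, max: usize) -> Vec<TaskInfo>
where
    F: Fn(u64, &mut [u8]) -> bool,
{
    let mut out = Vec::new();
    let mut task = init_task;
    // `max` bounds the walk in case the list is observed mid-update.
    while out.len() < max {
        let mut pid = [0u8; 4];
        let mut comm = [0u8; 16];
        let mut next = [0u8; 8];
        if !(read(task.wrapping_add(layout.pid), &mut pid)
            && read(task.wrapping_add(layout.comm), &mut comm)
            && read(task.wrapping_add(layout.tasks), &mut next))
        {
            break;
        }
        // comm is NUL-terminated within its 16 bytes.
        let len = comm.iter().position(|&b| b == 0).unwrap_or(comm.len());
        out.push(TaskInfo {
            pid: i32::from_le_bytes(pid),
            comm: String::from_utf8_lossy(&comm[..len]).into_owned(),
        });
        // list_head.next points at the next task's `tasks` field, so
        // subtract the field offset to recover the task_struct base.
        task = u64::from_le_bytes(next).wrapping_sub(layout.tasks);
        if task == init_task {
            break; // wrapped around to the list head
        }
    }
    out
}
```

Because the guest keeps running while the emulator reads, a walk like this can see a half-updated list; pausing the hart or retrying on inconsistency is left out of the sketch.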

P2 — Memory Visualization

  • Page table map: Walk Sv39 page table hierarchy from satp, render a visual memory map showing mapped regions with permissions (R/W/X/U) and physical backing
  • Physical memory heatmap: Color-code physical pages by access frequency (tracked from TLB refills and MMIO), identify hot/cold regions
  • Kernel/user space split: Visualize the virtual address space layout — kernel mappings, user mappings, MMIO regions, free pages
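
For the page table map, one plausible post-processing step is to merge adjacent leaf mappings with identical permissions into regions before rendering. The sketch below assumes the walker has already produced a sorted list of `(va, len, pte)` leaves; the `Region` type and function names are illustrative only.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Region {
    pub start: u64,
    pub len: u64,
    pub perms: String,
}

/// Render the R/W/X/U bits of a RISC-V PTE as a fixed-width string such as "rw-u".
pub fn perms(pte: u64) -> String {
    let bit = |mask: u64, c: char| if pte & mask != 0 { c } else { '-' };
    [bit(1 << 1, 'r'), bit(1 << 2, 'w'), bit(1 << 3, 'x'), bit(1 << 4, 'u')]
        .iter()
        .collect()
}

/// Merge leaf mappings, given as (va, len, pte) sorted by va, into
/// contiguous regions that share the same permissions.
pub fn coalesce(leaves: &[(u64, u64, u64)]) -> Vec<Region> {
    let mut out: Vec<Region> = Vec::new();
    for &(va, len, pte) in leaves {
        let p = perms(pte);
        if let Some(r) = out.last_mut() {
            // Extend the previous region when this leaf continues it.
            if r.start + r.len == va && r.perms == p {
                r.len += len;
                continue;
            }
        }
        out.push(Region { start: va, len, perms: p });
    }
    out
}
```

Merging keeps the rendered map short even when a large range is backed by many 4 KiB pages.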

P3 — Interrupt & Trap Flow

  • Interrupt timeline: Log every trap entry (scause/mcause, sepc/mepc, privilege transition), render as a timeline with color-coded categories (timer, external, page fault, syscall)
  • Syscall heatmap: Detect ecall from U-mode, extract syscall number from a7, build frequency/latency heatmap
  • IRQ flow diagram: Trace interrupt delivery path: device assert → PLIC claim → CPU trap → handler → PLIC complete, render as a flow/sequence diagram
  • Exception drill-down: On page fault, automatically show faulting address, current page table state, and PTE chain
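
The timeline categories and the syscall counter could be derived from the cause CSR roughly as follows. This is a sketch assuming RV64 (interrupt flag in bit 63) and the Linux convention of passing the syscall number in a7; `TrapKind` and `SyscallCounts` are hypothetical names, and latency tracking is omitted.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum TrapKind {
    Timer,
    External,
    Software,
    PageFault,
    Syscall,
    Other,
}

/// Classify an scause/mcause value (RV64: bit 63 is the interrupt flag).
pub fn classify(cause: u64) -> TrapKind {
    let code = cause & !(1u64 << 63);
    if cause >> 63 == 1 {
        match code {
            1 | 3 => TrapKind::Software,  // supervisor / machine software interrupt
            5 | 7 => TrapKind::Timer,     // supervisor / machine timer interrupt
            9 | 11 => TrapKind::External, // supervisor / machine external interrupt
            _ => TrapKind::Other,
        }
    } else {
        match code {
            8 => TrapKind::Syscall,              // environment call from U-mode
            12 | 13 | 15 => TrapKind::PageFault, // instruction / load / store page fault
            _ => TrapKind::Other,
        }
    }
}

/// Per-syscall-number hit counts, fed from the trap entry hook.
#[derive(Default)]
pub struct SyscallCounts(pub HashMap<u64, u64>);

impl SyscallCounts {
    /// Call on every trap entry with the cause CSR and the guest's a7 value.
    pub fn on_trap(&mut self, cause: u64, a7: u64) {
        if classify(cause) == TrapKind::Syscall {
            *self.0.entry(a7).or_insert(0) += 1;
        }
    }
}
```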

P4 — Device I/O Visualization

  • MMIO access log: All device register reads/writes pass through machina's AddressSpace — log with timestamp, device name, register offset, value, direction (R/W)
  • UART traffic view: Render serial I/O as a terminal-in-terminal, with RX/TX byte-level visibility
  • PLIC state dashboard: Real-time view of pending interrupts, priority, threshold, enable bits per context
  • ACLINT timer view: Show mtime, mtimecmp per hart, next timer deadline, time-until-fire
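
The time-until-fire value in the ACLINT view is simple arithmetic on the two registers. A sketch, assuming the timebase frequency is known to the emulator (the function names are made up for illustration):

```rust
/// Ticks remaining until the timer interrupt becomes pending.
/// The interrupt is pending once mtime >= mtimecmp, so this returns 0 then.
pub fn ticks_until_fire(mtime: u64, mtimecmp: u64) -> u64 {
    mtimecmp.saturating_sub(mtime)
}

/// Convert timer ticks to nanoseconds for display.
/// Returns None for a zero timebase; saturates at u64::MAX.
pub fn ticks_to_ns(ticks: u64, timebase_hz: u64) -> Option<u64> {
    if timebase_hz == 0 {
        return None;
    }
    // Widen to u128 so ticks * 1e9 cannot overflow.
    let ns = (ticks as u128) * 1_000_000_000 / (timebase_hz as u128);
    Some(if ns > u64::MAX as u128 { u64::MAX } else { ns as u64 })
}
```

A guest that writes all-ones to mtimecmp to park the timer will show up as a saturated value, which the dashboard could render as "disabled".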

P5 — Advanced: Lock & Synchronization

  • Spinlock monitor: Given spinlock symbol addresses from vmlinux, monitor lock/unlock patterns, detect contention (spin count > threshold)
  • Semaphore/mutex state: Read wait queue lengths from kernel structs, detect deadlock candidates (cycle in wait-for graph)
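
For deadlock candidates, a simplified model is enough to show the idea: if each blocked task waits on one lock and each lock has one owner, the wait-for graph has at most one outgoing edge per task, and a cycle can be found by following edges. The sketch assumes the caller has already reduced kernel wait queues to a `waiter pid -> holder pid` map; that reduction is the hard part and is not shown.

```rust
use std::collections::{HashMap, HashSet};

/// Return one cycle in the wait-for graph, as a list of pids, if any exists.
/// `waits_for` maps a blocked task to the task holding the lock it wants.
pub fn find_deadlock(waits_for: &HashMap<u32, u32>) -> Option<Vec<u32>> {
    // Tasks already shown not to lead into a cycle.
    let mut cleared: HashSet<u32> = HashSet::new();
    let mut starts: Vec<u32> = waits_for.keys().copied().collect();
    starts.sort(); // deterministic result regardless of hash order
    for start in starts {
        let mut path: Vec<u32> = Vec::new();
        let mut cur = start;
        loop {
            if cleared.contains(&cur) {
                break;
            }
            // Revisiting a task on the current path closes a cycle.
            if let Some(pos) = path.iter().position(|&p| p == cur) {
                return Some(path[pos..].to_vec());
            }
            path.push(cur);
            match waits_for.get(&cur) {
                Some(&next) => cur = next,
                None => break, // `cur` is not blocked, so the chain ends
            }
        }
        cleared.extend(path);
    }
    None
}
```

A cycle found this way is only a candidate: the snapshot is not atomic, so the dashboard should confirm it persists across several samples before flagging it.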

Rendering Options

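| Option                           | Pros                                  | Cons             |
|----------------------------------|---------------------------------------|------------------|
| TUI (ratatui)                    | Zero dependency, inline with terminal | Limited graphics |
| Web dashboard (WebSocket + HTML) | Rich visualization, interactive       | Requires browser |
| Trace file (JSON/CTF)            | Post-mortem analysis, tool ecosystem  | Not real-time    |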

Recommended: TUI for real-time monitoring + trace file export for post-mortem analysis. Web dashboard as optional upgrade.
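
For the trace file path, scheduler slices map naturally onto "complete" events (`"ph":"X"`) in the Chrome Trace Event format, with one `tid` per hart. A dependency-free sketch follows; the `Slice` type is hypothetical and a real exporter would more likely use a JSON library.

```rust
/// One scheduled interval on one hart (hypothetical record type).
pub struct Slice {
    pub hart: u32,
    pub task: String,
    pub start_us: u64,
    pub dur_us: u64,
}

/// Escape a string for embedding inside a JSON string literal.
fn escape(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialize slices as a Chrome Trace Event JSON object.
/// Timestamps and durations are in microseconds; each hart becomes a `tid`.
pub fn to_chrome_trace(slices: &[Slice]) -> String {
    let events: Vec<String> = slices
        .iter()
        .map(|s| {
            format!(
                "{{\"name\":\"{}\",\"ph\":\"X\",\"ts\":{},\"dur\":{},\"pid\":0,\"tid\":{}}}",
                escape(&s.task),
                s.start_us,
                s.dur_us,
                s.hart
            )
        })
        .collect();
    format!("{{\"traceEvents\":[{}]}}", events.join(","))
}
```

The resulting file can be opened in any viewer that understands the Trace Event format, which gives a per-hart Gantt view without writing custom rendering code.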

Implementation Notes

  • Symbol loading should be lazy — only resolve addresses when first accessed
  • Page table walking reuses the existing Sv39 MMU code in machina-guest-riscv
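  • Context switch detection can hook on satp CSR write (already trapped in privileged instruction handling); since satp changes only on address-space switches, pair it with a check of the current task pointer to catch switches between threads that share an address space
  • MMIO logging can be added in AddressSpace::read/write; a compile-time feature gate gives zero overhead when disabled, while a runtime atomic flag costs one load and branch per access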
  • Struct offset derivation from DWARF should be cached per vmlinux load
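
The runtime-flag variant of MMIO logging might look like the sketch below. `MmioTracer` and `MmioEvent` are hypothetical names, not existing machina types; timestamps and device names are left out for brevity, and a production version would likely use a lock-free ring buffer instead of a `Mutex<Vec<_>>`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub struct MmioEvent {
    pub addr: u64,
    pub value: u64,
    pub size: u8,
    pub is_write: bool,
}

pub struct MmioTracer {
    enabled: AtomicBool,
    log: Mutex<Vec<MmioEvent>>,
}

impl MmioTracer {
    /// Tracing starts disabled.
    pub fn new() -> Self {
        MmioTracer {
            enabled: AtomicBool::new(false),
            log: Mutex::new(Vec::new()),
        }
    }

    pub fn set_enabled(&self, on: bool) {
        self.enabled.store(on, Ordering::Relaxed);
    }

    /// Intended to be called from the MMIO read/write path.
    #[inline]
    pub fn record(&self, addr: u64, value: u64, size: u8, is_write: bool) {
        // Disabled fast path: one relaxed load and a branch, no locking.
        if !self.enabled.load(Ordering::Relaxed) {
            return;
        }
        self.log.lock().unwrap().push(MmioEvent { addr, value, size, is_write });
    }

    /// Take all buffered events, leaving the log empty.
    pub fn drain(&self) -> Vec<MmioEvent> {
        std::mem::take(&mut *self.log.lock().unwrap())
    }
}
```

The relaxed ordering means a toggle may take effect a few accesses late on other threads, which seems acceptable for an observation tool.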

References

  • QEMU -d log items: in_asm, op, int, exec, cpu, mmu
  • QEMU QMP (QEMU Machine Protocol) for programmatic state queries
  • Linux kernel task_struct layout: include/linux/sched.h
  • rCore-Tutorial kernel structure: simpler task model, good starting target
  • ratatui: Rust TUI framework
  • Chrome Trace Event format: JSON trace format for timeline visualization

Relation to #1

This issue builds on the observability foundation proposed in #1 (GDB stub, CPU state inspection, instruction tracing). Specifically:
