diff --git a/CLAUDE.md b/CLAUDE.md
index 0f8aebc7..d4b32612 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -95,7 +95,30 @@ The interactive timeline is implemented in JavaScript within `claude_code_log/te
## Architecture
-For detailed architecture documentation, see:
+Start with [dev-docs/application_model.md](dev-docs/application_model.md)
+— the entry point covering subsystems, data lifecycle, and a glossary,
+with pointers to the deep-dive docs:
+
- [dev-docs/rendering-architecture.md](dev-docs/rendering-architecture.md) - Data flow and rendering pipeline
- [dev-docs/messages.md](dev-docs/messages.md) - Message type reference
- [dev-docs/css-classes.md](dev-docs/css-classes.md) - CSS class combinations
+- [dev-docs/dag.md](dev-docs/dag.md) - DAG-based session/fork architecture
+- [dev-docs/agents.md](dev-docs/agents.md) - Sync/async/teammate agent integration
+- [dev-docs/teammates.md](dev-docs/teammates.md) - Teammates feature deep-dive
+- [dev-docs/message-hierarchy.md](dev-docs/message-hierarchy.md) - Fold/unfold state machine
+- [dev-docs/implementing-a-tool-renderer.md](dev-docs/implementing-a-tool-renderer.md) - How-to: add a new tool
+
+User-facing docs live in [docs/](docs/); plans and TODOs live in [work/](work/).
+
+### Keeping dev-docs/ in sync
+
+`dev-docs/` is **as-built reference** — the code is the authoritative
+source. When a non-trivial change alters behavior, structure, or
+invariants documented in a deep-dive, update the relevant page in
+the same commit (or as a prompt follow-up). If `dev-docs/` and the
+code disagree, the doc is wrong.
+
+Typical lifecycle: a feature begins as a spec in `work/`, evolves
+into a WIP scratchpad as the code adapts to reality, then graduates
+into `dev-docs/` (new page or merged into an existing one) once the
+implementation has stabilized.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 5de26693..80293143 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -50,7 +50,9 @@ claude_code_log/
scripts/ # Development utilities
test/test_data/ # Representative JSONL samples
-dev-docs/ # Architecture documentation
+dev-docs/ # Architecture / dev documentation (start in application_model.md)
+docs/ # User-facing operations docs
+work/ # Plans, TODOs, in-flight design docs
```
## Development Setup
@@ -199,7 +201,10 @@ The handler is installed in `cli.py` via `faulthandler.register(SIGUSR1)`. POSIX
## Architecture
-For detailed architecture documentation, see [dev-docs/rendering-architecture.md](dev-docs/rendering-architecture.md).
+Start with [dev-docs/application_model.md](dev-docs/application_model.md)
+for the system overview (subsystems, data lifecycle, glossary). For
+the rendering pipeline specifically, see
+[dev-docs/rendering-architecture.md](dev-docs/rendering-architecture.md).
### Data Flow Overview
diff --git a/claude_code_log/converter.py b/claude_code_log/converter.py
index 280f7518..da2911ab 100644
--- a/claude_code_log/converter.py
+++ b/claude_code_log/converter.py
@@ -2278,7 +2278,7 @@ def _print_archived_sessions_note(total_archived: int) -> None:
f"\nNote: {total_archived} archived session(s) found{cleanup_info}.\n"
" These sessions were cached before their JSONL files were deleted.\n"
" To restore them or adjust cleanup settings, see:\n"
- " https://github.com/daaain/claude-code-log/blob/main/dev-docs/restoring-archived-sessions.md"
+ " https://github.com/daaain/claude-code-log/blob/main/docs/restoring-archived-sessions.md"
)
diff --git a/dev-docs/agents.md b/dev-docs/agents.md
index 786f8a90..022ebc88 100644
--- a/dev-docs/agents.md
+++ b/dev-docs/agents.md
@@ -1,5 +1,7 @@
# Agents
+> See [application_model.md](application_model.md) for the system overview.
+
`claude-code-log` renders three flavors of Task-spawned agents:
| Flavor | Trigger | Reference |
diff --git a/dev-docs/application_model.md b/dev-docs/application_model.md
new file mode 100644
index 00000000..34b0f42f
--- /dev/null
+++ b/dev-docs/application_model.md
@@ -0,0 +1,448 @@
+# Application Model
+
+`claude-code-log` reads Claude Code transcript files (JSONL on disk) and
+produces readable HTML, Markdown, and structured JSON views, with
+optional caching, a TUI for navigation, and per-project aggregate
+pages.
+
+This document is the entry point for `dev-docs/`: a high-level view of
+the parts, what each does, and where to read about them in detail. For
+end-user documentation see the project [`README.md`](../README.md);
+for contributor onboarding see [`CONTRIBUTING.md`](../CONTRIBUTING.md);
+for user-facing operations docs see [`docs/`](../docs/).
+
+---
+
+## 1. Subsystems at a glance
+
+| Subsystem | Owner module(s) | Deep-dive |
+|---|---|---|
+| CLI | [`cli.py`](../claude_code_log/cli.py) | inlined below (§ 2.1) |
+| TUI | [`tui.py`](../claude_code_log/tui.py) | inlined below (§ 2.2) |
+| Cache (SQLite) | [`cache.py`](../claude_code_log/cache.py) + [`migrations/`](../claude_code_log/migrations/) | inlined below (§ 2.3); user-facing in [`docs/restoring-archived-sessions.md`](../docs/restoring-archived-sessions.md) |
+| Migrations | [`migrations/`](../claude_code_log/migrations/) + `migrations/runner.py` | inlined below (§ 2.4) |
+| Parsing | [`parser.py`](../claude_code_log/parser.py), [`factories/`](../claude_code_log/factories/) | [rendering-architecture.md § 3](rendering-architecture.md) |
+| Message taxonomy | [`models.py`](../claude_code_log/models.py) | [messages.md](messages.md) |
+| DAG (sessions, forks, agents) | [`dag.py`](../claude_code_log/dag.py) | [dag.md](dag.md) |
+| Sync sub-agents (#79) | [`converter.py`](../claude_code_log/converter.py), `factories/agent_metadata_factory.py` | [agents.md § 1](agents.md) |
+| Async task agents (#90) | `converter.py`, `factories/task_notification_factory.py` | [agents.md § 2](agents.md) |
+| Teammates (#91) | `renderer.py`, `factories/teammate_factory.py`, `html/teammate_formatter.py` | [teammates.md](teammates.md) |
+| Rendering pipeline | [`renderer.py`](../claude_code_log/renderer.py), `html/`, `markdown/`, `json/` | [rendering-architecture.md](rendering-architecture.md) |
+| Fold-bar / message hierarchy | `html/templates/components/`, JS in `transcript.html` | [message-hierarchy.md](message-hierarchy.md) |
+| CSS class taxonomy | `html/templates/components/*.css` | [css-classes.md](css-classes.md) |
+| JSON export (#36) | [`json/`](../claude_code_log/json/) | inlined below (§ 2.5) |
+| Detail-level filter | `renderer.py` (§ Detail-level filtering), `models.DetailLevel` | inlined below (§ 2.6) |
+| Image export | [`image_export.py`](../claude_code_log/image_export.py) | inlined below (§ 2.7) |
+| Performance profiling | [`renderer_timings.py`](../claude_code_log/renderer_timings.py) | inlined below (§ 2.8) |
+| Diagnosing hangs (SIGUSR1) | [`cli.py`](../claude_code_log/cli.py) `_install_stack_dump_signal` | inlined below (§ 2.9) |
+| Adding a new tool renderer | [`factories/tool_factory.py`](../claude_code_log/factories/tool_factory.py), `html/tool_formatters.py` | [implementing-a-tool-renderer.md](implementing-a-tool-renderer.md) (how-to) |
+
+A note on cross-cutting concerns: some behaviour spans several rows
+of the table above and isn't owned by any single subsystem. **Label
+and preview composition** (session header titles, branch labels,
+fork-point box captions) is the most common one — it touches the
+DAG layer (which decides what's a branch), the renderer's session
+machinery (which assembles the label text), and the parsing layer
+(which feeds the preview source). See the `SessionHeaderMessage`
+entry in § 4 for the function-level surface.
+
+---
+
+## 2. Subsystems without their own deep-dive
+
+The subsystems above with "inlined below" pointers don't have a
+dedicated dev-doc — the paragraph here is the canonical reference.
+
+### 2.1 CLI
+
+[`cli.py`](../claude_code_log/cli.py) is the command-line entry point
+(`claude-code-log`) built on Click. The default invocation processes
+the entire `~/.claude/projects/` hierarchy; explicit paths target a
+single transcript or directory. Major flags:
+
+- `--tui` — launch the interactive TUI (§ 2.2).
+- `--detail {full,high,low,minimal,user-only}` — drop content from
+ the rendered output (§ 2.6).
+- `--from-date "yesterday"`, `--to-date "today"` — natural-language
+ date filtering via `dateparser`.
+- `--open-browser` — open the generated `index.html` after rendering.
+- `--no-cache` / `--update-cache` — bypass or force-refresh the
+ SQLite cache (§ 2.3).
+- `--format {html,md,markdown,json}` — switch output format (HTML is
+ the default; Markdown is mainly used for sharing transcripts inline;
+ JSON exports the processed tree for downstream tooling — see § 2.5).
+- `--compact` — Markdown-only; suppresses repeated headings.
+- `--page-size N` — paginate the combined-transcript HTML/Markdown
+ output, packing whole sessions into pages of up to N messages each
+ (sessions are never split across pages, so individual pages may
+ overflow). Per-session HTML files are not paginated.
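
The `--page-size` packing rule can be illustrated with a minimal sketch. It assumes a greedy strategy (close the current page when the next session would push it past N); the real logic may differ, and `pack_sessions` is a hypothetical name, not a function from the codebase.

```python
def pack_sessions(session_sizes: list[tuple[str, int]], page_size: int) -> list[list[str]]:
    """Greedily pack whole sessions into pages (hypothetical helper)."""
    pages: list[list[str]] = []
    current: list[str] = []
    count = 0
    for session_id, n_messages in session_sizes:
        # Close the page if this session would push a non-empty page past the limit.
        if current and count + n_messages > page_size:
            pages.append(current)
            current, count = [], 0
        # A session is never split, so an oversized one overflows its own page.
        current.append(session_id)
        count += n_messages
    if current:
        pages.append(current)
    return pages
```

Under this assumption a session larger than N always ends up alone on an overflowing page.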
+
+CLI orchestration delegates to `converter.py` (which owns the
+high-level "load + render + write" flow) and never touches `renderer.py`
+directly. Output paths follow a stable convention so the cache and
+re-renders can find existing files: `combined_transcripts.html`,
+`session-{id}.html`, `index.html`, with `--detail` and `--compact`
+adding suffixes per `utils.variant_suffix`.
+
+### 2.2 TUI
+
+[`tui.py`](../claude_code_log/tui.py) is a Textual application that
+browses the projects index, drills into individual sessions, and
+exposes quick actions: render session to HTML, resume a session via
+`claude --resume`, archive a session (move to cache-only), and so on.
+
+Architecture is straightforward Textual: a few `Screen` subclasses,
+a `DataTable` for the session list, key bindings dispatched through
+Textual's `BINDINGS` mechanism. The TUI reads through `cache.py`
+exclusively (never re-parses JSONL itself) — opening a 50-project
+hierarchy takes milliseconds because cache hydration is incremental.
+
+The "archive" action moves a session's source JSONL out of
+`~/.claude/projects/` while keeping the cache row intact. The
+session then renders from cache only. See
+[`docs/restoring-archived-sessions.md`](../docs/restoring-archived-sessions.md)
+for the user-facing behaviour and recovery flow.
+
+### 2.3 Cache (SQLite)
+
+[`cache.py`](../claude_code_log/cache.py) maintains a SQLite database
+at `~/.claude/projects/claude-code-log-cache.db` (or
+`$CLAUDE_CODE_LOG_CACHE_PATH`). Stored data:
+
+- Per-session: id, summary, first/last timestamps, message count,
+ per-role token totals, `team_name` (added in migration 005).
+- Per-message: a denormalised view used by archived-session
+ restoration (the cache holds enough to re-render even after the
+ source JSONL is deleted).
+- Per-rendered-HTML: the HTML output itself, indexed by source file
+ mtime + detail-level + compact flag (migrations 002–004) — so
+ re-runs with unchanged inputs serve the cached HTML directly.
+
+Invalidation is mtime-based: when a JSONL's mtime is newer than its
+cache row, the session is reparsed. The schema-version row also
+invalidates the entire HTML cache when migrations bump the version,
+since rendered output may have changed even when source data hasn't.
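
As a rough illustration of the mtime rule above, the check might look like the sketch below. `needs_reparse` and its parameters are hypothetical; the actual comparison lives in `cache.py` and may track more state.

```python
from pathlib import Path
from typing import Optional


def needs_reparse(jsonl_path: Path, cached_mtime: Optional[float]) -> bool:
    """Return True when the source file is newer than its cache row (hypothetical helper)."""
    if cached_mtime is None:
        return True  # no cache row yet, so the session must be parsed
    return jsonl_path.stat().st_mtime > cached_mtime
```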
+
+For the operations / recovery side (archived sessions, manual
+deletion, `cleanupPeriodDays`), see
+[`docs/restoring-archived-sessions.md`](../docs/restoring-archived-sessions.md).
+
+### 2.4 Migrations
+
+[`claude_code_log/migrations/`](../claude_code_log/migrations/) is a
+small migration system. Each migration is a `NNN_description.sql` file
+applied in numeric order by `migrations/runner.py`. The schema-version
+table tracks which migrations have run; `cache.py` invokes the runner
+on every connection open, so a fresh checkout running against an old
+cache DB transparently upgrades.
+
+Current migrations:
+
+- `001_initial_schema.sql` — sessions table + per-message metadata.
+- `002_html_cache.sql` — adds the rendered-HTML cache layer.
+- `003_html_pagination.sql` / `004_html_pagination_variant.sql` —
+ per-page HTML chunks for `--page-size`.
+- `005_session_team_name.sql` — adds `team_name` to sessions for the
+ teammates feature (PR #125).
+
+Migrations that recreate tables toggle `PRAGMA foreign_keys = OFF/ON`
+around the rebuild to avoid losing rows to cascade-deletes during the
+swap.
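
The runner contract described in this section (numeric order, applied-set tracking, idempotent on every connection open) could be sketched as follows. This is not the code in `migrations/runner.py`: the `schema_version` table layout and the `apply_migrations` name are assumptions made for illustration.

```python
import sqlite3
from pathlib import Path


def apply_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> list[str]:
    """Apply pending NNN_description.sql files in numeric order (hypothetical runner)."""
    # Assumed bookkeeping table: one row per applied migration file.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_version")}
    newly_applied: list[str] = []
    # Zero-padded prefixes make lexical order match numeric order.
    for path in sorted(migrations_dir.glob("[0-9][0-9][0-9]_*.sql")):
        if path.name in applied:
            continue
        conn.executescript(path.read_text(encoding="utf-8"))
        conn.execute("INSERT INTO schema_version (name) VALUES (?)", (path.name,))
        conn.commit()
        newly_applied.append(path.name)
    return newly_applied
```

Calling it a second time on the same database returns an empty list, which is what makes running it on every connection open cheap.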
+
+### 2.5 JSON export
+
+[`claude_code_log/json/`](../claude_code_log/json/) is a thin renderer
+that mirrors `HtmlRenderer` / `MarkdownRenderer`: same
+`generate(...)` / `generate_session(...)` / `generate_projects_index(...)`
+surface, same `--detail` and `--compact` honoring. Output is a
+structured JSON document — top-level `version` / `title` / `detail` /
+`compact` / `sessions` / `messages` keys; each node carries
+`index` / `type` / `title` / `timestamp` / `session_id` / `content`,
+plus optional `parent_uuid` / `agent_id` / `pair_first` etc. when
+present. Children are nested directly under their parent's
+`children` array — it's the same tree the HTML/Markdown renderers
+walk, serialized verbatim.
+
+The renderer runs entries through `generate_template_messages` (the
+same format-neutral pipeline § 3 describes), so JSON output inherits
+**all** post-factory polishing for free: slash-command normalisation
+(bare `X` → `/X`), command-args
+hardening, teammate session-color enrichment, etc. There is no
+JSON-specific cleanup pass — the rule of thumb is: *if it shows up
+right in HTML/Markdown, it shows up right in JSON*. This is the
+operative example of the **factory-layer normalisation seam**: raw
+`TranscriptEntry` data is polished once at factory time into the
+typed `MessageContent` models that all three renderers share, so
+display polish lives in one place rather than being re-implemented
+per output format.
+
+A few JSON-specific touches:
+
+- `_json_default` unwraps Pydantic models embedded in `MessageContent`
+ dataclasses (tool inputs/outputs are Pydantic; `dataclasses.asdict`
+ doesn't recurse into them, so without this hook they'd stringify
+ via `__repr__` and lose structure). Also handles `Enum` and `Path`.
+- `is_outdated(file_path)` reads the `version` field from existing
+ JSON output and compares against the current library version —
+ same invalidation contract as the HTML cache so re-runs skip
+ unchanged outputs.
+- `combined_transcripts.json` per project; `session-{id}.json` for
+ individual sessions. The naming respects `variant_suffix` for
+ detail/compact variants.
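
To make the `_json_default` bullet concrete, here is a minimal sketch of such a hook. It is not the project's implementation: it assumes Pydantic v2 (models expose `model_dump()`), detects models by duck typing so the example needs only the standard library, and the names `json_default` and `to_json` are illustrative.

```python
import dataclasses
import json
from enum import Enum
from pathlib import Path
from typing import Any


def json_default(obj: Any) -> Any:
    """Fallback for values json.dumps cannot serialise natively (illustrative)."""
    if hasattr(obj, "model_dump"):  # assumption: Pydantic v2 style models
        return obj.model_dump()
    if isinstance(obj, Enum):
        return obj.value
    if isinstance(obj, Path):
        return str(obj)
    raise TypeError(f"Not JSON serialisable: {type(obj).__name__}")


def to_json(content: Any) -> str:
    # asdict recurses into dataclasses only; other objects reach the hook during dumps.
    return json.dumps(dataclasses.asdict(content), default=json_default)
```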
+
+The projects-index JSON (`all-projects-summary.json`) is a parallel
+top-level file — same shape as HTML's `index.html` but consumable by
+external tools (dashboards, query scripts, `jq` pipelines).
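
For downstream consumers of the per-project or per-session JSON, a small walker over the nested `children` arrays is usually all that is needed. The sketch below assumes the top-level `messages` key holds the root nodes, as the key list above suggests; `iter_nodes` is a hypothetical helper.

```python
from typing import Any, Iterator


def iter_nodes(nodes: list[dict[str, Any]], depth: int = 0) -> Iterator[tuple[int, dict[str, Any]]]:
    """Yield (depth, node) pairs in document order (hypothetical helper)."""
    for node in nodes:
        yield depth, node
        # Children are nested directly under their parent.
        yield from iter_nodes(node.get("children", []), depth + 1)
```

A consumer would typically load the file with `json.load` and iterate `iter_nodes(doc["messages"])`, filtering on `type` or `session_id`.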
+
+### 2.6 Detail-level filter
+
+The `--detail` flag (and `models.DetailLevel`) lets users dial down
+how much of the transcript renders:
+
+- `full` (default) — everything.
+- `high` — detailed but cleaned: drops system/hook noise while
+ keeping the full conversation and tool I/O.
+- `low` — drops most tool I/O, keeps the conversation plus a curated
+ set of "interaction signal" tools (WebSearch, WebFetch, Task, Agent —
+ the ones that show *what the agent did*, not *what it read*). See
+ `_LOW_KEEP_TOOLS` in [`renderer.py`](../claude_code_log/renderer.py).
+- `minimal` — drops all tool I/O.
+- `user-only` — drops everything except user messages and steering
+ (designed for feeding to downstream agents, e.g. building a
+ requirements doc).
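
The tool-I/O side of these levels can be summarised in a few lines. This is a simplified sketch derived from the list above, not the renderer's code; `keeps_tool_io` is a hypothetical name and the set mirrors the tools named for `_LOW_KEEP_TOOLS`.

```python
# Tools named above as surviving --detail low (mirrors _LOW_KEEP_TOOLS by assumption).
LOW_KEEP_TOOLS = frozenset({"WebSearch", "WebFetch", "Task", "Agent"})


def keeps_tool_io(detail: str, tool_name: str) -> bool:
    """Return True if tool input/output for tool_name survives the given level."""
    if detail in ("full", "high"):
        return True
    if detail == "low":
        return tool_name in LOW_KEEP_TOOLS
    if detail in ("minimal", "user-only"):
        return False
    raise ValueError(f"unknown detail level: {detail}")
```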
+
+Filtering happens in two passes: a *pre-render* pass on `TranscriptEntry`
+that strips content items (e.g., tool_use blocks from assistant turns),
+and a *post-render* pass on `TemplateMessage` that drops whole content
+types created by factories (`BashInputMessage`, `BashOutputMessage`,
+`CommandOutputMessage` at low/minimal). The two-pass shape exists
+because some content is identifiable only after factory dispatch (e.g.,
+distinguishing `BashInputMessage` from the tool_use that produced it).
+
+Important interaction: `_filter_template_by_detail` runs **before**
+`_pair_skill_tool_uses` and other reorder passes, so paired-message
+indices need re-mapping (`_reindex_filtered_context`). The reindex
+pass also has to update cached parent-message references on
+`SessionHeaderMessage` (see PR #131 fix).
+
+### 2.7 Image export
+
+[`image_export.py`](../claude_code_log/image_export.py) is
+format-agnostic: HTML and Markdown both call into it. Three modes
+(matching the `--image-export-mode` CLI choices):
+
+- `placeholder` — drop the image and render a placeholder marker
+ in its place.
+- `embedded` — base64-encode the image directly into the output as
+ a data URL.
+- `referenced` — write the image to disk next to the output and
+ embed a `src=` reference.
+
+Default is `embedded` for HTML (single self-contained file) and
+`referenced` for Markdown (keeps the `.md` text small and lets
+images live as separate PNGs alongside).
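
The three modes map naturally onto a single dispatch function. The sketch below is illustrative only: `image_reference`, its parameters, and the placeholder text are assumptions, not the API of `image_export.py`.

```python
import base64
from pathlib import Path


def image_reference(mode: str, data: bytes, media_type: str, out_dir: Path, name: str) -> str:
    """Return the string a renderer would emit for an image (hypothetical helper)."""
    if mode == "placeholder":
        return "[image]"  # assumed marker text
    if mode == "embedded":
        encoded = base64.b64encode(data).decode("ascii")
        return f"data:{media_type};base64,{encoded}"
    if mode == "referenced":
        out_dir.mkdir(parents=True, exist_ok=True)
        (out_dir / name).write_bytes(data)
        return name  # relative reference next to the output file
    raise ValueError(f"unknown image export mode: {mode}")
```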
+
+### 2.8 Performance profiling
+
+[`renderer_timings.py`](../claude_code_log/renderer_timings.py)
+provides `log_timing(label, t_start)` context managers used throughout
+`renderer.py`. Set `CLAUDE_CODE_LOG_DEBUG_TIMING=1` to print per-phase
+times to stderr — useful for spotting which phase regressed when a
+large transcript suddenly takes seconds longer than before.
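
For readers unfamiliar with the pattern, an environment-gated timing context manager can be sketched as below. This simplified version takes only a label and is not the signature of the real `log_timing`; only the environment variable name comes from the text above.

```python
import os
import sys
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def log_timing(label: str) -> Iterator[None]:
    """Print elapsed time for the wrapped block when debug timing is enabled (sketch)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        if os.environ.get("CLAUDE_CODE_LOG_DEBUG_TIMING") == "1":
            elapsed = time.perf_counter() - start
            print(f"[timing] {label}: {elapsed:.3f}s", file=sys.stderr)
```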
+
+### 2.9 Diagnosing hangs (SIGUSR1 stack dump)
+
+When `claude-code-log` appears stuck (100% CPU, no output), a
+single `SIGUSR1` to the running process dumps the live Python
+stack of every thread to stderr without killing it:
+
+```bash
+# In another terminal
+kill -USR1 $(pgrep -f claude-code-log | head -1)
+```
+
+The handler is wired in `cli.py::_install_stack_dump_signal()` via
+`faulthandler.register(SIGUSR1, all_threads=True, chain=False)` and
+installed before any heavy work in the entry point. POSIX-only:
+Windows lacks `SIGUSR1`, so the install is a silent no-op there. Unlike
+`py-spy`, this needs no root and no extra install, since the runtime
+is already wired to dump itself on demand. Added by PR #135 to make
+the DAG cyclic-children class of bug diagnosable in the field; useful
+for any future hang.
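
A standalone sketch of the same wiring, using only documented `faulthandler` and `signal` calls. The function name and the boolean return value are illustrative choices, not a copy of `_install_stack_dump_signal`.

```python
import faulthandler
import signal


def install_stack_dump_signal() -> bool:
    """Register a SIGUSR1 stack dump; return False where that is not possible."""
    # Windows has neither signal.SIGUSR1 nor faulthandler.register.
    if not hasattr(signal, "SIGUSR1") or not hasattr(faulthandler, "register"):
        return False
    try:
        # Dumps every thread's traceback to sys.stderr without terminating the process.
        faulthandler.register(signal.SIGUSR1, all_threads=True, chain=False)
    except (OSError, ValueError, AttributeError, RuntimeError):
        return False  # e.g. stderr has no usable file descriptor
    return True
```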
+
+---
+
+## 3. Data lifecycle
+
+```
+ ┌──────────────────┐
+ │ JSONL file(s) │
+ │ (~/.claude/...) │
+ └────────┬─────────┘
+ │
+ parser.py + factories/
+ │
+ ▼
+ ┌───────────────────────┐
+ │ list[TranscriptEntry] │ (typed Pydantic models)
+ └───────────┬───────────┘
+ │
+ factories/ dispatch
+ │
+ ▼
+ ┌─────────────────────────┐
+ │ list[TemplateMessage] │ (each carrying a typed
+ │ with MessageContent │ MessageContent variant)
+ └─────────────┬───────────┘
+ │
+ renderer.py (generate_template_messages):
+ build DAG → pair → reorder → relocate
+ subagent blocks → build hierarchy →
+ cleanup sidechain dups → populate caches
+ │
+ ▼
+ ┌──────────────────────┐
+ │ Tree of TemplateMsg │
+ │ + RenderingContext │ (caches: teammate_colors,
+ │ + nav data │ task_subjects, etc.)
+ └──────────┬───────────┘
+ │
+ ┌────────────┬─────────────┴─────────────┬────────────┐
+ ▼ ▼ ▼ ▼
+html/renderer.py markdown/renderer.py json/renderer.py
+ │ │ │
+ ▼ ▼ ▼
+ index.html + *.md combined_transcripts.json
+ session-*.html (single file) session-*.json
+ all-projects-summary.json
+ │ │ │
+ └──────────────────┼──────────────────────┘
+ │
+ ┌──────────┴────────────┐
+ ▼ ▼
+ cache.py image_export.py
+ (SQLite) (HTML / Markdown only —
+ JSON serialises paths)
+```
+
+Cache reads/writes are interleaved with the main pipeline rather than forming a single stage:
+`cache.py` is consulted before parsing (cache hit → skip parse), after
+rendering (write the rendered HTML), and during TUI navigation (the
+TUI never re-parses).
+
+---
+
+## 4. Cross-cutting glossary
+
+Terms that appear across multiple subsystems — defined once here.
+
+- **TranscriptEntry**: typed Pydantic model for a single line in the
+ source JSONL. Variants: `User`, `Assistant`, `Summary`, `System`,
+ `Passthrough`, `QueueOperation`. See
+ [`parser.py`](../claude_code_log/parser.py) and
+ [`models.py`](../claude_code_log/models.py).
+
+- **MessageContent**: render-time content variant produced by the
+ factories from `TranscriptEntry`. Many flavours
+ (`UserTextMessage`, `ToolUseMessage`, `TeammateMessage`, …). One
+ `TranscriptEntry` may yield multiple `MessageContent`s (a single
+ assistant turn with N tool_uses produces N+1 messages). See
+ [messages.md](messages.md) for the full taxonomy.
+
+- **TemplateMessage**: the render-time wrapper around a
+ `MessageContent`. Carries `message_index`, parent/child links,
+ pair_first/pair_middle/pair_last, ancestry, and the renderer-format
+ CSS classes. Defined in [`renderer.py`](../claude_code_log/renderer.py).
+
+- **RenderingContext**: mutable cache attached to one render pass.
+ Holds the message registry plus nested per-session caches
+ (`teammate_colors`, `task_subjects`, `task_id_for_tool_use`,
+ `session_first_message`, etc.). Caches are session-scoped because
+ combined-transcripts mode merges multiple sessions and per-session
+ identifiers (teammate_id, task_id) aren't globally unique.
+
+- **session_id**: the JSONL's `sessionId` field. Often a UUID string.
+ In some renderer paths a *synthetic* form is used:
+ - `{trunk}#agent-{agentId}` for sub-agent transcripts (so they
+ form a separate DAG-line attached to their spawning trunk).
+ - `{trunk}@{first_uuid_prefix}` for branch sessions (rewinds /
+ parallel-tool_use forks). See [dag.md](dag.md).
+
+- **render_session_id**: the session id that should be used when
+ walking `ctx.messages` to find content for rendering, accounting
+ for synthetic rewrites.
+
+- **sidechain**: a sub-agent's transcript entries are flagged
+ `isSidechain: true`. The DAG layer integrates them into the parent
+ session's tree under the spawning Task/Agent tool_use anchor. See
+ [agents.md](agents.md), [dag.md](dag.md).
+
+- **agent_id**: identifier copied from a Task/Agent tool_result
+ (either `toolUseResult.agentId` or parsed from the Markdown
+ metadata tail). Used to stitch sub-agent JSONL files into the
+ trunk DAG. See [agents.md](agents.md).
+
+- **fork point** / **branch**: when a session has multiple children
+ with the same parent, the parent is the fork point and each child
+ initiates a branch. Real forks come from `/exit` rewinds; spurious
+ forks (parallel tool_uses, structural-only siblings) are collapsed
+ by `_walk_session_with_forks`. See [dag.md](dag.md).
+
+- **SessionHeaderMessage**: the synthetic content type produced for
+ every session boundary in the rendered output — the header that
+ appears above each session's first real message. Two flavours:
+ *trunk* headers for top-level sessions, and *branch* headers for
+ fork branches (the "branch heading" you'll see referenced in bug
+ reports). The branch header's title is composed by `_branch_label`
+ and back-filled by `_enrich_branch_titles` (both in `renderer.py`)
+ in the shape `Branch • • `; the preview text
+ itself is built by `create_session_preview` in `utils.py` (which
+ calls `simplify_command_tags` to strip raw `` XML
+ soup down to `/cmd`). When troubleshooting branch-heading
+ rendering, those four functions are the surface area.
+
+- **pair_first / pair_middle / pair_last**: a pair of messages
+ rendered as one logical unit (tool_use + tool_result, Slash + UserSlash,
+ thinking + assistant). `pair_middle` exists for triples — currently
+ the slash-command `(UserSlash → Slash → CommandOutput)` shape.
+
+- **detail level**: see § 2.6.
+
+- **detail-aware tools**: the curated set of tools whose I/O survives
+ `--detail low` because they convey *what the agent did*, not *what
+ it read* (`WebSearch`, `WebFetch`, `Task`, `Agent`).
+
+- **passthrough**: a `PassthroughTranscriptEntry` is a non-conversation
+ entry (hook callbacks, progress updates, last-prompt markers). The
+ DAG layer keeps them in the structure but the renderer typically
+ hides them.
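
The two synthetic `session_id` shapes from the glossary can be told apart with plain string handling. The parser below is a hypothetical illustration, assumes the `#agent-` and `@` separators never occur in a raw `sessionId`, and does not try to unpack nested forms.

```python
from typing import NamedTuple, Optional


class SessionRef(NamedTuple):
    trunk: str
    kind: str  # "plain", "agent", or "branch"
    suffix: Optional[str]


def parse_session_id(session_id: str) -> SessionRef:
    """Split a possibly synthetic session id into trunk and suffix (hypothetical helper)."""
    if "#agent-" in session_id:
        trunk, agent_id = session_id.split("#agent-", 1)
        return SessionRef(trunk, "agent", agent_id)
    if "@" in session_id:
        trunk, uuid_prefix = session_id.split("@", 1)
        return SessionRef(trunk, "branch", uuid_prefix)
    return SessionRef(session_id, "plain", None)
```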
+
+---
+
+## 5. Where to start reading
+
+Common entry questions and their best first stop:
+
+- "How does a JSONL line become an HTML row?"
+ → [rendering-architecture.md](rendering-architecture.md).
+- "Why are forks rendered weirdly / what is a branch session?"
+ → [dag.md](dag.md).
+- "What message types exist and what do they look like?"
+ → [messages.md](messages.md) plus the samples in `messages/`.
+- "I want to add support for a new Claude Code tool."
+ → [implementing-a-tool-renderer.md](implementing-a-tool-renderer.md).
+- "How does folding / collapsible content work?"
+ → [message-hierarchy.md](message-hierarchy.md).
+- "What CSS classes does a message div get?"
+ → [css-classes.md](css-classes.md).
+- "How are sub-agent transcripts (sync, async, teammates) integrated?"
+ → [agents.md](agents.md), then [teammates.md](teammates.md) for the
+ teammates-specific machinery.
+- "I want to extend the cache / change the schema."
+ → § 2.3, § 2.4 here, then read the migration files in order.
+- "How do I export to JSON for downstream tooling?"
+ → § 2.5 here (and `--format json` from § 2.1).
+- "claude-code-log is hung — how do I see what it's doing?"
+ → § 2.9 (`SIGUSR1` stack dump).
+- "What's planned but not implemented?"
+ → [`work/`](../work/) — each `.md` is an in-flight or proposed plan.
diff --git a/dev-docs/css-classes.md b/dev-docs/css-classes.md
index 1001b78c..ce305c20 100755
--- a/dev-docs/css-classes.md
+++ b/dev-docs/css-classes.md
@@ -1,5 +1,7 @@
# CSS Classes for Message Types
+> See [application_model.md](application_model.md) for the system overview.
+
This document provides a comprehensive reference for CSS class combinations used in Claude Code Log HTML output, their CSS rule support status, and pairing behavior.
**Generated from analysis of:** 29 session HTML files (3,244 message elements)
diff --git a/dev-docs/dag.md b/dev-docs/dag.md
index f85b6fb8..596b4d80 100644
--- a/dev-docs/dag.md
+++ b/dev-docs/dag.md
@@ -1,5 +1,7 @@
# DAG-Based Message Architecture
+> See [application_model.md](application_model.md) for the system overview.
+
Replaces timestamp-based ordering with `parentUuid` → `uuid` graph traversal.
Reference: [Messages as Commits: Claude Code's Git-Like DAG of Conversations](https://piebald.ai/blog/messages-as-commits-claude-codes-git-like-dag-of-conversations)
@@ -96,6 +98,14 @@ Where `s1`, `s2`, `s3` are synthesized session header messages.
- **Backlinks** on session headers: "Continues from message X in Session Y"
(shown on `s2` and `s3`)
+> Where branch / session header *titles* (the `Branch • •
+> ` text) are assembled is a renderer concern, not a DAG
+> concern. See the `SessionHeaderMessage` glossary entry in
+> [application_model.md](application_model.md#4-cross-cutting-glossary)
+> for the four functions involved (`_branch_label`,
+> `_enrich_branch_titles`, `create_session_preview`,
+> `simplify_command_tags`).
+
#### Current: `d-{index}` anchors (combined transcript only)
Backlinks use `#msg-d-{N}` anchors which are sequential indices assigned
@@ -178,8 +188,28 @@ is available.
1. Parse all entries, index by `uuid`
2. For duplicate `uuid`s, keep the one from the earliest `sessionId`
-3. Build `children_by_uuid` from `parentUuid` links
-4. Group messages by `sessionId`
+3. Group messages by `sessionId`
+4. `build_dag(nodes, sidechain_uuids)` populates `children_uuids` —
+ in three steps that **must run in this order** (PR #135):
+
+ ```mermaid
+ flowchart TB
+ A["entries indexed by uuid<br/>(parent_uuid pointers may<br/>dangle or cycle)"] --> S1
+ S1["Step 1 — orphan promotion<br/>parent_uuid not in nodes →<br/>null it; warn unless the<br/>parent is a known sidechain<br/>uuid (silently promote)"] --> S2
+ S2["Step 2 — cycle break<br/>walk parent_uuid from each<br/>node; revisit ⇒ null the<br/>revisited node's parent;<br/>warn"] --> S3
+ S3["Step 3 — children build<br/>for each node with non-null<br/>parent_uuid, append to<br/>parent.children_uuids;<br/>skip self-loops, dedup"] --> O["acyclic parent→children DAG<br/>safe to walk"]
+ classDef step fill:#eef,stroke:#99c
+ class S1,S2,S3 step
+ ```
+
+ Steps 1 and 2 mutate `parent_uuid` on the input nodes (they're
+ one-way: a promoted-to-root node can't recover its dangling
+ parent later). Step 3 is the only step that builds the
+ `children_uuids` lists. Doing children first would propagate
+ any cyclic edge into the children graph, and downstream walks
+ via `children_uuids` would loop forever — so cycles must be
+ broken at the parent-pointer layer before children are
+ materialised.
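
The three-step ordering can be sketched as follows. This is a simplified illustration, not the code in `dag.py`: the `Node` dataclass is a stand-in for the real node type, and the cycle check is the straightforward per-node walk (quadratic in the worst case).

```python
import logging
from dataclasses import dataclass, field
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class Node:  # stand-in for the real DAG node type
    uuid: str
    parent_uuid: Optional[str] = None
    children_uuids: list[str] = field(default_factory=list)


def build_dag(nodes: dict[str, Node], sidechain_uuids: set[str]) -> None:
    # Step 1: orphan promotion. Dangling parents are nulled.
    for node in nodes.values():
        if node.parent_uuid is not None and node.parent_uuid not in nodes:
            if node.parent_uuid not in sidechain_uuids:
                logger.warning("orphan %s promoted to root", node.uuid)
            node.parent_uuid = None
    # Step 2: cycle break. Walk parent pointers; a revisit means a cycle.
    for start in nodes.values():
        seen: set[str] = set()
        cur: Optional[Node] = start
        while cur is not None:
            if cur.uuid in seen:
                logger.warning("cycle broken at %s", cur.uuid)
                cur.parent_uuid = None
                break
            seen.add(cur.uuid)
            cur = nodes[cur.parent_uuid] if cur.parent_uuid is not None else None
    # Step 3: children build. Only now is it safe to materialise children.
    for node in nodes.values():
        parent_uuid = node.parent_uuid
        if parent_uuid is None or parent_uuid == node.uuid:
            continue
        siblings = nodes[parent_uuid].children_uuids
        if node.uuid not in siblings:
            siblings.append(node.uuid)
```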
### Phase 3: Extract Session DAG-lines
@@ -198,6 +228,16 @@ For each session (`extract_session_dag_lines` in `dag.py`):
5. If DAG walk coverage is incomplete, fall back to a timestamp sort for
the whole session.
+**Defence-in-depth in the walker** (PR #135): even though `build_dag`
+breaks parent-pointer cycles before populating `children_uuids`, a
+future bug or hand-edited fixture could reintroduce a cyclic edge
+*after* DAG construction. `_walk_session_with_forks` keeps a
+`walk_visited: set[str]` across the whole queue-driven walk; if a
+uuid is visited twice, the chain is truncated at that point and a
+warning is logged. The build-time cycle break and this walk-time
+guard together rule out the unbounded-loop class of hangs that
+motivated the PR.
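
The walk-time guard amounts to a visited set shared across the whole traversal. A minimal sketch, assuming a plain `children` mapping rather than the real node objects and a hypothetical `walk_guarded` name:

```python
import logging
from collections import deque

logger = logging.getLogger(__name__)


def walk_guarded(root: str, children: dict[str, list[str]]) -> list[str]:
    """Queue-driven walk that refuses to visit any uuid twice (illustrative)."""
    walk_visited: set[str] = set()
    order: list[str] = []
    queue: deque[str] = deque([root])
    while queue:
        uuid = queue.popleft()
        if uuid in walk_visited:
            # A repeat means a cyclic or duplicated edge; stop following it.
            logger.warning("uuid %s visited twice; truncating", uuid)
            continue
        walk_visited.add(uuid)
        order.append(uuid)
        queue.extend(children.get(uuid, []))
    return order
```

Even with a deliberately cyclic `children` mapping, the walk terminates after visiting each uuid once.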
+
### Phase 4: Build Session Tree
1. For each session, find where its DAG-line attaches to the DAG:
@@ -575,7 +615,11 @@ These should be checked at runtime (log warnings, don't crash):
produce multiple roots within one `sessionId`; all are walked and the
trunks are merged. Other multi-root causes warn (may indicate missing
parent data).
-3. **DAG acyclicity**: No cycles in `parentUuid` chains
+3. **DAG acyclicity**: `build_dag` walks each node's `parent_uuid`
+ chain and nulls the first revisited node's parent if a cycle is
+ detected (warns and promotes that node to root). The DAG seen by
+ downstream walks is always acyclic; `_walk_session_with_forks`
+ adds a `walk_visited` belt for defence-in-depth.
4. **Unique ownership**: After deduplication, each `uuid` belongs to
exactly one session
5. **Agent parenting**: Every top-level agent transcript has an identifiable
@@ -655,4 +699,4 @@ validate DAG construction against known transcripts.
- [rendering-architecture.md](rendering-architecture.md) — Current pipeline
- [messages.md](messages.md) — Message type reference
-- [rendering-next.md](rendering-next.md) — Future rendering improvements
+- [../work/rendering-next.md](../work/rendering-next.md) — Future rendering improvements
diff --git a/dev-docs/implementing-a-tool-renderer.md b/dev-docs/implementing-a-tool-renderer.md
index 45972fc8..7d464b3f 100644
--- a/dev-docs/implementing-a-tool-renderer.md
+++ b/dev-docs/implementing-a-tool-renderer.md
@@ -1,5 +1,7 @@
# Implementing a Tool Renderer
+> See [application_model.md](application_model.md) for the system overview.
+
This guide walks through adding rendering support for a new Claude Code tool, using WebSearch as an example.
## Overview
@@ -11,6 +13,14 @@ Tool rendering involves several components working together:
3. **HTML Formatters** (`html/tool_formatters.py`) - HTML rendering functions
4. **Renderers** - Integration with HTML and Markdown renderers
+JSON output (`json/renderer.py`, since PR #36) needs **no per-tool
+integration**: it serialises whatever typed input/output models the
+factory produced via `dataclasses.asdict` (with a `_json_default`
+shim for Pydantic models embedded inside the dataclasses). Add the
+models in Step 1 and the factory hooks in Steps 2–3, and your tool
+shows up in JSON exports automatically. The HTML/Markdown formatter
+work in Steps 4–5 stays format-specific.
+
## Step 1: Define Models
### Tool Input Model
@@ -253,6 +263,13 @@ Create test cases in the appropriate test files:
2. **Formatter tests** - Verify HTML/Markdown output is correct
3. **Integration tests** - Verify end-to-end rendering
+JSON output is exercised by the broader `test/test_json_rendering.py`
+and `test/test_json_real_projects.py` suites; per-tool JSON output
+typically needs no dedicated test because the `dataclasses.asdict`
+serialisation is generic rather than tool-specific. Add a JSON-specific
+case only if your tool embeds a non-dataclass type the `_json_default`
+shim doesn't already cover.
+
## Checklist
- [ ] Add input model to `models.py`
diff --git a/dev-docs/FOLD_STATE_DIAGRAM.md b/dev-docs/message-hierarchy.md
similarity index 99%
rename from dev-docs/FOLD_STATE_DIAGRAM.md
rename to dev-docs/message-hierarchy.md
index 3d261f71..50cb6906 100644
--- a/dev-docs/FOLD_STATE_DIAGRAM.md
+++ b/dev-docs/message-hierarchy.md
@@ -1,4 +1,6 @@
-# Fold Bar State Diagram
+# Message Hierarchy and Fold State
+
+> See [application_model.md](application_model.md) for the system overview.
## Message Hierarchy
diff --git a/dev-docs/messages.md b/dev-docs/messages.md
index b0ef2f42..d5b2969c 100644
--- a/dev-docs/messages.md
+++ b/dev-docs/messages.md
@@ -1,5 +1,7 @@
# Message Types in Claude Code Transcripts
+> See [application_model.md](application_model.md) for the system overview.
+
This document describes all message types found in Claude Code JSONL transcript files and their corresponding output representations. The goal is to define an **intermediate representation** that captures the logical message structure independent of HTML rendering.
## Overview
@@ -930,4 +932,4 @@ Sub-agent messages (from `Task` tool):
- [system_factory.py](../claude_code_log/factories/system_factory.py) - `create_system_message()`
- [meta_factory.py](../claude_code_log/factories/meta_factory.py) - `create_meta()`
- [rendering-architecture.md](rendering-architecture.md) - Rendering pipeline and Renderer class hierarchy
-- [rendering-next.md](rendering-next.md) - Future rendering improvements
+- [../work/rendering-next.md](../work/rendering-next.md) - Future rendering improvements
diff --git a/dev-docs/rendering-architecture.md b/dev-docs/rendering-architecture.md
index 1deadf20..dc6800ad 100644
--- a/dev-docs/rendering-architecture.md
+++ b/dev-docs/rendering-architecture.md
@@ -1,11 +1,13 @@
# Rendering Architecture
-This document describes how Claude Code transcript data flows from raw JSONL entries to final output (HTML, Markdown). The architecture separates concerns into distinct layers:
+> See [application_model.md](application_model.md) for the system overview.
+
+This document describes how Claude Code transcript data flows from raw JSONL entries to final output (HTML, Markdown, JSON). The architecture separates concerns into distinct layers:
1. **Parsing Layer** - Raw JSONL to typed transcript entries
2. **Factory Layer** - Transcript entries to `MessageContent` models
3. **Rendering Layer** - Format-neutral tree building and relationship processing
-4. **Output Layer** - Format-specific rendering (HTML, Markdown)
+4. **Output Layer** - Format-specific rendering (HTML, Markdown, JSON)
---
@@ -16,15 +18,27 @@ JSONL File
↓ (parser.py)
list[TranscriptEntry]
↓ (factories/)
-list[TemplateMessage] with MessageContent
+list[TemplateMessage] with MessageContent ← factory-layer
+ normalisation seam
+ (raw → display-polished)
↓ (renderer.py: generate_template_messages)
Tree of TemplateMessage (roots with children)
+ RenderingContext (message registry)
+ Session navigation data
- ↓ (html/renderer.py or markdown/renderer.py)
-Final output (HTML or Markdown)
+ ↓ (html/renderer.py | markdown/renderer.py | json/renderer.py)
+Final output (HTML, Markdown, or JSON)
```
+**The factory-layer seam matters**: any cleanup that should appear
+in *every* output format (slash-command normalisation, command-args
+hardening, teammate session-color enrichment, etc.) lives at factory
+time, in the typed `MessageContent` models. The three renderers are
+pure consumers of the polished tree and never re-implement display
+polish per format. As a corollary, when a new output format is added
+(JSON shipped this way in PR #36), it inherits all polish for free
+as long as it consumes the output of
+`generate_template_messages`.
+
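The seam described above can be sketched in a few lines. This is an illustration only, under stated assumptions: `SlashCommand`, `make_slash_command`, and the three `render_*` functions are hypothetical names, not the project's factories or renderers. The point is that the `/X` normalisation happens once, when the typed model is built, and each renderer only formats what it is given.

```python
import html
import json
from dataclasses import asdict, dataclass


@dataclass
class SlashCommand:
    """Hypothetical typed content model, standing in for a MessageContent subclass."""

    name: str
    args: str


def make_slash_command(raw_name: str, raw_args: str) -> SlashCommand:
    """Hypothetical factory: the only place display polish is applied."""
    name = raw_name.strip()
    if not name.startswith("/"):
        name = "/" + name  # lift bare `X` to `/X`
    return SlashCommand(name=name, args=raw_args.strip())


# The renderers below are pure consumers: none of them re-checks the slash.
def render_html(cmd: SlashCommand) -> str:
    return f"<code>{html.escape(cmd.name)}</code> {html.escape(cmd.args)}"


def render_markdown(cmd: SlashCommand) -> str:
    return f"`{cmd.name}` {cmd.args}"


def render_json(cmd: SlashCommand) -> str:
    return json.dumps(asdict(cmd))
```

Adding a fourth `render_*` function in this sketch would pick up the normalised name without touching the factory, which is the property the paragraph above attributes to the JSON renderer.
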
**Key cardinality rules**:
- Each transcript entry has a `uuid`, but a single entry's `list[ContentItem]` may be chunked and produce multiple `MessageContent` objects (e.g., tool_use items are split into separate messages)
- Each `MessageContent` gets exactly one `TemplateMessage` wrapper
@@ -278,6 +292,20 @@ def title_ToolUseMessage(self, content: ToolUseMessage, message: TemplateMessage
- Writes directly to file/string without templates
- Simpler structure suited to plain text output
+**JsonRenderer** ([json/renderer.py](../claude_code_log/json/renderer.py)):
+- Doesn't implement `format_*` per content type — instead serialises
+ the entire `TemplateMessage` subtree via `dataclasses.asdict` plus
+ a small `_json_default` shim for the Pydantic models embedded in
+ tool inputs/outputs (and for `Enum`/`Path`).
+- Calls `title_content(msg)` to attach a per-node title that mirrors
+ what HTML/Markdown surface — the only place dispatcher methods are
+ reused.
+- Output is a single JSON document per session (or per combined
+ transcript / projects index) with the message tree nested directly
+ under each node's `children` array. See [application_model.md
+ § 2.5](application_model.md#25-json-export) for the payload shape
+ and inheritance from the factory-layer normalisation seam.
+
---
## 8. HTML Formatter Organization
@@ -333,9 +361,16 @@ Note that `meta.uuid` is the original transcript entry's UUID. Since a single en
### Separation of Concerns
- **models.py**: Pure data structures, no rendering logic
-- **factories/**: Data transformation, no I/O
+- **factories/**: Data transformation, no I/O. **The
+  normalisation seam**: display polish for *all* output formats
+  lives here, not in renderers (e.g. lifting bare `X` to `/X`,
+  with the same fix applied to both
+  `simplify_command_tags` and
+  `create_slash_command_message` so HTML/Markdown/JSON observe a
+  single shape).
- **renderer.py**: Format-neutral processing (pairing, hierarchy, tree)
-- **html/**, **markdown/**: Format-specific output generation
+- **html/**, **markdown/**, **json/**: Format-specific output generation,
+ consuming the polished tree without re-implementing display rules.
---
@@ -343,5 +378,5 @@ Note that `meta.uuid` is the original transcript entry's UUID. Since a single en
- [messages.md](messages.md) - Complete message type reference
- [css-classes.md](css-classes.md) - CSS class combinations and rules
-- [FOLD_STATE_DIAGRAM.md](FOLD_STATE_DIAGRAM.md) - Fold/unfold state machine
+- [message-hierarchy.md](message-hierarchy.md) - Fold/unfold state machine
- [dag.md](dag.md) - DAG-based message architecture (replaces timestamp-based ordering)
diff --git a/dev-docs/teammates.md b/dev-docs/teammates.md
index 2431ea54..366a4fb8 100644
--- a/dev-docs/teammates.md
+++ b/dev-docs/teammates.md
@@ -1,5 +1,7 @@
# Teammates Support
+> See [application_model.md](application_model.md) for the system overview.
+
This document describes how `claude-code-log` supports the Claude Code
teammates feature (research preview, gated by
`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`, available in CC 2.1.32+).
diff --git a/dev-docs/restoring-archived-sessions.md b/docs/restoring-archived-sessions.md
similarity index 100%
rename from dev-docs/restoring-archived-sessions.md
rename to docs/restoring-archived-sessions.md
diff --git a/work/phase-c-agent-transcripts.md b/work/phase-c-agent-transcripts.md
deleted file mode 100644
index 47ef30e6..00000000
--- a/work/phase-c-agent-transcripts.md
+++ /dev/null
@@ -1,87 +0,0 @@
-# Phase C: Agent Transcript Rework
-
-## Status: Steps 1-2 Complete (DAG Integration)
-
-## What Changed
-
-### Step 1: Agent Data Shapes (Analysis Complete)
-
-Key findings from real data analysis:
-
-- **Agent entries share `sessionId`** with their parent session
-- All agent entries have `isSidechain: true` and `agentId`
-- First entry always has `parentUuid: null` (top-level agents)
-- Internal `parentUuid` chains form the same fork patterns as main sessions
- (tool-result side-branches)
-- `agentId` reference in main session: either entry-level `agentId` (old Task
- tool) or `toolUseResult.agentId` (new Agent tool, copied to entry level by
- converter.py parsing code)
-
-### Step 2: DAG-Level Agent Integration (Implemented)
-
-**`converter.py` — `_integrate_agent_entries()`**:
-1. Builds `agentId -> anchor_uuid` map from main-session entries with `agentId`
-2. For each sidechain entry: assigns synthetic `sessionId`
- (`{sessionId}#agent-{agentId}`) so agents form separate DAG-lines
-3. Parents root entries (`parentUuid=None`) to the anchor UUID
-
-**Effect**: Agent entries are included in the DAG. The existing DAG machinery
-(build_dag, extract_session_dag_lines, build_session_tree, traverse_session_tree)
-handles them as child sessions of the main session, spliced at the anchor point.
-
-**Key constraint**: `entry.sessionId` on disk / in cache is NEVER mutated.
-The synthetic ID is only assigned in-memory during `load_directory_transcripts()`.
-
-### Renderer Changes
-
-- Agent sessions (`#agent-` in session_id) **don't get session headers**
-- Agent messages use parent session's `render_session_id` for correct grouping
- in `_reorder_session_template_messages()`
-- Agent sessions excluded from session navigation and individual file generation
-
-### What Was Kept
-
-- `_cleanup_sidechain_duplicates()` — still needed for Task tool dedup
- (first user message = Task input, last assistant = Task output).
- This is content-level dedup that can't be handled at the DAG level.
-- `sidechain_uuids` parameter in `build_dag()` — still needed for unloaded
- subagent files (e.g. aprompt_suggestion agents never referenced via agentId)
-
-### What Was Removed (Step 4)
-
-- `_reorder_sidechain_template_messages()` — removed. With DAG integration,
- agent messages are already in correct order via DAG traversal. Single-file
- mode now also calls `_integrate_agent_entries()` so both paths use DAG-based
- ordering.
-
-## Remaining Steps
-
-### Step 3: Session Tree Integration (Partially Done)
-
-Agent DAG-lines already appear as child sessions in the tree. The
-`traverse_session_tree()` naturally visits them at the junction point.
-What's left:
-- Verify rendering hierarchy (levels 4/5) works correctly for all cases
-- Test with projects that have nested agents (agent spawning sub-agents)
-
-### Step 4: Rendering Cleanup (Done)
-
-- Removed `_reorder_sidechain_template_messages()` — no longer needed with
- DAG-based ordering. Added `_integrate_agent_entries()` to single-file mode
- in `converter.py` so both code paths use consistent DAG integration.
-- `_cleanup_sidechain_duplicates()` — kept as-is. Content-level dedup
- (Task input/output duplicated in sidechain) cannot be handled at the DAG
- level since it requires text comparison, not structural ordering.
-
-### Step 5: Agent Tool Renderer (separate PR, `dev/user-sidechain`)
-
-- Specialized rendering for Agent tool_use/tool_result (like old Task tool had)
-- Sidechain user messages rendered as markdown (already on `dev/user-sidechain`)
-
-## Test Coverage
-
-4 new integration tests in `TestAgentDagIntegration`:
-- `test_agent_entries_parented_to_anchor` — agent root gets parentUuid to anchor
-- `test_agent_session_in_tree` — synthetic session created, tree structure correct
-- `test_agent_no_session_header` — no session header generated for agents
-- `test_multiple_agents_ordered` — multiple agents placed at respective anchors
diff --git a/dev-docs/rendering-next.md b/work/rendering-next.md
similarity index 96%
rename from dev-docs/rendering-next.md
rename to work/rendering-next.md
index 9896ebbf..cf0d9e95 100644
--- a/dev-docs/rendering-next.md
+++ b/work/rendering-next.md
@@ -149,5 +149,5 @@ Syntax highlighting is a significant portion of render time. Could cache highlig
## Related Documentation
-- [rendering-architecture.md](rendering-architecture.md) - Current architecture
-- [FOLD_STATE_DIAGRAM.md](FOLD_STATE_DIAGRAM.md) - Fold/unfold state machine
+- [dev-docs/rendering-architecture.md](../dev-docs/rendering-architecture.md) - Current architecture
+- [dev-docs/message-hierarchy.md](../dev-docs/message-hierarchy.md) - Fold/unfold state machine