diff --git a/CLAUDE.md b/CLAUDE.md index 0f8aebc7..d4b32612 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -95,7 +95,30 @@ The interactive timeline is implemented in JavaScript within `claude_code_log/te ## Architecture -For detailed architecture documentation, see: +Start with [dev-docs/application_model.md](dev-docs/application_model.md) +— the entry point covering subsystems, data lifecycle, and a glossary, +with pointers to the deep-dive docs: + - [dev-docs/rendering-architecture.md](dev-docs/rendering-architecture.md) - Data flow and rendering pipeline - [dev-docs/messages.md](dev-docs/messages.md) - Message type reference - [dev-docs/css-classes.md](dev-docs/css-classes.md) - CSS class combinations +- [dev-docs/dag.md](dev-docs/dag.md) - DAG-based session/fork architecture +- [dev-docs/agents.md](dev-docs/agents.md) - Sync/async/teammate agent integration +- [dev-docs/teammates.md](dev-docs/teammates.md) - Teammates feature deep-dive +- [dev-docs/message-hierarchy.md](dev-docs/message-hierarchy.md) - Fold/unfold state machine +- [dev-docs/implementing-a-tool-renderer.md](dev-docs/implementing-a-tool-renderer.md) - How-to: add a new tool + +User-facing docs live in [docs/](docs/); plans and TODOs live in [work/](work/). + +### Keeping dev-docs/ in sync + +`dev-docs/` is **as-built reference** — the code is the authoritative +source. When a non-trivial change alters behavior, structure, or +invariants documented in a deep-dive, update the relevant page in +the same commit (or as a prompt follow-up). If `dev-docs/` and the +code disagree, the doc is wrong. + +Typical lifecycle: a feature begins as a spec in `work/`, evolves +into a WIP scratchpad as the code adapts to reality, then graduates +into `dev-docs/` (new page or merged into an existing one) once the +implementation has stabilized. 
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 5de26693..80293143 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -50,7 +50,9 @@ claude_code_log/ scripts/ # Development utilities test/test_data/ # Representative JSONL samples -dev-docs/ # Architecture documentation +dev-docs/ # Architecture / dev documentation (start in application_model.md) +docs/ # User-facing operations docs +work/ # Plans, TODOs, in-flight design docs ``` ## Development Setup @@ -199,7 +201,10 @@ The handler is installed in `cli.py` via `faulthandler.register(SIGUSR1)`. POSIX ## Architecture -For detailed architecture documentation, see [dev-docs/rendering-architecture.md](dev-docs/rendering-architecture.md). +Start with [dev-docs/application_model.md](dev-docs/application_model.md) +for the system overview (subsystems, data lifecycle, glossary). For +the rendering pipeline specifically, see +[dev-docs/rendering-architecture.md](dev-docs/rendering-architecture.md). ### Data Flow Overview diff --git a/claude_code_log/converter.py b/claude_code_log/converter.py index 280f7518..da2911ab 100644 --- a/claude_code_log/converter.py +++ b/claude_code_log/converter.py @@ -2278,7 +2278,7 @@ def _print_archived_sessions_note(total_archived: int) -> None: f"\nNote: {total_archived} archived session(s) found{cleanup_info}.\n" " These sessions were cached before their JSONL files were deleted.\n" " To restore them or adjust cleanup settings, see:\n" - " https://github.com/daaain/claude-code-log/blob/main/dev-docs/restoring-archived-sessions.md" + " https://github.com/daaain/claude-code-log/blob/main/docs/restoring-archived-sessions.md" ) diff --git a/dev-docs/agents.md b/dev-docs/agents.md index 786f8a90..022ebc88 100644 --- a/dev-docs/agents.md +++ b/dev-docs/agents.md @@ -1,5 +1,7 @@ # Agents +> See [application_model.md](application_model.md) for the system overview. 
+ `claude-code-log` renders three flavors of Task-spawned agents: | Flavor | Trigger | Reference | diff --git a/dev-docs/application_model.md b/dev-docs/application_model.md new file mode 100644 index 00000000..34b0f42f --- /dev/null +++ b/dev-docs/application_model.md @@ -0,0 +1,448 @@ +# Application Model + +`claude-code-log` reads Claude Code transcript files (JSONL on disk) and +produces readable HTML, Markdown, and structured JSON views, with +optional caching, a TUI for navigation, and per-project aggregate +pages. + +This document is the entry point for `dev-docs/`: a high-level view of +the parts, what each does, and where to read about them in detail. For +end-user documentation see the project [`README.md`](../README.md); +for contributor onboarding see [`CONTRIBUTING.md`](../CONTRIBUTING.md); +for user-facing operations docs see [`docs/`](../docs/). + +--- + +## 1. Subsystems at a glance + +| Subsystem | Owner module(s) | Deep-dive | +|---|---|---| +| CLI | [`cli.py`](../claude_code_log/cli.py) | inlined below (§ 2.1) | +| TUI | [`tui.py`](../claude_code_log/tui.py) | inlined below (§ 2.2) | +| Cache (SQLite) | [`cache.py`](../claude_code_log/cache.py) + [`migrations/`](../claude_code_log/migrations/) | inlined below (§ 2.3); user-facing in [`docs/restoring-archived-sessions.md`](../docs/restoring-archived-sessions.md) | +| Migrations | [`migrations/`](../claude_code_log/migrations/) + `migrations/runner.py` | inlined below (§ 2.4) | +| Parsing | [`parser.py`](../claude_code_log/parser.py), [`factories/`](../claude_code_log/factories/) | [rendering-architecture.md § 3](rendering-architecture.md) | +| Message taxonomy | [`models.py`](../claude_code_log/models.py) | [messages.md](messages.md) | +| DAG (sessions, forks, agents) | [`dag.py`](../claude_code_log/dag.py) | [dag.md](dag.md) | +| Sync sub-agents (#79) | [`converter.py`](../claude_code_log/converter.py), `factories/agent_metadata_factory.py` | [agents.md § 1](agents.md) | +| Async task agents 
(#90) | `converter.py`, `factories/task_notification_factory.py` | [agents.md § 2](agents.md) | +| Teammates (#91) | `renderer.py`, `factories/teammate_factory.py`, `html/teammate_formatter.py` | [teammates.md](teammates.md) | +| Rendering pipeline | [`renderer.py`](../claude_code_log/renderer.py), `html/`, `markdown/`, `json/` | [rendering-architecture.md](rendering-architecture.md) | +| Fold-bar / message hierarchy | `html/templates/components/`, JS in `transcript.html` | [message-hierarchy.md](message-hierarchy.md) | +| CSS class taxonomy | `html/templates/components/*.css` | [css-classes.md](css-classes.md) | +| JSON export (#36) | [`json/`](../claude_code_log/json/) | inlined below (§ 2.5) | +| Detail-level filter | renderer.py § Detail-level filtering, `models.DetailLevel` | inlined below (§ 2.6) | +| Image export | [`image_export.py`](../claude_code_log/image_export.py) | inlined below (§ 2.7) | +| Performance profiling | [`renderer_timings.py`](../claude_code_log/renderer_timings.py) | inlined below (§ 2.8) | +| Diagnosing hangs (SIGUSR1) | [`cli.py`](../claude_code_log/cli.py) `_install_stack_dump_signal` | inlined below (§ 2.9) | +| Adding a new tool renderer | [`factories/tool_factory.py`](../claude_code_log/factories/tool_factory.py), `html/tool_formatters.py` | [implementing-a-tool-renderer.md](implementing-a-tool-renderer.md) (how-to) | + +A note on cross-cutting concerns: some behaviour spans several rows +of the table above and isn't owned by any single subsystem. **Label +and preview composition** (session header titles, branch labels, +fork-point box captions) is the most common one — it touches the +DAG layer (which decides what's a branch), the renderer's session +machinery (which assembles the label text), and the parsing layer +(which feeds the preview source). See the `SessionHeaderMessage` +entry in § 4 for the function-level surface. + +--- + +## 2. 
Subsystems without their own deep-dive + +The subsystems above with "inlined below" pointers don't have a +dedicated dev-doc — the paragraph here is the canonical reference. + +### 2.1 CLI + +[`cli.py`](../claude_code_log/cli.py) is the command-line entry point +(`claude-code-log`) built on Click. The default invocation processes +the entire `~/.claude/projects/` hierarchy; explicit paths target a +single transcript or directory. Major flags: + +- `--tui` — launch the interactive TUI (§ 2.2). +- `--detail {full,high,low,minimal,user-only}` — drop content from + the rendered output (§ 2.6). +- `--from-date "yesterday"`, `--to-date "today"` — natural-language + date filtering via `dateparser`. +- `--open-browser` — open the generated `index.html` after rendering. +- `--no-cache` / `--update-cache` — bypass or force-refresh the + SQLite cache (§ 2.3). +- `--format {html,md,markdown,json}` — switch output format (HTML is + the default; Markdown is mainly used for sharing transcripts inline; + JSON exports the processed tree for downstream tooling — see § 2.5). +- `--compact` — Markdown-only; suppresses repeated headings. +- `--page-size N` — paginate the combined-transcript HTML/Markdown + output, packing whole sessions into pages of up to N messages each + (sessions are never split across pages, so individual pages may + overflow). Per-session HTML files are not paginated. + +CLI orchestration delegates to `converter.py` (which owns the +high-level "load + render + write" flow) and never touches `renderer.py` +directly. Output paths follow a stable convention so the cache and +re-renders can find existing files: `combined_transcripts.html`, +`session-{id}.html`, `index.html`, with `--detail` and `--compact` +adding suffixes per `utils.variant_suffix`. 
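A minimal sketch of the `--page-size` packing rule from the flag list above. It assumes a simple greedy policy: sessions are visited in order, and a new page starts whenever the next whole session would push the current page past N messages. The function name and the `(session_id, message_count)` input shape are hypothetical, not the converter's actual API. The real implementation may place its page boundaries differently.

```python
from collections.abc import Sequence


def pack_sessions_into_pages(
    sessions: Sequence[tuple[str, int]], page_size: int
) -> list[list[str]]:
    """Greedily pack whole sessions into pages of up to `page_size` messages.

    `sessions` is a hypothetical list of (session_id, message_count) pairs
    in rendering order. Sessions are never split, so a session larger than
    `page_size` ends up alone on an over-full page.
    """
    pages: list[list[str]] = []
    current: list[str] = []
    current_count = 0
    for session_id, count in sessions:
        # Close the current page if this session would not fit on it.
        if current and current_count + count > page_size:
            pages.append(current)
            current, current_count = [], 0
        current.append(session_id)
        current_count += count
    if current:
        pages.append(current)
    return pages


print(pack_sessions_into_pages([("a", 3), ("b", 4), ("c", 10), ("d", 2)], page_size=8))
# [['a', 'b'], ['c'], ['d']]
```

Under this policy, a page overflows only when a single session is larger than N on its own, as with session `c` above.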
+ +### 2.2 TUI + +[`tui.py`](../claude_code_log/tui.py) is a Textual application that +browses the projects index, drills into individual sessions, and +exposes quick actions: render session to HTML, resume a session via +`claude --resume`, archive a session (move to cache-only), and so on. + +Architecture is straightforward Textual: a few `Screen` subclasses, +a `DataTable` for the session list, key bindings dispatched through +Textual's `BINDINGS` mechanism. The TUI reads through `cache.py` +exclusively (never re-parses JSONL itself) — opening a 50-project +hierarchy takes milliseconds because cache hydration is incremental. + +The "archive" action is interesting: it moves a session's source JSONL +out of `~/.claude/projects/` while keeping the cache row intact. The +session then renders from cache only. See +[`docs/restoring-archived-sessions.md`](../docs/restoring-archived-sessions.md) +for the user-facing behaviour and recovery flow. + +### 2.3 Cache (SQLite) + +[`cache.py`](../claude_code_log/cache.py) maintains a SQLite database +at `~/.claude/projects/claude-code-log-cache.db` (or +`$CLAUDE_CODE_LOG_CACHE_PATH`). Stored data: + +- Per-session: id, summary, first/last timestamps, message count, + per-role token totals, `team_name` (added in migration 005). +- Per-message: a denormalised view used by archived-session + restoration (the cache holds enough to re-render even after the + source JSONL is deleted). +- Per-rendered-HTML: the HTML output itself, indexed by source file + mtime + detail-level + compact flag (migrations 002–004) — so + re-runs with unchanged inputs serve the cached HTML directly. + +Invalidation is mtime-based: when a JSONL's mtime is newer than its +cache row, the session is reparsed. The schema-version row also +invalidates the entire HTML cache when migrations bump the version, +since rendered output may have changed even when source data hasn't. 
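The two invalidation rules above can be sketched as a pair of predicates. This is an illustration under assumptions. The helper names are hypothetical, and the cached mtime is assumed to be stored as a float number of seconds. The real checks in `cache.py` may be structured differently.

```python
import os
from typing import Optional


def needs_reparse(jsonl_path: str, cached_mtime: Optional[float]) -> bool:
    """Reparse when there is no cache row or the JSONL changed after caching."""
    if cached_mtime is None:
        return True
    return os.path.getmtime(jsonl_path) > cached_mtime


def html_cache_usable(cached_schema_version: int, current_schema_version: int) -> bool:
    """Reuse rendered HTML only if no migration has bumped the schema since.

    A version bump can change rendered output even when the source data
    is untouched, so any mismatch discards the cached HTML.
    """
    return cached_schema_version == current_schema_version
```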
+ +For the operations / recovery side (archived sessions, manual +deletion, `cleanupPeriodDays`), see +[`docs/restoring-archived-sessions.md`](../docs/restoring-archived-sessions.md). + +### 2.4 Migrations + +[`claude_code_log/migrations/`](../claude_code_log/migrations/) is a +small migration system. Each migration is a `NNN_description.sql` file +applied in numeric order by `migrations/runner.py`. The schema-version +table tracks which migrations have run; `cache.py` invokes the runner +on every connection open, so a fresh checkout running against an old +cache DB transparently upgrades. + +Current migrations: + +- `001_initial_schema.sql` — sessions table + per-message metadata. +- `002_html_cache.sql` — adds the rendered-HTML cache layer. +- `003_html_pagination.sql` / `004_html_pagination_variant.sql` — + per-page HTML chunks for `--page-size`. +- `005_session_team_name.sql` — adds `team_name` to sessions for the + teammates feature (PR #125). + +Migrations that recreate tables toggle `PRAGMA foreign_keys = OFF/ON` +around the rebuild to avoid losing rows to cascade-deletes during the +swap. + +### 2.5 JSON export + +[`claude_code_log/json/`](../claude_code_log/json/) is a thin renderer +that mirrors `HtmlRenderer` / `MarkdownRenderer`: same +`generate(...)` / `generate_session(...)` / `generate_projects_index(...)` +surface, honouring the same `--detail` and `--compact` flags. Output is a +structured JSON document — top-level `version` / `title` / `detail` / +`compact` / `sessions` / `messages` keys; each node carries +`index` / `type` / `title` / `timestamp` / `session_id` / `content`, +plus optional `parent_uuid` / `agent_id` / `pair_first` etc. when +present. Children are nested directly under their parent's +`children` array — it's the same tree the HTML/Markdown renderers +walk, serialised verbatim.
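Children nest directly under their parent's `children` array, so a downstream consumer can walk the export with a short recursive generator. This sketch assumes that `messages` holds the root nodes and that leaf nodes may omit `children`. The sample document and its `type` values are invented for illustration and are not copied from real output.

```python
import json
from collections import Counter
from collections.abc import Iterator
from typing import Any


def iter_nodes(
    nodes: list[dict[str, Any]], depth: int = 0
) -> Iterator[tuple[int, dict[str, Any]]]:
    """Depth-first walk yielding (depth, node) for every message node."""
    for node in nodes:
        yield depth, node
        # Assumption: leaf nodes may omit `children` altogether.
        yield from iter_nodes(node.get("children", []), depth + 1)


def count_types(document: dict[str, Any]) -> Counter[str]:
    """Tally node `type` values across the whole tree."""
    return Counter(node["type"] for _, node in iter_nodes(document["messages"]))


# Invented sample: field names follow the description above, values do not
# come from a real export.
sample = json.loads("""
{"version": "0.0.0", "title": "demo", "detail": "full", "compact": false,
 "sessions": [],
 "messages": [
   {"index": 0, "type": "session_header", "title": "Session", "children": [
     {"index": 1, "type": "user", "title": "User", "children": []},
     {"index": 2, "type": "assistant", "title": "Assistant", "children": [
       {"index": 3, "type": "tool_use", "title": "Bash"}
     ]}
   ]}
 ]}
""")

for depth, node in iter_nodes(sample["messages"]):
    print("  " * depth + f"{node['index']}: {node['type']}")
```

The same traversal maps onto a `jq` pipeline using its recursive-descent operator, for readers who prefer not to write Python.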
+ +The renderer runs entries through `generate_template_messages` (the +same format-neutral pipeline § 3 describes), so JSON output inherits +**all** post-factory polishing for free: slash-command normalisation +(bare `X` → `/X`), command-args +hardening, teammate session-color enrichment, etc. There is no +JSON-specific cleanup pass — the rule of thumb is: *if it shows up +right in HTML/Markdown, it shows up right in JSON*. This is the +operative example of the **factory-layer normalisation seam**: raw +`TranscriptEntry` data is polished once at factory time into the +typed `MessageContent` models that all three renderers share, so +display polish lives in one place rather than being re-implemented +per output format. + +A few JSON-specific touches: + +- `_json_default` unwraps Pydantic models embedded in `MessageContent` + dataclasses (tool inputs/outputs are Pydantic; `dataclasses.asdict` + doesn't recurse into them, so without this hook they'd stringify + via `__repr__` and lose structure). Also handles `Enum` and `Path`. +- `is_outdated(file_path)` reads the `version` field from existing + JSON output and compares against the current library version — + same invalidation contract as the HTML cache so re-runs skip + unchanged outputs. +- `combined_transcripts.json` per project; `session-{id}.json` for + individual sessions. The naming respects `variant_suffix` for + detail/compact variants. + +The projects-index JSON (`all-projects-summary.json`) is a parallel +top-level file — same shape as HTML's `index.html` but consumable by +external tools (dashboards, query scripts, `jq` pipelines). + +### 2.6 Detail-level filter + +The `--detail` flag (and `models.DetailLevel`) lets users dial down +how much of the transcript renders: + +- `full` (default) — everything. +- `high` — detailed but cleaned: drops system/hook noise while + keeping the full conversation and tool I/O. 
+- `low` — drops most tool I/O, keeps the conversation plus a curated + set of "interaction signal" tools (WebSearch, WebFetch, Task, Agent — + the ones that show *what the agent did*, not *what it read*). See + `_LOW_KEEP_TOOLS` in [`renderer.py`](../claude_code_log/renderer.py). +- `minimal` — drops all tool I/O. +- `user-only` — drops everything except user messages and steering + (designed for feeding to downstream agents, e.g. building a + requirements doc). + +Filtering happens in two passes: a *pre-render* pass on `TranscriptEntry` +that strips content items (e.g., tool_use blocks from assistant turns), +and a *post-render* pass on `TemplateMessage` that drops whole content +types created by factories (`BashInputMessage`, `BashOutputMessage`, +`CommandOutputMessage` at low/minimal). The two-pass shape exists +because some content is identifiable only after factory dispatch (e.g., +distinguishing `BashInputMessage` from the tool_use that produced it). + +Important interaction: `_filter_template_by_detail` runs **before** +`_pair_skill_tool_uses` and other reorder passes, so paired-message +indices need re-mapping (`_reindex_filtered_context`). The reindex +pass also has to update cached parent-message references on +`SessionHeaderMessage` (see PR #131 fix). + +### 2.7 Image export + +[`image_export.py`](../claude_code_log/image_export.py) is +format-agnostic: HTML and Markdown both call into it. Three modes +(matching the `--image-export-mode` CLI choices): + +- `placeholder` — drop the image and render a placeholder marker + in its place. +- `embedded` — base64-encode the image directly into the output as + a data URL. +- `referenced` — write the image to disk next to the output and + embed a `src=` reference. + +Default is `embedded` for HTML (single self-contained file) and +`referenced` for Markdown (keeps the `.md` text small and lets +images live as separate PNGs alongside). 
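The `embedded` and `referenced` modes reduce to two small operations. The first base64-encodes the bytes into a `data:` URL. The second writes the bytes beside the output and returns a relative path. This is a minimal sketch: the function names, the `image/png` default, and the flat output layout are assumptions. `placeholder` mode is modelled as returning `None` so the caller can emit its own marker.

```python
import base64
from pathlib import Path
from typing import Optional


def image_to_data_url(image_bytes: bytes, media_type: str = "image/png") -> str:
    """`embedded` mode: inline the image as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def write_referenced_image(image_bytes: bytes, output_dir: Path, filename: str) -> str:
    """`referenced` mode: write the image beside the output, return a relative src."""
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / filename).write_bytes(image_bytes)
    return filename


def image_src(
    mode: str, image_bytes: bytes, output_dir: Path, filename: str
) -> Optional[str]:
    """Dispatch on the three export modes; None means 'emit a placeholder'."""
    if mode == "placeholder":
        return None
    if mode == "embedded":
        return image_to_data_url(image_bytes)
    if mode == "referenced":
        return write_referenced_image(image_bytes, output_dir, filename)
    raise ValueError(f"unknown image export mode: {mode!r}")


print(image_to_data_url(b"abc"))  # data:image/png;base64,YWJj
```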
+ +### 2.8 Performance profiling + +[`renderer_timings.py`](../claude_code_log/renderer_timings.py) +provides the `log_timing(label, t_start)` context manager used throughout +`renderer.py`. Set `CLAUDE_CODE_LOG_DEBUG_TIMING=1` to print per-phase +times to stderr — useful for spotting which phase regressed when a +large transcript suddenly takes seconds longer than before. + +### 2.9 Diagnosing hangs (SIGUSR1 stack dump) + +When `claude-code-log` appears stuck (100% CPU, no output), a +single `SIGUSR1` to the running process dumps the live Python +stack of every thread to stderr without killing it: + +```bash +# In another terminal +kill -USR1 $(pgrep -f claude-code-log | head -1) +``` + +The handler is wired in `cli.py::_install_stack_dump_signal()` via +`faulthandler.register(SIGUSR1, all_threads=True, chain=False)` and +installed before any heavy work in the entry point. POSIX-only: +Windows lacks `SIGUSR1`, so the install is a silent no-op there. Unlike +`py-spy`, this needs no root and no extra install, since the runtime +is already wired to dump itself on demand. Added by PR #135 to make +the DAG cyclic-children class of bug diagnosable in the field; useful +for any future hang. + +--- + +## 3. Data lifecycle + +``` + ┌──────────────────┐ + │ JSONL file(s) │ + │ (~/.claude/...) │ + └────────┬─────────┘ + │ + parser.py + factories/ + │ + ▼ + ┌───────────────────────┐ + │ list[TranscriptEntry] │ (typed Pydantic models) + └───────────┬───────────┘ + │ + factories/ dispatch + │ + ▼ + ┌─────────────────────────┐ + │ list[TemplateMessage] │ (each carrying a typed + │ with MessageContent │ MessageContent variant) + └─────────────┬───────────┘ + │ + renderer.py (generate_template_messages): + build DAG → pair → reorder → relocate + subagent blocks → build hierarchy → + cleanup sidechain dups → populate caches + │ + ▼ + ┌──────────────────────┐ + │ Tree of TemplateMsg │ + │ + RenderingContext │ (caches: teammate_colors, + │ + nav data │ task_subjects, etc.)
+ └──────────┬───────────┘ + │ + ┌────────────┬─────────────┴─────────────┬────────────┐ + ▼ ▼ ▼ ▼ +html/renderer.py markdown/renderer.py json/renderer.py + │ │ │ + ▼ ▼ ▼ + index.html + *.md combined_transcripts.json + session-*.html (single file) session-*.json + all-projects-summary.json + │ │ │ + └──────────────────┼──────────────────────┘ + │ + ┌──────────┴────────────┐ + ▼ ▼ + cache.py image_export.py + (SQLite) (HTML / Markdown only — + JSON serialises paths) +``` + +Cache reads/writes happen *alongside* the main pipeline rather than as one of its stages: +`cache.py` is consulted before parsing (cache hit → skip parse), after +rendering (write the rendered HTML), and during TUI navigation (the +TUI never re-parses). + +--- + +## 4. Cross-cutting glossary + +Terms that appear across multiple subsystems — defined once here. + +- **TranscriptEntry**: typed Pydantic model for a single line in the + source JSONL. Variants: `User`, `Assistant`, `Summary`, `System`, + `Passthrough`, `QueueOperation`. See + [`parser.py`](../claude_code_log/parser.py) and + [`models.py`](../claude_code_log/models.py). + +- **MessageContent**: render-time content variant produced by the + factories from `TranscriptEntry`. Many flavours + (`UserTextMessage`, `ToolUseMessage`, `TeammateMessage`, …). One + `TranscriptEntry` may yield multiple `MessageContent`s (a single + assistant turn with N tool_uses produces N+1 messages). See + [messages.md](messages.md) for the full taxonomy. + +- **TemplateMessage**: the render-time wrapper around a + `MessageContent`. Carries `message_index`, parent/child links, + pair_first/pair_middle/pair_last, ancestry, and the renderer-format + CSS classes. Defined in [`renderer.py`](../claude_code_log/renderer.py). + +- **RenderingContext**: mutable cache attached to one render pass. + Holds the message registry plus nested per-session caches + (`teammate_colors`, `task_subjects`, `task_id_for_tool_use`, + `session_first_message`, etc.).
Caches are session-scoped because + combined-transcripts mode merges multiple sessions and per-session + identifiers (teammate_id, task_id) aren't globally unique. + +- **session_id**: the JSONL's `sessionId` field. Often a UUID string. + In some renderer paths a *synthetic* form is used: + - `{trunk}#agent-{agentId}` for sub-agent transcripts (so they + form a separate DAG-line attached to their spawning trunk). + - `{trunk}@{first_uuid_prefix}` for branch sessions (rewinds / + parallel-tool_use forks). See [dag.md](dag.md). + +- **render_session_id**: the session id that should be used when + walking `ctx.messages` to find content for rendering, accounting + for synthetic rewrites. + +- **sidechain**: a sub-agent's transcript entries are flagged + `isSidechain: true`. The DAG layer integrates them into the parent + session's tree under the spawning Task/Agent tool_use anchor. See + [agents.md](agents.md), [dag.md](dag.md). + +- **agent_id**: identifier copied from a Task/Agent tool_result + (either `toolUseResult.agentId` or parsed from the Markdown + metadata tail). Used to stitch sub-agent JSONL files into the + trunk DAG. See [agents.md](agents.md). + +- **fork point** / **branch**: when a session has multiple children + with the same parent, the parent is the fork point and each child + initiates a branch. Real forks come from `/exit` rewinds; spurious + forks (parallel tool_uses, structural-only siblings) are collapsed + by `_walk_session_with_forks`. See [dag.md](dag.md). + +- **SessionHeaderMessage**: the synthetic content type produced for + every session boundary in the rendered output — the header that + appears above each session's first real message. Two flavours: + *trunk* headers for top-level sessions, and *branch* headers for + fork branches (the "branch heading" you'll see referenced in bug + reports). 
The branch header's title is composed by `_branch_label` + and back-filled by `_enrich_branch_titles` (both in `renderer.py`) + in the shape `Branch • `; the preview text + itself is built by `create_session_preview` in `utils.py` (which + calls `simplify_command_tags` to strip raw `` XML + soup down to `/cmd`). When troubleshooting branch-heading + rendering, those four functions are the surface area. + +- **pair_first / pair_middle / pair_last**: a pair of messages + rendered as one logical unit (tool_use + tool_result, Slash + UserSlash, + thinking + assistant). `pair_middle` exists for triples — currently + the slash-command `(UserSlash → Slash → CommandOutput)` shape. + +- **detail level**: see § 2.6. + +- **detail-aware tools**: the curated set of tools whose I/O survives + `--detail low` because they convey *what the agent did*, not *what + it read* (`WebSearch`, `WebFetch`, `Task`, `Agent`). + +- **passthrough**: a `PassthroughTranscriptEntry` is a non-conversation + entry (hook callbacks, progress updates, last-prompt markers). The + DAG layer keeps them in the structure but the renderer typically + hides them. + +--- + +## 5. Where to start reading + +Common entry questions and their best first stop: + +- "How does a JSONL line become an HTML row?" + → [rendering-architecture.md](rendering-architecture.md). +- "Why are forks rendered weirdly / what is a branch session?" + → [dag.md](dag.md). +- "What message types exist and what do they look like?" + → [messages.md](messages.md) plus the samples in `messages/`. +- "I want to add support for a new Claude Code tool." + → [implementing-a-tool-renderer.md](implementing-a-tool-renderer.md). +- "How does folding / collapsible content work?" + → [message-hierarchy.md](message-hierarchy.md). +- "What CSS classes does a message div get?" + → [css-classes.md](css-classes.md). +- "How are sub-agent transcripts (sync, async, teammates) integrated?" 
+ → [agents.md](agents.md), then [teammates.md](teammates.md) for the + teammates-specific machinery. +- "I want to extend the cache / change the schema." + → § 2.3, § 2.4 here, then read the migration files in order. +- "How do I export to JSON for downstream tooling?" + → § 2.5 here (and `--format json` from § 2.1). +- "claude-code-log is hung — how do I see what it's doing?" + → § 2.9 (`SIGUSR1` stack dump). +- "What's planned but not implemented?" + → [`work/`](../work/) — each `.md` is an in-flight or proposed plan. diff --git a/dev-docs/css-classes.md b/dev-docs/css-classes.md index 1001b78c..ce305c20 100755 --- a/dev-docs/css-classes.md +++ b/dev-docs/css-classes.md @@ -1,5 +1,7 @@ # CSS Classes for Message Types +> See [application_model.md](application_model.md) for the system overview. + This document provides a comprehensive reference for CSS class combinations used in Claude Code Log HTML output, their CSS rule support status, and pairing behavior. **Generated from analysis of:** 29 session HTML files (3,244 message elements) diff --git a/dev-docs/dag.md b/dev-docs/dag.md index f85b6fb8..596b4d80 100644 --- a/dev-docs/dag.md +++ b/dev-docs/dag.md @@ -1,5 +1,7 @@ # DAG-Based Message Architecture +> See [application_model.md](application_model.md) for the system overview. + Replaces timestamp-based ordering with `parentUuid` → `uuid` graph traversal. Reference: [Messages as Commits: Claude Code's Git-Like DAG of Conversations](https://piebald.ai/blog/messages-as-commits-claude-codes-git-like-dag-of-conversations) @@ -96,6 +98,14 @@ Where `s1`, `s2`, `s3` are synthesized session header messages. - **Backlinks** on session headers: "Continues from message X in Session Y" (shown on `s2` and `s3`) +> Where branch / session header *titles* (the `Branch • • +> ` text) are assembled is a renderer concern, not a DAG +> concern. 
See the `SessionHeaderMessage` glossary entry in +> [application_model.md](application_model.md#4-cross-cutting-glossary) +> for the four functions involved (`_branch_label`, +> `_enrich_branch_titles`, `create_session_preview`, +> `simplify_command_tags`). + #### Current: `d-{index}` anchors (combined transcript only) Backlinks use `#msg-d-{N}` anchors which are sequential indices assigned @@ -178,8 +188,28 @@ is available. 1. Parse all entries, index by `uuid` 2. For duplicate `uuid`s, keep the one from the earliest `sessionId` -3. Build `children_by_uuid` from `parentUuid` links -4. Group messages by `sessionId` +3. Group messages by `sessionId` +4. `build_dag(nodes, sidechain_uuids)` populates `children_uuids` — + in three steps that **must run in this order** (PR #135): + + ```mermaid + flowchart TB + A["entries indexed by uuid
(parent_uuid pointers may
dangle or cycle)"] --> S1 + S1["Step 1 — orphan promotion
parent_uuid not in nodes →
null it; warn unless the
parent is a known sidechain
uuid (silently promote)"] --> S2 + S2["Step 2 — cycle break
walk parent_uuid from each
node; revisit ⇒ null the
revisited node's parent;
warn"] --> S3 + S3["Step 3 — children build
for each node with non-null
parent_uuid, append to
parent.children_uuids;
skip self-loops, dedup"] --> O["acyclic parent→children DAG
safe to walk"] + classDef step fill:#eef,stroke:#99c + class S1,S2,S3 step + ``` + + Steps 1 and 2 mutate `parent_uuid` on the input nodes (they're + one-way: a promoted-to-root node can't recover its dangling + parent later). Step 3 is the only step that builds the + `children_uuids` lists. Doing children first would propagate + any cyclic edge into the children graph, and downstream walks + via `children_uuids` would loop forever — so cycles must be + broken at the parent-pointer layer before children are + materialised. ### Phase 3: Extract Session DAG-lines @@ -198,6 +228,16 @@ For each session (`extract_session_dag_lines` in `dag.py`): 5. If DAG walk coverage is incomplete, fall back to a timestamp sort for the whole session. +**Defence-in-depth in the walker** (PR #135): even though `build_dag` +breaks parent-pointer cycles before populating `children_uuids`, a +future bug or hand-edited fixture could reintroduce a cyclic edge +*after* DAG construction. `_walk_session_with_forks` keeps a +`walk_visited: set[str]` across the whole queue-driven walk; if a +uuid is visited twice, the chain is truncated at that point and a +warning is logged. The build-time cycle break and this walk-time +guard together rule out the unbounded-loop class of hangs that +motivated the PR. + ### Phase 4: Build Session Tree 1. For each session, find where its DAG-line attaches to the DAG: @@ -575,7 +615,11 @@ These should be checked at runtime (log warnings, don't crash): produce multiple roots within one `sessionId`; all are walked and the trunks are merged. Other multi-root causes warn (may indicate missing parent data). -3. **DAG acyclicity**: No cycles in `parentUuid` chains +3. **DAG acyclicity**: `build_dag` walks each node's `parent_uuid` + chain and nulls the first revisited node's parent if a cycle is + detected (warns and promotes that node to root). 
The DAG seen by + downstream walks is always acyclic; `_walk_session_with_forks` + adds a `walk_visited` belt for defence-in-depth. 4. **Unique ownership**: After deduplication, each `uuid` belongs to exactly one session 5. **Agent parenting**: Every top-level agent transcript has an identifiable @@ -655,4 +699,4 @@ validate DAG construction against known transcripts. - [rendering-architecture.md](rendering-architecture.md) — Current pipeline - [messages.md](messages.md) — Message type reference -- [rendering-next.md](rendering-next.md) — Future rendering improvements +- [../work/rendering-next.md](../work/rendering-next.md) — Future rendering improvements diff --git a/dev-docs/implementing-a-tool-renderer.md b/dev-docs/implementing-a-tool-renderer.md index 45972fc8..7d464b3f 100644 --- a/dev-docs/implementing-a-tool-renderer.md +++ b/dev-docs/implementing-a-tool-renderer.md @@ -1,5 +1,7 @@ # Implementing a Tool Renderer +> See [application_model.md](application_model.md) for the system overview. + This guide walks through adding rendering support for a new Claude Code tool, using WebSearch as an example. ## Overview @@ -11,6 +13,14 @@ Tool rendering involves several components working together: 3. **HTML Formatters** (`html/tool_formatters.py`) - HTML rendering functions 4. **Renderers** - Integration with HTML and Markdown renderers +JSON output (`json/renderer.py`, since PR #36) needs **no per-tool +integration**: it serialises whatever typed input/output models the +factory produced via `dataclasses.asdict` (with a `_json_default` +shim for Pydantic models embedded inside the dataclasses). Add the +models in Step 1 and the factory hooks in Steps 2–3, and your tool +shows up in JSON exports automatically. The HTML/Markdown formatter +work in Steps 4–5 stays format-specific. + ## Step 1: Define Models ### Tool Input Model @@ -253,6 +263,13 @@ Create test cases in the appropriate test files: 2. **Formatter tests** - Verify HTML/Markdown output is correct 3. 
**Integration tests** - Verify end-to-end rendering +JSON output is exercised by the broader `test/test_json_rendering.py` +/ `test/test_json_real_projects.py` suites; per-tool JSON output +typically needs no dedicated test because the `dataclasses.asdict` +serialisation is trivial. Add a JSON-specific case only if your tool +embeds a non-dataclass type the `_json_default` shim doesn't already +cover. + ## Checklist - [ ] Add input model to `models.py` diff --git a/dev-docs/FOLD_STATE_DIAGRAM.md b/dev-docs/message-hierarchy.md similarity index 99% rename from dev-docs/FOLD_STATE_DIAGRAM.md rename to dev-docs/message-hierarchy.md index 3d261f71..50cb6906 100644 --- a/dev-docs/FOLD_STATE_DIAGRAM.md +++ b/dev-docs/message-hierarchy.md @@ -1,4 +1,6 @@ -# Fold Bar State Diagram +# Message Hierarchy and Fold State + +> See [application_model.md](application_model.md) for the system overview. ## Message Hierarchy diff --git a/dev-docs/messages.md b/dev-docs/messages.md index b0ef2f42..d5b2969c 100644 --- a/dev-docs/messages.md +++ b/dev-docs/messages.md @@ -1,5 +1,7 @@ # Message Types in Claude Code Transcripts +> See [application_model.md](application_model.md) for the system overview. + This document describes all message types found in Claude Code JSONL transcript files and their corresponding output representations. The goal is to define an **intermediate representation** that captures the logical message structure independent of HTML rendering. 
## Overview @@ -930,4 +932,4 @@ Sub-agent messages (from `Task` tool): - [system_factory.py](../claude_code_log/factories/system_factory.py) - `create_system_message()` - [meta_factory.py](../claude_code_log/factories/meta_factory.py) - `create_meta()` - [rendering-architecture.md](rendering-architecture.md) - Rendering pipeline and Renderer class hierarchy -- [rendering-next.md](rendering-next.md) - Future rendering improvements +- [../work/rendering-next.md](../work/rendering-next.md) - Future rendering improvements diff --git a/dev-docs/rendering-architecture.md b/dev-docs/rendering-architecture.md index 1deadf20..dc6800ad 100644 --- a/dev-docs/rendering-architecture.md +++ b/dev-docs/rendering-architecture.md @@ -1,11 +1,13 @@ # Rendering Architecture -This document describes how Claude Code transcript data flows from raw JSONL entries to final output (HTML, Markdown). The architecture separates concerns into distinct layers: +> See [application_model.md](application_model.md) for the system overview. + +This document describes how Claude Code transcript data flows from raw JSONL entries to final output (HTML, Markdown, JSON). The architecture separates concerns into distinct layers: 1. **Parsing Layer** - Raw JSONL to typed transcript entries 2. **Factory Layer** - Transcript entries to `MessageContent` models 3. **Rendering Layer** - Format-neutral tree building and relationship processing -4. **Output Layer** - Format-specific rendering (HTML, Markdown) +4. 
**Output Layer** - Format-specific rendering (HTML, Markdown, JSON) --- @@ -16,15 +18,27 @@ JSONL File ↓ (parser.py) list[TranscriptEntry] ↓ (factories/) -list[TemplateMessage] with MessageContent +list[TemplateMessage] with MessageContent ← factory-layer + normalisation seam + (raw → display-polished) ↓ (renderer.py: generate_template_messages) Tree of TemplateMessage (roots with children) + RenderingContext (message registry) + Session navigation data - ↓ (html/renderer.py or markdown/renderer.py) -Final output (HTML or Markdown) + ↓ (html/renderer.py | markdown/renderer.py | json/renderer.py) +Final output (HTML, Markdown, or JSON) ``` +**The factory-layer seam matters**: any cleanup that should appear +in *every* output format (slash-command normalisation, command-args +hardening, teammate session-color enrichment, etc.) lives at factory +time, in the typed `MessageContent` models. The three renderers are +pure consumers of the polished tree — they never re-implement +display polish per format. As a corollary, when a new output format +is added (JSON shipped this way in PR #36), it inherits all polish +for free as long as it consumes `generate_template_messages`' +output. 
+ **Key cardinality rules**: - Each transcript entry has a `uuid`, but a single entry's `list[ContentItem]` may be chunked and produce multiple `MessageContent` objects (e.g., tool_use items are split into separate messages) - Each `MessageContent` gets exactly one `TemplateMessage` wrapper @@ -278,6 +292,20 @@ def title_ToolUseMessage(self, content: ToolUseMessage, message: TemplateMessage - Writes directly to file/string without templates - Simpler structure suited to plain text output +**JsonRenderer** ([json/renderer.py](../claude_code_log/json/renderer.py)): +- Doesn't implement `format_*` per content type — instead serialises + the entire `TemplateMessage` subtree via `dataclasses.asdict` plus + a small `_json_default` shim for the Pydantic models embedded in + tool inputs/outputs (and for `Enum`/`Path`). +- Calls `title_content(msg)` to attach a per-node title that mirrors + what HTML/Markdown surface — the only place dispatcher methods are + reused. +- Output is a single JSON document per session (or per combined + transcript / projects index) with the message tree nested directly + under each node's `children` array. See [application_model.md + § 2.5](application_model.md#25-json-export) for the payload shape + and inheritance from the factory-layer normalisation seam. + --- ## 8. HTML Formatter Organization @@ -333,9 +361,16 @@ Note that `meta.uuid` is the original transcript entry's UUID. Since a single en ### Separation of Concerns - **models.py**: Pure data structures, no rendering logic -- **factories/**: Data transformation, no I/O +- **factories/**: Data transformation, no I/O. **The + normalisation seam** — display polish for *all* output formats + lives here, not in renderers (e.g. `simplify_command_tags` lifting + bare `X` to `/X`, with the same fix + applied to both `simplify_command_tags` and + `create_slash_command_message` so HTML/Markdown/JSON observe a + single shape). 
- **renderer.py**: Format-neutral processing (pairing, hierarchy, tree) -- **html/**, **markdown/**: Format-specific output generation +- **html/**, **markdown/**, **json/**: Format-specific output generation, + consuming the polished tree without re-implementing display rules. --- @@ -343,5 +378,5 @@ Note that `meta.uuid` is the original transcript entry's UUID. Since a single en - [messages.md](messages.md) - Complete message type reference - [css-classes.md](css-classes.md) - CSS class combinations and rules -- [FOLD_STATE_DIAGRAM.md](FOLD_STATE_DIAGRAM.md) - Fold/unfold state machine +- [message-hierarchy.md](message-hierarchy.md) - Fold/unfold state machine - [dag.md](dag.md) - DAG-based message architecture (replaces timestamp-based ordering) diff --git a/dev-docs/teammates.md b/dev-docs/teammates.md index 2431ea54..366a4fb8 100644 --- a/dev-docs/teammates.md +++ b/dev-docs/teammates.md @@ -1,5 +1,7 @@ # Teammates Support +> See [application_model.md](application_model.md) for the system overview. + This document describes how `claude-code-log` supports the Claude Code teammates feature (research preview, gated by `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`, available in CC 2.1.32+). 
diff --git a/dev-docs/restoring-archived-sessions.md b/docs/restoring-archived-sessions.md similarity index 100% rename from dev-docs/restoring-archived-sessions.md rename to docs/restoring-archived-sessions.md diff --git a/work/phase-c-agent-transcripts.md b/work/phase-c-agent-transcripts.md deleted file mode 100644 index 47ef30e6..00000000 --- a/work/phase-c-agent-transcripts.md +++ /dev/null @@ -1,87 +0,0 @@ -# Phase C: Agent Transcript Rework - -## Status: Steps 1-2 Complete (DAG Integration) - -## What Changed - -### Step 1: Agent Data Shapes (Analysis Complete) - -Key findings from real data analysis: - -- **Agent entries share `sessionId`** with their parent session -- All agent entries have `isSidechain: true` and `agentId` -- First entry always has `parentUuid: null` (top-level agents) -- Internal `parentUuid` chains form the same fork patterns as main sessions - (tool-result side-branches) -- `agentId` reference in main session: either entry-level `agentId` (old Task - tool) or `toolUseResult.agentId` (new Agent tool, copied to entry level by - converter.py parsing code) - -### Step 2: DAG-Level Agent Integration (Implemented) - -**`converter.py` — `_integrate_agent_entries()`**: -1. Builds `agentId -> anchor_uuid` map from main-session entries with `agentId` -2. For each sidechain entry: assigns synthetic `sessionId` - (`{sessionId}#agent-{agentId}`) so agents form separate DAG-lines -3. Parents root entries (`parentUuid=None`) to the anchor UUID - -**Effect**: Agent entries are included in the DAG. The existing DAG machinery -(build_dag, extract_session_dag_lines, build_session_tree, traverse_session_tree) -handles them as child sessions of the main session, spliced at the anchor point. - -**Key constraint**: `entry.sessionId` on disk / in cache is NEVER mutated. -The synthetic ID is only assigned in-memory during `load_directory_transcripts()`. 
- -### Renderer Changes - -- Agent sessions (`#agent-` in session_id) **don't get session headers** -- Agent messages use parent session's `render_session_id` for correct grouping - in `_reorder_session_template_messages()` -- Agent sessions excluded from session navigation and individual file generation - -### What Was Kept - -- `_cleanup_sidechain_duplicates()` — still needed for Task tool dedup - (first user message = Task input, last assistant = Task output). - This is content-level dedup that can't be handled at the DAG level. -- `sidechain_uuids` parameter in `build_dag()` — still needed for unloaded - subagent files (e.g. aprompt_suggestion agents never referenced via agentId) - -### What Was Removed (Step 4) - -- `_reorder_sidechain_template_messages()` — removed. With DAG integration, - agent messages are already in correct order via DAG traversal. Single-file - mode now also calls `_integrate_agent_entries()` so both paths use DAG-based - ordering. - -## Remaining Steps - -### Step 3: Session Tree Integration (Partially Done) - -Agent DAG-lines already appear as child sessions in the tree. The -`traverse_session_tree()` naturally visits them at the junction point. -What's left: -- Verify rendering hierarchy (levels 4/5) works correctly for all cases -- Test with projects that have nested agents (agent spawning sub-agents) - -### Step 4: Rendering Cleanup (Done) - -- Removed `_reorder_sidechain_template_messages()` — no longer needed with - DAG-based ordering. Added `_integrate_agent_entries()` to single-file mode - in `converter.py` so both code paths use consistent DAG integration. -- `_cleanup_sidechain_duplicates()` — kept as-is. Content-level dedup - (Task input/output duplicated in sidechain) cannot be handled at the DAG - level since it requires text comparison, not structural ordering. 
- -### Step 5: Agent Tool Renderer (separate PR, `dev/user-sidechain`) - -- Specialized rendering for Agent tool_use/tool_result (like old Task tool had) -- Sidechain user messages rendered as markdown (already on `dev/user-sidechain`) - -## Test Coverage - -4 new integration tests in `TestAgentDagIntegration`: -- `test_agent_entries_parented_to_anchor` — agent root gets parentUuid to anchor -- `test_agent_session_in_tree` — synthetic session created, tree structure correct -- `test_agent_no_session_header` — no session header generated for agents -- `test_multiple_agents_ordered` — multiple agents placed at respective anchors diff --git a/dev-docs/rendering-next.md b/work/rendering-next.md similarity index 96% rename from dev-docs/rendering-next.md rename to work/rendering-next.md index 9896ebbf..cf0d9e95 100644 --- a/dev-docs/rendering-next.md +++ b/work/rendering-next.md @@ -149,5 +149,5 @@ Syntax highlighting is a significant portion of render time. Could cache highlig ## Related Documentation -- [rendering-architecture.md](rendering-architecture.md) - Current architecture -- [FOLD_STATE_DIAGRAM.md](FOLD_STATE_DIAGRAM.md) - Fold/unfold state machine +- [dev-docs/rendering-architecture.md](../dev-docs/rendering-architecture.md) - Current architecture +- [dev-docs/message-hierarchy.md](../dev-docs/message-hierarchy.md) - Fold/unfold state machine