
Integrate agent transcripts into the DAG (Phase C)#99

Merged

cboos merged 14 commits into main from dev/dag-agents on Apr 17, 2026

Conversation

@cboos
Collaborator

@cboos cboos commented Mar 8, 2026

Summary

  • Agent (sidechain) entries are now included in the DAG instead of being partitioned out and heuristically reordered
  • New _integrate_agent_entries() in converter.py parents agent roots to their anchor tool_result and assigns synthetic session IDs ({sessionId}#agent-{agentId}) to form separate DAG-lines
  • Agent sessions appear inline (no separate session headers, no individual session files, no nav entries) — visual output unchanged
  • 4 new integration tests covering agent parenting, session tree structure, header suppression, and multi-agent ordering

The look and feel of fork points and branches is now like this:

[Screenshot from 2026-04-12: rendering of fork points and branches]

Details

How it works

  1. Anchor discovery: Main-session entries with agentId (from entry-level or toolUseResult.agentId) become anchors
  2. Parenting: Agent root entries (parentUuid=null) get their parentUuid set to the anchor's UUID
  3. Session separation: Synthetic sessionId (s1#agent-xyz) separates agents from their parent session in the DAG
  4. Traversal: Existing DAG machinery (build_session_tree, traverse_session_tree) naturally handles agent sessions as child sessions at junction points
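Steps 1 to 3 can be sketched in a self-contained way. This is a minimal illustration, not the converter.py code: it assumes a hypothetical `Entry` dataclass with snake_case fields standing in for the Pydantic transcript models, and it leaves step 4 to the existing DAG machinery.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    # Hypothetical stand-in for a transcript entry; not the project's Pydantic model.
    uuid: str
    session_id: str
    parent_uuid: Optional[str] = None
    is_sidechain: bool = False
    agent_id: Optional[str] = None


def integrate_agent_entries(entries: list[Entry]) -> None:
    """Parent agent roots to their anchors and give agent entries synthetic session IDs."""
    # Step 1: main-session entries that carry an agentId are the anchors.
    anchors = {e.agent_id: e.uuid for e in entries if e.agent_id and not e.is_sidechain}
    for e in entries:
        if not e.is_sidechain or not e.agent_id:
            continue
        # Step 2: the agent's root entry is hung under its anchor.
        if e.parent_uuid is None and e.agent_id in anchors:
            e.parent_uuid = anchors[e.agent_id]
        # Step 3: a synthetic session ID turns the agent into its own DAG-line.
        e.session_id = f"{e.session_id}#agent-{e.agent_id}"


entries = [
    Entry("u1", "s1"),
    Entry("r1", "s1", parent_uuid="u1", agent_id="xyz"),  # tool_result anchor
    Entry("x1", "s1", is_sidechain=True, agent_id="xyz"),  # agent root
    Entry("x2", "s1", parent_uuid="x1", is_sidechain=True, agent_id="xyz"),
]
integrate_agent_entries(entries)
print([(e.uuid, e.session_id, e.parent_uuid) for e in entries])
# [('u1', 's1', None), ('r1', 's1', 'u1'), ('x1', 's1#agent-xyz', 'r1'), ('x2', 's1#agent-xyz', 'x1')]
```

Unlike the PR code, this sketch does not return early when no anchors exist, so unanchored agents still receive a synthetic session ID.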

What's kept (conservative approach)

  • _reorder_sidechain_template_messages() — now effectively a no-op for integrated agents, kept as fallback
  • _cleanup_sidechain_duplicates() — still needed for Task tool input/output dedup
  • sidechain_uuids in build_dag() — still needed for unloaded subagent files (e.g. aprompt_suggestion)

Key constraint

entry.sessionId on disk / in cache is never mutated. Synthetic IDs are only assigned in-memory during load_directory_transcripts().

Test plan

  • 4 new TestAgentDagIntegration tests pass
  • Updated test_load_directory_with_sidechains for new DAG ordering
  • Full test suite (801 passed, 7 skipped)
  • Snapshot tests pass (no rendering regression)
  • ty check and ruff check clean
  • Visual check with real agent data (-src-experiments-claude_p)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactoring

    • Agent/sidechain messages are now integrated into parent sessions: they render nested under the parent task, no longer show separate session headers, and are excluded from session navigation and per-session exports.
  • Documentation

    • Updated docs to reflect DAG-based integration of agent transcripts and next-phase roadmap.
  • Tests

    • Expanded integration tests for agent transcript ordering, parent-child placement, and rendering/grouping behavior.

@coderabbitai

coderabbitai Bot commented Mar 8, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 703eaf33-a303-44a3-9917-71da1ab2c1ba


Walkthrough

Integrates agent/sidechain transcript entries into the DAG by adding _integrate_agent_entries() in the converter, assigning synthetic session IDs for agents, wiring agent roots to anchor UUIDs, and updating DAG construction, renderer grouping/pagination, utilities, and tests so agent sessions are treated as child sessions of their parent sessions.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Agent Integration Core | claude_code_log/converter.py | Added _integrate_agent_entries() to attach agent sidechain roots to main-session anchors, assign in-memory synthetic session IDs ({sessionId}#agent-{agentId}), use integrated all_messages for DAG building and non-DAG collections, and filter/exclude agent sessions across pagination, dedup, and per-session file generation. |
| DAG Structure | claude_code_log/dag.py | Added is_sidechain: bool = False to SessionDAGLine and updated docstrings to reflect "unloaded sidechain data" wording; no API signature breaks. |
| Renderer Adjustments | claude_code_log/renderer.py | Imported is_agent_session, get_parent_session_id, and get_project_display_name; skip agent sessions in navigation and per-session headers; compute effective parent session for agent messages so agent content renders under parent sessions. |
| Utilities | claude_code_log/utils.py | Added is_agent_session(session_id: str) and get_parent_session_id(session_id: str) helpers (note: duplicated definitions were introduced and should be deduplicated). |
| Tests | test/test_dag_integration.py | Extended _make_assistant_entry to accept agent_id, updated expectations, and added TestAgentDagIntegration coverage verifying agent anchoring, synthetic sessions, ordering, branch scenarios, and rendering behavior. |
| Docs & Roadmap | dev-docs/dag.md, work/phase-c-agent-transcripts.md | Updated Phase C status and roadmap, documenting _integrate_agent_entries() behavior, DAG splicing of agent lines, renderer handling, retained fallback helpers, and Phase D next steps. |
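The bodies of the two utils.py helpers are not shown in this PR page. A plausible sketch, assuming only the documented `{sessionId}#agent-{agentId}` format (the `AGENT_SESSION_MARKER` constant is hypothetical):

```python
AGENT_SESSION_MARKER = "#agent-"  # hypothetical constant matching the documented ID format


def is_agent_session(session_id: str) -> bool:
    """Return True for synthetic agent session IDs such as 's1#agent-xyz'."""
    return AGENT_SESSION_MARKER in session_id


def get_parent_session_id(session_id: str) -> str:
    """Strip the '#agent-...' suffix; plain session IDs come back unchanged."""
    return session_id.split(AGENT_SESSION_MARKER, 1)[0]


print(is_agent_session("s1#agent-xyz"), get_parent_session_id("s1#agent-xyz"))  # True s1
print(is_agent_session("s1"), get_parent_session_id("s1"))  # False s1
```

Deriving the parent from the string alone always yields the base session, never a branch pseudo-session; that limitation is what the renderer review comment below is about.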

Sequence Diagram

```mermaid
sequenceDiagram
    participant Directory as Directory Scanner
    participant Converter as Converter
    participant Integrator as _integrate_agent_entries()
    participant DAGBuilder as DAG Builder
    participant Renderer as Renderer

    Directory->>Converter: load_directory_transcripts()
    Converter->>Converter: scan & collect all messages
    Converter->>Integrator: _integrate_agent_entries(all_messages)
    Integrator->>Integrator: build agentId → anchor UUID map
    Integrator->>Integrator: set parentUuid on agent roots
    Integrator->>Integrator: assign synthetic sessionIds ("#agent-")
    Integrator-->>Converter: modified messages (in-place)
    Converter->>DAGBuilder: build_dag_from_entries(all_messages, sidechain_uuids)
    DAGBuilder->>DAGBuilder: construct DAG including agent child sessions
    DAGBuilder-->>Converter: DAG structure
    Converter->>Renderer: generate HTML/navigation
    Renderer->>Renderer: filter out agent sessions from nav
    Renderer->>Renderer: use effective parent session for agent messages
    Renderer-->>Converter: rendered content grouped under parent sessions
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


Poem

🐰
I stitched sidechains to the main tree bright,
Anchors hummed and tiny IDs took flight,
Agent hops tucked neatly under boughs,
Parent sessions hold their curious brows,
Rabbit drums the DAG — all snug tonight.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title "Integrate agent transcripts into the DAG (Phase C)" accurately and clearly describes the main objective of the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@cboos
Collaborator Author

cboos commented Mar 8, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
test/test_dag_integration.py (1)

741-986: Add one branch-scoped agent regression.

These cases only exercise agents anchored on ordinary session trunks. A case where the anchor lives inside a within-session fork would catch regressions where agent messages get regrouped under the base session instead of the branch pseudo-session.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_dag_integration.py` around lines 741 - 986, Add a new test (e.g.,
test_agent_branch_scoped_session) that creates a within-session fork and places
an agent anchor inside one branch to ensure agent entries are attached to the
branch pseudo-session rather than the base session: use _make_user_entry and
_make_assistant_entry to build a main trunk (u1→a1), then create two divergent
branch entries both referencing a1 (branch anchor u2_branch and a continuation
a2_branch) and include agent sidechain entries with agent_id tied to u2_branch;
write them with _write_jsonl and call load_directory_transcripts, then assert
the synthetic agent session appears in tree.sessions with parent_session_id
equal to the specific branch session id and attachment_uuid equal to the branch
anchor's uuid (using tree.sessions, parent_session_id, and attachment_uuid) and
that agent entries are ordered between the branch anchor and its continuation in
the returned result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@claude_code_log/renderer.py`:
- Around line 2025-2031: The code incorrectly derives agent parent scope by
slicing msg_session_id (msg_session_id.split("#agent-") ...) which loses branch
context; instead, use the session-tree/explicit render parent information (e.g.
prefer message.render_session_id or call the SessionTree method that resolves
the parent/anchor for a session) to compute agent_parent_session so branch
pseudo-sessions are preserved; update occurrences around
msg_session_id/agent_parent_session (and any logic that feeds
_reorder_session_template_messages) to stop splitting on "#agent-" and use the
SessionTree API or message.render_session_id to determine the true parent
session.
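The approach the reviewer suggests (and that a later commit adopts: "Use session hierarchy ... Fallback to extracting from synthetic ID") can be sketched as a lookup with a fallback. The `parent_of` mapping stands in for whatever the session tree exposes, and both it and `resolve_agent_parent` are hypothetical names, as is the `s1@branch-2` pseudo-session ID format.

```python
from typing import Mapping


def resolve_agent_parent(session_id: str, parent_of: Mapping[str, str]) -> str:
    """Return the session an agent's messages should render under.

    parent_of is a hypothetical session-ID -> parent-session-ID mapping,
    standing in for the session tree built from the DAG.
    """
    if "#agent-" not in session_id:
        return session_id  # not an agent session: render under itself
    parent = parent_of.get(session_id)
    if parent is not None:
        return parent  # may be a branch pseudo-session
    # Fallback: derive the base session from the synthetic ID (loses branch context).
    return session_id.split("#agent-", 1)[0]


# Hypothetical IDs: agent xyz's anchor sits inside a within-session fork.
tree = {"s1#agent-xyz": "s1@branch-2"}
print(resolve_agent_parent("s1#agent-xyz", tree))  # s1@branch-2
print(resolve_agent_parent("s1#agent-abc", tree))  # s1
```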


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a1c21610-c91a-40de-856a-11a147b04f98

📥 Commits

Reviewing files that changed from the base of the PR and between 2eeacec and a9f3c56.

📒 Files selected for processing (6)
  • claude_code_log/converter.py
  • claude_code_log/dag.py
  • claude_code_log/renderer.py
  • dev-docs/dag.md
  • test/test_dag_integration.py
  • work/phase-c-agent-transcripts.md

Comment thread claude_code_log/converter.py
Comment thread claude_code_log/renderer.py Outdated
@cboos
Collaborator Author

cboos commented Mar 8, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 8, 2026

✅ Actions performed

Review triggered.



@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
claude_code_log/converter.py (1)

1431-1437: ⚠️ Potential issue | 🟠 Major

Fold agent entries into the parent session cache instead of skipping them.

These messages are grouped back under the parent session later when rendering/paginating, so dropping #agent-... sessions here makes the cached message_count and token totals too low for any session with agents. That skews page assignment and session/page stats away from what the UI actually renders. Coalesce synthetic agent sessions back to the parent session ID here, and mirror the same normalization in _build_session_data_from_messages().

💡 Suggested fix
```diff
-            session_id = getattr(message, "sessionId", "")
-            if not session_id or is_agent_session(session_id):
+            session_id = getattr(message, "sessionId", "")
+            if not session_id:
                 continue
+            if is_agent_session(session_id):
+                session_id = session_id.split("#agent-")[0]
```
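To see what the suggested fix changes for session statistics, here is a minimal sketch that folds agent messages into the parent session instead of dropping them. It uses plain dicts with a hypothetical `tokens` field; it is not the real cache code or `_build_session_data_from_messages()`.

```python
from collections import defaultdict


def aggregate_session_stats(messages: list[dict]) -> dict[str, dict[str, int]]:
    """Sum message counts and tokens per session, folding agent sessions into parents."""
    stats: dict[str, dict[str, int]] = defaultdict(
        lambda: {"message_count": 0, "tokens": 0}
    )
    for msg in messages:
        session_id = msg.get("sessionId", "")
        if not session_id:
            continue
        # Coalesce "{sessionId}#agent-{agentId}" into its parent session.
        session_id = session_id.split("#agent-")[0]
        stats[session_id]["message_count"] += 1
        stats[session_id]["tokens"] += msg.get("tokens", 0)
    return dict(stats)


msgs = [
    {"sessionId": "s1", "tokens": 10},
    {"sessionId": "s1#agent-xyz", "tokens": 5},
    {"sessionId": "s1#agent-xyz", "tokens": 7},
    {"sessionId": "s2", "tokens": 3},
]
print(aggregate_session_stats(msgs))
# {'s1': {'message_count': 3, 'tokens': 22}, 's2': {'message_count': 1, 'tokens': 3}}
```

With the pre-fix `continue`, s1 would report 1 message and 10 tokens while the page still renders all three messages, which is the mismatch the reviewer describes.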
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude_code_log/converter.py` around lines 1431 - 1437, The current branch
skips any message whose sessionId is an agent session
(is_agent_session(session_id)), which drops those messages from the session
cache and undercounts message_count/tokens; instead, when processing
session-level data in this block (and similarly inside
_build_session_data_from_messages()), detect agent sessions and remap/coalesce
them into their parent session id (e.g., derive parent_id from session_id rather
than continue), then proceed to accumulate counts/tokens under that parent_id;
update both this message-processing branch (where message and
SummaryTranscriptEntry are checked) and the _build_session_data_from_messages()
normalization so agent sessions are folded into the parent session consistently.
🧹 Nitpick comments (1)
test/test_dag_integration.py (1)

738-1072: Add a regression for session-metadata/pagination coalescing.

These tests cover DAG/tree/rendering well, but the PR also changes cache/session aggregation and paginated grouping. A focused assertion around _update_cache_with_session_data() or paginated HTML generation would catch cases where agent messages render inline but are omitted from the parent session’s message_count or token totals.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_dag_integration.py` around lines 738 - 1072, Add a regression test
that exercises _update_cache_with_session_data (via load_directory_transcripts)
to ensure agent sidechain entries are included in the parent session's
pagination/aggregation metadata: create a main session with an agent anchor and
sidechain agent_entries (like existing tests), call
load_directory_transcripts(tmp_path, silent=True) then call or verify the cache
update path (_update_cache_with_session_data) ran, and assert that
session_tree.sessions["s1"] (or the parent session object) has message_count and
total token-related fields updated to include the agent messages (e.g.,
message_count increased by agent entries, total_tokens or similar totals reflect
agent content) so paginated HTML generation will include agent messages in
parent session aggregates.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3db4e5f1-dedf-4f81-9ffa-473893d3cb36

📥 Commits

Reviewing files that changed from the base of the PR and between a9f3c56 and 000817f.

📒 Files selected for processing (4)
  • claude_code_log/converter.py
  • claude_code_log/renderer.py
  • claude_code_log/utils.py
  • test/test_dag_integration.py

@cboos
Collaborator Author

cboos commented Mar 8, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 8, 2026

✅ Actions performed

Review triggered.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@claude_code_log/converter.py`:
- Around line 1649-1658: The fallback path still groups by raw sessionId and
thus leaks synthetic agent sessions; update _collect_project_sessions() (or the
process_projects_hierarchy() fallback call) to exclude any session_id where
is_agent_session(session_id) is true and to skip warmup_session_ids, mirroring
the filtering used earlier (session_id not in warmup_session_ids and not
is_agent_session(session_id)); ensure the grouping step uses the filtered
session_ids variable so pseudo-sessions like "#agent-..." are never emitted as
standalone project-index sessions.
- Around line 416-456: The agent anchoring logic in _integrate_agent_entries
only collects anchors from non-sidechain entries so nested agents (whose anchor
is in a sidechain) remain unanchored; update the agent_anchors build to include
any entry with an agentId (i.e., check msg.agentId regardless of
msg.isSidechain) and when multiple candidates exist prefer a non-sidechain
anchor over a sidechain one (so preserve an existing non-sidechain anchor and
only set/override when the current candidate is a better choice), keeping
references to BaseTranscriptEntry, agentId, isSidechain, agent_anchors,
parentUuid, and sessionId to locate the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 047fec33-895e-4754-b83c-c8d84df4dae8

📥 Commits

Reviewing files that changed from the base of the PR and between 000817f and 61b7ff1.

📒 Files selected for processing (4)
  • claude_code_log/converter.py
  • claude_code_log/renderer.py
  • claude_code_log/utils.py
  • test/test_dag_integration.py

Comment on lines +416 to +456
```python
def _integrate_agent_entries(messages: list[TranscriptEntry]) -> None:
    """Parent agent entries and assign synthetic session IDs.

    Agent (sidechain) entries share sessionId with their parent session
    but form separate conversation threads. This function:

    1. Builds a map of agentId -> anchor UUID (the main-session User entry
       whose agentId matches, i.e. the tool_result that references the agent)
    2. For each agent's root entry (parentUuid=None, isSidechain=True),
       sets parentUuid to the anchor UUID
    3. Assigns a synthetic sessionId ("{sessionId}#agent-{agentId}") to all
       agent entries so they form separate DAG-lines

    Mutates entries in place (Pydantic v2 models are mutable by default).
    """
    # Build agentId -> anchor UUID map from main-session entries
    agent_anchors: dict[str, str] = {}
    for msg in messages:
        if not isinstance(msg, BaseTranscriptEntry):
            continue
        if msg.isSidechain:
            continue
        # Main-session entries with agentId reference an agent transcript
        if msg.agentId:
            agent_anchors[msg.agentId] = msg.uuid

    if not agent_anchors:
        return

    # Process sidechain entries: parent roots and assign synthetic sessionIds
    for msg in messages:
        if not isinstance(msg, BaseTranscriptEntry):
            continue
        if not msg.isSidechain or not msg.agentId:
            continue
        agent_id = msg.agentId
        # Assign synthetic session ID to separate from main session
        msg.sessionId = f"{msg.sessionId}#agent-{agent_id}"
        # Parent the root entry to the anchor
        if msg.parentUuid is None and agent_id in agent_anchors:
            msg.parentUuid = agent_anchors[agent_id]
```

⚠️ Potential issue | 🟠 Major

Nested agent transcripts still won't get anchored.

load_transcript() already recurses into agent files, but agent_anchors is built only from non-sidechain entries. If an agent launches another agent, that inner anchor lives in a sidechain entry, so the nested root keeps parentUuid=None and becomes a separate root instead of attaching under the outer agent branch.

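The proposed fix (collect anchors from sidechain entries too, but prefer a main-session anchor when both exist) can be sketched as follows. To keep "which agent an entry belongs to" apart from "which agent an entry references", the sketch assumes a hypothetical `spawned_agent_id` field (for instance taken from toolUseResult.agentId); the real model's field layout may differ, and the first-candidate tie-break among same-kind anchors is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    # Hypothetical stand-in for a transcript entry. `spawned_agent_id` is an
    # assumed field naming the agent this entry launches or references.
    uuid: str
    is_sidechain: bool
    spawned_agent_id: Optional[str] = None


def build_agent_anchors(entries: list[Entry]) -> dict[str, str]:
    """Map agent ID -> anchor uuid, accepting sidechain anchors for nested agents."""
    anchors: dict[str, str] = {}
    from_sidechain: dict[str, bool] = {}
    for e in entries:
        agent_id = e.spawned_agent_id
        if not agent_id:
            continue
        seen_sidechain = from_sidechain.get(agent_id)
        # Keep the first candidate, but let a main-session anchor replace a sidechain one.
        if seen_sidechain is None or (seen_sidechain and not e.is_sidechain):
            anchors[agent_id] = e.uuid
            from_sidechain[agent_id] = e.is_sidechain
    return anchors


entries = [
    Entry("m1", is_sidechain=False, spawned_agent_id="A"),  # main session launches A
    Entry("a3", is_sidechain=True, spawned_agent_id="B"),  # agent A launches B
    Entry("a9", is_sidechain=True, spawned_agent_id="A"),  # sidechain mention of A
]
print(build_agent_anchors(entries))  # {'A': 'm1', 'B': 'a3'}
```

The nested agent B now has an anchor inside A's sidechain, so its root can be parented under A's branch instead of becoming a separate root.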

Comment on lines +1649 to +1658
```diff
     # Find all unique session IDs (excluding warmup and agent sessions)
     session_ids: set[str] = set()
     for message in messages:
         if hasattr(message, "sessionId"):
             session_id: str = getattr(message, "sessionId")
-            if session_id and session_id not in warmup_session_ids:
+            if (
+                session_id
+                and session_id not in warmup_session_ids
+                and not is_agent_session(session_id)
+            ):
```

⚠️ Potential issue | 🟠 Major

The no-cache project-summary fallback still leaks synthetic agent sessions.

Filtering agent sessions out here fixes individual session files, but process_projects_hierarchy() falls back to _collect_project_sessions() when cached project data is missing, and that helper still groups by raw sessionId. In that path, #agent-... pseudo-sessions will still show up as standalone project-index sessions, which contradicts the inline-only behavior.


cboos and others added 14 commits April 17, 2026 18:25
Agent (sidechain) entries are now part of the DAG instead of being
partitioned out and reordered heuristically. _integrate_agent_entries()
parents agent roots to their anchor tool_result and assigns synthetic
session IDs ({sessionId}#agent-{agentId}) so they form separate
DAG-lines spliced at the correct junction point.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use session hierarchy to determine the correct parent session for agent
messages (may be a branch pseudo-session when the anchor is inside a
within-session fork). Fallback to extracting from synthetic ID.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract `is_agent_session()` helper to `utils.py` and use it to filter
agent sessions from: session metadata building, cache population,
pagination session collection, and message grouping for pages. Agent
messages in pagination are grouped under their parent session.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of dropping agent messages from session metadata, remap their
synthetic session IDs to the parent session using get_parent_session_id().
This ensures message counts, token totals, and timestamps in pagination
and cache include agent contributions.

Add regression test verifying agent messages fold into parent session's
message_count and token totals via _build_session_data_from_messages().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- _integrate_agent_entries: also collect anchors from sidechain entries
  (for nested agents where agent A spawns agent B), preferring
  non-sidechain anchors when both exist.
- _collect_project_sessions: coalesce agent sessions into parent via
  get_parent_session_id(), preventing synthetic #agent- sessions from
  appearing as standalone entries in the project index.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With DAG integration, agent messages are already in correct order via
DAG traversal — the sidechain reorder function was a no-op for directory
mode. To make it safe to remove for single-file mode too, add
_integrate_agent_entries() to the single-file path in converter.py so
both code paths use consistent DAG-based ordering.

_cleanup_sidechain_duplicates() is kept as-is: content-level dedup
(Task input/output duplicated in sidechain) cannot be handled at the
DAG level since it requires text comparison, not structural ordering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reset current_render_session to None when entering a new non-agent
session in _render_messages(). Previously, the branch tracking state
from a within-session fork would leak into subsequent sessions,
causing their messages to inherit a stale branch render_session_id.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fork points now render as standalone structural elements (⑂) outside
the message box, listing each branch with preview text. Branch headers
are compact with ↳ prefix, no session divider, and a back-reference
to the fork point.

Changes:
- renderer.py: Extend junction_forward_links with branch preview text,
  add fork_point_preview attribute to TemplateMessage
- transcript.html: Move fork element outside message box, update branch
  header to use ↳ prefix and skip session divider
- system_formatters.py: Redesign branch header HTML with compact
  back-reference (from ⑂ Fork point • preview)
- message_styles.css: Branch headers are transparent, no box shadow,
  left border only
- session_nav_styles.css: Replace junction-link styles with fork-point
  element styles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- .branch-header: margin 2em/1em, fold-bar border override to gray
- .fork-point: margin-bottom 2em, border-radius 8px
- Branch headers show "(in Session X)" for cross-session forks
  (when branch's original session differs from parent's real session)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add --fork-point-color and --branch-point-color CSS variables to :root
- Replace hardcoded #adb5bd with var() references in fork/branch styles
- Fix fold-bar specificity: use .branch-header .fold-bar[data-border-color=
  "session-header"] to override the yellow/orange session-header color

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entries like "attachment", "permission-mode", "progress", and
"file-history-snapshot" have uuid/parentUuid and are part of the DAG
chain, but were previously dropped at parse time. This caused false
roots — any message whose parentUuid pointed to a dropped entry
became an orphan in the DAG.

PassthroughTranscriptEntry captures just the DAG-relevant fields
(uuid, parentUuid, sessionId, timestamp, isSidechain) so these
entries participate in the DAG graph without being rendered.

Changes:
- models.py: Add PassthroughTranscriptEntry, add to TranscriptEntry union
- converter.py: Create PassthroughTranscriptEntry for unknown types with
  uuid+sessionId; update _repair_parent_chains to skip repair for
  progress entries that are present as passthrough nodes;
  update _integrate_agent_entries and _build_session_data_from_messages
- transcript_factory.py: Fall back to PassthroughTranscriptEntry in
  create_transcript_entry for cache deserialization
- renderer.py: Filter out PassthroughTranscriptEntry in _filter_messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
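The false-root problem described in the PassthroughTranscriptEntry commit can be reproduced in a few lines. This sketch uses plain dicts with the JSONL field names uuid, parentUuid, and type; `find_roots` is a hypothetical helper, not the project's build_dag().

```python
def find_roots(entries: list[dict]) -> list[str]:
    """Return uuids of entries with no parent, or whose parent is not loaded."""
    known = {e["uuid"] for e in entries}
    return [
        e["uuid"]
        for e in entries
        if e.get("parentUuid") is None or e["parentUuid"] not in known
    ]


chain = [
    {"uuid": "u1", "parentUuid": None, "type": "user"},
    {"uuid": "p1", "parentUuid": "u1", "type": "progress"},
    {"uuid": "a1", "parentUuid": "p1", "type": "assistant"},
]
print(find_roots(chain))  # ['u1']

# Dropping the non-rendered "progress" entry at parse time orphans a1.
renderable_only = [e for e in chain if e["type"] in ("user", "assistant")]
print(find_roots(renderable_only))  # ['u1', 'a1']  (a1 is a false root)
```

Keeping such entries as DAG nodes and filtering them only at render time, as the commit does, preserves the chain u1 → p1 → a1.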
Two fixes found from real-data testing:

1. deduplicate_messages() was dropping PassthroughTranscriptEntry
   entries with identical timestamps (common for attachments). Use
   uuid as content_key to prevent false dedup of DAG chain nodes.

2. Fork points with fewer than 2 branches with meaningful preview
   text (i.e. not UUID-fragment fallbacks) are now elided — they
   add visual noise without providing navigation value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two bugs when processing Windows-origin transcripts:

1. deduplicate_messages() collapsed distinct entries sharing a
   timestamp: system entries used content_key="" (now uuid), and
   assistant split-content-blocks shared the same message.id
   (now message.id + block types, so stutters still dedup).

2. SystemTranscriptEntry.level can be None (turn_duration,
   memory_saved, away_summary subtypes). The factory's getattr
   fallback never triggered because the attribute exists as None;
   changed to `transcript.level or "info"`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
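The two dedup fixes above can be illustrated with a sketch of a content key. The overall key shape `(type, timestamp, content_key)` is an assumption; the commits only state that system and passthrough entries now key on uuid and that assistant entries key on message.id plus block types. For simplicity, every non-assistant entry uses its uuid here.

```python
def dedup_key(entry: dict) -> tuple[str, str, str]:
    """Hypothetical dedup key: (type, timestamp, content_key)."""
    etype = entry["type"]
    if etype == "assistant":
        msg = entry["message"]
        # Split content blocks share message.id; adding the block types keeps
        # distinct blocks apart while exact stutters still collapse.
        block_types = ",".join(block["type"] for block in msg["content"])
        content_key = f'{msg["id"]}:{block_types}'
    else:
        # System, passthrough and any other entries: key on uuid in this sketch.
        content_key = entry["uuid"]
    return (etype, entry["timestamp"], content_key)


def deduplicate(entries: list[dict]) -> list[dict]:
    seen: set[tuple[str, str, str]] = set()
    kept: list[dict] = []
    for e in entries:
        key = dedup_key(e)
        if key not in seen:
            seen.add(key)
            kept.append(e)
    return kept


ts = "2026-04-01T00:00:00Z"
entries = [
    {"type": "system", "uuid": "s-1", "timestamp": ts},
    {"type": "system", "uuid": "s-2", "timestamp": ts},  # distinct, same timestamp
    {"type": "assistant", "uuid": "a-1", "timestamp": ts,
     "message": {"id": "msg_1", "content": [{"type": "thinking"}]}},
    {"type": "assistant", "uuid": "a-2", "timestamp": ts,
     "message": {"id": "msg_1", "content": [{"type": "text"}]}},
    {"type": "assistant", "uuid": "a-3", "timestamp": ts,
     "message": {"id": "msg_1", "content": [{"type": "text"}]}},  # stutter
]
print([e["uuid"] for e in deduplicate(entries)])  # ['s-1', 's-2', 'a-1', 'a-2']
```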
Address monk review note: use `message.level or "info"` instead of
`getattr(message, "level", "info")` for consistency with the
system_factory fix (same None-attribute-exists issue).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cboos cboos changed the base branch from dev/dag to main April 17, 2026 16:47
@cboos
Collaborator Author

cboos commented Apr 17, 2026

Same remark as on #97: I can't really be sure about that branch alone; it's probably missing a few important "DAG"-related fixes that can now be found in #96. Since #96 is now pretty solid, we can merge this one and focus on #96.

@cboos cboos merged commit b5aefa0 into main Apr 17, 2026
1 check passed
@cboos
Collaborator Author

cboos commented Apr 18, 2026

Oops, not sure why it was merged without the typecheck/ty fixes... I will follow up on main directly.

@cboos cboos mentioned this pull request Apr 19, 2026
cboos added a commit that referenced this pull request May 6, 2026
…lize naming

Split planning content from as-built reference content per the principle
that ``work/`` holds plans/TODOs (authoritative until the code lands)
and ``dev-docs/`` reflects the current code (authoritative once the
code lands).

Three file moves, one rename, one deletion, one new file:

- ``dev-docs/FOLD_STATE_DIAGRAM.md`` → ``dev-docs/message-hierarchy.md``
  (drops the SCREAMING_SNAKE; H1 also updated to "Message Hierarchy
  and Fold State" to match content scope).
- ``dev-docs/restoring-archived-sessions.md`` → ``docs/restoring-archived-sessions.md``
  (user-facing operations content; ``docs/`` is the new home for
  user-facing documentation distinct from contributor docs).
- ``dev-docs/rendering-next.md`` → ``work/rendering-next.md``
  ("Future Work for the rendering system" — explicitly plan content).
- ``work/phase-c-agent-transcripts.md`` deleted: the work fully landed
  via PR #99 (b5aefa0); the as-built reality is now in ``agents.md``.
- ``dev-docs/application_model.md`` (new): entry-point doc.

Inbound references updated in ``CLAUDE.md``, ``CONTRIBUTING.md``,
``claude_code_log/converter.py`` (URL to docs/), and the four dev-docs
that cross-referenced the moved/renamed files.

The new ``application_model.md`` is the entry point: § 1 subsystem
table with pointers to deep-dive docs, § 2 inlined coverage of
subsystems without their own deep-dive (CLI, TUI, cache implementation,
migrations, detail-level filter, image export, performance profiling),
§ 3 data lifecycle ASCII diagram, § 4 cross-cutting glossary
(TranscriptEntry, MessageContent, TemplateMessage, RenderingContext,
session_id and synthetic forms, sidechain, agent_id, fork point,
pair_first/middle/last, detail level, passthrough), § 5 "where to
start reading" by question.

Surviving dev-docs all carry a one-line breadcrumb pointing at
``application_model.md`` so a reader landing on a deep-dive without
context can find the system overview.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai coderabbitai Bot mentioned this pull request May 6, 2026
cboos added a commit that referenced this pull request May 7, 2026
…ming, clean work/ (#134)

* dev-docs/work/docs restructure: introduce application_model.md, normalize naming

Split planning content from as-built reference content per the principle
that ``work/`` holds plans/TODOs (authoritative until the code lands)
and ``dev-docs/`` reflects the current code (authoritative once the
code lands).

Three file moves, one rename, one deletion, one new file:

- ``dev-docs/FOLD_STATE_DIAGRAM.md`` → ``dev-docs/message-hierarchy.md``
  (drops the SCREAMING_SNAKE; H1 also updated to "Message Hierarchy
  and Fold State" to match content scope).
- ``dev-docs/restoring-archived-sessions.md`` → ``docs/restoring-archived-sessions.md``
  (user-facing operations content; ``docs/`` is the new home for
  user-facing documentation distinct from contributor docs).
- ``dev-docs/rendering-next.md`` → ``work/rendering-next.md``
  ("Future Work for the rendering system" — explicitly plan content).
- ``work/phase-c-agent-transcripts.md`` deleted: the work fully landed
  via PR #99 (b5aefa0); the as-built reality is now in ``agents.md``.
- ``dev-docs/application_model.md`` (new): entry-point doc.

Inbound references updated in ``CLAUDE.md``, ``CONTRIBUTING.md``,
``claude_code_log/converter.py`` (URL to docs/), and the four dev-docs
that cross-referenced the moved/renamed files.

The new ``application_model.md`` is the entry point: § 1 subsystem
table with pointers to deep-dive docs, § 2 inlined coverage of
subsystems without their own deep-dive (CLI, TUI, cache implementation,
migrations, detail-level filter, image export, performance profiling),
§ 3 data lifecycle ASCII diagram, § 4 cross-cutting glossary
(TranscriptEntry, MessageContent, TemplateMessage, RenderingContext,
session_id and synthetic forms, sidechain, agent_id, fork point,
pair_first/middle/last, detail level, passthrough), § 5 "where to
start reading" by question.

Surviving dev-docs all carry a one-line breadcrumb pointing at
``application_model.md`` so a reader landing on a deep-dive without
context can find the system overview.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* dev-docs: address branch-heading navigation gap surfaced by docs-walk-through

A reviewer reading the new ``application_model.md`` cold landed at
``dag.md`` for branch sessions, but the doc was silent on where the
branch-heading text itself is assembled — they'd have had to grep
``renderer.py`` to find it. Three small fixes close the gap:

1. ``SessionHeaderMessage`` glossary entry in
   ``application_model.md`` § 4: defines the term, distinguishes
   trunk vs branch flavours, names the four functions that compose
   the title (``_branch_label``, ``_enrich_branch_titles``,
   ``create_session_preview``, ``simplify_command_tags``). Lands
   between the ``fork point / branch`` and ``pair_first / middle /
   last`` entries since branch headings are the lookup most likely
   to lead here.

2. Cross-reference in ``dag.md``'s "Navigation Links" section:
   one-line callout that title composition is a renderer concern,
   pointing back at the new glossary entry. Keeps ``dag.md``
   focused on structure (its job) without the reader hitting a
   silent dead end on the labeling question.

3. Cross-cutting-concerns paragraph after the § 1 subsystem table:
   names "label and preview composition" as the canonical example
   of a concern that touches several rows (DAG, renderer, parsing)
   and isn't owned by any single one. Helps the next reader who
   looks for "where do branch labels live?" and finds neither the
   DAG row nor the renderer row obviously authoritative.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* dev-docs: sync with main for #36 (JSON), #135 (DAG cycle-break + SIGUSR1), #133 (factory-layer normalisation seam)

Three PRs landed on main since this branch was created. The dev-docs
deltas they imply:

- **PR #36 (JSON renderer)** —
  - `application_model.md` § 1 subsystem table gains a JSON export
    row; new § 2.5 "JSON export" describes the renderer (mirrors
    HTML/Markdown surface, runs through `generate_template_messages`
    so it inherits all post-factory polishing for free, JSON-specific
    `_json_default` shim for embedded Pydantic models, output naming
    via `variant_suffix`).
  - § 2.1 CLI lists `json` alongside `html`/`md`/`markdown` for
    `--format`.
  - § 3 data lifecycle diagram adds `json/renderer.py` as a third
    fan-out branch alongside HTML/Markdown.
  - § 5 entry-questions gains a "How do I export to JSON?" pointer.
  - `rendering-architecture.md` § 1 / § 4 list JSON; new
    `JsonRenderer` subsection in § 7 describes its serialisation
    approach (single document via `dataclasses.asdict` + shim; only
    reuses `title_content` from the dispatcher).
  - § 10 Separation of Concerns clarifies that **factories/** is the
    normalisation seam — display polish for *all* output formats lives
    there, not in renderers.
  - `implementing-a-tool-renderer.md` Overview + Tests sections note
    that JSON needs no per-tool integration: serialises the typed
    factory output directly.

- **PR #135 (DAG cycle-break + SIGUSR1)** —
  - `dag.md` Phase 2 (Build DAG) gains a mermaid diagram of the
    three-step `build_dag` ordering (orphan promotion → cycle break
    → children build) and explains the ordering rationale (children
    must be built last because cyclic edges in `parent_uuid` would
    propagate into `children_uuids` otherwise, hanging downstream
    walks).
  - Phase 3 documents the `walk_visited` defence-in-depth in
    `_walk_session_with_forks` — backstop for reintroduced cycles
    after build time.
  - Invariant 3 (DAG acyclicity) updated from "no cycles" assertion
    to "actively broken at build time, walker has belt".
  - `application_model.md` new § 2.9 "Diagnosing hangs" describes
    the SIGUSR1 stack-dump handler installed in `cli.py`.

- **PR #133 (system-info-cleanup)** —
  - Already mostly absorbed via the branch-heading navigation gap
    commit; the renormalisation ("if it shows up right in HTML, it
    shows up right in JSON") and the explicit "factory-layer
    normalisation seam" framing in `rendering-architecture.md` § 10
    cite `simplify_command_tags` as the canonical example.
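To make the three-step `build_dag` ordering concrete, here is a hypothetical sketch over a plain `uuid -> parentUuid` dict (the real function works on transcript entries; names and data shapes here are assumptions). It shows why children are built last: step 3 only sees parent links already repaired by steps 1 and 2, so no cycle can leak into the children lists.

```python
from typing import Optional


def build_dag(parent_of: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Hypothetical sketch: returns uuid -> children. Works on a copy."""
    parents = dict(parent_of)

    # 1. Orphan promotion: a parent that is not loaded turns the node into a root.
    for uuid, parent in list(parents.items()):
        if parent is not None and parent not in parents:
            parents[uuid] = None

    # 2. Cycle break: walk up from each node; if the walk revisits a node on
    #    the current path, cut the parent edge that closed the loop.
    done: set[str] = set()
    for start in list(parents):
        path: list[str] = []
        on_path: set[str] = set()
        node: Optional[str] = start
        while node is not None and node not in done:
            if node in on_path:
                parents[path[-1]] = None  # last node's parent points back into the path
                break
            path.append(node)
            on_path.add(node)
            node = parents[node]
        done.update(path)

    # 3. Children build last, from the repaired parent links only.
    children: dict[str, list[str]] = {uuid: [] for uuid in parents}
    for uuid, parent in parents.items():
        if parent is not None:
            children[parent].append(uuid)
    return children
```

Given `{"a": "b", "b": "c", "c": "a", "d": "missing", "e": "d"}`, `d` is promoted to a root, the `c -> a` edge is cut, and every node reaches a root by following parent links.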

Also drive-by: § 2.6 detail-level table now includes the `high`
level (was missing — `models.DetailLevel.HIGH` predates this branch
but the doc never listed it).

`just ci` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* dev-docs: address CR findings on PR #134 (page-size scope, image-export modes)

Two doc claims didn't match the actual code; both verified directly
against `cli.py` / `converter.py` / `image_export.py` rather than the
CR text alone.

- L75-76 page-size scope: clarify pagination targets the
  combined-transcript output (whole-session packing into N-message
  pages, with overflow when a single session exceeds N), not
  per-session HTML files. Verified against `cli.py:547-551` (help
  text says "combined transcript") and `converter.py:1002`
  `_assign_sessions_to_pages` (per-page accumulation at the
  combined-transcript level).

- § 2.7 image-export modes: rewrite to match the actual CLI choices.
  Mode names were wrong (`inline/referenced/dropped` →
  `placeholder/embedded/referenced`); defaults were also swapped
  (now `embedded` for HTML, `referenced` for Markdown). Verified
  against `cli.py:541-544` and `image_export.py:29-45`.
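One plausible reading of the clarified page-size scope, as a hypothetical sketch: the name mirrors `_assign_sessions_to_pages`, but the body is illustrative and not copied from converter.py. Whole sessions are packed into pages of about N messages, and a session larger than N overflows onto its own page instead of being split.

```python
def assign_sessions_to_pages(session_sizes: list[tuple[str, int]],
                             page_size: int) -> list[list[str]]:
    """Hypothetical sketch: pack whole sessions into pages of ~page_size messages."""
    pages: list[list[str]] = []
    current: list[str] = []
    count = 0
    for session_id, n in session_sizes:
        # Start a new page if this session would push a non-empty page past N.
        if current and count + n > page_size:
            pages.append(current)
            current, count = [], 0
        # Sessions are never split: an oversized one simply overflows its page.
        current.append(session_id)
        count += n
    if current:
        pages.append(current)
    return pages
```

For example, sizes `30, 50, 40, 250, 10` with `page_size=100` give pages `[s1, s2]`, `[s3]`, `[s4]`, `[s5]`: `s4` exceeds N and ends up alone on an overflowing page.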

Skipped CR's L361 nit ("each `.md` is an in-flight or proposed plan"
→ "each `.md` is a plan"): the longer phrasing carries information
about work/'s mixed state and isn't redundant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* CLAUDE.md: ask Claude to keep dev-docs/ in sync with code changes

Per discussion on PR #134: dev-docs/ are useful only if they remain
true to the as-built reality. Articulate the principle (code is
authoritative; doc is wrong if they disagree) and the work/ →
dev-docs/ lifecycle so non-trivial changes prompt a doc update in
the same commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>