
Support streaming tool output and deduplication#7

Merged
timvisher-dd merged 3 commits into main from streaming-dedup on May 12, 2026

@timvisher-dd timvisher-dd commented Mar 15, 2026

Closes xenodium#342
Closes xenodium#343

Checklist

  • I agree to communicate (PR description and comments) with the author myself (not AI-generated).
  • I've reviewed all code in PR myself and will vouch for its quality.
  • I've read and followed the Contributing guidelines.
  • I've filed a feature request/discussion for a new feature.
  • I've added tests where applicable.
  • I've run M-x checkdoc and M-x byte-compile-file.

Problem

The two most popular ACP agents, codex-acp and claude-agent-acp, perform very poorly in agent-shell when tool executions emit a lot of text:

  • codex-acp: O(n²) rendering and massive data transfer. A 35k-line bash command takes ~60s and transfers ~890 MB of JSON — a 3,000× amplification of ~280 KB actual output. Each tool_call_update carries the full accumulated output; agent-shell replaces the entire fragment body and reruns markdown-overlays-put on every update.

  • claude-agent-acp: output is silently lost. The same command truncates to 241 of 35,001 lines. The user sees raw <persisted-output> XML tags rendered verbatim in the shell buffer.
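
To make the quadratic growth concrete, here is a minimal Emacs Lisp sketch comparing the total bytes sent when every update resends the whole accumulated output with the total when every update carries only its own chunk. The function names and the simplification that each update adds a fixed-size chunk are assumptions for illustration; they are not taken from agent-shell or codex-acp.

```elisp
;; Hypothetical helpers, not part of agent-shell.
(defun my/cumulative-bytes (updates chunk-bytes)
  "Total bytes sent when each of UPDATES resends all output so far.
Assumes every update adds CHUNK-BYTES, so update K carries K * CHUNK-BYTES."
  (/ (* chunk-bytes updates (1+ updates)) 2))

(defun my/incremental-bytes (updates chunk-bytes)
  "Total bytes sent when each of UPDATES carries only its own chunk."
  (* chunk-bytes updates))

(my/cumulative-bytes 4 10)   ; 10 + 20 + 30 + 40 = 100
(my/incremental-bytes 4 10)  ; 10 + 10 + 10 + 10 = 40
```

The cumulative total grows with the square of the update count, which is why the amplification gets worse the longer a command runs.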

Cause

agent-shell does not advertise _meta.terminal_output in clientCapabilities during the ACP initialize handshake. Without this capability:

  • codex-acp falls back to sending the full accumulated output in every tool_call_update (O(n²) content growth).
  • claude-agent-acp sends a single truncated result at completion instead of streaming the full output.
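
As a rough illustration of what advertising the capability looks like on the wire, the sketch below builds a clientCapabilities alist carrying the _meta.terminal_output flag and encodes it with Emacs's built-in json-encode. The variable and function names, and the exact placement inside the initialize params, are assumptions; the real request is built by acp-make-initialize-request.

```elisp
(require 'json)

;; Hypothetical variable: only the `_meta.terminal_output' flag comes from
;; the description above; the surrounding shape is assumed.
(defvar my/client-capabilities
  '((_meta . ((terminal_output . t))))
  "Sketch of a clientCapabilities alist advertising terminal output.")

(defun my/encode-client-capabilities ()
  "Return `my/client-capabilities' wrapped and encoded as a JSON string."
  (json-encode `((clientCapabilities . ,my/client-capabilities))))

(my/encode-client-capabilities)
;; => "{\"clientCapabilities\":{\"_meta\":{\"terminal_output\":true}}}"
```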

Fix

Advertise _meta.terminal_output during initialize and handle the resulting streaming behavior:

  1. Extend acp.el to accept :meta-capabilities on acp-make-initialize-request.
  2. Pass :meta-capabilities '((terminal_output . t)) from agent-shell.el during initialize.
  3. Handle incremental _meta.terminal_output.data chunks (codex-acp) and batch _meta.terminal_output results (claude-agent-acp) in a new streaming handler with deduplication.
  4. Strip <persisted-output> tags and render previews cleanly.
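
The normalization in step 4 can be sketched roughly as follows. This is a simplified stand-in under stated assumptions, not the PR's code: the function name is hypothetical, the regular expressions are guesses at the shapes involved, and the comment-face rendering of the preview is left out.

```elisp
;; Hypothetical helper; the real normalizer lives in
;; agent-shell-streaming.el and may differ.
(defun my/normalize-output (text)
  "Return TEXT without persisted-output tags or an enclosing fence.
Non-empty results always end with a newline."
  (let ((out (replace-regexp-in-string "</?persisted-output>" "" text)))
    ;; Drop an opening fence line such as ``` or ```sh at the very start.
    (setq out (replace-regexp-in-string "\\````[^\n]*\n" "" out))
    ;; Drop a closing fence (and trailing whitespace) at the very end.
    (setq out (replace-regexp-in-string "\n?```[ \t\n]*\\'" "" out))
    (if (or (string= out "") (string-suffix-p "\n" out))
        out
      (concat out "\n"))))

(my/normalize-output "```\nfoo\n```")  ; => "foo\n"
(my/normalize-output "<persisted-output>preview</persisted-output>")
;; => "preview\n"
```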

Implementation

New files

agent-shell-meta.el — extractors for ACP _meta payloads:

  • agent-shell--meta-lookup — key lookup handling both symbol and string keys in alists.
  • agent-shell--meta-find-tool-response — walks any _meta namespace to find a toolResponse value.
  • agent-shell--tool-call-meta-response-text — extracts stdout text from _meta.*.toolResponse in its various shapes (string, alist with stdout key, vector of content blocks).
  • agent-shell--tool-call-terminal-output-data — extracts _meta.terminal_output.data.
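
The key-lookup idea can be sketched like this. The names below are hypothetical and the payload shape is assumed from the _meta.terminal_output.data path mentioned above; the actual extractors may handle more cases.

```elisp
;; Hypothetical helpers illustrating symbol-or-string key lookup.
(defun my/meta-lookup (alist key)
  "Return the value for KEY in ALIST, matching symbol or string keys.
KEY itself may be given as a symbol or a string."
  (let ((sym (if (symbolp key) key (intern key)))
        (str (if (symbolp key) (symbol-name key) key)))
    (cdr (or (assq sym alist) (assoc str alist)))))

(defun my/terminal-output-data (update)
  "Return the _meta.terminal_output.data value of UPDATE, or nil."
  (let* ((meta (my/meta-lookup update '_meta))
         (terminal (my/meta-lookup meta 'terminal_output)))
    (my/meta-lookup terminal 'data)))

;; Works whether the JSON parser produced symbol keys or string keys.
(my/terminal-output-data
 '((_meta . ((terminal_output . ((data . "line 1\n")))))))    ; => "line 1\n"
(my/terminal-output-data
 '(("_meta" . (("terminal_output" . (("data" . "line 2\n")))))))  ; => "line 2\n"
```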

agent-shell-streaming.el — streaming tool call update handler:

  • agent-shell--tool-call-normalize-output — strips markdown fences, strips <persisted-output> XML tags (rendering the preview with font-lock-comment-face), and ensures trailing newlines.
  • agent-shell--append-tool-call-output — accumulates streamed output in the state's :tool-calls hash under an :accumulated key per tool call ID.
  • agent-shell--handle-tool-call-update-streaming — the main handler, replacing the inline tool_call_update block in agent-shell.el. Three branches:
    1. Terminal data (_meta.terminal_output.data): normalize the chunk, accumulate it, and immediately append it to the fragment body for live streaming.
    2. Meta response (_meta.*.toolResponse): normalize and accumulate silently (rendered only on final update to avoid duplication).
    3. Final update (status is "completed" or "failed"): render accumulated output (or fall back to content text), log to transcript, clean up permission dialogs, and apply title/label updates.
  • agent-shell--mark-tool-calls-cancelled — marks all in-progress tool calls as cancelled (called from agent-shell-interrupt).
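
The three branches can be sketched as a small dispatcher. This is a minimal illustration, not the handler itself: all names are hypothetical, the state is a plain hash table rather than the shell's :tool-calls slot, and normalization, transcript logging, permission cleanup, and title updates are omitted.

```elisp
;; Hypothetical state and helpers.
(defvar my/tool-call-output (make-hash-table :test #'equal)
  "Accumulated output per tool call ID.")

(defun my/append-output (id chunk)
  "Append CHUNK to the output accumulated for tool call ID."
  (puthash id (concat (gethash id my/tool-call-output "") chunk)
           my/tool-call-output))

(defun my/handle-update (id status &optional terminal-data meta-response)
  "Return a render action for one tool_call_update of tool call ID.
The result is (:append TEXT), (:final TEXT), or nil for no rendering."
  (let ((action nil))
    (cond
     ;; Branch 1: live terminal chunk, accumulate and show immediately.
     (terminal-data
      (my/append-output id terminal-data)
      (setq action (list :append terminal-data)))
     ;; Branch 2: batch response, accumulate silently until the end.
     (meta-response
      (my/append-output id meta-response)))
    ;; Branch 3: final status, render what was accumulated and clean up.
    (when (member status '("completed" "failed"))
      (setq action (list :final (gethash id my/tool-call-output "")))
      (remhash id my/tool-call-output))
    action))

(my/handle-update "t1" "in_progress" "a\n")  ; => (:append "a\n")
(my/handle-update "t1" "completed")          ; => (:final "a\n")
```

Returning an action instead of touching a buffer keeps the sketch testable; a caller would map :append to an in-place append and :final to the completed rendering.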

Changes to agent-shell.el

  • (require 'agent-shell-streaming) added.
  • The ~50-line inline tool_call_update rendering block is replaced by a single call to agent-shell--handle-tool-call-update-streaming. The metadata save (title/description/command/raw-input/diff) remains inline before the handler call.
  • The initialize request now passes :meta-capabilities '((terminal_output . t)) to acp-make-initialize-request.
  • agent-shell-interrupt calls agent-shell--mark-tool-calls-cancelled after sending the cancel notification.
  • shell-maker-define-major-mode call passes 'agent-shell-mode-map (quoted symbol) instead of the bare variable. This is a bug fix — the bare-variable form errors with void-function keymap every time agent-shell-mode is invoked, because shell-maker-define-major-mode splices the keymap value into a backquoted form that re-evaluates (keymap ...) as a function call. Quoting passes the symbol so use-local-map resolves it correctly. Reproducible upstream; should be filed there separately.
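
The quoting issue in the last bullet can be reproduced in isolation. The sketch below does not use shell-maker; it only mimics the pattern of splicing an argument into a backquoted form that is evaluated later, with hypothetical names and keymapp standing in for whatever eventually consumes the keymap.

```elisp
;; Hypothetical names; no shell-maker code is involved.
(defvar my/demo-mode-map (make-sparse-keymap)
  "A keymap whose value is the list (keymap).")

(defun my/eval-with-spliced-map (map-form)
  "Splice MAP-FORM into a backquoted call and evaluate it.
Return the result, or the missing function's name on `void-function'."
  (condition-case err
      (eval `(keymapp ,map-form) t)
    (void-function (cadr err))))

;; Bare variable: its value (keymap) is spliced in and then evaluated
;; as a call to a function named `keymap'.
(my/eval-with-spliced-map my/demo-mode-map)   ; => keymap

;; Quoted symbol: the symbol is spliced in and evaluated as a variable.
(my/eval-with-spliced-map 'my/demo-mode-map)  ; => t
```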

Tests

Three new test files cover the streaming/dedup work and the runtime invariants library:

  • tests/agent-shell-streaming-tests.el (40 tests) — streaming dispatcher, dedup math (agent-shell--thought-chunk-delta four-case proof), label transitions, empty-chunk paragraph break, post-turn-end render, ID-split helpers, mixed-source dedup (terminal_output mid-flight + _meta.toolResponse on final), append-boundary newline cap, and the streaming tool_call_update general title upgrade.
  • tests/agent-shell-invariants-tests.el (11 tests) — event ring, mutation hooks, violation handler bundle output (head + tail snapshot for long buffers).
  • tests/agent-shell-table-tests.el (3 tests) — markdown table overlay rendering after streamed updates.
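
For readers unfamiliar with chunk deduplication, here is one way a delta function could work. The PR text does not spell out the four cases of agent-shell--thought-chunk-delta, so the cases below (cumulative resend, redelivery, partial overlap, no overlap) and the function name are assumptions, not a description of the tested code.

```elisp
;; Hypothetical sketch of a dedup delta; the four cases are assumed.
(defun my/chunk-delta (accumulated incoming)
  "Return the part of INCOMING not already present at the end of ACCUMULATED."
  (cond
   ;; Case 1: INCOMING resends everything so far plus new text.
   ((string-prefix-p accumulated incoming)
    (substring incoming (length accumulated)))
   ;; Case 2: INCOMING repeats text that was already delivered.
   ((string-suffix-p incoming accumulated) "")
   ;; Cases 3 and 4: drop the longest overlap, which may be empty.
   (t
    (let ((k (min (length accumulated) (length incoming))))
      (while (and (> k 0)
                  (not (string-suffix-p (substring incoming 0 k) accumulated)))
        (setq k (1- k)))
      (substring incoming k)))))

(my/chunk-delta "abc" "abcdef")  ; => "def"
(my/chunk-delta "abc" "bc")      ; => ""
(my/chunk-delta "abc" "cde")     ; => "de"
(my/chunk-delta "abc" "xyz")     ; => "xyz"
```

A heuristic like this can swallow legitimately repeated text, for example a lone newline arriving after output that already ends in one, so a real implementation would need to know which delivery mode the agent is using.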

Plus regression tests added to tests/agent-shell-tests.el for: _claude/sdkMessage log surfacing, killed-buffer notification drop, and the markdown-overlay debounce buffer-kill race.

Perf measurements

Test: for x in {0..35000}; do printf 'line %d\n' "$x"; done (35,001 lines)

codex-acp

| Configuration | measure_ms (avg) | content_bytes (avg) | terminal_bytes (avg) |
| --- | --- | --- | --- |
| Without terminal caps | ~60,000 | ~900,000,000 | 0 |
| With terminal caps | ~7,500 | ~3,000 | ~240,000 |

~8× faster. Content drops from ~900 MB to ~3 KB.

claude-agent-acp

| Configuration | measure_ms (avg) | content_bytes | terminal_bytes |
| --- | --- | --- | --- |
| Without terminal caps | ~22,000 | 2,321 (truncated to 241 lines) | 0 |
| With terminal caps | ~23,000 | 0 | 2,270 |

No timing improvement (execution is server-side), but <persisted-output> tags are handled cleanly.

Prerequisite: acp.el changes

acp.el needs to accept the :meta-capabilities keyword argument on acp-make-initialize-request so the meta capability map can be advertised during the initialize handshake. See xenodium/acp.el#15.

Other changes bundled in this branch

The streaming/dedup work is the main motivation, but this branch also lands several tightly-coupled changes that share file boundaries with the streaming integration. They're noted here so reviewers see the full scope rather than discovering them as drift.

New libraries / files

  • agent-shell-meta.el: _meta payload extractors (described above; called out here because it's a new top-level module).
  • agent-shell-streaming.el — streaming handler (described above).
  • agent-shell-invariants.el — runtime buffer invariant checks with event tracing and a violation debug bundle (agent-shell-invariants-enabled, default off; gated for the live-validate workflow).
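
An event ring of the kind mentioned for agent-shell-invariants.el could be built on Emacs's ring library, as in the sketch below. Whether the PR uses ring.el is not stated; the names, the capacity of 3, and the event values are all assumptions for illustration.

```elisp
(require 'ring)

;; Hypothetical names and capacity.
(defvar my/event-ring (make-ring 3)
  "Fixed-size ring holding the most recent events.")

(defun my/record-event (event)
  "Record EVENT, discarding the oldest entry once the ring is full."
  (ring-insert my/event-ring event))

(defun my/recent-events ()
  "Return recorded events, oldest first."
  (reverse (ring-elements my/event-ring)))

(dolist (event '(update-1 update-2 update-3 update-4))
  (my/record-event event))
(my/recent-events)  ; => (update-2 update-3 update-4)
```

A bounded ring keeps tracing cheap enough to leave running, since memory use stays constant no matter how long the session lasts.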

Behavior changes outside the streaming handler

  • DWIM context insertion lands at the prompt and fragment updates no longer drag the process mark past it (added agent-shell--with-preserved-process-mark macro and :insert-cursor state slot).
  • agent-shell-ui.el append-in-place rewrite — chunks append to existing fragments without rebuilding the body, with boundary-newline normalization so paragraph-break chunks don't compound newlines on consecutive appends.
  • Empty agent_message_chunk mid-stream is rewritten to \n\n so two content blocks in the same turn don't run together.
  • Label-less fragments now default to expanded; previously they collapsed and inherited the invisible property from trailing-whitespace hiding.
  • tool_call_update notifications arriving after session/prompt resolves still render — covers Claude Code's Stop-hook bounce-and-regen cycle that streams chunks for ~50s after end_turn.
  • _claude/sdkMessage notifications are pretty-printed into the per-shell debug log when agent-shell-logging-enabled is set, surfacing hook lifecycle events that the ACP layer otherwise drops.
  • New defcustom agent-shell-markdown-overlay-debounce-delay (default 0.15s) for tuning idle-overlay cadence on slow terminals.
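
The boundary-newline behavior described for the append-in-place rewrite and the empty-chunk rule can be sketched together. This is a string-level illustration only: the names are hypothetical, the cap of two newlines is an assumption, and the real code appends to buffer text rather than concatenating strings.

```elisp
;; Hypothetical helpers; the cap of two newlines is an assumption.
(defun my/trailing-newlines (text)
  "Count newline characters at the end of TEXT."
  (let ((i (length text)) (n 0))
    (while (and (> i 0) (eq (aref text (1- i)) ?\n))
      (setq i (1- i) n (1+ n)))
    n))

(defun my/leading-newlines (text)
  "Count newline characters at the start of TEXT."
  (let ((i 0))
    (while (and (< i (length text)) (eq (aref text i) ?\n))
      (setq i (1+ i)))
    i))

(defun my/append-chunk (body chunk)
  "Append CHUNK to BODY with at most two newlines at the boundary.
An empty CHUNK is treated as a paragraph break."
  (let* ((chunk (if (string= chunk "") "\n\n" chunk))
         (trail (my/trailing-newlines body))
         (lead (my/leading-newlines chunk))
         ;; Newlines still needed from CHUNK to reach the cap.
         (keep (max 0 (- (min 2 (+ trail lead)) trail))))
    (concat body (make-string keep ?\n) (substring chunk lead))))

(my/append-chunk "a" "")            ; => "a\n\n"
(my/append-chunk "a\n\n" "")        ; => "a\n\n"
(my/append-chunk "a\n\n" "\n\nb")   ; => "a\n\nb"
```

Because BODY is never rewritten, existing text stays where it is; only the leading newlines of the incoming chunk are trimmed.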

Build / CI / docs

  • New CI workflow (.github/workflows/ci.yml) with four jobs: test (byte-compile and ERT), dependency-DAG, agent-symlinks, and a PR-only README-update check.
  • New bin/test driver that parses ci.yml via yq so the local invocations stay in lockstep with CI; it bails fast on any untranslated ${{ }} expressions.
  • Multi-IDE config (.claude, .codex, .gemini symlinks → .agents; CLAUDE.md / CODEX.md / GEMINI.md → AGENTS.md).
  • CONTRIBUTING.org and AGENTS.md document the yq prerequisite and the dev-deps env-var override (acp_root, shell_maker_root).
  • New .agents/commands/live-validate.md describes the live-batch validation workflow used for rendering changes.

Bug fixes that ride along

  • Permission-title file-name fallback (xenodium/agent-shell#415, "Edit and read permission UI request does not show filename to edit").
  • Restart preserves default-directory.
  • Run-on paragraph fix when the model resumes after a mid-turn tool call.
  • Cancel path now marks all in-progress tool calls cancelled (agent-shell-interrupt).
  • "Thinking" label restored on the thought-chunk fragment (the streaming-dedup rewrite accidentally reverted xenodium's 2026-03-18 rename).

@timvisher-dd force-pushed the streaming-dedup branch 2 times, most recently from 3415d07 to 9101647 (March 16, 2026 14:26)
@timvisher-dd changed the title from "# Support streaming tool output and deduplication" to "Support streaming tool output and deduplication" (Mar 16, 2026)
@timvisher-dd force-pushed the streaming-dedup branch 7 times, most recently from 62b5363 to cb357ac (March 20, 2026 14:20)
@timvisher-dd force-pushed the streaming-dedup branch 2 times, most recently from 773099b to ee1e040 (April 6, 2026 14:14)
@timvisher-dd force-pushed the streaming-dedup branch 2 times, most recently from 31c4d9c to c7ee08b (April 13, 2026 12:16)
@timvisher-dd force-pushed the streaming-dedup branch 4 times, most recently from 2ee10cd to 1913d2e (April 29, 2026 12:50)
@timvisher-dd marked this pull request as ready for review (April 29, 2026 12:58)
timvisher-dd added a commit that referenced this pull request Apr 30, 2026
Reconciled conflicts:
- README.org: keep both feature lists (streaming-dedup PR #7 entries
  plus queue-via-temp-gfm-compose-buffer entry).
- bin/test: keep streaming-dedup's ci.yml-driven runner and add
  markdown-mode dependency resolution from the queue branch (markdown-mode
  is now in the merged ci.yml). Drop set -euo pipefail to match project
  bash conventions; replace with explicit || exit 1 checks.
- tests/agent-shell-tests.el: keep both new test sections.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timvisher-dd force-pushed the streaming-dedup branch 3 times, most recently from 0676213 to 5a28f60 (May 11, 2026 13:47)
timvisher-dd and others added 3 commits May 12, 2026 10:28
GitHub Actions workflow with four jobs: readme-updated (PR-only,
guards the soft-fork features list), agent-symlinks (verifies the
multi-IDE plumbing), dependency-dag (require graph must be acyclic),
and test (byte-compile + ERT under emacs 29.4 with acp.el and
shell-maker as checkout deps).

bin/test parses ci.yml with yq and dispatches each step locally, so
CI changes are picked up automatically.  adapt_for_local rewrites
GitHub PR sha context to a single @{u}... three-dot range that the
git wrapper accepts.  CONTRIBUTING.org documents the runner and the
acp_root / shell_maker_root overrides.

.claude / .codex / .gemini and CODEX.md are symlinks pointing at
.agents and AGENTS.md so the same config works across Claude Code,
Codex, and Gemini CLI.  .agents/commands/live-validate.md describes
the live rendering-validation workflow.

README.org gets a "Features on top of agent-shell" section
enumerating the streaming-dedup work and follow-on polish.

agent-shell-devcontainer.el declares agent-shell-text-file-capabilities
to suppress a byte-compile warning for the cross-file reference.

Three send-command tests in tests/agent-shell-tests.el get :title
and :last-activity-time pre-seeded on their hand-rolled state alists
so the local runner produces a clean baseline.  Without the
placeholders, agent-shell--set-session-title's map-put! call fails
with map-not-inplace because the alists lack the keys.

Quote the keymap argument to shell-maker-define-major-mode.  Under
shell-maker 0.91.2 the macro expects a symbol it can resolve at mode
activation; passing the unquoted variable evaluates to the keymap
value before the macro can use it, and agent-shell-mode signals
(void-function keymap) when any test creates a fresh buffer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three tests cover the markdown table overlay pipeline end-to-end:
overlay structure for a static buffer, mid-stream cleanup so stale
overlays disappear when a row is rewritten, and a regression that
guards against table rows being split across visual lines.

The helpers inject ACP traffic via agent-shell--on-notification
and fire pending debounce timers when present, so the tests reflect
the real streaming path rather than direct markdown-overlays-put
calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lpers

Three new modules and the agent-shell.el integration that wires them
together:

- agent-shell-meta.el extracts toolResponse and terminal_output from
  the ACP meta envelope so streaming code can fold mixed-source tool
  output (terminal stream + final meta.toolResponse) without showing
  duplicate content.

- agent-shell-invariants.el is a runtime tracing and assertion
  library: a ring of recent ACP/UI events, process-mark and
  fragment-update guard wrappers, and a long-buffer head/tail
  snapshot included in violation reports.

- agent-shell-streaming.el is the streaming tool_call_update handler
  with dedup, including a label cache cleared on completion and the
  generalized title upgrade path that survives buffer-kill races.

agent-shell.el wires these in (require, on-notification dispatch,
process-mark/fragment guards, insert-cursor reset, defcustom for the
markdown-overlay debounce delay, and dropping session/update
handlers when the shell buffer has been killed).  agent-shell-ui.el
gains the invariants require and the UI plumbing the streaming
handler relies on.

Tests cover the dedup logic across mixed sources, the invariants
library's event ring and guard wrappers, and additional regression
coverage in agent-shell-tests.el (cancel with nil transcript-file,
markdown-overlay debounce buffer-kill race, "Thinking" label
restoration on agent_thought_chunk).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timvisher-dd timvisher-dd merged commit 25a3b75 into main May 12, 2026
4 checks passed