designs - typescript - graph by pgrayy · Pull Request #543 · strands-agents/docs

pgrayy · 2026-02-12T18:04:44Z

Description

Related Issues

Type of Change

New content
Content update/revision
Structure/organization improvement
Typo/formatting fix
Bug fix
Other (please describe):

Checklist

I have read the CONTRIBUTING document
My changes follow the project's documentation style
I have tested the documentation locally using mkdocs serve
Links in the documentation are valid and working

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

zastrowm · 2026-02-12T18:07:40Z

designs/typescript-graph.md

+  - The graph is marked `FAILED` if at least one node fails
+  - Concurrently running nodes are allowed to finish by default (graceful exit)
+  - Customers can override this for fail-fast behavior
+- `CANCELLED`: explicitly cancelled by the caller (e.g., via a hook that sets `cancelNode`)


Should we have INTERRUPTED for future usage?

It'll be included in v1. I only gave the blurb about it in the overview because there is already a lot of content in this doc to discuss.

JackYPCOnline · 2026-02-12T18:08:19Z

designs/typescript-graph.md

+enum Status {
+  PENDING = 'PENDING',
+  EXECUTING = 'EXECUTING',
+  COMPLETED = 'COMPLETED',


interrupt is a follow up?

Yes correct. It'll follow the same pattern as Python though which already went through a design review.

zastrowm · 2026-02-12T18:09:19Z

designs/typescript-graph.md

+```
+
+- Base state class shared across multi-agent patterns (Graph, Swarm, etc.)
+- `user`: mutable section for customer-defined state. Typed via Zod schema inference (`z.infer<typeof schema>`) so customers get compile-time type safety and runtime validation without writing generics manually. Customers update it directly (e.g., `state.user.counter += 1`).


so customers get compile-time type safety and runtime validation without writing generics

Can you expand on this a bit? Whereis/what is schema coming from in this case?

Wondering if AgentState should be using the same pattern

The schema is passed in by the user into the GraphBuilder. This will then give users a well typed state object they are free to manipulate in code through node functions, hook callbacks, etc.

An alternative approach I considered was using generics, but that became cumbersome because it would require making everything above MultiAgentState generic as well.

strands-agent · 2026-02-12T18:09:33Z

Documentation Deployment Complete

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-543/

zastrowm · 2026-02-12T18:09:44Z

designs/typescript-graph.md

+  readonly status: Status
+  readonly duration: number
+  readonly error?: Error
+  readonly output: ContentBlock[]


Might be worth having input on here for completeness TBH

We could. NodeInput, NodeState, and NodeResult are meant to be flexible and actually could share fields with one another. I actually wanted to consider combining everything into NodeState and will discuss that afterward. For now though, I think this is a good start because it mirrors what we did in Python.

JackYPCOnline · 2026-02-12T18:10:16Z

designs/typescript-graph.md

+class NodeResult {
+  readonly nodeId: string
+  readonly status: Status
+  readonly duration: number


Question : What is the trade off vs timestamp here

timestamp is a point in time where duration is a time delta. timestamp would be something like a DateTime type where as duration would be a number measuring seconds.

yes I mean we expose start & finish time instead of duration. user can have more extensibility if there is a use case.

dbschmigelski · 2026-02-12T18:10:35Z

designs/typescript-graph.md

+const graph = new GraphBuilder()
+  .addNode(researcher)
+  .addNode(writer)
+  .addNode(reviewer)


So does this mean in addNode we are going to add sugar so you can pass in Node or Agent and we'll just wrap as needed? Or are we making Agent actually implement Node

We wrap the Agent instance into an AgentNode instance on behalf of the customer.

zastrowm · 2026-02-12T18:10:54Z

designs/typescript-graph.md

+class NodeResult {
+  readonly nodeId: string
+  readonly status: Status
+  readonly duration: number


nit: we should normalize/standardize on a duration of some sort

Agreed. We did not do this in Python. There are multiple duration fields where some are measured in seconds, others in milliseconds. Here I figured seconds was a good standard.

mehtarac · 2026-02-12T18:11:35Z

designs/typescript-graph.md

+- Downstream nodes receive content blocks built from upstream dependency outputs
+
+```typescript
+class NodeResult {


should we track the parent node? would be None for the entry point node

I'm not sure we need to track it in result. Users can already see the graph structure directly going through graph.state and graph.config. But could be added here if there is a good reason. It is an additive thing which is backwards compatible.

Unshure

What proposed changes here are different than what we have in python?

Unshure · 2026-02-12T18:10:25Z

designs/typescript-graph.md

+### Node
+
+```typescript
+type NodeInput = string | ContentBlock[]


nit: Should this be more general?
https://github.com/strands-agents/sdk-typescript/blob/main/src/agent/agent.ts#L118

Yes this can be extended and made more general. We could use invokeArgs for the time being, but I figured I would consider a NodeInput incase we wanted to expand even further. But for now, probably worth replacing with InvokeArgs. Same for GraphInput. Could be replaced by InvokeArgs.

Unshure · 2026-02-12T18:12:23Z

designs/typescript-graph.md

+    maxNodeExecutions: 10,
+    executionTimeout: 60,
+    userSchema: z.object({
+      drafts: z.number().default(0),
+      approved: z.boolean().default(false),
+    }),


nit: Should all of these be their own add...() functions for the builder?

FWIW, I have not seen the builder pattern much in JS/TS.

IMHO the add methods would be just noise vs making it a config option

zastrowm · 2026-02-12T18:13:09Z

designs/typescript-graph.md

+  executionTimeout?: number
+  maxConcurrency?: number
+  hookProviders?: HookProvider[]
+  userSchema?: z.ZodObject<z.ZodRawShape>


Is there a way that external things can contribute? I'm thinking if you add a hook, what's the mechansim for a hook to add state as well that doesn't necessarily match this schema? Should that be allowed and/or is that desirable?

mehtarac · 2026-02-12T18:13:10Z

designs/typescript-graph.md

+
+```typescript
+interface NodeConfig {
+  timeout?: number


retry:? whether on timeout or error

Yeah we can add retry configs here. NodeConfig is setup to be additive so any more ideas you have pitch them.

JackYPCOnline · 2026-02-12T18:13:46Z

designs/typescript-graph.md

+- `multiAgentState`: reference to the current `MultiAgentState` (e.g., `GraphState`, `SwarmState`) for reading execution metadata and updating user state
+
+```typescript
+interface NodeConfig {


Agent as Node, Function and Multiagent as Node use same NodeConfig interface.
Anything else we should include?

We could create a MutliAgentNodeConfig or AgentNodeConfig if need be but right now I'm not seeing a use case. But they would extend NodeConfig if we decided on it.

mkmeral · 2026-02-12T18:14:06Z

designs/typescript-graph.md

+}
+```
+
+- Wraps a `MultiAgentBase` instance, enabling nested orchestration (e.g., a `Graph` as a node in another `Graph`)


Can we get the multiagent interface to match the agent one? From python, I think MultiAgentBase is pretty much the same as AgentBase, and I was thinking about getting rid of one

👍 had a similar question.

mkmeral · 2026-02-12T18:14:31Z

designs/typescript-graph.md

+- `state` is optional on `stream()`; if not provided, the node creates a default `NodeState` internally. Callers pass it to override node-scoped context (e.g., `executionCount` when a node is retried within the same invocation).
+
+```typescript
+class AgentNode extends Node {


Do we plan to do AgentBase for other remote agents?

dbschmigelski · 2026-02-12T18:14:36Z

designs/typescript-graph.md

+- Telemetry, hooks, and interrupts are planned for multi-agent patterns but follow closely from existing SDK patterns and should not require a separate design discussion
+- Session management will be reviewed separately
+
+## Interfaces


High level

This doc does a good job of telling us what we are doing. But it feels like the onus is on the reader to figure out how this differs from the python implementation. I think it would be helpful to call out early and explicitly exactly what we are changing (or potentially controversially keeping) compared to the existing impl.

For example, are we keeping the (questionable) OR edge logic we use in python, or are we going to use AND for consistency?

mkmeral · 2026-02-12T18:15:41Z

designs/typescript-graph.md

+
+Not covered in this document:
+
+- Swarm builds on the same core interfaces introduced here (Status, Node, streaming events) and should not require a separate design discussion


(outside scope of this doc) one thing i'd like to see is how we can actually get rid of the tool injection logic. I don't like that swarm is so intrusive.

Probably we should switch to structured output

zastrowm · 2026-02-12T18:16:05Z

designs/typescript-graph.md

+}
+```
+
+- Implements `MultiAgentBase` for uniform orchestration


What's in MultiAgentBase? Is the a world in which we ditch MultiAgentBase and just do AgentBase? In Python I think MultiAgentBase just includes serialize_state & deserialize_state and maybe those go on Agentbase instead? (almost like a snapshot)

mkmeral · 2026-02-12T18:17:55Z

designs/typescript-graph.md

+```typescript
+class GraphResult {
+  readonly status: Status
+  readonly results: Record<string, NodeResult>


Can we have a list of node results? This is one of the limitations we had with initial version of the graph. It didn't allow for cycles so each agent can have a single result.

With the cyclic graphs, each agent can generate multiple node results

mehtarac · 2026-02-12T18:18:15Z

designs/typescript-graph.md

@@ -0,0 +1,351 @@
+# TypeScript SDK - Graph - Design


high level q: Is there any deviation in the dev ex of using graphs in ts vs python?

Slight deviations yes. Apologies I didn't have time to include in the doc. I'll address after review.

mkmeral · 2026-02-12T18:19:17Z

designs/typescript-graph.md

+- Aggregate result from a graph execution
+
+```typescript
+class GraphState extends MultiAgentState {


again for cyclic graphs. what if one execution succeeded and one failed?

this is pretty annoying for Python bc we had to think about backwards compatibility with the already established data types/classes

zastrowm · 2026-02-12T18:20:01Z

designs/typescript-graph.md

+//                             |
+//                       formatOutput  (condition: approved)
+//
+const graph = new GraphBuilder()


Have we considered alternative/simpler devx's?

GraphBuilder.build( flows: [ [ researcher, writer], [ writer, reviewer], [ reviewer, writer, (state) => !state.user.approved], [ reviewer, formatOutput, (state) => !state.user.approved], ], options: { maxNodeExecutions: 10, executionTimeout: 60, userSchema: z.object({ drafts: z.number().default(0), approved: z.boolean().default(false), }), } )

or even

GraphBuilder.build( flows: [ [ researcher, writer, reviewer], [ reviewer, writer, (state) => !state.user.approved], [ reviewer, formatOutput, (state) => !state.user.approved], ], options: { maxNodeExecutions: 10, executionTimeout: 60, userSchema: z.object({ drafts: z.number().default(0), approved: z.boolean().default(false), }), } )

JackYPCOnline · 2026-02-12T18:20:03Z

designs/typescript-graph.md

+- Implements `MultiAgentBase` for uniform orchestration
+- `state` is optional; if not provided, the graph creates a fresh `GraphState` internally
+- Uses eager execution — as soon as a node completes, newly-ready downstream nodes start immediately (no batch waves)
+- A node is "ready" when all of its incoming edge sources have completed and all edge conditions evaluate to true


We get rid of any for sure right

mkmeral · 2026-02-12T18:20:51Z

designs/typescript-graph.md

+
+- Directed edge between two nodes
+- `handler`: optional function evaluated at runtime to determine whether the edge should be traversed
+- Unconditional edges are always traversed when the source node completes


so this is OR logic on edges, then?

mkmeral · 2026-02-12T18:22:01Z

designs/typescript-graph.md

+class GraphBuilder {
+  addNode(agent: Agent | MultiAgentBase | NodeHandler, config?: NodeConfig): GraphBuilder
+  addEdge(source: string, target: string, handler?: EdgeHandler): GraphBuilder
+  build(config?: GraphConfig): Graph


nit: if we are doing builder pattern, i'd expect the configs to be set-able on the builder directly, i.e. builder.set_timeout()

zastrowm · 2026-02-12T18:22:22Z

designs/typescript-graph.md

+
+Each graph invocation could be assigned a unique state ID. The graph engine would store and retrieve state through a session manager keyed by this ID, rather than holding state directly as fields on graph and node instances. When invoking a graph, customers could specify a state ID to resume from a previous invocation's state.
+
+This matters for concurrent isolation — if state lives as instance members on the graph or node objects, every invocation overwrites the same fields and concurrent calls interfere with each other. An in-memory session manager that stores state in a table (keyed by state ID) solves this without requiring external storage. It gives each invocation its own isolated state while keeping the same graph instance reusable across concurrent calls.


We can explore this, but I am curious on the repercussions; this would lean more towards returning a state "object" rather than having state on the graph/swarm

For graphs/swarm this might make a lot of sense to start with off the bat - though a POC would help here

zastrowm · 2026-02-12T18:23:27Z

designs/typescript-graph.md

+
+### Remote Node
+
+A remote node would execute on a different process or machine — wrapping an agent behind an API, an MCP server, or an A2A endpoint. It would serialize input, send it over the wire, and deserialize the response back into `ContentBlock[]`. The `Node` abstraction already supports this — a `RemoteNode` subclass would implement `_stream()` with network calls instead of local execution, and the graph engine wouldn't need to change since it calls `node.stream()` uniformly.


This differs from RemoteAgent in that it elevates the concept of remote execution outside of an agent? I like the idea.

There's nothing preventing this today in Python is there? Assuming we support it via FunctionNode?

zastrowm · 2026-02-12T18:23:51Z

designs/typescript-graph.md

+  .addNode(reviewer)
+  .addNode(formatOutput)
+  .addEdge('researcher', 'writer')
+  .addEdge('writer', 'reviewer')


Referencing by id/name is odd IMHO; why not pass the object itself

zastrowm · 2026-02-12T18:24:13Z

designs/typescript-graph.md

+  .addNode(formatOutput)
+  .addEdge('researcher', 'writer')
+  .addEdge('writer', 'reviewer')
+  .addEdge('reviewer', 'writer', (state) => !state.user.approved)


For extensibility I'd suggest:

Suggested change

.addEdge('reviewer', 'writer', (state) => !state.user.approved)

.addEdge('reviewer', 'writer', ({state}) => !state.user.approved)

But otherwise like elevating this here

mkmeral · 2026-02-12T18:26:28Z

TS vs Python Diff (what Kiro says)

Don't take it at face value, this is to augment the offline conversation right now

Comparison of the TS graph design against the current Python multiagent/graph.py implementation.

Node Architecture — Polymorphic hierarchy vs isinstance branching

Python uses a single GraphNode dataclass with executor: AgentBase | MultiAgentBase and dispatches via isinstance checks in _execute_node (~50 lines of branching).

TS introduces a polymorphic hierarchy: abstract Node → AgentNode, MultiAgentNode, FunctionNode. Each subclass implements _stream(), and the base stream() handles boilerplate (duration, status, error handling). The graph engine calls node.stream() uniformly — no type-checking needed.

FunctionNode is new — it wraps plain/async/generator functions via NodeHandler, enabling transformation, routing, and side-effect nodes without wrapping everything in an Agent. This is a gap in the Python SDK.

Execution Model — Eager vs batch-based

Python uses batch execution: nodes in a batch run in parallel via asyncio.Queue, then after the entire batch completes, newly-ready nodes are discovered and form the next batch.

TS uses eager execution: as soon as any node completes, downstream nodes that are now ready start immediately. No waiting for batch siblings.

This matters for graphs with mixed-latency nodes — in Python, a fast node in batch N+1 waits for all of batch N to finish, even if its specific dependency completed early.

TS also adds maxConcurrency to limit parallel node execution, which Python doesn't have.

Failure Semantics — Graceful vs fail-fast

Python is fail-fast: exceptions propagate up, sibling tasks in the same batch are cancelled immediately.

TS captures errors in NodeResult instead of throwing. Concurrently running nodes are allowed to finish (graceful exit). The graph is marked FAILED if any node fails. Downstream dependents of the failed node stay PENDING. Customers can override for fail-fast behavior.

TS also adds CANCELLED status and graph.cancel() for explicit graceful cancellation, which Python doesn't support.

User State — Typed via Zod vs no equivalent

TS introduces MultiAgentState.user typed via Zod schema inference (z.infer<typeof schema>), giving compile-time type safety and runtime validation. Customers pass userSchema in GraphConfig and mutate state.user directly from edge conditions and function nodes.

Python's GraphState has no user-defined state section. Edge conditions receive the full GraphState but there's no structured way to carry custom state across nodes.

Agent State Isolation

TS AgentNode always captures and restores agent messages/state per execution — the same agent instance can run multiple times without carrying over conversation history.

Python requires opting in via reset_on_revisit on the builder. Without it, revisited nodes accumulate state from prior executions.

Builder API — Config object vs setter methods

Python uses individual fluent setters: set_max_node_executions(), set_execution_timeout(), set_node_timeout(), set_graph_id(), etc.

TS consolidates into a single GraphConfig object passed to build():

.build({
  maxNodeExecutions: 10,
  executionTimeout: 60,
  maxConcurrency: 5,
  userSchema: z.object({ ... }),
})

Both auto-detect entry points (nodes with no incoming edges). Python additionally allows explicit set_entry_point().

Streaming Events — Typed objects vs dicts

Python yields plain dicts (events call .as_dict() before yielding).

TS yields typed event objects with discriminated unions via the type field, enabling pattern matching:

if (event.type === 'multiAgentNodeStartEvent') { ... }

Features in Python not covered by TS design (intentionally deferred)

Interrupts (_InterruptState, interrupt resumption, MultiAgentNodeInterruptEvent)
Session management (serialize_state / deserialize_state, SessionManager)
Hooks (BeforeNodeCallEvent, AfterNodeCallEvent, BeforeMultiAgentInvocationEvent, etc.)
OpenTelemetry tracing

The design doc explicitly states these follow existing SDK patterns and don't need separate design discussion.

Features in TS design not in Python

FunctionNode for arbitrary function/async/generator nodes
Typed user state via Zod schema
maxConcurrency limit
graph.cancel() for graceful cancellation
Graceful failure handling (errors in results, concurrent nodes finish)
Eager execution model
RemoteNode as a future consideration

zastrowm · 2026-02-12T18:27:20Z

designs/typescript-graph.md

+- Forwards input to the nested pattern and wraps its streaming events with the node's ID
+
+```typescript
+type NodeHandler = (input: NodeInput, state: MultiAgentState) => NodeHandlerResult | Promise<NodeHandlerResult> | AsyncGenerator<MultiAgentStreamEvent, NodeHandlerResult, undefined>


In general I think we should prefer objects as arguments instead of individual arguments:

Suggested change

type NodeHandler = (input: NodeInput, state: MultiAgentState) => NodeHandlerResult | Promise<NodeHandlerResult> | AsyncGenerator<MultiAgentStreamEvent, NodeHandlerResult, undefined>

type NodeHandler = ({input: NodeInput, state: MultiAgentState}) => NodeHandlerResult | Promise<NodeHandlerResult> | AsyncGenerator<MultiAgentStreamEvent, NodeHandlerResult, undefined>

Allows folks to pick and choose and allows us to extend in the future

designs - typescript - graph

58d340c