(Feat) Implement remote Agent type #821
Conversation
Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
holy moly, there's a lot of merge conflicts.
update: resolved merge conflicts locally. unsure if it's all still working. need to test it again + do a self code review to make sure I didn't remove any work from main during merging. need to find my reproduction setup again :rip:
Pull Request Overview
This PR implements support for remote agents in the kagent system, allowing for A2A (Agent-to-Agent) communication with agents hosted on external servers. The implementation includes the creation of a new "Remote" agent type that uses agent cards to discover and communicate with external agents.
Key changes include:
- Added "Remote" agent type with agent card URL and optional server URL configuration
- Implemented UI components for creating, editing, and previewing remote agents
- Created middleware for recording remote agent interactions in the database
- Extended the A2A protocol handling to support remote agent task management
Reviewed Changes
Copilot reviewed 35 out of 37 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| ui/src/types/index.ts | Added Remote agent type and RemoteAgentSpec interface |
| ui/src/lib/messageHandlers.ts | Enhanced token usage extraction for remote agents and improved artifact streaming |
| ui/src/components/sidebars/AgentDetailsSidebar.tsx | Updated to handle remote agents without model display |
| ui/src/components/create/SelectToolsDialog.tsx | Added null safety for search terms and descriptions |
| ui/src/components/AgentsProvider.tsx | Added remote agent validation and form data types |
| ui/src/components/AgentCardPreview.tsx | New component for previewing remote agent cards |
| ui/src/components/AgentCard.tsx | Added preview functionality for remote agents |
| ui/src/app/agents/new/page.tsx | Extended agent creation form with remote agent fields |
| ui/src/app/actions/utils.ts | Fixed error response to include error field |
| ui/src/app/actions/agents.ts | Added getAgentCard API call and remote agent form handling |
| go/api/v1alpha2/agent_types.go | Added Remote agent type and RemoteAgentSpec to CRD |
| go/internal/controller/translator/adk_api_translator.go | Added remote agent card fetching and translation |
| go/internal/controller/reconciler/reconciler.go | Enhanced reconciler to handle remote agents and store agent cards |
| go/internal/a2a/recording_manager.go | New recording manager for remote agent task/session storage |
| go/internal/a2a/a2a_handler_mux.go | Updated A2A handler to use recording manager for remote agents |
| go/internal/httpserver/handlers/agents.go | Added agent card retrieval endpoint |
| go/internal/database/models.go | Extended Agent model with remote config and agent card storage |
EItanya left a comment:
The overall design makes sense, but there are some details that need to be worked out before we can merge this.
go/internal/a2a/recording_manager.go (Outdated)
```go
if err := m.dbClient.StoreTask(task); err != nil {
	logger.Error(err, "Failed to store sync task", "taskID", task.ID, "contextID", task.ContextID)
}
```
Why are we failing silently here? The whole point of this functionality is to record this.
Done, I did a more explicit get -> if not exist, create for the Session.
If a session did not exist (or get created successfully), then we do not store a task. This assumes a session -> tasks relation. If we allow Tasks to exist without a Session, I could change this to always create a Task.
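The "get -> create if missing, then store" flow described above can be sketched roughly as follows. This is a minimal illustration with stand-in types (`dbClient`, `Session`, `Task` here are hypothetical, not kagent's actual database API), showing a task only being recorded once its session resolves, instead of swallowing the error:

```go
// Hypothetical sketch of the explicit get -> create-if-missing Session flow.
// Types and method names are illustrative stand-ins, not kagent's real API.
package main

import "fmt"

type Session struct{ ID string }
type Task struct{ ID, ContextID string }

type dbClient struct{ sessions map[string]Session }

// getOrCreateSession returns the existing session or creates a new one.
func (c *dbClient) getOrCreateSession(id string) (Session, error) {
	if s, ok := c.sessions[id]; ok {
		return s, nil
	}
	s := Session{ID: id}
	c.sessions[id] = s
	return s, nil
}

// storeTask refuses to record a task whose session cannot be resolved,
// surfacing the error instead of failing silently.
func (c *dbClient) storeTask(task Task) error {
	if _, err := c.getOrCreateSession(task.ContextID); err != nil {
		return fmt.Errorf("resolve session %s: %w", task.ContextID, err)
	}
	// ... persist the task here ...
	return nil
}

func main() {
	c := &dbClient{sessions: map[string]Session{}}
	err := c.storeTask(Task{ID: "t1", ContextID: "ctx-1"})
	fmt.Println(err == nil) // true: the session was created, then the task stored
}
```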
```go
// Marshal remote agent's AgentCard to store in DB
var serializedCard string
if agent.Spec.Type == v1alpha2.AgentType_Remote && agentOutputs != nil {
```
How would a remote agent have a card stored from the translation? If we're going to store the agent card we should do it in a separate controller from this one since there's this required async logic of the card.
```go
	Description: toolAgent.Spec.Description,
})
case v1alpha2.AgentType_Remote:
	/* TODO: Add support for remote agents.
```
The translator is a pure function; if we are going to make a network call, we should do it in the reconciler.
Makes sense. I just need to figure out how the polling of the agent card ties into this a bit better. I was/am planning on handling polling logic in a follow-up to avoid blowing up this PR with diffs.
Self-thought/note:
I think we do something along these lines for MCP tools, where there's a watcher that updates agents when mcp tools update. So something like that would be done for remote agent-as-a-tool, where updates would occur when they get re-polled (or when the remote agent reconciles).
Additionally, doing something like storing a hash of the agent card in the remote agent's annotations could kick off its reconciliation, which should then update the usages of it as a tool from agents watching it.
Would probably be worth huddling (unless this is an easy yes/no). For using it as a tool we can:
- Add a watcher, so when remote agents reconcile, agents using them as a tool do so as well
- During reconciliation, remote agent urls/configs are updated based on latest data (from the database)
This way it stays up to date.
When we implement polling, we'll need to see how to handle it (in my opinion, adding an annotation on remote agents that is a hash of the config should be good; this way it forces a reconciliation + caller agents update). I'm trying this locally. It works in theory, but I'm getting a 503 adk error when the caller agent tries calling the remote agent 🤔
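The "hash of the config as an annotation" idea above can be sketched like this. Everything here is hypothetical (the annotation key `kagent.dev/agent-card-hash` and the function names are made up for illustration); the point is just that a changed card produces a changed hash, and updating the annotation mutates the object, which would trigger a new reconcile:

```go
// Hypothetical sketch: detect agent-card changes by hashing the card into an
// annotation. Annotation key and helper names are illustrative, not kagent's.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const cardHashAnnotation = "kagent.dev/agent-card-hash" // hypothetical key

// cardHash returns a stable hex digest of the serialized agent card.
func cardHash(card []byte) string {
	sum := sha256.Sum256(card)
	return hex.EncodeToString(sum[:])
}

// needsResync reports whether the recorded hash is stale, and the fresh hash.
func needsResync(annotations map[string]string, card []byte) (bool, string) {
	h := cardHash(card)
	return annotations[cardHashAnnotation] != h, h
}

func main() {
	ann := map[string]string{}
	card := []byte(`{"name":"remote-agent","url":"http://example"}`)

	stale, h := needsResync(ann, card)
	fmt.Println(stale) // true: no hash recorded yet, so reconcile
	ann[cardHashAnnotation] = h

	stale, _ = needsResync(ann, card)
	fmt.Println(stale) // false: card unchanged, nothing to do
}
```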
Why do we need a whole separate agent for this? Can't we just treat it as remote in the tests?
I'm going to play around with this idea a bit more.
Initially I was against it because it would mean the BYO kebab Agent needs to be created first, so that its deployment + service exist and the Remote Agent can reference the service URL. But that should be simple to deal with.
Now, after trying it out: the remote agent test passes, but when I try to chat with the remote agent...
Response in chat:
Client error '403 Forbidden' for url 'http://kagent-controller.kagent:8083/api/sessions/ctx-a6f000bf-7e85-4465-ab06-f08dd9be04d7/events?user_id=admin@kagent.dev' For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403
Kebab agent's deployment:
INFO: 10.244.0.51:43074 - "POST / HTTP/1.1" 200 OK
ERROR:google_adk.kagent.adk._agent_executor:Error handling A2A request: Client error '403 Forbidden' for url 'http://kagent-controller.kagent:8083/api/sessions/ctx-a6f000bf-7e85-4465-ab06-f08dd9be04d7/events?user_id=admin@kagent.dev'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403
Traceback (most recent call last):
File "/.kagent/python/packages/kagent-adk/src/kagent/adk/_agent_executor.py", line 125, in execute
await self._handle_request(context, event_queue, runner)
File "/.kagent/python/packages/kagent-adk/src/kagent/adk/_agent_executor.py", line 206, in _handle_request
async for adk_event in runner.run_async(**run_args):
...<4 lines>...
await event_queue.enqueue_event(a2a_event)
File "/.kagent/python/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 250, in run_async
async for event in agen:
yield event
File "/.kagent/python/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 228, in _run_with_trace
await self._append_new_message_to_session(
...<5 lines>...
)
File "/.kagent/python/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 357, in _append_new_message_to_session
await self.session_service.append_event(session=session, event=event)
File "/.kagent/python/packages/kagent-adk/src/kagent/adk/_session_service.py", line 168, in append_event
response.raise_for_status()
~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/.kagent/python/.venv/lib/python3.13/site-packages/httpx/_models.py", line 829, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '403 Forbidden' for url 'http://kagent-controller.kagent:8083/api/sessions/ctx-a6f000bf-7e85-4465-ab06-f08dd9be04d7/events?user_id=admin@kagent.dev'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403
INFO: 10.244.0.1:51740 - "GET /health HTTP/1.1" 200 OK
INFO: 10.244.0.1:46628 - "GET /health HTTP/1.1" 200 OK
I'll need to dig into this, but it's probably because the remote agent's a2a handler stores information in the DB and the other agent does so as well (since byo + declarative agents also store in the db themselves). I'm not even sure this would be a "real world" issue, since a user likely wouldn't define an agent created by kagent as a remote agent (unless there are valid cross-cluster use cases for this, but in that case it would be good to give the remote agent config a bool field to not store in the db).
Yeah, so while the test (somehow) passes, the session storage is a big issue if the remote agent writes to the database as well.
kagent >> get session
+---+------------------------------------------+--------------------+--------------------------------+----------------------+
| # | ID | NAME | AGENT | CREATED |
+---+------------------------------------------+--------------------+--------------------------------+----------------------+
| 1 | ctx-5b33e4b2-d2d6-4854-955c-0f6dea1a282f | remote-kebab-agent | kagent__NS__remote_kebab_agent | 2025-09-17T22:54:21Z |
+---+------------------------------------------+--------------------+--------------------------------+----------------------+
The remote agent "owns" this session ID, so when the agent it communicates with tries accessing/writing it, it fails, which makes sense.
So what happens is that
- The remote Agent handler creates a Session (as expected)
- It communicates with the other Agent
- The other agent was created by us, meaning it has DB access
- The other agent tries creating/getting the Session
- The other agent fails and returns an error, because it already exists and is owned by the Remote Agent.
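The ownership conflict in the steps above can be sketched with hypothetical types (this is not kagent's actual session store API, just an illustration of why the second create fails):

```go
// Minimal sketch of the session-ownership conflict: the remote-agent handler
// creates the session first, so the callee agent's create fails because the
// session already exists under a different owner. Types are illustrative.
package main

import (
	"errors"
	"fmt"
)

type session struct{ owner string }

type sessionStore struct{ sessions map[string]session }

// createSession fails if the session already exists under another owner.
func (s *sessionStore) createSession(id, owner string) error {
	if existing, ok := s.sessions[id]; ok && existing.owner != owner {
		return errors.New("session " + id + " already owned by " + existing.owner)
	}
	s.sessions[id] = session{owner: owner}
	return nil
}

func main() {
	store := &sessionStore{sessions: map[string]session{}}

	// 1. The remote Agent handler creates the Session.
	fmt.Println(store.createSession("ctx-123", "remote-handler")) // <nil>

	// 2. The callee agent (also kagent-managed, with DB access) tries the same
	//    context ID and hits the ownership conflict.
	fmt.Println(store.createSession("ctx-123", "kebab-agent"))
}
```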
If we want to allow the case where a remote agent is one which already writes its sessions to the DB, then we'll want to add a spec field to disable the database writing -- or do something else where it passes a different context/id to the remote agent. The main issue here would be figuring out how to handle their own sessions/display.
```yaml
[...]
type: Remote
remote:
  [urls...]
  persistData: false # or persistTask/Session, persist, store, etc.
```
* main:
  Fix UI/streaming timeouts for long running LLM requests (kagent-dev#907)
  fix helm value for env (kagent-dev#910)
  feat: allow per-agent header configuration for tools (kagent-dev#884)
  feat: Set system message from ConfigMap or Secrets (kagent-dev#894)
Signed-off-by: Peter Jausovec <peter.jausovec@solo.io>
Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
```go
case v1alpha2.AgentType_Remote:
	cfg.RemoteAgents = append(cfg.RemoteAgents, adk.RemoteAgentConfig{
		Name:        utils.ConvertToPythonIdentifier(utils.GetObjectRef(toolAgent)),
		Url:         agent.Spec.Remote.DiscoveryURL,
		Headers:     headers,
		Description: toolAgent.Spec.Description,
	})
```
Q: Since we're not using Watches, does this mean that if a remote agent's discovery URL is updated, any Agent referencing it as a tool won't pick it up until it manually reconciles?
(iirc, the deployments for declarative agents have their agent-as-tools URLs "baked in" to their secrets, which update on reconciliation.)
When will this be merged into kagent? This is a feature that I would find very valuable.

This pull request has been marked as stale because of no activity in the last 15 days. It will be closed in the next 5 days unless it is tagged "no stalebot" or other activity occurs.

When will this feature be merged? It is very important for us to be able to use remote agents.
leftovers (passing to peter):
- look into why the watch + reconciliation isn't happening. aa2c5f1: it was working during this commit, which was right before making changes to simplify the CRD. maybe I just need to re-spin up the cluster, or I forgot a change.
- `OnSendMessageStream`: this way we 100% have a finalized Task to store. A stream could end with a `StatusUpdate` `final:true` without a `Task` object, meaning we'd have to have built it on our end during the stream.

Context
Adding support for a2a with agents hosted on a remote server.
My goal for this work is to encompass the simple bare necessities 🐻 of remote agent support, which is usable (although it requires manual reconciliation for agent card updates), then immediately work on follow-ups (polling).
Changes
TODO: Update description based on new simpler CRD (discoveryUrl)
- `Remote` type for Agents. This `remote` agent type allows for a2a communications with agents served elsewhere.
- `error` field to catch server errors on creation: our error wrapper only set a `message`, so I ensured it also sets `error`.
- `remoteConfig`: similar to existing config, but for remote agents. Reused existing remote config to store the main remote agent information.
- `agentCard`: only used for remote agents, for UI/display purposes if a user wants to see the agent card details.
- Considered storing the `agentCard` instead of `remoteConfig` in the db -- they hold similar information (name, description, url); the agent card just holds more.
- Considered storing the agent card data on the agent status instead of the db -- this is similar to something done in gloo portal, where it stores the api discovery information on the status of some(?) custom resource def.
- Considered fetching on `agentCard` preview, so not storing the fetched data -- good for saving storage, not ideal UX.
- Went with `remoteConfig`, which includes the agent card as `[]byte` data. I think this makes the most sense for simplicity's sake.

TODO/Unknown
- note: i need to implement a new a2a server with task handling. it looks like an update to https://github.com/kagent-dev/a2a-go is required, as this is where we handle our a2a server logic. unless we want this stateful_a2a logic to live here, but this may be a bit weird to split. we wouldn't make updates to the upstream repo our a2a-go is forked from, since these changes are kagent-specific.
- crossing off for now while looking into this to avoid a net-new service + cross-repo changes. maybe by creating a separate task manager for remote services, this can be handled? else, we'll need to expand a2a-go to implement a net-new a2a server.

Follow-Ups
This is a list of work I believe would work best as follow-ups, in order to keep this PR at a reasonable size (diffs). I'm planning on implementing these after this gets merged, or is close to merge.
Updates
resolves: #820