WIP: Add OCI skill image mounting to AgentRuntime#332
Conversation
b32eaa8 to
d3257c1
Compare
|
DO NOT MERGE at the current time. I would like feedback based on kagenti/kagenti#1342 |
d3257c1 to
274bd62
Compare
cwiklik
left a comment
There was a problem hiding this comment.
Solid implementation with proper feature gating, comprehensive tests (unit + E2E), clean separation of concerns (controller, webhook, config hash), and thorough docs. The ImageVolume K8s requirement (1.31+) is well-documented and the graceful degradation (condition + event when gate is disabled) is good UX.
Areas reviewed: Go (types, controller, webhook), Helm, CRD, Docs, Tests
Commits: 3 commits, all signed-off: yes
CI status: all passing (E2E pending manual trigger)
Suggestions (non-blocking)
1. PR body attribution (nit)
PR body ends with "Generated with Claude Code" — per repo conventions this should be "Assisted-By: Claude Code".
2. Commit hygiene (suggestion)
Commits 45e06bc ("include e2e tests for oci") and 4bd2f5a ("fixes due to code review") don't follow the imperative commit convention and are vague. Consider squashing into the main commit before merge.
3. Skill mounts applied to all containers (suggestion)
Skills are currently mounted into ALL containers including sidecars (envoy-proxy, spiffe-helper). For pods with AuthBridge injection, sidecars don't need skill files. Consider targeting only the agent container in a follow-up. Not a blocker for alpha — the extra read-only mounts are harmless — but worth tracking to avoid clutter in complex pod specs.
|
It's important to make sure that the mounted skills are listed in the AgentCard exposed by the agent running in Agent Runtime. There is a section in the AgentCard spec for that. https://agent2agent.info/docs/concepts/agentcard/ In my agent harness (https://github.com/redhat-et/docsclaw) it is implemented by the agent itself, but it would be good to have it implemented at the runtime level to make it agent-agnostic. Another important thing is ensure that images are mounted in containers read-only to avoid any risk of mutating them my malicious agents. If the Operator mounts them, it should be in its logic. |
bd605a8 to
2961db3
Compare
|
Please take a look at the SkillCard schema that I use in Skill Image: https://github.com/redhat-et/skillimage/blob/main/schemas/skillcard-v1.json |
2961db3 to
2f3bf03
Compare
Add kagenti.io/skills annotation on target workload metadata with a JSON array of mounted skill names for downstream discovery (agent card controllers, UI). The annotation is set when the skillImageVolumes feature gate is enabled and removed on skill clearing or AgentRuntime deletion. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ryan Cook <rcook@redhat.com>
2f3bf03 to
58f3815
Compare
|
@kevincogan one thing my brain is stuck on right now when we build the container image for an agent we build it with the agentcard and that agentcard is r/o. The stuck point I have is the OCI mounting for skills may be dynamic but the SKILL section of an agentcard is pretty much locked with our mechanism. Any advice or thoughts here? |
|
additionally @Ladas do you have any opinions here based on your work launching claude code and etc using the OCI mounting mechanisms |
@cooktheryan I don't think we should touch the signed card the agent serves. That stays locked down. But the Security-wise nothing changes. Verification (JWS or mTLS) still runs against the original card before any merging happens. The Your Let me know if I am missing anything. If not I can pick this up as a follow-up once yours lands. |
|
@cooktheryan should we set this PR as draft until ready to merge ? |
|
@pdettori yes for sure...i was feeling confident in the PR early then I realized how many pieces we have to tie in |
|
@cooktheryan @pavelanni @pdettori are you guys in sync with the initial community effort around OCI and skills here: https://github.com/agentskills/agentskills/discussions/292?ref=thomasvitale.com --- if will be best if we can make Kagnti as "generic" as possible and if we can join forces with the community effort and align the code it will be best. |
|
@eranra Yes, I reached out to Thomas Vitale on Slack and we are working on organizing a meeting. There is also a CNCF initiative around that: cncf/toc#1740 which I am participating in as well. |
@pavelanni Thanks for sharing ;-) I looked at the link/initiative, and it is indeed very interesting. I think we should also consider a more “shift-right” approach that automates processes and moves more of the intelligence and optimization into the runtime space. Focusing on the AI developer persona makes a lot of sense today, but as the skills and AI ecosystem evolves toward greater automation and iterative optimization, the outer loop will become just as important. In particular, the ability to automatically improve, adapt, and incorporate new skills over time will be critical for long-term scalability and operational efficiency. I think that dynamic interaction with skills is a characteristic we need to consider in the interface between agents and skills. |
Summary
skillsfield toAgentRuntimeSpecfor declaring OCI skill images to mount into agent pods as Kubernetes ImageVolumesskillImageVolumesfeature gate (default off), requires Kubernetes 1.31+FROM scratchimages withskill.yaml+SKILL.mdmountPath, making the feature framework-agnostic (Claude, Cursor, custom agents, etc.)Example
Changes
api/v1alpha1/agentruntime_types.goSkillImageRef,SkillPullPolicy,Skillsfieldinternal/webhook/config/feature_gates.goSkillImageVolumes(default false)internal/controller/agentruntime_controller.go,agentruntime_skills.gointernal/controller/agentruntime_config.gointernal/webhook/v1alpha1/agentruntime_webhook.gocmd/main.godocs/api-reference.md,docs/architecture.mdconfig/samples/agent_v1alpha1_agentruntime_skills.yaml, updated_full.yamlcharts/kagenti-operator/values.yaml, CRD YAMLagentruntime_skills_test.go,agentruntime_webhook_test.goRelationship to ConfigMap-based skill linking (kagenti/kagenti#1440)
This feature complements the ConfigMap-based skill mounting in kagenti/kagenti#1440. Both deliver skill files into agent pods, but target different maturity stages from kagenti/kagenti#1342:
/app/skills/<name>Integration opportunities for discussion
SKILL_FOLDERSenv var — #1440 setsSKILL_FOLDERSso agents discover mounted skills. This operator feature could inject the same env var so agents work transparently with both delivery mechanisms.kagenti.io/skillsannotation — #1440 stores linked skills in this annotation. The operator could write this annotation when skills are declared on the AgentRuntime CR, enabling the UI/backend to display OCI-mounted skills alongside ConfigMap-mounted ones.Coexistence — Both mechanisms can coexist on the same pod. ConfigMap volumes use names like
skill-0,skill-1; OCI ImageVolumes useskill-<name>. Different volume types, different names, no conflicts.Migration path — ConfigMap skills work today on any K8s version. OCI ImageVolume skills are the upgrade path when clusters reach K8s 1.31+. Teams can adopt incrementally.
Test plan
make manifests generate— CRD and deepcopy regeneratedgo build ./...— compiles cleanlygo test ./internal/controller/ ./internal/webhook/...— all tests passAssisted-By: Claude Code