Update write-kaggle-benchmarks skill for v0.5.0 SDK + CLI updates by kaggle-agent · Pull Request #11 · Kaggle/kaggle-skills

kaggle-agent · 2026-06-04T16:34:08Z

Syncs write-kaggle-benchmarks/SKILL.md with recent updates to its two dependency skills (kaggle-benchmarks and kaggle-cli).

Changes

1. Model slug format (correctness fix — old guidance was inverted)

All -m / dict-key examples switched from google/gemini-3.5-flash / anthropic/claude-haiku-4-5 to bare canonical slugs (gemini-2.5-pro, claude-sonnet-4).
Slug gotcha rewritten to teach the CLI's normalization rules (prefix stripped, @ → -, output always canonical) and tell agents to standardize on the bare slug.
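
As a rough illustration of the normalization rules listed above (prefix stripped, `@` → `-`, output always canonical), here is a minimal sketch. The function name `normalize_slug` and the `some-model@v2` input are hypothetical and exist only for this example; this is not the CLI's actual implementation, which may handle more cases.

```python
def normalize_slug(slug: str) -> str:
    """Illustrative only: mimic the normalization rules described in this PR.

    Assumptions (not taken from the CLI source):
    - a provider prefix is everything up to and including the last "/"
    - every "@" is replaced with "-"
    """
    bare = slug.rsplit("/", 1)[-1]  # drop a provider prefix such as "google/"
    return bare.replace("@", "-")   # "@" becomes "-"


if __name__ == "__main__":
    for raw in ("google/gemini-3.5-flash", "claude-haiku-4-5", "some-model@v2"):
        print(f"{raw} -> {normalize_slug(raw)}")
```

Because a bare slug passes through unchanged, standardizing on the bare form means the value an agent writes and the value the CLI echoes back stay identical.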

2. New "SDK features beyond the basics" section covering v0.5.0 additions absent from the recipe:

Return-type annotation rule for tasks that return a value.
store_task=False for sub-tasks.
Run object properties (run.passed, run.result, run.assertion_results).
reasoning= and temperature= on llm.prompt(), plus kbench.last_reasoning_traces().
Video / audio content types.
chats.fork() and contexts.enter() for branching / isolated histories.
ChatRoom + Participant multi-agent pattern (Pattern I.5).
Custom assertions via @assertion_handler() and assert_tool_was_invoked.
kbench.benchmark as an alias for kbench.task.

3. CLI nuances added to Gotchas:

Dataset detach warning when re-pushing without -d.
-f -s together to backfill source notebooks into cached runs.
Sequential (non-interleaved) log streaming behavior.

4. Left alone (already correct): pacing / checkpoint discipline, silent-no-op .run() gotcha, init/auth env vars, --wait/--poll-interval semantics, repeated-flag gotcha, delete note, publish flag.

Why a human PR

Filed by Marvin directly (rather than via sloppy) because the orchestrator currently hardcodes branch=ci when cloning, and this repo's default is main — sloppy task limagoog-20260604162052-ad482b13 failed for that reason.

google-cla · 2026-06-04T16:34:19Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

dolaameng · 2026-06-04T16:38:17Z


 #### LLM resolution precedence (highest → lowest):
-1. **Explicit model in code**: `task.run(llm=kbench.llms["google/gemini-3.5-flash"])`
+1. **Explicit model in code**: `task.run(llm=kbench.llms["gemini-2.5-pro"])`


Let's revert all "gemini-2.5-pro" to "gemini-3.5-flash". But we don't need the prefix like "google/" and "anthropic/" any more for these models.

Done in 342365b — reverted to gemini-3.5-flash / claude-haiku-4-5 (bare, no prefix). The slug-normalization gotcha already calls out that prefixes are unnecessary.

Sync the recipe with its two dependency skills (kaggle-benchmarks and kaggle-cli):

- Switch model examples to bare canonical slugs (gemini-2.5-pro, claude-sonnet-4) and rewrite the slug gotcha to reflect the CLI's normalization rules.
- Add CLI nuances: dataset-detach warning on re-push without -d, -f -s to backfill source notebooks into cached runs, sequential log streaming behavior.
- New "SDK features beyond the basics" section covering v0.5.0 additions: return-type annotation rule, store_task=False for sub-tasks, run object properties, reasoning=/temperature= on llm.prompt(), video/audio content types, chats.fork()/contexts.enter(), ChatRoom + Participant multi-agent pattern, @assertion_handler() and assert_tool_was_invoked, kbench.benchmark alias.

@dolaameng

Per @dolaameng: keep the original example model names but drop the provider prefixes (google/, anthropic/) — the canonical-bare-slug guidance still holds.

nicholaskang-us · 2026-06-04T17:52:14Z

+
+### Multimodal inputs
+```python
+from kaggle_benchmarks.content_types import videos, audios


@dolaameng - 1) what about image inputs? 2) can we clarify that video input is for youtube only

Addressed in 887047d per the videos and images cookbook recipes:

Images — added the three images.from_url/from_path/from_base64 constructors and a note on the llm.prompt (auto-base64) vs user.send (raw URL pass-through) distinction.

Videos — made the constraint explicit: YouTube URLs only, select Gemini models only; local files / non-YouTube URLs will error.
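
Since local files and non-YouTube URLs are stated to error, a caller might want to check a URL before building a video input. The sketch below is one way to do that with the standard library only. The helper name `is_youtube_url` and the host list are assumptions made for this example, not part of the kaggle_benchmarks SDK, and the SDK's own validation may differ.

```python
from urllib.parse import urlparse

# Assumed set of YouTube hostnames; the SDK may accept a different set.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True if `url` is an http(s) URL whose host looks like YouTube."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # local paths and other schemes are rejected
    return (parsed.hostname or "") in YOUTUBE_HOSTS


if __name__ == "__main__":
    for candidate in (
        "https://www.youtube.com/watch?v=abc123",
        "https://example.com/clip.mp4",
        "/tmp/clip.mp4",
    ):
        print(candidate, is_youtube_url(candidate))
```

A check like this only guards the URL shape; whether the selected model supports video input still has to be verified separately.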

@nicholaskang-us

Per @nicholaskang-us review comment on PR #11. Expands the Multimodal inputs section to (1) show the three images.from_* constructors plus the llm.prompt vs user.send URL-handling distinction, and (2) make the YouTube-only constraint for videos explicit.

Adds an **Advanced use cases** section to `write-kaggle-benchmarks/README.md` pointing agents at the two upstream skill files when they need surface area beyond the focused recipe:

- [`kaggle-benchmarks/SKILL.md`](https://github.com/Kaggle/kaggle-benchmarks/blob/ci/skills/kaggle-benchmarks/SKILL.md): full SDK reference
- [`kaggle-cli/skills/references/benchmarks.md`](https://github.com/Kaggle/kaggle-cli/blob/main/skills/references/benchmarks.md): full CLI reference

Motivated by the [2026-06-04 skill-test rerun](#11) where many rubric misses came from SDK surface the focused recipe intentionally doesn't name (`assert_not_empty`, `kbench.system.send`, `extract_code` + `script_runner`, `assess_response_with_judge`, `.evaluate(llm=[...])`, `chats.new`). Pointing agents at the upstream files lets them self-serve when those advanced patterns come up.

---------

Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

dolaameng reviewed Jun 4, 2026


kaggle-agent added 2 commits June 4, 2026 16:52

Use gemini-3.5-flash / claude-haiku-4-5 in examples per review

550de0c


kaggle-agent force-pushed the marvin/update-write-kaggle-benchmarks-skill branch from 342365b to 550de0c Compare June 4, 2026 16:52

dolaameng approved these changes Jun 4, 2026


dolaameng requested a review from nicholaskang-us June 4, 2026 17:39

nicholaskang-us reviewed Jun 4, 2026


nicholaskang-us approved these changes Jun 4, 2026


dolaameng merged commit d823e96 into main Jun 4, 2026
6 checks passed

dolaameng deleted the marvin/update-write-kaggle-benchmarks-skill branch June 4, 2026 17:58

kaggle-agent mentioned this pull request Jun 4, 2026

Add advanced use cases section to write-kaggle-benchmarks README #12

Merged

dolaameng merged 3 commits into main from marvin/update-write-kaggle-benchmarks-skill
