Merged
3 changes: 2 additions & 1 deletion .claude-plugin/plugin.json
@@ -22,5 +22,6 @@
"vector-search",
"data-engineering"
],
- "skills": "./skills/"
+ "skills": "./skills/",
+ "commands": "./commands/"
}
6 changes: 6 additions & 0 deletions .github/workflows/validate-manifest.yml
@@ -9,6 +9,9 @@ on:
- 'scripts/skills.py'
- 'manifest.json'
- '.claude-plugin/**'
- 'hooks/**'
- 'commands/**'
- 'tests/**'
push:
branches:
- main
@@ -28,3 +31,6 @@ jobs:

- name: Validate manifest is up to date
run: python3 scripts/skills.py validate

- name: Test plugin hooks
run: python3 -m unittest discover -s tests -p '*_test.py' -v
35 changes: 35 additions & 0 deletions CLAUDE.md
@@ -47,6 +47,41 @@ python3 scripts/skills.py sync # sync Codex metadata + icons only
python3 scripts/skills.py validate # check Codex metadata + icons + manifest are up to date (CI)
```

## Plugin components (hooks + commands)

Beyond skills, the Claude Code plugin ships two component dirs at the repo root.
`commands/` is declared via `"commands"` in `.claude-plugin/plugin.json`, but
**`hooks/hooks.json` is auto-loaded by Claude Code and must NOT be declared**
there. Declaring the standard path double-loads it and fails the plugin with a
"Duplicate hooks file" error.

- `hooks/`: a UserPromptSubmit prompt router (`databricks-router.py`) that
steers Databricks-related prompts into the skills, a SessionStart context
primer (`databricks-context.py`), and a PostToolUse auth-failure hinter
(`databricks-auth-helper.py`), wired via `hooks/hooks.json`. All
stdlib-only and fail-open. See [hooks/README.md](./hooks/README.md).
- `commands/`: friction-only slash commands (`/databricks:setup`,
`/databricks:doctor`). Product workflows stay in the skills, not commands, to
avoid shadowing a skill of the same name.

`python3 scripts/skills.py validate` checks these (hooks.json is valid and
references existing scripts, plugin.json does not double-declare hooks, every
command has frontmatter). After changing hook behavior, run the hook test
suite: `python3 -m unittest discover -s tests -p '*_test.py'`.
These ship via the plugin marketplace
(whole-repo source); `databricks aitools install` currently installs skills only.
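
As a rough illustration of the first two checks listed above, the sketch below flags a `plugin.json` that re-declares the standard hooks file and a `hooks.json` that points at a missing script. It is not the code in `scripts/skills.py`: the function names are hypothetical, and it assumes hook entries carry a `"command"` string that references scripts as `${CLAUDE_PLUGIN_ROOT}/<path>.py`.

```python
import json
import re
from pathlib import Path

# Assumed shape of a script reference inside a hook "command" string.
SCRIPT_REF = re.compile(r"\$\{CLAUDE_PLUGIN_ROOT\}/([^\s\"']+\.py)")


def _command_strings(node):
    """Yield every string stored under a "command" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "command" and isinstance(value, str):
                yield value
            else:
                yield from _command_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from _command_strings(item)


def check_plugin_components(repo_root):
    """Return a list of problems; an empty list means the checks passed."""
    root = Path(repo_root)
    errors = []

    plugin = json.loads((root / ".claude-plugin" / "plugin.json").read_text())
    declared = plugin.get("hooks")
    if isinstance(declared, str) and declared.lstrip("./") == "hooks/hooks.json":
        errors.append("plugin.json declares hooks/hooks.json (Duplicate hooks file)")

    try:
        hooks = json.loads((root / "hooks" / "hooks.json").read_text())
    except (OSError, ValueError) as exc:
        return errors + [f"hooks/hooks.json is not readable JSON: {exc}"]

    for command in _command_strings(hooks):
        for rel_path in SCRIPT_REF.findall(command):
            if not (root / rel_path).is_file():
                errors.append(f"hooks.json references missing script: {rel_path}")
    return errors
```

Returning a list rather than raising lets a caller report every problem in one run, which tends to suit a CI validation step.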

**Marketplace entries are load-bearing for installed plugins.** Never remove a
shipped plugin's entry from `.claude-plugin/marketplace.json` (and never rename
the plugin or the marketplace). Claude Code re-resolves installed plugins
against the marketplace catalog at load time, so removing the entry does not
just stop updates: every existing install immediately fails to load ("Plugin
databricks not found in marketplace databricks-agent-skills") and those users
lose all skills, hooks, and commands until they manually uninstall and
reinstall from another source. Verified empirically (2026-06). Listing the
plugin on an additional marketplace, such as Anthropic's official directory,
is additive and never replaces the entry here.

## Security

When documenting examples, obfuscate sensitive info:
28 changes: 28 additions & 0 deletions CONTRIBUTING.md
@@ -52,6 +52,34 @@ python3 scripts/skills.py validate

If validation fails the error tells you which file is missing or stale; the fix is always `python3 scripts/skills.py generate` and committing the result.

## Plugin components (hooks + commands)

The Claude Code plugin ships more than skills:

- `hooks/`: `hooks.json` wires a UserPromptSubmit prompt router
(`databricks-router.py`) that steers Databricks-related prompts into the
skills, a SessionStart context primer (`databricks-context.py`), and a
PostToolUse auth-failure hinter (`databricks-auth-helper.py`). All
stdlib-only and fail-open. See [`hooks/README.md`](./hooks/README.md). Each
hook's behavior is pinned by its matching `tests/*_test.py` file; run the
suite with `python3 -m unittest discover -s tests -p '*_test.py'`.
**`hooks/hooks.json` is auto-loaded by Claude Code, so do NOT add a `"hooks"`
key to `.claude-plugin/plugin.json`, or the plugin fails to load with a
"Duplicate hooks file" error.**
- `commands/`: one `*.md` per slash command (`/databricks:<name>`), declared via
`"commands"` in `.claude-plugin/plugin.json`. Each needs frontmatter
(`description`, optional `argument-hint`, `allowed-tools`).

`scripts/skills.py validate` (run in CI) checks that `hooks/hooks.json` is valid
JSON referencing scripts that exist, that plugin.json does not double-declare the
standard hooks file, and that every command carries a `description` (quoted if it
contains a `:`, since strict YAML rejects unquoted colons). The validate
workflow also runs all hook test files.
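
The command-frontmatter rule can be pictured with a small stdlib-only sketch. It is an illustration rather than the check in `scripts/skills.py`: the function name is hypothetical, and it does a line-based scan instead of real YAML parsing, so it only covers single-line `description` values.

```python
def check_command_frontmatter(text):
    """Return an error message for a command file's text, or None if it passes."""
    lines = [line.rstrip() for line in text.splitlines()]
    if not lines or lines[0] != "---":
        return "missing frontmatter"
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter block
    except ValueError:
        return "unterminated frontmatter"

    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "description":
            continue
        value = value.strip()
        if not value:
            return "empty description"
        quoted = len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'"
        if ":" in value and not quoted:
            return "description contains ':' and must be quoted"
        return None
    return "missing description"
```

For instance, a file whose frontmatter reads `description: Read-only check: CLI` would be rejected, while the same text wrapped in double quotes would pass.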

These components ship via the plugin marketplace (the whole repo is the plugin).
`databricks aitools install` packages `skills/` only today; extending it to
hooks/commands is CLI-side follow-up work.

## Security

Please see [SECURITY](./SECURITY) for vulnerability reporting guidelines.
43 changes: 43 additions & 0 deletions README.md
@@ -51,6 +51,7 @@ skill under [`./skills/`](./skills/)):
| Stable skills | ✅ (default) | ✅ |
| Experimental skills | ✅ (with `--experimental` or by name) | ❌ |
| Per-skill selection | ✅ (`databricks aitools install <name>`) | ❌ (all-or-nothing) |
| Commands & hooks | ❌ (skills only today, see below) | ✅ |
| Updates | `databricks aitools update` | Plugin marketplace update flow |
| Required outside the agent | Databricks CLI v1.0.0+ | None |

@@ -89,6 +90,48 @@ originally imported from
- See [`experimental/README.md`](./experimental/README.md) for the full list
and caveats.

## Commands and hooks (Claude Code)

When installed as a Claude Code plugin, the `databricks` plugin adds slash
commands and three hooks (prompt routing, session context, auth-failure hints)
on top of the skills.
These are specific to Claude Code and ship via the plugin marketplace. The CLI
path (`databricks aitools install`) installs skills only today; see the note at
the end of this section.

**Slash commands**: friction-only entry points; everyday work stays with the
auto-invoked skills.

- `/databricks:setup [workspace-or-account-url]`: auth/onboarding. Install check,
then an OAuth / PAT / service-principal profile, then verify.
- `/databricks:doctor [profile] [full]`: read-only health check (CLI version,
profiles, auth validity; with `full`, also compute and recent job failures).

(Product workflows such as apps, jobs, pipelines, and DABs are handled by the
skills, not commands, so they aren't duplicated here.)

**Hooks** (`hooks/`, all fail-open):

- **Prompt router** (UserPromptSubmit): a fast keyword regex (sub-50ms, no LLM,
no network) over each prompt. When the prompt is Databricks-related, it injects
a note steering Claude to load `databricks-core` plus the matching product
skill before answering. The full note fires once per session; later Databricks
prompts get a one-line reminder. Unrelated prompts are untouched. No
permission gating, no cost warnings.
- **Context primer** (SessionStart, skipped on resume): injects the routing
rule, CLI version, configured profile names and any
`[__settings__].default_profile` (read locally, no network call, no token
values), and env/in-platform auth state.
- **Auth-failure hint** (PostToolUse on Bash): when a `databricks` command fails
with an auth-shaped error, adds one line suggesting `/databricks:doctor` or
`databricks auth login` before retrying. Never blocks or rewrites commands.

> **Distribution parity (follow-up).** The plugin marketplace ships the whole
> repo (`marketplace.json` `source: "./"`), so commands and hooks come with it.
> `databricks aitools install` currently packages only `skills/`, so CLI-install
> users don't yet get commands/hooks. Closing that gap is tracked as CLI-side
> work.

## Structure

Each skill follows the [Agent Skills Specification](https://agentskills.io/specification):
41 changes: 41 additions & 0 deletions commands/doctor.md
@@ -0,0 +1,41 @@
---
description: "Read-only Databricks health check: CLI, profiles, auth validity via one API call. Pass `full` to also check compute and recent job failures."
argument-hint: "[profile] [full]"
allowed-tools: Bash(databricks:*), Read
---

# Databricks Doctor

Run a **read-only** health check and report a short status table. Make no
changes; every step below only reads. If a subcommand or flag is unfamiliar,
check `databricks <group> --help` first rather than guessing.

Run these in order. Don't stop on the first failure; collect what you can and
report the rest as unknown.

1. **CLI**: `databricks --version`. Flag only if it's missing; don't gate on a
specific version (the CLI surfaces its own update notice).
2. **Profiles**: `databricks auth profiles`. List configured profiles and
validity. If `$1` is given, use that profile for the rest. Otherwise, if more
than one profile exists, ask the user which to use (**never auto-select**).
3. **Auth method**: `databricks auth describe --profile <profile>` shows the
effective host, user, and credential source (never pass `--sensitive`).
4. **Auth validity**: `databricks current-user me --profile <profile>`. This
single API call proves the credentials work end to end (token valid,
workspace reachable, expected identity); don't probe other APIs for it.
For account-level profiles (an `accounts.*` host), `current-user me` does
not exist; report what `auth describe` resolved instead.

Stop here by default. Run the extended checks below only when the user passed
`full` or asked about compute or jobs:

5. **Compute**: `databricks warehouses list` and `databricks clusters list` for
the profile. Note what's running.
6. **Recent job failures**: list recent job runs (e.g.
`databricks jobs list-runs --limit 20 --profile <profile>`) and surface any
recent failures.

Then print a compact table: **check | status (✅/⚠️/❌) | detail**. End with the
single most useful next action (e.g. "run `/databricks:setup` to add a profile").

This is a status check; it only reads, so don't run anything that changes state.
53 changes: 53 additions & 0 deletions commands/setup.md
@@ -0,0 +1,53 @@
---
description: "Set up Databricks CLI auth: install check, then an OAuth / PAT / service-principal profile (workspace or account-level), then verify."
argument-hint: "[workspace-or-account-url]"
allowed-tools: Bash(databricks:*), Read
---

# Databricks Setup

Guide the user through Databricks CLI authentication. Use the **databricks-core**
skill for the authoritative auth details; this command is the step-by-step
wrapper around it.

1. **CLI present?** `databricks --version`. If it's missing,
follow the install steps in the databricks-core skill
(`databricks-cli-install.md`). In sandboxed environments (Cursor, containers),
print the install command and ask the user to run it in their own terminal.
Don't try to install into the sandbox.
2. **Existing profiles?** `databricks auth profiles`. Show what's already
configured. If a working profile exists, ask whether to reuse it or add a new
one.
3. **Pick an auth method** (ask the user; `$1` may be a workspace or account
console URL):
- **OAuth U2M** (default, interactive):
`databricks auth login --host <workspace-url> --profile <name>`. Opens a
browser. Best for laptops. If the user doesn't know their workspace URL,
plain `databricks auth login --profile <name>` opens login.databricks.com
to sign in and pick a workspace. URLs copied from the browser may carry
`?w=<workspace-id>` or `account_id=` query params; the CLI accepts them,
but quote the URL so the shell doesn't interpret the `?`.
- **Account-level**: when the host is an account console URL
(`accounts.cloud.databricks.com`, `accounts.azuredatabricks.net`,
`accounts.gcp.databricks.com`), also pass the account ID:
`databricks auth login --host <account-url> --account-id <uuid> --profile <name>`.
Ask for the account ID if it isn't in the URL (it's the UUID shown in the
account console address bar).
- **PAT**: `databricks configure --token --profile <name>`; the user pastes
a personal access token. This command prompts on stdin, so don't run it
yourself (it hangs without a TTY): ask the user to run it in their own
terminal, then continue once it's done. The same applies to
`databricks auth login` when no browser can open (headless or sandboxed
sessions).
- **Service principal (M2M)**: client id/secret via profile or env. Use for
CI/automation; never a personal PAT in CI.
- **In-platform** (notebook/cluster): `DATABRICKS_HOST`/`DATABRICKS_TOKEN`
are already injected, so no setup is needed.
4. **Confirm before writing** any profile; auth writes to `~/.databrickscfg`.
5. **Verify**: `databricks current-user me --profile <name>` returns the
expected user. For account-level profiles, `current-user me` doesn't exist;
use `databricks auth describe --profile <name>` and check the resolved host
and account ID.

Never echo tokens or secrets back. Never auto-select a profile. When done,
suggest `/databricks:doctor` for a full health check.
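
To make the OAuth and account-level branches in step 3 concrete, here is a minimal sketch that assembles the login command line. It only builds a string and never runs the CLI. The helper name `login_command` is hypothetical; the flags and the three account console hostnames are the ones listed above.

```python
import shlex
from urllib.parse import urlsplit

# Account console hosts named in step 3.
ACCOUNT_HOSTS = {
    "accounts.cloud.databricks.com",
    "accounts.azuredatabricks.net",
    "accounts.gcp.databricks.com",
}


def login_command(url, profile, account_id=None):
    """Build a shell-safe `databricks auth login` command line for the given URL."""
    host = urlsplit(url).hostname or ""
    args = ["databricks", "auth", "login", "--host", url]
    if host in ACCOUNT_HOSTS:
        if not account_id:
            # Mirrors the instruction to ask the user rather than guess.
            raise ValueError("account-level host: ask the user for the account ID")
        args += ["--account-id", account_id]
    args += ["--profile", profile]
    # shlex.quote wraps any argument containing shell-special characters
    # (such as the `?` in `?w=<workspace-id>`) in single quotes.
    return " ".join(shlex.quote(arg) for arg in args)
```

Quoting every argument instead of only the URL keeps the sketch simple and also covers profile names that contain spaces.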
81 changes: 81 additions & 0 deletions hooks/README.md
@@ -0,0 +1,81 @@
# Plugin hooks

Three hooks make sure Databricks work flows through the skills. All are
stdlib-only Python and **fail open** (any error prints `{}` or nothing and
exits 0, so a broken hook never blocks a prompt, session start, or tool call).
`hooks.json` wires them in; Claude Code expands `${CLAUDE_PLUGIN_ROOT}`. Claude
Code auto-loads `hooks/hooks.json`, so it is **not** declared in `plugin.json`
(declaring the standard path double-loads it and fails the plugin).
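
The fail-open contract described above can be pictured as a thin wrapper around each hook's logic. This is a sketch under assumptions, not the code in the hook scripts: `run_fail_open` and its `handler` argument are hypothetical names, and the payload is assumed to arrive as JSON on stdin.

```python
import json
import sys


def run_fail_open(handler, stdin=None, stdout=None):
    """Run handler(payload); on any error print `{}` and still return exit code 0."""
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout
    try:
        payload = json.load(stdin)
        result = handler(payload)
        # Anything that is not a dict is treated as "nothing to say".
        text = json.dumps(result if isinstance(result, dict) else {})
    except Exception:
        # Fail open: a broken hook must never block the prompt or tool call.
        text = "{}"
    stdout.write(text)
    return 0
```

A script built this way would end with something like `sys.exit(run_fail_open(main_logic))`, where `main_logic` stands for the hook's own function.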

Each hook is pinned by a test file in `tests/` at the repo root; run the whole
suite with `python3 -m unittest discover -s tests -p '*_test.py'`.

## `databricks-router.py`: prompt router (UserPromptSubmit)

Runs a fast keyword regex (sub-50ms, no LLM, no network) over each user prompt.
When the prompt is Databricks-related, it injects an `additionalContext`
instruction telling Claude to load `databricks-core` plus the matching product
skill before answering. When it isn't, it prints `{}` and stays out of the way.

The full instruction is injected **once per session** (tracked by a marker file
in the temp dir keyed on the payload's `session_id`); later Databricks prompts
in the same session get a one-line reminder instead, so long sessions don't pay
the full routing block on every turn.
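
One way to implement the once-per-session marker is an exclusive file create in the temp dir, sketched below. The function name, the marker filename, and the decision to hash `session_id` are assumptions made for illustration; the real router may name and place its marker differently.

```python
import hashlib
import os
import tempfile


def first_databricks_prompt(session_id, marker_dir=None):
    """Return True only the first time this session_id is seen."""
    marker_dir = marker_dir or tempfile.gettempdir()
    # Hash the id so arbitrary session strings cannot escape the directory.
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()[:16]
    path = os.path.join(marker_dir, f"databricks-router-{digest}.marker")
    try:
        # O_EXCL makes creation atomic: exactly one caller can win.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # marker already present: send the one-line reminder
    except OSError:
        return True  # cannot track state: assumed fallback is the full note
    os.close(fd)
    return True
```

Falling back to the full instruction when the marker cannot be written is a design guess in this sketch; the opposite choice (always send the short reminder) would be equally consistent with failing open.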

There's no second agent to delegate to. Claude itself drives the `databricks`
CLI through the skills, so "routing" just means "make sure the Databricks skills
are loaded." There is **no permission gating and no cost warning** here.

Precision is tuned to avoid over-routing:

- **STRONG** terms (`databricks`, `unity catalog`, `lakeflow`, `dbfs`,
`databricks.yml`, `spark declarative pipelines`, `delta live tables` (the
legacy name still routes), ...) always route, even alongside an
alternative-platform mention, so "migrate from redshift to databricks" routes.
- **AMBIGUOUS** terms (`declarative pipelines`, `model serving`, `vector
search`, `mlflow`, `pyspark`, `genie`, ...) route only when no **SUPPRESS**
term is present.
- **SUPPRESS** terms (alternative data platforms, Jenkins, and plainly-local
dev work like `git commit`, `read the file`, `unit test`, `npm`) hold back an
ambiguous match.
- **URLs**: code-hosting URLs are blanked before matching, so `databricks`
appearing only as a GitHub/GitLab org or repo name
(`github.com/databricks/...`) does not route. URLs whose hostname contains
`databricks` (workspace and docs hosts) still do.

Edit the STRONG, AMBIGUOUS, and SUPPRESS lists when the product surface changes. Behavior is pinned by
`tests/databricks_router_test.py`.
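
The tiering can be sketched in a few lines of stdlib Python. The term lists here are abbreviated and partly illustrative (`snowflake`, for example, stands in for "alternative data platforms" and is not taken from the router), and `should_route` is a hypothetical name, so treat this as a model of the precedence rules rather than the router's actual regexes.

```python
import re

# Abbreviated, illustrative term lists; the real lists live in databricks-router.py.
STRONG = re.compile(
    r"\b(databricks|unity catalog|lakeflow|dbfs|spark declarative pipelines|delta live tables)\b",
    re.IGNORECASE,
)
AMBIGUOUS = re.compile(
    r"\b(declarative pipelines|model serving|vector search|mlflow|pyspark|genie)\b",
    re.IGNORECASE,
)
SUPPRESS = re.compile(
    r"\b(redshift|snowflake|jenkins|git commit|read the file|unit test|npm)\b",
    re.IGNORECASE,
)
# Code-hosting URLs are blanked so an org or repo name cannot trigger routing.
CODE_HOST_URL = re.compile(r"https?://(?:www\.)?(?:github\.com|gitlab\.com)/\S*", re.IGNORECASE)


def should_route(prompt):
    """Return True when the prompt should load the Databricks skills."""
    text = CODE_HOST_URL.sub(" ", prompt)
    if STRONG.search(text):
        return True  # strong terms win even next to a suppress term
    return bool(AMBIGUOUS.search(text)) and not SUPPRESS.search(text)
```

Under these lists, "migrate from redshift to databricks" routes because a strong term is present, while "fix the unit test for my mlflow wrapper" does not because the only match is ambiguous and a suppress term holds it back.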

## `databricks-context.py`: context primer (SessionStart)

Injects a compact banner at session start: the routing rule (load
`databricks-core` + the product skill), CLI presence + version, configured
profile names plus any `[__settings__].default_profile` (parsed from
`~/.databrickscfg` locally, **no network call**, token values never printed),
and whether env/in-platform auth is set. If the CLI isn't installed it points
at `/databricks:setup`.
Covered by `tests/databricks_context_test.py`.

Its `hooks.json` entry uses `"matcher": "startup|clear|compact"`: the banner
fires for new sessions, `/clear`, and after compaction, but **not on resume**,
where the prior context already contains it.
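
The local profile summary could be produced with `configparser` alone, as in the sketch below. It assumes the config file is INI-shaped, and `summarize_profiles` is a hypothetical name. The point it illustrates is that only section names and the `default_profile` setting are read, so token values never reach the banner.

```python
import configparser


def summarize_profiles(cfg_text):
    """Return (profile_names, default_profile) from config file text."""
    parser = configparser.ConfigParser(
        interpolation=None,  # token values may contain '%'
        default_section="__no_default__",  # treat [DEFAULT] as an ordinary profile
    )
    parser.read_string(cfg_text)
    names = [name for name in parser.sections() if name != "__settings__"]
    default = parser.get("__settings__", "default_profile", fallback=None)
    return names, default
```

Overriding `default_section` matters here: with the stock setting, `[DEFAULT]` would be hidden from `sections()` and its keys would leak into every other section as fallbacks.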

## `databricks-auth-helper.py`: auth-failure hint (PostToolUse)

Watches Bash tool results (matcher: `Bash`). When a `databricks` command's
output matches a phrase-shaped auth-failure signal (missing default
credentials, `invalid_grant`, `401 unauthorized`, invalid/expired token), it
injects one line suggesting `/databricks:doctor` or `databricks auth login`
before any retry. It never blocks or rewrites tool calls; bare status codes in
ordinary output do not trigger it. Only commands that actually **invoke** the
`databricks` executable count: `databricks` appearing as a repo path, URL, or
argument (`gh pr view --repo databricks/cli`) does not, since such output can
legitimately quote auth-failure phrases without any auth problem.
Covered by `tests/databricks_auth_helper_test.py`.
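
The distinction between invoking `databricks` and merely mentioning it can be approximated by tokenizing the command line and looking only at command positions. The sketch below is an assumption-laden model, not the helper's code: the function names are hypothetical, the failure phrases are paraphrased from the list above, and shell parsing is simplified (no subshell or quoting edge cases beyond what `shlex` handles).

```python
import re
import shlex

# Phrase-shaped signals only; a bare "401" in ordinary output does not match.
AUTH_FAILURE = re.compile(
    r"default credentials|invalid_grant|401 unauthorized|(?:invalid|expired)\s+(?:access\s+)?token",
    re.IGNORECASE,
)
ENV_ASSIGNMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*=.*")


def invokes_databricks(command):
    """True if `databricks` appears in command position somewhere in the line."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:  # e.g. unbalanced quotes
        return False
    at_command = True
    for token in tokens:
        if all(ch in ";|&(" for ch in token):
            at_command = True  # a new simple command starts after a separator
        elif at_command:
            if ENV_ASSIGNMENT.fullmatch(token):
                continue  # leading VAR=value prefixes precede the command
            if token.rsplit("/", 1)[-1] == "databricks":
                return True
            at_command = False
    return False


def should_hint(command, output):
    """True when a databricks invocation produced an auth-shaped failure."""
    return invokes_databricks(command) and bool(AUTH_FAILURE.search(output))
```

With this model, `gh pr view --repo databricks/cli` never qualifies, because `databricks/cli` is an argument and `gh` is the executable.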

## Distribution note

These ship with the Claude Code plugin (the whole repo is the plugin via
`marketplace.json` `source: "./"`). The Databricks CLI install path
(`databricks aitools install`) currently packages **skills only**. See the repo
README for the parity follow-up.