ryanbates99/genie-space-cicd-example

Genie Space CI/CD example

A reference repo for managing Databricks Genie Spaces as code and promoting them across dev -> staging -> prod with Git and CI/CD.

It shows two approaches side by side:

  1. SDK + DAB (recommended, works today) — export a Genie Space to a versioned JSON artifact, rewrite environment-specific references, and create/update it in the target workspace via the Genie SDK, all orchestrated by a Databricks Asset Bundle and GitHub Actions.
  2. Native genie_spaces DAB resource (preview / not GA) — what the YAML will look like once a Genie Space becomes a first-class bundle resource. See docs/future-native-resource.md.

Why this approach

As of mid-2026, a Genie Space is not yet a supported Databricks Asset Bundle resource type (jobs, pipelines, dashboards, apps, etc. are; Genie Spaces are not). The production-ready way to do CI/CD for Genie Spaces today is to drive the Genie API (REST or SDK) and wrap it in a bundle, which is exactly what this repo does. When the native resource ships, migrating should be mostly mechanical (see docs/future-native-resource.md).

The pattern: export -> transform -> deploy

  DEV workspace                Git repo                 STAGING / PROD workspace
  -------------                --------                 ------------------------
  build space in UI            spaces/*.json            create_space / update_space
        |                          ^                              ^
        |  export_space.py         | commit                       | deploy_space.py
        +--------------------------+------------------------------+
                                   |
                          config/<env>.yml rewrites
                          (dev_sales. -> prod_sales.)

  1. Export: build/tune a space in the dev workspace UI, then export its full definition to a checked-in artifact:

    python src/export_space.py --space-id <dev-space-id> --out spaces/sales_assistant.json

    The artifact holds the serialized_space JSON (instructions, sample questions, table mappings, benchmarks) plus title/description.

  2. Transform: config/<env>.yml declares per-environment warehouse_id, parent_path, and substring replacements applied to the serialized JSON (e.g. rewrite dev_sales. table references to prod_sales.). This keeps one artifact promotable to every environment.

  3. Deploy: create the space if it doesn't exist in the target, otherwise update it in place:

    python src/deploy_space.py --space spaces/sales_assistant.json --env prod --dry-run
    python src/deploy_space.py --space spaces/sales_assistant.json --env prod
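
As a rough illustration of the transform step, the sketch below applies per-environment substring replacements to a serialized space. It is not the code in src/genie_ops.py: the function name apply_env_config, the dict shape of the environment config, and the assumption that serialized_space is stored as a JSON string are all hypothetical, chosen only to make the idea concrete.

```python
import json
from typing import Any


def apply_env_config(artifact: dict[str, Any], env_cfg: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the artifact rewritten for one environment (hypothetical helper)."""
    serialized = artifact["serialized_space"]
    # Plain substring swaps on the JSON string, applied in the order given.
    for old, new in env_cfg.get("replacements", {}).items():
        serialized = serialized.replace(old, new)
    # Fail early if a replacement left the string as invalid JSON.
    json.loads(serialized)
    return {
        **artifact,
        "serialized_space": serialized,
        "warehouse_id": env_cfg["warehouse_id"],
        "parent_path": env_cfg["parent_path"],
    }


# Illustrative data only; a real export contains far more than a table list.
artifact = {
    "title": "Sales Assistant",
    "description": "Example space",
    "serialized_space": json.dumps(
        {"tables": ["dev_sales.orders", "dev_sales_archive.orders"]}
    ),
}
prod_cfg = {
    "warehouse_id": "hypothetical-warehouse-id",
    "parent_path": "/Workspace/Shared/genie",
    "replacements": {"dev_sales.": "prod_sales."},
}

prod_artifact = apply_env_config(artifact, prod_cfg)
print(json.loads(prod_artifact["serialized_space"])["tables"])
# ['prod_sales.orders', 'dev_sales_archive.orders']
```

Because the replacement key ends with a dot, dev_sales_archive.orders is left alone; a key of dev_sales without the dot would have rewritten it as well.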

Repo layout

databricks.yml                 bundle definition + dev/staging/prod targets
resources/deploy_job.yml       Databricks Job that runs the deploy on serverless
src/genie_ops.py               export / transform / deploy helpers (Genie SDK)
src/export_space.py            CLI: pull a space out of a workspace
src/deploy_space.py            CLI: promote an artifact to an environment
spaces/sales_assistant.json    example checked-in space artifact (source of truth)
config/{dev,staging,prod}.yml  per-environment warehouse + catalog rewrites
ci-github-actions-deploy.yml   PR validation + promote-on-merge + gated prod
                               (move to .github/workflows/deploy.yml to activate)
docs/future-native-resource.md the forthcoming native genie_spaces YAML

Quickstart (local)

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Authenticate the Databricks SDK to your dev workspace (any one of):
#   export DATABRICKS_HOST=https://your-dev-workspace.cloud.databricks.com
#   export DATABRICKS_TOKEN=dapi...
# or use a CLI profile and pass --profile <name> to the scripts.

# 1. Export an existing dev space
PYTHONPATH=src python src/export_space.py --space-id <dev-space-id> --out spaces/sales_assistant.json

# 2. Preview a prod deploy without mutating anything
PYTHONPATH=src python src/deploy_space.py --space spaces/sales_assistant.json --env prod --dry-run

# 3. Promote for real
PYTHONPATH=src python src/deploy_space.py --space spaces/sales_assistant.json --env prod

CI/CD with the bundle

Deploy the bundle (ships the code + the promote job), then run the job:

databricks bundle validate -t staging
databricks bundle deploy   -t staging
databricks bundle run deploy_genie_space -t staging

The included GitHub Actions workflow (ci-github-actions-deploy.yml — move it to .github/workflows/deploy.yml to activate):

  • on PR: bundle validate + a --dry-run deploy (no mutation)
  • on merge to main: deploy bundle to staging and run the promote job
  • manual dispatch: promote to prod, gated by a protected GitHub Environment

Set these secrets (a Databricks service principal using OAuth M2M): DATABRICKS_HOST, DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET.
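
A CI step may want to fail fast when one of these secrets has not been wired into the job environment. The snippet below is a minimal sketch of such a preflight check; missing_auth_vars is a hypothetical helper and is not part of this repo or the Databricks SDK.

```python
import os

# The three variables named above for service principal OAuth M2M auth.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET")


def missing_auth_vars(env=None):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real process environment.
example_env = {
    "DATABRICKS_HOST": "https://example.cloud.databricks.com",
    "DATABRICKS_CLIENT_ID": "abc",
}
print(missing_auth_vars(example_env))
# ['DATABRICKS_CLIENT_SECRET']
```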

Notes and gotchas

  • Get the serialized JSON by exporting it; don't hand-author it. Build the space in the UI, then run export_space.py. The artifact in spaces/ here is illustrative; a real export is richer.
  • The deploy identity needs permissions on the target: CAN_MANAGE on the space / parent folder, plus access to the target SQL warehouse.
  • Replacements are plain substring swaps applied to the JSON string. Keep keys specific (include the trailing dot) so you don't get partial matches.
  • Idempotency is by title within a parent path: deploy looks up an existing space by title and updates it, otherwise creates a new one.
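
The create-or-update rule in the last bullet can be sketched as follows. FakeSpaceClient is a hypothetical in-memory stand-in, not the Genie SDK; its list_spaces method, the dict shape of a space, and the deploy function are assumptions made for illustration, with create_space and update_space borrowed only as names from the diagram above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSpaceClient:
    """Hypothetical in-memory stand-in for a workspace; not the Genie SDK."""

    spaces: dict = field(default_factory=dict)  # space_id -> space dict
    _next_id: int = 1

    def list_spaces(self, parent_path):
        return [s for s in self.spaces.values() if s["parent_path"] == parent_path]

    def create_space(self, space):
        space_id = f"space-{self._next_id}"
        self._next_id += 1
        self.spaces[space_id] = {**space, "space_id": space_id}
        return space_id

    def update_space(self, space_id, space):
        self.spaces[space_id] = {**space, "space_id": space_id}
        return space_id


def deploy(client, space, dry_run=False):
    """Update the space with a matching title under parent_path, else create it."""
    candidates = client.list_spaces(space["parent_path"])
    existing = next((s for s in candidates if s["title"] == space["title"]), None)
    action = "update" if existing else "create"
    if dry_run:
        # Report the planned action without mutating anything.
        return action, (existing["space_id"] if existing else None)
    if existing:
        return action, client.update_space(existing["space_id"], space)
    return action, client.create_space(space)


client = FakeSpaceClient()
space = {"title": "Sales Assistant", "parent_path": "/Workspace/Shared/genie", "description": "v1"}

print(deploy(client, space, dry_run=True))               # ('create', None)
print(deploy(client, space))                             # ('create', 'space-1')
print(deploy(client, {**space, "description": "v2"}))    # ('update', 'space-1')
```

Running the deploy twice with the same title leaves a single space in the fake workspace, which is the behaviour the bullet describes. Renaming a space in the artifact would, under this rule, create a second space instead of updating the first.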
