databricks-genie skill: add data-grounded Genie Space authoring playbook by ryanbates99 · Pull Request #560 · databricks-solutions/ai-dev-kit

ryanbates99 · 2026-06-09T21:14:29Z

What

Adds databricks-skills/databricks-genie/authoring.md — a playbook for authoring a high-quality Genie Space via a fully-curated serialized_space, and wires it into SKILL.md and spaces.md at the natural decision points.

Why

The databricks-genie skill's create guidance stops at the basics: display name, table list, description, and a few sample questions. The manage_genie tool already accepts a complete serialized_space payload, but the skill never teaches the model to author one. As a result, the curation that actually drives Genie answer quality gets skipped:

column synonyms / value vocabulary
structured instructions
certified example question→SQL pairs
join specs
reusable measures / filters / expressions
benchmarks

Spaces created via the skill therefore answer worse than they could, because the model improvises filter values from column names (e.g. status = 'ACTIVE' when the data holds 'active') and leaves the rich layers empty.
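The case-mismatch failure above can be made concrete with a small sketch. It assumes the distinct values of a column have already been fetched (for example through a `SELECT DISTINCT` run via execute_sql); the helper name `ground_filter_value` and the sample values are hypothetical and are not part of the playbook or the kit.

```python
from typing import Iterable, Optional


def ground_filter_value(guess: str, distinct_values: Iterable[str]) -> Optional[str]:
    """Return the stored value that matches `guess` case-insensitively, or None.

    `distinct_values` stands in for the result of a SELECT DISTINCT query;
    how those values are fetched is outside this sketch.
    """
    by_folded = {value.casefold(): value for value in distinct_values}
    return by_folded.get(guess.strip().casefold())


# Hypothetical values observed in a `status` column.
observed = ["active", "churned", "paused"]

# A literal guessed from the column name is mapped onto what the data holds.
print(ground_filter_value("ACTIVE", observed))
# A value that never occurs in the data is rejected rather than embedded.
print(ground_filter_value("deleted", observed))
```

Returning `None` for an unknown value gives the authoring step an explicit signal to drop or rewrite the filter instead of shipping a predicate that matches no rows.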

What the playbook adds

Golden rule: ground everything in real data. Inspect with get_table_stats_and_schema, then pull actual distinct values with execute_sql before generating any SQL/instructions — never invent status/category/tier values.
Per-layer authoring guidance with sensible default counts: table/column descriptions + synonyms, 5 sample questions, GSL-structured text instructions (the five canonical section headers, including the verbatim "Instructions you must follow when providing summaries" string), ~12 example question→SQL pairs with usage_guidance and parameters, 5 measures / 5 filters / 3 expressions, join specs for 2+ tables, 10 benchmarks.
Exact serialized_space field shapes for every layer (the array-of-lines SQL convention, the --rt=FROM_RELATIONSHIP_TYPE_*-- join tag, parameter type_hint normalization, excluded-column handling).
API constraints to respect while assembling: at most 1 text_instructions object, ≤25,000 chars per string, ≤30 tables, ≤3.5 MB total, 32-char lowercase-hex unique IDs, and array sorting rules.
A SQL-validation step — test every example query and benchmark with execute_sql and fix or drop failures before embedding.
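The API constraints listed above lend themselves to a pre-flight check before the payload is sent. The following is a minimal sketch, not the playbook's own code: the function names are hypothetical, the payload is treated as an opaque nested dict (no particular serialized_space field names are assumed), and 3.5 MB is read as decimal megabytes, which is itself an assumption.

```python
import json
import re
import uuid
from typing import Iterator, List

MAX_STRING_CHARS = 25_000
MAX_TABLES = 30
MAX_TOTAL_BYTES = 3_500_000  # assumption: 3.5 MB interpreted as decimal megabytes
HEX_ID = re.compile(r"[0-9a-f]{32}")


def new_id() -> str:
    """Generate a 32-character lowercase-hex ID (uuid4 without dashes)."""
    return uuid.uuid4().hex


def iter_strings(node) -> Iterator[str]:
    """Yield every string value nested inside dicts and lists."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def check_limits(space: dict, ids: List[str], table_count: int,
                 text_instruction_count: int) -> List[str]:
    """Return human-readable problems; an empty list means no limit was hit."""
    problems = []
    if text_instruction_count > 1:
        problems.append("more than 1 text_instructions object")
    if table_count > MAX_TABLES:
        problems.append(f"more than {MAX_TABLES} tables")
    if any(len(s) > MAX_STRING_CHARS for s in iter_strings(space)):
        problems.append(f"a string exceeds {MAX_STRING_CHARS} characters")
    if len(json.dumps(space).encode("utf-8")) > MAX_TOTAL_BYTES:
        problems.append("serialized payload exceeds the total size limit")
    if any(not HEX_ID.fullmatch(i) for i in ids):
        problems.append("an ID is not 32-char lowercase hex")
    if len(set(ids)) != len(ids):
        problems.append("IDs are not unique")
    return problems


print(check_limits({"note": "ok"}, [new_id(), new_id()], table_count=2,
                   text_instruction_count=1))
```

The caller supplies the ID list and the counts because this sketch deliberately does not guess where they live in the payload. The array sorting rules are not checked here.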

This is documentation only and adapts the approach to the kit's existing MCP tool surface (get_table_stats_and_schema, execute_sql, manage_genie). No code changes: the skill installer auto-discovers extra files in the skill directory, and the test manifest asserts no specific file list.
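As a rough illustration of the validate-then-embed step on that tool surface, the loop below keeps only queries that run cleanly. It is a sketch under stated assumptions: `run_sql` is a caller-supplied callable standing in for whatever executes SQL (the real execute_sql tool's signature is not assumed), and `fake_run_sql` is a hypothetical stub used only to make the example runnable.

```python
from typing import Callable, Iterable, List, Tuple


def keep_passing_queries(
    queries: Iterable[str], run_sql: Callable[[str], object]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Run each query; return (passing queries, [(failed query, error text)])."""
    passing: List[str] = []
    failures: List[Tuple[str, str]] = []
    for sql in queries:
        try:
            run_sql(sql)
        except Exception as exc:  # any execution error disqualifies the query
            failures.append((sql, str(exc)))
        else:
            passing.append(sql)
    return passing, failures


def fake_run_sql(sql: str) -> None:
    """Hypothetical stand-in for a SQL executor: fails on an unknown table."""
    if "missing_table" in sql:
        raise ValueError("table not found: missing_table")


candidates = [
    "SELECT COUNT(*) FROM orders",
    "SELECT * FROM missing_table",
]
passing, failures = keep_passing_queries(candidates, fake_run_sql)
print(passing)
print(failures)
```

Collecting the failures rather than silently dropping them leaves room to fix a query and retry before deciding to exclude it.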

Testing

Verified the installer (install_genie_code_skills.py) uploads all non-SKILL.md files in a skill directory via the git tree listing, so authoring.md ships automatically.
Verified the databricks-genie test manifest uses expected_files: [] and does not assert on a file list, so adding a reference file does not break the skill test baseline.
Cross-checked every documented serialized_space field shape and API constraint against the Genie Conversation API docs (serialized_space schema + validation rules) and the canonical GSL instruction section vocabulary.

This pull request and its description were written by Isaac.

…skill

The databricks-genie skill's create guidance stops at the basics (name, tables, description, sample questions) and leaves Genie to infer the rest. The manage_genie tool already accepts a full serialized_space payload, but the skill never teaches the model to author one, so curated instructions, column synonyms, certified example SQL, join specs, reusable measures/filters, and benchmarks all get skipped, and spaces answer worse than they could.

Add authoring.md: a playbook for building a high-quality serialized_space, grounded in the table's real values rather than invented ones. Covers the canonical GSL text-instruction sections, ~12 example question->SQL pairs with usage guidance and parameters, 5 measures / 5 filters / 3 expressions, join specs, 10 benchmarks, exact serialized_space field shapes, the API constraints (1 text instruction, 25k chars/field, 30 tables, 3.5MB, 32-char hex IDs, array sorting), and a SQL-validation step before embedding.

Wire it into SKILL.md (when-to-use, Quick Start, Reference Files) and spaces.md (creation workflow, poor-query-generation troubleshooting). No code change needed: the installer auto-discovers extra skill files and the test manifest asserts no specific file list.

Co-authored-by: Isaac

ryanbates99 wants to merge 1 commit into databricks-solutions:main from ryanbates99:enrich-genie-space-authoring
