databricks-genie skill: add data-grounded Genie Space authoring playbook#560
Open
ryanbates99 wants to merge 1 commit into
…skill

The databricks-genie skill's create guidance stops at the basics (name, tables, description, sample questions) and leaves Genie to infer the rest. The manage_genie tool already accepts a full serialized_space payload, but the skill never teaches the model to author one, so curated instructions, column synonyms, certified example SQL, join specs, reusable measures/filters, and benchmarks all get skipped, and spaces answer worse than they could.

Add authoring.md: a playbook for building a high-quality serialized_space, grounded in the table's real values rather than invented ones. Covers the canonical GSL text-instruction sections, ~12 example question->SQL pairs with usage guidance and parameters, 5 measures / 5 filters / 3 expressions, join specs, 10 benchmarks, exact serialized_space field shapes, the API constraints (1 text instruction, 25k chars/field, 30 tables, 3.5MB, 32-char hex IDs, array sorting), and a SQL-validation step before embedding.

Wire it into SKILL.md (when-to-use, Quick Start, Reference Files) and spaces.md (creation workflow, poor-query-generation troubleshooting).

No code change needed: the installer auto-discovers extra skill files and the test manifest asserts no specific file list.

Co-authored-by: Isaac
What

Adds `databricks-skills/databricks-genie/authoring.md`, a playbook for authoring a high-quality Genie Space via a fully curated `serialized_space`, and wires it into `SKILL.md` and `spaces.md` at the natural decision points.

Why
The `databricks-genie` skill's create guidance stops at the basics: display name, table list, description, and a few sample questions. The `manage_genie` tool already accepts a complete `serialized_space` payload, but the skill never teaches the model to author one. As a result, the curation that actually drives Genie answer quality gets skipped: curated instructions, column synonyms, certified example SQL, join specs, reusable measures/filters, and benchmarks.

Spaces created via the skill therefore answer worse than they could, because the model improvises filter values from column names (e.g. `status = 'ACTIVE'` when the data holds `'active'`) and leaves the rich layers empty.

What the playbook adds
- Run `get_table_stats_and_schema`, then pull actual distinct values with `execute_sql` before generating any SQL/instructions; never invent status/category/tier values.
- The canonical GSL text-instruction sections, ~12 example question->SQL pairs with `usage_guidance` and parameters, 5 measures / 5 filters / 3 expressions, join specs for 2+ tables, 10 benchmarks.
- Exact `serialized_space` field shapes for every layer (the array-of-lines SQL convention, the `--rt=FROM_RELATIONSHIP_TYPE_*--` join tag, parameter `type_hint` normalization, excluded-column handling).
- API constraints: one `text_instructions` object, ≤25,000 chars per string, ≤30 tables, ≤3.5 MB total, 32-char lowercase-hex unique IDs, and array sorting rules.
- A SQL-validation step: run every query through `execute_sql` and fix or drop failures before embedding.

This is documentation only and adapts the approach to the kit's existing MCP tool surface (`get_table_stats_and_schema`, `execute_sql`, `manage_genie`). No code changes: the skill installer auto-discovers extra files in the skill directory, and the test manifest asserts no specific file list.

Testing
- The installer (`install_genie_code_skills.py`) uploads all non-`SKILL.md` files in a skill directory via the git tree listing, so `authoring.md` ships automatically.
- The `databricks-genie` test manifest uses `expected_files: []` and does not assert on a file list, so adding a reference file does not break the skill test baseline.
- Checked every `serialized_space` field shape and API constraint against the Genie Conversation API docs (`serialized_space` schema + validation rules) and the canonical GSL instruction section vocabulary.

This pull request and its description were written by Isaac.
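
For reviewers who want a feel for the API constraints listed under "What the playbook adds", here is a minimal sketch of a pre-flight check. It is not code from this PR or from `authoring.md`. The helper names `new_id` and `check_limits` are hypothetical, as is the assumption that IDs sit under an `"id"` key and that "3.5 MB" means 3,500,000 bytes. The single `text_instructions` rule and the array sorting rules depend on the real schema and are left out.

```python
import json
import re
import uuid

MAX_STRING_CHARS = 25_000
MAX_TABLES = 30
MAX_TOTAL_BYTES = 3_500_000  # assumption: "3.5 MB" read as decimal megabytes
HEX_ID = re.compile(r"[0-9a-f]{32}")


def new_id() -> str:
    """Return a 32-character lowercase hex ID (uuid4 without dashes)."""
    return uuid.uuid4().hex


def check_limits(space: dict, tables: list) -> list[str]:
    """Collect limit violations for a candidate payload (hypothetical helper)."""
    problems = []
    seen_ids = set()

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "id":  # assumption: IDs live under an "id" key
                    if not (isinstance(value, str) and HEX_ID.fullmatch(value)):
                        problems.append(f"{path}.id is not 32-char lowercase hex")
                    elif value in seen_ids:
                        problems.append(f"{path}.id duplicates an earlier ID")
                    else:
                        seen_ids.add(value)
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for i, item in enumerate(node):
                walk(item, f"{path}[{i}]")
        elif isinstance(node, str) and len(node) > MAX_STRING_CHARS:
            problems.append(f"{path} exceeds {MAX_STRING_CHARS} characters")

    walk(space, "space")
    if len(tables) > MAX_TABLES:
        problems.append(f"{len(tables)} tables exceeds the limit of {MAX_TABLES}")
    size = len(json.dumps(space).encode("utf-8"))
    if size > MAX_TOTAL_BYTES:
        problems.append(f"payload is {size} bytes, over {MAX_TOTAL_BYTES}")
    return problems
```

An empty list from `check_limits` means none of the checked limits were hit. It says nothing about whether the payload matches the actual `serialized_space` schema.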