Fix/state differ metadata gaps#114
Conversation
- Introduced full support for `ALTER VIEW SET TAGS` and `ALTER VIEW UNSET TAGS` in the Python SDK, allowing users to add, update, and remove tags on views through the Designer or state files. Tags are now tracked in the changelog and included in SQL generation. - Added bulk operations for applying tags to all views within a catalog or schema scope. - Refactored the `state_differ.py` module into focused components for better maintainability, including `operation_builders.py`, `grant_differ.py`, `metadata_differ.py`, and `bulk_operations.py`. - Improved performance of live integration tests by optimizing existence checks to use single `information_schema` queries, resulting in approximately 7x faster test execution on large metastores.
- Bumped version number to 0.2.10 in package.json files for the main project, Python SDK, VS Code extension, and documentation site. - Updated CLI version in runtime_info.success.json to reflect the new version. - Revised databricks-asset-bundles.mdx and release-notes.mdx to include version 0.2.10 and highlight new features such as view tags support, multi-principal bulk grants, and improved Python environment detection. - Ensured consistency in versioning across all related files and documentation.
There was a problem hiding this comment.
Pull request overview
This PR expands Unity Catalog governance support and improves developer UX/performance across the SchemaX stack, including new view tag operations, a refactored state differ, VS Code extension Python interpreter resolution improvements, and faster live integration test helpers. It also updates marketplace/docs content and bumps versions to 0.2.10.
Changes:
- Add dedicated view tag operations end-to-end (differ → reducer → SQL generator → bulk ops + tests).
- Improve VS Code extension backend Python resolution (prefer
python.defaultInterpreterPath, run via shell). - Refactor Unity state differ into focused modules; add extensive new unit tests; update docs/packaging/version strings.
Reviewed changes
Copilot reviewed 35 out of 35 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/vscode-extension/src/backend/pythonBackendClient.ts | Adds interpreter candidate ordering and switches spawn to shell: true for env resolution. |
| packages/vscode-extension/tests/unit/pythonBackendClient.test.ts | Adds unit test coverage for configured interpreter path precedence. |
| packages/vscode-extension/tests/mocks/vscode.ts | Updates VS Code mock configuration getter behavior. |
| packages/vscode-extension/src/webview/components/BulkOperationsPanel.tsx | Adds comma-separated multi-principal grants and updates UI strings. |
| packages/python-sdk/src/schemax/providers/unity/state_reducer.py | Adds reducers for set_view_tag / unset_view_tag. |
| packages/python-sdk/src/schemax/providers/unity/state_differ.py | Refactors differ to extracted modules and adds view tag/property diffs + greenfield traversal. |
| packages/python-sdk/src/schemax/providers/unity/bulk_operations.py | Adds recursive “add all…” helpers, including view-tag emission on new views. |
| packages/python-sdk/src/schemax/providers/unity/metadata_differ.py | Adds metadata/property/tag/constraint/column diff logic in a dedicated module. |
| packages/python-sdk/src/schemax/providers/unity/grant_differ.py | Extracts grant diffing into dedicated module. |
| packages/python-sdk/src/schemax/providers/unity/sql_generator.py | Adds SQL generation handlers for ALTER VIEW SET/UNSET TAGS. |
| packages/python-sdk/tests/integration/live_helpers.py | Replaces full discovery with lightweight information_schema existence queries. |
| packages/python-sdk/tests/unit/test_state_reducer.py | Adds reducer tests for view tag operations. |
| packages/python-sdk/tests/unit/test_sql_generator.py | Adds SQL generation tests for view tag ops and escaping behavior. |
| packages/vscode-extension/src/webview/styles.css | Fixes modal spacing/alignment. |
| Version/doc files (package.json, pyproject.toml, READMEs, CHANGELOGs, docs) | Content rewrites and version bump to 0.2.10 across packages/docs/contracts. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
…hon backend - Integrated `diff_view_properties` into view operations to track changes in view properties when adding views in schemas and during state differencing. - Updated SQL generation methods in `UnitySQLGenerator` to escape tag names consistently, ensuring proper SQL syntax for `ALTER VIEW` and `ALTER TABLE` operations. - Refactored command candidate handling in `PythonBackendClient` to support absolute interpreter paths, improving compatibility with environments that have spaces in their paths. - Adjusted tests to verify the correct behavior of command execution and ensure that configured interpreter paths utilize the appropriate shell settings.
- Updated helper functions in `state_differ_helpers.py` to include type annotations for better type safety and clarity. - Enhanced the `_make_op`, `_base_state`, `_catalog`, `_schema`, `_table`, `_col`, `_view`, `_volume`, `_function`, `_mv`, `_constraint`, `_grant`, `_op_types`, and `_ops_of_type` functions to specify parameter and return types. - Improved overall code maintainability and readability by enforcing type hints across the module.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 36 out of 36 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
- Enhanced the `_quick_query` function in `live_helpers.py` to raise clear errors on query failure or timeout, ensuring that callers receive explicit feedback instead of silent failures. - Increased the wait timeout from 30 seconds to 60 seconds to accommodate longer-running queries. - Updated the test for catalog mutations in `test_state_differ_basic.py` to assert the correct handling of catalog comment updates, ensuring that the new comment is properly reflected in the generated operations.
- Added type annotations to the `detect_rename_fn` parameter in the `diff_existing_column` function for improved type safety. - Integrated `create_column_tag_ops` into the `UnityStateDiffer` class to handle column tag operations when adding new columns, ensuring that tags are properly managed during state differencing.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 36 out of 36 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| if (parsedPrincipals.length > 0 && privileges.length > 0) { | ||
| for (const p of parsedPrincipals) { | ||
| ops = ops.concat( | ||
| buildBulkGrantOps(scopeResult, p, privileges, GRANT_OP_TO_TARGET[operationType]) | ||
| ); | ||
| } |
There was a problem hiding this comment.
Building grant ops via repeated ops = ops.concat(...) inside the principals loop creates a new array on every iteration. Use push/spread into a single array (or flatMap) to avoid unnecessary allocations and keep the code simpler as principal counts grow.
| test('shows no-scope message for grants when no principal is entered', () => { | ||
| render( | ||
| <BulkOperationsPanel scope="catalog" catalogId="cat_1" onClose={onClose} /> | ||
| ); | ||
| // Default op is add_table_grants; scope has 2 tables | ||
| // Default op is add_table_grants; no principal entered yet → 0 count → empty message | ||
| expect(screen.getByText(/No objects in scope for this operation/)).toBeInTheDocument(); | ||
| }); |
There was a problem hiding this comment.
The new test asserts that the empty-scope message is shown when no principal is entered, but the scope can still contain grant targets (the count becomes 0 only because principals are empty). If the UI message is updated to distinguish "no targets" vs "missing principals", this test should be adjusted to match the more accurate status text/behavior.
| response = client.statement_execution.execute_statement( | ||
| warehouse_id=config.warehouse_id, | ||
| statement=sql, | ||
| wait_timeout="60s", | ||
| ) | ||
| if not response.status: | ||
| raise RuntimeError(f"_quick_query: no status returned for: {sql}") | ||
| state = response.status.state | ||
| if state != StatementState.SUCCEEDED: | ||
| error_msg = "" | ||
| if response.status.error: | ||
| error_msg = f": {response.status.error.message}" | ||
| raise RuntimeError(f"_quick_query: statement {state}{error_msg} for: {sql}") | ||
| if not response.result or not response.result.data_array: |
There was a problem hiding this comment.
_quick_query treats any non-SUCCEEDED state (including PENDING/RUNNING from wait_timeout expiry or a cold warehouse) as a hard failure. This can make live tests flaky even when the object exists. Consider polling get_statement until a terminal state (or using a longer/parameterized wait timeout) before failing, so transient execution delays don't surface as false negatives.
| def _set_view_tag(self, operation: Operation) -> str: | ||
| """Generate ALTER VIEW SET TAGS""" | ||
| view_fqn = self.id_name_map.get(operation.payload["viewId"], "unknown") | ||
| parts = view_fqn.split(".") | ||
| catalog_name = parts[0] if len(parts) > 0 else "unknown" | ||
|
|
||
| # Apply catalog name mapping (logical → physical) | ||
| catalog_name = self.catalog_name_mapping.get(catalog_name, catalog_name) | ||
| parts[0] = catalog_name | ||
|
|
||
| view_esc = self._build_fqn(*parts) |
There was a problem hiding this comment.
View tag SQL generation manually applies catalog_name_mapping before calling _build_fqn, but id_name_map is already built with catalog name mapping applied. This extra mapping is redundant and makes view-tag ops inconsistent with other view/table operations. Consider using the same pattern as other operations (split FQN → _build_fqn) and let the existing mapping logic handle catalog translation.
Summary
Motivation / Context
ALTER VIEW SET/UNSET TAGS) were not supported despite being available in Databricks DBR 13.3+spawnwithshell: falsedidn't resolve Python from virtual environments. Bulk operations only supported a single principal. Live integration tests were taking 35+ minutes on large metastores due to fulldiscover_statecalls for simple existence checks.Type of Change
What Changed
set_view_tag/unset_view_tagoperations, builders, SQL generation, state reduction, and bulk operationsstate_differ.pyintooperation_builders.py,grant_differ.py,metadata_differ.py,bulk_operations.pyPythonBackendClientto useshell: trueand prepend VS Code'spython.defaultInterpreterPathas first candidateBulkOperationsPanelto accept comma-separated principalsdiscover_statecalls inlive_helpers.pywith singleinformation_schemaSQL querieslive_helpers.pyFROM clauses (backtick-escaped identifiers)Provider / Surface Area Impact
Affected Surfaces
Providers
Behavior Changes
Before
ALTER VIEW SET/UNSET TAGSSQL generatedspawn python ENOENTin conda/venv/uv environmentsphysical_catalogunescaped in SQL FROM clausesAfter
ALTER VIEW SET/UNSET TAGS, reducer applies to state, bulk ops panel supports view tagsdata_engineers, analysts, ml_team)information_schemaqueries (~1-2s each)Examples
Backward Compatibility
Breaking Change Details (if any)
_set_table_tag/_unset_table_tagfor older operations.Data / State / Migration Notes
Details:
set_view_tagandunset_view_tagare additive; existing projects unaffectedTesting
Automated
make allpassed (format, lint, pylint 10/10, mypy, 738 unit + 169 extension tests)Manual
python.defaultInterpreterPathto a conda/venv Python, open Designer — should load withoutENOENTALTER VIEW SET TAGSoutputTest Evidence
Observability / UX
e.g. data_engineers, analysts, ml_teamSecurity / Privacy / Compliance
Details:
live_helpers.py—physical_catalognow backtick-escaped in FROM clauses via_ident()helper. Only affects test infrastructure, not production code.Performance Considerations
Details:
information_schemaquery vs fulldiscover_state)Risks & Mitigations
shell: truein spawn could behave differently across OS shellsinformation_schemacolumn names may vary across Databricks runtimesfunction_existsvalidated against live workspaceFollow-ups / Out of Scope
<!-- TODO -->comments remain)_set_table_tag/_unset_table_tagonce all projects have migrated to dedicated view tag opsChecklist
Reviewer Notes
pythonBackendClient.ts(shell: true+ interpreter path resolution),live_helpers.pySQL queries (identifier escaping, correct column names)shell: truedoesn't help if the user hasn't activated their env and hasn't setpython.defaultInterpreterPathschemaxon system PATH; create a view with tags and verify SQL output