Fix/state differ metadata gaps by vb-dbrks · Pull Request #114 · vb-dbrks/schemax

vb-dbrks · 2026-03-10T15:57:08Z

Summary

View tags support across the full pipeline (differ, SQL generator, reducer, bulk operations)
VS Code extension Python environment detection fix for conda, venv, uv, pyenv, poetry
Multi-principal bulk grants in the Designer
Live integration test performance optimization (~7x faster)
Marketplace content rewrite for PyPI, VS Code Marketplace, and Open VSX
State differ refactored by SRP into focused modules
Version bump to 0.2.10

Motivation / Context

Issue: View tags (ALTER VIEW SET/UNSET TAGS) were not supported despite being available in Databricks DBR 13.3+
Context: Users in conda/venv/uv environments couldn't use the extension because spawn with shell: false didn't resolve Python from virtual environments. Bulk operations only supported a single principal. Live integration tests were taking 35+ minutes on large metastores due to full discover_state calls for simple existence checks.
Scope: Python SDK, VS Code extension, integration tests, docs, marketplace content

Type of Change

What Changed

Added set_view_tag / unset_view_tag operations, builders, SQL generation, state reduction, and bulk operations
Split state_differ.py into operation_builders.py, grant_differ.py, metadata_differ.py, bulk_operations.py
Changed PythonBackendClient to use shell: true and prepend VS Code's python.defaultInterpreterPath as first candidate
Updated BulkOperationsPanel to accept comma-separated principals
Replaced discover_state calls in live_helpers.py with single information_schema SQL queries
Rewrote READMEs for PyPI and VS Code Marketplace with user-focused content
Fixed bulk operations modal label alignment (CSS gap/margin)
Bumped all version references from 0.2.9 to 0.2.10
Fixed SQL injection in live_helpers.py FROM clauses (backtick-escaped identifiers)

Provider / Surface Area Impact

Affected Surfaces

Providers

Provider-agnostic/core
Unity Catalog
Other: ______

Behavior Changes

Before

View tags not supported — no ALTER VIEW SET/UNSET TAGS SQL generated
Extension failed with spawn python ENOENT in conda/venv/uv environments
Bulk grants only accepted a single principal
Live test existence checks ran full catalog discovery (30-60s each on large metastores)
physical_catalog unescaped in SQL FROM clauses

After

View tags fully supported: differ detects tag changes, SQL generator emits ALTER VIEW SET/UNSET TAGS, reducer applies to state, bulk ops panel supports view tags
Extension resolves Python from VS Code interpreter setting first, then shell PATH (works with conda, venv, uv, pyenv, poetry)
Bulk grants accept comma-separated principals (e.g. data_engineers, analysts, ml_team)
Existence checks use single information_schema queries (~1-2s each)
Catalog identifiers backtick-escaped in FROM clauses

Examples

-- New SQL generation for view tags
ALTER VIEW `my_catalog`.`my_schema`.`my_view` SET TAGS ('env' = 'prod')
ALTER VIEW `my_catalog`.`my_schema`.`my_view` UNSET TAGS ('env')
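
As a rough illustration of how statements like the two above could be assembled, here is a minimal Python sketch. The helper names (`quote_ident`, `quote_literal`, `set_view_tags_sql`, `unset_view_tags_sql`) are hypothetical and are not taken from `sql_generator.py`. Escaping string literals with a backslash is an assumption of this sketch, not something the PR states.

```python
from __future__ import annotations


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(value: str) -> str:
    """Single-quote a string literal (backslash escaping is an assumption)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def set_view_tags_sql(catalog: str, schema: str, view: str, tags: dict[str, str]) -> str:
    """Build an ALTER VIEW ... SET TAGS statement for the given tag pairs."""
    fqn = ".".join(quote_ident(part) for part in (catalog, schema, view))
    pairs = ", ".join(f"{quote_literal(k)} = {quote_literal(v)}" for k, v in tags.items())
    return f"ALTER VIEW {fqn} SET TAGS ({pairs})"


def unset_view_tags_sql(catalog: str, schema: str, view: str, tag_names: list[str]) -> str:
    """Build an ALTER VIEW ... UNSET TAGS statement for the given tag names."""
    fqn = ".".join(quote_ident(part) for part in (catalog, schema, view))
    names = ", ".join(quote_literal(name) for name in tag_names)
    return f"ALTER VIEW {fqn} UNSET TAGS ({names})"
```

Calling `set_view_tags_sql("my_catalog", "my_schema", "my_view", {"env": "prod"})` reproduces the first statement shown above.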

Backward Compatibility

No backward compatibility concerns
Backward compatible
Breaking (details below)

Breaking Change Details (if any)

N/A. State reducer retains backward-compatible view fallback in _set_table_tag/_unset_table_tag for older operations.

Data / State / Migration Notes

No data/state schema changes
Data/state schema changed (describe)
Migration required: [ ] Yes [x] No
Manual steps required: [ ] Yes [x] No

Details:

New operation types set_view_tag and unset_view_tag are additive; existing projects unaffected

Testing

Automated

make all passed (format, lint, pylint 10/10, mypy, 738 unit + 169 extension tests)
Unit tests added/updated (view tag: 5 state reducer, 3 SQL generator, 4 state differ; extension: 1 interpreter path test)
Integration tests added/updated (lightweight existence checks)
Live/integration env-gated tests validated — 14/14 passed

Manual

Install extension, set python.defaultInterpreterPath to a conda/venv Python, open Designer — should load without ENOENT
Open bulk operations, select "Add view grants", enter comma-separated principals — verify operation count multiplies
Add tags to a view in the Designer, generate SQL — verify ALTER VIEW SET TAGS output

Test Evidence

# Python SDK
======================== 738 passed, 1 warning in 2.09s ========================

# VS Code Extension
Test Suites: 17 passed, 17 total
Tests:       169 passed, 169 total

# make all quality gates
All checks passed!                          # ruff
Success: no issues found in 72 source files # mypy
Your code has been rated at 10.00/10        # pylint

# Live integration (14/14)
5 passed in 1003.44s   # final batch

Observability / UX

Bulk operations label alignment fixed — labels now sit flush with input fields
Multi-principal placeholder updated: e.g. data_engineers, analysts, ml_team
Marketplace descriptions rewritten for engineering leads/architects — explains schema management alongside DLT

Security / Privacy / Compliance

No security impact
Security-relevant changes (describe)
Secrets/credentials handling reviewed
PII/data governance impact reviewed

Details:

Fixed SQL injection in live_helpers.py — physical_catalog now backtick-escaped in FROM clauses via _ident() helper. Only affects test infrastructure, not production code.
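
A minimal sketch of the idea, assuming a helper similar in spirit to `_ident()`: the function names `ident`, `literal`, and `table_exists_sql` below are hypothetical, and the actual body of `_ident()` is not shown in this PR description. Doubling embedded backticks keeps a hostile catalog name inside the quoted identifier. The backslash escaping of string literals is an assumption of the sketch.

```python
def ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def literal(value: str) -> str:
    """Single-quote a string literal (backslash escaping is an assumption)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def table_exists_sql(physical_catalog: str, schema: str, table: str) -> str:
    """Build a single information_schema existence query for one table."""
    return (
        f"SELECT 1 FROM {ident(physical_catalog)}.information_schema.tables "
        f"WHERE table_schema = {literal(schema)} "
        f"AND table_name = {literal(table)} LIMIT 1"
    )
```

With an unescaped f-string, a catalog name containing a backtick could close the identifier early. With `ident`, the backtick is doubled and stays part of the name.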

Performance Considerations

No material perf impact
Perf improved
Perf risk introduced (mitigation below)

Details:

Live integration test existence checks reduced from ~30-60s to ~1-2s each (single information_schema query vs full discover_state)
Overall test suite: 35 min for 5 tests → ~19 min for 14 tests (~7x faster per test)

Risks & Mitigations

| Risk | Severity | Mitigation |
| --- | --- | --- |
| `shell: true` in spawn could behave differently across OS shells | Low | Candidate fallback chain tries multiple commands; shell init is standard for conda/pyenv |
| `information_schema` column names may vary across Databricks runtimes | Low | Used standard column names; `function_exists` validated against live workspace |
| View tags require DBR 13.3+ | Low | Same requirement as existing table tags; Databricks documents this |

Follow-ups / Out of Scope

Add screenshots to VS Code extension README (placeholder comments remain)
Speed up live tests further by parallelizing independent existence checks
Remove backward-compatible view fallback in _set_table_tag/_unset_table_tag once all projects have migrated to dedicated view tag ops

Checklist

Code follows project architecture (provider-extensible, no hardcoded provider leakage in core)
SOLID/SRP/DRY considerations applied
Errors are deterministic; warnings are intentional and actionable
Docs/comments updated where needed
No unrelated changes included
Reviewer notes added for tricky areas

Reviewer Notes

Focus areas: pythonBackendClient.ts (shell: true + interpreter path resolution), live_helpers.py SQL queries (identifier escaping, correct column names)
Known limitations: shell: true doesn't help if the user hasn't activated their env and hasn't set python.defaultInterpreterPath
Suggested test paths: Open extension in a fresh conda env without schemax on system PATH; create a view with tags and verify SQL output

- Introduced full support for `ALTER VIEW SET TAGS` and `ALTER VIEW UNSET TAGS` in the Python SDK, allowing users to add, update, and remove tags on views through the Designer or state files. Tags are now tracked in the changelog and included in SQL generation.
- Added bulk operations for applying tags to all views within a catalog or schema scope.
- Refactored the `state_differ.py` module into focused components for better maintainability, including `operation_builders.py`, `grant_differ.py`, `metadata_differ.py`, and `bulk_operations.py`.
- Improved performance of live integration tests by optimizing existence checks to use single `information_schema` queries, resulting in approximately 7x faster test execution on large metastores.

- Bumped version number to 0.2.10 in package.json files for the main project, Python SDK, VS Code extension, and documentation site.
- Updated CLI version in runtime_info.success.json to reflect the new version.
- Revised databricks-asset-bundles.mdx and release-notes.mdx to include version 0.2.10 and highlight new features such as view tags support, multi-principal bulk grants, and improved Python environment detection.
- Ensured consistency in versioning across all related files and documentation.

Copilot

Pull request overview

This PR expands Unity Catalog governance support and improves developer UX/performance across the SchemaX stack, including new view tag operations, a refactored state differ, VS Code extension Python interpreter resolution improvements, and faster live integration test helpers. It also updates marketplace/docs content and bumps versions to 0.2.10.

Changes:

Add dedicated view tag operations end-to-end (differ → reducer → SQL generator → bulk ops + tests).
Improve VS Code extension backend Python resolution (prefer python.defaultInterpreterPath, run via shell).
Refactor Unity state differ into focused modules; add extensive new unit tests; update docs/packaging/version strings.

Reviewed changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| packages/vscode-extension/src/backend/pythonBackendClient.ts | Adds interpreter candidate ordering and switches spawn to `shell: true` for env resolution. |
| packages/vscode-extension/tests/unit/pythonBackendClient.test.ts | Adds unit test coverage for configured interpreter path precedence. |
| packages/vscode-extension/tests/mocks/vscode.ts | Updates VS Code mock configuration getter behavior. |
| packages/vscode-extension/src/webview/components/BulkOperationsPanel.tsx | Adds comma-separated multi-principal grants and updates UI strings. |
| packages/python-sdk/src/schemax/providers/unity/state_reducer.py | Adds reducers for `set_view_tag` / `unset_view_tag`. |
| packages/python-sdk/src/schemax/providers/unity/state_differ.py | Refactors differ to extracted modules and adds view tag/property diffs + greenfield traversal. |
| packages/python-sdk/src/schemax/providers/unity/bulk_operations.py | Adds recursive “add all…” helpers, including view-tag emission on new views. |
| packages/python-sdk/src/schemax/providers/unity/metadata_differ.py | Adds metadata/property/tag/constraint/column diff logic in a dedicated module. |
| packages/python-sdk/src/schemax/providers/unity/grant_differ.py | Extracts grant diffing into dedicated module. |
| packages/python-sdk/src/schemax/providers/unity/sql_generator.py | Adds SQL generation handlers for `ALTER VIEW SET/UNSET TAGS`. |
| packages/python-sdk/tests/integration/live_helpers.py | Replaces full discovery with lightweight information_schema existence queries. |
| packages/python-sdk/tests/unit/test_state_reducer.py | Adds reducer tests for view tag operations. |
| packages/python-sdk/tests/unit/test_sql_generator.py | Adds SQL generation tests for view tag ops and escaping behavior. |
| packages/vscode-extension/src/webview/styles.css | Fixes modal spacing/alignment. |
| Version/doc files (package.json, pyproject.toml, READMEs, CHANGELOGs, docs) | Content rewrites and version bump to 0.2.10 across packages/docs/contracts. |

…hon backend
- Integrated `diff_view_properties` into view operations to track changes in view properties when adding views in schemas and during state differencing.
- Updated SQL generation methods in `UnitySQLGenerator` to escape tag names consistently, ensuring proper SQL syntax for `ALTER VIEW` and `ALTER TABLE` operations.
- Refactored command candidate handling in `PythonBackendClient` to support absolute interpreter paths, improving compatibility with environments that have spaces in their paths.
- Adjusted tests to verify the correct behavior of command execution and ensure that configured interpreter paths utilize the appropriate shell settings.

- Updated helper functions in `state_differ_helpers.py` to include type annotations for better type safety and clarity.
- Enhanced the `_make_op`, `_base_state`, `_catalog`, `_schema`, `_table`, `_col`, `_view`, `_volume`, `_function`, `_mv`, `_constraint`, `_grant`, `_op_types`, and `_ops_of_type` functions to specify parameter and return types.
- Improved overall code maintainability and readability by enforcing type hints across the module.

Copilot

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 4 comments.

- Enhanced the `_quick_query` function in `live_helpers.py` to raise clear errors on query failure or timeout, ensuring that callers receive explicit feedback instead of silent failures.
- Increased the wait timeout from 30 seconds to 60 seconds to accommodate longer-running queries.
- Updated the test for catalog mutations in `test_state_differ_basic.py` to assert the correct handling of catalog comment updates, ensuring that the new comment is properly reflected in the generated operations.

- Added type annotations to the `detect_rename_fn` parameter in the `diff_existing_column` function for improved type safety.
- Integrated `create_column_tag_ops` into the `UnityStateDiffer` class to handle column tag operations when adding new columns, ensuring that tags are properly managed during state differencing.

Copilot

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 4 comments.

Copilot · 2026-03-10T17:25:29Z

+      if (parsedPrincipals.length > 0 && privileges.length > 0) {
+        for (const p of parsedPrincipals) {
+          ops = ops.concat(
+            buildBulkGrantOps(scopeResult, p, privileges, GRANT_OP_TO_TARGET[operationType])
+          );
+        }


Building grant ops via repeated ops = ops.concat(...) inside the principals loop creates a new array on every iteration. Use push/spread into a single array (or flatMap) to avoid unnecessary allocations and keep the code simpler as principal counts grow.
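
The flattening this comment asks for can be sketched as follows. The panel itself is TypeScript; the version below is a Python analogue only, and `parse_principals`, `build_grant_ops`, and the operation dictionary shape are hypothetical names chosen for illustration. A single nested comprehension plays the role that `flatMap` would play in the original code.

```python
from __future__ import annotations


def parse_principals(raw: str) -> list[str]:
    """Split a comma-separated input, trimming whitespace and dropping blanks."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_grant_ops(
    targets: list[str], principals: list[str], privileges: list[str]
) -> list[dict]:
    """Build one grant operation per (principal, target) pair in a single pass."""
    return [
        {"op": "add_grant", "target": target, "principal": principal,
         "privileges": list(privileges)}
        for principal in principals
        for target in targets
    ]


principals = parse_principals("data_engineers, analysts, ml_team")
ops = build_grant_ops(["table_1", "table_2"], principals, ["SELECT"])
```

With three principals and two targets the result holds six operations, which matches the "operation count multiplies" check in the manual test steps.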

Copilot · 2026-03-10T17:25:29Z

+  test('shows no-scope message for grants when no principal is entered', () => {
    render(
      <BulkOperationsPanel scope="catalog" catalogId="cat_1" onClose={onClose} />
    );
-    // Default op is add_table_grants; scope has 2 tables
+    // Default op is add_table_grants; no principal entered yet → 0 count → empty message
+    expect(screen.getByText(/No objects in scope for this operation/)).toBeInTheDocument();
+  });


The new test asserts that the empty-scope message is shown when no principal is entered, but the scope can still contain grant targets (the count becomes 0 only because principals are empty). If the UI message is updated to distinguish "no targets" vs "missing principals", this test should be adjusted to match the more accurate status text/behavior.

Copilot · 2026-03-10T17:25:29Z

+    response = client.statement_execution.execute_statement(
+        warehouse_id=config.warehouse_id,
+        statement=sql,
+        wait_timeout="60s",
+    )
+    if not response.status:
+        raise RuntimeError(f"_quick_query: no status returned for: {sql}")
+    state = response.status.state
+    if state != StatementState.SUCCEEDED:
+        error_msg = ""
+        if response.status.error:
+            error_msg = f": {response.status.error.message}"
+        raise RuntimeError(f"_quick_query: statement {state}{error_msg} for: {sql}")
+    if not response.result or not response.result.data_array:


_quick_query treats any non-SUCCEEDED state (including PENDING/RUNNING from wait_timeout expiry or a cold warehouse) as a hard failure. This can make live tests flaky even when the object exists. Consider polling get_statement until a terminal state (or using a longer/parameterized wait timeout) before failing, so transient execution delays don't surface as false negatives.
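
One way to read this suggestion is a small polling loop that only fails once the statement reaches a terminal state or an overall deadline passes. The sketch below is deliberately independent of the Databricks SDK: `wait_for_terminal_state` and its parameters are hypothetical, and `get_state` stands in for whatever call re-fetches the statement status (the comment mentions `get_statement`). State names are plain strings here.

```python
import time
from typing import Callable

# States after which the statement will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED", "CLOSED"}


def wait_for_terminal_state(
    get_state: Callable[[], str],
    timeout_s: float = 120.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until a terminal state is seen or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"statement still {state} after {timeout_s}s")
        sleep(poll_interval_s)
```

The caller would then raise only when the returned state is something other than `SUCCEEDED`, so a cold warehouse that is still `PENDING` or `RUNNING` is waited on rather than reported as a failure.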

Copilot · 2026-03-10T17:25:30Z

+    def _set_view_tag(self, operation: Operation) -> str:
+        """Generate ALTER VIEW SET TAGS"""
+        view_fqn = self.id_name_map.get(operation.payload["viewId"], "unknown")
+        parts = view_fqn.split(".")
+        catalog_name = parts[0] if len(parts) > 0 else "unknown"
+
+        # Apply catalog name mapping (logical → physical)
+        catalog_name = self.catalog_name_mapping.get(catalog_name, catalog_name)
+        parts[0] = catalog_name
+
+        view_esc = self._build_fqn(*parts)


View tag SQL generation manually applies catalog_name_mapping before calling _build_fqn, but id_name_map is already built with catalog name mapping applied. This extra mapping is redundant and makes view-tag ops inconsistent with other view/table operations. Consider using the same pattern as other operations (split FQN → _build_fqn) and let the existing mapping logic handle catalog translation.
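
To make the concern concrete, here is a small sketch under the comment's stated premise that `id_name_map` already holds physical catalog names. The functions `build_fqn`, `view_fqn_single_mapping`, and `view_fqn_double_mapping` are hypothetical stand-ins, not the generator's real methods, and the chained mapping in the usage example is contrived to show the worst case.

```python
from __future__ import annotations


def build_fqn(*parts: str) -> str:
    """Backtick-quote each part and join with dots."""
    return ".".join("`" + p.replace("`", "``") + "`" for p in parts)


def view_fqn_single_mapping(view_id: str, id_name_map: dict[str, str]) -> str:
    """Trust id_name_map, which is assumed to be mapped to physical names."""
    return build_fqn(*id_name_map.get(view_id, "unknown").split("."))


def view_fqn_double_mapping(
    view_id: str, id_name_map: dict[str, str], catalog_name_mapping: dict[str, str]
) -> str:
    """Apply the catalog mapping a second time, as in the snippet under review."""
    parts = id_name_map.get(view_id, "unknown").split(".")
    parts[0] = catalog_name_mapping.get(parts[0], parts[0])
    return build_fqn(*parts)


id_name_map = {"view_1": "dev_analytics.core.daily"}   # already physical
mapping = {"analytics": "dev_analytics", "dev_analytics": "other"}
single = view_fqn_single_mapping("view_1", id_name_map)
double = view_fqn_double_mapping("view_1", id_name_map, mapping)
```

In the common case the second lookup is a no-op, which is why the comment calls it redundant. It only changes the result when a physical name also happens to be a key in the mapping, as in the contrived example above.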

vb-dbrks added 3 commits March 10, 2026 15:43

code fmt

ca18302

vb-dbrks marked this pull request as ready for review March 10, 2026 16:16

Copilot AI review requested due to automatic review settings March 10, 2026 16:16

Copilot started reviewing on behalf of vb-dbrks March 10, 2026 16:17

Copilot AI reviewed Mar 10, 2026

vb-dbrks added 2 commits March 10, 2026 16:44

Copilot AI review requested due to automatic review settings March 10, 2026 16:50

Copilot started reviewing on behalf of vb-dbrks March 10, 2026 16:50

vb-dbrks self-assigned this Mar 10, 2026

vb-dbrks added the bug, documentation, and enhancement labels Mar 10, 2026

vb-dbrks linked an issue Mar 10, 2026 that may be closed by this pull request

VS Code extension fails to start when SchemaX Python library is installed in a virtual environment #115

Closed

3 tasks

Copilot AI reviewed Mar 10, 2026

vb-dbrks added 2 commits March 10, 2026 17:04

Copilot AI review requested due to automatic review settings March 10, 2026 17:12

Copilot started reviewing on behalf of vb-dbrks March 10, 2026 17:12

vb-dbrks merged commit 79f7044 into main Mar 10, 2026
12 checks passed

vb-dbrks deleted the fix/state-differ-metadata-gaps branch March 10, 2026 17:15

Copilot AI reviewed Mar 10, 2026

vb-dbrks merged 7 commits into main from fix/state-differ-metadata-gaps

2 participants