
Fix/state differ metadata gaps #114

Merged
vb-dbrks merged 7 commits into main from fix/state-differ-metadata-gaps
Mar 10, 2026

Conversation

@vb-dbrks

Owner

Summary

  • View tags support across the full pipeline (differ, SQL generator, reducer, bulk operations)
  • VS Code extension Python environment detection fix for conda, venv, uv, pyenv, poetry
  • Multi-principal bulk grants in the Designer
  • Live integration test performance optimization (~7x faster)
  • Marketplace content rewrite for PyPI, VS Code Marketplace, and Open VSX
  • State differ refactored by SRP into focused modules
  • Version bump to 0.2.10

Motivation / Context

  • Issue: View tags (ALTER VIEW SET/UNSET TAGS) were not supported despite being available in Databricks DBR 13.3+
  • Context: Users in conda/venv/uv environments couldn't use the extension because spawn with shell: false didn't resolve Python from virtual environments. Bulk operations only supported a single principal. Live integration tests were taking 35+ minutes on large metastores due to full discover_state calls for simple existence checks.
  • Scope: Python SDK, VS Code extension, integration tests, docs, marketplace content

Type of Change

  • Feature
  • Bug fix
  • Refactor
  • Performance improvement
  • Documentation
  • Test-only
  • Build/CI
  • Breaking change

What Changed

  • Added set_view_tag / unset_view_tag operations, builders, SQL generation, state reduction, and bulk operations
  • Split state_differ.py into operation_builders.py, grant_differ.py, metadata_differ.py, bulk_operations.py
  • Changed PythonBackendClient to use shell: true and prepend VS Code's python.defaultInterpreterPath as first candidate
  • Updated BulkOperationsPanel to accept comma-separated principals
  • Replaced discover_state calls in live_helpers.py with single information_schema SQL queries
  • Rewrote READMEs for PyPI and VS Code Marketplace with user-focused content
  • Fixed bulk operations modal label alignment (CSS gap/margin)
  • Bumped all version references from 0.2.9 to 0.2.10
  • Fixed SQL injection in live_helpers.py FROM clauses (backtick-escaped identifiers)

Provider / Surface Area Impact

Affected Surfaces

  • CLI
  • Python SDK
  • VSCode Extension UI
  • SQL generation
  • State differ/reducer
  • Storage/project schema
  • Integration tests
  • Docs

Providers

  • Provider-agnostic/core
  • Unity Catalog
  • Other: ______

Behavior Changes

Before

  • View tags not supported — no ALTER VIEW SET/UNSET TAGS SQL generated
  • Extension failed with spawn python ENOENT in conda/venv/uv environments
  • Bulk grants only accepted a single principal
  • Live test existence checks ran full catalog discovery (30-60s each on large metastores)
  • physical_catalog unescaped in SQL FROM clauses

After

  • View tags fully supported: differ detects tag changes, SQL generator emits ALTER VIEW SET/UNSET TAGS, reducer applies to state, bulk ops panel supports view tags
  • Extension resolves Python from VS Code interpreter setting first, then shell PATH (works with conda, venv, uv, pyenv, poetry)
  • Bulk grants accept comma-separated principals (e.g. data_engineers, analysts, ml_team)
  • Existence checks use single information_schema queries (~1-2s each)
  • Catalog identifiers backtick-escaped in FROM clauses
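
The multi-principal behavior can be illustrated with a short sketch. The panel itself is TypeScript, so this Python version is only an illustration of the described logic: `parse_principals` and `build_grant_ops` are hypothetical names, the op dictionary shape is invented for the example, and dropping duplicate names is an assumption rather than something the PR states.

```python
from __future__ import annotations


def parse_principals(raw: str) -> list[str]:
    """Split a comma-separated string into non-empty principal names.

    Duplicates are dropped while preserving order (an assumption for this sketch).
    """
    seen: set[str] = set()
    principals: list[str] = []
    for part in raw.split(","):
        name = part.strip()
        if name and name not in seen:
            seen.add(name)
            principals.append(name)
    return principals


def build_grant_ops(
    targets: list[str], principals: list[str], privileges: list[str]
) -> list[dict]:
    """Return one hypothetical grant op per (principal, target) pair."""
    return [
        {"op": "add_grant", "target": target, "principal": principal,
         "privileges": list(privileges)}
        for principal in principals
        for target in targets
    ]


principals = parse_principals("data_engineers, analysts, ml_team")
ops = build_grant_ops(["orders", "customers"], principals, ["SELECT"])
print(len(ops))  # 3 principals x 2 targets
```

With three principals and two targets in scope, the operation count is the product of the two, which is what the manual test step about the count multiplying is checking.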

Examples

-- New SQL generation for view tags
ALTER VIEW `my_catalog`.`my_schema`.`my_view` SET TAGS ('env' = 'prod')
ALTER VIEW `my_catalog`.`my_schema`.`my_view` UNSET TAGS ('env')
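
A minimal sketch of how statements like these could be assembled, assuming backtick-quoted identifiers and backslash-escaped string literals. The function names are hypothetical and are not the methods in `sql_generator.py`.

```python
from __future__ import annotations


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(value: str) -> str:
    """Single-quote a string literal, escaping backslashes and quotes (assumed style)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def alter_view_set_tags(fqn_parts: tuple[str, ...], tags: dict[str, str]) -> str:
    """Build ALTER VIEW ... SET TAGS for a (catalog, schema, view) name."""
    fqn = ".".join(quote_ident(part) for part in fqn_parts)
    pairs = ", ".join(
        f"{quote_literal(key)} = {quote_literal(value)}" for key, value in tags.items()
    )
    return f"ALTER VIEW {fqn} SET TAGS ({pairs})"


def alter_view_unset_tags(fqn_parts: tuple[str, ...], tag_names: list[str]) -> str:
    """Build ALTER VIEW ... UNSET TAGS for a (catalog, schema, view) name."""
    fqn = ".".join(quote_ident(part) for part in fqn_parts)
    names = ", ".join(quote_literal(name) for name in tag_names)
    return f"ALTER VIEW {fqn} UNSET TAGS ({names})"


view = ("my_catalog", "my_schema", "my_view")
print(alter_view_set_tags(view, {"env": "prod"}))
print(alter_view_unset_tags(view, ["env"]))
```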

Backward Compatibility

  • No backward compatibility concerns
  • Backward compatible
  • Breaking (details below)

Breaking Change Details (if any)

  • N/A. State reducer retains backward-compatible view fallback in _set_table_tag/_unset_table_tag for older operations.

Data / State / Migration Notes

  • No data/state schema changes
  • Data/state schema changed (describe)
  • Migration required: [ ] Yes [x] No
  • Manual steps required: [ ] Yes [x] No

Details:

  • New operation types set_view_tag and unset_view_tag are additive; existing projects unaffected

Testing

Automated

  • make all passed (format, lint, pylint 10/10, mypy, 738 unit + 169 extension tests)
  • Unit tests added/updated (view tag: 5 state reducer, 3 SQL generator, 4 state differ; extension: 1 interpreter path test)
  • Integration tests added/updated (lightweight existence checks)
  • Live/integration env-gated tests validated — 14/14 passed

Manual

  1. Install extension, set python.defaultInterpreterPath to a conda/venv Python, open Designer — should load without ENOENT
  2. Open bulk operations, select "Add view grants", enter comma-separated principals — verify operation count multiplies
  3. Add tags to a view in the Designer, generate SQL — verify ALTER VIEW SET TAGS output

Test Evidence

# Python SDK
======================== 738 passed, 1 warning in 2.09s ========================

# VS Code Extension
Test Suites: 17 passed, 17 total
Tests:       169 passed, 169 total

# make all quality gates
All checks passed!                          # ruff
Success: no issues found in 72 source files # mypy
Your code has been rated at 10.00/10        # pylint

# Live integration (14/14)
5 passed in 1003.44s   # final batch

Observability / UX

  • Bulk operations label alignment fixed — labels now sit flush with input fields
  • Multi-principal placeholder updated: e.g. data_engineers, analysts, ml_team
  • Marketplace descriptions rewritten for engineering leads/architects — explains schema management alongside DLT

Security / Privacy / Compliance

  • No security impact
  • Security-relevant changes (describe)
  • Secrets/credentials handling reviewed
  • PII/data governance impact reviewed

Details:

  • Fixed SQL injection in live_helpers.py: physical_catalog is now backtick-escaped in FROM clauses via the _ident() helper. This only affects test infrastructure, not production code.
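
For illustration, a helper of this kind might look like the sketch below. It is not the actual `_ident()` from live_helpers.py: `ident`, `literal`, and `table_exists_sql` are hypothetical names, and the query assumes the standard `information_schema.tables` columns `table_schema` and `table_name`.

```python
from __future__ import annotations


def ident(name: str) -> str:
    """Backtick-quote an identifier; embedded backticks are doubled so they stay inside the quotes."""
    return "`" + name.replace("`", "``") + "`"


def literal(value: str) -> str:
    """Single-quote a string value, escaping backslashes and quotes (assumed style)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def table_exists_sql(physical_catalog: str, schema: str, table: str) -> str:
    """Build a single existence query against the catalog's information_schema."""
    return (
        f"SELECT 1 FROM {ident(physical_catalog)}.information_schema.tables "
        f"WHERE table_schema = {literal(schema)} "
        f"AND table_name = {literal(table)} LIMIT 1"
    )


print(table_exists_sql("dev_catalog", "sales", "orders"))
# A catalog name containing a backtick stays a single quoted identifier:
print(ident("dev`; DROP TABLE x"))
```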

Performance Considerations

  • No material perf impact
  • Perf improved
  • Perf risk introduced (mitigation below)

Details:

  • Live integration test existence checks reduced from ~30-60s to ~1-2s each (single information_schema query vs full discover_state)
  • Overall test suite: 35 min for 5 tests → ~19 min for 14 tests (~7x faster per test)

Risks & Mitigations

| Risk | Severity | Mitigation |
| --- | --- | --- |
| shell: true in spawn could behave differently across OS shells | Low | Candidate fallback chain tries multiple commands; shell init is standard for conda/pyenv |
| information_schema column names may vary across Databricks runtimes | Low | Used standard column names; function_exists validated against live workspace |
| View tags require DBR 13.3+ | Low | Same requirement as existing table tags; Databricks documents this |

Follow-ups / Out of Scope

  • Add screenshots to VS Code extension README (placeholder <!-- TODO --> comments remain)
  • Speed up live tests further by parallelizing independent existence checks
  • Remove backward-compatible view fallback in _set_table_tag/_unset_table_tag once all projects have migrated to dedicated view tag ops

Checklist

  • Code follows project architecture (provider-extensible, no hardcoded provider leakage in core)
  • SOLID/SRP/DRY considerations applied
  • Errors are deterministic; warnings are intentional and actionable
  • Docs/comments updated where needed
  • No unrelated changes included
  • Reviewer notes added for tricky areas

Reviewer Notes

  • Focus areas: pythonBackendClient.ts (shell: true + interpreter path resolution), live_helpers.py SQL queries (identifier escaping, correct column names)
  • Known limitations: shell: true doesn't help if the user hasn't activated their env and hasn't set python.defaultInterpreterPath
  • Suggested test paths: Open extension in a fresh conda env without schemax on system PATH; create a view with tags and verify SQL output
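
The resolution order described in the focus areas can be sketched as follows. The real logic lives in TypeScript in pythonBackendClient.ts, so this Python version only illustrates the ordering idea. The fallback command names and both function names are assumptions.

```python
from __future__ import annotations

import subprocess
from typing import Callable


def interpreter_candidates(
    configured_path: str | None, fallbacks: tuple[str, ...] = ("python3", "python")
) -> list[str]:
    """Put the configured interpreter first, then the fallback commands, without duplicates."""
    candidates: list[str] = []
    if configured_path and configured_path.strip():
        candidates.append(configured_path.strip())
    for command in fallbacks:
        if command not in candidates:
            candidates.append(command)
    return candidates


def runs_ok(command: str) -> bool:
    """Probe a candidate by asking it for its version."""
    try:
        result = subprocess.run([command, "--version"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def first_working(candidates: list[str], probe: Callable[[str], bool] = runs_ok) -> str | None:
    """Return the first candidate the probe accepts, or None if none works."""
    for command in candidates:
        if probe(command):
            return command
    return None


print(interpreter_candidates("/opt/conda/envs/dev/bin/python"))
```

If neither a configured path nor an activated environment is available, every candidate fails the probe and the result is `None`, which corresponds to the known limitation noted above.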

- Introduced full support for `ALTER VIEW SET TAGS` and `ALTER VIEW UNSET TAGS` in the Python SDK, allowing users to add, update, and remove tags on views through the Designer or state files. Tags are now tracked in the changelog and included in SQL generation.
- Added bulk operations for applying tags to all views within a catalog or schema scope.
- Refactored the `state_differ.py` module into focused components for better maintainability, including `operation_builders.py`, `grant_differ.py`, `metadata_differ.py`, and `bulk_operations.py`.
- Improved performance of live integration tests by optimizing existence checks to use single `information_schema` queries, resulting in approximately 7x faster test execution on large metastores.
- Bumped version number to 0.2.10 in package.json files for the main project, Python SDK, VS Code extension, and documentation site.
- Updated CLI version in runtime_info.success.json to reflect the new version.
- Revised databricks-asset-bundles.mdx and release-notes.mdx to include version 0.2.10 and highlight new features such as view tags support, multi-principal bulk grants, and improved Python environment detection.
- Ensured consistency in versioning across all related files and documentation.
@vb-dbrks marked this pull request as ready for review March 10, 2026 16:16
Copilot AI review requested due to automatic review settings March 10, 2026 16:16

Copilot AI left a comment

Contributor


Pull request overview

This PR expands Unity Catalog governance support and improves developer UX/performance across the SchemaX stack, including new view tag operations, a refactored state differ, VS Code extension Python interpreter resolution improvements, and faster live integration test helpers. It also updates marketplace/docs content and bumps versions to 0.2.10.

Changes:

  • Add dedicated view tag operations end-to-end (differ → reducer → SQL generator → bulk ops + tests).
  • Improve VS Code extension backend Python resolution (prefer python.defaultInterpreterPath, run via shell).
  • Refactor Unity state differ into focused modules; add extensive new unit tests; update docs/packaging/version strings.

Reviewed changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| packages/vscode-extension/src/backend/pythonBackendClient.ts | Adds interpreter candidate ordering and switches spawn to shell: true for env resolution. |
| packages/vscode-extension/tests/unit/pythonBackendClient.test.ts | Adds unit test coverage for configured interpreter path precedence. |
| packages/vscode-extension/tests/mocks/vscode.ts | Updates VS Code mock configuration getter behavior. |
| packages/vscode-extension/src/webview/components/BulkOperationsPanel.tsx | Adds comma-separated multi-principal grants and updates UI strings. |
| packages/python-sdk/src/schemax/providers/unity/state_reducer.py | Adds reducers for set_view_tag / unset_view_tag. |
| packages/python-sdk/src/schemax/providers/unity/state_differ.py | Refactors differ to extracted modules and adds view tag/property diffs + greenfield traversal. |
| packages/python-sdk/src/schemax/providers/unity/bulk_operations.py | Adds recursive “add all…” helpers, including view-tag emission on new views. |
| packages/python-sdk/src/schemax/providers/unity/metadata_differ.py | Adds metadata/property/tag/constraint/column diff logic in a dedicated module. |
| packages/python-sdk/src/schemax/providers/unity/grant_differ.py | Extracts grant diffing into dedicated module. |
| packages/python-sdk/src/schemax/providers/unity/sql_generator.py | Adds SQL generation handlers for ALTER VIEW SET/UNSET TAGS. |
| packages/python-sdk/tests/integration/live_helpers.py | Replaces full discovery with lightweight information_schema existence queries. |
| packages/python-sdk/tests/unit/test_state_reducer.py | Adds reducer tests for view tag operations. |
| packages/python-sdk/tests/unit/test_sql_generator.py | Adds SQL generation tests for view tag ops and escaping behavior. |
| packages/vscode-extension/src/webview/styles.css | Fixes modal spacing/alignment. |
| Version/doc files (package.json, pyproject.toml, READMEs, CHANGELOGs, docs) | Content rewrites and version bump to 0.2.10 across packages/docs/contracts. |


Comment thread packages/vscode-extension/src/backend/pythonBackendClient.ts Outdated
Comment thread packages/vscode-extension/src/backend/pythonBackendClient.ts Outdated
Comment thread packages/python-sdk/src/schemax/providers/unity/state_differ.py
Comment thread packages/python-sdk/src/schemax/providers/unity/bulk_operations.py
Comment thread packages/python-sdk/src/schemax/providers/unity/sql_generator.py
…hon backend

- Integrated `diff_view_properties` into view operations to track changes in view properties when adding views in schemas and during state differencing.
- Updated SQL generation methods in `UnitySQLGenerator` to escape tag names consistently, ensuring proper SQL syntax for `ALTER VIEW` and `ALTER TABLE` operations.
- Refactored command candidate handling in `PythonBackendClient` to support absolute interpreter paths, improving compatibility with environments that have spaces in their paths.
- Adjusted tests to verify the correct behavior of command execution and ensure that configured interpreter paths utilize the appropriate shell settings.
- Updated helper functions in `state_differ_helpers.py` to include type annotations for better type safety and clarity.
- Enhanced the `_make_op`, `_base_state`, `_catalog`, `_schema`, `_table`, `_col`, `_view`, `_volume`, `_function`, `_mv`, `_constraint`, `_grant`, `_op_types`, and `_ops_of_type` functions to specify parameter and return types.
- Improved overall code maintainability and readability by enforcing type hints across the module.
Copilot AI review requested due to automatic review settings March 10, 2026 16:50
@vb-dbrks self-assigned this Mar 10, 2026
@vb-dbrks added the bug, documentation, and enhancement labels Mar 10, 2026

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 4 comments.



Comment thread packages/vscode-extension/src/backend/pythonBackendClient.ts
Comment thread packages/python-sdk/tests/integration/live_helpers.py
Comment thread packages/python-sdk/tests/unit/test_state_differ_basic.py Outdated
- Enhanced the `_quick_query` function in `live_helpers.py` to raise clear errors on query failure or timeout, ensuring that callers receive explicit feedback instead of silent failures.
- Increased the wait timeout from 30 seconds to 60 seconds to accommodate longer-running queries.
- Updated the test for catalog mutations in `test_state_differ_basic.py` to assert the correct handling of catalog comment updates, ensuring that the new comment is properly reflected in the generated operations.
- Added type annotations to the `detect_rename_fn` parameter in the `diff_existing_column` function for improved type safety.
- Integrated `create_column_tag_ops` into the `UnityStateDiffer` class to handle column tag operations when adding new columns, ensuring that tags are properly managed during state differencing.
Copilot AI review requested due to automatic review settings March 10, 2026 17:12
@vb-dbrks merged commit 79f7044 into main Mar 10, 2026
12 checks passed
@vb-dbrks deleted the fix/state-differ-metadata-gaps branch March 10, 2026 17:15

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 4 comments.



Comment on lines +133 to +138
if (parsedPrincipals.length > 0 && privileges.length > 0) {
  for (const p of parsedPrincipals) {
    ops = ops.concat(
      buildBulkGrantOps(scopeResult, p, privileges, GRANT_OP_TO_TARGET[operationType])
    );
  }

Copilot AI Mar 10, 2026


Building grant ops via repeated ops = ops.concat(...) inside the principals loop creates a new array on every iteration. Use push/spread into a single array (or flatMap) to avoid unnecessary allocations and keep the code simpler as principal counts grow.

Comment on lines +167 to +173
test('shows no-scope message for grants when no principal is entered', () => {
  render(
    <BulkOperationsPanel scope="catalog" catalogId="cat_1" onClose={onClose} />
  );
  // Default op is add_table_grants; scope has 2 tables
  // Default op is add_table_grants; no principal entered yet → 0 count → empty message
  expect(screen.getByText(/No objects in scope for this operation/)).toBeInTheDocument();
});

Copilot AI Mar 10, 2026


The new test asserts that the empty-scope message is shown when no principal is entered, but the scope can still contain grant targets (the count becomes 0 only because principals are empty). If the UI message is updated to distinguish "no targets" vs "missing principals", this test should be adjusted to match the more accurate status text/behavior.

Comment on lines +122 to +135
response = client.statement_execution.execute_statement(
    warehouse_id=config.warehouse_id,
    statement=sql,
    wait_timeout="60s",
)
if not response.status:
    raise RuntimeError(f"_quick_query: no status returned for: {sql}")
state = response.status.state
if state != StatementState.SUCCEEDED:
    error_msg = ""
    if response.status.error:
        error_msg = f": {response.status.error.message}"
    raise RuntimeError(f"_quick_query: statement {state}{error_msg} for: {sql}")
if not response.result or not response.result.data_array:

Copilot AI Mar 10, 2026


_quick_query treats any non-SUCCEEDED state (including PENDING/RUNNING from wait_timeout expiry or a cold warehouse) as a hard failure. This can make live tests flaky even when the object exists. Consider polling get_statement until a terminal state (or using a longer/parameterized wait timeout) before failing, so transient execution delays don't surface as false negatives.
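
The polling approach suggested here could look roughly like the sketch below. It is kept independent of the Databricks SDK so it can run on its own: `get_state` is a hypothetical callable that would presumably wrap a `get_statement` lookup and return the state name, and the state names, timeout, and interval are assumptions for illustration.

```python
from __future__ import annotations

import time
from typing import Callable

# Assumed terminal state names; PENDING and RUNNING are treated as transient.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED", "CLOSED"}


def wait_for_terminal_state(
    get_state: Callable[[], str],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state until it reports a terminal state or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"statement still {state} after {timeout_s}s")
        sleep(interval_s)


# Simulated statement that needs two polls before it finishes.
states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(wait_for_terminal_state(lambda: next(states), sleep=lambda _: None))
```

The caller would then raise only when the returned state is terminal and not SUCCEEDED, so a cold warehouse shows up as a longer wait rather than a failed existence check.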

Comment on lines +2136 to +2146
def _set_view_tag(self, operation: Operation) -> str:
    """Generate ALTER VIEW SET TAGS"""
    view_fqn = self.id_name_map.get(operation.payload["viewId"], "unknown")
    parts = view_fqn.split(".")
    catalog_name = parts[0] if len(parts) > 0 else "unknown"

    # Apply catalog name mapping (logical → physical)
    catalog_name = self.catalog_name_mapping.get(catalog_name, catalog_name)
    parts[0] = catalog_name

    view_esc = self._build_fqn(*parts)

Copilot AI Mar 10, 2026


View tag SQL generation manually applies catalog_name_mapping before calling _build_fqn, but id_name_map is already built with catalog name mapping applied. This extra mapping is redundant and makes view-tag ops inconsistent with other view/table operations. Consider using the same pattern as other operations (split FQN → _build_fqn) and let the existing mapping logic handle catalog translation.


Labels

bug, documentation, enhancement


Development

Successfully merging this pull request may close these issues.

VS Code extension fails to start when SchemaX Python library is installed in a virtual environment

2 participants