chore: Metric Views typegen#433
Pull request overview
Adds build-time type generation for Unity Catalog Metric Views, emitting a MetricRegistry module augmentation plus a frontend-safe semantic metadata bundle when config/queries/metric-views.json is present.
Changes:
- Introduces metric-view config parsing/validation + DESCRIBE-driven schema extraction, emitting metric.d.ts and metrics.metadata.json.
- Extends the Vite typegen plugin with metric output options and watcher support for metric-views.json changes.
- Adds extensive unit + snapshot coverage for metric registry generation and plugin option plumbing.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/appkit/src/type-generator/vite-plugin.ts | Adds metric output options and triggers regeneration on metric-views.json edits. |
| packages/appkit/src/type-generator/index.ts | Wires metric-view generation into generateFromEntryPoint and exports metric artifact constants/types. |
| packages/appkit/src/type-generator/metric-registry.ts | Implements metric config resolution, DESCRIBE parsing, type/metadata emission, and sync failure reporting. |
| packages/appkit/src/type-generator/tests/vite-plugin.test.ts | Tests watcher behavior for metric-views.json and option plumbing for metric outputs. |
| packages/appkit/src/type-generator/tests/index.test.ts | Tests end-to-end emission/dormancy behavior for metric artifacts in generateFromEntryPoint. |
| packages/appkit/src/type-generator/tests/metric-registry.test.ts | Adds comprehensive unit tests for config validation, extraction, time grains, and metadata formatting. |
| packages/appkit/src/type-generator/tests/snapshots/metric-registry.test.ts.snap | Snapshot coverage for emitted metric.d.ts and metrics.metadata.json. |
…c failure logs Copilot review response (#433): (1) the metric emit block now gates DESCRIBEs on warehouse state in non-blocking mode — one read-only status GET (never starts a warehouse); when not RUNNING (or the probe fails) it skips all DESCRIBEs and emits degraded artifacts (every configured key with empty measures/dimensions) that the vite plugin's warehouse-watch regen (blocking mode) refreshes once the warehouse is up. Blocking mode and injected metricFetcher bypass the gate. (2) syncMetrics is now log-free: the three internal per-failure warns are removed and the generateFromEntryPoint caller owns surfacing failures exactly once. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
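The gate described in this commit could look roughly like the sketch below. It is an illustration only, not the PR's code: `planMetricEmit`, `GateInput`, `EmitPlan` and the synchronous `probe` callback are hypothetical names, and a real status read would be asynchronous.

```typescript
// Hypothetical sketch of the non-blocking warehouse gate; all names are illustrative.
type WarehouseState = "RUNNING" | "STARTING" | "STOPPED" | "UNKNOWN";
type EmitPlan = "describe" | "degraded";

interface GateInput {
  blocking: boolean; // blocking mode bypasses the gate
  hasInjectedFetcher: boolean; // an injected metricFetcher bypasses it too
  probe: () => WarehouseState; // single read-only status read; never starts a warehouse
}

function planMetricEmit(input: GateInput): EmitPlan {
  if (input.blocking || input.hasInjectedFetcher) return "describe";
  try {
    // Anything other than RUNNING skips every DESCRIBE for this pass.
    return input.probe() === "RUNNING" ? "describe" : "degraded";
  } catch {
    // A failed probe is handled exactly like "not running".
    return "degraded";
  }
}
```

In the "degraded" case the generator would still write artifacts for every configured key, just with empty measures/dimensions, and leave the refresh to the blocking-mode regen.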
2748795 to
f36a66c
Compare
|
Tried to post the comments separately but encountering some issues with Claude, so here's an aggregated version: [blocking] comments B1 —
… rules The source FQN was validated by a regex hand-copied across config.ts and the Zod schema that was stricter than Unity Catalog (rejecting names UC accepts in a quoted identifier — flagged in PR #433 review) yet looser in spots. Replace it with a single shared UC_FQN_PATTERN (zod-free module) referenced by the Zod schema, the generated JSON schema, and the typegen runtime. The runtime validates exactly three non-empty dot-separated segments, rejecting malformed FQNs (wrong arity, dot-in-segment, UC-illegal chars) with clear messages; a well-formed but nonexistent FQN still degrades at the warehouse. Phase 2 of metric-fqn-validation; decoupled from the Phase 1 injection escaper. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
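A minimal sketch of the runtime check this commit describes (exactly three non-empty dot-separated segments, with clear messages) might look like this. The real `UC_FQN_PATTERN` is not reproduced on this page, so `validateFqn` and `ILLEGAL_CHARS` are hypothetical, and the set of characters treated as UC-illegal is an assumption; the 255/767 caps are the ones quoted in this PR's config-caps commit.

```typescript
// Illustrative validator; the real UC_FQN_PATTERN is not reproduced here.
const MAX_SEGMENT_CHARS = 255; // per-segment cap quoted in this PR's commits
const MAX_FQN_CHARS = 767; // total cap quoted in this PR's commits

// Assumption: whitespace, "/", and ASCII control characters are treated as illegal.
const ILLEGAL_CHARS = /[\s\/\u0000-\u001f\u007f]/;

// Returns an error message, or null when the FQN is well-formed.
function validateFqn(fqn: string): string | null {
  if (fqn.length > MAX_FQN_CHARS) return `FQN exceeds ${MAX_FQN_CHARS} characters`;
  const segments = fqn.split(".");
  // A dot inside a segment also lands here, because it changes the segment count.
  if (segments.length !== 3) {
    return `expected catalog.schema.view (3 segments), got ${segments.length}`;
  }
  for (const segment of segments) {
    if (segment.length === 0) return "empty FQN segment";
    if (segment.length > MAX_SEGMENT_CHARS) return `segment exceeds ${MAX_SEGMENT_CHARS} characters`;
    if (ILLEGAL_CHARS.test(segment)) return `illegal character in segment "${segment}"`;
  }
  return null;
}
```

A well-formed FQN that names a nonexistent view passes this check by design; that case is left to degrade at the warehouse.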
re: B3. Following up with a concrete repro of what we miss today, plus a fix that doesn't need the live-warehouse verification I originally flagged, so it's landable in this PR without growing it much.
What the current impl can miss
m.includes("merge_json_arrays") ||
m.includes("must be json_array") ||
(m.includes("disposition") && m.includes("format"))
Any genuine DESCRIBE failure whose message trips one of these is misclassified as a format rejection → The realistic one is
All fully diagnosable, all retried + masked. For contrast, the true positives that should retry: Reyden's
Fix (no live-warehouse dependency)
function isFormatRejection(
message: string | undefined,
errorCode?: string,
): boolean {
if (!message) return false;
// It ran and produced a SQL error — never a format rejection, whatever the
// message text happens to contain.
if (errorCode && errorCode !== "INVALID_PARAMETER_VALUE") return false;
const m = message.toLowerCase();
return (
m.includes("merge_json_arrays") ||
m.includes("must be json_array") ||
(m.includes("disposition") && m.includes("format"))
);
}
Pass the code at the FAILED-state call site:
if (
normalized.status?.state === "FAILED" &&
isFormatRejection(
normalized.status.error?.message,
normalized.status.error?.error_code,
)
) {
The catch-path call (
Add the missing dangerous-case test: the existing "non-format failure" test uses
test("genuine SQL error whose message mentions disposition+format: not a format rejection", async () => {
const memo: DescribeFormatMemo = {};
const { client, formats } = stubClient((format) => {
if (format === "JSON_ARRAY") {
return {
statement_id: "stmt",
status: {
state: "FAILED",
error: {
error_code: "TABLE_OR_VIEW_NOT_FOUND",
message: "table x has no disposition column; format unknown",
},
},
result: {},
} as DatabricksStatementExecutionResponse;
}
throw new Error("ARROW must not be tried for a genuine SQL error");
});
const result = await describeAdaptive(client, "DESCRIBE QUERY x", "wh", memo);
expect(result.status.state).toBe("FAILED");
expect(memo.format).toBeUndefined();
expect(formats).toEqual(["JSON_ARRAY"]); // no fallback attempted
});
~7 lines of source + one test. The message-token tightening (dropping
pkosiec
left a comment
Before the merge, please address the comment above 🙏 Thanks!
isFormatRejection matched message substrings only. DESCRIBE runs over user-supplied SQL and analysis errors echo the offending SQL back, so a source referencing columns named 'disposition'/'format' produced a genuine SQL error (e.g. TABLE_OR_VIEW_NOT_FOUND) whose text tripped the 'disposition && format' clause — describeAdaptive then retried it under ARROW_STREAM and the real diagnostic was masked by the second attempt's error. Gate on status.error.error_code at the FAILED-state call site: a statement that ran and failed carries a SQL code, while a true request-shape rejection comes back as INVALID_PARAMETER_VALUE, so any other code short-circuits to 'not a format rejection' regardless of message text. The catch-path call stays message-only (a thrown SDK error has no reliable structured code). Message tokens are untouched, so the proven Reyden/JSON_ARRAY fallback never regresses. Adds a regression test (genuine SQL error mentioning disposition+format is not retried) and a positive guard (INVALID_PARAMETER_VALUE rejection still falls back). Per pkosiec's B3 follow-up on #433; the message-token tightening that needs a live PRO-warehouse string remains a tracked follow-up. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
Build-time type generator for UC Metric Views: reads config/queries/metric-views.json, runs DESCRIBE TABLE EXTENDED per declared view, and emits the MetricRegistry .d.ts augmentation (metric.d.ts) plus the metrics.metadata.json semantic bundle. Non-blocking-mode aware (degraded types when the warehouse is unavailable), blocking-mode warehouse preflight, bounded-concurrency DESCRIBEs, and a retry-driven describe cache with last-known-good degradation. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
Config caps (200 metric views, 255 chars per FQN segment, 767 total, 100 decimal places), reject metricViews:null, backtick-quote validated FQN segments in DESCRIBE, null-prototype metadata bundle, exact-basename watcher match, and locale-independent artifact key order. Cache: sticky vs transient retry classification, pruning to the configured key set, and structural validation of revived entries. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
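Two of the hardening items above, the null-prototype metadata bundle and the locale-independent key order, can be illustrated with a short sketch. `buildBundle` is a hypothetical helper written for this example, not a function from the PR.

```typescript
// Sketch only: builds a bundle object with no prototype and a stable key order.
function buildBundle(entries: Array<[string, unknown]>): Record<string, unknown> {
  // No prototype, so names such as "constructor" or "toString" are never inherited.
  const bundle: Record<string, unknown> = Object.create(null);
  // Code-unit comparison instead of localeCompare gives the same order on every machine.
  const sorted = [...entries].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  for (const [key, value] of sorted) bundle[key] = value;
  return bundle;
}
```

Because the object has no prototype, a lookup for a key that was never configured returns undefined rather than something inherited from Object.prototype, and JSON.stringify still serializes it normally.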
Failure-outcome helper, unified renderer block builders, currency-symbol map, parallel-array removal, hoisted allowlist sets, and relocation of the revival validator + cache-hash helper into cache.ts. The Vite plugin now defers metric artifact defaults to the generator so plugin- and CLI-driven runs agree under a custom outFile. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
SDK executeStatement returns ARROW_STREAM by default (rows in result.attachment, data_array empty), so the metric and query DESCRIBE fetchers silently degraded on warehouses that don't default to JSON_ARRAY. Add normalizeResultRows (apache-arrow tableFromIPC) and request ARROW_STREAM + INLINE in both fetchers; downstream parsers read the populated data_array unchanged. Verified live against a real warehouse: real measure/dimension unions, cache no longer degraded. Hardening: refuse to emit partial types when a DESCRIBE result is multi-chunk (next_chunk_* present) — fail loud rather than cache a truncated schema; extract row values via the positional StructRow iterator ([...row]) rather than Object.values, which reorders integer-named columns. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
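The Object.values hazard mentioned above comes from JavaScript's property ordering, which lists integer-like keys before all other keys. The sketch below shows the effect on a plain object standing in for a decoded row; it does not use apache-arrow, and the column names are made up for the example.

```typescript
// Column names in the order the result set declares them; "0" is an integer-like name.
const columnOrder = ["col_name", "0", "data_type"];

// Plain-object stand-in for a decoded row (not an Arrow StructRow).
const row: Record<string, string> = {};
columnOrder.forEach((name, i) => {
  row[name] = `v${i}`;
});

// JavaScript lists integer-like keys first, so the declared column order is lost.
const viaObjectValues = Object.values(row);

// Reading by declared position keeps the original column order.
const positional = columnOrder.map((name) => row[name]);

console.log(viaObjectValues); // [ 'v1', 'v0', 'v2' ]
console.log(positional); // [ 'v0', 'v1', 'v2' ]
```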
Split the ~1.4k-line metric-registry.ts into focused mv-registry/ modules (config, describe, metadata, render-types, sync, types). Consumers import directly from the relevant submodule; there is no aggregating barrel. The package's public type surface is unchanged — type-generator/index.ts still re-exports the same metric types from mv-registry/types. Behavior-preserving: 2962 tests green and the dogfood live run still emits real metric types. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
The prior Arrow fix hardcoded INLINE+ARROW_STREAM for DESCRIBE, which only the Reyden engine accepts. Standard DBSQL (PRO/CLASSIC) rejects that pairing and requires JSON_ARRAY, so metric and query typegen broke on real warehouses. The two engines have opposite requirements — no single hardcoded format works on both. describeAdaptive tries JSON_ARRAY first and falls back to ARROW_STREAM only when the warehouse rejects the format (merge_json_arrays / disposition mismatch), memoizing the accepted format per run. SQL errors, degrades, and connectivity failures pass through unchanged. The Arrow decoder stays as the Reyden branch. Covers the metric (describe.ts) and query (query-registry.ts) paths. Verified live on revenue_arr_demo (PRO) and a type=REYDEN warehouse. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
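The try-then-fall-back flow with a per-run memo can be sketched as below. This is a simplified illustration under stated assumptions: `describeAdaptiveSketch`, `Execute`, `FormatMemo` and the injected `isFormatRejection` predicate are hypothetical shapes, and the real describeAdaptive takes a client, SQL text and warehouse id rather than a callback.

```typescript
// Simplified sketch; types and names are illustrative, not the SDK's.
type Format = "JSON_ARRAY" | "ARROW_STREAM";

interface StatementResult {
  status: { state: "SUCCEEDED" | "FAILED"; error?: { message?: string } };
}

interface FormatMemo {
  format?: Format; // remembered for the rest of the run once a format is accepted
}

type Execute = (format: Format) => Promise<StatementResult>;

async function describeAdaptiveSketch(
  execute: Execute,
  memo: FormatMemo,
  isFormatRejection: (result: StatementResult) => boolean,
): Promise<StatementResult> {
  // A format already accepted earlier in this run is reused without probing.
  if (memo.format) return execute(memo.format);

  const first = await execute("JSON_ARRAY");
  if (first.status.state === "FAILED" && isFormatRejection(first)) {
    // Only a format rejection triggers the fallback; other failures pass through.
    const second = await execute("ARROW_STREAM");
    if (second.status.state === "SUCCEEDED") memo.format = "ARROW_STREAM";
    return second;
  }
  if (first.status.state === "SUCCEEDED") memo.format = "JSON_ARRAY";
  return first;
}
```

Keeping the rejection test as a separate predicate means the fallback mechanics stay unchanged when the classification rule is tightened.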
The metric DESCRIBE fetcher wrapped each FQN segment in backticks without escaping, so a segment containing a backtick could break out of the quoted identifier. Add quoteFqnForSql: double embedded backticks (the only break-out from a backtick-quoted identifier) and reject control/newline characters — a standalone, unit-tested escaper decoupled from FQN naming validation. Phase 1 of metric-fqn-validation (split injection-safety from UC naming). Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
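The escaping rule described here (double every embedded backtick, refuse control characters) is small enough to sketch. `quoteIdentifierSegment` and `quoteFqnSketch` are hypothetical names for this example; the actual quoteFqnForSql signature is not shown on this page and may differ.

```typescript
// Hypothetical sketch of backtick quoting for one identifier segment.
function quoteIdentifierSegment(segment: string): string {
  // Control characters (including newlines) are rejected outright rather than escaped.
  if (/[\u0000-\u001f\u007f]/.test(segment)) {
    throw new Error("identifier segment contains a control character");
  }
  // Doubling a backtick is how it is written inside a backtick-quoted identifier.
  return "`" + segment.replace(/`/g, "``") + "`";
}

// Quotes each segment independently and joins them with dots.
function quoteFqnSketch(segments: string[]): string {
  return segments.map(quoteIdentifierSegment).join(".");
}
```

For example, the segments main, fin`ance, rev would be rendered with the embedded backtick doubled, so the quoted identifier cannot be closed early.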
Move the connectivity-classification helpers (isConnectivityError, getErrorDiagnostic, getErrorMessage and their internals) out of query-registry.ts into a new errors.ts so both describe paths can share one source of truth. Drop the duplicate errorMessageOf in statement-result.ts in favour of the shared getErrorMessage. Pure refactor — no behavior change. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
… warehouse errors syncMetrics treated every thrown DESCRIBE failure as transient, so auth errors, bad warehouse ids, and the deterministic multi-chunk truncation throw were cached retry:true and re-described forever. Classify thrown failures via the shared isConnectivityError: connectivity blips stay transient (retry), everything else is deterministic and pinned sticky. At the warehouse level the status probe and the blocking preflight swallowed every error into "not running"; a 403 or a timed-out wait then degraded silently on every pass. Mirror the query path: connectivity degrades, while a deterministic failure (auth, bad id, timed-out wait) sets a fatal message so the build fails after artifacts are written. Per-key DESCRIBE failures stay degrade-open (sticky + warn) so a single bad FQN never breaks the build. Also folded in: type the cache revival guard's input as unknown rather than the already-valid type, warn instead of silently degrading on the unreachable schema fallback, and fix the MV__PREFLIGHT_WAIT_MAX_MS double-underscore typo. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
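The sticky-versus-transient split can be pictured with the sketch below. It is an assumption-heavy illustration: `looksLikeConnectivityError` is a hypothetical stand-in for the shared isConnectivityError (whose real criteria are not shown here), and `CachedFailure` is an invented shape for the cached outcome.

```typescript
// Hypothetical stand-in for the shared isConnectivityError; the real criteria are not shown here.
const CONNECTIVITY_CODES = new Set(["ECONNRESET", "ECONNREFUSED", "ETIMEDOUT", "ENOTFOUND"]);

function looksLikeConnectivityError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && CONNECTIVITY_CODES.has(code);
}

interface CachedFailure {
  retry: boolean; // true = transient, re-described on the next pass; false = sticky
  message: string;
}

function classifyThrownFailure(err: unknown): CachedFailure {
  const message = err instanceof Error ? err.message : String(err);
  // Connectivity blips are retried; everything else is deterministic and pinned.
  return { retry: looksLikeConnectivityError(err), message };
}
```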
The directory is mv-registry/ and the public params are mvOutFile/ mvMetadataOutFile, but several internals still used the old metric* prefix. Rename the local variables (mvConfig, mvCacheSection, mvClient/getMvClient, mvSchemas, mvFile, mvDeclarations, mvMetadataFile) and the MV_DESCRIBE_CONCURRENCY constant, and move tests/metric-registry.test.ts (+ its snapshot) to mv-registry.test.ts. Public surface is deliberately untouched: the Metric* types, the METRIC_*_FILE exports, the metricFetcher option, and the imported readMetricConfig/ resolveMetricConfig/metricCacheHash helpers keep their names. Pure rename, no behavior change. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
…mv doc comments
Review cleanups — two small safety fixes plus doc tidy, no behavior change
otherwise:
- escape "*/" in the generated @sqlType JSDoc sinks (render-types.ts and
query-registry.ts) so a SQL type/comment containing "*/" can't close the
comment early and corrupt the emitted .d.ts (S3).
- warn instead of silently swallowing an ARROW_STREAM decode failure in
normalizeResultRows, so a Reyden user whose decode failed gets a breadcrumb
rather than mysteriously-empty types (S2).
Doc tidy: MAX_METRIC_VIEWS rationale (N5), fix the self-referential
"legacy {@link MV_CONFIG_FILE}" comment (N2), drop the non-resolving @link in
sync.ts (N3), correct the stale "metric.json" header in the shared schema +
regenerate the JSON Schema (N4), un-strand a doc comment inside
resolveMetricConfig (N10), and replace the non-standard @note + redundant path
literals (N11).
Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
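The S3 fix above guards against a SQL type or comment that contains the comment terminator. A minimal sketch of that kind of escaping follows; `escapeJsDocText` and `renderSqlTypeTag` are hypothetical names, and the replacement form used here (a backslash between the asterisk and the slash) is an assumption, since the commit does not show the exact output.

```typescript
// Hypothetical helper: break up "*/" so it cannot terminate the surrounding comment.
function escapeJsDocText(text: string): string {
  return text.replace(/\*\//g, "*\\/");
}

// Renders a single-line JSDoc comment carrying the SQL type.
function renderSqlTypeTag(sqlType: string): string {
  return `/** @sqlType ${escapeJsDocText(sqlType)} */`;
}
```

With this in place, the only comment terminator in the rendered string is the one the renderer itself appends.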
Adds an appkit mv sync command that fetches Unity Catalog metric-view schemas and emits metric.d.ts plus metrics.metadata.json outside the Vite dev loop (CI, non-Vite builds, manual refresh). The command lives in shared and reaches appkit's sync core via dynamic import of the type-generator entry with an ambient declaration and a graceful appkit-absent fallback, so shared keeps no static appkit dependency. A new appkit syncMetricViewsTypes export reuses the existing metric writers, adaptive describe fetcher and persistent cache helpers, so the emitted bundle matches the Vite plugin output. Config is validated against metricSourceSchema before sync, an absent default file exits zero for dormancy while error modes exit non-zero with distinct messages, and interactive and non-interactive flows mirror plugin create. Flags are --warehouse-id, --metric-views-json-path, --output-dir and --no-cache. Fourth change in the metric-views decomposition after #427, #429 and #433. Co-authored-by: Isaac Signed-off-by: Atila Fassina <atila@fassina.eu>
When config/queries/metric-views.json is present, the type generator runs DESCRIBE TABLE EXTENDED ... AS JSON per declared metric view and emits two artifacts:
- shared/appkit-types/metric.d.ts: the MetricRegistry module augmentation. Each entry carries typed measures/dimensions row fields plus measureKeys/dimensionKeys/timeGrains literal unions, the base the MeasureKey<K>/DimensionKey<K>/MetricRow<K>/TimeGrain<K> helpers derive from on the appkit-ui side.
- shared/appkit-types/metrics.metadata.json: the semantic-metadata bundle, entries shaped { measures, dimensions } (display names, format specs, descriptions, time-grain hints). Frontend-safe by construction: UC FQNs and execution lanes are deliberately excluded.

Note
Dormancy invariant: absent config means nothing executes (zero artifacts, zero logs, no fallback to any legacy filename). Apps that never adopt metric views see no change, which is what keeps merging this incrementally safe ahead of the runtime.
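To make the metric.d.ts shape described above more concrete, here is a rough sketch of one registry entry and the helper types that could be derived from it. Everything in it is assumed for illustration: the column names, the time grains, and the helper definitions are invented, and the real file is a module augmentation rather than a local interface.

```typescript
// Local stand-in for the generated augmentation; the real file augments a module instead.
interface MetricRegistry {
  revenue: {
    measures: { total_revenue: number };
    dimensions: { region: string; order_date: string };
    measureKeys: "total_revenue";
    dimensionKeys: "region" | "order_date";
    timeGrains: "day" | "month";
  };
}

// Helper shapes assumed from their names; appkit-ui's real definitions may differ.
type MeasureKey<K extends keyof MetricRegistry> = MetricRegistry[K]["measureKeys"];
type DimensionKey<K extends keyof MetricRegistry> = MetricRegistry[K]["dimensionKeys"];
type TimeGrain<K extends keyof MetricRegistry> = MetricRegistry[K]["timeGrains"];

// Only keys present in the registry type-check.
const measure: MeasureKey<"revenue"> = "total_revenue";
const dimension: DimensionKey<"revenue"> = "region";
const grain: TimeGrain<"revenue"> = "month";
```

The point of the literal unions is that a misspelled measure or dimension name fails at compile time instead of at query time.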
Output
The schema already exists in main, so it's not part of this PR.
This feature creates the artifact according to the defined schema.
config/queries/metric-views.json, entity-first metricViews map per the #429 metric-source schema:
{
  "metricViews": {
    "revenue": { "source": "main.finance.revenue_metrics" },
    "customer_metrics": { "source": "main.cs.customer_metrics", "executor": "user" }
  }
}
executor is app_service_principal (default) or user (per-user OBO); the internal sp/obo lane is derived at the parse boundary, so downstream code only ever sees lanes.
Failure semantics
Vite plugin grows mvOutFile/mvMetadataOutFile options, and the dev watcher regenerates on metric-views.json edits through the exact same single-flight regen flow as .sql files.
Adaptive DESCRIBE format per warehouse
Some warehouses have opposite requirements for DESCRIBE … AS JSON:
disposition+format INLINE+JSON_ARRAY data_array merge_json_arrays INLINE+ARROW_STREAM EXTERNAL_LINKS+ARROW_STREAM
Manual Testing
fallback:
databricks warehouses list -p DEFAULT -o json \
  | jq -r '.[] | select(.creator_name=="<your_account_email>") | "\(.id)\t\(.warehouse_type)\t\(.name)"'
# → 1075664542a32710 PRO revenue_arr_demo
The metric views referenced in the config must exist in that workspace.
Build the SDK (so the CLI runs the patched describeAdaptive)
(Point source at metric views that exist in your workspace.)
DATABRICKS_CONFIG_PROFILE=DEFAULT DATABRICKS_WAREHOUSE_ID=1075...2710 \
  pnpm exec tsx packages/shared/src/cli/index.ts generate-types \
  /tmp/mv-typegen-test /tmp/mv-typegen-test/shared/appkit-types/analytics.d.ts \
  --no-cache --wait
- --no-cache → forces a fresh DESCRIBE (actually hits the warehouse).
- --wait → blocks until RUNNING (auto-starts a stopped serverless warehouse).
(a) no format errors:
the run output should NOT contain INVALID_PARAMETER_VALUE / merge_json_arrays / "metric sync failed"
(b) real unions (not string) in the generated types:
(c) cache shows not-degraded (cache lives under the cwd = worktree):