
Fix charmap error on non-ASCII output and add warehouse ID setup step #9

Merged
vmariiechko merged 1 commit into main from feature/dbx-ro-query-unicode-encoding
May 13, 2026

vmariiechko (Owner) commented May 13, 2026

Summary

Two small improvements to the dbx-ro-query asset triggered by real usage:

  1. Encoding fix: query results containing non-ASCII characters (Greek, Cyrillic, emoji) caused a charmap codec error on Windows when printed to stdout. configure_text_streams() reconfigures both stdout and stderr to UTF-8 with errors="replace" at startup. The subprocess call in run_query already used UTF-8; this closes the remaining gap at the print() call in main().

  2. Install message: first-time users had no prompt to set DATABRICKS_WAREHOUSE_ID before running the smoke check. A new step 2 in the success_message shows the databricks warehouses list lookup and the export pattern. Placed here rather than in SKILL.md because SKILL.md is read by agents during task execution, not by humans during initial setup.
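The encoding fix in item 1 could look roughly like the sketch below. It is not the code from scripts/dbx-ro-query.py: the optional `streams` parameter is an assumption added here so the function can be exercised without touching the real process streams, and the `getattr` guard is likewise an assumption for streams that lack `reconfigure()`.

```python
import sys


def configure_text_streams(streams=None):
    """Switch text streams to UTF-8, replacing anything unencodable."""
    # `streams` is a hypothetical parameter for testability; by default the
    # process-wide stdout and stderr are reconfigured.
    if streams is None:
        streams = (sys.stdout, sys.stderr)
    for stream in streams:
        # reconfigure() is available on io.TextIOWrapper (Python 3.7+).
        # Replacement streams such as io.StringIO lack it, so skip those.
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8", errors="replace")
```

Calling this as the first statement of main() means every later print() goes through a UTF-8 stream, regardless of the console code page Windows selected at startup.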

Changes

  • scripts/dbx-ro-query.py: add configure_text_streams(), call it as first line of main()
  • databricks_template_schema.json: insert warehouse ID setup step (step 2) in success_message, renumber smoke check to step 3
  • tests/assets/test_dbx_ro_query.py: three new tests (configure_text_streams smoke test, non-ASCII scalar output, non-ASCII TSV output)
  • CHANGELOG.md: add [1.7.1] - 2026-05-13 entry
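As an illustration of the non-ASCII TSV test listed above, here is a minimal sketch in pytest style. It does not reproduce tests/assets/test_dbx_ro_query.py: `format_rows_tsv` is a hypothetical stand-in for the asset's row formatting, and `write_through_utf8_stream` is a helper invented for this example that imitates a Windows cp1252 console stream after reconfiguration.

```python
import io


def format_rows_tsv(rows):
    # Hypothetical stand-in for the asset's TSV formatting, not the real one.
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)


def write_through_utf8_stream(text):
    # Start from a cp1252 stream (a typical Windows console encoding), then
    # reconfigure it to UTF-8 the way configure_text_streams() is described.
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="cp1252", newline="")
    stream.reconfigure(encoding="utf-8", errors="replace")
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


def test_non_ascii_tsv_output():
    rows = [("id", "name"), (1, "Αθήνα"), (2, "Київ")]
    text = format_rows_tsv(rows)
    # The Greek and Cyrillic cells must survive the write unchanged.
    assert write_through_utf8_stream(text) == text.encode("utf-8")
```

Without the reconfigure() call, the same write would fail, because cp1252 has no mapping for Greek or Cyrillic letters.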

Change Area

  • Asset Library (assets/<name>/)

Configuration Axes Affected

  • Asset Library (new asset, asset schema, or framework changes)

Testing

  • All tests pass (pytest tests/ -v)
  • New tests added for new functionality (if applicable)

Asset Changes (if applicable)

  • Asset installs standalone via databricks bundle init . --template-dir assets/<name> --output-dir <dir>
  • Asset is self-contained (no references to library/helpers.tmpl or other assets)

Checklist

  • Documentation updated (if behavior changed)

Reconfigure stdout and stderr to UTF-8 at startup via
configure_text_streams(), preventing charmap codec errors when query
results contain Greek, Cyrillic, emoji, or other non-ASCII characters.
The subprocess call in run_query already used UTF-8; this closes the
remaining gap at the print() call in main().

Add a "Set your warehouse ID" step to the asset success_message so
first-time users see the databricks warehouses list lookup and
DATABRICKS_WAREHOUSE_ID export pattern immediately after install,
before the smoke-check step. The instruction belongs here rather than
in SKILL.md, which is read by agents, not by humans during setup.
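The setup step relies on DATABRICKS_WAREHOUSE_ID being exported before the smoke check runs. A script-side guard for a missing value might look like the sketch below. This is purely illustrative: `require_warehouse_id` and its `env` parameter are hypothetical names, and nothing in this PR states that the asset performs such a check.

```python
import os

ENV_VAR = "DATABRICKS_WAREHOUSE_ID"


def require_warehouse_id(env=None):
    """Return the configured warehouse ID or exit with a setup hint."""
    # `env` is a hypothetical parameter so the lookup can be tested with a
    # plain dict instead of the real process environment.
    env = os.environ if env is None else env
    warehouse_id = env.get(ENV_VAR, "").strip()
    if not warehouse_id:
        raise SystemExit(
            f"{ENV_VAR} is not set. Run `databricks warehouses list` to find "
            f"the ID, then export {ENV_VAR} before retrying."
        )
    return warehouse_id
```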

Three new tests: configure_text_streams smoke test, and two format_rows
tests covering non-ASCII scalar and TSV output.

vmariiechko merged commit 6688827 into main on May 13, 2026
1 check passed
vmariiechko deleted the feature/dbx-ro-query-unicode-encoding branch on May 13, 2026 19:22