Incremental improvements: refactors, edge cases, robustness by defnalk · Pull Request #1 · defnalk/energyemissions

defnalk · 2026-04-09T17:55:43Z

Five small, independent fixes uncovered during a read-only review pass. Each is its own atomic commit.

Commits

fix(transform): warn on unknown units in normalize_units — previously any unit outside UNIT_FACTORS was silently coerced to factor 1.0 via fillna, producing miscalibrated tonnes without any log signal.
fix(transform): tolerate missing columns in coerce_numeric — the shared cleaner choke point raised KeyError instead of logging and skipping, so a single dropped upstream field aborted the whole clean task.
fix(transform): scale anomaly critical cutoff with configured threshold — severity used a hard-coded z>4.0 critical cutoff while the warning cutoff was config-driven, so tuning anomaly_z_threshold produced lopsided severity distributions.
fix(ingest): atomically replace cached CSV on download — _download wrote response bytes directly onto the same path used by the local fallback, so a partial/failed download would corrupt the cached sample. Now writes to a .part sibling and renames on success.
fix(orchestration): surface dbt stdout on failure — dbt prints test and model failures to stdout; run_dbt only logged stderr on non-zero exit, hiding the actual diagnostic from Prefect logs and the raised RuntimeError.

Test plan

pytest tests/test_clean.py tests/test_forecast.py
Run the Prefect flow end-to-end against a dev Postgres
Manually trigger a dbt failure and confirm stdout appears in logs

Previously, any unit string outside UNIT_FACTORS was silently mapped to a factor of 1.0 via fillna, meaning a file reporting in e.g. "Gt" or a typo like "tonne" would be passed through unchanged without any indication of data loss. Emit a structured warning listing the unknown unit strings and affected row count so ingestion loudly surfaces mismatches instead of producing silently miscalibrated tonnes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coerce_numeric would raise KeyError on the first df[col] access if a caller passed a column that did not exist in the DataFrame (e.g. an upstream source that stopped emitting a field, or a rename). Since the function is the single choke point used by every cleaner and is meant to be defensive about messy inputs, log a structured warning and skip the missing column instead of aborting the whole clean task. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The severity label hard-coded a critical z-score cutoff of 4.0 while the warning cutoff is driven by ForecastConfig.anomaly_z_threshold. If a caller tightened the threshold (say, 2.0) the critical band would become disproportionately wide, and if they loosened it beyond 4.0 every flagged row would be critical. Derive the critical cutoff as 1.6x the configured threshold so the two labels scale together and callers get predictable severity distributions when tuning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

_download wrote response bytes directly to the destination path, which is the same path the local-fallback loader reads from. If the HTTP call raised after partial write, or the process was killed mid-write, the cached sample file would be left truncated and the subsequent "download_failed_falling_back" branch would then read corrupt data from disk. Stream to a .part sibling and atomically rename on success so a failed download never clobbers a previously good cached copy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dbt writes test failures, compilation errors, and model runtime errors to stdout, not stderr. The previous run_dbt only logged stderr on non-zero exit, so the actual diagnostic output was silently dropped from both the Prefect logs and the RuntimeError message, leaving operators to rerun dbt manually to see what broke. Log both streams on failure, include the return code, and fall back to stdout in the exception message when stderr is empty. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

defnalk and others added 5 commits April 9, 2026 20:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incremental improvements: refactors, edge cases, robustness#1

Incremental improvements: refactors, edge cases, robustness#1
defnalk wants to merge 5 commits into
mainfrom
review/incremental-improvements

defnalk commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

defnalk commented Apr 9, 2026

Commits

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant