Incremental improvements: refactors, edge cases, robustness #1
Open
defnalk wants to merge 5 commits into
Previously, any unit string outside UNIT_FACTORS was silently mapped to a factor of 1.0 via fillna, meaning a file reporting in e.g. "Gt" or a typo like "tonne" would be passed through unchanged, with no indication that the values had not been converted. Emit a structured warning listing the unknown unit strings and the affected row count so ingestion loudly surfaces mismatches instead of producing silently miscalibrated tonnes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
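A minimal sketch of the behaviour this commit describes, not the actual diff. The contents of UNIT_FACTORS, the function name to_tonnes, the column names, and the use of stdlib logging in place of the project's structured logger are all assumptions made for illustration.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

# Hypothetical factor table; the real UNIT_FACTORS is not shown in this PR.
UNIT_FACTORS = {"t": 1.0, "kt": 1_000.0, "Mt": 1_000_000.0}


def to_tonnes(df: pd.DataFrame, value_col: str = "value", unit_col: str = "unit") -> pd.Series:
    """Convert value_col to tonnes, warning about units missing from UNIT_FACTORS."""
    factors = df[unit_col].map(UNIT_FACTORS)
    unknown_mask = factors.isna()
    if unknown_mask.any():
        # List each unknown unit once, plus how many rows it touches.
        unknown_units = sorted(df.loc[unknown_mask, unit_col].astype(str).unique())
        logger.warning(
            "unknown_units units=%s affected_rows=%d",
            unknown_units,
            int(unknown_mask.sum()),
        )
    # Same fallback as before (factor 1.0), but no longer silent.
    return df[value_col] * factors.fillna(1.0)
```

The fallback factor is left at 1.0 here so existing outputs do not change; only the visibility of the mismatch does.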
coerce_numeric would raise KeyError on the first df[col] access if a caller passed a column that did not exist in the DataFrame (e.g. an upstream source that stopped emitting a field, or a rename). Since the function is the single choke point used by every cleaner and is meant to be defensive about messy inputs, log a structured warning and skip the missing column instead of aborting the whole clean task. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
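The skip-and-warn behaviour could look roughly like the sketch below. The real signature of coerce_numeric is not shown in this PR, so the parameters, the log event name, and the use of stdlib logging and pd.to_numeric with errors="coerce" are assumptions.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def coerce_numeric(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Coerce the named columns to numeric, skipping any that are absent."""
    out = df.copy()
    for col in columns:
        if col not in out.columns:
            # Previously df[col] would raise KeyError here and abort the clean task.
            logger.warning("coerce_numeric_missing_column column=%s", col)
            continue
        # Unparseable values become NaN instead of raising.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out
```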
The severity label hard-coded a critical z-score cutoff of 4.0 while the warning cutoff is driven by ForecastConfig.anomaly_z_threshold. If a caller tightened the threshold (say, to 2.0) the warning band below the fixed critical cutoff would become disproportionately wide, and if they loosened it beyond 4.0 every flagged row would be critical. Derive the critical cutoff as 1.6x the configured threshold so the two labels scale together and callers get predictable severity distributions when tuning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
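As an illustration of the scaled cutoff, here is a small sketch. Only the 1.6x ratio and the anomaly_z_threshold name come from the commit message; the function name, the label strings, and the choice of inclusive comparisons are assumptions.

```python
CRITICAL_MULTIPLIER = 1.6  # ratio stated in the commit message


def severity(z_score: float, anomaly_z_threshold: float) -> str:
    """Label a z-score using a critical cutoff derived from the warning threshold."""
    critical_cutoff = CRITICAL_MULTIPLIER * anomaly_z_threshold
    z = abs(z_score)
    if z >= critical_cutoff:
        return "critical"
    if z >= anomaly_z_threshold:
        return "warning"
    return "normal"
```

With a threshold of 2.0 the critical cutoff moves to 3.2, and with 3.0 it moves to 4.8, so the warning band keeps the same relative width in both cases.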
_download wrote response bytes directly to the destination path, which is the same path the local-fallback loader reads from. If the HTTP call raised after a partial write, or the process was killed mid-write, the cached sample file would be left truncated and the subsequent "download_failed_falling_back" branch would then read corrupt data from disk. Stream to a .part sibling and atomically rename on success so a failed download never clobbers a previously good cached copy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
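The write-then-rename pattern can be sketched without any HTTP client, as below. The helper name write_atomically and the idea of passing response chunks as an iterable are assumptions; the actual _download presumably wraps a streaming HTTP response.

```python
import os
from pathlib import Path
from typing import Iterable


def write_atomically(chunks: Iterable[bytes], dest: Path) -> None:
    """Stream chunks to '<dest>.part' and rename onto dest only on success."""
    part = dest.with_name(dest.name + ".part")
    try:
        with open(part, "wb") as fh:
            for chunk in chunks:
                fh.write(chunk)
        # os.replace swaps the file in as a single step on the same filesystem.
        os.replace(part, dest)
    except BaseException:
        # Drop the partial file; the previous dest (if any) is left untouched.
        part.unlink(missing_ok=True)
        raise
```

If the process is killed outright, the cleanup branch never runs and a stale .part file may remain, but the cached copy at dest is still intact.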
dbt writes test failures, compilation errors, and model runtime errors to stdout, not stderr. The previous run_dbt only logged stderr on non-zero exit, so the actual diagnostic output was silently dropped from both the Prefect logs and the RuntimeError message, leaving operators to rerun dbt manually to see what broke. Log both streams on failure, include the return code, and fall back to stdout in the exception message when stderr is empty. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
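A rough sketch of the failure handling described above, written against a generic subprocess so it runs without dbt installed. The function name run_command, the log format, and the exception wording are assumptions; the real run_dbt also logs through Prefect, which is omitted here.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_command(args: list[str]) -> str:
    """Run a command; on failure, log both streams and raise with the best detail."""
    proc = subprocess.run(args, capture_output=True, text=True)
    if proc.returncode != 0:
        logger.error("command_failed returncode=%d", proc.returncode)
        logger.error("stdout: %s", proc.stdout)
        logger.error("stderr: %s", proc.stderr)
        # Fall back to stdout when stderr is empty, since that is where
        # the diagnostics end up in the case this commit addresses.
        detail = proc.stderr.strip() or proc.stdout.strip()
        raise RuntimeError(f"{args[0]} exited with code {proc.returncode}: {detail}")
    return proc.stdout
```

For example, run_command([sys.executable, "-c", "print('ok')"]) returns the captured stdout, while a child that prints a message and exits non-zero raises a RuntimeError carrying that message.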
Five small, independent fixes uncovered during a read-only review pass. Each is its own atomic commit.
Commits
Test plan
pytest tests/test_clean.py tests/test_forecast.py