25 changes: 14 additions & 11 deletions dbt/models/model/model.training_data.py
Expand Up @@ -11,17 +11,20 @@ def model(dbt, session):
on_schema_change="append_new_columns",
)

- # Build the base metadata DataFrame
- base_query = """
- SELECT
- run_id,
- year,
- assessment_year,
- dvc_md5_training_data
- FROM model.metadata
- WHERE run_type = 'final'
- """
- metadata_df = session.sql(base_query)
+ # Get model metadata for every final model. We do this by inner joining
+ # The `metadata` table to the `final_model` table instead of filtering
+ # the metadata table by `run_type == 'final'` to make it easier to run
+ # tests on this table, since we can control the contents of `final_model`
+ # via a dbt seed
Comment on lines +14 to +18
Member Author

I came to realize this quirk of the model while testing a staging HomeVal deployment, so I figured I'd persist the change to make future HomeVal staging deployments easier.
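
To make the testing benefit concrete, here is a minimal plain-Python sketch of the two approaches. It is not the dbt model itself: the helper names and the `run-a`/`run-b` rows are hypothetical, and it assumes `run_id` is unique in `final_model`.

```python
def filter_by_run_type(metadata):
    """Old approach: the metadata table alone decides which runs are final."""
    return [row["run_id"] for row in metadata if row["run_type"] == "final"]


def join_to_final_model(metadata, final_model):
    """New approach: only runs that also appear in `final_model` survive.

    Assumes run_id is unique in final_model, so the inner join adds no rows.
    """
    final_ids = {row["run_id"] for row in final_model}
    return [row["run_id"] for row in metadata if row["run_id"] in final_ids]


# Hypothetical rows: both runs are marked final in metadata, but the test
# seed for final_model lists only one of them.
metadata = [
    {"run_id": "run-a", "run_type": "final"},
    {"run_id": "run-b", "run_type": "final"},
]
final_model_seed = [{"run_id": "run-a"}]

old_result = filter_by_run_type(metadata)
new_result = join_to_final_model(metadata, final_model_seed)
```

With the join, a staging deployment can narrow the set of runs by editing the seed, without touching the metadata table.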

+ metadata_df = (
+ dbt.source("model", "metadata")
+ .join(
+ dbt.ref("model.final_model").select("run_id"),
Comment on lines +20 to +22
Member Author

Switching to dbt.source() and dbt.ref() for our queries here has the effect of including these upstream models in the directed graph that dbt builds for this model. That means that when we merge any changes to the model.final_model_raw seed in the future, our workflow will also rebuild model.training_data when it builds all children of modified resources.
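
The rebuild behavior described here amounts to a graph traversal, sketched below in plain Python. This is not how dbt is implemented. The `CHILDREN` mapping and the `descendants` helper are hypothetical, and the edge from the seed to `model.final_model` is an assumption inferred from this comment.

```python
from collections import deque

# Hypothetical slice of the project graph: parent -> direct children.
# The last two edges exist only because the model now calls dbt.ref() and
# dbt.source(); the first edge is assumed from the comment above.
CHILDREN = {
    "model.final_model_raw": ["model.final_model"],
    "model.final_model": ["model.training_data"],
    "model.metadata": ["model.training_data"],
}


def descendants(modified, children):
    """Return every node reachable from the modified nodes (breadth-first)."""
    seen = set()
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


to_rebuild = descendants(["model.final_model_raw"], CHILDREN)

# With a raw session.sql() query the graph has no edge into training_data,
# so a seed change would not reach it.
without_ref = {"model.final_model_raw": ["model.final_model"]}
to_rebuild_before = descendants(["model.final_model_raw"], without_ref)
```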

+ on="run_id",
+ how="inner",
+ )
+ .select("run_id", "year", "assessment_year", "dvc_md5_training_data")
+ )

if dbt.is_incremental:
# anti-join out any run_ids already in the target
2 changes: 1 addition & 1 deletion dbt/models/pinval/pinval.assessment_card.sql
Expand Up @@ -294,7 +294,7 @@ SELECT
WHEN
pin_cd.class_code IS NULL -- Class is not in our class dict
OR NOT pin_cd.regression_class
- OR (pin_cd.modeling_group NOT IN ('SF', 'MF'))
+ OR (pin_cd.modeling_group NOT IN ('SF', 'MF', 'BB'))
Member Author

We do include B&Bs in the training and assessment sets for the model, even though they usually get modeled by hand. That means that a B&B will wind up with is_report_eligible == TRUE and reason_report_ineligible == 'non_regression_class' unless we allow the 'BB' modeling group here. This is not really a huge deal, since it doesn't affect the HomeVal reports that we generate for these PINs, but it means that all B&Bs fail the pinval_assessment_card_reason_report_ineligible_is_null_when_is_report_eligible data test, which adds unnecessary noise to that test.
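
As a rough illustration of the first `CASE` branch only, here is a Python sketch of the check before and after this change. The function name and the `"example_class"` value are hypothetical, it assumes B&B classes have `regression_class` set to true, and it ignores SQL `NULL` semantics for `modeling_group`.

```python
def first_branch_reason(class_code, regression_class, modeling_group, allowed_groups):
    """Mirror the first WHEN branch; return None when it does not match."""
    if (
        class_code is None  # class is not in the class dict
        or not regression_class
        or modeling_group not in allowed_groups
    ):
        return "non_regression_class"
    return None


# Hypothetical B&B row, assumed to be flagged as a regression class.
bb_row = {
    "class_code": "example_class",
    "regression_class": True,
    "modeling_group": "BB",
}

reason_before = first_branch_reason(**bb_row, allowed_groups=("SF", "MF"))
reason_after = first_branch_reason(**bb_row, allowed_groups=("SF", "MF", "BB"))
```

Before the change the B&B row picks up `'non_regression_class'`. After it, the row falls through to the later branches.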

THEN 'non_regression_class'
WHEN LOWER(uni.triad_name) != LOWER(uni.assessment_triad) THEN 'non_tri'
WHEN ac.meta_card_num IS NULL THEN 'missing_card'
3 changes: 2 additions & 1 deletion dbt/models/pinval/pinval.comp.sql
Expand Up @@ -70,7 +70,8 @@ training_data AS (
-- that have multiple final models.
WHERE train.run_id IN (
'2024-03-17-stupefied-maya',
- '2025-02-11-charming-eric'
+ '2025-02-11-charming-eric',
+ '2026-02-11-recursing-rob'
Member

[Thought, non-blocking]: I wonder if it would make sense to add another column in pinval.model_run.csv such that these could be algorithmically picked from that seed. It irks me we can't use it for this filter here

Member Author

Yup, I've been thinking about that too! I added a note to myself to discuss during our 2026 modeling retrospective.
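
One possible shape for that idea, sketched in Python for discussion only: the `use_for_comp_training` column does not exist in `pinval.model_run.csv`, and the flag values below are made up so that the result matches the 2025 and 2026 run IDs in the hard-coded filter.

```python
import csv
import io

# Rows copied from the seed in this PR, plus a hypothetical fourth column.
SEED_TEXT = """assessment_year,type,run_id,use_for_comp_training
2025,card,2025-02-11-charming-eric,TRUE
2025,shap,2025-04-25-fancy-free-billy,FALSE
2025,comp,2025-06-14-flamboyant-rob,FALSE
2026,card,2026-02-11-recursing-rob,TRUE
2026,shap,2026-02-11-recursing-rob,FALSE
2026,comp,2026-02-13-peaceful-rina,FALSE
"""


def comp_training_run_ids(seed_text):
    """Collect the distinct run IDs flagged by the hypothetical column."""
    rows = csv.DictReader(io.StringIO(seed_text))
    flagged = {
        row["run_id"] for row in rows if row["use_for_comp_training"] == "TRUE"
    }
    return sorted(flagged)


run_ids = comp_training_run_ids(SEED_TEXT)
```

In the SQL model, the equivalent would be a subquery or join against the seed in place of the literal `IN (...)` list.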

)
),

2 changes: 1 addition & 1 deletion dbt/models/pinval/schema.yml
Expand Up @@ -127,7 +127,7 @@ models:
# There are a handful of known PINs that had the wrong class
# at modeling time, and so got a model value even though it
# should not have
- error_if: ">28"
+ error_if: ">6"
- not_accepted_values:
name: pinval_assessment_card_reason_report_ineligible_not_non_tri_for_tri
values:
2 changes: 2 additions & 0 deletions dbt/seeds/model/model.final_model_raw.csv
Expand Up @@ -9,3 +9,5 @@ year,run_id,triad_name,type,is_final,date_chosen,date_emailed,date_finalized,rea
2024,2024-03-11-pensive-manasi,City,condo,TRUE,3/12/2024,3/12/2024,3/12/2024,Added sales missing from 2024-02-16-silly-billy,"[70,71,72,73,74,76,77]",2024-02-16-silly-billy,2024 Initial Model Values (Condos),Re-run version of 2024-02-16-silly-billy with missing sales added. See data-architecture PR #334.
2025,2025-02-11-charming-eric,North,res,TRUE,2/14/2025,2/14/2025,2/14/2025,Best performance,"[10,16,17,18,20,22,23,24,25,26,29,35,38]",,2025 Initial Model Values (Residential),"Full COMPS run for standard hyperparams, highway traffic and school rating features dropped"
2025,2025-02-10-cattywampus-christian,North,condo,TRUE,2/14/2025,2/14/2025,2/14/2025,Best performance,"[10,16,17,18,20,22,23,24,25,26,29,35,38]",,2025 Initial Model Values (Condos),"Final condo model candidate with CV params from 2025-02-08-youthful-carly, removed highway traffic"
+ 2026,2026-02-11-recursing-rob,South,res,TRUE,2/13/2026,2/13/2026,2/13/2026,Best performance,"[11,12,13,14,15,19,21,27,28,30,31,32,33,34,36,37,39]",,2026 Initial Model Values (Residential),"Try for final model with CV hyperparams, fixed Pace stop data, and new proration flag"
+ 2026,2026-02-12-gallant-bowen,South,condo,TRUE,2/13/2026,2/13/2026,2/13/2026,Best performance,"[11,12,13,14,15,19,21,27,28,30,31,32,33,34,36,37,39]",,2026 Model Values (Condo),CV and shap run with fixed pace data
3 changes: 3 additions & 0 deletions dbt/seeds/pinval/pinval.model_run.csv
Expand Up @@ -7,3 +7,6 @@ assessment_year,type,run_id
2025,card,2025-02-11-charming-eric
2025,shap,2025-04-25-fancy-free-billy
2025,comp,2025-06-14-flamboyant-rob
+ 2026,card,2026-02-11-recursing-rob
+ 2026,shap,2026-02-11-recursing-rob
+ 2026,comp,2026-02-13-peaceful-rina