-
Notifications
You must be signed in to change notification settings - Fork 118
feat: add bigconfig files for braze data #7890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Successfully generated a bigeye plan using |
This comment has been minimized.
This comment has been minimized.
{{ min_row_count(1) }} | ||
|
||
#fail | ||
{{ is_unique(["mozilla_subscription_id", "firefox_subscription_id", "mozilla_dev_subscription_id"]) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this will check that the combination of all of those columns is unique:
https://github.com/mozilla/bigquery-etl/blob/main/tests/checks/is_unique.jinja
the is_unique
saved metric in bigeye config would check each column individually therefore this is not exactly what you need to translate this check to bigeye metric. You will need to use this:
#7862 (once it is merged)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we expect each column to be individually unique in this table, so i don't think a grouped uniqueness check is necessary here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, in that case the bigConfig configuration should be correct.
-- macro checks | ||
|
||
#fail | ||
{{ min_row_count(85000000) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Freshness and volume are not quite the same as this metric. You probably want to use the predefined COUNT_ROWS
metric with a minimum threshold defined:
https://docs.bigeye.com/docs/available-metrics (search for COUNT_ROWS
)
{{ min_row_count(1) }} | ||
|
||
#warn | ||
{{ is_unique(["external_id", "email"]) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall correctly this will check that the combination of the two is unique and not that each of these columns has unique values which is what the bigeye config would check. So slightly different behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on this one, we do want it grouped. i think a user could theoretically have multiple accounts using the same email
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but, also, external_id
should be individually unique - assuming there is a way to do both :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I might need to leverage the custom rules workaround here in order to do combined column uniqueness. I didn't see any way to concatenate or combine these fields as a native functionality within a bigconfig file in the docs 😢
{{ not_null(["external_id", "email", "email_subscribe", "has_fxa", "create_timestamp", "update_timestamp"]) }} | ||
|
||
#fail | ||
{{ min_row_count(75000000) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Freshness and volume are not quite the same as this metric. You probably want to use the predefined COUNT_ROWS metric with a minimum threshold defined:
https://docs.bigeye.com/docs/available-metrics (search for COUNT_ROWS)
{{ min_row_count(75000000) }} | ||
|
||
#fail | ||
{{ is_unique(["external_id", "email", "fxa_id_sha256"]) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these provide a unique value as a set or should each of these columns contain unique values irrespectively of the other ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
external_id
should be individually unique and fxa_id_sha256
should be individually unique, email may not always be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LiamMcFall I wonder if it would make sense to just remove email from the is_unique list in the bigConfig and leave it at that since we expect the same email to potentially appear more than once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so too, if the other 2 are individually unique then it would still satisfy the uniqueness criteria even if emails can appear multiple times.
{{ not_null(["external_id", "email", "email_subscribe", "has_fxa", "create_timestamp", "update_timestamp"]) }} | ||
|
||
#fail | ||
{{ min_row_count(85000000) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comments related to min_row_count.
{{ min_row_count(85000000) }} | ||
|
||
#fail | ||
{{ is_unique(["external_id", "email", "fxa_id_sha256"]) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comments related to is_unique containing more than one column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
external_id
should be individually unique and fxa_id_sha256
should be individually unique, email may not always be
Integration report for "Merge branch 'main' into DENG-8898"
|
Description
This PR add bigconfig files for
braze_derived
andbraze_external
datasets.Related Tickets & Documents
Reviewer, please follow this checklist