Remove deployment logic from build script #5235
    workflow_run:
      workflows: ["build-pudl"]
      branches:
Could be cool to have this automatically kick off a staging deployment if a build completes from a non-main branch, although there wouldn't be an associated tag, so we'd have to rework some things to handle that.
    echo "Copying outputs to GCP bucket $PUDL_GCS_OUTPUT" &&
    gcloud storage --quiet cp -r "$PUDL_OUTPUT" "$PUDL_GCS_OUTPUT" &&
    gcloud storage --quiet cp -r "dbt/seeds/etl_full_row_counts.csv" "$PUDL_GCS_OUTPUT" &&
    gcloud storage --quiet cp "$LOGFILE" "$PUDL_GCS_OUTPUT" &&
We used to wait to copy the logfile, but copying outputs to GCS is now the last step, so we can add the logfile copy to this function.
Any thoughts on what we want to do with the deploy-pudl logs? I guess we can just go look at them directly in Batch 🤷
jdangerx left a comment
Looks mostly good, and ripping out all that code is so satisfying :)
A few blocking concerns about the wiring of the workflow, plus a few changes we need to make to pudl.scripts.deploy itself before we can fully rip out the functionality in pudl_batch.sh.
One option we have, actually, is just to have both the pudl.scripts.deploy flow and the pudl_batch.sh flow running in parallel first - don't delete anything out of pudl_batch yet, and just force pudl.scripts.deploy to deploy to staging. Then we can see if the GHA wiring is working properly.
If we do that, it might be worth adding a staging version of the data.catalyst.coop service if we wanted to really be able to verify that things work. Something that only logged-in cats can access, and auto-scales down to 0 (like our user metrics internal app).
Finally, we should probably do something to zip up the new fercX_xbrl/* files (probably in pudl.scripts.deploy) before triggering the zenodo release so we don't run into the subdirectories/50 file limit - but I'm ok with punting that to another PR.
    --container-arg="${{ inputs.git_tag }}" \
    --container-arg="${{ env.GIT_TAG }}" \
    --container-arg="--staging" \
    --container-arg="${{ inputs.staging }}" \
blocking: by my read, this would call pudl_deploy $GIT_TAG --staging with no actual staging value in the workflow_run case - does that actually deploy to s3://.../staging? or /nightly? or something else?
If we have no workflow_run trigger because we end up switching to workflow_dispatch from the pudl_batch.sh, then this is moot 😅
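One way to avoid passing an empty value in the workflow_run case is to resolve the input to an explicit default before handing it to the container. This is only a sketch: resolve_staging is a hypothetical helper, and defaulting to staging when the input is empty is an assumption, not what pudl_deploy currently does.

```shell
# Hypothetical helper: normalize the staging input to an explicit value.
# Assumption: an empty input (e.g. from a workflow_run trigger) should
# fall back to "true" so that nothing deploys to production by accident.
resolve_staging() {
    raw="${1:-}"
    case "$raw" in
        true|false) printf '%s\n' "$raw" ;;
        "") printf 'true\n' ;;
        *)
            echo "unexpected staging value: $raw" >&2
            return 1
            ;;
    esac
}
```

The workflow step could then pass the resolved value as the container argument instead of the raw input, so the deploy script never sees a bare --staging.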
        return 1
    }

    function deploy_data_viewer() {
blocking: it looks like the update_pudl_viewer function in the deploy script isn't called, and it uses the gcloud CLI directly instead of triggering the workflow - we should update it.
Hmm. Since main changed out from under us, we probably need to do another re-lock. Running pixi self-update && git checkout main -- pixi.lock && pixi lock should get you the cleanest diff (remember to get your local main up to date first!)
      branches:
        - main
      types:
        - completed
Doesn't build-pudl complete right after submitting the batch job? In which case when we run this workflow via this trigger, the build artifacts won't be there yet... should we switch to workflow_dispatch from the pudl_batch.sh script like we do for zenodo?
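For reference, a workflow_dispatch trigger from the end of the batch script could look roughly like the sketch below, which uses the GitHub REST API endpoint for creating a workflow dispatch event. The function names and the GITHUB_REPO, DEPLOY_WORKFLOW and GITHUB_TOKEN variables are hypothetical, and this is not necessarily how the existing zenodo trigger is wired; the git_tag and staging input names are taken from the workflow snippet above.

```shell
# Build the JSON body for a workflow_dispatch request.
# Assumes ref, git tag and staging contain no characters that need JSON escaping.
dispatch_payload() {
    printf '{"ref":"%s","inputs":{"git_tag":"%s","staging":"%s"}}' "$1" "$2" "$3"
}

# Hypothetical wrapper: POST the payload to the workflow dispatch endpoint.
# GITHUB_REPO ("owner/repo"), DEPLOY_WORKFLOW (workflow file name) and
# GITHUB_TOKEN are assumed to be set in the batch job's environment.
trigger_deploy() {
    curl --fail --silent --show-error -X POST \
        -H "Accept: application/vnd.github+json" \
        -H "Authorization: Bearer ${GITHUB_TOKEN}" \
        "https://api.github.com/repos/${GITHUB_REPO}/actions/workflows/${DEPLOY_WORKFLOW}/dispatches" \
        -d "$(dispatch_payload "$@")"
}
```

Calling trigger_deploy only after the final gcloud storage copy succeeds would mean the deploy workflow starts once the build outputs actually exist in the bucket.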
Zach & I talked synchronously, here are the decisions
@jdangerx I've been thinking about creating a Plan
Open questions
Great! I'm going to answer out of order. Mostly feels right 😄
Re: Postgres:
Re: Auth0:
I wonder if making
These settings sound great to me!
Looks like we already tag the image with the git ref in
Pointing viewer at staging parquet files in S3 is a good catch - we should definitely make that S3 base path configurable at runtime, via env var or something.
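A minimal sketch of the env var idea, written as a launch-time shell helper. PUDL_VIEWER_S3_BASE is a hypothetical variable name and the default path is a placeholder, not the real bucket; the viewer itself would need to read the resolved value however it loads its config.

```shell
# Hypothetical helper: pick the S3 base path for the viewer at runtime.
# PUDL_VIEWER_S3_BASE is an assumed variable name; the fallback is a
# placeholder path used only when the variable is unset or empty.
resolve_s3_base() {
    base="${PUDL_VIEWER_S3_BASE:-s3://example-bucket/nightly}"
    # Strip a single trailing slash so callers can append "/<file>" safely.
    printf '%s\n' "${base%/}"
}
```

A staging deployment would then set the variable to the staging prefix, while production leaves it unset or sets it explicitly.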
Oh cool, I didn't realize they'd started to support compose! It looks like this is a new feature and there's not any terraform support yet, which is not ideal, but it'd be ok to have a staging env that's not fully managed by terraform. I definitely think the simplest path would be to use compose with no cloud sql / auth0, and we can always add auth0 down the line if needed.
Ok @zaneselvans this is finally ready for another look. The big changes I've made since @jdangerx's review are:
Overview
Closes #5116.
What problem does this address?
This PR removes deployment logic from the nightly build script, so we can isolate deployment and builds into their own workflows.
What did you change?
build-deploy-pudl workflow to build-pudl
build-pudl completes successfully
Documentation
Make sure to update relevant aspects of the documentation:
Good to do, but out of scope