Remove deployment logic from build script by zschira · Pull Request #5235 · catalyst-cooperative/pudl

zschira · 2026-05-13T14:45:34Z

Overview

Closes #5116.

What problem does this address?

This PR removes deployment logic from the nightly build script so that builds and deployment can be isolated in their own workflows.

What did you change?

Removes deployment logic from the build script
Renames the build-deploy-pudl workflow to build-pudl
Sets up the deployment workflow to run when build-pudl completes successfully

Documentation

Make sure to update relevant aspects of the documentation:

Update the release notes: reference the PR and related issues.
Update dev docs

Good to do, but out of scope

Migrate build logic to python / dagster
Send notifications with zulip instead of importing from slack
Separate EQR build / deployment logic

zschira · 2026-05-13T15:59:42Z


+  workflow_run:
+    workflows: ["build-pudl"]
+    branches:


Could be cool to have this automatically kick off a staging deployment if a build completes from a non-main branch, although there wouldn't be an associated tag, so we'd have to rework some things to handle that.

zschira · 2026-05-13T16:27:28Z

    echo "Copying outputs to GCP bucket $PUDL_GCS_OUTPUT" &&
        gcloud storage --quiet cp -r "$PUDL_OUTPUT" "$PUDL_GCS_OUTPUT" &&
        gcloud storage --quiet cp -r "dbt/seeds/etl_full_row_counts.csv" "$PUDL_GCS_OUTPUT" &&
+        gcloud storage --quiet cp "$LOGFILE" "$PUDL_GCS_OUTPUT" &&


We used to wait to copy the logfile, but copying outputs to GCS is now the last step, so we can add the logfile copy to this function.

Any thoughts on what we want to do with the deploy-pudl logs? I guess we can just go look at them directly in Batch 🤷

jdangerx

Looks mostly good, and ripping out all that code is so satisfying :)

A few blocking concerns about the wiring of the workflow, plus a few changes we need to make to pudl.scripts.deploy itself before we can fully rip out the functionality in pudl_batch.sh.

One option we have, actually, is just to have both the pudl.scripts.deploy flow and the pudl_batch.sh flow running in parallel first - don't delete anything out of pudl_batch yet, and just force pudl.scripts.deploy to deploy to staging. Then we can see if the GHA wiring is working properly.

If we do that, it might be worth adding a staging version of the data.catalyst.coop service so we can really verify that things work: something that only logged-in cats can access and that auto-scales down to 0 (like our user metrics internal app).

Finally, we should probably zip up the new fercX_xbrl/* files (probably in pudl.scripts.deploy) before triggering the Zenodo release, so we don't run into the subdirectory / 50-file limit, but I'm OK with punting that to another PR.
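A minimal sketch of that zipping step, assuming each `fercX_xbrl` output is a directory of Parquet files directly under the output root and that one flat archive per directory is what should go to Zenodo. The helper name `zip_xbrl_dirs` and the `*_xbrl` glob are hypothetical, not the actual `pudl.scripts.deploy` code.

```python
import zipfile
from pathlib import Path


def zip_xbrl_dirs(output_dir: Path) -> list[Path]:
    """Zip each ``*_xbrl`` subdirectory of ``output_dir`` into ``<name>.zip``.

    Hypothetical helper: turning each directory into a single archive avoids
    uploading subdirectories and keeps the total file count low.
    """
    archives = []
    # sorted() materializes the glob before any archives are written.
    for xbrl_dir in sorted(p for p in output_dir.glob("*_xbrl") if p.is_dir()):
        archive = output_dir / f"{xbrl_dir.name}.zip"
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in sorted(xbrl_dir.rglob("*")):
                if path.is_file():
                    # Store paths relative to the output root, e.g. ferc1_xbrl/x.parquet
                    zf.write(path, arcname=str(path.relative_to(output_dir)))
        archives.append(archive)
    return archives
```

Keeping the directory name inside the archive means unzipping reproduces the original layout.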

jdangerx · 2026-05-13T20:55:58Z

-            --container-arg="${{ inputs.git_tag }}" \
+            --container-arg="${{ env.GIT_TAG }}" \
            --container-arg="--staging" \
            --container-arg="${{ inputs.staging }}" \


blocking: by my read, this would call pudl_deploy $GIT_TAG --staging with no actual staging value in the workflow_run case - does that actually deploy to s3://.../staging? or /nightly? or something else?
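To make the concern concrete: a `workflow_run` trigger has no `inputs`, so `${{ inputs.staging }}` presumably renders as an empty string and the container receives `--staging ""`. The sketch below is a hypothetical parser (not the real `pudl_deploy` CLI) showing one defensive option: require an explicit true/false value so an empty string fails loudly instead of silently choosing a deploy target.

```python
import argparse


def strict_bool(value: str) -> bool:
    """Parse an explicit true/false string; reject anything else, including ""."""
    normalized = value.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true/false for --staging, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI shape: a positional git tag plus a --staging flag
    # that must always carry an explicit value.
    parser = argparse.ArgumentParser(prog="pudl_deploy")
    parser.add_argument("git_tag")
    parser.add_argument("--staging", type=strict_bool, required=True)
    return parser
```

With this shape, `pudl_deploy example-tag --staging ""` exits with a usage error rather than deploying anywhere.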

If we have no workflow_run trigger because we end up switching to workflow_dispatch from the pudl_batch.sh, then this is moot 😅

jdangerx · 2026-05-13T20:59:10Z

    return 1
 }

-function deploy_data_viewer() {


blocking: it looks like the update_pudl_viewer function in the deploy script isn't called, and it uses the gcloud CLI directly instead of triggering the workflow. We should update it.

jdangerx · 2026-05-13T21:01:37Z

    echo "Copying outputs to GCP bucket $PUDL_GCS_OUTPUT" &&
        gcloud storage --quiet cp -r "$PUDL_OUTPUT" "$PUDL_GCS_OUTPUT" &&
        gcloud storage --quiet cp -r "dbt/seeds/etl_full_row_counts.csv" "$PUDL_GCS_OUTPUT" &&
+        gcloud storage --quiet cp "$LOGFILE" "$PUDL_GCS_OUTPUT" &&


> Any thoughts on what we want to do with the deploy-pudl logs? I guess we can just go look at them directly in Batch 🤷

jdangerx · 2026-05-13T21:03:13Z

Hmm. As main changed out from under us, we probably need to do another re-lock. Running pixi self-update && git checkout main -- pixi.lock && pixi lock should get you the cleanest diff (remember to get your local main up to date first!)

jdangerx · 2026-05-13T21:06:44Z

+    branches:
+      - main
+    types:
+      - completed


Doesn't build-pudl complete right after submitting the batch job? If so, when this trigger runs the workflow, the build artifacts won't be there yet... Should we switch to triggering workflow_dispatch from the pudl_batch.sh script, like we do for Zenodo?

jdangerx · 2026-05-14T17:42:24Z

Zach & I talked synchronously; here are the decisions:

The safest, most confident path is to add a pudl-viewer staging environment, then make pudl_batch.sh deploy to staging with the deploy-pudl workflow while still deploying to production with the legacy workflow. Once the staging deploy is verified, we can cut over to using only deploy-pudl. We will do this.
We do need to trigger the deploy-pudl workflow via a GH API request rather than via workflow_run, because of the timing / GCP delegation.
We should remember to use the workflow trigger for deploying pudl viewer in the deploy-pudl workflow.
We are OK with just having the deploy-pudl logs on Batch until further notice.
We probably need to add a Zulip notification for deploy-pudl status.
We'll do the XBRL parquet cleanup for Zenodo as a separate change to deploy-pudl after this works.
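The GH API trigger can be sketched with GitHub's "create a workflow dispatch event" REST endpoint. The PR itself launches deployment with curl from pudl_batch.sh; this Python version is only an illustration. The input names mirror `inputs.git_tag` / `inputs.staging` from the workflow diff, while the workflow filename `deploy-pudl.yml` and the `main` ref are assumptions.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(
    token: str,
    git_tag: str,
    staging: bool,
    repo: str = "catalyst-cooperative/pudl",
    workflow_file: str = "deploy-pudl.yml",  # assumed workflow filename
    ref: str = "main",  # assumed branch holding the workflow definition
) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for the deploy workflow."""
    url = f"{GITHUB_API}/repos/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = {
        "ref": ref,
        # Send input values as strings, matching how the workflow reads them.
        "inputs": {"git_tag": git_tag, "staging": str(staging).lower()},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
```

Passing the request to `urllib.request.urlopen` queues the run. Because the batch job sends it only after outputs are copied, the deploy never races the build the way a `workflow_run` trigger would.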

zschira · 2026-05-15T14:32:42Z

@jdangerx I've been thinking about creating a pudl-viewer staging environment and wanted to run my current plan by you before leaving.

Plan

Refactor pudl-viewer.tf to a reusable terraform module
Use module to create prod/staging resources where staging has a few changes from prod:
- iap_enabled = true so only catalysters have access
- min_instance_count = 0 to scale down when not in use
Modify the build-deploy workflow with a staging path that produces a docker image tagged as staging. It also needs to point the viewer at staging parquet files in S3

Open questions

Do we reuse the prod cloud sql instance / auth0 instances?
- I lean towards yes for simplicity, although it could be safer to have separate instances. We could mitigate the danger by giving the staging instance read-only access to Cloud SQL

jdangerx · 2026-05-15T15:57:07Z

> @jdangerx I've been thinking about creating a pudl-viewer staging environment and wanted to run my current plan by you before leaving.

Great! I'm going to answer out of order. Mostly feels right 😄

Open questions

> * Do we reuse the prod cloud sql instance / auth0 instances?
>
>   * I lean towards yes for simplicity, although it could be safer to have separate instances. Could mitigate danger by making the staging instance have read only access to cloud sql

Re: Postgres:

Definitely don't think it's worth a new cloud sql instance.
Looks like Cloud Run supports docker compose, so we can use a similar local-postgres setup as we do in dev: https://docs.cloud.google.com/run/docs/deploy-run-compose

Re: Auth0:

it's not too annoying to set up a new auth0 app
The simplest thing to do is to set PUDL_INTEGRATION_TEST=true in the env. We then won't be able to test the login/logout flow on staging, but maybe that doesn't matter too much.

Plan

> * Refactor `pudl-viewer.tf` to a reusable terraform module

Honestly, I wonder if making pudl-viewer-staging.tf a separate module would be easier: if we're not using CloudSQL or Auth0, we don't need all that secret-wiring machinery, and if we're doing a docker-compose setup it will be somewhat different anyway.

> * Use module to create prod/staging resources where staging has a few changes from prod:
>
>   * `iap_enabled = true` so only catalysters have access
>   * `min_instance_count = 0` to scale down when not in use

These settings sound great to me!

> * Modify `build-deploy` workflow with a staging path that will produce a docker image tagged as `staging`. Also needs to point viewer at staging parquet files in s3

Looks like we already tag the image with the git ref in build-deploy and then refer to that same git ref when running deploy, so we probably don't need a special staging tag for the build. I think of staging as a "special low-stakes deploy target", and the build probably doesn't need to know about it...

Pointing the viewer at staging parquet files in S3 is a good catch. We should definitely make that S3 base path configurable at runtime, via an env var or something similar.
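A minimal sketch of a runtime-configurable base path, assuming the viewer builds Parquet URLs by appending a file name to a prefix. The env var name `PUDL_VIEWER_PARQUET_BASE` and the `s3://example-bucket/...` default are hypothetical placeholders, not the viewer's real configuration.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical env var name and default prefix; the viewer's real config may differ.
BASE_ENV_VAR = "PUDL_VIEWER_PARQUET_BASE"
DEFAULT_BASE = "s3://example-bucket/nightly/"


def parquet_base(env: Mapping[str, str] | None = None) -> str:
    """Return the S3 prefix to read Parquet files from, resolved at runtime."""
    env = os.environ if env is None else env
    base = env.get(BASE_ENV_VAR, DEFAULT_BASE)
    # Normalize the trailing slash so file names can simply be appended.
    return base if base.endswith("/") else base + "/"


def parquet_url(table_name: str, env: Mapping[str, str] | None = None) -> str:
    return f"{parquet_base(env)}{table_name}.parquet"
```

The same image can then serve production or staging data depending only on the environment the service is deployed with.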

zschira · 2026-05-15T18:17:54Z

> Looks like Cloud Run supports docker compose

Oh cool, I didn't realize they'd started to support compose! It looks like this is a new feature and there's no terraform support yet, which is not ideal, but it should be OK to have a staging env that's not fully managed by terraform. I definitely think the simplest path would be to use compose with no cloud sql / auth0, and we can always add auth0 down the line if needed.

zaneselvans · 2026-05-21T06:53:39Z

@zschira Given that @jdangerx did a pretty thorough review, what role would you like me to play on this PR now? Should I do my own separate review before you work on addressing the things DX brought up? Or wait for the first round of changes to land and then look at it?


zschira · 2026-06-11T20:21:38Z

Ok @zaneselvans this is finally ready for another look. The big changes I've made since @jdangerx's review are:

Kick off deployment in build script so it happens after build is fully complete
Actually trigger eel-hole deployment (including staging deployment, which I need to create a PR for over there, but it is ready)
Zip FERC XBRL files to match the behavior of the old build script
Added a branch build/deploy pipeline
- If you manually trigger a build with the build-pudl action, it now runs a "branch build" that triggers the deployment with staging=True, so we can fully inspect outputs without overwriting our nightly or stable data
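The branch-build rule can be sketched as a small decision function over GitHub Actions' `github.event_name`. This assumes scheduled (nightly) and tag-push (stable) builds are the only production triggers and treats every other trigger, including a manual `workflow_dispatch`, as a branch build; that fail-safe default and the function names are illustrative assumptions, not the PR's implementation.

```python
# Assumed production triggers; anything else is treated as a branch build.
PRODUCTION_EVENTS = frozenset({"schedule", "push"})


def is_branch_build(event_name: str) -> bool:
    return event_name not in PRODUCTION_EVENTS


def deploy_container_args(event_name: str, git_tag: str) -> list[str]:
    """Arguments for the deploy container, mirroring the --container-arg lines.

    Always passes an explicit "true"/"false" after --staging so the deploy
    script never receives an empty value.
    """
    staging = is_branch_build(event_name)
    return [git_tag, "--staging", "true" if staging else "false"]
```

Defaulting unknown triggers to staging means a misconfigured trigger can at worst produce an extra staging deploy, never overwrite nightly or stable data.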

zschira added 2 commits May 12, 2026 17:02

Remove deploy logic from build workflow

04b616f

Trigger deploy-pudl workflow automatically on build success

f719097

zschira added this to Catalyst Megaproject May 13, 2026

github-project-automation Bot moved this to New in Catalyst Megaproject May 13, 2026

zschira moved this from New to In progress in Catalyst Megaproject May 13, 2026

zschira self-assigned this May 13, 2026

zschira added 2 commits May 13, 2026 10:54

Update docs to reflect deployment/build decoupling

2fed041

Merge branch 'main' into rewire-deployment

fba1a09

zschira marked this pull request as ready for review May 13, 2026 15:00

zschira commented May 13, 2026


zschira requested review from jdangerx and zaneselvans May 13, 2026 16:01

chore: re-lock pixi for new pixi format

eb42076

zschira commented May 13, 2026


jdangerx requested changes May 13, 2026


Merge branch 'main' into rewire-deployment

9b5b023

zaneselvans added the dagster and nightly-builds labels May 15, 2026

zschira marked this pull request as draft May 28, 2026 15:43

zschira added 3 commits June 4, 2026 10:01

Merge branch 'main' into rewire-deployment

ac81120

Merge branch 'rewire-deployment' of github.com:catalyst-cooperative/pudl into rewire-deployment

7f487eb

Fix kicking off deployment

4dd144d

zaneselvans mentioned this pull request Jun 5, 2026 in Migrate to Zulip notifications #5298 (merged)

zschira added 2 commits June 5, 2026 15:58

Merge branch 'main' into rewire-deployment

5176f7a

Merge branch 'main' into rewire-deployment

7b692ce

zschira added 12 commits June 9, 2026 09:50

Fix json/curl for launching deployment

9194df9

Fix setting git_tag env variable

6827ac3

Actually fix setting git_tag env variable

27cd4fb

Make sure to always tag builds

acf6f02

Merge branch 'main' into rewire-deployment

df06a4f

Fix handling of branch deployments

f61d6a4

Merge in build changes to use zulip notifications

028e2ca

Update deployment / build docs

81572d7

Remove unhelpful logs

c53afb3

Fix eel-hole deployment

6c88fdd

Zip ferc parquet files for distribution

81a644f

Merge branch 'main' into rewire-deployment

144e0c0

zschira marked this pull request as ready for review June 11, 2026 20:16

3 participants
