
feat(ci): automate the full release pipeline — auto-tag, S3 upload, and CFN auto-deploy (with AMI bump) #44

@kaio6fellipe


Why is this needed:

The current release-and-deploy flow for jit-runners has three pieces that each ship the same version but are wired together by hand:

  1. Tagging: vX.Y.Z tags are created and pushed manually by an operator.
  2. Lambda artifact placement: after release.yml (GoReleaser) publishes the GitHub Release, an operator downloads the three *-linux-amd64.zip files, renames them to webhook.zip / scaleup.zip / scaledown.zip (the basenames the live stack expects), and copies each one into s3://jit-runners-lambda-s3/vX.Y.Z/ with aws s3 cp.
  3. CloudFormation rollout: an operator runs aws cloudformation update-stack against the jit-runners stack to repoint the three *LambdaS3Key parameters at the new keys. When ami-build.yml has produced a new AMI, the operator also updates DefaultAMI to the freshly published jit-runner-vX.Y.Z-... AMI.

This was just exercised end-to-end for v0.3.0 and v0.3.1. The full procedure is documented in docs/release.md, which makes the steps reproducible but not self-running. The manual flow is error-prone (asset names, key paths, --capabilities CAPABILITY_NAMED_IAM, AMI ID lookup), it requires an operator with active AWS credentials at release time, and it is the main reason why a release isn't a one-click event today.

What would you like to be added:

A CI-driven release pipeline that fully replaces the runbook in docs/release.md. Three coordinated additions:

1. Auto-tag on merge (new workflow auto-tag.yml)

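Trigger: pull_request events of type closed on main, gated on github.event.pull_request.merged == true. The gate is needed because closing a PR without merging also fires closed.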
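
In the workflow file this gate would normally be a job-level if: on github.event.pull_request.merged. The Python sketch below shows the same check for a script-based implementation. It assumes the standard pull_request webhook payload fields (action, pull_request.merged, pull_request.base.ref, pull_request.merge_commit_sha, pull_request.labels). The names parse_merge_event and MergeInfo are hypothetical.

```python
from typing import List, NamedTuple, Optional


class MergeInfo(NamedTuple):
    """What auto-tag needs from a merged PR (hypothetical helper type)."""
    sha: str
    labels: List[str]


def parse_merge_event(event: dict, base_branch: str = "main") -> Optional[MergeInfo]:
    """Return MergeInfo for a PR merged into base_branch, or None to skip tagging."""
    if event.get("action") != "closed":
        return None
    pr = event.get("pull_request") or {}
    # "closed" also fires for PRs closed without merging; only merged ones get a tag.
    if not pr.get("merged"):
        return None
    if (pr.get("base") or {}).get("ref") != base_branch:
        return None
    # Label names feed the major/minor override in the bump-type step.
    labels = [label["name"] for label in pr.get("labels", [])]
    return MergeInfo(sha=pr["merge_commit_sha"], labels=labels)
```

Inside GitHub Actions, the payload would be loaded with json.load from the file named by the GITHUB_EVENT_PATH environment variable.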

Behavior — modeled on kaio6fellipe/event-driven-bookinfo auto-tag.yml:

  • Read the merge commit SHA from the PR event (github.event.pull_request.merge_commit_sha).
  • Determine bump type from priority list:
    1. PR labels (major, minor) — explicit override.
    2. Conventional-commit prefix in the squash-merge commit message: BREAKING CHANGE or ^[a-z]+(\(.+\))?!: → major; ^feat(\(.+\))?: → minor; otherwise → patch.
  • Resolve the latest existing v* tag and bump it accordingly.
  • Create and push the new annotated tag.
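  • The existing release.yml and ami-build.yml already fire on v* tag pushes, so no further dispatch is needed at this step. This holds only if the tag is pushed with a GitHub App token or PAT rather than the default GITHUB_TOKEN, because events created with GITHUB_TOKEN do not start new workflow runs.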
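
The bump-type priority list above could be expressed as follows. This is a minimal sketch that uses the two regular expressions from the issue verbatim. It rests on three assumptions: "major" wins when both override labels are present, the prefix regexes are matched against the first line of the squash-merge message, and BREAKING CHANGE may appear anywhere in the message. The function name bump_type is hypothetical.

```python
import re
from typing import Iterable

# Conventional-commit patterns taken from the priority list.
BANG_RE = re.compile(r"^[a-z]+(\(.+\))?!:")   # e.g. "refactor!:" or "feat(api)!:"
FEAT_RE = re.compile(r"^feat(\(.+\))?:")      # e.g. "feat:" or "feat(ci):"


def bump_type(labels: Iterable[str], commit_message: str) -> str:
    """Return "major", "minor" or "patch" for a merged PR."""
    label_set = set(labels)
    # 1. Explicit PR-label override ("major" checked first: an assumption).
    if "major" in label_set:
        return "major"
    if "minor" in label_set:
        return "minor"
    # 2. Conventional-commit prefix on the subject line of the squash-merge commit.
    subject = commit_message.splitlines()[0] if commit_message else ""
    if "BREAKING CHANGE" in commit_message or BANG_RE.match(subject):
        return "major"
    if FEAT_RE.match(subject):
        return "minor"
    # 3. Anything else is a patch release.
    return "patch"
```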

auto-tag.yml removes the need to ever run git tag -a v… && git push origin v… by hand.
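
Resolving "the latest existing v* tag" should compare versions numerically. A plain string sort ranks v0.10.0 below v0.9.0. The sketch below shows one way to compute the next tag and the git commands that replace the manual ones. It assumes that only plain vMAJOR.MINOR.PATCH tags count, so pre-release tags are ignored, and that a repository with no tags starts from v0.0.0. The function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Only plain vMAJOR.MINOR.PATCH tags take part (assumption: pre-release tags are ignored).
SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def latest_version(tags: Iterable[str]) -> Optional[Tuple[int, int, int]]:
    """Highest version among the tags, compared numerically rather than as strings."""
    versions = []
    for tag in tags:
        match = SEMVER_TAG.match(tag)
        if match:
            major, minor, patch = (int(part) for part in match.groups())
            versions.append((major, minor, patch))
    return max(versions) if versions else None


def next_tag(tags: Iterable[str], bump: str) -> str:
    """Apply a major/minor/patch bump to the latest tag (v0.0.0 if none exist, an assumption)."""
    major, minor, patch = latest_version(tags) or (0, 0, 0)
    if bump == "major":
        return f"v{major + 1}.0.0"
    if bump == "minor":
        return f"v{major}.{minor + 1}.0"
    if bump == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump type: {bump!r}")


def tag_commands(tag: str, sha: str) -> List[List[str]]:
    """git invocations that create the annotated tag on the merge commit and push it."""
    return [
        ["git", "tag", "-a", "-m", f"Release {tag}", tag, sha],
        ["git", "push", "origin", tag],
    ]
```

The tag list could come from git tag --list 'v*'. By default, actions/checkout fetches a single commit without tags, so the job would need fetch-depth: 0 (or an explicit git fetch --tags) for that list to be complete.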

2. Post-release Lambda S3 upload (extend release.yml)

After the existing GoReleaser job completes successfully, add a follow-up job (or step) that:

  • Authenticates to AWS via OIDC role assumption (aws-actions/configure-aws-credentials), using a new repository secret LAMBDA_DEPLOY_ROLE_ARN. The job also needs permissions: id-token: write so it can request the OIDC token.
  • Renames the three GoReleaser archives to the basenames the live stack expects: webhook.zip, scaleup.zip and scaledown.zip. .goreleaser.yml currently produces them as webhook-linux-amd64.zip, scaleup-linux-amd64.zip and scaledown-linux-amd64.zip, per the existing archives.name_template: <id>-{{ .Os }}-{{ .Arch }}. (Alternative: change name_template so GoReleaser emits the bare basenames directly and skip the rename. Either approach is acceptable.)
  • Uploads them to s3://jit-runners-lambda-s3/${TAG}/ where ${TAG} is the pushed tag (e.g. v0.3.1/webhook.zip).
  • Verifies that the post-upload listing of s3://jit-runners-lambda-s3/${TAG}/ matches the three expected keys.
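
The rename, upload and verification steps could be sketched as below. The sketch makes three assumptions: GoReleaser writes its archives to the default dist/ directory, only linux/amd64 is built, and the rename happens in the S3 destination key instead of on disk (aws s3 cp SRC s3://bucket/KEY already uploads under a new name). The helper names are hypothetical.

```python
import os
from typing import Dict, Iterable, List

BUCKET = "jit-runners-lambda-s3"
FUNCTIONS = ("webhook", "scaleup", "scaledown")


def rename_plan(goos: str = "linux", goarch: str = "amd64") -> Dict[str, str]:
    """GoReleaser archive name -> basename the live stack expects."""
    return {f"{fn}-{goos}-{goarch}.zip": f"{fn}.zip" for fn in FUNCTIONS}


def expected_keys(tag: str) -> List[str]:
    """The three object keys the stack parameters will point at for this tag."""
    return [f"{tag}/{fn}.zip" for fn in FUNCTIONS]


def upload_commands(tag: str, dist_dir: str = "dist") -> List[List[str]]:
    """aws s3 cp invocations; the rename happens in the destination key, not on disk."""
    return [
        ["aws", "s3", "cp", os.path.join(dist_dir, src), f"s3://{BUCKET}/{tag}/{dst}"]
        for src, dst in rename_plan().items()
    ]


def missing_keys(tag: str, listed_keys: Iterable[str]) -> List[str]:
    """Expected keys absent from the post-upload listing (an empty list means verified)."""
    return sorted(set(expected_keys(tag)) - set(listed_keys))
```

The listed keys could come from aws s3api list-objects-v2 --bucket jit-runners-lambda-s3 --prefix "$TAG/" --query 'Contents[].Key' --output json. The job would fail if missing_keys returns anything.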

The OIDC role's IAM policy should be scoped tightly:

  • s3:PutObject on arn:aws:s3:::jit-runners-lambda-s3/v*/* (trailing /* so only objects under tag-prefixes can be written, never bucket-level metadata or non-v* prefixes).
  • s3:ListBucket on arn:aws:s3:::jit-runners-lambda-s3, optionally with a StringLike condition on s3:prefix matching v*/* if a stricter list is desired.
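
As a concrete reference, the two statements above could be rendered as an IAM policy document. The sketch builds the policy as a Python dict that can be serialized with json.dumps. It also includes a trust policy for the OIDC role, which the issue does not spell out. That part assumes the GitHub OIDC provider (token.actions.githubusercontent.com) is already registered in the account and that the role is only assumed from tag-push runs. The account ID and OWNER/jit-runners repository slug are placeholders.

```python
BUCKET = "jit-runners-lambda-s3"


def upload_policy(bucket: str = BUCKET) -> dict:
    """Identity policy for the release.yml upload role (the two statements listed above)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Objects under a v* tag prefix only; never bucket-level configuration.
                "Sid": "PutTaggedArtifacts",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/v*/*",
            },
            {   # Listing restricted to the same prefixes (the optional, stricter variant).
                "Sid": "ListTagPrefixes",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": "v*/*"}},
            },
        ],
    }


def github_oidc_trust_policy(account_id: str, repo: str) -> dict:
    """Trust policy letting tag-push workflow runs of `repo` assume the role (assumption)."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {f"{provider}:aud": "sts.amazonaws.com"},
                # sub looks like repo:OWNER/REPO:ref:refs/tags/v0.3.1 for tag-push runs
                # (it becomes ...:environment:NAME if the job uses a GitHub environment).
                "StringLike": {f"{provider}:sub": f"repo:{repo}:ref:refs/tags/v*"},
            },
        }],
    }
```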

3. Auto-update the CFN stack (new workflow deploy-stack.yml)

Recommended trigger pattern: a separate deploy-stack.yml triggered by workflow_run (types: [completed]) on ami-build.yml. ami-build.yml is the slower of the two upstream pipelines and therefore the natural last gate. The workflow's first step asserts that release.yml has also completed successfully for the same commit, queried via gh api repos/${{ github.repository }}/actions/workflows/release.yml/runs?head_sha=${{ github.event.workflow_run.head_sha }}. If release.yml is still running, the deploy job should poll up to a bounded timeout; if it failed, the job aborts with a clear error. This keeps release.yml and ami-build.yml independent (no workflow_call coupling) and uses the longer pipeline as the explicit gate.
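
The wait-or-abort gate could be sketched as a small polling loop. fetch_runs is a hypothetical callable that returns the workflow_runs array from the API call above, for example by wrapping gh api ... --jq .workflow_runs. The sketch reads only each run's created_at, status and conclusion fields. It treats "no run yet" as pending, and its 30-minute timeout and 30-second interval are arbitrary assumptions. sleep and clock are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, List


def release_state(runs: List[dict]) -> str:
    """Classify release.yml runs for one head SHA as 'success', 'failure' or 'pending'."""
    if not runs:
        # The run may not have been created yet; treat it as pending (assumption).
        return "pending"
    # Judge by the most recently created run; created_at is an ISO-8601 UTC timestamp.
    latest = max(runs, key=lambda run: run["created_at"])
    if latest["status"] != "completed":
        return "pending"
    return "success" if latest["conclusion"] == "success" else "failure"


def wait_for_release(
    fetch_runs: Callable[[], List[dict]],
    timeout_s: float = 1800,
    interval_s: float = 30,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until release.yml finishes; returns its state, or 'timeout' at the deadline."""
    deadline = clock() + timeout_s
    while True:
        state = release_state(fetch_runs())
        if state != "pending":
            return state
        if clock() >= deadline:
            return "timeout"
        sleep(interval_s)
```

Only "success" would let the deploy continue. "failure" and "timeout" would both end the job with an error annotation.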

Behavior:

  • Determine the tag (e.g. via github.event.workflow_run.head_branch for tag pushes, or via workflow_dispatch input).
  • Resolve the new AMI ID by querying EC2: aws ec2 describe-images --owners self --filters Name=name,Values=jit-runner-${TAG}-* in us-east-2, sorted by CreationDate, take the latest. If no matching AMI exists (because ami-build.yml was skipped or failed for non-Packer reasons), keep the previous DefaultAMI via UsePreviousValue=true.
  • Capture pre-update CodeSha256 for the three Lambda functions for later verification.
  • Run aws cloudformation update-stack against jit-runners in us-east-2:
    • --use-previous-template
    • --capabilities CAPABILITY_NAMED_IAM
    • 3 new *LambdaS3Key values pointing at ${TAG}/{webhook,scaleup,scaledown}.zip
    • DefaultAMI=ami-xxxxxxxx (or UsePreviousValue=true when no AMI was built)
    • All other parameters set to UsePreviousValue=true.
  • Wait via aws cloudformation wait stack-update-complete.
  • Validate that the three Lambda functions' CodeSha256 values differ from the pre-update capture, and post the result to the workflow summary.
  • On any failure, do not attempt automatic recovery — let CloudFormation's own rollback handle stack errors, surface the failure as a workflow annotation, and require human re-trigger after diagnosis.
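
The data-handling parts of this job (picking the AMI, building the parameter list, checking the SHAs) could look like the sketch below. The three *LambdaS3Key names in S3_KEY_PARAMS are hypothetical, because the issue only gives the suffix. The parameter dicts use the ParameterKey / ParameterValue / UsePreviousValue shape that the CloudFormation UpdateStack API expects. Every stack parameter is listed explicitly, because parameters omitted from an update are not carried over.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical parameter names: the issue only says "*LambdaS3Key" and DefaultAMI.
S3_KEY_PARAMS = {
    "WebhookLambdaS3Key": "webhook",
    "ScaleUpLambdaS3Key": "scaleup",
    "ScaleDownLambdaS3Key": "scaledown",
}


def latest_ami(images: List[dict]) -> Optional[str]:
    """Newest ImageId from a describe-images "Images" list, or None if nothing matched."""
    if not images:
        return None
    # CreationDate is an ISO-8601 UTC string, so lexical order is chronological order.
    return max(images, key=lambda image: image["CreationDate"])["ImageId"]


def update_parameters(tag: str, ami_id: Optional[str], existing_keys: Iterable[str]) -> List[dict]:
    """Parameters for update-stack: new S3 keys, new AMI if any, previous values otherwise."""
    keys = list(existing_keys)
    missing = set(S3_KEY_PARAMS) - set(keys)
    if missing:
        raise ValueError(f"stack is missing expected parameters: {sorted(missing)}")
    params = []
    for key in keys:
        if key in S3_KEY_PARAMS:
            params.append({"ParameterKey": key, "ParameterValue": f"{tag}/{S3_KEY_PARAMS[key]}.zip"})
        elif key == "DefaultAMI" and ami_id:
            params.append({"ParameterKey": key, "ParameterValue": ami_id})
        else:
            # Everything else, including DefaultAMI when no new image was built, is carried over.
            params.append({"ParameterKey": key, "UsePreviousValue": True})
    return params


def functions_not_updated(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Functions whose CodeSha256 is unchanged (or unreadable) after the update."""
    return sorted(name for name, sha in before.items() if after.get(name) in (None, sha))
```

If the job were written with boto3 (an assumption; the issue describes the AWS CLI), the calls would map as follows:

- existing_keys: the ParameterKey values from describe_stacks(StackName="jit-runners")["Stacks"][0]["Parameters"].
- Update: the list would be passed to update_stack(StackName="jit-runners", UsePreviousTemplate=True, Capabilities=["CAPABILITY_NAMED_IAM"], Parameters=...), followed by get_waiter("stack_update_complete").wait(StackName="jit-runners").
- SHAs: each comes from the Lambda client's get_function(FunctionName=...)["Configuration"]["CodeSha256"].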

The OIDC role for this job needs:

  • cloudformation:UpdateStack, cloudformation:DescribeStacks, cloudformation:DescribeStackEvents on the jit-runners stack ARN.
  • lambda:GetFunction on the three function ARNs (for SHA capture/verification).
  • s3:GetObject on arn:aws:s3:::jit-runners-lambda-s3/v*/* (for the optional per-key existence pre-check before submitting update-stack, avoiding wasted CFN cycles). There is no separate s3:HeadObject IAM action; HEAD requests on an object are authorized by s3:GetObject.
  • ec2:DescribeImages (read-only, no resource scope available) for AMI resolution.
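  • iam:PassRole on the existing Lambda execution role ARN (required when CloudFormation re-passes it during a Lambda code update).

Unless update-stack is called with a CloudFormation service role (--role-arn), CloudFormation performs each resource change with the caller's credentials. In that case this role would also need the write permissions for every resource the update touches, for example lambda:UpdateFunctionCode on the three functions.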
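
The permission list above could be rendered as a policy document along these lines. This is a sketch, not the final policy. The account ID, Lambda function names and execution role ARN are placeholders that would have to be read from the live stack. The stack ARN ends in /* because CloudFormation appends a generated stack ID to it.

```python
from typing import Iterable


def deploy_policy(
    account_id: str,
    function_names: Iterable[str],
    lambda_exec_role_arn: str,
    region: str = "us-east-2",
    stack_name: str = "jit-runners",
    bucket: str = "jit-runners-lambda-s3",
) -> dict:
    """Identity policy for the deploy-stack.yml OIDC role, mirroring the list above."""
    # The trailing /* stands in for the stack ID that CloudFormation appends to stack ARNs.
    stack_arn = f"arn:aws:cloudformation:{region}:{account_id}:stack/{stack_name}/*"
    function_arns = [f"arn:aws:lambda:{region}:{account_id}:function:{name}" for name in function_names]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "UpdateStack", "Effect": "Allow",
             "Action": ["cloudformation:UpdateStack", "cloudformation:DescribeStacks",
                        "cloudformation:DescribeStackEvents"],
             "Resource": stack_arn},
            {"Sid": "ReadFunctions", "Effect": "Allow", "Action": "lambda:GetFunction",
             "Resource": function_arns},
            # s3:GetObject also authorizes HEAD requests; there is no s3:HeadObject action.
            {"Sid": "ReadArtifacts", "Effect": "Allow", "Action": "s3:GetObject",
             "Resource": f"arn:aws:s3:::{bucket}/v*/*"},
            # DescribeImages does not support resource-level permissions.
            {"Sid": "FindAmi", "Effect": "Allow", "Action": "ec2:DescribeImages", "Resource": "*"},
            {"Sid": "PassLambdaRole", "Effect": "Allow", "Action": "iam:PassRole",
             "Resource": lambda_exec_role_arn},
            # Without a CloudFormation service role, resource writes such as
            # lambda:UpdateFunctionCode also run as this role and would need adding here.
        ],
    }
```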

Documentation

Update docs/release.md to become the failure-mode runbook (manual recovery if any of the above breaks). Keep the rollback procedure intact: a manual update-stack against the previous version's S3 keys is still the right escape hatch.

Reference

  • auto-tag.yml from kaio6fellipe/event-driven-bookinfo is the closest existing pattern in this org (single-repo variant — no service scoping needed).
  • For OIDC patterns and IAM scoping, look at how the existing ami-build.yml workflow already assumes AMI_BUILD_ROLE_ARN for AWS access.

Who is this feature for?

Maintainers of jit-runners and operators of the live jit-runners CloudFormation stack. The current manual flow is fine for an occasional release, but it doesn't scale, requires AWS credentials handled by a human, and accumulates risk every time a step is skipped or done in the wrong order. Automating the loop turns "merge a PR" into "production has the new code + AMI" with no out-of-band steps.
