
docs: mark ssm-deploys-retire-ssh as shipped #173

Open: prog-strength-developer[bot] wants to merge 1 commit into main from feat/ssm-deploys-retire-ssh

Shipped: ssm-deploys-retire-ssh

Production deploys no longer go over SSH. The old path relied on a standing, long-lived key that granted an interactive shell as ubuntu (effectively root) and required port 22 to be open to the world. Deploys now run through AWS SSM Run Command, authenticated by the existing OIDC role. App secrets live in infra-owned AWS Secrets Manager (seeded from GitHub by a separate workflow, never during a deploy), and inbound port 22 is closed.

SOW: sows/ssm-deploys-retire-ssh.md

Implementation PRs

  • prog-strength-infra#46: Secrets Manager containers (prog-strength-backend/prod/{api,mcp,agent}, values never in TF state) + instance-role GetSecretValue & AmazonSSMManagedInstanceCore + OIDC create/seed grants; on-host deploy/*.sh scripts that render .env from Secrets Manager; seed-secrets.yml; SSM-based deploy-caddy.yml; SSH (port 22) ingress removed; jq + SSM agent in bootstrap.
  • prog-strength-api#64: release.yml + manual-deploy.yml deploy jobs → OIDC + aws ssm send-command invoking deploy/api.sh; dropped the 16-secret envs: forwarding; DEPLOYMENT.md scrubbed of EC2_HOST/EC2_SSH_KEY and host layout corrected.
  • prog-strength-mcp#14: release.yml + manual-deploy.yml → SSM Run Command invoking deploy/mcp.sh; README secrets table removed (no repo-level deploy secret needed).
  • prog-strength-agent#21: release.yml + manual-deploy.yml → SSM Run Command invoking deploy/agent.sh; dropped the 4-secret envs: forwarding; README deploy note added.
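As an illustration only, here is a hedged sketch of how a service workflow might assemble its `aws ssm send-command` call. It assumes the stock `AWS-RunShellScript` document and targets the host by its Name tag (prog-strength-prod-backend). The on-host path `/opt/prog-strength-infra/deploy` and the helper name `send_command_argv` are hypothetical. This PR does not say where the infra checkout lives on the instance.

```python
import json
import shlex
from typing import List

# Hypothetical: where the infra repo's deploy/*.sh scripts live on the host.
DEPLOY_DIR = "/opt/prog-strength-infra/deploy"
# Name tag of the production instance, as used in the verification steps.
TARGET_TAG = "prog-strength-prod-backend"


def send_command_argv(service: str, deploy_dir: str = DEPLOY_DIR) -> List[str]:
    """Build argv for `aws ssm send-command` that runs deploy/<service>.sh."""
    if service not in {"api", "mcp", "agent"}:
        raise ValueError(f"unknown service: {service}")
    script = f"{deploy_dir}/{service}.sh"
    # AWS-RunShellScript takes a list of shell commands. Passing the
    # parameters as JSON avoids the CLI shorthand-syntax quoting pitfalls.
    params = json.dumps({"commands": [f"bash {shlex.quote(script)}"]})
    return [
        "aws", "ssm", "send-command",
        "--document-name", "AWS-RunShellScript",
        "--targets", f"Key=tag:Name,Values={TARGET_TAG}",
        "--parameters", params,
        "--comment", f"deploy {service}",
        # Print only the command ID so the caller can wait on it.
        "--query", "Command.CommandId",
        "--output", "text",
    ]
```

The workflow can pass the returned command ID to `aws ssm wait command-executed --command-id <id> --instance-id <instance-id>` so that a failed script fails the job. The CLI waiter only polls for a bounded time, so a long deploy may need a `get-command-invocation` polling loop instead.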

Deployment

  1. prog-strength-infra#46: merge it and let apply.yml run first. This creates the Secrets Manager containers, grants the instance role GetSecretValue plus SSM managed-node registration, grants the OIDC role create/seed, lands the deploy/*.sh scripts on the next infra pull, and closes port 22. Then run seed-secrets.yml to populate the blobs from the current GitHub secrets, and confirm the host shows up as a managed node (aws ssm describe-instance-information). The live host predates the bootstrap change, so on it run apt-get install -y jq and confirm the SSM agent is registered. Infra merges first because every service repo's SSM deploy assumes the IAM grants and managed-node registration it provides, and would fail until the scripts exist on the host and the secrets are seeded (no script to invoke, no secrets to read).
  2. prog-strength-api#64, prog-strength-mcp#14, and prog-strength-agent#21 can merge in parallel: they don't depend on each other, only on infra being deployed and seeded. Each repo's next release (or a Manual Deploy workflow_dispatch) then runs over SSM. Do not merge them ahead of step 1: until infra deploys, their aws ssm send-command runs fail (no script, no secret, or role not yet granted).
  3. After all seven workflows (release.yml and manual-deploy.yml in each of the three service repos, plus deploy-caddy.yml) have a green SSM run and port 22 is confirmed closed without breakage, delete the EC2_SSH_KEY and EC2_HOST org secrets in GitHub settings (out of band of these PRs). The app-config GitHub secrets stay, since they seed Secrets Manager.
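This PR describes seed-secrets.yml only as populating "the blobs from the current GitHub secrets." One minimal way to do that is sketched below. It assumes each Secrets Manager container holds a flat JSON object and that the workflow exposes the relevant GitHub secrets as environment variables. Under those assumptions, the sketch builds the blob and refuses to seed if any key is empty. The helper names `secret_id` and `build_secret_string` are hypothetical, and the real key lists are not enumerated in this PR.

```python
import json
import os
from typing import Iterable, Mapping

# Container naming from the infra PR: prog-strength-backend/prod/{api,mcp,agent}.
SECRET_PREFIX = "prog-strength-backend/prod"


def secret_id(service: str) -> str:
    """Secrets Manager ID for one service's container."""
    return f"{SECRET_PREFIX}/{service}"


def build_secret_string(keys: Iterable[str], env: Mapping[str, str] = os.environ) -> str:
    """Collect the named keys from env into a flat JSON object.

    Raises KeyError if any key is missing or empty, so a half-populated
    blob is never written to Secrets Manager.
    """
    keys = list(keys)
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise KeyError(f"missing or empty secrets: {missing}")
    # sort_keys keeps the blob deterministic across seeding runs.
    return json.dumps({k: env[k] for k in keys}, sort_keys=True)
```

The resulting string would then be written with `aws secretsmanager put-secret-value --secret-id prog-strength-backend/prod/api --secret-string "$BLOB"`. Because the check fails closed, a renamed or deleted GitHub secret surfaces at seed time rather than as a broken `.env` on the next deploy.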

Verify Session Manager break-glass (aws ssm start-session) works before relying on the closed port — that is the safety gate against locking yourself out.
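Session Manager can only open a shell on a node that is registered with SSM and reporting in. A quick precheck before trying `aws ssm start-session` is therefore to confirm that the instance is listed as `Online` in the JSON output of `aws ssm describe-instance-information`. Below is a small sketch of that check. It assumes the command is run with `--output json`, and the helper name `node_online` is hypothetical.

```python
import json


def node_online(describe_output: str, instance_id: str) -> bool:
    """True if describe-instance-information output lists instance_id as Online.

    PingStatus can also be ConnectionLost or Inactive. Anything other than
    Online, or an instance missing from the list, means break-glass is not
    ready and port 22 should not be relied on as closed-but-safe.
    """
    data = json.loads(describe_output)
    for node in data.get("InstanceInformationList", []):
        if node.get("InstanceId") == instance_id:
            return node.get("PingStatus") == "Online"
    return False
```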

Verification after rollout

  • Host appears as a managed node: aws ssm describe-instance-information lists the instance (tag Name=prog-strength-prod-backend).
  • seed-secrets.yml run is green; aws secretsmanager get-secret-value --secret-id prog-strength-backend/prod/api returns the expected keys, and the jq render produces a .env byte-equivalent to the previous SSH-written one.
  • Convert-one-first: trigger agent Manual Deploy over SSM and confirm a green deploy with the service healthy and .env correct, before relying on the rest.
  • aws ssm start-session --target <instance-id> opens a shell (break-glass verified) before port 22 is closed.
  • With 22 closed: https://api.progstrength.fitness/health returns 200 (80/443 still serve) and an SSM deploy still succeeds.
  • deploy-caddy.yml (workflow_dispatch) reloads Caddy green and the Let's Encrypt certs persist.
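The check that the jq render produces a `.env` byte-equivalent to the old one can be done mechanically. The sketch below stands in for the render done by the deploy/*.sh scripts. It assumes the secret is a flat JSON object of string values and that `.env` is plain `KEY=value` lines. Sorting keys is an assumption: a jq render may emit keys in the JSON object's own order instead. So when the byte comparison fails, `env_diff` separates real value changes from mere reordering. Both function names are hypothetical.

```python
import json
from typing import Dict, List


def render_env(secret_string: str) -> bytes:
    """Render a flat JSON secret into KEY=value lines, sorted by key."""
    data = json.loads(secret_string)
    lines = []
    for key in sorted(data):
        value = data[key]
        if not isinstance(value, str):
            raise TypeError(f"{key}: expected a string value")
        if "\n" in value:
            raise ValueError(f"{key}: multi-line values need quoting, not handled here")
        lines.append(f"{key}={value}")
    return ("\n".join(lines) + "\n").encode() if lines else b""


def _parse_env(raw: bytes) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and comments; split on the first '='."""
    pairs = {}
    for line in raw.decode().splitlines():
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            pairs[key] = value
    return pairs


def env_diff(old: bytes, new: bytes) -> List[str]:
    """Keys whose values differ, or that exist on only one side."""
    a, b = _parse_env(old), _parse_env(new)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Used together: if `old_bytes == render_env(secret)` the render is byte-equivalent. Otherwise an empty `env_diff` result means only ordering or formatting changed, while a non-empty result names the keys to investigate.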

Merging this PR flips ssm-deploys-retire-ssh to status: shipped in
prog-strength-docs/sows/ssm-deploys-retire-ssh.md — that is the canonical
signal the work is complete.
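If the SOW file keeps its status in a leading YAML front-matter block, a reviewer or CI step could confirm that the flip landed. That layout is an assumption, because this PR only says the SOW moves to "status: shipped". Below is a minimal sketch using only the standard library, with a hypothetical helper name `sow_status`.

```python
import re
from typing import Optional

# Matches a front-matter block delimited by '---' lines at the very top of the file.
_FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def sow_status(markdown: str) -> Optional[str]:
    """Return the `status:` value from leading front matter, or None if absent."""
    match = _FRONT_MATTER.match(markdown)
    if not match:
        return None
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip()
    return None
```

Only the front matter is inspected, so a `status:` line in the SOW body is ignored.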

🤖 Generated with Claude Code
