
ci(os-49): migrate VM release builds to shared runners #1177

Closed
jtoelke2 wants to merge 5 commits into main from
os-49-vm-build-package-migration/jt

Conversation

@jtoelke2
Collaborator

@jtoelke2 jtoelke2 commented May 5, 2026

Summary

  • Move VM release build/package workflows from old ARC runner labels to shared CPU runner labels.
  • Remove remaining VM release SCCACHE_MEMCACHED_ENDPOINT usage.
  • Force local Buildx for VM macOS Docker builds and document that VFIO GPU passthrough remains dedicated-host validation.
  • Add a manual release_tag input, defaulting to vm-dev, so the real release upload/prune path can be validated against a scratch release tag without mutating production vm-dev.
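
As a rough illustration of the release_tag behavior described in the last bullet (not the actual workflow code), the tag resolution could be sketched in shell as follows. The function names resolve_release_tag and is_scratch_tag are hypothetical and do not appear in the workflows; only the vm-dev default and the vm-dev-os49-1177 scratch tag come from this PR.

```shell
# Hypothetical helper: pick the release tag for a manual run.
# An empty or missing input falls back to the production tag "vm-dev".
resolve_release_tag() {
  requested="${1:-}"
  printf '%s\n' "${requested:-vm-dev}"
}

# Hypothetical guard: succeeds only when the tag is NOT production "vm-dev",
# i.e. when upload/prune steps would touch a scratch release.
is_scratch_tag() {
  [ "$1" != "vm-dev" ]
}

# Example usage (assumed wiring, for illustration only):
#   tag="$(resolve_release_tag "vm-dev-os49-1177")"
#   if is_scratch_tag "$tag"; then echo "scratch run against $tag"; fi
```

Keeping the default and the scratch check in one place would make it harder for an individual step to fall back to a hard-coded vm-dev by accident.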

Related Issue

  • OS-49
  • OS-131

Changes

  • release-vm-dev.yml: uses linux-amd64-cpu8 / linux-arm64-cpu8, local Buildx for VM macOS Docker builds, no memcached sccache env, configurable manual release tag, and scratch-safe Cargo version handling.
  • release-vm-kernel.yml: uses shared CPU labels for Linux runtime and release upload jobs, uploads the Linux ARM64 kernel bundle needed by the macOS runtime build, and supports a configurable manual release tag.
  • Architecture docs now distinguish VM build/package migration from actual VFIO GPU passthrough validation.

Testing

  • ruby -e 'require "yaml"; ARGV.each { |f| YAML.load_file(f); puts "OK #{f}" }' .github/workflows/release-vm-dev.yml .github/workflows/release-vm-kernel.yml
  • rg -n "gh release download vm-dev|tag_name: vm-dev|git tag -fa vm-dev|git push --force origin vm-dev|tag: 'vm-dev'|tag: \"vm-dev\"" .github/workflows/release-vm-dev.yml .github/workflows/release-vm-kernel.yml
  • git diff --check
  • Scratch Release VM Kernel with release_tag=vm-dev-os49-1177: success, https://github.com/NVIDIA/OpenShell/actions/runs/25405053839
  • Scratch Release VM Dev with release_tag=vm-dev-os49-1177: success, https://github.com/NVIDIA/OpenShell/actions/runs/25415223478
  • Scratch release vm-dev-os49-1177 contains 11 expected assets: Linux/macOS VM binaries, Linux/macOS driver VM binaries, Linux/macOS runtime tarballs, and checksum files.
  • mise run pre-commit failed on existing unrelated Rust unit test ssh::tests::launch_editor_returns_friendly_error_when_binary_missing; it also reported markdownlint failures in untracked .codex-learning/*.md files that are not part of this PR. Rust check, fmt, clippy, Python checks/tests, Helm lint, license check, and Mermaid lint completed before that failure.
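
The hard-coded-tag search in the second bullet could also be wrapped as a pass/fail check. This is a minimal sketch, assuming plain grep -E instead of rg so it runs without extra tools; check_no_hardcoded_tag is a hypothetical name, and the pattern simply mirrors the alternatives listed in the rg command above.

```shell
# Hypothetical check: fail if any given file still hard-codes the vm-dev tag
# in one of the release-mutating forms searched for in this PR.
# Pass at least one file; with no arguments grep would read stdin.
check_no_hardcoded_tag() {
  pattern='gh release download vm-dev|tag_name: vm-dev|git tag -fa vm-dev|git push --force origin vm-dev|tag: ["'\'']vm-dev["'\'']'
  if grep -nE "$pattern" "$@"; then
    # grep printed the offending lines; report and signal failure.
    echo "hard-coded vm-dev reference found" >&2
    return 1
  fi
  return 0
}

# Example usage (paths taken from this PR):
#   check_no_hardcoded_tag .github/workflows/release-vm-dev.yml \
#                          .github/workflows/release-vm-kernel.yml
```

A nonzero exit here means a workflow step would still touch production vm-dev regardless of the release_tag input.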

Checklist

  • Commits are signed and signed off
  • Architecture docs updated
  • Scratch release validation before touching production vm-dev

@copy-pr-bot

copy-pr-bot Bot commented May 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

drew previously approved these changes May 5, 2026
@jtoelke2 jtoelke2 force-pushed the os-49-vm-build-package-migration/jt branch from 080e5ea to d214bd3 Compare May 6, 2026 13:53
@jtoelke2 jtoelke2 marked this pull request as ready for review May 6, 2026 14:09
@jtoelke2
Collaborator Author

jtoelke2 commented May 6, 2026

Rebased on latest main (f17806ca) and resolved the only conflict in architecture/ci-e2e.md by keeping the new mise-lockfile note from main plus the #1177 VM build/package migration wording.

Post-rebase validation:

  • Workflow YAML parse passed for release-vm-dev.yml and release-vm-kernel.yml.
  • git diff --check origin/main...HEAD passed.
  • Targeted grep found no conflict markers or old build-amd64 / build-arm64 / EKS BuildKit / VM memcached references in the touched workflow/doc paths.
  • mise run pre-commit was rerun and still fails only on unrelated local issues: untracked .codex-learning/*.md markdownlint findings plus existing Rust unit test ssh::tests::launch_editor_returns_friendly_error_when_binary_missing.

Scratch release validation from before the rebase still applies to the same workflow changes.

The PR is ready for review again. It still intentionally excludes VFIO passthrough validation.

@jtoelke2 jtoelke2 requested a review from drew May 6, 2026 14:09
@jtoelke2
Collaborator Author

jtoelke2 commented May 6, 2026

Closing as superseded by #1186, #1195, and #1210. #1186 moved VM driver publishing into the normal dev/tag release path, #1195 removed remaining EKS-specific assumptions from VM/release-adjacent workflows, and #1210 handled the Release Canary bootstrap issue seen afterward. The scratch validation evidence remains captured in OS-49.

@jtoelke2 jtoelke2 closed this May 6, 2026
