feat: archive build/deploy logs to MinIO for post-eviction retrieval #119

Open
vigneshrajsb wants to merge 4 commits into main from feat/archive-build-logs

@vigneshrajsb
Contributor

Problem

Build and deploy job logs are permanently lost once k8s Job pods are evicted or TTL-expired (~24h):

  • Job history is fetched entirely from live k8s (getNativeBuildJobs, getDeploymentJobs)
  • Logs are streamed from live pods via WebSocket
  • When pods disappear, the UI renders a broken NotFound state with no recovery path

Solution

Add MinIO as an optional in-cluster S3-compatible object store. Logs are archived at job completion time and served back transparently — the UI sees a new Archived status instead of NotFound.

Architecture

Job completes → archive logs.txt + metadata.json to MinIO
                 └── {namespace}/{jobType}/{serviceName}/{jobName}/

Pod evicted after TTL...

UI requests log stream info → backend returns status='Archived' + archivedLogs text
UI requests job list       → archived jobs merged into live k8s results (deduplicated by jobName)
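
The key layout in the diagram above can be sketched as a small helper. This is only an illustration, not the code in logArchival.ts: the names `ArchiveLocation`, `archivePrefix`, and `archiveKeys` are hypothetical, and the `jobType` values `build` and `deploy` are an assumption, since the PR does not list them.

```typescript
// Hypothetical sketch of the object-key layout:
//   {namespace}/{jobType}/{serviceName}/{jobName}/logs.txt
//   {namespace}/{jobType}/{serviceName}/{jobName}/metadata.json

// Assumed values; the PR only names the {jobType} path segment.
type JobType = "build" | "deploy";

interface ArchiveLocation {
  namespace: string;
  jobType: JobType;
  serviceName: string;
  jobName: string;
}

// Builds the per-job prefix, including the trailing slash.
function archivePrefix(loc: ArchiveLocation): string {
  return [loc.namespace, loc.jobType, loc.serviceName, loc.jobName].join("/") + "/";
}

// Returns the two object keys written for each archived job.
function archiveKeys(loc: ArchiveLocation): { logs: string; metadata: string } {
  const prefix = archivePrefix(loc);
  return { logs: `${prefix}logs.txt`, metadata: `${prefix}metadata.json` };
}
```

Keeping the job name as the last path segment means a prefix listing on `{namespace}/{jobType}/{serviceName}/` would be enough to enumerate one service's archived jobs, which is presumably what listArchivedJobs relies on.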

Changes

New files

| File | Purpose |
| --- | --- |
| src/server/lib/objectStore/s3Client.ts | MinIO client singleton (env-configured) |
| src/server/services/logArchival.ts | LogArchivalService: archiveLogs, getArchivedLogs, listArchivedJobs, ensureBucket, configureRetention |
| src/server/services/types/logArchival.ts | ArchivedJobMetadata interface |

Modified files

| File | Change |
| --- | --- |
| src/shared/config.ts | Export MINIO_* env vars with safe defaults |
| next.config.js | Add MinIO vars to serverRuntimeConfig |
| src/server/services/types/globalConfig.ts | Add logArchival?: { enabled, retentionDays } |
| src/server/services/types/logStreaming.ts | Add 'Archived' status; add archivedLogs? field |
| src/server/lib/nativeBuild/engines.ts | Archive build logs after job completes (success + failure paths) |
| src/server/lib/nativeHelm/helm.ts | Archive deploy logs after job completes |
| src/server/lib/kubernetes/getNativeBuildJobs.ts | Merge archived build jobs; add source field to BuildJobInfo |
| src/server/lib/kubernetes/getDeploymentJobs.ts | Merge archived deploy jobs; add source field to DeploymentJobInfo |
| src/server/services/logStreaming.ts | Fall back to archived log lookup when k8s returns NotFound |
| helm/web-app/Chart.yaml | Add MinIO subchart dependency (disabled by default) |
| helm/environments/local/lifecycle.yaml | Add minio: config section |
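
The logStreaming.ts fallback listed above could look roughly like the sketch below. It is an assumption-laden illustration rather than the actual change: `withArchiveFallback` and `ArchiveLookup` are hypothetical names, and apart from 'NotFound' and 'Archived' the members of the status union are guesses.

```typescript
// Only 'NotFound' and 'Archived' come from the PR; 'Active' is a placeholder
// for whatever live statuses the real union contains.
type StreamStatus = "Active" | "NotFound" | "Archived";

interface LogStreamInfo {
  status: StreamStatus;
  archivedLogs?: string;
}

// Hypothetical lookup: resolves to the archived log text, or null if none exists.
type ArchiveLookup = (jobName: string) => Promise<string | null>;

// If the live lookup says NotFound, try the archive before giving up.
async function withArchiveFallback(
  live: LogStreamInfo,
  jobName: string,
  archivalEnabled: boolean,
  lookup: ArchiveLookup
): Promise<LogStreamInfo> {
  // Live pods win, and nothing touches the object store while the flag is off.
  if (live.status !== "NotFound" || !archivalEnabled) {
    return live;
  }
  try {
    const logs = await lookup(jobName);
    return logs === null ? live : { status: "Archived", archivedLogs: logs };
  } catch {
    // An unreachable object store should degrade to the old NotFound behavior.
    return live;
  }
}
```

Returning the original NotFound result on any archive error keeps the pre-existing behavior as the worst case.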

Key design decisions

Feature-gated: all MinIO calls check globalConfig.logArchival?.enabled. Enabling the infra (MinIO pod) is safe — nothing archives until the flag is set in DB.

Non-blocking: archival failures are caught and logged as warnings — they never fail the build/deploy flow.
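
The two properties above (feature-gated and non-blocking) can be combined in one wrapper. The following is a minimal sketch under assumptions: `archiveIfEnabled` is a hypothetical name, the `archive` callback stands in for the real archiveLogs call, and the `GlobalConfig` shape is reduced to the one field this PR adds.

```typescript
interface LogArchivalConfig {
  enabled: boolean;
  retentionDays: number;
}

// Reduced to the field added in this PR; the real GlobalConfig has more.
interface GlobalConfig {
  logArchival?: LogArchivalConfig;
}

// Runs `archive` only when the flag is set, and never lets a failure escape.
// Returns true if the archive call ran and succeeded.
async function archiveIfEnabled(
  config: GlobalConfig,
  archive: () => Promise<void>,
  warn: (message: string) => void = console.warn
): Promise<boolean> {
  // Feature gate: with the flag unset or false, no object-store call is made.
  if (!config.logArchival?.enabled) {
    return false;
  }
  try {
    await archive();
    return true;
  } catch (err) {
    // Non-blocking: log a warning and let the build/deploy flow continue.
    const reason = err instanceof Error ? err.message : String(err);
    warn(`log archival failed: ${reason}`);
    return false;
  }
}
```

A caller in the build or deploy path would await this after the job finishes and ignore the boolean, so the job result is reported the same way whether or not archival worked.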

Deduplication: merged archived jobs are deduplicated by jobName against live k8s results, so a completing job never appears twice.
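
The deduplication rule can be illustrated with a short merge function. This is a sketch, not the code in getNativeBuildJobs.ts or getDeploymentJobs.ts: `JobInfo` is a hypothetical stand-in for BuildJobInfo and DeploymentJobInfo, and `mergeJobs` is an invented name.

```typescript
type JobSource = "live" | "archived";

// Hypothetical stand-in for BuildJobInfo / DeploymentJobInfo; only the two
// fields the merge needs are modeled.
interface JobInfo {
  jobName: string;
  source?: JobSource;
}

// Merges archived jobs into the live list, keeping the live entry whenever
// both lists contain the same jobName.
function mergeJobs(live: JobInfo[], archived: JobInfo[]): JobInfo[] {
  const seen = new Set(live.map((job) => job.jobName));
  const merged: JobInfo[] = live.map((job): JobInfo => ({ ...job, source: "live" }));

  for (const job of archived) {
    if (seen.has(job.jobName)) {
      continue; // already present from live k8s (or an earlier archive entry)
    }
    seen.add(job.jobName);
    merged.push({ ...job, source: "archived" });
  }
  return merged;
}
```

Letting the live entry win means a job that has just been archived but whose pod still exists keeps its live status and WebSocket log stream.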

Enabling

  1. Deploy MinIO via the companion helm-charts PR
  2. Insert into global_config:
     { "logArchival": { "enabled": true, "retentionDays": 14 } }

Related PRs

Test plan

  • pnpm lint passes ✅
  • pnpm ts-check — no new errors (3 pre-existing in engines.ts) ✅
  • pnpm test — 951/951 pass ✅
  • With logArchival.enabled=false (default): system behavior identical to before, no MinIO calls
  • With logArchival.enabled=true: trigger a build, verify logs.txt + metadata.json appear in MinIO bucket
  • Delete the job pod manually, verify it still appears in the build job list with source='archived'
  • Click the archived job in the UI — logs render via staticContent (not WebSocket)

🤖 Generated with Claude Code

vigneshrajsb and others added 3 commits March 1, 2026 14:38
Adds a pino formatters.level option so logs include string severity
labels (e.g. "level":"info") rather than numeric codes (e.g. "level":30).
This fixes log severity mapping in Groundcover.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Build and deploy job logs are permanently lost once k8s Job pods are
evicted or TTL-expired (~24h). This adds MinIO as an optional in-cluster
object store to archive logs at completion time, serving them back to the
UI even after the live pods are gone.

## New files
- src/server/lib/objectStore/s3Client.ts
  MinIO client singleton configured via MINIO_* env vars

- src/server/services/logArchival.ts
  LogArchivalService with archiveLogs, getArchivedLogs, getArchivedMetadata,
  listArchivedJobs, ensureBucket, configureRetention

- src/server/services/types/logArchival.ts
  ArchivedJobMetadata interface

## Modified files
- src/shared/config.ts / next.config.js
  Export MINIO_ENDPOINT, MINIO_PORT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY,
  MINIO_BUCKET, MINIO_USE_SSL (all with safe defaults)

- src/server/services/types/globalConfig.ts
  Add logArchival?: { enabled: boolean; retentionDays: number } to GlobalConfig

- src/server/services/types/logStreaming.ts
  Add 'Archived' to status union; add archivedLogs?: string field

- src/server/lib/nativeBuild/engines.ts
  After waitForJobAndGetLogs(), archive logs when logArchival.enabled=true
  Both success and error paths are covered

- src/server/lib/nativeHelm/helm.ts
  Same pattern for native Helm deploy jobs

- src/server/lib/kubernetes/getNativeBuildJobs.ts
  Merge archived build jobs (not present in live k8s) into the listing
  Add source?: 'live' | 'archived' field to BuildJobInfo

- src/server/lib/kubernetes/getDeploymentJobs.ts
  Same for deploy jobs / DeploymentJobInfo

- src/server/services/logStreaming.ts
  When k8s returns NotFound, attempt archived log lookup before returning
  NotFound. Returns status='Archived' with archivedLogs when found.

- helm/web-app/Chart.yaml + helm/environments/local/lifecycle.yaml
  Add minio subchart dependency (disabled by default in local values)

## Storage schema
  lifecycle-logs/
    {namespace}/{jobType}/{serviceName}/{jobName}/
      logs.txt       - full log content
      metadata.json  - job info (status, duration, sha, engine, timestamps)

## Enabling
All archival ops are gated on globalConfig.logArchival.enabled.
Insert into global_config to activate:
  { "logArchival": { "enabled": true, "retentionDays": 14 } }

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vigneshrajsb requested a review from a team as a code owner on March 1, 2026 23:14

- Fix JobMonitor log ordering: wait for job completion before fetching
  logs so the full output is captured rather than a mid-run snapshot
- Add startedAt/completedAt/duration to JobMonitor.getJobStatus via
  kubectl job JSON, thread timing through engines.ts and helm.ts so
  archived metadata has accurate timestamps
- Upgrade live k8s jobs with no pod to source='archived' when an
  archive exists in MinIO, so they remain selectable in the UI
- Extend logStreaming archived fallback to also trigger when the k8s
  job exists but its pod has been cleaned up (!podInfo.podName)
- Add source field to NativeBuildJobInfo OpenAPI schema
- Add MinIO helm_resource to Tiltfile; remove erroneous minio subchart
  dependency from helm/web-app/Chart.yaml
- Add ALLOWED_ORIGINS to local lifecycle.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>