feat: add hub-worker binary and refactor config to cleanenv#50
feat: add hub-worker binary and refactor config to cleanenv#50
Conversation
WalkthroughThis pull request refactors the configuration system from flat environment variables to a nested, strongly-typed structure using Cleanenv, introducing new configuration groups for server, database, River, webhook, embedding, and observability settings. Concurrently, the codebase is restructured to support a separate worker executable alongside the existing API server. Both binaries are built during the compilation phase and deployed to the application. The River-based asynchronous job processing, previously integrated into the API, is now centralized in a dedicated worker application. Configuration references throughout the codebase are updated to use the new nested paths, and build tooling is adjusted to compile and execute both executables independently. 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Dockerfile (1)
38-53: 🧹 Nitpick | 🔵 TrivialConsider adding a HEALTHCHECK instruction for container orchestration.
The runtime stage correctly copies both binaries and sets the default entrypoint to
hub-api. For improved container health monitoring in orchestration environments (Kubernetes, Docker Swarm, ECS), consider adding aHEALTHCHECKinstruction.💡 Optional: Add HEALTHCHECK for container orchestration
EXPOSE 8080 +# Health check for container orchestration (adjust interval/timeout as needed) +HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \ + CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1 + # Default: run hub-api. Override with command to run hub-worker: docker run ... hub-worker ENTRYPOINT ["/app/hub-api"]Note: This health check is primarily useful for
hub-api. If runninghub-worker, you may want to override or disable the health check since workers typically don't expose HTTP endpoints.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 38 - 53, Add a Docker HEALTHCHECK for the runtime image that probes the hub-api HTTP endpoint (e.g., localhost:8080/health or /healthz) so orchestrators can mark the container healthy; place the HEALTHCHECK after EXPOSE and before or after ENTRYPOINT, configure reasonable interval/timeout/start-period/retries, and document that the check should be disabled or overridden when running the hub-worker binary (ENTRYPOINT currently set to /app/hub-api) since workers do not expose the HTTP health endpoint.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Line 11: Add a single blank line immediately before the heading "## Build,
Test, and Development Commands" in AGENTS.md to satisfy MD022 (headings should
be surrounded by blank lines); simply insert one empty line above that heading
so it is separated from the preceding content.
In `@cmd/api/app.go`:
- Around line 172-192: The API duplicates worker and queue registrations that
hub-worker centralizes in NewRiverWorkersAndQueues; change
NewRiverWorkersAndQueues to accept a placeholderMaxWorkers int (or a config
struct) and return (river.Workers, map[string]river.QueueConfig), move
registrations for webhookSender/webhookWorker, EmbeddingsQueueName, etc. into
that shared function, then replace the duplicated block that creates apiWorkers,
webhookSender, webhookWorker and the queues map with a call to
NewRiverWorkersAndQueues(placeholderMaxWorkers) (use 1 for the API). Ensure the
new signature covers the webhook sender/worker and embedding queue logic so
apiWorkers and queues are populated from the shared helper and remove the
duplicated code in cmd/api/app.go.
In `@cmd/backfill-embeddings/main.go`:
- Around line 49-60: The cfg.Database.URL default from config.Load() makes the
DATABASE_URL check ineffective; detect and reject the test-only default in the
runtime binary by validating cfg.Database.URL after Load() and exiting on the
placeholder test URL. Update main.go to check the exact test-default value
returned by Load() (e.g., the in-memory/local test URL string) or call a config
helper that distinguishes test vs runtime, and if cfg.Database.URL equals that
test-default (or is still the placeholder), log an explicit error and return
exitFailure so production runs fail fast while keeping the default only for test
setup.
In `@cmd/worker/app.go`:
- Around line 59-75: The code creates a meterProvider but never constructs the
observability.Metrics wrapper or the webhookMetrics and embeddingMetrics, so
initialize the metrics wrapper right after NewMeterProvider succeeds (same place
as meterProvider is set) by building observability.Metrics (the package-level
metrics struct used to derive webhookMetrics and embeddingMetrics), assign
webhookMetrics and embeddingMetrics from it, and ensure you shut down the
metrics wrapper on errors (mirror the existing
observability.ShutdownMeterProvider usage when NewTracerProvider fails); use the
existing symbols meterProvider, tracerProvider, NewMeterProvider,
NewTracerProvider, observability.ShutdownMeterProvider, webhookMetrics and
embeddingMetrics to locate where to add the initialization and cleanup.
In `@cmd/worker/main.go`:
- Around line 27-40: After loading config in run(), validate the database URL
before calling database.NewPostgresPool: check cfg.Database.URL is non-empty and
not the runtime default value (compare to your project’s runtime default
constant or sentinel from the config package), and if it is empty/equals the
runtime default log a clear error via slog.Error and return exitFailure; place
this check immediately after config.Load() and before invoking
database.NewPostgresPool so the binary fails closed when the DB URL hasn’t been
explicitly configured.
In `@internal/config/config.go`:
- Around line 265-280: The validate() function is setting default values for
webhook fields; move those default assignments into applyDefaults() to separate
concerns: remove the blocks that set cfg.Webhook.HTTPTimeout, EnqueueMaxRetries,
EnqueueInitialBackoffMs, and EnqueueMaxBackoffMs from validate(), and add
equivalent defaulting logic in applyDefaults() that ensures
defaultWebhookHTTPTimeoutSec (15s) and the numeric defaults (3, 100, 2000) are
applied when fields are zero/negative; keep validate() focused only on checking
and returning errors for invalid values and reference the same cfg.Webhook field
names (HTTPTimeout, EnqueueMaxRetries, EnqueueInitialBackoffMs,
EnqueueMaxBackoffMs) when relocating the logic.
In `@internal/observability/provider.go`:
- Around line 88-94: Update the function comment for NewTracerProvider so it
references the correct config field path used in the code: change any mention of
cfg.OtelTracesExporter to cfg.Observability.TracesExporter and ensure the
sentence describes that it returns (nil, nil) when cfg is nil or
cfg.Observability.TracesExporter is empty; keep the nolint comment and current
behavior intact.
- Around line 33-39: The doc comment for NewMeterProvider is out of date: it
mentions cfg.OtelMetricsExporter but the function checks
cfg.Observability.MetricsExporter; update the comment to reference
cfg.Observability.MetricsExporter (and keep the same semantics about non-"otlp"
or empty values) so the comment matches the actual check in NewMeterProvider.
In `@Makefile`:
- Around line 112-122: The run-worker Makefile target currently fails if .env is
missing while run allows env vars; update run-worker to be consistent by
removing the hard-fail check or document the difference: either delete the if [
! -f .env ] ... exit 1 guard in the run-worker target so it behaves like run
(relying on environment variables), or add a clarifying comment above run-worker
explaining why DATABASE_URL must be in .env for the worker; refer to the
Makefile targets run-worker and run when making the change.
---
Outside diff comments:
In `@Dockerfile`:
- Around line 38-53: Add a Docker HEALTHCHECK for the runtime image that probes
the hub-api HTTP endpoint (e.g., localhost:8080/health or /healthz) so
orchestrators can mark the container healthy; place the HEALTHCHECK after EXPOSE
and before or after ENTRYPOINT, configure reasonable
interval/timeout/start-period/retries, and document that the check should be
disabled or overridden when running the hub-worker binary (ENTRYPOINT currently
set to /app/hub-api) since workers do not expose the HTTP health endpoint.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 90b954de-7331-4766-aaa6-37cbb11d1aa3
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (19)
.env.example.github/workflows/api-contract-tests.ymlAGENTS.mdDockerfileMakefilecmd/api/app.gocmd/api/main.gocmd/backfill-embeddings/main.gocmd/worker/app.gocmd/worker/main.gogo.modinternal/api/handlers/feedback_records_handler_test.gointernal/api/handlers/search_handler_test.gointernal/config/config.gointernal/config/config_test.gointernal/observability/provider.gointernal/workers/wiring.gotests/helpers.gotests/integration_test.go
- config: keep nested Config (cleanenv), add Embedding.GoogleCloudProject/Location and applyDefaults fallbacks - cmd/api: use service.NormalizeEmbeddingProvider and SupportedEmbeddingProviders; use setupEmbeddingSearchHandler with cfg.Embedding.*; add embeddings queue when embedding enabled - backfill-embeddings: use service.ValidateEmbeddingConfig and NewEmbeddingClient with cfg.Embedding; keep config.Load and DefaultDatabaseURL check; getEmbeddingProviderAndModel(cfg) - config_test: fix EmbeddingGoogleCloud* to cfg.Embedding.GoogleCloud* Made-with: Cursor
Made-with: Cursor
Suggested title:
feat: add hub-worker binary and refactor config to cleanenvWhat does this PR do?
cmd/worker): A separate binary that runs River job workers (webhook delivery, embeddings). The API server (hub-api) is now insert-only: it enqueues jobs but does not run workers; workers run only in hub-worker.apitohub-apiacross Makefile, Dockerfile, and CI (api-contract-tests workflow).env/env-defaulttags,config.Load()with optional.envfile (cleanenv supports.envinReadConfig). Removes manual env parsing and godotenv from entrypoints; backfill-embeddings now usesconfig.Load().DatabaseConfig.PoolConfig()ininternal/configto avoid repeating pool options at everydatabase.NewPostgresPoolcall (used in cmd/api, cmd/worker, backfill-embeddings, tests)..envwhen missing;make runonly starts hub-api. Addsmake run-worker,make build-api,make build-worker;make buildbuilds both binaries.hub-apiandhub-workerin one image; defaultENTRYPOINTruns hub-api; override with command to run hub-worker (e.g.docker run ... hub-worker).RiverConfig(job timeout, rescue stuck jobs, completed job retention, optional client ID) and env varsRIVER_JOB_TIMEOUT_SECONDS,RIVER_RESCUE_STUCK_JOBS_AFTER_SECONDS,RIVER_COMPLETED_JOB_RETENTION_SECONDS,RIVER_CLIENT_ID. Only the worker uses these; API remains insert-only. Documented in.env.example.httptest.NewRequestWithContext; varnamelen:h→handler; lll: line length).How should this be tested?
make build→bin/hub-apiandbin/hub-worker.make run(orgo run ./cmd/api). Ensure.envexists or setAPI_KEY,DATABASE_URL, etc. API should start; no workers run in this process.make run-worker(orgo run ./cmd/worker). RequiresDATABASE_URL; optional embedding/webhook and River env vars (see.env.example). Worker should start and process jobs enqueued by the API..env, setAPI_KEY,DATABASE_URL(and any other required vars) in the shell;make runandmake run-workershould still work (cleanenv falls back toReadEnvwhen.envis missing).docker build .then run the image (default = hub-api). Run worker with:docker run --env-file .env <image> /app/hub-worker.make testsandmake fmt(no lint issues)../bin/hub-api; ensure workflow passes if modified.Checklist
Required
make buildmake tests(integration tests intests/)make fmtandmake lint; no new warningsgit pull origin mainmigrations/with goose annotations and ranmake migrate-validateAppreciated
make testsor API contract workflow)docs/if changes were necessarymake tests-coveragefor meaningful logic changes