fix(build): predictable error handling + transient-read retry for build logs #104

Open
kristof-siket wants to merge 1 commit into main from feat/build-logs-retry-auth
@kristof-siket

Contributor

What

Two fixes to the build logs <build_id> command's failure behavior, stacked on top of #102 (now merged to main).

1. Predictable error handling — 401 maps to the shared auth error

The command streams GET /v1/builds/{buildId}/logs through openapi-fetch, which does not throw on non-2xx — a 401 arrives as response.status === 401. The command-runner's SDKAuthError → authRequiredError mapping therefore never fired for this path, so an expired/invalid CLI session surfaced as a generic BUILD_LOGS_FAILED ("Failed to read build logs… / Retry").

A 401 now maps to the shared authRequiredError(["prisma-cli auth login"]), so the output is consistent with the rest of the CLI and tells the user what to actually do. 404 → BUILD_NOT_FOUND and the generic fallback are unchanged; 403 is intentionally not special-cased (the API uses 404-collapse for unauthorized access).
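The status mapping described above could be sketched roughly as follows. This is only an illustration: `CliError`, `authRequiredError`, and `openStatusToError` are hypothetical stand-ins written for this example, not the CLI's actual implementations, and the retry handling for transient statuses is left out.

```typescript
// Hypothetical error shape; the real CliError lives in the CLI codebase.
class CliError extends Error {
  code: string;
  nextSteps: string[];

  constructor(code: string, message: string, nextSteps: string[] = []) {
    super(message);
    this.code = code;
    this.nextSteps = nextSteps;
  }
}

// Stand-in for the shared helper named in the PR description.
function authRequiredError(nextSteps: string[]): CliError {
  return new CliError("AUTH_REQUIRED", "Authentication required.", nextSteps);
}

// openapi-fetch resolves with the response instead of throwing on non-2xx,
// so the status has to be inspected explicitly.
function openStatusToError(status: number): CliError | undefined {
  if (status >= 200 && status < 300) return undefined;
  if (status === 401) return authRequiredError(["prisma-cli auth login"]);
  if (status === 404) return new CliError("BUILD_NOT_FOUND", "Build not found.");
  // 403 is deliberately not special-cased: the API collapses it into 404.
  return new CliError("BUILD_LOGS_FAILED", "Failed to read build logs.", ["Retry"]);
}
```

Returning the error rather than throwing it keeps the mapping easy to assert on; the caller decides whether to throw.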

2. Bounded retry honoring the API's retryable signal, resuming from cursor

The read (stream open + NDJSON consumption) now runs inside a bounded retry loop:

  • 3 attempts max, backing off ~500ms then ~1500ms, abort-aware (a cancel during the wait returns cleanly, not as a failure).
  • Retries on: a transient open status (408, 429, 500, 502, 503, 504), a non-Abort network-style error from the GET/stream read, or the Management API's existing terminal error record with retryable: true.
  • Surfaces immediately (no retry): 401 → authRequiredError, 404 → BUILD_NOT_FOUND, any other non-transient status → BUILD_LOGS_FAILED, or a terminal error with retryable: false.
  • Resumes from the last cursor (tracked across log records and the terminal record) on every reconnect, so resumed reads don't reprint already-emitted output.
  • On exhaustion after a retryable failure, behavior is unchanged from before: the terminal error message prints to stderr and process.exitCode = 1 (terminal-record path), or the generic CliError is thrown (transient-open / network path).
  • --follow keeps working: a retryable drop reconnects from cursor through the same bounded loop (the attempt count stays bounded — follow does not retry unbounded).
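The retry rules above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the code in build.ts: `Outcome`, `Attempt`, `readWithRetry`, and `isTransientStatus` are names invented for this example, and `sleep` only shares its name with the helper mentioned in the review walkthrough. Only the behavior (three attempts, ~500ms then ~1500ms backoff, abort-aware wait, cursor carried across reconnects) follows the description.

```typescript
// Illustrative types; the PR's real ReadOutcome union may differ.
type Outcome =
  | { kind: "done" }
  | { kind: "retryable"; reason: string }
  | { kind: "fatal"; reason: string };

interface Attempt {
  outcome: Outcome;
  cursor?: string; // last cursor seen during this attempt, if any
}

const TRANSIENT_STATUSES = new Set([408, 429, 500, 502, 503, 504]);

function isTransientStatus(status: number): boolean {
  return TRANSIENT_STATUSES.has(status);
}

// Resolves after `ms`, or rejects early if the signal aborts during the wait.
function sleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) return reject(new Error("aborted"));
    const onAbort = () => {
      clearTimeout(timer);
      reject(new Error("aborted"));
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

async function readWithRetry(
  readOnce: (cursor: string | undefined) => Promise<Attempt>,
  signal: AbortSignal,
  backoffMs: number[] = [500, 1500], // two waits, so three attempts at most
): Promise<Outcome> {
  let cursor: string | undefined;
  let last: Outcome = { kind: "done" };
  for (let attempt = 0; attempt <= backoffMs.length; attempt++) {
    const result = await readOnce(cursor);
    cursor = result.cursor ?? cursor; // keep the newest cursor for the next reconnect
    last = result.outcome;
    if (last.kind !== "retryable") return last;
    if (attempt === backoffMs.length) break; // attempts exhausted
    try {
      await sleep(backoffMs[attempt] ?? 0, signal);
    } catch {
      return { kind: "done" }; // cancelled during backoff: return cleanly
    }
  }
  return last; // still retryable: the caller surfaces the failure
}
```

Passing `backoffMs` in as a parameter is what lets a test inject zeros and avoid real timers, in the spirit of the Tests section of this PR.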

This fixes the intermittent 408-from-Durable-Streams "Failed to read build logs." seen right after a build completes.

Trace evidence

A real failure showed the two-failure pattern this PR targets:

  1. First attempt: 401 — the CLI JWT had expired. (Now → actionable authRequiredError instead of a generic read failure.)
  2. Second attempt: the route returned 200, but the streams server (GET …/v1/stream/build-logs) returned HTTP 408 after ~5.2s. (Now → a retryable transient that the loop reconnects from the last cursor.)
  3. Third attempt: succeeded.

Tests

New packages/cli/tests/build-logs-controller.test.ts (deterministic, backoff injected as zeros — no real timers):

  • 401 → authRequiredError (asserts AUTH_REQUIRED, not BUILD_LOGS_FAILED).
  • Retryable terminal error on attempt 1, success on attempt 2 → succeeds, prints both log batches, and the second read carries the resumed cursor.
  • Retryable failure on all 3 attempts → process.exitCode = 1, message surfaced exactly once.
  • Transient open status (503) then success → retried.
  • --follow drop → reconnects from cursor with follow=true preserved.
  • 404 and non-retryable terminal end/no_logs → not retried (unchanged).

pnpm --filter @prisma/cli test (562 tests) green; tsc --noEmit, tsdown build, and biome check all clean.

Notes

🤖 Generated with Claude Code

…ld logs

`build logs` now maps a 401 from the streaming endpoint to the shared
authRequiredError ("run prisma-cli auth login") instead of a generic
"Failed to read build logs" — the SDK returns the 401 as a response, so
the command-runner's SDKAuthError mapping never fired for this path.

The read (open + NDJSON consumption) now runs inside a bounded retry
loop: at most 3 attempts, backing off ~500ms then ~1500ms, abort-aware.
It retries a transient open status (408/429/5xx), a non-Abort network
error, or the Management API's existing retryable terminal error record,
resuming each reconnect from the last cursor so output isn't reprinted.
401, 404, other non-transient statuses, and non-retryable terminals are
surfaced immediately. --follow reconnects through the same bounded loop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kristof-siket

Contributor Author

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Jun 30, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Jun 30, 2026


Summary by CodeRabbit

  • Bug Fixes

    • Improved build log streaming so it can reconnect after temporary network or server issues without duplicating output.
    • Preserved the latest log position during retries, helping follow-mode stay in sync after interruptions.
    • Made error messages clearer when logs can’t be read after multiple attempts.
  • Tests

    • Added coverage for authentication errors, missing builds, transient failures, retry exhaustion, and reconnect behavior.

Walkthrough

runBuildLogs in packages/cli/src/controllers/build.ts is refactored from a single-shot stream read into a retryable loop. It gains an injectable BuildLogsDeps parameter (carrying backoffMs), a ReadOutcome discriminated union classifying open and consume results, cursor-based resumption across reconnects, cancellable backoff via a new sleep helper, and outcome-based retry/exhaustion logic including process.exitCode assignment. buildLogsRequestError's why message is adjusted for HTTP vs non-HTTP failures. A new Vitest suite covering all retry/no-retry scenarios is added.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly matches the main change: build log error handling and transient-read retries. |
| Description check | ✅ Passed | The description is directly related to the build logs retry and auth-handling changes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/cli/src/controllers/build.ts`:
- Around line 251-274: The build flow in build() currently lets terminal errors
with retryable false fall through to writeBuildLogRecord() and then return done,
which can incorrectly succeed. Update the terminal-record handling branch in the
record-processing loop to detect non-retryable terminal errors (using
record.type, record.kind, and record.retryable) and immediately fail via a
distinct fatal-terminal outcome or by setting process.exitCode = 1, while
keeping retryable terminals in the existing retryable-terminal path.

In `@packages/cli/tests/build-logs-controller.test.ts`:
- Around line 87-212: The build logs controller tests are missing coverage for
two retry paths in the controller logic: a rejected GET that is not an
AbortError, and an AbortError raised during the retry backoff. Add one focused
test for each branch in build-logs-controller.test.ts, using runWithClient and
the existing get stub to drive the specific failure mode and assert the expected
retry/exit behavior so regressions in the controller’s retry handling are
caught.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6b386456-f543-43a0-96fd-4b8d48a6b376

📥 Commits

Reviewing files that changed from the base of the PR and between 0d559ab and 912d9d3.

📒 Files selected for processing (2)
  • packages/cli/src/controllers/build.ts
  • packages/cli/tests/build-logs-controller.test.ts

Comment on lines +251 to +274
      if (
        record.type === "terminal" &&
        record.kind === "error" &&
        record.retryable
      ) {
        retryableTerminal = record;
        return;
      }
      writeBuildLogRecord(context, record);
    });
  } catch (error) {
    if (isAbortError(error) || context.runtime.signal.aborted) {
      throw error;
    }
    return { outcome: { kind: "retryable-network" }, cursor: latestCursor };
  }

  if (retryableTerminal) {
    return {
      outcome: { kind: "retryable-terminal", record: retryableTerminal },
      cursor: latestCursor,
    };
  }
  return { outcome: { kind: "done" }, cursor: latestCursor };


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Make non-retryable terminal errors fail the command.

A terminal { kind: "error", retryable: false } falls through to writeBuildLogRecord() and then returns done, so the command can exit 0 after an error terminal. Return a distinct fatal-terminal outcome or set process.exitCode = 1 for non-retryable terminal errors. As per PR objectives, “Non-transient cases like 404 and non-retryable terminal errors remain immediate failures.”
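One possible shape for the suggested fix is sketched below. It is an assumption-laden illustration, not the controller's code: the `LogRecord` fields are inferred from the test fixtures in this PR (the fields of a log record in particular are guessed), and `ConsumeOutcome` and `classifyRecords` are hypothetical names. The point is only that a non-retryable error terminal gets its own outcome instead of collapsing into done.

```typescript
// Record shapes inferred from the test fixtures; the log-record fields are assumptions.
type LogRecord =
  | { type: "log"; cursor: string; message: string }
  | {
      type: "terminal";
      kind: "end" | "error";
      code: string;
      retryable: boolean;
      cursor: string;
      message: string;
    };

// Hypothetical outcome union with a distinct fatal-terminal case.
type ConsumeOutcome =
  | { kind: "done" }
  | { kind: "retryable-terminal"; message: string }
  | { kind: "fatal-terminal"; message: string };

interface ConsumeResult {
  outcome: ConsumeOutcome;
  cursor: string | undefined;
}

function classifyRecords(records: LogRecord[]): ConsumeResult {
  let cursor: string | undefined;
  for (const record of records) {
    cursor = record.cursor; // track the cursor across log and terminal records
    if (record.type === "terminal" && record.kind === "error") {
      const message = record.message;
      // Retryable errors go back to the retry loop; non-retryable ones fail the command.
      return record.retryable
        ? { outcome: { kind: "retryable-terminal", message }, cursor }
        : { outcome: { kind: "fatal-terminal", message }, cursor };
    }
  }
  return { outcome: { kind: "done" }, cursor };
}
```

With this shape, the caller could set process.exitCode = 1 when it sees fatal-terminal, while a terminal end (including no_logs) still resolves to done.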


Comment on lines +87 to +212
describe("build logs controller", () => {
  it("maps a 401 to the shared auth-required error", async () => {
    const get = vi.fn().mockResolvedValue(errorResult(401));
    const { run } = await runWithClient(get);

    await expect(run).rejects.toMatchObject({
      code: "AUTH_REQUIRED",
      domain: "auth",
    });
    expect(get).toHaveBeenCalledTimes(1);
  });

  it("retries a retryable terminal error and resumes from the cursor", async () => {
    const get = vi
      .fn()
      .mockResolvedValueOnce(
        streamResult([logLine("first", "c1"), retryableTerminal("c1")]),
      )
      .mockResolvedValueOnce(
        streamResult([
          logLine("second", "c2"),
          {
            type: "terminal",
            kind: "end",
            code: "end",
            retryable: false,
            cursor: "c2",
            message: "",
          },
        ]),
      );
    const { run, stdout } = await runWithClient(get);

    await run;

    expect(process.exitCode).toBeUndefined();
    expect(stdout.buffer).toContain("first");
    expect(stdout.buffer).toContain("second");
    expect(get).toHaveBeenCalledTimes(2);
    expect(queryOf(get.mock.calls[0])).not.toHaveProperty("cursor");
    expect(queryOf(get.mock.calls[1])).toMatchObject({ cursor: "c1" });
  });

  it("exits non-zero and surfaces the message once when every attempt fails", async () => {
    const get = vi
      .fn()
      .mockImplementation(async () => streamResult([retryableTerminal("c1")]));
    const { run, stderr } = await runWithClient(get);

    await run;

    expect(process.exitCode).toBe(1);
    expect(get).toHaveBeenCalledTimes(3);
    const occurrences =
      stderr.buffer.split("Failed to read build logs.").length - 1;
    expect(occurrences).toBe(1);
  });

  it("reconnects a dropped --follow stream from the cursor", async () => {
    const get = vi
      .fn()
      .mockResolvedValueOnce(
        streamResult([logLine("a", "c1"), retryableTerminal("c1")]),
      )
      .mockResolvedValueOnce(streamResult([logLine("b", "c2")]));
    const { run, stdout } = await runWithClient(get, { follow: true });

    await run;

    expect(process.exitCode).toBeUndefined();
    expect(get).toHaveBeenCalledTimes(2);
    expect(queryOf(get.mock.calls[0])).toMatchObject({ follow: "true" });
    expect(queryOf(get.mock.calls[1])).toMatchObject({
      follow: "true",
      cursor: "c1",
    });
    expect(stdout.buffer).toContain("a");
    expect(stdout.buffer).toContain("b");
  });

  it("retries a transient open status and then succeeds", async () => {
    const get = vi
      .fn()
      .mockResolvedValueOnce(errorResult(503))
      .mockResolvedValueOnce(streamResult([logLine("ok", "c1")]));
    const { run, stdout } = await runWithClient(get);

    await run;

    expect(process.exitCode).toBeUndefined();
    expect(get).toHaveBeenCalledTimes(2);
    expect(stdout.buffer).toContain("ok");
  });

  it("does not retry a 404 and surfaces BUILD_NOT_FOUND", async () => {
    const get = vi.fn().mockResolvedValue(errorResult(404));
    const { run } = await runWithClient(get);

    await expect(run).rejects.toMatchObject({ code: "BUILD_NOT_FOUND" });
    expect(get).toHaveBeenCalledTimes(1);
  });

  it("does not retry a non-retryable terminal end", async () => {
    const get = vi.fn().mockResolvedValue(
      streamResult([
        logLine("only", "c1"),
        {
          type: "terminal",
          kind: "end",
          code: "no_logs",
          retryable: false,
          cursor: "c1",
          message: "No logs were produced.",
        },
      ]),
    );
    const { run, stdout, stderr } = await runWithClient(get);

    await run;

    expect(process.exitCode).toBeUndefined();
    expect(get).toHaveBeenCalledTimes(1);
    expect(stdout.buffer).toContain("only");
    expect(stderr.buffer).toContain("No logs were produced.");
  });
});


📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add coverage for the two untested retry branches.

This suite still never drives a rejected GET (non-AbortError) or an abort during the retry backoff, even though the controller now has dedicated logic for both. A regression in either path would currently pass unnoticed; please add one test for each branch.

