feat: add legacy /chat/completions support (#1804)
Conversation
Thread modelBaseURL from x-model-base-url header through to V3 options, enabling providers like ZhipuAI, Ollama, and other OpenAI-compatible endpoints. Uses Chat Completions API (not Responses API) when a custom baseURL is set, and adds robust response coercion for models without native structured output support.
Adds "chatcompletions" as a generic provider that uses the Chat Completions API (/chat/completions) instead of the Responses API, for endpoints like ZhipuAI and Ollama. Also simplifies response coercion for models without native structured output support.
🦋 Changeset detected. Latest commit: e19a7e0. The changes in this PR will be included in the next version bump. This PR includes changesets to release 5 packages.
1 issue found across 8 files
Confidence score: 2/5
- There is a high-confidence regression risk in packages/core/lib/v3/llm/LLMProvider.ts: the chatcompletions → .chat() mapping is only applied in the hasValidOptions path, so behavior diverges between configured and default client flows.
- When clientOptions are absent, the else branch calls provider(subModelName) on the default openai instance, which can route chatcompletions models incorrectly and cause user-facing failures in common usage.
- Pay close attention to packages/core/lib/v3/llm/LLMProvider.ts: ensure model normalization/dispatch is consistent in both branches so default and custom options behave the same.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/lib/v3/llm/LLMProvider.ts">
<violation number="1" location="packages/core/lib/v3/llm/LLMProvider.ts:53">
P1: The `chatcompletions` → `.chat()` handling only exists in the `hasValidOptions` branch. When no `clientOptions` are provided, the `else` branch calls `provider(subModelName)` on the default `openai` instance, which uses the Responses API — silently defeating the purpose of this provider.
Add the same `.chat()` handling in the `else` branch so `chatcompletions/model-name` always uses the Chat Completions API regardless of whether client options are present.</violation>
</file>
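A minimal sketch of the fix the review asks for: apply the same `chatcompletions` → `.chat()` normalization in both the configured and default branches. The provider shape below mimics the AI SDK's OpenAI provider (callable for the Responses API, `.chat()` for /chat/completions); the names `selectLanguageModel` and `makeProvider` are illustrative, not the real code in `LLMProvider.ts`.

```typescript
// A LanguageModel stand-in recording which API surface it was created from.
type LanguageModel = { api: "responses" | "chat"; model: string };

// Mimics the AI SDK OpenAI provider: callable (Responses API) plus .chat().
interface OpenAILikeProvider {
  (model: string): LanguageModel;
  chat(model: string): LanguageModel;
}

function makeProvider(): OpenAILikeProvider {
  const p = ((model: string) => ({ api: "responses", model })) as OpenAILikeProvider;
  p.chat = (model: string) => ({ api: "chat", model });
  return p;
}

// Single dispatch helper used by BOTH branches, so chatcompletions/* always
// targets /chat/completions whether or not clientOptions were supplied.
function selectLanguageModel(
  providerName: string,
  subModelName: string,
  provider: OpenAILikeProvider,
): LanguageModel {
  return providerName === "chatcompletions"
    ? provider.chat(subModelName)
    : provider(subModelName);
}
```

Because the branch-specific logic collapses into one helper, the configured and default flows cannot drift apart again.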
Architecture diagram
sequenceDiagram
participant Client
participant Server as Server (Fastify)
participant Store as Session Store
participant Prov as LLM Provider (Core)
participant SDK as AI SDK Wrapper
participant LLM as External LLM API
Note over Client, LLM: Runtime flow for Model Base URL & Chat Completions Support
Client->>Server: Request (Header: x-model-base-url, Body: provider/model)
Server->>Server: NEW: getModelBaseURL()
Note right of Server: Checks body.options.model.baseURL <br/>OR x-model-base-url header
Server->>Store: createSession(modelBaseURL, apiKey, ...)
Store->>Prov: getAISDKLanguageModel(provider, model, baseURL)
alt NEW: Provider prefix is "chatcompletions/"
Prov->>Prov: Map to OpenAI provider instance
Prov->>Prov: NEW: Force .chat() method (bypasses /responses)
else Standard Provider
Prov->>Prov: Initialize standard AI SDK provider
end
Prov-->>Store: LanguageModel instance (with baseURL)
Store->>SDK: generateObject(schema, options)
alt NEW: Model requires Prompt JSON Fallback
SDK->>LLM: generateObject(output: "no-schema")
LLM-->>SDK: Raw JSON String / Partial Object
SDK->>SDK: NEW: Coerce stringified fields (e.g., "[]" to [])
alt Schema Validation Fails
SDK->>SDK: NEW: Heuristic fix (default missing arrays to [])
SDK->>SDK: safeParse() retry
end
else Native Structured Output
SDK->>LLM: generateObject(schema: ZodSchema)
LLM-->>SDK: Structured Data
end
SDK-->>Store: Validated Object
Store-->>Server: Session Result
Server-->>Client: 200 OK / Stream Response
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Greptile Summary
This PR adds support for OpenAI-compatible providers that only expose the /chat/completions endpoint. Key implementation areas:
Confidence Score: 3/5
Sequence Diagram
sequenceDiagram
participant Client as SDK Client
participant Server as server-v3
participant Header as header.ts
participant Store as InMemorySessionStore
participant Core as LLMProvider
participant AISDK as AI SDK
Client->>Server: POST /v1/sessions/start<br/>x-model-base-url header<br/>modelName: chatcompletions/glm-4-flash
Server->>Header: getModelBaseURL(request)
Header-->>Server: baseURL value
Server->>Header: getModelApiKey(request)
Header-->>Server: apiKey value
Server->>Store: getOrCreateStagehand(sessionId, ctx)
Store->>Core: getAISDKLanguageModel("chatcompletions", "glm-4-flash", clientOptions)
Note over Core: hasValidOptions = true<br/>(baseURL or apiKey present)
Core->>AISDK: createOpenAI({ baseURL, apiKey })
AISDK-->>Core: provider instance
Core->>AISDK: provider.chat("glm-4-flash")
Note over AISDK: Targets /chat/completions<br/>instead of /responses
AISDK-->>Core: LanguageModelV2
Core-->>Store: AISdkClient
Store-->>Server: V3 instance
Server-->>Client: sessionId + cdpUrl
packages/core/lib/v3/llm/aisdk.ts (Outdated)
for (const issue of firstTry.error.issues) {
  if (
    issue.code === "invalid_type" &&
    issue.expected === "array" &&
    issue.path.length === 1
  ) {
    raw[issue.path[0] as string] = [];
  }
}
parsed = options.response_model.schema.parse(raw);
Second parse() call can throw an untyped ZodError
After the array-field defaulting loop, options.response_model.schema.parse(raw) is called without a try/catch. If the response still fails validation for any reason other than a missing top-level array field (e.g., a nested object type mismatch, an extra required field), a raw ZodError is thrown. That error is caught by the outer catch (err) block, but that block only checks for NoObjectGeneratedError.isInstance(err) — a ZodError will just be re-thrown without the special logging context.
Consider wrapping this in a try/catch that converts ZodError into something more informative, or using .safeParse() again and surfacing the issues clearly:
  for (const issue of firstTry.error.issues) {
    if (
      issue.code === "invalid_type" &&
      issue.expected === "array" &&
      issue.path.length === 1
    ) {
      raw[issue.path[0] as string] = [];
    }
  }
- parsed = options.response_model.schema.parse(raw);
+ const secondTry = options.response_model.schema.safeParse(raw);
+ if (!secondTry.success) {
+   throw new Error(
+     `Model response could not be coerced into the expected schema: ${secondTry.error.message}`,
+   );
+ }
+ parsed = secondTry.data;
Try structured output (schema:) first for all models. Only fall back to no-schema + response coercion when the call fails and the model matches a known fallback pattern. This avoids degrading DeepSeek/Kimi which already work with schema:.
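The strategy described above can be sketched as follows. This is a hedged illustration, not the shipped code: `generateStructured` and `generateNoSchema` are hypothetical wrappers standing in for the two `generateObject` call shapes (`schema:` vs `output: "no-schema"` plus coercion).

```typescript
type Gen = (prompt: string) => Promise<unknown>;

// Models known to lack reliable native structured output.
const PROMPT_JSON_FALLBACK_PATTERNS = [/deepseek/i, /kimi/i, /glm/i];

async function generateWithFallback(
  modelId: string,
  prompt: string,
  generateStructured: Gen, // generateObject with schema:
  generateNoSchema: Gen, // generateObject with output: "no-schema" + coercion
): Promise<unknown> {
  try {
    // Try structured output first for all models, so providers that already
    // support schema: (e.g. DeepSeek/Kimi via some gateways) are not degraded.
    return await generateStructured(prompt);
  } catch (err) {
    // Only fall back when the model matches a known fallback pattern;
    // otherwise surface the original failure.
    if (PROMPT_JSON_FALLBACK_PATTERNS.some((p) => p.test(modelId))) {
      return generateNoSchema(prompt);
    }
    throw err;
  }
}
```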
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/lib/v3/llm/aisdk.ts">
<violation number="1" location="packages/core/lib/v3/llm/aisdk.ts:179">
P1: Models in `PROMPT_JSON_FALLBACK_PATTERNS` (deepseek, kimi, glm) will now always make a wasted API call that fails before falling back to no-schema mode. Previously these models skipped straight to the no-schema path. This doubles latency and cost for every extract call on these providers.
Consider keeping the original structure where `needsPromptJsonFallback` is checked *before* the first call, and only use the try-then-fallback pattern for models that are *not* in the known fallback list (i.e., the `chatcompletions/` prefix models that aren't predictable from the model ID).</violation>
</file>
packages/core/lib/v3/llm/aisdk.ts (Outdated)
// Try structured output first. If the provider doesn't support
// response_format (e.g. chatcompletions/ endpoints), this will throw
// and we fall back to no-schema mode with response coercion below.
objectResponse = await generateObject({
P1: Models in PROMPT_JSON_FALLBACK_PATTERNS (deepseek, kimi, glm) will now always make a wasted API call that fails before falling back to no-schema mode. Previously these models skipped straight to the no-schema path. This doubles latency and cost for every extract call on these providers.
Consider keeping the original structure where needsPromptJsonFallback is checked before the first call, and only use the try-then-fallback pattern for models that are not in the known fallback list (i.e., the chatcompletions/ prefix models that aren't predictable from the model ID).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/llm/aisdk.ts, line 179:
<comment>Models in `PROMPT_JSON_FALLBACK_PATTERNS` (deepseek, kimi, glm) will now always make a wasted API call that fails before falling back to no-schema mode. Previously these models skipped straight to the no-schema path. This doubles latency and cost for every extract call on these providers.
Consider keeping the original structure where `needsPromptJsonFallback` is checked *before* the first call, and only use the try-then-fallback pattern for models that are *not* in the known fallback list (i.e., the `chatcompletions/` prefix models that aren't predictable from the model ID).</comment>
<file context>
@@ -173,19 +173,39 @@ You must respond in JSON format. respond WITH JSON. Do not include any other tex
+ // Try structured output first. If the provider doesn't support
+ // response_format (e.g. chatcompletions/ endpoints), this will throw
+ // and we fall back to no-schema mode with response coercion below.
+ objectResponse = await generateObject({
+ model: this.model,
+ messages: formattedMessages,
</file context>
- Skip schema attempt for chatcompletions/ models (provider: openai.chat) since they can't do structured output; avoids a wasted LLM call per extract
- Unify .chat() handling in getAISDKLanguageModel so chatcompletions/ works regardless of whether clientOptions are provided
- Guard second schema.parse() with safeParse + descriptive error message
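The first of those fixes (deciding before the first call whether a `schema:` attempt is worth making) might look like the sketch below. `shouldSkipSchemaAttempt` and the `usesChatCompletions` flag are illustrative names; the flag stands in for the real provider: openai.chat check.

```typescript
// Models known to fail schema: requests, matched by model ID.
const PROMPT_JSON_FALLBACK_PATTERNS = [/deepseek/i, /kimi/i, /glm/i];

// chatcompletions/ models can't do structured output at all, and
// fallback-pattern models fail it reliably, so neither should pay for a
// doomed first call.
function shouldSkipSchemaAttempt(
  modelId: string,
  usesChatCompletions: boolean,
): boolean {
  return (
    usesChatCompletions ||
    PROMPT_JSON_FALLBACK_PATTERNS.some((p) => p.test(modelId))
  );
}
```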
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/lib/v3/llm/aisdk.ts">
<violation number="1" location="packages/core/lib/v3/llm/aisdk.ts:291">
P1: Custom agent: **Exception and error message sanitization**
Generic `new Error()` with unsanitized Zod error message that may reflect sensitive prompt data back to the caller. Per the error-sanitization rule, use a typed error class and strip or redact the raw Zod message (which can contain actual field values from the model response).</violation>
</file>
// 4. Validate against schema
const secondTry = options.response_model.schema.safeParse(raw);
if (!secondTry.success) {
  throw new Error(
P1: Custom agent: Exception and error message sanitization
Generic new Error() with unsanitized Zod error message that may reflect sensitive prompt data back to the caller. Per the error-sanitization rule, use a typed error class and strip or redact the raw Zod message (which can contain actual field values from the model response).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/llm/aisdk.ts, line 291:
<comment>Generic `new Error()` with unsanitized Zod error message that may reflect sensitive prompt data back to the caller. Per the error-sanitization rule, use a typed error class and strip or redact the raw Zod message (which can contain actual field values from the model response).</comment>
<file context>
@@ -172,115 +172,129 @@ You must respond in JSON format. respond WITH JSON. Do not include any other tex
+ // 4. Validate against schema
+ const secondTry = options.response_model.schema.safeParse(raw);
+ if (!secondTry.success) {
+ throw new Error(
+ `Model response could not be coerced into the expected schema: ${secondTry.error.message}`,
+ );
</file context>
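One way to address the sanitization finding above: a typed error whose message carries only structural information from the Zod issues (paths and issue codes), never the raw field values from the model response. The class name `SchemaCoercionError` is illustrative.

```typescript
// Typed error for schema-coercion failures. Only issue paths and codes
// reach the message; raw model-response values are deliberately omitted
// so prompt/response data is never reflected back to the caller.
class SchemaCoercionError extends Error {
  constructor(issues: { path: (string | number)[]; code: string }[]) {
    const summary = issues
      .map((i) => `${i.path.join(".") || "<root>"}: ${i.code}`)
      .join("; ");
    super(
      `Model response could not be coerced into the expected schema (${summary})`,
    );
    this.name = "SchemaCoercionError";
  }
}
```

The caller would pass `secondTry.error.issues` (Zod exposes `path` and `code` on each issue) instead of interpolating `secondTry.error.message`.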
…PI specs, and stainless config
✱ Stainless preview builds
This PR will update the … Edit this comment to update it. It will appear in the SDK's changelogs.
✅ stagehand-typescript studio · code · diff
✅ stagehand-openapi studio · code · diff
⚡ stagehand-ruby studio · conflict
✅ stagehand-php studio · code · diff
✅ stagehand-go studio · code · diff
⚡ stagehand-kotlin studio · conflict
⚡ stagehand-java studio · conflict
⚡ stagehand-python studio · conflict
✅ stagehand-csharp studio · code · diff
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
why
providers that only support /chat/completions are not supported
what changed
fallback-pattern models
sister python PR here: browserbase/stagehand-python#318
test plan
tested locally with ZhipuAI glm-4-flash: observe, act, extract, and agent execute all pass
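For reference, a client request exercising this path (per the sequence diagram: `/v1/sessions/start` with the `x-model-base-url` header and a `chatcompletions/` model name) might be built like this. The server URL, the `x-model-api-key` header name, and the ZhipuAI base URL are assumptions for illustration; only the endpoint, the `x-model-base-url` header, and the model prefix come from this PR.

```typescript
// Build the start-session request for a /chat/completions-only provider.
function buildStartSessionRequest(
  serverUrl: string,
  baseURL: string,
  apiKey: string,
  modelName: string,
): { url: string; method: string; headers: Record<string, string>; body: string } {
  return {
    url: `${serverUrl}/v1/sessions/start`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-model-base-url": baseURL, // read server-side by getModelBaseURL()
      "x-model-api-key": apiKey, // hypothetical header name for the API key
    },
    body: JSON.stringify({ modelName }),
  };
}

// Example: route through ZhipuAI's OpenAI-compatible endpoint (URL assumed).
const req = buildStartSessionRequest(
  "http://localhost:3000",
  "https://open.bigmodel.cn/api/paas/v4",
  "sk-placeholder",
  "chatcompletions/glm-4-flash",
);
```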