fix(agent): fillForm tool drops caller-provided values for LLM-hallucinated ones by shrey150 · Pull Request #1807 · browserbase/stagehand

shrey150 · 2026-03-11T00:05:35Z

Summary

Fixes #1789 — supersedes #1805 with a more robust implementation.

Based on the original fix by @elliotllliu in #1805 — thank you for identifying and diagnosing this bug!

The fillForm tool was passing only field action strings to observe(), causing the LLM to hallucinate placeholder values (e.g. test@example.com) instead of using the actual value provided by the caller. This broke login forms, 2FA flows, search queries, and any workflow where specific values matter.

What this PR does

Uses a fillIndex counter to align fill results from observe() with caller-provided field values, correctly handling interleaved non-fill actions (e.g. click-to-focus)
Skips extra fills when observe() returns more fill actions than fields provided, with a warning log
Warns when observe() returns fewer fills than fields, so dropped values are visible in logs

Changes

File	Change
`packages/core/lib/v3/agent/tools/fillform.ts`	Override observe result arguments with caller-provided values using fillIndex counter; skip/warn on mismatches
`packages/core/tests/unit/fillform-value-override.test.ts`	Unit tests covering value override, interleaved actions, empty string, non-fill passthrough, and extra fills
`.changeset/fix-fillform-value-hallucination.md`	Patch changeset

Test plan

Unit test: hallucinated values are overridden with caller-provided values (fails on main, passes here)
Unit test: non-fill methods (click) are not affected
Unit test: interleaved non-fill actions don't misalign values (fails on main, passes here)
Unit test: empty string value: "" correctly overrides (fails on main, passes here)
Unit test: extra fills from observe are skipped with warning
Existing agent-execution-model tests pass

🤖 Generated with Claude Code

…ed arguments The fillForm tool was passing only field actions to observe(), causing the LLM to hallucinate placeholder values (e.g. "test@example.com") instead of using the actual values provided by the caller. This broke login forms, 2FA flows, and any workflow where specific values matter. Uses a fillIndex counter to correctly align fill results with field values, handling interleaved non-fill actions. Also uses !== undefined instead of truthiness check so empty string values (for clearing fields) work correctly. Fixes #1789 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

changeset-bot · 2026-03-11T00:05:40Z

🦋 Changeset detected

Latest commit: 86eba36

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 5 packages

Name	Type
@browserbasehq/stagehand	Patch
@browserbasehq/browse-cli	Patch
@browserbasehq/stagehand-evals	Patch
@browserbasehq/stagehand-server-v3	Patch
@browserbasehq/stagehand-server-v4	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

greptile-apps · 2026-03-11T00:08:14Z

Greptile Summary

This PR fixes a bug where the fillForm tool was passing only the action description strings (not the values) to observe(), causing the LLM to hallucinate placeholder values (e.g. test@example.com) that were then typed into form fields instead of the real caller-provided values. The fix introduces a fillIndex counter that sequentially maps each fill result returned by observe() to the corresponding entry in fields, overriding its arguments with the caller-provided value and correctly skipping over any non-fill results (like click-to-focus steps) the LLM may inject between fills.

Core fix in fillform.ts is correct and well-scoped; the fillIndex alignment logic handles interleaved non-fill actions, extra fills (skip + warn), and fewer fills (post-loop warn)
New unit tests cover the primary happy paths; a dedicated test explicitly asserting the "fewer fills" warning is missing
The fillIndex alignment invariant (fields correspond 1-to-1 with fill actions, non-fill results consume no slot) is non-obvious and would benefit from an inline comment per the project's readability guideline
res.arguments is mutated in-place before being passed to recordAgentReplayStep, meaning the replay record stores the overridden values rather than the original LLM-observed values — functionally correct for replay, but the original LLM output is silently lost

Confidence Score: 4/5

PR is safe to merge; the core bug fix is correct with good test coverage, and remaining notes are non-blocking style improvements.
The fillIndex logic correctly addresses the hallucination bug and handles all documented edge cases (interleaved actions, extra fills, fewer fills). The fix is minimal and well-isolated. Minor gaps: the alignment invariant lacks an explanatory comment, the "fewer fills" warning has no dedicated test assertion, and the in-place mutation of observeResults before replay recording is an implicit design choice worth making explicit.
packages/core/lib/v3/agent/tools/fillform.ts — review the in-place mutation of observeResults objects and consider adding an explanatory comment on the fillIndex invariant.

Important Files Changed

Filename	Overview
packages/core/lib/v3/agent/tools/fillform.ts	Adds `fillIndex` counter to map caller-provided field values onto `fill` results from `observe()`, overriding LLM-hallucinated arguments; also adds warnings for extra/missing fills. Minor concerns: the `fillIndex` alignment invariant is undocumented, and `res.arguments` is mutated in-place before being passed to `recordAgentReplayStep`.
packages/core/tests/unit/fillform-value-override.test.ts	New unit test file covering value override, interleaved actions, empty string, non-fill passthrough, and extra fills. Missing a dedicated test that explicitly asserts the "fewer fills than provided fields" warning path.
.changeset/fix-fillform-value-hallucination.md	Patch changeset entry for the bugfix; no issues.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant fillFormTool
    participant observe
    participant act
    participant recordAgentReplayStep

    Caller->>fillFormTool: execute({ fields: [{action, value}, ...] })
    Note over fillFormTool: Build instruction from action strings only<br/>(values intentionally excluded to avoid LLM override)
    fillFormTool->>observe: observe(instruction)
    observe-->>fillFormTool: observeResults (may contain hallucinated fill values)

    Note over fillFormTool: fillIndex = 0
    loop for each res in observeResults
        alt res.method === "fill"
            alt fillIndex < fields.length
                Note over fillFormTool: Override res.arguments with<br/>fields[fillIndex].value (caller-provided)
                Note over fillFormTool: fillIndex++
                fillFormTool->>act: act(res with real value)
                act-->>fillFormTool: actResult
            else fillIndex >= fields.length
                Note over fillFormTool: WARN: more fills than fields → skip (continue)
            end
        else non-fill (click, etc.)
            fillFormTool->>act: act(res unchanged)
            act-->>fillFormTool: actResult
        end
    end

    alt fillIndex < fields.length
        Note over fillFormTool: WARN: fewer fills than fields provided
    end

    fillFormTool->>recordAgentReplayStep: record(observeResults with overridden args)
    fillFormTool-->>Caller: { success, actions, playwrightArguments }

Comments Outside Diff (3)

packages/core/lib/v3/agent/tools/fillform.ts, line 59-73 (link)

Add a comment explaining the fillIndex alignment invariant

The fillIndex counter relies on a non-obvious contract between fields and observeResults: every entry in fields is assumed to correspond to a fill action from observe(), and any non-fill results returned by observe() (e.g. click-to-focus) are treated as interstitial helper steps that do not consume a field slot. Without a comment, future readers could easily mistake this counter for a simple loop index or wonder why non-fill methods don't increment it.

Per the project's commenting guideline, a brief explanatory comment here would significantly aid readability:

Rule Used: Add detailed comments for longer PRs that aren't o... (source)

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

packages/core/tests/unit/fillform-value-override.test.ts, line 158-190 (link)

Missing dedicated test for the "fewer fills" warning

The PR description explicitly calls out "Warns when observe() returns fewer fills than fields, so dropped values are visible in logs" as a shipped feature, and the corresponding post-loop warning is implemented in fillform.ts (lines 84–90). However, no test in the suite explicitly asserts this warning path.

The closest existing coverage is the "does not override non-fill methods" test which incidentally triggers the warning (observe returns a click but no fill, so fillIndex (0) < fields.length (1)), but that test doesn't assert on the logger call and is primarily verifying unrelated behaviour.

Consider adding a dedicated test case:

it("warns when observe returns fewer fill actions than provided fields", async () => {
  const v3 = createMockV3([
    { method: "fill", arguments: ["hal1"], description: "email" },
    // observe only returns one fill, but two fields were provided
  ]);

  const tool = fillFormTool(v3);
  await tool.execute!(
    {
      fields: [
        { action: "type email", value: "real@email.com" },
        { action: "type password", value: "realPass" },
      ],
    },
    toolCtx,
  );

  expect(v3.actCalls).toHaveLength(1);
  expect(v3.logger).toHaveBeenCalledWith(
    expect.objectContaining({
      category: "agent",
      message: expect.stringContaining("fewer fill actions"),
    }),
  );
});

packages/core/lib/v3/agent/tools/fillform.ts, line 63 (link)

In-place mutation of observeResults objects affects replay fidelity

res.arguments is mutated directly on the object returned by observe(). Because observeResults is later passed by reference to recordAgentReplayStep, the replay step will record the caller-provided values rather than what the LLM originally observed. For replay this is the right data (you want to replay what was actually typed), but it silently destroys the original LLM output — making it impossible to later diff "what the LLM hallucinated" vs "what was actually used" from the recorded step.

Consider spreading to avoid the side-effect if observability of the original LLM suggestions is important:

Or alternatively assign to a local copy before the loop and pass that to act, preserving the original observeResults for the replay record. This depends on whether the replay consumer expects the raw or overridden values.

_{Last reviewed commit: 86eba36}

packages/core/tests/unit/fillform-value-override.test.ts

packages/core/lib/v3/agent/tools/fillform.ts

cubic-dev-ai

No issues found across 3 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

Architecture diagram

sequenceDiagram
    participant C as Caller / User
    participant FT as fillForm Tool
    participant V3 as Agent Core (V3)
    participant LLM as LLM (Observer)
    participant B as Browser / DOM

    C->>FT: execute(fields: {action, value}[])
    
    Note over FT,V3: Prepare observation strings from fields[].action
    FT->>V3: observe(actionStrings)
    V3->>LLM: Analyze DOM + actions
    LLM-->>V3: Return Action list (e.g. click, fill)
    Note right of LLM: Hallucination Risk:<br/>LLM provides dummy values<br/>for 'fill' arguments.
    V3-->>FT: observeResults[]

    loop For each result in observeResults
        alt NEW: result.method === "fill"
            FT->>FT: CHANGED: Map result to fields[fillIndex]
            opt fields[fillIndex].value !== undefined
                FT->>FT: NEW: Override hallucinated arg<br/>with caller-provided value
            end
            FT->>FT: NEW: Increment fillIndex
        else Non-fill method (e.g. click)
            FT->>FT: Keep original arguments
        end

        FT->>V3: act(method, correctedArgs)
        V3->>B: Perform CDP/Playwright action
        B-->>V3: Success/Failure
        V3-->>FT: Action result
    end

    FT-->>C: Returns completed actions

When observe() returns more fill actions than fields provided (e.g. LLM invents a "confirm password" fill), skip the extra fills instead of silently typing hallucinated values. Logs a warning so callers know. Also renames misleading test name per review feedback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

shrey150 · 2026-03-11T00:23:41Z

@greptileai

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

packages/core/lib/v3/agent/tools/fillform.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

shrey150 · 2026-03-11T00:41:05Z

@greptileai

packages/core/lib/v3/agent/tools/fillform.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

shrey150 · 2026-03-11T00:55:07Z

@greptileai

shrey150 · 2026-03-11T04:47:44Z

@greptileai

tkattkat · 2026-03-19T18:00:31Z

packages/core/lib/v3/agent/tools/fillform.ts

        for (const res of observeResults) {
+          if (res.method === "fill") {
+            if (fillIndex < fields.length) {
+              res.arguments = [fields[fillIndex].value];
+              fillIndex++;
+            } else {
+              v3.logger({
+                category: "agent",
+                message: `fillForm: observe returned more fill actions than provided fields (${fields.length}); skipping extra fill`,
+                level: 1,
+              });
+              continue;
+            }
+          }


This assumes agent will provide the values in same order as the observe results are returned in .This likely will not be consistently true.

greptile-apps bot reviewed Mar 11, 2026

View reviewed changes

packages/core/tests/unit/fillform-value-override.test.ts Outdated Show resolved Hide resolved

packages/core/lib/v3/agent/tools/fillform.ts Show resolved Hide resolved

cubic-dev-ai bot reviewed Mar 11, 2026

View reviewed changes

shrey150 and others added 4 commits March 10, 2026 17:24

style: format test file with prettier

b223c55

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style: simplify test file comment to match repo conventions

8fcaa8e

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style: drop issue ref from test describe block

c36ccd0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test: replace PII in test fixtures with generic values

61f963b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps bot reviewed Mar 11, 2026

View reviewed changes

packages/core/lib/v3/agent/tools/fillform.ts Outdated Show resolved Hide resolved

packages/core/lib/v3/agent/tools/fillform.ts Outdated Show resolved Hide resolved

refactor: remove redundant undefined guard on required field value

118fc02

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps bot reviewed Mar 11, 2026

View reviewed changes

packages/core/lib/v3/agent/tools/fillform.ts Show resolved Hide resolved

fix: warn when observe returns fewer fills than provided fields

86eba36

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

shrey150 changed the title ~~fix: use caller-provided values in fillForm instead of LLM-hallucinated arguments~~ fix(agent): fillForm tool drops caller-provided values for LLM-hallucinated ones Mar 11, 2026

tkattkat requested changes Mar 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agent): fillForm tool drops caller-provided values for LLM-hallucinated ones#1807

fix(agent): fillForm tool drops caller-provided values for LLM-hallucinated ones#1807
shrey150 wants to merge 8 commits intomainfrom
fix/fillform-value-hallucination

shrey150 commented Mar 11, 2026 •

edited

Loading

Uh oh!

changeset-bot bot commented Mar 11, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 11, 2026 •

edited

Loading

Comments Outside Diff (3)

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

shrey150 commented Mar 11, 2026

Uh oh!

Uh oh!

Uh oh!

shrey150 commented Mar 11, 2026

Uh oh!

Uh oh!

shrey150 commented Mar 11, 2026

Uh oh!

shrey150 commented Mar 11, 2026

Uh oh!

tkattkat Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shrey150 commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What this PR does

Changes

Test plan

Uh oh!

changeset-bot bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

greptile-apps bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (3)

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

shrey150 commented Mar 11, 2026

Uh oh!

Uh oh!

Uh oh!

shrey150 commented Mar 11, 2026

Uh oh!

Uh oh!

shrey150 commented Mar 11, 2026

Uh oh!

shrey150 commented Mar 11, 2026

Uh oh!

tkattkat Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shrey150 commented Mar 11, 2026 •

edited

Loading

changeset-bot bot commented Mar 11, 2026 •

edited

Loading

greptile-apps bot commented Mar 11, 2026 •

edited

Loading