Skip to content

fix(agent): fillForm tool drops caller-provided values for LLM-hallucinated ones#1807

Open
shrey150 wants to merge 8 commits intomainfrom
fix/fillform-value-hallucination
Open

fix(agent): fillForm tool drops caller-provided values for LLM-hallucinated ones#1807
shrey150 wants to merge 8 commits intomainfrom
fix/fillform-value-hallucination

Conversation

@shrey150
Copy link
Contributor

@shrey150 shrey150 commented Mar 11, 2026

Summary

Fixes #1789 — supersedes #1805 with a more robust implementation.

Based on the original fix by @elliotllliu in #1805 — thank you for identifying and diagnosing this bug!

The fillForm tool was passing only field action strings to observe(), causing the LLM to hallucinate placeholder values (e.g. test@example.com) instead of using the actual value provided by the caller. This broke login forms, 2FA flows, search queries, and any workflow where specific values matter.

What this PR does

  • Uses a fillIndex counter to align fill results from observe() with caller-provided field values, correctly handling interleaved non-fill actions (e.g. click-to-focus)
  • Skips extra fills when observe() returns more fill actions than fields provided, with a warning log
  • Warns when observe() returns fewer fills than fields, so dropped values are visible in logs

Changes

File Change
packages/core/lib/v3/agent/tools/fillform.ts Override observe result arguments with caller-provided values using fillIndex counter; skip/warn on mismatches
packages/core/tests/unit/fillform-value-override.test.ts Unit tests covering value override, interleaved actions, empty string, non-fill passthrough, and extra fills
.changeset/fix-fillform-value-hallucination.md Patch changeset

Test plan

  • Unit test: hallucinated values are overridden with caller-provided values (fails on main, passes here)
  • Unit test: non-fill methods (click) are not affected
  • Unit test: interleaved non-fill actions don't misalign values (fails on main, passes here)
  • Unit test: empty string value: "" correctly overrides (fails on main, passes here)
  • Unit test: extra fills from observe are skipped with warning
  • Existing agent-execution-model tests pass

🤖 Generated with Claude Code

…ed arguments

The fillForm tool was passing only field actions to observe(), causing the
LLM to hallucinate placeholder values (e.g. "test@example.com") instead of
using the actual values provided by the caller. This broke login forms, 2FA
flows, and any workflow where specific values matter.

Uses a fillIndex counter to correctly align fill results with field values,
handling interleaved non-fill actions. Also uses !== undefined instead of
truthiness check so empty string values (for clearing fields) work correctly.

Fixes #1789

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@changeset-bot
Copy link

changeset-bot bot commented Mar 11, 2026

🦋 Changeset detected

Latest commit: 86eba36

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 5 packages
Name Type
@browserbasehq/stagehand Patch
@browserbasehq/browse-cli Patch
@browserbasehq/stagehand-evals Patch
@browserbasehq/stagehand-server-v3 Patch
@browserbasehq/stagehand-server-v4 Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR fixes a bug where the fillForm tool was passing only the action description strings (not the values) to observe(), causing the LLM to hallucinate placeholder values (e.g. test@example.com) that were then typed into form fields instead of the real caller-provided values. The fix introduces a fillIndex counter that sequentially maps each fill result returned by observe() to the corresponding entry in fields, overriding its arguments with the caller-provided value and correctly skipping over any non-fill results (like click-to-focus steps) the LLM may inject between fills.

  • Core fix in fillform.ts is correct and well-scoped; the fillIndex alignment logic handles interleaved non-fill actions, extra fills (skip + warn), and fewer fills (post-loop warn)
  • New unit tests cover the primary happy paths; a dedicated test explicitly asserting the "fewer fills" warning is missing
  • The fillIndex alignment invariant (fields correspond 1-to-1 with fill actions, non-fill results consume no slot) is non-obvious and would benefit from an inline comment per the project's readability guideline
  • res.arguments is mutated in-place before being passed to recordAgentReplayStep, meaning the replay record stores the overridden values rather than the original LLM-observed values — functionally correct for replay, but the original LLM output is silently lost

Confidence Score: 4/5

  • PR is safe to merge; the core bug fix is correct with good test coverage, and remaining notes are non-blocking style improvements.
  • The fillIndex logic correctly addresses the hallucination bug and handles all documented edge cases (interleaved actions, extra fills, fewer fills). The fix is minimal and well-isolated. Minor gaps: the alignment invariant lacks an explanatory comment, the "fewer fills" warning has no dedicated test assertion, and the in-place mutation of observeResults before replay recording is an implicit design choice worth making explicit.
  • packages/core/lib/v3/agent/tools/fillform.ts — review the in-place mutation of observeResults objects and consider adding an explanatory comment on the fillIndex invariant.

Important Files Changed

Filename Overview
packages/core/lib/v3/agent/tools/fillform.ts Adds fillIndex counter to map caller-provided field values onto fill results from observe(), overriding LLM-hallucinated arguments; also adds warnings for extra/missing fills. Minor concerns: the fillIndex alignment invariant is undocumented, and res.arguments is mutated in-place before being passed to recordAgentReplayStep.
packages/core/tests/unit/fillform-value-override.test.ts New unit test file covering value override, interleaved actions, empty string, non-fill passthrough, and extra fills. Missing a dedicated test that explicitly asserts the "fewer fills than provided fields" warning path.
.changeset/fix-fillform-value-hallucination.md Patch changeset entry for the bugfix; no issues.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant fillFormTool
    participant observe
    participant act
    participant recordAgentReplayStep

    Caller->>fillFormTool: execute({ fields: [{action, value}, ...] })
    Note over fillFormTool: Build instruction from action strings only<br/>(values intentionally excluded to avoid LLM override)
    fillFormTool->>observe: observe(instruction)
    observe-->>fillFormTool: observeResults (may contain hallucinated fill values)

    Note over fillFormTool: fillIndex = 0
    loop for each res in observeResults
        alt res.method === "fill"
            alt fillIndex < fields.length
                Note over fillFormTool: Override res.arguments with<br/>fields[fillIndex].value (caller-provided)
                Note over fillFormTool: fillIndex++
                fillFormTool->>act: act(res with real value)
                act-->>fillFormTool: actResult
            else fillIndex >= fields.length
                Note over fillFormTool: WARN: more fills than fields → skip (continue)
            end
        else non-fill (click, etc.)
            fillFormTool->>act: act(res unchanged)
            act-->>fillFormTool: actResult
        end
    end

    alt fillIndex < fields.length
        Note over fillFormTool: WARN: fewer fills than fields provided
    end

    fillFormTool->>recordAgentReplayStep: record(observeResults with overridden args)
    fillFormTool-->>Caller: { success, actions, playwrightArguments }
Loading

Comments Outside Diff (3)

  1. packages/core/lib/v3/agent/tools/fillform.ts, line 59-73 (link)

    Add a comment explaining the fillIndex alignment invariant

    The fillIndex counter relies on a non-obvious contract between fields and observeResults: every entry in fields is assumed to correspond to a fill action from observe(), and any non-fill results returned by observe() (e.g. click-to-focus) are treated as interstitial helper steps that do not consume a field slot. Without a comment, future readers could easily mistake this counter for a simple loop index or wonder why non-fill methods don't increment it.

    Per the project's commenting guideline, a brief explanatory comment here would significantly aid readability:

    Rule Used: Add detailed comments for longer PRs that aren't o... (source)

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. packages/core/tests/unit/fillform-value-override.test.ts, line 158-190 (link)

    Missing dedicated test for the "fewer fills" warning

    The PR description explicitly calls out "Warns when observe() returns fewer fills than fields, so dropped values are visible in logs" as a shipped feature, and the corresponding post-loop warning is implemented in fillform.ts (lines 84–90). However, no test in the suite explicitly asserts this warning path.

    The closest existing coverage is the "does not override non-fill methods" test which incidentally triggers the warning (observe returns a click but no fill, so fillIndex (0) < fields.length (1)), but that test doesn't assert on the logger call and is primarily verifying unrelated behaviour.

    Consider adding a dedicated test case:

    it("warns when observe returns fewer fill actions than provided fields", async () => {
      const v3 = createMockV3([
        { method: "fill", arguments: ["hal1"], description: "email" },
        // observe only returns one fill, but two fields were provided
      ]);
    
      const tool = fillFormTool(v3);
      await tool.execute!(
        {
          fields: [
            { action: "type email", value: "real@email.com" },
            { action: "type password", value: "realPass" },
          ],
        },
        toolCtx,
      );
    
      expect(v3.actCalls).toHaveLength(1);
      expect(v3.logger).toHaveBeenCalledWith(
        expect.objectContaining({
          category: "agent",
          message: expect.stringContaining("fewer fill actions"),
        }),
      );
    });
  3. packages/core/lib/v3/agent/tools/fillform.ts, line 63 (link)

    In-place mutation of observeResults objects affects replay fidelity

    res.arguments is mutated directly on the object returned by observe(). Because observeResults is later passed by reference to recordAgentReplayStep, the replay step will record the caller-provided values rather than what the LLM originally observed. For replay this is the right data (you want to replay what was actually typed), but it silently destroys the original LLM output — making it impossible to later diff "what the LLM hallucinated" vs "what was actually used" from the recorded step.

    Consider spreading to avoid the side-effect if observability of the original LLM suggestions is important:

    Or alternatively assign to a local copy before the loop and pass that to act, preserving the original observeResults for the replay record. This depends on whether the replay consumer expects the raw or overridden values.

Last reviewed commit: 86eba36

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 3 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant C as Caller / User
    participant FT as fillForm Tool
    participant V3 as Agent Core (V3)
    participant LLM as LLM (Observer)
    participant B as Browser / DOM

    C->>FT: execute(fields: {action, value}[])
    
    Note over FT,V3: Prepare observation strings from fields[].action
    FT->>V3: observe(actionStrings)
    V3->>LLM: Analyze DOM + actions
    LLM-->>V3: Return Action list (e.g. click, fill)
    Note right of LLM: Hallucination Risk:<br/>LLM provides dummy values<br/>for 'fill' arguments.
    V3-->>FT: observeResults[]

    loop For each result in observeResults
        alt NEW: result.method === "fill"
            FT->>FT: CHANGED: Map result to fields[fillIndex]
            opt fields[fillIndex].value !== undefined
                FT->>FT: NEW: Override hallucinated arg<br/>with caller-provided value
            end
            FT->>FT: NEW: Increment fillIndex
        else Non-fill method (e.g. click)
            FT->>FT: Keep original arguments
        end

        FT->>V3: act(method, correctedArgs)
        V3->>B: Perform CDP/Playwright action
        B-->>V3: Success/Failure
        V3-->>FT: Action result
    end

    FT-->>C: Returns completed actions
Loading

When observe() returns more fill actions than fields provided (e.g. LLM
invents a "confirm password" fill), skip the extra fills instead of
silently typing hallucinated values. Logs a warning so callers know.

Also renames misleading test name per review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shrey150
Copy link
Contributor Author

@greptileai

shrey150 and others added 4 commits March 10, 2026 17:24
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shrey150
Copy link
Contributor Author

@greptileai

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shrey150
Copy link
Contributor Author

@greptileai

@shrey150 shrey150 changed the title fix: use caller-provided values in fillForm instead of LLM-hallucinated arguments fix(agent): fillForm tool drops caller-provided values for LLM-hallucinated ones Mar 11, 2026
@shrey150
Copy link
Contributor Author

@greptileai

Comment on lines 60 to +73
for (const res of observeResults) {
if (res.method === "fill") {
if (fillIndex < fields.length) {
res.arguments = [fields[fillIndex].value];
fillIndex++;
} else {
v3.logger({
category: "agent",
message: `fillForm: observe returned more fill actions than provided fields (${fields.length}); skipping extra fill`,
level: 1,
});
continue;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes agent will provide the values in same order as the observe results are returned in .This likely will not be consistently true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fillForm agent tool drops value parameter, causing LLM to hallucinate fill values

2 participants