fix(agent): fillForm tool drops caller-provided values for LLM-hallucinated ones #1807
…ed arguments

The fillForm tool was passing only field actions to observe(), causing the LLM to hallucinate placeholder values (e.g. "test@example.com") instead of using the actual values provided by the caller. This broke login forms, 2FA flows, and any workflow where specific values matter.

Uses a fillIndex counter to correctly align fill results with field values, handling interleaved non-fill actions. Also uses !== undefined instead of a truthiness check so empty string values (for clearing fields) work correctly.

Fixes #1789

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
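The two points in this commit message (positional alignment via a counter, and `!== undefined` rather than a truthiness check) can be illustrated with a minimal sketch. The names `ObservedAction`, `FormField`, and `overrideFillValues` are hypothetical and are not taken from the PR; only the `fillIndex` counter and the `method` / `arguments` / `value` fields mirror what the message describes.

```typescript
// Hypothetical shapes for illustration; not the library's actual types.
interface ObservedAction {
  method: string;
  arguments: string[];
}

interface FormField {
  action: string;
  value?: string;
}

// Pair each "fill" result with the next caller-provided field, stepping
// over interleaved non-fill actions such as a click-to-focus.
function overrideFillValues(
  results: ObservedAction[],
  fields: FormField[],
): ObservedAction[] {
  let fillIndex = 0;
  return results.map((res) => {
    if (res.method !== "fill" || fillIndex >= fields.length) return res;
    const value = fields[fillIndex].value;
    fillIndex++;
    // `!== undefined` keeps "" (clear the field); `if (value)` would drop it.
    return value !== undefined ? { ...res, arguments: [value] } : res;
  });
}

const out = overrideFillValues(
  [
    { method: "click", arguments: [] },
    { method: "fill", arguments: ["test@example.com"] }, // hallucinated
    { method: "fill", arguments: ["placeholder"] }, // hallucinated
  ],
  [
    { action: "type the email", value: "real@user.dev" },
    { action: "clear the nickname field", value: "" },
  ],
);
console.log(out.map((r) => r.arguments));
```

With a truthiness check in place of `value !== undefined`, the second fill would keep `"placeholder"` because `""` is falsy, which is the clearing case the commit message calls out.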
🦋 Changeset detected
Latest commit: 86eba36
The changes in this PR will be included in the next version bump. This PR includes changesets to release 5 packages.
Greptile Summary
This PR fixes a bug where the
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Caller
participant fillFormTool
participant observe
participant act
participant recordAgentReplayStep
Caller->>fillFormTool: execute({ fields: [{action, value}, ...] })
Note over fillFormTool: Build instruction from action strings only<br/>(values intentionally excluded to avoid LLM override)
fillFormTool->>observe: observe(instruction)
observe-->>fillFormTool: observeResults (may contain hallucinated fill values)
Note over fillFormTool: fillIndex = 0
loop for each res in observeResults
alt res.method === "fill"
alt fillIndex < fields.length
Note over fillFormTool: Override res.arguments with<br/>fields[fillIndex].value (caller-provided)
Note over fillFormTool: fillIndex++
fillFormTool->>act: act(res with real value)
act-->>fillFormTool: actResult
else fillIndex >= fields.length
Note over fillFormTool: WARN: more fills than fields → skip (continue)
end
else non-fill (click, etc.)
fillFormTool->>act: act(res unchanged)
act-->>fillFormTool: actResult
end
end
alt fillIndex < fields.length
Note over fillFormTool: WARN: fewer fills than fields provided
end
fillFormTool->>recordAgentReplayStep: record(observeResults with overridden args)
fillFormTool-->>Caller: { success, actions, playwrightArguments }
No issues found across 3 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant C as Caller / User
participant FT as fillForm Tool
participant V3 as Agent Core (V3)
participant LLM as LLM (Observer)
participant B as Browser / DOM
C->>FT: execute(fields: {action, value}[])
Note over FT,V3: Prepare observation strings from fields[].action
FT->>V3: observe(actionStrings)
V3->>LLM: Analyze DOM + actions
LLM-->>V3: Return Action list (e.g. click, fill)
Note right of LLM: Hallucination Risk:<br/>LLM provides dummy values<br/>for 'fill' arguments.
V3-->>FT: observeResults[]
loop For each result in observeResults
alt NEW: result.method === "fill"
FT->>FT: CHANGED: Map result to fields[fillIndex]
opt fields[fillIndex].value !== undefined
FT->>FT: NEW: Override hallucinated arg<br/>with caller-provided value
end
FT->>FT: NEW: Increment fillIndex
else Non-fill method (e.g. click)
FT->>FT: Keep original arguments
end
FT->>V3: act(method, correctedArgs)
V3->>B: Perform CDP/Playwright action
B-->>V3: Success/Failure
V3-->>FT: Action result
end
FT-->>C: Returns completed actions
When observe() returns more fill actions than fields provided (e.g. LLM invents a "confirm password" fill), skip the extra fills instead of silently typing hallucinated values. Logs a warning so callers know. Also renames misleading test name per review feedback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
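A minimal sketch of the skip-and-warn behaviour described in this commit, assuming hypothetical names (`ObservedAction`, `FormField`, `alignFills`) and collecting warnings in an array rather than calling the project's logger. It also includes the "fewer fills than fields" warning shown in the sequence diagram earlier on this page.

```typescript
// Hypothetical shapes for illustration; not the library's actual types.
interface ObservedAction {
  method: string;
  arguments: string[];
}

interface FormField {
  action: string;
  value: string;
}

function alignFills(results: ObservedAction[], fields: FormField[]) {
  const actions: ObservedAction[] = [];
  const warnings: string[] = [];
  let fillIndex = 0;

  for (const res of results) {
    if (res.method !== "fill") {
      actions.push(res); // clicks etc. pass through unchanged
      continue;
    }
    if (fillIndex >= fields.length) {
      // An extra fill has no caller value, so it is dropped, not typed.
      warnings.push(
        `more fill actions than provided fields (${fields.length}); skipping extra fill`,
      );
      continue;
    }
    actions.push({ ...res, arguments: [fields[fillIndex].value] });
    fillIndex++;
  }

  if (fillIndex < fields.length) {
    warnings.push(`only ${fillIndex} of ${fields.length} fields were filled`);
  }
  return { actions, warnings };
}

// The LLM invents a second "confirm password" fill the caller never asked for.
const extra = alignFills(
  [
    { method: "fill", arguments: ["hunter2"] },
    { method: "fill", arguments: ["hunter2"] },
  ],
  [{ action: "type the password", value: "s3cret" }],
);
console.log(extra.actions.length, extra.warnings.length);
```

Dropping the extra fill trades completeness for safety: a field may stay empty, but no invented value reaches the page, and the warning makes the mismatch visible.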
for (const res of observeResults) {
  if (res.method === "fill") {
    if (fillIndex < fields.length) {
      res.arguments = [fields[fillIndex].value];
      fillIndex++;
    } else {
      v3.logger({
        category: "agent",
        message: `fillForm: observe returned more fill actions than provided fields (${fields.length}); skipping extra fill`,
        level: 1,
      });
      continue;
    }
  }
This assumes the agent will provide the values in the same order as the observe results are returned. That is unlikely to hold consistently.
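The concern can be made concrete with a small sketch. It assumes, hypothetically, that `observe()` returns the password fill before the email fill while the caller listed the email first. All names here (`ObservedAction`, `FormField`, `alignByPosition`, and the `description` field) are illustrative and not taken from the PR.

```typescript
// Hypothetical shapes for illustration; not the library's actual types.
interface ObservedAction {
  method: string;
  description: string;
  arguments: string[];
}

interface FormField {
  action: string;
  value: string;
}

// Positional alignment: the n-th fill result receives the n-th field value.
function alignByPosition(
  results: ObservedAction[],
  fields: FormField[],
): ObservedAction[] {
  let fillIndex = 0;
  return results.map((res) => {
    if (res.method !== "fill" || fillIndex >= fields.length) return res;
    return { ...res, arguments: [fields[fillIndex++].value] };
  });
}

const fields: FormField[] = [
  { action: "type email into the email field", value: "real@user.dev" },
  { action: "type password into the password field", value: "s3cret" },
];

// Assumed observe() output, in the opposite order from `fields`.
const observed: ObservedAction[] = [
  { method: "fill", description: "password input", arguments: ["x"] },
  { method: "fill", description: "email input", arguments: ["y"] },
];

const aligned = alignByPosition(observed, fields);
for (const a of aligned) {
  console.log(`${a.description} <- ${a.arguments[0]}`);
}
// password input <- real@user.dev
// email input <- s3cret
```

Under that assumed ordering, each caller value lands in the wrong element, and nothing in a purely positional scheme would detect it.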
Summary
Fixes #1789 — supersedes #1805 with a more robust implementation.
Based on the original fix by @elliotllliu in #1805 — thank you for identifying and diagnosing this bug!
The `fillForm` tool was passing only field `action` strings to `observe()`, causing the LLM to hallucinate placeholder values (e.g. `test@example.com`) instead of using the actual `value` provided by the caller. This broke login forms, 2FA flows, search queries, and any workflow where specific values matter.

What this PR does
- Uses a `fillIndex` counter to align `fill` results from `observe()` with caller-provided field values, correctly handling interleaved non-fill actions (e.g. click-to-focus)
- Skips extra fills when `observe()` returns more fill actions than fields provided, with a warning log
- Warns when `observe()` returns fewer fills than fields, so dropped values are visible in logs

Changes
- `packages/core/lib/v3/agent/tools/fillform.ts`
- `packages/core/tests/unit/fillform-value-override.test.ts`
- `.changeset/fix-fillform-value-hallucination.md`

Test plan
- `value: ""` correctly overrides (fails on main, passes here)
- `agent-execution-model` tests pass

🤖 Generated with Claude Code