22 changes: 19 additions & 3 deletions packages/core/lib/v3/agent/tools/fillform.ts
@@ -4,6 +4,7 @@ import type { V3 } from "../../v3.js";
import type { Action } from "../../types/public/methods.js";
import type { AgentModelConfig, Variables } from "../../types/public/agent.js";
import { TimeoutError } from "../../types/public/sdkErrors.js";
import { substituteVariables } from "../utils/variables.js";

export const fillFormTool = (
v3: V3,
@@ -12,18 +13,22 @@ export const fillFormTool = (
toolTimeout?: number,
) => {
const hasVariables = variables && Object.keys(variables).length > 0;
const valueDescription = hasVariables
? `The exact text to type into the field. Use %variableName% to substitute a variable value. Available: ${Object.keys(variables).join(", ")}`
: "The exact text to type into the field";
const actionDescription = hasVariables
? `Must follow the pattern: "type <exact value> into the <field name> <fieldType>". Use %variableName% to substitute a variable value. Available: ${Object.keys(variables).join(", ")}. Examples: "type %email% into the email input", "type %password% into the password input"`
: 'Must follow the pattern: "type <exact value> into the <field name> <fieldType>". Examples: "type john@example.com into the email input", "type John into the first name input"';
? `Describe which field to target, e.g. "type into the email input", "type into the password field". Use %variableName% to substitute a variable value. Available: ${Object.keys(variables).join(", ")}. Example: "type %email% into the email input"`
: 'Describe which field to target, e.g. "type into the email input", "type into the first name input"';
Comment on lines 19 to +21

P2 Variable token in action description is misleading

The actionDescription for the hasVariables case still mentions the %variableName% syntax and includes the example "type %email% into the email input". This suggests putting variable tokens in the action field, but the action string is used only to build the observe() instruction, and substituteVariables() is never applied to it. Any %variableName% token placed there would therefore be passed verbatim to the observe LLM and never resolved.

Since the intent of this PR is to separate element targeting (action) from value typing (value), the action description should be cleaned up to remove the variable-substitution guidance entirely:

Suggested change
const actionDescription = hasVariables
? `Must follow the pattern: "type <exact value> into the <field name> <fieldType>". Use %variableName% to substitute a variable value. Available: ${Object.keys(variables).join(", ")}. Examples: "type %email% into the email input", "type %password% into the password input"`
: 'Must follow the pattern: "type <exact value> into the <field name> <fieldType>". Examples: "type john@example.com into the email input", "type John into the first name input"';
? `Describe which field to target, e.g. "type into the email input", "type into the password field". Use %variableName% to substitute a variable value. Available: ${Object.keys(variables).join(", ")}. Example: "type %email% into the email input"`
: 'Describe which field to target, e.g. "type into the email input", "type into the first name input"';
? `Describe which field to target, e.g. "the email input", "the password field".`
: 'Describe which field to target, e.g. "the email input", "the first name input"';
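
The point of the comment above can be illustrated with a minimal sketch. It assumes a simplified `Record<string, string>` variable map and a hypothetical `substitutePercentTokens` helper; the real `substituteVariables()` in `../utils/variables.js` is not shown in this diff and may behave differently.

```typescript
// Hypothetical stand-in for substituteVariables(); the real helper in
// ../utils/variables.js may handle richer variable shapes.
type SimpleVariables = Record<string, string>;

function substitutePercentTokens(text: string, vars?: SimpleVariables): string {
  if (!vars) return text;
  // Replace each %token% whose key is a known variable; leave unknown tokens untouched.
  return text.replace(/%([A-Za-z_][A-Za-z0-9_]*)%/g, (token: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? String(vars[key]) : token,
  );
}

const sketchVariables: SimpleVariables = { email: "user@example.com" };
const sketchField = { action: "type %email% into the email input", value: "%email%" };

// Only the value goes through substitution; the action is forwarded as written,
// mirroring how the diff builds the observe() instruction from the action string.
const resolvedValue = substitutePercentTokens(sketchField.value, sketchVariables);
const instructionFragment = sketchField.action;

console.log(resolvedValue); // user@example.com
console.log(instructionFragment); // type %email% into the email input
```

Under these assumptions the token in `action` survives unchanged, which is why the description should stop advertising it there.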


return tool({
description:
'FORM FILL - MULTI-FIELD INPUT TOOL\nFill 2+ form inputs/textareas at once. Each action MUST include the exact text to type and the target field, e.g. "type john@example.com into the email field".',
'FORM FILL - MULTI-FIELD INPUT TOOL\nFill 2+ form inputs/textareas at once. Each field requires an action describing the target element and a value with the text to type.',
inputSchema: z.object({
fields: z
.array(
z.object({
action: z.string().describe(actionDescription),
value: z.string().describe(valueDescription),
}),
)
.min(1, "Provide at least one field to fill"),
@@ -50,6 +55,17 @@ export const fillFormTool = (
: { timeout: toolTimeout };
const observeResults = await v3.observe(instruction, observeOptions);

// Override observe results with the actual values provided by the agent.
// The LLM used by observe() may hallucinate placeholder values instead of
// using the intended text, so we inject the real values before calling act().
for (let i = 0; i < observeResults.length && i < fields.length; i++) {
const res = observeResults[i];
if (res.method === "fill" && res.arguments && res.arguments.length > 0) {
const actualValue = substituteVariables(fields[i].value, variables);
res.arguments[0] = actualValue;
}
}
Comment on lines +61 to +67

P2 Value/field index misalignment risk

The loop assumes observeResults[i] corresponds to fields[i] in a 1-to-1, order-preserving way. However, observe() delegates to an LLM, which processes the combined instruction string and returns actions in whatever order it infers (typically DOM/accessibility-tree order, not declaration order). If the LLM returns results in a different sequence, or returns fewer results because one element wasn't found, the wrong value will be injected into the wrong field.

A concrete example: given fields = [{action: "email input", value: "user@example.com"}, {action: "password input", value: "s3cr3t"}], if the password field appears before the email field in the DOM, the observe LLM may return [{...password element...}, {...email element...}]. The loop would then do:

  • observeResults[0] (password element) ← fields[0].value = "user@example.com"
  • observeResults[1] (email element) ← fields[1].value = "s3cr3t"

In the old code the value was embedded in the action string so each observe result already carried its own value regardless of ordering. The new separation of value from action makes the ordering dependency a correctness issue.
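
The swap described in this example can be reproduced with a small sketch. The `DemoField` and `DemoResult` types are simplified, hypothetical shapes, and `injectByIndex` is a non-mutating restatement of the positional loop from the diff, not the PR's actual code.

```typescript
// Hypothetical, simplified shapes; the real observe() results carry more fields.
type DemoField = { action: string; value: string };
type DemoResult = { description: string; method: string; arguments?: string[] };

// Non-mutating restatement of the positional override loop from the diff.
function injectByIndex(results: DemoResult[], fields: DemoField[]): DemoResult[] {
  return results.map((res, i) => {
    const field = fields[i];
    if (field && res.method === "fill" && res.arguments && res.arguments.length > 0) {
      return { ...res, arguments: [field.value, ...res.arguments.slice(1)] };
    }
    return res;
  });
}

const demoFields: DemoField[] = [
  { action: "email input", value: "user@example.com" },
  { action: "password input", value: "s3cr3t" },
];

// Assumed observe() output in DOM order: password first, then email.
const domOrderResults: DemoResult[] = [
  { description: "password input", method: "fill", arguments: ["<placeholder>"] },
  { description: "email input", method: "fill", arguments: ["<placeholder>"] },
];

const injected = injectByIndex(domOrderResults, demoFields);
for (const res of injected) {
  console.log(`${res.description} <- ${res.arguments?.[0]}`);
}
// password input <- user@example.com
// email input <- s3cr3t
```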

A safer approach would be to make one observe() call per field (matching each result directly to its value), or to enrich the instruction with a unique tag per field and parse that tag back from the observe results to reconstruct the mapping.
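
A rough sketch of the first option, one observe() call per field, might look like the following. The `PerFieldInput`, `ObservedAction`, and `ObserveFn` types and the `stubObserve` function are hypothetical stand-ins for illustration; the real `v3.observe()` takes options and returns richer `Action` objects, and variable substitution on the value is omitted here.

```typescript
// Hypothetical shapes for illustration; the real Action type lives in
// ../../types/public/methods.js and may carry more fields.
type PerFieldInput = { action: string; value: string };
type ObservedAction = { selector: string; method: string; arguments?: string[] };
type ObserveFn = (instruction: string) => Promise<ObservedAction[]>;

// Observe each field separately so every result is paired with its own value.
async function observePerField(
  fields: PerFieldInput[],
  observe: ObserveFn,
): Promise<ObservedAction[]> {
  const paired: ObservedAction[] = [];
  for (const field of fields) {
    const results = await observe(field.action);
    const match = results.find((r) => r.method === "fill");
    if (!match) continue; // element not found: skip rather than shift later values
    paired.push({ ...match, arguments: [field.value] });
  }
  return paired;
}

// Stub observer standing in for v3.observe(); it resolves by keyword only.
const stubObserve: ObserveFn = async (instruction) =>
  instruction.includes("password")
    ? [{ selector: "#password", method: "fill", arguments: ["<placeholder>"] }]
    : instruction.includes("email")
      ? [{ selector: "#email", method: "fill", arguments: ["<placeholder>"] }]
      : [];

const perFieldDemo = observePerField(
  [
    { action: "the email input", value: "user@example.com" },
    { action: "the phone input", value: "555-0100" }, // not found by the stub
    { action: "the password input", value: "s3cr3t" },
  ],
  stubObserve,
);
perFieldDemo.then((actions) => console.log(actions));
```

In this sketch a missing element drops only its own value instead of shifting every later value by one position. The cost is one observe() round trip per field, which trades latency for a mapping that does not depend on result order.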


const completed = [] as unknown[];
const replayableActions: Action[] = [];
for (const res of observeResults) {