
First letter is dropped when typing with cdp_session.cdp_client.send.Input.dispatchKeyEvent #18

@aybjax

Description


Background
When the browser-use agent types into the X.com post and comment boxes using cdp-use commands, the first letter of the text is omitted every time.

Example use
For example, here is the code and what it produces:

```python
import asyncio

# Fragment from an async method of the agent's typing action:
# `self` and `cdp_session` come from that enclosing context.
text = "This is super interesting! I've seen similar models struggle with nuanced emotional expressions. How do you handle those subtle facial cues in your audio-to-talking head tech? Keep up the great work!"

for i, char in enumerate(text):
	# Handle regular characters
	# Get proper modifiers, VK code, and base key for the character
	modifiers, vk_code, base_key = self._get_char_modifiers_and_vk(char)
	key_code = self._get_key_code_for_char(base_key)

	# self.logger.debug(f'🎯 Typing character {i + 1}/{len(text)}: "{char}" (base_key: {base_key}, code: {key_code}, modifiers: {modifiers}, vk: {vk_code})')

	# Step 1: Send keyDown event (NO text parameter)
	res1 = await cdp_session.cdp_client.send.Input.dispatchKeyEvent(
		params={
			'type': 'keyDown',
			'key': base_key,
			'code': key_code,
			'modifiers': modifiers,
			'windowsVirtualKeyCode': vk_code,
		},
		session_id=cdp_session.session_id,
	)
	print(res1)

	# Short pause between keyDown and char to emulate human typing
	await asyncio.sleep(0.005)

	# Step 2: Send char event (WITH text parameter); this is what inserts the text
	res2 = await cdp_session.cdp_client.send.Input.dispatchKeyEvent(
		params={
			'type': 'char',
			'text': char,
			'key': char,
		},
		session_id=cdp_session.session_id,
	)
	print(res2)

	# Step 3: Send keyUp event (NO text parameter)
	res3 = await cdp_session.cdp_client.send.Input.dispatchKeyEvent(
		params={
			'type': 'keyUp',
			'key': base_key,
			'code': key_code,
			'modifiers': modifiers,
			'windowsVirtualKeyCode': vk_code,
		},
		session_id=cdp_session.session_id,
	)
	print(res3)

	# Small delay between characters (inside the loop) for a realistic typing speed
	await asyncio.sleep(0.001)
```

Every call returns an empty dict ({}) for res1, res2 and res3 alike, so CDP reports no error. The result is as follows:
[Screenshot of the resulting comment]
The expected text in the comment section was: This is super interesting! I've seen similar models struggle with nuanced emotional expressions. How do you handle those subtle facial cues in your audio-to-talking head tech? Keep up the great work!
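To check that the script itself sends the first character exactly like every other one, the loop can be reduced to a sketch with the CDP call injected as a coroutine. This is an illustration under assumptions, not the browser-use implementation: `key_events_for`, `type_text` and `fake_send` are hypothetical names, and `key` is simplified to the character itself instead of the `_get_char_modifiers_and_vk` lookup (no `code`, `modifiers` or `windowsVirtualKeyCode`). The stand-in sender records the params and returns `{}`, mirroring the empty responses above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical stand-in for the real call:
# cdp_session.cdp_client.send.Input.dispatchKeyEvent(params=..., session_id=...)
Sender = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


def key_events_for(char: str) -> List[Dict[str, Any]]:
    """Return the keyDown / char / keyUp params for one character, in order.

    Simplification (assumption): 'key' is the character itself; the original
    code also derives 'code', 'modifiers' and 'windowsVirtualKeyCode'.
    """
    return [
        {'type': 'keyDown', 'key': char},
        {'type': 'char', 'text': char, 'key': char},
        {'type': 'keyUp', 'key': char},
    ]


async def type_text(send: Sender, text: str, delay: float = 0.001) -> List[Dict[str, Any]]:
    """Dispatch every character of `text` and collect the responses."""
    responses: List[Dict[str, Any]] = []
    for char in text:
        for params in key_events_for(char):
            responses.append(await send(params))
        # Pause between characters, inside the loop
        await asyncio.sleep(delay)
    return responses


async def main() -> None:
    sent: List[Dict[str, Any]] = []

    async def fake_send(params: Dict[str, Any]) -> Dict[str, Any]:
        # Record what would go over the wire; an empty dict mirrors the {} responses
        sent.append(params)
        return {}

    responses = await type_text(fake_send, 'This', delay=0)
    # The first character gets the same three events as every other one
    print(sent[:3])
    print(len(responses), all(r == {} for r in responses))  # 12 True


if __name__ == '__main__':
    asyncio.run(main())
```

With the real client, `send` would wrap `cdp_session.cdp_client.send.Input.dispatchKeyEvent(params=..., session_id=cdp_session.session_id)`. If the recorded events for the first character match those for the rest, the loss is more likely happening inside the page's editor than in the dispatch loop.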

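HTML where it fails
Below is the HTML of the element where the CDP commands fail. The problem does not occur in <textarea> or <input> elements, only in this contenteditable editor on X.com (a Draft.js editor, judging by the public-DraftEditor-content class):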

```html
<div aria-activedescendant="typeaheadFocus-0.4489554008763781" aria-autocomplete="list"
  aria-controls="typeaheadDropdownWrapped-2" aria-describedby="placeholder-1u59i" aria-label="Post text"
  aria-multiline="true" class="notranslate public-DraftEditor-content" contenteditable="true"
  data-testid="tweetTextarea_0" role="textbox" spellcheck="true" tabindex="0" no-focustrapview-refocus="true"
  style="outline: none; user-select: text; white-space: pre-wrap; overflow-wrap: break-word;">
  <div data-contents="true">
    <div class="" data-block="true" data-editor="1u59i" data-offset-key="d0o8i-0-0">
      <div data-offset-key="d0o8i-0-0" class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr"><span
          data-offset-key="d0o8i-0-0"><br data-text="true"></span></div>
    </div>
  </div>
</div>
```
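Because every CDP response is `{}`, the dropped letter only shows up in the page itself. A minimal sketch for confirming it programmatically, assuming the editor is the element with `data-testid="tweetTextarea_0"` shown above: build params for CDP `Runtime.evaluate` that read the editor's `innerText`, then compare the returned string with the intended text. `readback_params` and `dropped_prefix` are hypothetical helper names, and relying on `innerText` to reflect what Draft.js rendered is an assumption.

```python
import json
from typing import Any, Dict, Optional


def readback_params(testid: str = 'tweetTextarea_0') -> Dict[str, Any]:
    """Build CDP Runtime.evaluate params that read the editor's rendered text.

    Assumption: the editor is the element with this data-testid (as in the HTML
    above) and its innerText reflects what Draft.js rendered.
    """
    selector = f'[data-testid="{testid}"]'
    # json.dumps yields a valid, quoted JavaScript string literal for the selector
    expression = f'document.querySelector({json.dumps(selector)}).innerText'
    return {'expression': expression, 'returnByValue': True}


def dropped_prefix(expected: str, actual: str) -> Optional[str]:
    """Return the leading characters missing from `actual`.

    '' means the text matches; None means the difference is not just a lost prefix.
    """
    actual = actual.strip()  # innerText may carry a trailing newline
    if actual == expected:
        return ''
    if actual and expected.endswith(actual):
        return expected[: len(expected) - len(actual)]
    return None


if __name__ == '__main__':
    print(readback_params()['expression'])
    print(repr(dropped_prefix('This is super interesting!', 'his is super interesting!')))  # 'T'
```

The params could be sent with something like `cdp_session.cdp_client.send.Runtime.evaluate(params=readback_params(), session_id=cdp_session.session_id)` (assumed to mirror the `Input` call style above), with the string found at `result.value` in the response passed to `dropped_prefix`.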

P.S.
See the related issue in the browser-use repo.
