TKasperczyk/gemini-browser


Gemini 2.5 Browser Pilot

Demo of the Gemini 2.5 Browser Pilot in action

Python 3.10+ · FastAPI powered · Playwright async · Svelte 5 · pnpm ready

AI-assisted browser automation you can supervise in real time.

This project wires Google's Gemini 2.5 Computer Use preview model to a real Chromium browser that you can watch and interrupt while it works. The agent gets a headful window, you get a live chat console, and either of you can take the wheel at any time.

The goal here is simple: give humans an honest look at what the preview model can do while keeping supervision front and center.


What you get

  • A FastAPI backend that exposes a WebSocket bridge for Gemini and serves the compiled Svelte UI.
  • A modern dashboard built with Svelte 5 plus Tailwind that shows messages, confirmations, screenshots, and action history.
  • A small action library on top of Playwright that handles clicks, drags, scrolling, navigation, text entry, and downloads.
  • Safety rails that pause the agent for confirmation when something risky is requested.

What you need before running it

  • Python 3.10 or newer
  • pnpm
  • A Gemini API key with access to gemini-2.5-computer-use-preview-10-2025

The backend talks to Gemini strictly through environment variables. Set your key with:

export GEMINI_API_KEY=your_key_here
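
A missing key is easier to diagnose before the server starts than on the first Gemini call. The following is a minimal sketch of such a pre-flight check, not code from this repository; the helper name require_api_key is hypothetical, and only the GEMINI_API_KEY variable name comes from this README.

```python
import os


def require_api_key(env=None):
    """Return the Gemini API key, or raise a clear error if it is missing.

    Hypothetical helper: the project may validate the key differently.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it before launching")
    return key
```

Calling require_api_key() with no arguments reads the real process environment; passing a dict makes the check easy to exercise in a test.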

Getting started the easy way

The helper script manage.py hides the usual wall of setup commands:

python manage.py launch

That one line will:

  1. Create (or reuse) .venv and install requirements.txt.
  2. Run pnpm install --frozen-lockfile inside ui/.
  3. Build the Svelte bundle if anything changed since the last build.
  4. Start Uvicorn on http://localhost:8000.

Open that URL in a browser and you'll see the dashboard. A few seconds later a Chromium window opens alongside it: the agent's playground, which you can also interact with directly.


Day-to-day commands

All of these are run as python manage.py <command>:

  • bootstrap — install backend and UI dependencies without starting anything.
  • build — ensure the UI bundle is up to date (skips work when sources are unchanged).
  • launch — bootstrap, build if needed, then start Uvicorn (what we used above).
  • dev — like launch, but leaves Uvicorn in --reload mode for backend development.
  • rebuild-ui — force a fresh pnpm run build.
  • clean — remove the virtualenv, cached install stamps, and the UI build directory.
  • clean-profile — wipe the persisted Chromium profile if you want a fresh browser state.

Under the hood the script drops timestamp files (.pip-installed, .pnpm-installed, .build-stamp) so it can tell when a reinstall or rebuild is truly necessary.
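
The stamp-file idea can be sketched in a few lines. This is an illustration of the general technique only, assuming a simple modification-time comparison; manage.py may decide differently (for example by hashing files), and the function names needs_refresh and mark_done are hypothetical.

```python
from pathlib import Path


def needs_refresh(stamp: Path, sources: list[Path]) -> bool:
    """True when the stamp is missing or older than any existing source file."""
    if not stamp.exists():
        return True
    stamp_mtime = stamp.stat().st_mtime
    return any(
        src.exists() and src.stat().st_mtime > stamp_mtime for src in sources
    )


def mark_done(stamp: Path) -> None:
    """Create the stamp file, or bump its modification time if it exists."""
    stamp.touch()
```

A caller would check needs_refresh(Path(".pip-installed"), [Path("requirements.txt")]) before reinstalling, then call mark_done on the same stamp once the install succeeds.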

Prefer running things by hand? Create and activate .venv, install requirements.txt with pip, run pnpm install once inside ui/, build the UI with pnpm run build, and then launch with:

uvicorn app.main:app --reload

Frontend workflow

The UI code lives under ui/.

cd ui
pnpm install          # once
pnpm run dev          # Svelte dev server at http://localhost:5173

Leave the FastAPI process running so the dev UI can reach the /ws WebSocket. When you're ready for a production bundle, run pnpm run build (the output lands in ui/build/ and is what the Python app serves).

Handy scripts:

  • pnpm run guard:derived — catches invalid $derived usage in Svelte reactivity.
  • pnpm run check — guard plus svelte-check.
  • pnpm run lint / pnpm run format — ESLint and Prettier with the project presets.

What a session looks like

  1. You type a goal in the chat panel—for example, “Find the latest blog post about the Gemini API.”
  2. Gemini starts issuing actions. Each appears in the action log with arguments and status.
  3. Potentially sensitive actions trigger a confirmation card; approve or deny to continue.
  4. The screenshot panel updates after every step so you can see what the browser is doing.
  5. If the agent gets stuck (CAPTCHA, paywall, missing info) it tells you instead of wandering off.

You can still click or type directly in the Chromium window at any time. Playwright and Gemini will adapt.
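
The confirmation step in the list above amounts to parking an action until a human answers. Here is a rough sketch of that pattern using asyncio futures; the ConfirmationGate class and its method names are hypothetical and are not taken from app/agent.py or app/session_worker.py.

```python
import asyncio


class ConfirmationGate:
    """Pauses a pending action until a human approves or denies it."""

    def __init__(self):
        # Maps an action id to the future the agent loop is waiting on.
        self._pending: dict[str, asyncio.Future] = {}

    async def wait_for_decision(self, action_id: str) -> bool:
        """Called by the agent loop; returns True if approved, False if denied."""
        future = asyncio.get_running_loop().create_future()
        self._pending[action_id] = future
        try:
            return await future
        finally:
            self._pending.pop(action_id, None)

    def resolve(self, action_id: str, approved: bool) -> None:
        """Called when the UI sends back the user's choice."""
        future = self._pending.get(action_id)
        if future is not None and not future.done():
            future.set_result(approved)
```

In this sketch the agent task awaits wait_for_decision while a WebSocket handler calls resolve, so the browser stays idle until the card is answered.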


Configuration knobs

Everything is driven by environment variables; sensible defaults are baked in if you leave them unset.

Browser window

GEMINI_BROWSER_WIDTH=1440
GEMINI_BROWSER_HEIGHT=900
GEMINI_BROWSER_PROFILE=~/.gemini-browser/profile
GEMINI_BROWSER_DOWNLOADS=~/Downloads/gemini-browser

Timing

GEMINI_ACTION_DELAY=2.0          # default settle delay after actions
GEMINI_NAVIGATION_DELAY=2.0      # settle delay after page navigation
GEMINI_SCROLL_PAUSE=0.5          # wait after wheel events
GEMINI_DRAG_PREHOLD_DELAY=0.1    # pause before dragging
GEMINI_DRAG_POSTHOLD_DELAY=0.1   # pause after dropping
GEMINI_WAIT_ACTION_DURATION=5.0  # how long the wait action sleeps
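
For readers curious how such knobs are typically consumed, here is a small sketch of reading a few of the window and timing variables with fallbacks. It assumes the values listed in this README are the built-in defaults; the BrowserSettings class and load_settings function are hypothetical names, not the project's actual code.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BrowserSettings:
    width: int
    height: int
    profile_dir: Path
    action_delay: float
    navigation_delay: float


def load_settings(env=None) -> BrowserSettings:
    """Read settings from the environment, falling back to assumed defaults."""
    env = os.environ if env is None else env
    # "or" treats an unset variable and an empty string the same way.
    return BrowserSettings(
        width=int(env.get("GEMINI_BROWSER_WIDTH") or 1440),
        height=int(env.get("GEMINI_BROWSER_HEIGHT") or 900),
        profile_dir=Path(
            env.get("GEMINI_BROWSER_PROFILE") or "~/.gemini-browser/profile"
        ).expanduser(),
        action_delay=float(env.get("GEMINI_ACTION_DELAY") or 2.0),
        navigation_delay=float(env.get("GEMINI_NAVIGATION_DELAY") or 2.0),
    )
```
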

Agent

GEMINI_MODEL_NAME=gemini-2.5-computer-use-preview-10-2025
GEMINI_TURN_LIMIT=1000

# Safety filters:
#   GEMINI_DISABLE_SAFETY=true  (ask for filters off)
#   GEMINI_ENABLE_SAFETY=true   (force filters on)

If you request the filters to be disabled and the API refuses, the code automatically retries with defaults and logs the downgrade.
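
The fallback described above follows a common try-then-downgrade pattern. The sketch below shows that pattern in isolation; the SafetyConfigRejected exception, the send callable, and its safety parameter are all stand-ins invented for illustration and do not reflect the real Gemini client API or this project's implementation.

```python
import logging

logger = logging.getLogger("gemini_browser.safety")


class SafetyConfigRejected(Exception):
    """Hypothetical error standing in for the API refusing relaxed filters."""


def call_with_safety_fallback(send, request, relaxed_safety):
    """Try the relaxed safety settings first, then retry once with defaults.

    Returns (response, downgraded), where downgraded is True when the
    request only succeeded with the default filters.
    """
    try:
        return send(request, safety=relaxed_safety), False
    except SafetyConfigRejected as exc:
        logger.warning("Safety override refused (%s); retrying with defaults", exc)
        # safety=None means "let the API apply its default filters" here.
        return send(request, safety=None), True
```
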


Project map

  • app/main.py — FastAPI application serving the UI and WebSocket endpoint.
  • app/browser.py — launches and manages the shared Playwright browser context.
  • app/actions.py plus app/builtin_actions/* — the catalog of Playwright actions Gemini can call.
  • app/agent.py and app/session_worker.py — keep per-session state, confirmations, and Gemini turns.
  • app/utils.py, app/downloads.py, app/history.py — helpers for screenshots, downloads, and logging.
  • ui/ — Svelte workspace (source, scripts, static config, and build output).

Safety first

Gemini Computer Use is still a preview feature. Keep someone watching the screen and never hand it credentials you wouldn't paste into a chat window yourself.

  • The default browser profile lives at ~/.gemini-browser/profile. Use that isolated space or point the env var at another throwaway directory.
  • Downloads go to ~/Downloads/gemini-browser unless you override it.
  • When in doubt, stop the agent, clean the profile (python manage.py clean-profile), and start fresh.

Troubleshooting cheat sheet

  • Playwright launch issues: run playwright doctor, or reinstall with playwright install chromium (add --with-deps on Linux if system libraries are missing).
  • Gemini errors: confirm GEMINI_API_KEY is set and that the key has access to the preview model.
  • WebSocket says “connecting…” forever: check /healthz, then look at your browser console for errors.
  • UI looks stale: run python manage.py rebuild-ui.
  • Browser acting weird: run python manage.py clean-profile.
  • Need more logs: run uvicorn app.main:app --log-level debug.

If things really fall over, python manage.py clean followed by python manage.py launch gives you a clean slate.
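
The /healthz check from the cheat sheet can also be scripted. This is a minimal sketch using only the standard library; it assumes the backend listens on http://localhost:8000 as described earlier and that any 2xx response means healthy, since the exact response body is not documented here. The function name backend_is_healthy is hypothetical.

```python
import urllib.error
import urllib.request


def backend_is_healthy(base_url="http://localhost:8000", timeout=3.0) -> bool:
    """Return True if GET <base_url>/healthz answers with a 2xx status."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx HTTP errors.
        return False
```

If this returns False while Uvicorn appears to be running, the port or host is the first thing to check.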


Final notes

This codebase is meant for tinkering, demos, and learning how Gemini's computer-use preview behaves with a real browser. Contributions and bug reports are welcome. Just keep the human-in-the-loop mindset: supervise what the model is doing, review downloads, and never run it unattended on sensitive accounts.

About

Just a small wrapper for the Gemini 2.5 Computer Use model with a Svelte frontend
