AI-assisted browser automation you can supervise in real time.
This project wires Google's Gemini 2.5 Computer Use preview model to a real Chromium browser that you can watch—and interrupt—in real time. The agent gets a headful window, you get a live chat console, and both of you can take the wheel whenever you like.
The goal here is simple: give humans an honest look at what the preview model can do while keeping supervision front and center.
What's included:
- A FastAPI backend that exposes a WebSocket bridge for Gemini and serves the compiled Svelte UI.
- A modern dashboard built with Svelte 5 plus Tailwind that shows messages, confirmations, screenshots, and action history.
- A small action library on top of Playwright that handles clicks, drags, scrolling, navigation, text entry, and downloads.
- Safety rails that pause the agent for confirmation when something risky is requested.
Requirements:
- Python 3.10 or newer
- pnpm
- A Gemini API key with access to gemini-2.5-computer-use-preview-10-2025
The backend reads its Gemini credentials only from environment variables. Set your key with:
    export GEMINI_API_KEY=your_key_here
The helper script manage.py hides the usual wall of setup commands:
    python manage.py launch
That one line will:
- Create (or reuse) .venv and install requirements.txt.
- Run pnpm install --frozen-lockfile inside ui/.
- Build the Svelte bundle if anything changed since the last build.
- Start Uvicorn on http://localhost:8000.
Pop that URL in a browser and you'll see the dashboard. Seconds later a Chromium window appears next to it—the agent's playground that you can also interact with.
All of these are run as python manage.py <command>:
- bootstrap: install backend and UI dependencies without starting anything.
- build: ensure the UI bundle is up to date (skips work when sources are unchanged).
- launch: bootstrap, build if needed, then start Uvicorn (what we used above).
- dev: like launch, but leaves Uvicorn in --reload mode for backend development.
- rebuild-ui: force a fresh pnpm run build.
- clean: remove the virtualenv, cached install stamps, and the UI build directory.
- clean-profile: wipe the persisted Chromium profile if you want a fresh browser state.
Under the hood the script drops timestamp files (.pip-installed, .pnpm-installed, .build-stamp) so it can tell when a reinstall or rebuild is truly necessary.
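The stamp-file idea fits in a few lines of Python. This is only a sketch, not the code in manage.py: the names is_stale and touch_stamp are hypothetical, and comparing modification times is an assumption about how the stamps are used.

```python
from pathlib import Path
from typing import Iterable


def is_stale(stamp: Path, sources: Iterable[Path]) -> bool:
    """Return True when the work guarded by `stamp` should be redone.

    Assumption: the stamp's mtime records the last successful run, so any
    source modified after it means the result is out of date.
    """
    if not stamp.exists():
        return True  # never ran, or the stamp was removed by a clean
    stamp_mtime = stamp.stat().st_mtime
    return any(
        src.stat().st_mtime > stamp_mtime for src in sources if src.exists()
    )


def touch_stamp(stamp: Path) -> None:
    """Record a successful run by creating or refreshing the stamp file."""
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.touch()
```

A caller would check something like is_stale(Path(".pip-installed"), [Path("requirements.txt")]) before reinstalling, then call touch_stamp once the install succeeds.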
Prefer running things by hand? Create and activate .venv, install requirements.txt, run pnpm install once inside ui/, build the UI with pnpm run build, and then launch with:
    uvicorn app.main:app --reload
The UI code lives under ui/. To run the Svelte dev server:
    cd ui
    pnpm install   # once
    pnpm run dev   # Svelte dev server at http://localhost:5173
Leave the FastAPI process running so the dev UI can reach the /ws WebSocket. When you're ready for a production bundle, run pnpm run build (the output lands in ui/build/ and is what the Python app serves).
Handy scripts:
- pnpm run guard:derived catches invalid $derived usage in Svelte reactivity.
- pnpm run check runs the guard plus svelte-check.
- pnpm run lint and pnpm run format run ESLint and Prettier with the project presets.
Once the dashboard is up, a typical session goes like this:
- You type a goal in the chat panel, for example, “Find the latest blog post about the Gemini API.”
- Gemini starts issuing actions. Each appears in the action log with arguments and status.
- Potentially sensitive actions trigger a confirmation card; approve or deny to continue.
- The screenshot panel updates after every step so you can see what the browser is doing.
- If the agent gets stuck (CAPTCHA, paywall, missing info) it tells you instead of wandering off.
You can still click or type directly in the Chromium window at any time. Playwright and Gemini will adapt.
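To make the confirmation step concrete, here is one way such a gate could be modeled with asyncio. It is a sketch under stated assumptions, not the app's implementation: ConfirmationGate, RISKY_ACTIONS, and the action names in it are hypothetical, and the README does not say how the real code decides what counts as risky.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical action names treated as risky; the real criteria live in the
# application and are not spelled out in this README.
RISKY_ACTIONS = {"download_file", "submit_form"}


class ConfirmationGate:
    """Pause risky actions until a human approves or denies them."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Future] = {}

    async def run(
        self,
        action_id: str,
        name: str,
        execute: Callable[[], Awaitable[str]],
    ) -> str:
        if name in RISKY_ACTIONS:
            future = asyncio.get_running_loop().create_future()
            self._pending[action_id] = future
            try:
                approved = await future  # suspended until resolve() is called
            finally:
                self._pending.pop(action_id, None)
            if not approved:
                return "denied"
        return await execute()

    def resolve(self, action_id: str, approved: bool) -> None:
        """Called when the user answers the confirmation card."""
        future = self._pending.get(action_id)
        if future is not None and not future.done():
            future.set_result(approved)
```

In this model the WebSocket handler would call resolve() when the user clicks approve or deny, and the agent loop simply awaits run() for every action.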
Everything is driven by environment variables; sensible defaults are baked in if you leave them unset.
    GEMINI_BROWSER_WIDTH=1440
    GEMINI_BROWSER_HEIGHT=900
    GEMINI_BROWSER_PROFILE=~/.gemini-browser/profile
    GEMINI_BROWSER_DOWNLOADS=~/Downloads/gemini-browser
    GEMINI_ACTION_DELAY=2.0           # default settle delay after actions
    GEMINI_NAVIGATION_DELAY=2.0       # settle delay after page navigation
    GEMINI_SCROLL_PAUSE=0.5           # wait after wheel events
    GEMINI_DRAG_PREHOLD_DELAY=0.1     # pause before dragging
    GEMINI_DRAG_POSTHOLD_DELAY=0.1    # pause after dropping
    GEMINI_WAIT_ACTION_DURATION=5.0   # how long the wait action sleeps
    GEMINI_MODEL_NAME=gemini-2.5-computer-use-preview-10-2025
    GEMINI_TURN_LIMIT=1000
    # Safety filters:
    # GEMINI_DISABLE_SAFETY=true (ask for filters off)
    # GEMINI_ENABLE_SAFETY=true (force filters on)
If you request the filters to be disabled and the API refuses, the code automatically retries with defaults and logs the downgrade.
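As a rough illustration of how these variables and the safety fallback might be handled, consider the sketch below. Several things in it are assumptions rather than facts about this codebase: the Settings class and both function names are hypothetical, the values in the listing above are taken to be the built-in defaults, "1", "true", and "yes" are assumed to be the accepted flag values, GEMINI_ENABLE_SAFETY is assumed to win when both flags are set, and ValueError stands in for whatever error the real Gemini client raises on refusal.

```python
import logging
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Mapping, Optional, TypeVar

log = logging.getLogger("gemini-browser")
T = TypeVar("T")


@dataclass(frozen=True)
class Settings:
    api_key: str
    width: int
    height: int
    profile_dir: Path
    action_delay: float
    turn_limit: int
    disable_safety: bool


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Assumption: these spellings count as "on"; anything else is "off".
    return env.get(name, "").strip().lower() in {"1", "true", "yes"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read a subset of the documented variables, falling back to defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    # Assumption: GEMINI_ENABLE_SAFETY takes precedence if both are set.
    disable = _flag(env, "GEMINI_DISABLE_SAFETY") and not _flag(
        env, "GEMINI_ENABLE_SAFETY"
    )
    profile = env.get("GEMINI_BROWSER_PROFILE", "~/.gemini-browser/profile")
    return Settings(
        api_key=api_key,
        width=int(env.get("GEMINI_BROWSER_WIDTH", "1440")),
        height=int(env.get("GEMINI_BROWSER_HEIGHT", "900")),
        profile_dir=Path(profile).expanduser(),
        action_delay=float(env.get("GEMINI_ACTION_DELAY", "2.0")),
        turn_limit=int(env.get("GEMINI_TURN_LIMIT", "1000")),
        disable_safety=disable,
    )


def call_with_safety_fallback(call: Callable[[bool], T], disable_safety: bool) -> T:
    """Try the requested safety mode; on refusal, retry with API defaults.

    `call` is a hypothetical wrapper around the Gemini request that takes a
    "filters off" flag. ValueError is a placeholder for the client's error.
    """
    if not disable_safety:
        return call(False)
    try:
        return call(True)
    except ValueError as exc:
        log.warning("Safety filters could not be disabled (%s); using defaults", exc)
        return call(False)
```

Failing fast on a missing GEMINI_API_KEY keeps the error close to its cause instead of surfacing later as an opaque API failure.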
Project layout:
- app/main.py: FastAPI application serving the UI and WebSocket endpoint.
- app/browser.py: launches and manages the shared Playwright browser context.
- app/actions.py plus app/builtin_actions/*: the catalog of Playwright actions Gemini can call.
- app/agent.py and app/session_worker.py: keep per-session state, confirmations, and Gemini turns.
- app/utils.py, app/downloads.py, app/history.py: helpers for screenshots, downloads, and logging.
- ui/: Svelte workspace (source, scripts, static config, and build output).
Gemini Computer Use is still a preview feature. Keep someone watching the screen and never hand it credentials you wouldn't paste into a chat window yourself.
- The default browser profile lives at ~/.gemini-browser/profile. Use that isolated space or point GEMINI_BROWSER_PROFILE at another throwaway directory.
- Downloads go to ~/Downloads/gemini-browser unless you override GEMINI_BROWSER_DOWNLOADS.
- When in doubt, stop the agent, clean the profile (python manage.py clean-profile), and start fresh.
Troubleshooting:
- Playwright launch issues: run playwright doctor, or reinstall with playwright install chromium (add --with-deps on Linux if system libraries are missing).
- Gemini errors: confirm GEMINI_API_KEY is set and that the key has access to the preview model.
- WebSocket says “connecting…” forever: check /healthz, then look at your browser console for errors.
- UI looks stale: run python manage.py rebuild-ui.
- Browser acting weird: run python manage.py clean-profile.
- Need more logs: run uvicorn app.main:app --log-level debug.
If things really fall over, python manage.py clean followed by python manage.py launch gives you a clean slate.
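The /healthz check mentioned above is easy to script if you want it in a smoke test. The helper below is a sketch using only the standard library. The name backend_is_healthy is hypothetical, and it assumes the endpoint answers HTTP 200 when the backend is up, since the README does not document the response.

```python
import urllib.error
import urllib.request

# Skip any configured HTTP proxy, since the backend runs on localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def backend_is_healthy(
    base_url: str = "http://localhost:8000", timeout: float = 3.0
) -> bool:
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    try:
        with _opener.open(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

If this returns False while Uvicorn is supposedly running, the problem is on the backend side rather than in the dashboard's WebSocket client.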
This codebase is meant for tinkering, demos, and learning how Gemini's computer-use preview behaves with a real browser. Contributions and bug reports are welcome. Just keep the human-in-the-loop mindset: supervise what the model is doing, review downloads, and never run it unattended on sensitive accounts.
