5 changes: 3 additions & 2 deletions .gitignore
@@ -4,7 +4,7 @@ mcp-comet.config.json
.claude/settings.local.json
.claude/plans

tools/
/tools/
# Internal Documentation & Plans (local only)
docs/claude-code-guide/
docs/plans/
@@ -28,4 +28,5 @@ coverage/
video/
.playwright-mcp/
# worktrees
.worktrees/*
.worktrees/*
.claude/worktrees/
4 changes: 2 additions & 2 deletions AGENTS.md
@@ -1,6 +1,6 @@
# AGENTS.md — MCP Comet

MCP Comet is a TypeScript MCP (Model Context Protocol) server that automates the Perplexity Comet browser via Chrome DevTools Protocol (CDP). It exposes 13 tools over stdio for prompting, polling, screenshots, tab management, source extraction, and mode switching.
MCP Comet is a TypeScript MCP (Model Context Protocol) server that automates the Perplexity Comet browser via Chrome DevTools Protocol (CDP). It exposes 14 tools over stdio for prompting, polling, screenshots, tab management, source extraction, and mode switching.

## Commands

@@ -27,7 +27,7 @@ Four layers, top to bottom:
MCP Tools (server.ts) → UI Automation (src/ui/) → CDP Transport (src/cdp/) → Comet Browser
```

- **server.ts** — Single file defining all 13 tools via `McpServer.tool()`. Contains `startServer()`, tool definitions, Zod schemas, and all handler logic. This is the main file to edit when adding/modifying tools.
- **server.ts** — Single file defining all 14 tools via `McpServer.tool()`. Contains `startServer()`, tool definitions, Zod schemas, and all handler logic. This is the main file to edit when adding/modifying tools.
- **src/ui/** — Functions that return JavaScript strings (evaluated in the browser via `Runtime.evaluate`). Each `build*Script()` function returns a self-contained IIFE string. **Do not** pass complex objects — everything must serialize to a JS expression.
- **src/cdp/client.ts** — `CDPClient` singleton (`CDPClient.getInstance()`) managing WebSocket connections, auto-reconnect with exponential backoff, and an operation queue (`enqueue()`) to serialize concurrent CDP calls.
- **src/selectors/** — Version-keyed CSS selector sets (`SelectorSet`). `v145.ts` is the current set. New Comet/Chrome versions get a new `v{version}.ts` file registered in `index.ts`. Unknown versions fall back to the latest known set.
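
The `build*Script()` convention described above can be sketched as follows. This is a minimal illustration, not code from the repository: `buildCountMatchesScript` and its return shape are hypothetical names chosen for the example. The only idea taken from the text is that the function returns a self-contained IIFE string and that every argument is serialized into a JS literal before it crosses into the browser.

```typescript
// Hypothetical helper, not taken from the MCP Comet source.
// Returns a self-contained IIFE string suitable for Runtime.evaluate.
function buildCountMatchesScript(selectors: string[]): string {
  // JSON.stringify turns the argument into a JS literal, so only plain
  // data is embedded in the script (no closures, no object references).
  const selectorsLiteral = JSON.stringify(selectors);
  return `(() => {
    const selectors = ${selectorsLiteral};
    // Try each selector in order and report the first one that matches.
    for (const selector of selectors) {
      const count = document.querySelectorAll(selector).length;
      if (count > 0) return JSON.stringify({ selector, count });
    }
    return JSON.stringify({ selector: null, count: 0 });
  })()`;
}
```

Returning a JSON string from the IIFE keeps the result serializable on the way back as well, which fits the "everything must serialize" rule stated in the bullet for `src/ui/`.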
36 changes: 36 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,41 @@
# Changelog

## [1.2.0] - 2026-04-14

### Added

- `comet_approve_action` tool: click primary or cancel buttons on Comet permission/confirmation prompts (14th tool)
- `awaiting_action` agent status: detects when Comet is waiting for user confirmation before executing an action
- `actionPrompt` and `actionButtons` fields in poll/wait responses when a permission prompt is active
- `COMET_OVERRIDE_VIEWPORT` configuration option (default: `false`) to control whether the browser viewport is overridden on connect
- Dynamic viewport detection via `Page.getLayoutMetrics()` for accurate screenshots regardless of window size
- Response truncation marker at 8000 chars guiding users to `comet_get_page_content` for full text
- `ACTION_BANNER` selector for detecting Comet's `@container/banner` permission prompt containers
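
As a rough illustration of the truncation marker listed above, the sketch below cuts a response at 8000 characters and appends a pointer to `comet_get_page_content`. The function name and the marker wording are assumptions made for this example; only the 8000-character threshold and the tool name come from the changelog entry.

```typescript
// Threshold stated in the changelog entry.
const MAX_RESPONSE_CHARS = 8000;

// Hypothetical marker wording; the actual text used by MCP Comet may differ.
const TRUNCATION_MARKER =
  "\n[truncated: call comet_get_page_content for the full text]";

// Returns the text unchanged when it fits, otherwise the first `limit`
// characters followed by the marker.
function truncateResponse(text: string, limit: number = MAX_RESPONSE_CHARS): string {
  if (text.length <= limit) return text;
  return text.slice(0, limit) + TRUNCATION_MARKER;
}
```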

### Fixed

- **Permission prompt detection**: `comet_wait` and `comet_poll` now correctly report `awaiting_action` status instead of `completed` when Comet shows a permission prompt
- **Browser window resize**: `Emulation.setDeviceMetricsOverride` no longer resizes the browser window by default — viewport override is now opt-in via `COMET_OVERRIDE_VIEWPORT=true`
- **Race condition — concurrent connects**: Multiple simultaneous `ensureConnected()` calls are now deduplicated via a shared promise guard
- **Race condition — concurrent asks**: `comet_ask` now uses a mutex to prevent concurrent prompt submissions from corrupting each other
- **Mode read navigation**: `comet_mode` (read) no longer navigates away from the current page when already on the Perplexity home page
- **Mode switch reliability**: `comet_mode` (switch) now invokes React's `onMouseDown` handler directly via fiber props instead of `item.click()`, which silently failed because Comet's typeahead menu items use `onMouseDown`, not `onClick`
- **Editor clearing**: Mode switching now clears residual Lexical editor text (select-all + delete + backspace safety net) instead of relying on page reload, which did not clear Lexical state
- **Mode read editor clearing**: `comet_mode` (read) now also clears existing editor text before typing `/` and presses Escape + Backspace between retries to prevent `/` character accumulation
- **Prose filter over-trimming**: Short questions under 20 chars (previously 100) are no longer excluded from response text
- **Submit verification**: `buildSubmitPromptScript` now focuses the input element before submitting and verifies the input was cleared
- **Gitignore scope**: `tools/` pattern changed to `/tools/` (root-level only) to unblock staging test files in `tests/unit/tools/`
- **Status parsing hardening**: `parseAgentStatus` now validates all fields with defaults instead of raw casting — prevents `TypeError` crashes when the browser returns incomplete status data
- **Empty prompt rejection**: `comet_ask` now requires non-empty prompts via `z.string().min(1)` validation
- **Switch tab validation**: `comet_switch_tab` returns a clear error when called with no `tabId` or `title` instead of showing "undefined"
- **Boundary value guards**: `comet_wait` and `comet_get_page_content` now handle `timeout=0` and `maxLength=0` gracefully instead of producing confusing behavior
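
The "status parsing hardening" entry above describes validating every field with a default instead of casting the raw browser result. A minimal sketch of that idea follows. The `AgentStatusValue` union and the `actionPrompt` / `actionButtons` field names come from this changelog; the `response` field, the function name `parseAgentStatusSketch`, and the choice of `"idle"` as the fallback status are assumptions for illustration only.

```typescript
type AgentStatusValue = "idle" | "working" | "completed" | "awaiting_action";

// Assumed shape: `response` is a hypothetical field added for the example.
interface AgentStatus {
  status: AgentStatusValue;
  response: string;
  actionPrompt: string | null;
  actionButtons: string[];
}

const STATUS_VALUES: readonly AgentStatusValue[] = [
  "idle",
  "working",
  "completed",
  "awaiting_action",
];

// Validates each field individually so incomplete browser data cannot
// cause a TypeError later on. Falling back to "idle" is an assumption.
function parseAgentStatusSketch(raw: unknown): AgentStatus {
  const data: Record<string, unknown> =
    typeof raw === "object" && raw !== null ? (raw as Record<string, unknown>) : {};
  const status = STATUS_VALUES.find((value) => value === data.status) ?? "idle";
  return {
    status,
    response: typeof data.response === "string" ? data.response : "",
    actionPrompt: typeof data.actionPrompt === "string" ? data.actionPrompt : null,
    actionButtons: Array.isArray(data.actionButtons)
      ? data.actionButtons.filter((b): b is string => typeof b === "string")
      : [],
  };
}
```

With this shape, a caller can branch on `status === "awaiting_action"` and read `actionButtons` without first checking whether the browser returned the field at all.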

### Changed

- 14 tools total (was 13)
- Removed unused `AgentState` enum and dead `AgentStatus` interface — replaced with canonical `AgentStatus` type with `AgentStatusValue` union (`'idle' | 'working' | 'completed' | 'awaiting_action'`)
- Removed dead `timeout` parameter from `comet_ask` schema (was accepted but never used)

## [1.1.2] - 2026-04-11

### Fixed
4 changes: 3 additions & 1 deletion README.md
@@ -10,7 +10,7 @@
<a href="https://github.com/OneStepAt4time/mcp-comet/actions"><img src="https://img.shields.io/github/actions/workflow/status/OneStepAt4time/mcp-comet/ci.yml?style=flat-square" alt="build" /></a>
<a href="LICENSE"><img src="https://img.shields.io/github/license/OneStepAt4time/mcp-comet?style=flat-square" alt="license" /></a>
</p>
<p><strong>7 modes</strong> · <strong>13 tools</strong> · zero-friction setup · full browser control</p>
<p><strong>7 modes</strong> · <strong>14 tools</strong> · zero-friction setup · full browser control</p>
<p>
<a href="#quick-start">Quick Start</a> ·
<a href="docs/tools.md">Tool Reference</a> ·
@@ -145,6 +145,7 @@ mcp-comet call comet_get_sources
- `comet_poll`: returns live status and partial progress.
- `comet_wait`: waits for completion and returns the full response.
- `comet_stop`: stops a running task.
- `comet_approve_action`: approves or cancels Comet permission prompts.

### Query

@@ -182,6 +183,7 @@ Full reference: [docs/tools.md](docs/tools.md)
| Multi-perspective debate | `comet_mode(model-council)` -> `comet_ask` -> `comet_wait` |
| Visual evidence capture | `comet_screenshot` -> pass image into your vision-capable model |
| Resume old investigations | `comet_list_conversations` -> `comet_open_conversation` -> `comet_get_page_content` |
| Action with permission prompt | `comet_ask` -> `comet_wait` -> `comet_approve_action` |

---

9 changes: 6 additions & 3 deletions docs/architecture.md
@@ -3,7 +3,7 @@
MCP Comet uses a four-layer architecture:

```
MCP Tools (13 tools)
MCP Tools (14 tools)
|
UI Automation (selectors, input, status, extraction, navigation)
|
@@ -14,7 +14,7 @@ Perplexity Comet Browser (Chromium)

## MCP Layer

MCP Comet exposes 13 MCP tools grouped by function:
MCP Comet exposes 14 MCP tools grouped by function:

**Session**

@@ -31,6 +31,7 @@ MCP Comet exposes 13 MCP tools grouped by function:
|------|---------|
| `comet_ask` | Send a prompt and wait for the response |
| `comet_mode` | Switch Comet focus mode |
| `comet_approve_action` | Approve or cancel permission prompts |

**Content**

@@ -57,7 +58,7 @@ Selectors are ordered arrays of CSS selectors. Each strategy tries selectors in

### Typeahead Mode Detection

When switching modes via `comet_mode`, MCP Comet reads the SVG icon `href` from typeahead menu items with the `.bg-subtle` class. Icon IDs map to mode names:
When switching modes via `comet_mode`, MCP Comet opens the typeahead menu by typing `/` into the Lexical editor via `document.execCommand('insertText')`. It then reads the SVG icon `href` from typeahead menu items with the `.bg-subtle` class to detect the active mode. Icon IDs map to mode names:

| Icon ID | Mode |
|---------|------|
@@ -70,6 +71,8 @@ When switching modes via `comet_mode`, MCP Comet reads the SVG icon `href` from

If icon detection fails, the system falls back to URL-based mode detection.

For mode switching, the target menu item's React `onMouseDown` prop is invoked directly via fiber props (`__reactProps$`). This is necessary because Comet's typeahead items listen for `onMouseDown` rather than `onClick`, so a DOM `click()` call never reaches the handler. Before opening the typeahead, the editor is cleared of any existing text using select-all + delete + backspace, since Lexical editor state persists across page navigations.
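
The fiber-props technique described above can be sketched roughly as below. React attaches a props object to each DOM node it manages under an own property whose name starts with `__reactProps$` followed by a random suffix. The helper names `findReactProps` and `invokeOnMouseDown`, and the minimal event object passed to the handler, are assumptions for illustration. In MCP Comet the equivalent logic would run inside the browser as an evaluated script string, not as TypeScript.

```typescript
// Looks up the React props object stored on a node under a key that
// starts with "__reactProps$". Returns null when no such key exists.
function findReactProps(node: object): Record<string, unknown> | null {
  const key = Object.keys(node).find((k) => k.startsWith("__reactProps$"));
  if (!key) return null;
  const props = (node as Record<string, unknown>)[key];
  return typeof props === "object" && props !== null
    ? (props as Record<string, unknown>)
    : null;
}

// Calls the node's onMouseDown prop directly. Returns false when the
// node has no React props or no onMouseDown handler.
function invokeOnMouseDown(node: object): boolean {
  const handler = findReactProps(node)?.onMouseDown;
  if (typeof handler !== "function") return false;
  // Minimal stand-in for a synthetic event; a real handler may read
  // more fields than these (assumption).
  handler({ preventDefault() {}, stopPropagation() {}, button: 0, target: node });
  return true;
}
```

Returning a boolean lets the caller distinguish "handler invoked" from "no handler found", which is useful because the original `item.click()` approach failed silently.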

### Collapsed Citation Expansion

Sources with collapsed citation text (matching the pattern `^\w+\+\d+$`, such as "arXiv+3") do not expose a URL directly. MCP Comet clicks these elements to reveal the full source URL, then re-extracts sources in a second pass. This two-pass strategy ensures complete source collection.
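
The pattern quoted above can be checked in isolation. The sketch below uses the exact regular expression from the text; the function names and the split into "direct" and "collapsed" labels are hypothetical and only illustrate how a first pass might decide which elements need a click before the second extraction pass.

```typescript
// Pattern quoted in the documentation: word characters, a plus sign, digits.
const COLLAPSED_CITATION = /^\w+\+\d+$/;

function isCollapsedCitation(label: string): boolean {
  return COLLAPSED_CITATION.test(label);
}

// Hypothetical first-pass split: collapsed labels would be clicked to
// reveal their URLs before sources are extracted again.
function partitionCitations(labels: string[]): { direct: string[]; collapsed: string[] } {
  const direct: string[] = [];
  const collapsed: string[] = [];
  for (const label of labels) {
    (isCollapsedCitation(label) ? collapsed : direct).push(label);
  }
  return { direct, collapsed };
}
```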
4 changes: 4 additions & 0 deletions docs/configuration.md
@@ -27,6 +27,7 @@ The three most commonly used variables:
| `COMET_USER_DATA_DIR` | null | Path to a custom Chrome user data directory. Use this to persist cookies, local storage, and other browser profile data across sessions (for example, `~/.config/mcp-comet/chrome-profile`). When unset, Comet uses a temporary profile each launch. |
| `COMET_WINDOW_WIDTH` | 1440 | Browser window width in pixels at launch. Controls the initial viewport dimensions of the Comet browser window. |
| `COMET_WINDOW_HEIGHT` | 900 | Browser window height in pixels at launch. Controls the initial viewport dimensions of the Comet browser window. |
| `COMET_OVERRIDE_VIEWPORT` | false | Override the browser viewport via CDP `Emulation.setDeviceMetricsOverride` on connect. **Warning:** enabling this physically resizes the browser window. When disabled (default), screenshots use the actual viewport dimensions. |

## Priority

@@ -66,6 +67,9 @@ Create `mcp-comet.config.json` in your project root. Keys use camelCase (not the
"windowWidth": 1440,
"windowHeight": 900,

// Override browser viewport via CDP on connect (resizes the window)
"overrideViewport": false,

// Reconnection behavior
"maxReconnectAttempts": 5,
"maxReconnectDelay": 5000,
2 changes: 1 addition & 1 deletion docs/contributing.md
@@ -35,7 +35,7 @@ Run `npm run lint` before committing to catch style issues early.
```
src/
cli.ts -- CLI entry point (start, call, detect commands)
server.ts -- MCP server with 13 tool handlers
server.ts -- MCP server with 14 tool handlers
config.ts -- Configuration loading + validation
errors.ts -- 9 error subclasses with codes
index.ts -- Library entry point (exports startServer)
2 changes: 1 addition & 1 deletion docs/integration.md
@@ -51,7 +51,7 @@ Claude Code reads MCP server configuration from `~/.claude/claude_desktop_config
}
```

After updating the configuration file, restart Claude Code. MCP Comet will appear as an MCP server exposing 13 tools.
After updating the configuration file, restart Claude Code. MCP Comet will appear as an MCP server exposing 14 tools.

**Verify:** Ask Claude "What MCP tools do you have available?" The list should include `comet_connect`, `comet_ask`, `comet_wait`, and the other tools documented in [tools.md](tools.md).
