spike(search): Anakin first, Google CSE fallback + eval harness#99
paritoshtripathi935 wants to merge 33 commits into
…ness
Introduces a pluggable web-search provider layer at
app/services/search_providers/ so we can A/B Anakin's single-call
search+content-extraction API against the existing Google CSE + BS4
pipeline without touching the call sites.
Wiring is conservative:
- Default behaviour (no env config) → GoogleCSEProvider, byte-for-byte
identical to current production. Existing /search latency / shape /
failure modes are unchanged for deploys that don't opt in.
- Set ANAKIN_API_KEY (and optionally WEB_SEARCH_PROVIDER=anakin_then_google)
→ FallbackProvider runs Anakin first; on any HTTP error, parse
failure, or empty result set, it transparently falls through to
GoogleCSEProvider. Worst-case behaviour is "same as today".
- Smart default: with ANAKIN_API_KEY present and no explicit override,
the chain is selected automatically.
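A minimal sketch of the fall-through and selection rules above, assuming a
provider interface with a single `search(query)` method that returns a list
of dicts. The real classes in app/services/search_providers/ may differ;
`SearchProvider`, `select_provider_name`, and the `"google_cse"` name are
illustrative, not taken from the code.

```python
import logging
from typing import Protocol

logger = logging.getLogger(__name__)


class SearchProvider(Protocol):
    # Hypothetical interface; the real provider signature may differ.
    def search(self, query: str) -> list[dict]: ...


class FallbackProvider:
    """Try the primary provider first; fall through to the secondary on any
    exception (HTTP error, parse failure) or on an empty result set."""

    def __init__(self, primary: SearchProvider, secondary: SearchProvider) -> None:
        self.primary = primary
        self.secondary = secondary

    def search(self, query: str) -> list[dict]:
        try:
            results = self.primary.search(query)
            if results:
                return results
            logger.warning("primary returned no results; falling back")
        except Exception as exc:  # broad on purpose: worst case is "same as today"
            logger.warning("primary failed (%s); falling back", exc)
        return self.secondary.search(query)


def select_provider_name(env: dict[str, str]) -> str:
    """Mirror the selection rules listed above (return values are assumed names)."""
    explicit = env.get("WEB_SEARCH_PROVIDER")
    if explicit:
        return explicit
    if env.get("ANAKIN_API_KEY"):
        return "anakin_then_google"  # smart default when a key is present
    return "google_cse"  # no env config: current production behaviour
```

Catching a broad `Exception` is deliberate in this sketch: any failure in the
primary should degrade to the existing pipeline rather than surface to the
caller.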
The Anakin client is defensive about response shape because the public
docs hide the JSON schema (it sits behind login). The parser tries a
priority-ordered list of field names per row (title|name, url|link,
snippet|description, content|extracted_content|text|...) and logs the
top-level payload keys on a zero-result response so we know exactly
which name to lock in when we have real responses to look at.
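The priority-ordered lookup could look roughly like the sketch below. It
includes only the field names listed above (the source elides the rest), and
`rows_key="results"` is an assumption about where the rows live in the
payload; `first_present` and `parse_rows` are hypothetical helper names.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)

# Candidate field names per logical column, in priority order.
FIELD_CANDIDATES: dict[str, tuple[str, ...]] = {
    "title": ("title", "name"),
    "url": ("url", "link"),
    "snippet": ("snippet", "description"),
    "content": ("content", "extracted_content", "text"),
}


def first_present(row: dict[str, Any], names: tuple[str, ...]) -> str:
    """Return the first non-empty string value among the candidate keys."""
    for name in names:
        value = row.get(name)
        if isinstance(value, str) and value.strip():
            return value
    return ""


def parse_rows(payload: dict[str, Any], rows_key: str = "results") -> list[dict[str, str]]:
    # `rows_key` is an assumption; the real payload may nest rows elsewhere.
    rows = payload.get(rows_key)
    if not isinstance(rows, list):
        rows = []
    parsed = [
        {field: first_present(row, names) for field, names in FIELD_CANDIDATES.items()}
        for row in rows
        if isinstance(row, dict)
    ]
    parsed = [r for r in parsed if r["url"]]  # a row without a URL is unusable
    if not parsed:
        # Log top-level keys so the real field names can be locked in later.
        logger.warning("zero results; payload keys: %s", sorted(payload))
    return parsed
```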
Eval harness in backend/scripts/eval_search_providers.py runs N queries
through every requested provider, dumps a JSON report with per-call
latency, result count, total extracted-chars, urls/titles/samples, and
prints a friendly per-provider summary table. Smoke-tested against
the live Anakin endpoint with an invalid key — POST /v1/search is
reachable, our body shape was accepted (401 unauthorized rather than
400 bad request), and the error path logs the response body per the
repo's "always log .response.text[:500] on httpx errors" convention.
Includes a starter scripts/eval_queries.txt (8 queries spanning the
operator-question taxonomy) so this is one command away from a useful
report once a real key is set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live API returns 400 invalid_request "Prompt is required" when the body
uses `query`. Renamed to `prompt`.
Eval confirmed working: 8/8 queries succeed, avg 900ms vs Google CSE's
17.2s, avg 13.5k chars extracted vs Google's 1.3k.
Comment block in _request_body() updated to reflect locked-in field
names rather than guesses.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live eval: Anakin vs Google CSE
Ran the harness against the 8 starter queries in
Per-result extracted content: Google ~180 chars · Anakin ~2,700 chars.
Qualitative samples (URLs returned per provider)
Q: "how to ramp meta ABO budgets in Q4 holiday push"
Q: "what changes when iOS 14 ATT prompt opt-in rate falls below 30 percent"
Q: "lifecycle email cadence for warm leads in B2C DTC"
One fix needed during eval
First run was 8/8
Side finding: Google CSE is actively degrading
The per-result
Recommendation
Merge. The smart-default behaviour already does the right thing: with
… regex
Frontend hosted at paidpilot.netlify.app couldn't talk to the backend
because the hardcoded CORS allowlist only covered the legacy
mini-perplexity domain. Three changes:
1. Default allowlist now includes both Netlify sites and
127.0.0.1:5173 (CommandPalette and Clerk both occasionally route
through the IP address instead of localhost during dev).
2. CORS_ORIGINS env var, when set, replaces the default list. Lets
us add per-PR previews or future domains without code changes —
just bounce the dyno.
3. allow_origin_regex covers Netlify's per-deploy unique URLs
(`<hex>--paidpilot.netlify.app`, `deploy-preview-N--…`) which
can't be enumerated up front. Matches both paidpilot and the
legacy mini-perplexity for continuity.
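A sketch of how the three pieces could fit together. The default origin
list, the comma-separated format of CORS_ORIGINS, and the exact regex are
assumptions inferred from the description; `resolve_origins` and
`PREVIEW_ORIGIN_REGEX` are hypothetical names.

```python
import os
import re
from typing import Mapping, Optional

# Assumed defaults: the Netlify hostnames are inferred from the site names
# above, and the localhost entry is a guess at the dev setup.
DEFAULT_ORIGINS = [
    "https://paidpilot.netlify.app",
    "https://mini-perplexity.netlify.app",
    "http://localhost:5173",
    "http://127.0.0.1:5173",
]

# Matches per-deploy URLs such as <hex>--paidpilot.netlify.app and
# deploy-preview-N--paidpilot.netlify.app, for both site names.
PREVIEW_ORIGIN_REGEX = (
    r"https://(?:[0-9a-f]+|deploy-preview-\d+)--"
    r"(?:paidpilot|mini-perplexity)\.netlify\.app"
)


def resolve_origins(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """CORS_ORIGINS (assumed comma-separated), when set, replaces the defaults."""
    env = os.environ if env is None else env
    raw = env.get("CORS_ORIGINS", "").strip()
    if not raw:
        return list(DEFAULT_ORIGINS)
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

If the backend uses Starlette's or FastAPI's CORSMiddleware (the
`allow_origin_regex` name suggests it does), these would be passed as
`allow_origins=resolve_origins()` and
`allow_origin_regex=PREVIEW_ORIGIN_REGEX`.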
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng the UI
Before this change, a 401 from any auth-required endpoint was caught by
component-level `.catch(() => null)` blocks and silently turned into
"first-time user" branches. The most visible symptom was the
onboarding wizard locking on "welcome to paidpilot · step 1 of 3"
with empty fields: getBrandProfile 401'd (treated as "no profile,
show onboarding\"), then listProjects + listCampaigns + the brand
profile hydration inside the wizard all 401'd too, leaving the user
with nothing to fill in and no path to recover. Could happen any time
Clerk's local JWT was valid but the backend rejected it (audience
mismatch, signing-key rotation, user deleted in Clerk, JWKS cache
miss across an origin migration).
Three-piece fix:
1. New `services/authEvents.ts` — minimal pub/sub module
(`notifyUnauthorized`, `subscribeUnauthorized`). Coalesces bursts
within 1.5 s so a fan-out of 10 simultaneous 401s only triggers one
sign-out + one redirect.
2. `services/api.ts` — every fetch wrapper that throws on non-OK now
calls `maybeNotifyUnauthorized(response, headers)` first, which
fires only when the request actually carried an Authorization
header (anonymous endpoints can legitimately 401 without the
session being dead). Covers jsonRequest (the bulk of project /
campaign / brand-profile traffic) plus the legacy `getBrandProfile`
/ `putBrandProfile` fetchers which were the direct breakpoints for
the onboarding gate.
3. `App.tsx` AuthedShell — subscribes to the event and fires
`clerk.signOut({ redirectUrl: '/sign-in' })`. Guarded by
`isSignedIn` so we don't loop on a stray anonymous 401 during
sign-out itself.
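The coalescing and header-gating logic, sketched in Python for illustration
only (the real module is TypeScript). The class and function names are
hypothetical; the 1.5 s window and the "only when an Authorization header was
sent" rule come from the description above.

```python
import time
from typing import Callable, Optional


class UnauthorizedEvents:
    """Minimal pub/sub that coalesces bursts of 401 notifications."""

    def __init__(self, window_s: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window_s = window_s
        self._clock = clock  # injectable so the window can be tested
        self._last_fired: Optional[float] = None
        self._subscribers: list[Callable[[], None]] = []

    def subscribe(self, callback: Callable[[], None]) -> Callable[[], None]:
        self._subscribers.append(callback)
        return lambda: self._subscribers.remove(callback)  # unsubscribe handle

    def notify(self) -> bool:
        """Fire subscribers unless a notification already fired within the window."""
        now = self._clock()
        if self._last_fired is not None and now - self._last_fired < self._window_s:
            return False
        self._last_fired = now
        for callback in list(self._subscribers):
            callback()
        return True


def should_notify(status: int, request_headers: dict[str, str]) -> bool:
    # Only a 401 on a request that carried credentials means the session is dead.
    has_auth = any(key.lower() == "authorization" for key in request_headers)
    return status == 401 and has_auth
```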
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… first token
Before: qwq-32b (and any model that emits `</think>` without a matching
opener) streamed its chain-of-thought into the answer body, then jerked
all of it into the thinking disclosure the moment the closer arrived.
Visible flicker on every reasoning turn; the prose body would flash a
multi-paragraph "Okay, let me think about this..." then suddenly empty
and re-fill with the real answer.
Root cause: splitThinking only synthesised a `<think>` opener once
`</think>` was already in the buffer. During streaming there's no
closer yet, so the fall-through path treated the reasoning text as
answer body until the closer landed.
Fix: splitThinking now takes `{ isStreaming, modelId }`. When
`isStreaming` is true AND the model is on the known closer-only list
(matched by substring "qwq" so future revisions like qwq-32b-v2 keep
working) AND neither `<think>` nor `</think>` has appeared yet AND the
content is non-empty, we prepend a synthetic `<think>` at the start.
Once `</think>` arrives the existing extraction path takes over with
no further special-case. Stream finishes without ever emitting
`</think>` → no synthetic prepend (guard checks isStreaming), so we
never wrongly hide a final answer that happens to look like reasoning.
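The guard reduces to a single boolean condition. Below is a Python sketch of
that decision rule only (the real splitThinking is TypeScript and also
handles extraction); `maybe_prepend_think` is a hypothetical name.

```python
def maybe_prepend_think(content: str, is_streaming: bool, model_id: str) -> str:
    """Prepend a synthetic <think> opener for closer-only models mid-stream."""
    # Substring match so revisions such as qwq-32b-v2 keep working.
    closer_only = "qwq" in (model_id or "").lower()
    has_tags = "<think>" in content or "</think>" in content
    if is_streaming and closer_only and not has_tags and content:
        return "<think>" + content
    # Not streaming, unknown model, tag already present, or empty content:
    # leave the text alone and let the existing extraction path handle it.
    return content
```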
Plumbing: ChatMessage gains an optional `activeModelId` prop; ChatPage
passes `me?.preferred_chat_model` (the model that will produce this
turn). Historical turns rehydrated from the server are unaffected —
they're not streaming, so the synthetic-prepend branch never fires
on them; their existing `</think>` (if any) is extracted by the
pre-existing case-1 path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…int pass
Three changes off one report.
/docs is a real page now: eyebrow + h1 + sections (getting started,
hierarchy, investigations + citations, plays, calculators,
integrations, keyboard shortcuts, more help). Lazy-loaded so first
paint is unaffected. The sidebar "docs" row was an external
<a href="/docs"> that round-tripped through the SPA redirect; replaced
with a React Router Link via new SidebarInternalLink.
The "help" row was a static mailto: with no context. Replaced with
SidebarHelpButton, which reads the current location on click and fires
a mailto with subject and a body that includes the page the user was
on. Support gets "sent from /projects/.../c/.../investigations/..."
instead of a blank inbox dump.
ProjectDetailPage's identity card was out of sync with the rest of the
app: 170 px tall with a 64 px filled project-coloured initial tile, no
PageHeader. Replaced with the standard PageHeader pattern used by every
other page (eyebrow, h1, subtitle, actions). The project colour is now
a single 8 px whisper dot beside the title, which respects the post-#93
restraint rule that project colour is an accent, never a dominant fill.
Rename / archive overflow menu moves into PageHeader's `actions` slot.
Inline-rename input keeps the display-h1 typography so the header
doesn't twitch on flip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promote /docs from a stub to a real operator's guide. Sections cover
each user-facing surface in the app: getting started, projects +
campaigns, brand profile, investigations, model selection (the call-out
the user asked for), citation drawer, videos drawer, next-step chips,
plays, slash menu + url paste, calculators + scenarios, creatives,
integrations, navigation (sidebar / palette / sessions), keyboard
shortcuts, settings + theme, account + sign-out, help.
Model selection gets its own dedicated section with a card grid:
gpt-oss-120b (default), gpt-oss-20b, qwq-32b (reasoning), qwen3-30b
(structured), mistral-small (cross-check). Each card carries a
characteristic chip (default / fast / reasoning / structured /
generalist), the pitch, and a concrete "when to pick" line. The default
card gets a brand-tinted border so it stands out without adding any new
colour to the palette.
Design language unchanged from the rest of the app: eyebrow + h2 +
body, code chips, kbd chips for shortcuts. Added one new piece of
chrome: a sticky table-of-contents rail on `lg:` and up with scroll-spy
highlighting via IntersectionObserver (the rootMargin is tuned so the
active row updates as you cross 96 px from the top of the viewport).
Mobile / tablet falls back to a single column, no TOC.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… break
react-markdown@9 strips raw HTML by default, so models that emit `<br>`
inside table cells (a common pattern because GFM tables don't support
newlines in cells) end up rendering the literal text `<br>` on screen.
Fix: normalise every `<br>`, `<br/>`, `<br />` (case-insensitive) in
the prose body to a single PUA sentinel character. walkCitations splits
text nodes on the sentinel and emits real <br/> elements. This covers
tables, lists, and paragraphs uniformly without enabling raw-HTML
passthrough (and its XSS surface).
Normalisation runs on the visible prose body only; the thinking
disclosure renders in a <pre> with whitespace-pre-wrap where actual
newlines work natively and the PUA char would surface as a tofu glyph.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nnel
Ships Slack as the first non-Meta active integration. Webhook-only
flow chosen over OAuth: zero app review, zero token rotation, ships
in one PR. User pastes an incoming-webhook URL from their Slack
workspace, we validate it structurally + with a live test POST,
encrypt it at rest, and persist to provider_connections.
What's reachable:
- /settings/integrations Slack card flips to active when connected.
Disconnected → gradient connect button opens a paste-URL form
with a link to Slack's "create an incoming webhook" docs.
- Connected card → manage panel reveals masked URL, "send test"
button, "disconnect" button.
- Inside any finished investigation turn, a new "share to slack"
action button sits beside copy + regenerate (only visible when
the user has Slack connected — no nag for the unconnected).
Posts a block-kit payload with the question, first ~320 chars
of the answer, citation count, active campaign name, and a deep
link back to the source investigation URL.
Backend (5 endpoints + 1 service):
- POST /integrations/slack/connect validate URL + fire test POST + persist
- GET /integrations/slack masked URL + connected_at for the manage panel
- DELETE /integrations/slack drop the row, cascade-safe
- POST /integrations/slack/test re-fire the test message for a stored webhook
- POST /messages/{id}/share-to-slack post a finished turn to the user's channel
- services/slack_webhook.py — httpx client, block-kit builders, URL validator, URL masker
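A sketch of what the structural URL validator and the masker might look like.
Slack incoming-webhook URLs have the form
https://hooks.slack.com/services/T…/B…/…; the function names and the masking
format here are assumptions, not the contents of services/slack_webhook.py.

```python
from urllib.parse import urlparse


def is_slack_webhook_url(url: str) -> bool:
    """Structural check only; the live test POST is still needed to confirm
    that the webhook actually works."""
    parsed = urlparse(url.strip())
    parts = [p for p in parsed.path.split("/") if p]
    return (
        parsed.scheme == "https"
        and parsed.hostname == "hooks.slack.com"
        and len(parts) == 4  # services / team id / bot id / secret
        and parts[0] == "services"
    )


def mask_webhook_url(url: str, visible: int = 4) -> str:
    """Hide the secret path segments, keeping only the last few characters."""
    parsed = urlparse(url.strip())
    tail = parsed.path.rstrip("/")[-visible:]
    return f"{parsed.scheme}://{parsed.hostname}/services/***{tail}"
```

Pinning the hostname matters beyond tidiness: the backend POSTs to whatever
URL is stored, so accepting arbitrary hosts would turn the test endpoint into
a request-forgery vector.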
Schema: migration 011 widens the provider_connections check
constraint to permit 'slack' alongside 'meta'/'google_ads'. The webhook
URL stores in access_token_ciphertext (same Fernet key as Meta — no
second secret), token_expires_at gets the far-future sentinel since
webhooks don't expire. Re-using the Meta table over a dedicated
webhook_endpoints table is intentional spike scope; we'll split it
out if Slack proves out + we add more webhook providers.
Frontend additions:
- api.ts: connectSlack, disconnectSlack, getSlackStatus, sendSlackTest,
shareMessageToSlack
- queries.ts: useIntegrationsStatus SWR hook for cross-component
feature gating
- IntegrationsPage: Slack-aware ProviderCard with SlackConnectForm +
ConnectedSlackBody, same expansion pattern as Meta
- ChatMessage: share-to-slack ActionBtn with idle/sending/sent/error
state machine; visible only when slackConnected prop is true
DB: migrations 009 (patched: its partial-index predicate used now(),
which Postgres rejects because index predicates may only call
IMMUTABLE functions; index dropped, table created) + 011 applied to
the dev Neon DB connected by .env.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oviders
The previous binary status pill (active / coming-soon) was misleading
once Slack went live — disconnected users saw "coming soon" on a card
that already had a working connect button right next to it. Adds a
third state:
- active — this user has it connected (emerald + blink, unchanged)
- live — shipped + reachable, this user hasn't connected yet
(brand-tinted pill, no dot)
- coming-soon — not yet built (neutral grey pill, unchanged)
Slack always renders 'live' or 'active'. Meta renders 'live' when the
deploy has the OAuth env, 'active' when this user has connected,
'coming-soon' otherwise (matches the existing "config needed" CTA).
Every other provider stays 'coming-soon' until it ships.
Summary strip at the top now shows "N active · N live · N coming soon"
instead of conflating the middle bucket. Counts are computed via the
same statusState() helper so card + strip stay in lockstep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o surface
Studio is the second active integration after Slack — a dedicated
creative-generation surface for every campaign. Generated images
land in the same campaign_creatives library that user uploads land
in, so the library stays the single source of truth and Studio is
purely the generator.
Backend:
- Cloudflare Workers AI client (services/image_gen.py) wrapping
@cf/black-forest-labs/flux-1-schnell. Handles both base64-in-JSON
and raw-PNG response shapes. Smoke-tested live: ~280 kB PNG in
one call against the production endpoint.
- Prompt composer that combines the user's text with structured
style + aspect-ratio modifiers (4 styles × 4 aspect ratios).
Aspect ratio is a prompt-level hint, not a render dimension —
Cloudflare's hosted Flux renders at native ~1024×1024 and users
crop downstream.
- New endpoint POST /projects/:p/campaigns/:c/creatives/generate.
For each of N variants (default 3, max 4): generate → grab
presigned URL → PUT bytes from the backend → insert
campaign_creatives row with prompt + ai_model metadata. Partial-
success path returns whatever variants completed before an error
+ logs the residue.
- Server-side multipart-aware uploader (_upload_bytes_to_storage)
so the same path works against R2 (PUT-raw) and UploadThing
(PUT-multipart) without touching the storage protocol.
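A sketch of the prompt composer described above. The style and aspect-ratio
names come from this PR; the modifier phrases are illustrative assumptions,
and this `compose_prompt` signature is hypothetical rather than the one in
services/image_gen.py.

```python
# Modifier phrases are illustrative assumptions; only the keys are from the PR.
STYLE_MODIFIERS = {
    "photo": "photorealistic, natural lighting",
    "illustration": "flat digital illustration",
    "minimal": "minimalist composition, generous negative space",
    "3d": "3D render, soft studio lighting",
}
ASPECT_HINTS = {
    "1:1": "square composition",
    "9:16": "tall vertical composition",
    "1.91:1": "wide landscape composition",
    "4:5": "portrait composition",
}


def compose_prompt(text: str, style: str, aspect: str) -> str:
    """Join the user's text with style and aspect modifiers.

    The aspect ratio is only a prompt-level hint, not a render dimension.
    Unknown style or aspect keys raise KeyError.
    """
    parts = [
        text.strip().rstrip(".,; "),
        STYLE_MODIFIERS[style],
        ASPECT_HINTS[aspect],
    ]
    return ", ".join(part for part in parts if part)
```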
Schema: migration 012 adds two nullable columns to
campaign_creatives — `prompt` and `ai_model`. Non-null `prompt`
flags a row as generated; uploads stay NULL. Applied to dev Neon.
Frontend:
- New page /projects/:p/c/:c/studio. Large prompt composer,
aspect-ratio chip selector (1:1 / 9:16 / 1.91:1 / 4:5), style
chip selector (photo / illustration / minimal / 3d), single
gradient "generate" CTA. Tiles flow in below with a "just now"
chip on the latest batch.
- Sidebar gains a studio entry (Wand2 icon) between calculators
and projects. Highlights on /studio paths via the same
matchPrefixes pattern the other tool rows use.
- CampaignHomePage tile grid grows from 4-up to 5-up to surface
Studio at the campaign entry point.
- api.ts: generateCreatives() + Creative.prompt + Creative.ai_model.
Demo path: open any campaign → studio → "minimalist holiday gift
box on a warm cream background, soft light" → generate → 3 variants
land in ~15s. Tiles are saved to /creatives automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… + campaign
Studio's main friction is staring at a blank textarea wondering what
makes a good Flux prompt. Adds a brand-tinted "suggest from campaign"
button next to the brief label that pulls a draft from llama-3.2-3b
grounded in the campaign's brand profile + objective + date window.
Two modes triggered by the same button:
- empty textarea → drafts from context cold
- non-empty textarea → refines the user's intent against the same
context. Button label flips to "refine from campaign" to set the
expectation.
Backend:
- CloudflareChat.suggest_image_prompt(brand_block, campaign_block,
hint?) → str. Uses the same SHORT_CALL_MAX_TOKENS + llama-3.2-3b
path the next-step generator runs on; strips markdown emphasis,
"Prompt:" preambles, and collapses multi-line output to a single
paragraph (image-gen prompts are one paragraph).
- POST /projects/:p/campaigns/:c/creatives/suggest-prompt. Loads
the brand profile, renders the same brand + campaign blocks that
grounds investigations, asks the LLM. 502 on Cloudflare failure
(non-blocking — UI lets the user write their own).
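The output clean-up for suggest_image_prompt could be approximated as below.
The regexes are assumptions about what "strips markdown emphasis and
'Prompt:' preambles" means in practice; `clean_image_prompt` is a
hypothetical name.

```python
import re


def clean_image_prompt(raw: str) -> str:
    """Normalise a small-model reply into a single plain paragraph."""
    text = raw.strip()
    # Strip markdown emphasis markers (*, **, _, __) that wrap a span. The
    # lookarounds leave identifiers such as snake_case untouched.
    text = re.sub(r"(?<!\w)(\*\*|__|\*|_)(.+?)\1(?!\w)", r"\2", text)
    # Drop a leading "Prompt:" preamble (case-insensitive).
    text = re.sub(r"^\s*prompt\s*:\s*", "", text, flags=re.IGNORECASE)
    # Collapse newlines and runs of whitespace into single spaces.
    return " ".join(text.split())
```

Emphasis is stripped before the preamble so that a bolded `**Prompt:**`
label is also caught.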
Frontend:
- api.ts: suggestStudioPrompt(projectId, campaignId, hint, token).
- StudioPage: handleSuggest() with suggesting state; button disables
during both suggesting + generating to prevent overlapping calls.
Spinner + label swap during in-flight; error surfaces in the
same banner the generate flow uses.
Smoke against a real Cloudflare account with a synthesized
Northwind Outfitters brand + Q4 holiday-bundle campaign returned
on-brand multi-sentence Flux-ready prompts ("a snow-covered cabin's
wooden exterior … a gift set wrapped in Northwind Outfitters'
earthy-toned paper, rests on a nearby wooden bench …"). Renders
work straight through the existing generate endpoint without edits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The honest answer to "is brand context baked into Flux today" was
'only via the suggest step'. The generate call sent the user's
textarea + style/aspect modifiers, no brand. Adds an explicit
context-bake toggle so the user decides — and defaults it ON so the
out-of-the-box experience is on-brand.
Backend:
- StudioGenerateRequest.bake_context: bool (default true). When
true, server loads the campaign's brand profile + reads the
campaign objective, distills a tight one-line context phrase
(company name, voice ~100 chars, objective ~100 chars) and
appends it as a final modifier to the Flux prompt. When false,
server skips the brand-profile query entirely — saves a round-
trip on every generate when the user wants a bare prompt.
- StudioGenerateResponse gains composed_prompt + context_baked so
the UI can surface "this is exactly what we sent to Flux".
context_baked is the AND of "user requested it" and "we actually
had a brand profile to honor".
- services/image_gen.distill_brand_context(profile, campaign) does
the distillation. compose_prompt() renames its brand_voice param
to brand_context to reflect the richer payload.
Frontend UX pass:
- Context-bake toggle in the same chip register as aspect + style.
Brand-tinted ON state, neutral OFF, descriptive helper line
underneath ("distilled brand voice and campaign objective are
added to every generation" vs "only your prompt + style + aspect
go to flux").
- Visual aspect-ratio glyph (a small rounded rect of the actual
proportions) inside each aspect chip — communicates portrait /
landscape / square faster than the numeric ratio alone.
- Composed-prompt disclosure after each batch: "sent to flux"
summary with the full final prompt + a "context baked" pill +
copy button. Defaults closed; expanding shows exactly what hit
the model.
- Per-tile hover actions: copy prompt, re-use prompt (drops it
into the composer + scrolls to top), download. Same overlay
pattern as the chat copy/regenerate row.
- Cmd / Ctrl + Enter in the textarea generates. Hint in the
char-count line.
- Elapsed-time counter on the gradient button while rendering
("rendering · 7s") so the 8-15s Flux call doesn't feel frozen.
- Empty-state copy nudges toward the suggest-from-campaign flow as
the easier path for first-time use.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each generate run was writing three rows to campaign_creatives
unconditionally. Twenty minutes of iteration = the library filling
with 60 rejected variants the user never wanted to keep. Splits the
contract:
- generate uploads N variants to storage and returns the preview
metadata. No DB rows written.
- save-from-studio takes one or more preview refs and persists them
as campaign_creatives rows (storage-key prefix check blocks
cross-campaign smuggling).
- discard-studio best-effort deletes one or more storage keys
without persisting anything.
Backend:
- StudioGenerateOut.previews replaces .creatives. Each carries
storage_key + download_url + size_bytes + mime_type + filename +
the original prompt + ai_model, so save can be a single round-
trip without re-deriving anything.
- Two new endpoints: save-from-studio (201 with the persisted rows)
and discard-studio (204, fire-and-forget). Both gate on
_require_campaign and validate the storage_key prefix.
- Generate now skips db.commit() entirely.
Frontend:
- StudioPage gains a "review · pick what to keep" zone between the
composer and the recents grid. Renders preview tiles with a
persistent save / discard footer + the same hover overlay
(copy / re-use / download) GeneratedTile has.
- "save all" / "discard all" bulk actions in the zone header when
>1 preview exists.
- Save promotes to the recents grid + drops from previews. Discard
drops from previews + fires the storage-cleanup endpoint.
- Starting a new generate run discards the previous batch's
unsaved previews (one-batch-at-a-time review experience).
- Shimmer tiles now live in the previews zone during in-flight
generation instead of the recents zone.
Storage cleanup for orphaned previews (user navigates away without
saving or discarding) is deferred to a future cron — bytes are cheap
and the spike doesn't need it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…drop fake legal
Inventory pass:
LandingNav.tsx
- DROP: 'Changelog' (#changelog) — no anchor on the page
- FIX: 'Docs' was #docs (dead anchor) → now React Router Link to
/docs (the real page we built recently)
- All in-page anchors checked against actual id="..." attributes:
#features, #plays, #calculators, #pricing all resolve.
LandingFooter.tsx
- Product column: dropped 'Changelog' (dead anchor); kept Plays /
Calculators / Pricing (all resolve).
- Resources column rebuilt: 'Docs' now routes to /docs through
SPA; added 'GitHub' (external, new-tab); 'Contact' switched from
a personal address (paritosh@parspec.io) to the brand mailbox
hello@paidpilot.app — matches every other mailto in the app.
Dropped 'Brand guide' and 'Status' (no asset / no status page).
- Legal column DROPPED entirely. Privacy / Terms / Security / DPA
were all dead anchors. Shipping fake legal pages is worse than
not shipping them (claims could be wrong, exposure). When the
product takes real traffic these get added back as a real /legal
surface — flagged in the file comment.
- Grid drops from 4 cols to 3 (logo + 2 link columns).
- Bottom strip absorbs an inline "questions? hello@paidpilot.app"
so the contact entry point stays prominent even without a column.
Mechanically: introduced a `LinkKind` union ('anchor' | 'route' |
'external') threaded through both files. SPA links use React Router's
Link; mailto / external links get target=_blank + rel=noopener.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both endpoints were enforcing a campaigns/<id>/studio/ prefix on the
storage_key. R2 honours this (we generate the key, R2 stores it
verbatim) but UploadThing rewrites incoming keys to its own opaque
short identifiers, so every UploadThing-backed deploy was hitting
"storage_key 'Z5pp...' does not belong to this campaign's studio" on
the first save click.
Gate the prefix check on `storage.name == 'r2'`. Matches the
threat-model trade-off the user-upload path already makes in
confirm_creative_upload: when the provider's keys are server-generated
the prefix is enforceable defense-in-depth; when they're
provider-generated opaque blobs we lean on the route's
_require_campaign auth gate (a malicious client can only attach their
own keys to their own campaigns, no cross-campaign escalation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
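The gated check could be sketched as follows. The function name, the
ValueError, and the `"uploadthing"` provider string in the usage are
illustrative; the prefix, the `'r2'` name, and the error wording come from
the description above.

```python
def assert_studio_key_allowed(storage_key: str, campaign_id: str,
                              storage_name: str) -> None:
    """Enforce the campaigns/<id>/studio/ prefix only where keys are
    server-generated (R2)."""
    if storage_name != "r2":
        # Provider-generated opaque keys: rely on the route's campaign
        # auth gate instead of a prefix check.
        return
    expected = f"campaigns/{campaign_id}/studio/"
    if not storage_key.startswith(expected):
        raise ValueError(
            f"storage_key {storage_key!r} does not belong to this campaign's studio"
        )
```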
Drops a single canonical Stitch prompt that pins every functional
piece the current StudioPage carries (composer, suggest-from-campaign,
aspect / style / context-bake chips, sent-to-flux disclosure, preview
tiles with save/discard, recent generations grid) plus a concrete
redesign direction (split timeline: saved and unsaved generations live
together in a chronological feed by batch).
Includes three alternate redesign directions (sidebar-driven / canvas /
split-pane) that can be swapped into the DIRECTION block without
rewriting the rest, so the same prompt scaffolding can explore multiple
UI shapes against Stitch in parallel.
Sits alongside STITCH_PROMPTS_H.md from the projects + campaigns
redesign round. Same brand language and restraint rules, named in the
file's intro for cross-reference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eative
Studio redesigned per docs/product/STITCH_PROMPTS_STUDIO.md (direction
1: split timeline). What changes vs the prior stacked layout:
- sticky composer at the top, single horizontal row (chips + toggle
+ generate button all inline; textarea collapses to 2 visible
rows). No more multi-section composer with separate eyebrows for
aspect / style / context.
- "review · pick what to keep" + "recent generations" sections
collapse into one chronological timeline. Each row IS a batch:
left rail (timestamp + prompt excerpt + context-baked chip +
re-run + sent-to-flux disclosure) + right area (3 tiles in a
grid).
- Tile actions move from a persistent footer to corner icon
buttons (bookmark + close in top-right). Saved tiles flip to a
SAVED chip in the same corner. Preview tiles get a brand-violet
border + glow; saved tiles get an emerald border. Hover overlay
on the bottom strip carries copy-prompt + download.
- Active-batch saved tiles stay in place (no longer remove
themselves from the row on save) — they flip visual treatment
inline so the row shows the natural mix of unsaved siblings
next to a saved keeper, matching the Stitch screenshot.
- Historic batches are grouped from the persisted Creative rows by
(prompt, 60s window) and rendered newest-first. To support this
de-duplication without showing a tile twice (once in the active
batch row when saved, once in the historic timeline), we plumb
storage_key through to the Creative response and use it as the
de-dupe key.
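The batch grouping and de-duplication, sketched in Python for illustration
(the real logic lives in the frontend). The exact window semantics are an
assumption: here a row joins a batch if it shares the prompt and falls within
60 s of that batch's first row. The row field names are also assumed.

```python
from datetime import datetime, timedelta


def group_into_batches(creatives: list[dict], window_s: int = 60) -> list[list[dict]]:
    """Group rows by (prompt, time window) and return batches newest-first."""
    batches: list[list[dict]] = []
    open_batch: dict[str, list[dict]] = {}  # prompt -> most recent batch
    seen_keys: set[str] = set()
    for row in sorted(creatives, key=lambda r: r["created_at"]):
        if row["storage_key"] in seen_keys:
            continue  # de-dupe on storage_key so a tile never shows twice
        seen_keys.add(row["storage_key"])
        batch = open_batch.get(row["prompt"])
        within = batch is not None and (
            row["created_at"] - batch[0]["created_at"] <= timedelta(seconds=window_s)
        )
        if within:
            batch.append(row)
        else:
            batch = [row]
            open_batch[row["prompt"]] = batch
            batches.append(batch)
    batches.sort(key=lambda b: b[0]["created_at"], reverse=True)
    return batches
```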
API change (additive):
- CreativeOut + the frontend Creative type now carry `storage_key`.
Opaque pointer; no security implications — presigning rules
enforce access server-side regardless.
All existing handlers and state survive verbatim — only the layout
+ chrome change. Backend untouched aside from CreativeOut field add.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ions + full-tile hover overlay
Side-by-side diff against the Stitch mock surfaced four places where
the deployed Studio drifted from the source design:
- Tiles used rounded-2xl (16px); Stitch uses rounded-xl (12px).
Tighter visual register, less candy.
- Preview tiles had a brand-violet border + glow; Stitch keeps the
border identical to saved tiles and distinguishes purely via the
corner action icons. Removes one source of visual noise.
- Action buttons were square pills in a bottom strip; Stitch uses
circular (rounded-full) pills in the top-right with a stronger
semantic tint (bg-color/20 at rest, full bg-color on hover).
- Hover overlay was a gradient bottom strip; Stitch uses a full-tile
dimmed scrim (bg-black/55) with the copy + download buttons
stacked centred. Reads more "select me" than "tweak me".
Image gets opacity-90 at rest + a subtle group-hover:scale-[1.05]
zoom — matches Stitch's group-hover:scale-110 (toned down slightly so
the edges don't appear cropped on smaller viewports).
Composer container drops to rounded-xl + shadow-2xl + surface-
raised/95 (was rounded-2xl + shadow-card + /60). Same backdrop blur,
denser visual.
Renamed CornerIconBtn → CircleActionBtn to reflect the new shape.
All handlers unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d + campaign
Mirrors Studio's "suggest from campaign" affordance into the
investigation composer. The user types a rough draft ("how do i
scale ads"), clicks the brand-tinted "improve" button next to the
URL action, gets a sharpened one-or-two-sentence version that names
the specific channels + metrics + time horizon implied by their
brand profile and active campaign. The textarea content swaps in
place so they can edit-and-send or send as-is.
Backend:
- CloudflareChat.improve_question(brand_block, campaign_block,
draft) — uses the same fast/cheap llama-3.2-3b path as
next-step chips + Studio's prompt suggester. Output post-
processing strips "Question:" preambles, surrounding quotes,
markdown emphasis, and collapses any multi-line drift into a
single line (chat composer is one-line by convention).
- POST /projects/:p/campaigns/:c/questions/improve — loads the
brand profile, renders the same brand + campaign blocks the
investigation system prompt uses for grounding, returns the
refined question. 502 on Cloudflare error (non-blocking: UI
surfaces the message but lets the user send their draft).
- Live-smoke against a Northwind Outfitters synthesised brand
turned "how do i scale ads" into "what are the most effective
Meta and Klaviyo ad strategies to achieve a CAC of $45 and
ROAS of 3.0 within the Q4 Holiday Gift Bundles campaign…"
Frontend:
- SearchBar gains an optional onImprovePrompt callback. When
provided AND the textarea has >= 4 chars, a brand-tinted
"improve" button appears in the bottom toolbar next to the URL
action. Click → callback fires → returned text replaces the
textarea content + caret moves to the end so the user can
immediately tweak. Errors surface as a small inline hint with
a dismiss link.
- ChatPage wires the callback to call improveInvestigationPrompt
with the active projectId + campaignId. The button only
renders when a real campaign scope exists — outside a campaign
the improver has nothing to ground against, so we hide rather
than nag.
- api.ts: improveInvestigationPrompt() client fn matching the
backend endpoint shape.
Docs:
- New /docs section "improve prompt" right after investigations.
Explains what the improver does (names channels, names metrics,
keeps it one-or-two sentences, preserves intent), when the
button appears (>= 4 chars + inside a campaign scope), and
that it runs on the small fast model so it doesn't burn
answer-quality budget.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o segmented tray
Two specific drifts the live deploy showed against the Stitch mock:
1. Aspect chips were inside a segmented tray (inline-flex inside a
bordered container with internal padding). Stitch's chip group is
individual pills. Drops the tray; each aspect chip is now its own
border-pill, identical shape to the style chips.
2. Active chip used `bg-brand/15 text-brand` — a 15%-opacity wash that
barely registered as "active" on a dark surface. Stitch uses a
solid primary fill (`bg-primary text-on-primary`). Switching to
`border-brand/60 bg-brand/25 text-fg font-medium` — heavier
border, denser fill, font-weight bump so the active state reads
from across the room.
Same treatment applied to the style chips so the two groups stay in
sync visually. Added a thin vertical divider between the aspect group
and the style group — without the tray boundary, the eye needs
something to anchor on.
Hint labels ("meta", "reels", etc.) now appear at xl+ instead of lg+
since the pill-without-tray shape is slightly wider per chip.
No netlify deploy in this commit — CI/CD handles it on push.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…chips
Per design feedback. Single-direction scale-up across the composer:
- outer padding p-4 → p-6 (16 → 24)
- vertical row gap space-y-3 → space-y-4
- textarea rows 2 → 4, font text-body-base → text-body-lg,
line-height bumped (leading-relaxed)
- all chips h-7 → h-9, px-2 → px-3, text-[11px] → text-body-sm
- chip internal gap gap-1.5 → gap-2 to absorb the taller bodies
- aspect glyphs grown ~22%: 1:1 9→11, 9:16 6×10→8×13, etc.
- divider between aspect + style groups: w-px h-4 mx-1 →
h-5 mx-1.5 to match the new chip height
- toggle switch glyph 7×3.5 → 9×4, dot 2.5 → 3, label text-[10px]
→ text-[11px] with a 3.5 Target icon
- generate button h-8 px-4 → h-10 px-5, text-body-sm →
text-body-base, icon 3.5 → 4
No new behavior. Composer occupies more vertical real estate but
the chip group still fits on one row at 1024+; below that it wraps
the same way it did before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ster matches Studio
Two pieces:
(1) docs/product/STITCH_PROMPTS_DOCS.md
Stitch prompt for the /docs redesign. Direction 1 is a three-
column developer-docs layout (left sidebar with grouped TOC,
center content with a hero search, right rail with per-section
mini-TOC). Three alternates (hero+tabs / card-grid landing /
vertical timeline) included so the same prompt scaffolding can
explore other shapes against Stitch in parallel.
Cross-references the existing STITCH_PROMPTS_STUDIO.md and
STITCH_PROMPTS_H.md so all three live as a coherent design-
prompt set with shared brand language and restraint rules.
Includes a "sections that grew since the last revision" note
flagging improve-prompt and studio as new sections the mock
needs to cover.
(2) frontend/src/components/SearchBar.tsx
Scale up the chat composer to match the Studio composer's
register so both prompt surfaces feel like the same product:
- container background: bg-surface → bg-surface-raised/40 +
backdrop-blur, shadow-2xl (matches Studio's elevated card)
- focus state: border-border-strong + shadow-card →
border-brand/40 + brand-violet glow (signals "primary input")
- textarea: text-body-base → text-body-lg, leading-[1.5] →
leading-relaxed; px-4 → px-5; pt-3 → pt-4
- chip row: px-3 pt-3 → px-5 pt-4 (more breathing room
around active-play + URL chips)
- URL button: text-body-md ghost → text-body-sm bordered pill,
h-7 → h-8, brand-tinted when active matching the improve btn
- improve button: text-[11px] → text-body-sm, h-7 → h-8,
bg-brand/5 → bg-brand/15 (denser active hint), hover /15 → /25
- send button: square 28px icon → 36px pill with "send" label
+ arrow icon (matches Studio's labelled "generate" button)
- bottom toolbar padding: px-2 pb-2 → px-3 pb-3 pt-1
All behaviour preserved verbatim — slash menu, URL paste detection,
chip clearing, Cmd+Enter submit, the improve-prompt callback flow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The centered hover overlay (PROMPT + DOWNLOAD buttons) covered the
entire tile via `absolute inset-0` — even though its visible content
sat in the middle of the tile, the overlay div itself caught every
click destined for the corner save / discard buttons that lived
underneath it in DOM order.
Two changes:
1. Corner action block now sits at z-10 — defends against any
future overlay re-introduction. Same guard on the SAVED chip in
SavedTile.
2. Preview tile state determines which hover surface renders:
- unsaved → bottom-strip overlay (gradient-to-t with copy +
download buttons hugging the bottom 1/3 of the tile, leaving
the top-right corner clear for save / discard). Click on
the image still falls through to nothing — save and discard
are now the only top-right affordances.
- saved → full-tile dimmed overlay with the same copy +
download buttons centered. No save / discard needed.
This matches the Stitch mock convention (preview tiles in Stitch
never carry the centered overlay; only saved tiles do).
Net effect: hovering an unsaved preview tile shows
copy + download in a bottom strip AND save + discard in the top-
right corner — both independently clickable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ctions, floating TOC
Drops the previous three-column dev-docs layout (sticky right-rail
TOC + scroll-spy left-bar highlight) and rebuilds /docs as the
vertical-timeline direction from STITCH_PROMPTS_DOCS.md / direction D.
Structural changes:
- max-w-[800px] centered single-column container; px-5 / py-8 hero
- hero block: "official walkthrough" eyebrow + "how paidpilot
works." h1 (48px) + subtitle + search input (rounded-full, with
Cmd+K kbd hint; filters sections by title/eyebrow match)
- each section renders as a "step N of M" labelled block: centered
eyebrow + h2, optional aspect-video screenshot placeholder
(sparse — only the marquee sections that benefit visually carry
one), the existing body content unchanged, "back to top" link
centered at the bottom
- step numbers are ABSOLUTE — search filtering hides sections but
doesn't renumber the remaining ones
- section spacing: space-y-32 (128px) — matches Stitch's "tour"
cadence
- new "ready to start?" CTA card before the footer, with two
actions: "create a project" (gradient CTA → /projects) + "email
the team"
- last-updated stamp at the bottom
Model section gets a special showcase treatment: gpt-oss-120b is a
featured full-width card with a 2px brand-violet border + a "Default"
badge bleeding into the top-right corner, a "performance powerhouse"
h3, and an italic "when to pick" quote. Siblings render in a 2-up
grid below with a more compact treatment.
A new floating "guide nav" pill bottom-right replaces the sticky
right-rail TOC. Click to open a category-grouped popover (start
here / building blocks / conversations / tools / connections /
navigation / settings). Esc + outside-click dismiss; each link
auto-closes on selection.
Section data changes:
- Section type gains `group: SectionGroup` (drives the floating
nav grouping) and optional `placeholder: string` (aspect-video
label for marquee sections)
- New section: STUDIO. Was missing from the previous docs version
even though the surface had shipped. Covers the composer's
suggest / aspect / style / context-bake affordances, the
review-then-save timeline, save-to-library behavior.
Drops: FooterMeta, useScrollSpy. PageHeader is replaced by the
inline hero block.
Total section count: 19 → 20 (the magic "step N of 20" number that
Stitch's mock referenced).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The HTML <a download="..."> attribute is spec'd to be IGNORED on
cross-origin links "as a security precaution". Our uploads land on
UploadThing's CDN with Content-Disposition: inline, so a click on
<a href={presignedUrl} download={name}> opened the image in a new tab
instead of downloading it. Every download button in Studio + Creatives
was hitting this — the user reported it explicitly on Studio's
"download" affordance.
Adds utils/download.ts with a `downloadFile(url, filename)` helper
that fetches the bytes (UploadThing's CDN allows CORS GET), wraps them
in a same-origin blob URL, and triggers a programmatic <a download>
click on a temporary in-memory anchor. The browser respects download
on same-origin blob URLs even when the original asset is cross-
origin.
Trade-off: the full file is buffered in the browser tab before it is
saved, so there is no streamed-to-disk save dialog mid-fetch. Fine at
our file sizes (generated images are ~280 kB).
Errors fall back to opening the URL in a new tab so the user still
sees the file when CORS is blocked or the fetch fails. Logged but
not surfaced — the fallback IS the previous broken-ish behaviour, so
this can only improve things.
Callsites updated:
- StudioPage PreviewTile unsaved hover-strip download button
- StudioPage PreviewTile saved overlay download button
- StudioPage SavedTile overlay download button
- CreativesPage handleDownload() (was using window.open)
CreativesPage's preview-on-click is unchanged: it still uses
window.open() because "open in new tab" is the right semantic for the
preview action; only the explicit Download button needs the
force-download treatment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Studio image generation was sharing the same Cloudflare account as
chat text generation. Flux burns ~50-100× the neurons per call, so
the 10,000 neurons/day free tier was getting eaten by Studio and
starving text streaming — the user reported "Streaming failed (429):
you have used up your daily free allocation of 10,000 neurons" on
the /answer path.
Split the credential paths and added rotation so additional accounts
can be stacked for image generation without affecting text:
Env-var precedence inside `_credential_pool()`:
Slot 1: CLOUDFLARE_IMAGE_ACCOUNT_ID + CLOUDFLARE_IMAGE_API_KEY
Slot 2: CLOUDFLARE_IMAGE_ACCOUNT_ID_2 + CLOUDFLARE_IMAGE_API_KEY_2
Slot 3: CLOUDFLARE_IMAGE_ACCOUNT_ID_3 + CLOUDFLARE_IMAGE_API_KEY_3
… up to _MAX_IMAGE_SLOTS (5)
Fallback: CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_API_KEY
(only when NO image-specific slot is configured — once
the operator adds image slots, text stays isolated.)
Walk order on `generate()`:
Try slot 1. If `_is_quota_error()` returns true (status 429/401/403
or body contains "daily free allocation" / "neurons"), log + move
on to slot 2, then slot 3, etc. Other errors raise immediately
so the user sees the specific message (content-policy reject,
malformed prompt) instead of burning the whole pool.
When all slots return quota errors, raise
`ImageGenQuotaExhaustedError` (new subclass of ImageGenError).
The endpoint translates this to HTTP 429 with a tailored message
pointing the operator at "add another CLOUDFLARE_IMAGE_API_KEY_N
or upgrade one of the existing accounts" — distinct from the
generic 502 raised by transient failures.
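The precedence and walk order above can be sketched roughly as follows. This is an illustration under assumptions, not the shipped image_gen.py: the `env` parameter, the `call` callback, the `status` attribute on ImageGenError, and the public function names are hypothetical and exist only to keep the example self-contained.

```python
import os
from typing import Callable, List, Mapping, Optional, Tuple

_MAX_IMAGE_SLOTS = 5
Credential = Tuple[str, str]  # (account_id, api_key)


class ImageGenError(Exception):
    def __init__(self, message: str, status: Optional[int] = None):
        super().__init__(message)
        self.status = status  # assumed attribute, for illustration


class ImageGenQuotaExhaustedError(ImageGenError):
    pass


def credential_pool(env: Mapping[str, str] = os.environ) -> List[Credential]:
    """Image slots 1..N in order; legacy creds only if no slot is set."""
    pool: List[Credential] = []
    for slot in range(1, _MAX_IMAGE_SLOTS + 1):
        suffix = "" if slot == 1 else f"_{slot}"
        account = env.get(f"CLOUDFLARE_IMAGE_ACCOUNT_ID{suffix}")
        key = env.get(f"CLOUDFLARE_IMAGE_API_KEY{suffix}")
        if account and key:
            pool.append((account, key))
    if not pool:
        account = env.get("CLOUDFLARE_ACCOUNT_ID")
        key = env.get("CLOUDFLARE_API_KEY")
        if account and key:
            pool.append((account, key))
    return pool


def is_quota_error(status: Optional[int], body: str) -> bool:
    lowered = body.lower()
    return (
        status in (429, 401, 403)
        or "daily free allocation" in lowered
        or "neurons" in lowered
    )


def generate(prompt: str, call: Callable[[str, str, str], object],
             env: Mapping[str, str] = os.environ) -> object:
    pool = credential_pool(env)
    if not pool:
        raise ImageGenError("no Cloudflare credentials configured")
    for account, key in pool:
        try:
            return call(account, key, prompt)
        except ImageGenError as exc:
            # Only quota/auth failures rotate; anything else surfaces as-is.
            if not is_quota_error(exc.status, str(exc)):
                raise
    raise ImageGenQuotaExhaustedError("all image slots exhausted", status=429)
```

Re-raising non-quota errors immediately is what keeps a content-policy reject from burning through every slot.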
To stop text from getting starved today:
set CLOUDFLARE_IMAGE_ACCOUNT_ID + CLOUDFLARE_IMAGE_API_KEY (and
_2, _3 …) to the new image-only Cloudflare accounts. The original
CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_API_KEY then serve text only.
CloudflareChat (text generation) was not modified — it continues to
read CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_API_KEY from
constants.constants. Backward compatible: deploys without
image-specific creds keep working with the legacy single-account
behaviour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…/ improve
Reported error: chat streaming hit Cloudflare's daily-free-allocation
429 and the UI showed the raw JSON payload ("AiError: AiError: you
have used up your daily free allocation of 10,000 neurons …"). Not
useful to a marketer — they need to know when to come back and what
to do now.
Adds:
utils/errors.ts — describeError(unknown) classifies any caught
error into { kind: 'quota-exhausted' | 'generic', headline, detail,
raw }. Quota detection looks for the literal phrases the Workers
AI API uses ("daily free allocation", "neurons", code 4006, "workers
paid plan") plus the 429 status code. Calculates hours until the
next 00:00 UTC reset so the detail line says "resets at 00:00 UTC
(in ~7h)".
components/ErrorBanner.tsx — drop-in component that takes an
`error: unknown`. Renders an amber clock-icon treatment for quota
errors with the "ai quota" headline + the reset-time detail line +
a "show raw error" disclosure (collapsed by default; lets the
operator debug without polluting the default surface). Falls
back to the existing rose alert-circle treatment for generic
errors.
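The reset-time arithmetic behind the "resets at 00:00 UTC (in ~7h)" detail line can be illustrated as below. The shipped helper is TypeScript (utils/errors.ts); this Python version is only a sketch of the calculation, and both function names are hypothetical. It assumes a timezone-aware datetime and rounds partial hours up.

```python
import math
from datetime import datetime, timedelta, timezone


def hours_until_utc_reset(now: datetime) -> int:
    """Whole hours (rounded up) until the next 00:00 UTC."""
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    seconds = (next_midnight - now).total_seconds()
    return max(1, math.ceil(seconds / 3600))


def reset_detail(now: datetime) -> str:
    return f"resets at 00:00 UTC (in ~{hours_until_utc_reset(now)}h)"
```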
Callsites updated:
- ChatPage error rendering — was a raw <div>{error}</div>. Now
renders <ErrorBanner error={error} onDismiss={() => setError(null)} />
- StudioPage composer error — was an inline <p>{err}</p>
- SearchBar improve-prompt error — was the same inline rose treatment
All three callsites end up showing the same "you've used today's ai
quota. resets at 00:00 UTC …" banner when the quota response is
detected, regardless of which endpoint hit it (chat stream / Studio
generate / improve-prompt / next-step generator).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User report: main Cloudflare account hit "you have used up your daily
free allocation of 10,000 neurons" on the chat-streaming path even
though additional CLOUDFLARE_IMAGE_API_KEY_N accounts had budget
remaining. The image-gen path rotation we shipped two commits ago was
isolated to image gen — text streaming kept hammering only the
primary account.
Fix: extend the rotation pool to text. New
`text_credential_pool(primary_account, primary_key)` helper in
image_gen.py returns the primary text creds followed by every
CLOUDFLARE_IMAGE_* slot in order (deduped). CloudflareChat's
`_call_for_prompt` AND `stream_answer` both walk this pool now,
falling through to the next slot on quota / auth errors detected via
the shared `is_quota_error()` helper.
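A rough sketch of how such a pool could be assembled: primary text creds first, then every configured image slot, with duplicates dropped. The `env` parameter and the hard-coded slot limit are assumptions added to keep the example self-contained; the real helper takes only `(primary_account, primary_key)`.

```python
import os
from typing import List, Mapping, Tuple

Credential = Tuple[str, str]  # (account_id, api_key)
_MAX_IMAGE_SLOTS = 5  # assumed to match the image-gen pool limit


def text_credential_pool(primary_account: str, primary_key: str,
                         env: Mapping[str, str] = os.environ) -> List[Credential]:
    """Primary text creds, then image slots in order, deduplicated."""
    candidates: List[Credential] = [(primary_account, primary_key)]
    for slot in range(1, _MAX_IMAGE_SLOTS + 1):
        suffix = "" if slot == 1 else f"_{slot}"
        candidates.append((
            env.get(f"CLOUDFLARE_IMAGE_ACCOUNT_ID{suffix}", ""),
            env.get(f"CLOUDFLARE_IMAGE_API_KEY{suffix}", ""),
        ))
    pool: List[Credential] = []
    for account, key in candidates:
        # Skip unset slots and anything already in the pool.
        if account and key and (account, key) not in pool:
            pool.append((account, key))
    return pool
```

Deduplication matters when an operator reuses the primary account as image slot 1; without it a quota failure would retry the same exhausted account.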
Asymmetric on purpose: text falls through to image, but the image
path does NOT fall through to text. Image gen burns ~50-100x the
neurons per call, so letting it spill into the text budget would
starve chat in minutes. Letting text spill into image is safe — a
chat call is cheap and there's always headroom.
Streaming-path nuance: rotation only kicks in BEFORE any tokens have
been emitted (i.e. on the initial HTTP status check). Once a 200
response starts streaming SSE chunks, errors mid-stream still abort
the turn the same way they did pre-rotation. We can't recover a
half-rendered answer.
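The "rotate only before the first token" rule can be sketched like this. It is a simplified model, not the real stream_answer: `open_stream` is a hypothetical callback returning `(status_code, error_body, chunk_iterator)`, and `is_quota_error` is re-declared locally so the example runs on its own.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Credential = Tuple[str, str]
StreamOpener = Callable[[Credential], Tuple[int, str, Iterable[str]]]


def is_quota_error(status: int, body: str) -> bool:
    lowered = body.lower()
    return (
        status in (429, 401, 403)
        or "daily free allocation" in lowered
        or "neurons" in lowered
    )


def stream_with_rotation(pool: List[Credential],
                         open_stream: StreamOpener) -> Iterator[str]:
    for index, cred in enumerate(pool, start=1):
        status, body, chunks = open_stream(cred)
        if status != 200:
            # Nothing has been emitted yet, so trying the next slot is safe.
            if is_quota_error(status, body) and index < len(pool):
                continue
            raise RuntimeError(f"Streaming failed ({status}): {body[:500]}")
        # From here on tokens reach the caller; a mid-stream error
        # propagates and is NOT retried on another slot.
        yield from chunks
        return
    raise RuntimeError("no credentials configured")
```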
Renamed `_is_quota_error` → `is_quota_error` (public-safe) so
language_model.py can import it cleanly; kept the underscore name
as an alias for backward compat.
Logs each rotation step ("chat slot N/M quota — trying next", "chat
call succeeded on slot N/M (1..N-1 exhausted)") so the operator can
see which slots are taking traffic from the Render logs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /docs route was inside the SignedIn branch of the router, so a
hackathon judge / new visitor hitting paidpilot.netlify.app/docs got
bounced to the landing page. Treat docs as a public surface — same
content, different chrome:
- Add a /docs route to the SignedOut branch wrapped in a new
PublicDocsShell component. The shell renders a fixed 56px top bar
with the PaidPilot logo + a brand-gradient "sign in" CTA (and a
"back to home" link on sm+). No sidebar, no campaign switcher —
those surfaces aren't reachable by anonymous users.
- Public shell forces dark mode (anonymous visitors haven't set a
preference and the operator-tool dark register is the brand).
- DocsPage now reads useAuth().isSignedIn and adapts the final CTA:
signed-in → button "create a project" → /projects
signed-out → button "sign in to start" → /sign-in
Subtitle copy adapts too ("build your first campaign" vs "try
paidpilot — free in beta"). The mailto stays the same.
- useAuth() works under Clerk's provider regardless of SignedIn /
SignedOut gates — same hook returns isSignedIn: false for
anonymous users instead of throwing.
Internal SPA links inside the docs body that point at signed-in
surfaces (/projects, /settings/integrations, etc.) are kept as-is —
clicking them while signed-out bounces through the SignedOut catchall
to / which renders the landing page. Not ideal but acceptable; we
could add a redirect-back hint later if a user gets confused.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The rotation refactor (text quota falls through to image accounts) used
`logger.warning(...)` and `logger.info(...)` in both _call_for_prompt
and stream_answer but I never imported logging or defined a module-
level logger. Every quota retry hit `NameError: name 'logger' is not
defined`, bubbling up as the user-visible "Streaming failed: name
'logger' is not defined" on chat and silently breaking the search
reranker (it falls back to static authority on any exception, which
masked the bug there).
Visible from the Render logs:
File "language_model.py", line 538, in stream_answer
logger.warning(...)
NameError: name 'logger' is not defined
Add `import logging` + `logger = logging.getLogger(__name__)` at the
top of the module. Standard pattern, matches every other backend
service file. Smoke-tested the app boot post-fix.
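For reference, the standard module-level logger pattern the fix describes looks like this. The `log_rotation` function is a hypothetical stand-in for the rotation log calls, included only to show the logger in use.

```python
import logging

# Module-level logger, named after the importing module.
logger = logging.getLogger(__name__)


def log_rotation(slot: int, total: int) -> None:
    # Lazy %-formatting: arguments are only rendered if the record is emitted.
    logger.warning("chat slot %d/%d quota, trying next", slot, total)
```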
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s in public/docs/
Section interface gains two optional fields:
- imageSrc: path under the public/ tree (e.g. '/docs/studio.png')
- imageAlt: alt text override (defaults to section.title)
When imageSrc is set the aspect-video block renders the real image
with object-cover; when unset it falls back to the existing hatched
placeholder treatment with the placeholder string. Both fields can
cohabit on a section: placeholder stays as a fallback while the
operator gets around to capturing the actual screenshot.
Adds frontend/public/docs/README.md with the drop-in workflow:
1. capture 16:9 screenshot
2. save as frontend/public/docs/<section-id>.png
3. edit the section in buildSections(): add imageSrc: '/docs/x.png'
Vite serves public/ unchanged so no bundling / build step is needed:
drop the file, refresh, image appears.
No screenshots committed in this change; the placeholder hatched
blocks still render until an operator drops files in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aigns, investigations, studio
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Pluggable provider layer for web search so we can A/B Anakin's single-call search+content-extraction API against the existing Google CSE + per-result BS4 pipeline. Anakin runs first; Google CSE is the safety net on any failure or empty result set.
What lands
- app/services/search_providers/: new package with SearchProvider Protocol and four implementations:
  - GoogleCSEProvider: thin wrapper around the existing search_google path. No behaviour change.
  - AnakinProvider: POST https://api.anakin.io/v1/search with X-API-Key header. Defensive response parser tries priority-ordered field names (results|data|items → title|name, url|link, content|extracted_content|text|...) because the public docs hide the exact shape behind login. Logs top-level payload keys when a query returns zero results so we know exactly what to adjust once we have real responses.
  - FallbackProvider: wraps an ordered list of providers, returns the first non-empty result set. Transparently swallows SearchProviderError so the chain degrades to the next provider rather than 500ing the answer flow.
- search_service.perform_search rewired through get_provider(). Default is unchanged Google CSE; setting ANAKIN_API_KEY (or WEB_SEARCH_PROVIDER=anakin_then_google) switches to the Anakin → Google chain.
- backend/scripts/eval_search_providers.py: CLI that runs N queries through every requested provider, writes a JSON report (latency, result count, extracted-char totals, urls/titles/samples) and prints a per-provider summary.
- backend/scripts/eval_queries.txt: 8 starter queries spanning the operator-question taxonomy (conceptual / tactical / freshness-sensitive / primary-source).
Why fallback rather than swap
Anakin is unproven for our query shape and the docs don't expose the response schema. A fallback chain means the worst case is "search behaves exactly as it does today on main" — we ship the integration with confidence and let real eval data drive the decision rather than docs marketing.
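As a rough illustration of the fallback chain and the smart default, here is a minimal synchronous sketch. The names SearchProviderError, FallbackProvider, ANAKIN_API_KEY, WEB_SEARCH_PROVIDER and anakin_then_google come from this PR; everything else (the SearchResult shape, the `search` method signature, `choose_provider_name`, the "google_cse" label, and whether the real code is sync or async) is an assumption.

```python
import logging
from dataclasses import dataclass
from typing import List, Mapping, Protocol, Sequence

logger = logging.getLogger(__name__)


class SearchProviderError(Exception):
    pass


@dataclass
class SearchResult:  # assumed shape, for illustration
    title: str
    url: str
    content: str = ""


class SearchProvider(Protocol):
    name: str

    def search(self, query: str) -> List[SearchResult]: ...


class FallbackProvider:
    name = "fallback"

    def __init__(self, providers: Sequence[SearchProvider]):
        self._providers = list(providers)

    def search(self, query: str) -> List[SearchResult]:
        for provider in self._providers:
            try:
                results = provider.search(query)
            except SearchProviderError as exc:
                # Degrade to the next provider instead of failing the request.
                logger.warning("provider %s failed: %s", provider.name, exc)
                continue
            if results:
                logger.info("results served by %s", provider.name)
                return results
        return []


def choose_provider_name(env: Mapping[str, str]) -> str:
    """Explicit override wins; otherwise an Anakin key enables the chain."""
    explicit = env.get("WEB_SEARCH_PROVIDER")
    if explicit:
        return explicit
    # "google_cse" is a hypothetical label for the default provider.
    return "anakin_then_google" if env.get("ANAKIN_API_KEY") else "google_cse"
```

Treating an empty result list the same as an error is what makes the worst case equal to today's Google CSE behaviour.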
Smoke test
Ran the harness against the live endpoint with an invalid key:
Endpoint reachable ✓, request body shape accepted ✓ (would have been 400 if our JSON was malformed from Anakin's perspective), error path logs response.text[:500] per the repo's gotcha convention ✓.
Not yet
Test plan
- /search traffic still hits Google CSE and behaves identically to main
- ANAKIN_API_KEY only: chain auto-selects Anakin → Google, log line shows which provider returned results
- WEB_SEARCH_PROVIDER=anakin: Anakin-only; Google CSE never called
Decision after eval
If Anakin is clearly better on operator queries → flip the default and keep Google as fallback only. If roughly tied → keep the chain (cheaper at our volume, opens the door to Anakin Wire for competitive intel later). If clearly worse → revert the smart default, keep the spike branch on ice.
🤖 Generated with Claude Code