For AI agents: Read this file first to understand where the project is. Update it after every meaningful task or group of tasks.
Last updated: 2026-02-15 (Session 13 - Read-triggered camera screenshot)
Branch: 001-minerva-mvp
Overall status: Phases 1-8 COMPLETE (T001-T067). Session 13: read-triggered camera screenshot; removed paper detection and scan button.
Goal: Conversational homework help — when the user stops talking (push-to-talk release) and their words contain "read", send a screenshot of the camera with the message so the model can see the homework. Remove paper-detection and scan-button UI to minimize friction and latency.
Changes:
- Removed: Paper detection loop and "Paper detected" overlay in FloatingVideoOverlay; Scan button in overlay (gallery) and in BottomControlBar; handleScan and onScan wiring from session page.
- Added: In `useSession`, when `onUserMessage(text)` fires, if `text` contains "read" (case-insensitive) and the camera is on, capture one frame via `captureFrame(userCamera.videoRef.current)` and call `brain.handleStudentMessage(text, result)`; otherwise call `brain.handleStudentMessage(text)`.
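A simplified sketch of this dispatch (the hook wiring and types are condensed; `containsReadTrigger` is a name introduced here for illustration):

```typescript
// Hypothetical, simplified sketch of the onUserMessage dispatch in useSession.
// Assumes captureFrame returns image data as a string; exact types simplified.

/** True when the utterance should trigger a camera snapshot. */
export function containsReadTrigger(text: string): boolean {
  // Note: plain substring match, so words like "ready" also trigger.
  return text.toLowerCase().includes("read");
}

interface Brain {
  handleStudentMessage(text: string, imageData?: string): void;
}

export function onUserMessage(
  text: string,
  brain: Brain,
  cameraOn: boolean,
  video: HTMLVideoElement | null,
  captureFrame: (v: HTMLVideoElement) => string,
): void {
  if (containsReadTrigger(text) && cameraOn && video) {
    const imageData = captureFrame(video); // one frame, captured on release
    brain.handleStudentMessage(text, imageData);
  } else {
    brain.handleStudentMessage(text);
  }
}
```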
Files changed:
| File | Change |
|---|---|
| `src/app/student/session/page.tsx` | Removed `captureFrame` import, `handleScan`, `onScan` props |
| `src/components/session/FloatingVideoOverlay.tsx` | Removed `onScan` prop, DocumentOverlay, ScanFlash, detection state/loop, `handleScan`, Scan button |
| `src/components/session/BottomControlBar.tsx` | Removed `onScan` prop and scan button |
| `src/hooks/useSession.ts` | Import `captureFrame`; in `onUserMessage`, if "read" then capture frame and pass `imageData` |
Note: `src/lib/camera/detector.ts` is now unused (left in the repo for possible future use).
Problem: When Claude called tools (setContentMode, executeCanvasCommands), it would sometimes generate ONLY tool calls without any speech text. The avatar would remain silent.
Root Cause: Two issues:
- The prompt didn't explicitly require speech with every response
- The code was checking for a non-existent `step-finish` event instead of `text-end`
Fixes Applied:
- Removed the check for the non-existent `step-finish` event
- Speech is now emitted when we see a `tool-call` event (before yielding the tool)
- Added a `text-end` handler for text-only responses
- Fallback still catches edge cases
- Added a "CRITICAL: ALWAYS GENERATE SPEECH TEXT" section to the prompt
- Explicitly tells Claude: "Never call tools without also generating speech"
- Shows an example response flow: generate speech FIRST, then call tools
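An illustrative shape of such a prompt section (the exact wording in prompts.ts may differ; only the quoted rules are confirmed):

```text
## CRITICAL: ALWAYS GENERATE SPEECH TEXT
Never call tools without also generating speech.
Generate speech FIRST, then call tools. Example flow:
  1. speech: "Let me pull that up for you..."
  2. tool call: setContentMode("sandbox")
  3. tool call: executeCanvasCommands([...])
```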
| File | Change |
|---|---|
| `src/lib/claude/client.ts` | Fixed multi-step stream handling for tools with `execute()` |
| `src/lib/claude/prompts.ts` | Added mandatory speech requirement to prompt |
When a tool has an `execute()` function (like `getExistingVideos`), the AI SDK handles it automatically:

Step 1:
- `start-step` → speechBuffer reset
- `text-delta` events → Claude's intro speech
- `text-end` → speech emitted
- `tool-call` → getExistingVideos
- `tool-result` → AI SDK executes, returns result

Step 2 (automatic continuation):
- `start-step` → speechBuffer reset
- `text-delta` events → Claude's follow-up based on tool result
- `text-end` → speech emitted
- `done`
Key changes:
- Added a `start-step` handler to reset `speechBuffer` for each step
- Removed the `speechEmitted` flag; speech is now emitted per step, not once
- `text-end` emits speech immediately (not waiting for tool calls)
- Safety: also emit speech on `tool-call` if `text-end` didn't fire
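The per-step emission logic can be sketched as a small reducer over stream events (event names match the ones we handle; the real code yields into an async generator instead of returning an array):

```typescript
// Sketch of per-step speech emission over AI SDK stream events.
// Buffer handling is simplified; event names match those handled above.

type StreamEvent =
  | { type: "start-step" }
  | { type: "text-delta"; text: string }
  | { type: "text-end" }
  | { type: "tool-call"; name: string };

export function collectSpeechPerStep(events: StreamEvent[]): string[] {
  const emitted: string[] = [];
  let speechBuffer = "";
  let emittedThisStep = false;

  const emit = () => {
    if (!emittedThisStep && speechBuffer.trim()) {
      emitted.push(speechBuffer.trim());
      emittedThisStep = true;
    }
  };

  for (const ev of events) {
    switch (ev.type) {
      case "start-step": // reset per step, not once per response
        speechBuffer = "";
        emittedThisStep = false;
        break;
      case "text-delta":
        speechBuffer += ev.text;
        break;
      case "text-end": // normal path: emit as soon as text completes
        emit();
        break;
      case "tool-call": // safety net: emit before yielding the tool
        emit();
        break;
    }
  }
  return emitted;
}
```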
Merged two branches:
- `anton/latency-test` — SSE streaming for faster time-to-first-word
- HEAD — Sandbox token optimization + Manim video integration + Design system

Architecture:
- SSE streaming pipeline: speech arrives early (~1s), avatar starts talking while the remaining fields generate
- `respondStream()` async generator on TutorBrain — uses `client.messages.stream()` + regex speech extraction
- API route returns `text/event-stream` with a `ReadableStream`
- Frontend consumes SSE via the `consumeStream()` helper in `useTutorBrain`
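The consuming side might look roughly like this; `parseSseChunk` is a hypothetical helper shown for illustration (the real `consumeStream()` may be structured differently):

```typescript
// Hypothetical SSE frame parser for illustration. An SSE frame looks like
// "event: speech\ndata: {...}\n\n"; frames are separated by blank lines.

export interface SseEvent {
  event: string;
  data: string;
}

export function parseSseChunk(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of chunk.split("\n\n")) {
    let event = "message"; // SSE default event name
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```

On a `speech` event the avatar starts speaking immediately; the `result` event carries the remaining fields.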
Sandbox Token Optimization (80-90% token reduction):
- Claude outputs `sandboxContent` + `sandboxAccent` (not full HTML)
- Frontend wraps the content with a Twind template in `buildSandboxHtml()`
- Subject-based accent colors (physics=blue, chemistry=emerald, etc.)
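A sketch of what this wrapping might look like (the wrapper markup and the hex values for the subject accents are assumptions; the real `buildSandboxHtml()` and its Twind setup may differ):

```typescript
// Illustrative sketch of buildSandboxHtml(). Accent hex values are assumed;
// the real template also loads Twind, which is elided here.

const ACCENTS: Record<string, string> = {
  physics: "#3B82F6",   // blue (assumed hex)
  chemistry: "#10B981", // emerald (assumed hex)
  general: "#22D3EE",   // cyan (assumed hex)
};

export function buildSandboxHtml(content: string, accent: string): string {
  const color = ACCENTS[accent] ?? ACCENTS.general;
  // Claude only produced `content`; the wrapper is added client-side,
  // which is where the 80-90% token saving comes from.
  return `<!doctype html>
<html><head><style>
  :root { --accent: ${color}; }
  html,body { margin:0; padding:0; overflow:hidden; }
</style></head><body>${content}</body></html>`;
}
```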
Manim Video Integration:
- `manimVideoFile` — reuse an existing video by filename
- `manimPrompt` — generate a new video (30-120s)
- `videoUrl` — resolved URL added by the server
- Server auto-corrects `contentMode` to "video" if video fields are present
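The auto-correction can be sketched as follows (the function name is hypothetical; the field names match those above):

```typescript
// Hypothetical sketch of the server-side auto-correction: if any video
// field is present, force contentMode to "video" regardless of what the
// model chose.

type ContentMode = "welcome" | "math" | "sandbox" | "video";

interface TutorFields {
  contentMode: ContentMode;
  manimVideoFile?: string;
  manimPrompt?: string;
  videoUrl?: string;
}

export function normalizeContentMode(fields: TutorFields): ContentMode {
  const hasVideo =
    !!fields.manimVideoFile || !!fields.manimPrompt || !!fields.videoUrl;
  return hasVideo ? "video" : fields.contentMode;
}
```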
Content Modes: "welcome" | "math" | "sandbox" | "video"
Push-to-Talk Enhancement:
- `avatarFlush()` — immediately sends accumulated transcription on Space release
- Fixes latency from waiting out the debounce
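The flush behavior can be sketched as below (the class name and debounce window are illustrative, not the actual implementation):

```typescript
// Hypothetical sketch of the flush-on-release behavior behind avatarFlush().
// Normally the transcription is debounced; releasing Space calls flush()
// to skip the remaining debounce wait.

export class TranscriptBuffer {
  private parts: string[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private send: (text: string) => void,
    private debounceMs = 800, // assumed debounce window
  ) {}

  append(text: string): void {
    this.parts.push(text);
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.debounceMs);
  }

  /** Called on Space release: send whatever has accumulated, immediately. */
  flush(): void {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    const text = this.parts.join(" ").trim();
    this.parts = [];
    if (text) this.send(text);
  }
}
```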
| File | Resolution |
|---|---|
| `src/types/session.ts` | Keep sandboxContent/sandboxAccent/videoUrl (HEAD) |
| `src/stores/sessionStore.ts` | Keep HEAD's fields + actions |
| `src/lib/claude/client.ts` | Merge: SSE streaming + our Zod schema with sandbox/manim fields |
| `src/hooks/useTutorBrain.ts` | Merge: SSE consumption + content mode validation + video/sandbox handling |
| `src/hooks/useSession.ts` | Keep HEAD's fields + add avatarFlush |
| `src/app/api/tutor/respond/route.ts` | Merge: SSE streaming + Manim generation in result event |
| `src/app/student/session/page.tsx` | Keep HEAD + add avatarFlush to push-to-talk |
| `src/components/session/ContentMode.tsx` | Keep HEAD's sandboxContent/accent/videoUrl props |
| `src/components/session/SandboxPanel.tsx` | Keep HEAD's content/accent + Twind template |
| `progress.md` | Combined both sessions' notes |
Two major changes: (1) Cohesive Soft Lavender (#A78BFA) + Aqua (#67E8F9) design identity across entire app. (2) Complete AI prompt rewrite with sandbox HTML templates and tighter speech rules.
- globals.css — Full lavender/aqua color palette replacing defaults. `--font-display` variable. SVG grain texture overlay (3% opacity). Safari input fix (`-webkit-appearance: none`).
- layout.tsx — Space Grotesk display font via `next/font/google`. `class="dark"` on `<html>`. Body includes `${spaceGrotesk.variable}`.
- prompts.ts — MAJOR rewrite:
- Fixed HTML skeleton for sandbox (consistent layout every time)
- 6 layout templates: centered, split, steps, comparison, chart, interactive
- Subject-based accent colors (Physics=blue, Chemistry=emerald, Biology=green, History=amber, Literature=purple, General=cyan)
- BANNED PHRASES: "Great question!", "Absolutely!", "Excellent!", "Fantastic!", "Not quite"
- USE INSTEAD: "yeah that's right", "nice, so...", "hmm what if..."
- Speech: 1-2 sentences MAX, always end with question, sound like cool older sibling
- Content routing: first response = visual, follow-ups = speech only unless needed
- Hard constraints: 3500 chars max, no CDN, no scrolling, clamp() for responsive sizing
- client.ts — Added `sandboxTemplate` to the Zod schema (enum of 6 templates)
- SandboxPanel — Fade-in transition, lavender empty state, updated viewport CSS
- ChatSheet — Lavender user bubbles (`bg-[#A78BFA]`), violet-tinted AI bubbles, 3 bouncing lavender dots for typing indicator, lavender focus ring
- BottomControlBar — Lavender join button (was green), lavender timer text, `-webkit-backdrop-filter` for Safari
- FloatingVideoOverlay — Lavender status dots, lavender thinking pulse/glow (was blue), lavender view mode icons
- Landing page — Dark bg (#0A0A0A), Space Grotesk headings, lavender TreeHacks badge, lavender feature cards with hover, lavender tech badges, lavender CTA section
- Login page — `font-display` on title
- Session page — Lavender/aqua/violet mode badge dots, `-webkit-backdrop-filter` on badge, aqua push-to-talk active state
- Parent layout — Dark sidebar (`bg-[#0E0C18]`), lavender logo, lavender nav hover
- Parent dashboard — `font-display` title, lavender/aqua stat card borders + values, lavender session badges
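The `sandboxTemplate` field added to the Zod schema constrains layouts to the 6 templates; a dependency-free sketch of the same constraint (the real code in client.ts uses a Zod enum):

```typescript
// Illustrative, dependency-free equivalent of the sandboxTemplate Zod enum.
// Template names match the 6 layouts listed above.

export const SANDBOX_TEMPLATES = [
  "centered",
  "split",
  "steps",
  "comparison",
  "chart",
  "interactive",
] as const;

export type SandboxTemplate = (typeof SANDBOX_TEMPLATES)[number];

export function isSandboxTemplate(value: string): value is SandboxTemplate {
  return (SANDBOX_TEMPLATES as readonly string[]).includes(value);
}
```

With Zod this is roughly `z.enum(SANDBOX_TEMPLATES)` on the response schema.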
Major change: Rearchitected the tutor response pipeline from single JSON response to SSE streaming. Speech field is extracted early via regex and emitted immediately, so the avatar starts speaking while sandboxHtml/canvasCommands are still generating.
- `respondStream()` async generator — New method on TutorBrain that uses `client.messages.stream()` + regex-based speech extraction. Yields a `speech` event as soon as the speech field is complete, then a `result` event with the remaining fields.
- SSE API route — `/api/tutor/respond` now returns `text/event-stream` with a `ReadableStream`. Events: `speech`, `result`, `done`, `error`. Perplexity enrichment still runs before the stream starts.
- Frontend SSE consumption — `useTutorBrain` reads SSE events via `fetch()` + a `ReadableStream` reader. Avatar speaks on the `speech` event (fire-and-forget). Sandbox/canvas/progress update on the `result` event.
- `buildClaudeRequest()` helper — Extracted shared message-building logic from `respond()` to avoid duplication with `respondStream()`.
- Prompt caching — `cache_control: { type: "ephemeral" }` on system prompts saves ~200-500ms after the first request.
- Module-level Anthropic client — Reuses HTTP connections, avoiding a TLS handshake per request.
- Speech extraction regex: `/"speech"\s*:\s*"((?:[^"\\]|\\.)*)"\s*[,}]/` — detects a complete speech value in the JSON token stream. Works because `speech` is the first field in the Zod schema.
- Two SSE events: `speech` (emitted early) + `result` (everything else, emitted when the stream ends). Simpler than per-field events.
- AbortController cascade: Frontend abort cancels the fetch → SSE ReadableStream cancel fires → server AbortController aborts the Claude stream.
- Backward compatible: `respond()` still exists as a non-streaming fallback.
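The extraction can be demonstrated in isolation (unescaping via `JSON.parse` here is a simplification of whatever the production code does):

```typescript
// Demonstrates the speech-extraction regex from respondStream() on a
// partial JSON stream: the speech field is complete while later fields
// are still being generated.

const SPEECH_RE = /"speech"\s*:\s*"((?:[^"\\]|\\.)*)"\s*[,}]/;

export function extractSpeech(partialJson: string): string | null {
  const m = SPEECH_RE.exec(partialJson);
  if (!m) return null;
  // Re-wrap the raw captured value so JSON.parse handles escape sequences.
  return JSON.parse(`"${m[1]}"`);
}
```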
Major change: Replaced side-by-side react-resizable-panels video grid with a true Zoom-style floating PiP overlay. Reverted CSS design system injection that made sandbox output look generic.
- Created `FloatingVideoOverlay.tsx` using `react-rnd` — draggable + resizable floating PiP
- Three view modes matching Zoom's actual behavior:
  - Strip (— icon): Thin dark bar showing "Talking: Minerva" or status text
  - Speaker (□ icon): One large video tile with name label + hover controls
  - Gallery (⋮⋮⋮ icon): Two stacked video tiles (avatar top, camera bottom)
- View mode switch icons + minimize button only visible on hover (group-hover pattern)
- Video persistence: `<video>` elements always mounted as `sr-only`; `<canvas>` mirrors via `requestAnimationFrame` + `drawImage()` — the stream is never lost across mode/minimize changes
- Resize handles with stripe patterns (matching Zoom): bottom (horizontal stripes), right (vertical stripes), corner (diagonal lines SVG)
- `lockAspectRatio` for speaker mode, per-mode min/max sizes
- Document detection + scan button preserved on camera tile in gallery mode
- Minimizable to small pill (top-right corner)
- Deleted `VideoGrid.tsx`, removed the `react-resizable-panels` package
- Injected minimal CSS: `html,body{margin:0;padding:0;overflow:hidden;width:100%;height:100vh;max-height:100vh;}`
- Added "Content MUST fit in one screen" to the Claude prompt sandbox rules
- Reverted CSS design system injection (user feedback: made output look "AI-ish generic")
- Reverted prompt changes that increased char limit and added design patterns
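The canvas-mirroring persistence trick described above (always-mounted `<video>`, visible `<canvas>` drawing from it) can be sketched as follows; `fitContain` is a hypothetical helper, and the real overlay may simply stretch the frame:

```typescript
// Sketch of the canvas-mirroring loop: the <video> stays mounted (sr-only)
// so the MediaStream survives, and each visible tile draws from it on
// every animation frame. fitContain is a hypothetical sizing helper.

export interface DrawRect { x: number; y: number; w: number; h: number }

/** Fit src into dst preserving aspect ratio (letterbox/pillarbox). */
export function fitContain(
  srcW: number, srcH: number, dstW: number, dstH: number,
): DrawRect {
  const scale = Math.min(dstW / srcW, dstH / srcH);
  const w = srcW * scale;
  const h = srcH * scale;
  return { x: (dstW - w) / 2, y: (dstH - h) / 2, w, h };
}

export function mirrorVideoToCanvas(
  video: HTMLVideoElement,
  canvas: HTMLCanvasElement,
): () => void {
  const ctx = canvas.getContext("2d")!;
  let raf = 0;
  const draw = () => {
    const { x, y, w, h } = fitContain(
      video.videoWidth || 1, video.videoHeight || 1,
      canvas.width, canvas.height,
    );
    ctx.drawImage(video, x, y, w, h);
    raf = requestAnimationFrame(draw);
  };
  draw();
  return () => cancelAnimationFrame(raf); // cleanup on unmount
}
```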
- `npx tsc --noEmit` — 0 errors (pending verification after merge)
- `npm run build` — compiles successfully (pre-existing DB error on /parent SSR, unrelated)
- Session 11 merge is complete — SSE streaming + sandbox optimization + Manim videos
- SSE latency benefit: speech arrives in ~1s, so the avatar starts talking immediately
- Sandbox token savings: 80-90% reduction (sandboxContent + sandboxAccent vs full HTML)
- Manim videos: Claude can reuse by filename or generate new (30-120s generation time)
- Push-to-talk: `avatarFlush()` sends accumulated text immediately on Space release
- Design identity: Soft Lavender (#A78BFA) primary + Aqua (#67E8F9) accent on deep purple-black (#0C0A14)
- Typography: Space Grotesk (display/headlines) + Geist (body). Use the `font-display` class for headings.
- Safari: `-webkit-backdrop-filter` added alongside `backdrop-filter` in key components
- FloatingVideoOverlay uses `react-rnd` + canvas mirroring — videos never unmount
- Pre-existing build error: `/parent` page fails during static generation (local DB "kimsanov" doesn't exist) — unrelated to our code