diff --git a/schemas/text-search-trgm/README.md b/schemas/text-search-trgm/README.md new file mode 100644 index 000000000..9a2a677e9 --- /dev/null +++ b/schemas/text-search-trgm/README.md @@ -0,0 +1,119 @@ +# Text Search Trigram Index + +![Community Contribution](https://img.shields.io/badge/OB1_COMMUNITY-Approved_Contribution-2ea44f?style=for-the-badge&logo=github) + +**Created by [@alanshurafa](https://github.com/alanshurafa)** + +> Adds a `pg_trgm` GIN index on `public.thoughts.content` so `search_thoughts_text` ILIKE fallback queries run in ~150ms instead of ~8s. + +## What It Does + +Installs the `pg_trgm` extension and creates a trigram GIN index on `public.thoughts.content`. The `search_thoughts_text` RPC from the enhanced-thoughts schema runs a tsvector phase first, then falls back to `ILIKE '%query%'` whenever tsvector returns fewer hits than requested -- which happens for most real-world queries. Without a trigram index that fallback sequential-scans the entire table. + +**Before/after (89K-thought brain):** + +| Query | Before | After | +|-------|--------|-------| +| Rare-word ILIKE fallback | ~8s (seq scan) | ~100-150ms (bitmap index scan) | +| Common-word tsvector hit | unchanged | unchanged | + +## Why It Matters + +`search_thoughts_text` powers every text search that goes through Open Brain's MCP layer. Leading-wildcard patterns like `ILIKE '%foo%'` cannot use the existing tsvector GIN index (tsvector is word-level, ILIKE is substring-level), so the planner defaults to a full sequential scan. At ~90K rows that's a 7-8 second wait on every rare-word lookup. + +`pg_trgm` breaks text into 3-character trigrams and builds a GIN index the planner *can* use for substring matching. No changes to `search_thoughts_text` are needed -- the planner picks up the new index automatically. Queries that previously seq-scanned now run as bitmap index scans. + +## Prerequisites + +- Working Open Brain setup ([guide](../../docs/01-getting-started.md)) +- [`schemas/enhanced-thoughts`](https://github.com/NateBJones-Projects/OB1/pull/191) installed (defines `search_thoughts_text` and the base tsvector index) +- Supabase project with write access to run migrations + +This migration installs without error on stock OB1 (without PR #191), but provides no measurable benefit unless `search_thoughts_text` is installed. Install PR #191 first for the full effect. + +## Credential Tracker + +Copy this block into a text editor and fill it in as you go. + +```text +TEXT SEARCH TRIGRAM INDEX -- CREDENTIAL TRACKER +-------------------------------------- + +SUPABASE (from your Open Brain setup) + Project URL: ____________ + Secret key: ____________ + +-------------------------------------- +``` + +## Steps + +1. Open your Supabase dashboard and navigate to the **SQL Editor** +2. Create a new query and paste the full contents of `schema.sql` +3. Click **Run** to execute the migration (the `CREATE INDEX` will briefly lock the `thoughts` table against writes; ~1-2 minutes at 90K rows) +4. In a new query, run `ANALYZE public.thoughts;` to refresh planner statistics so the new index is picked up immediately. `ANALYZE` cannot run inside the migration's transaction, so it must be a separate command. +5. Navigate to **Database > Extensions** and confirm `pg_trgm` is enabled +6. Navigate to **Database > Indexes** (or run the verification query below) and confirm `idx_thoughts_content_trgm` exists on `public.thoughts` + +## Expected Outcome + +After running the migration: + +- The `pg_trgm` extension is installed in the database. +- A GIN trigram index named `idx_thoughts_content_trgm` exists on `public.thoughts(content)`. +- With `ANALYZE public.thoughts;` run (Step 4), the next `search_thoughts_text` call whose ILIKE fallback fires completes in ~100-150ms instead of ~8s. + +## Verification + +Run the following in the SQL Editor. The `Bitmap Index Scan on idx_thoughts_content_trgm` line in the plan confirms the planner is using the new index: + +```sql +EXPLAIN ANALYZE +SELECT id +FROM public.thoughts +WHERE content ILIKE '%somerarewordfromyourbrain%' +LIMIT 25; +``` + +Expected plan (abbreviated): + +``` +Limit + -> Bitmap Heap Scan on thoughts + Recheck Cond: (content ~~* '%somerarewordfromyourbrain%'::text) + -> Bitmap Index Scan on idx_thoughts_content_trgm + Index Cond: (content ~~* '%somerarewordfromyourbrain%'::text) +Execution Time: ~100-200 ms +``` + +If you instead see `Seq Scan on thoughts`, the index was not created or the planner has stale statistics -- run `ANALYZE public.thoughts;` and try again. + +## Rollback + +```sql +DROP INDEX IF EXISTS public.idx_thoughts_content_trgm; +``` + +The `pg_trgm` extension is left installed; it is harmless on its own and may be used by other contributions. + +## Tradeoffs + +- **Storage:** ~20-40MB on a 90K-thought brain. Scales linearly with total content size. +- **Build lock:** Regular (non-CONCURRENT) `CREATE INDEX` briefly locks `public.thoughts` against writes during the build (~1-2 minutes at 90K rows). If you're running live capture and can't tolerate a brief write pause, switch the statement to `CREATE INDEX CONCURRENTLY` and remove the surrounding `BEGIN/COMMIT` -- concurrent index builds cannot run inside a transaction. +- **Write amplification:** Small per-row overhead on `INSERT` and `UPDATE` of `content` (the index needs to be maintained). Imperceptible at typical personal-brain write rates. + +## Troubleshooting + +**Issue: "extension pg_trgm does not exist" error** +Solution: Your Supabase project predates automatic extension availability. In the SQL Editor, run `CREATE EXTENSION pg_trgm;` as a superuser or contact Supabase support. The migration uses `CREATE EXTENSION IF NOT EXISTS`, which works on all current Supabase projects. + +**Issue: `EXPLAIN ANALYZE` still shows `Seq Scan on thoughts`** +Solution: If you somehow skipped Step 4, run `ANALYZE public.thoughts;` to refresh planner statistics, then retry. The planner needs accurate row counts before it will choose an index scan over a seq scan on small tables. + +**Issue: Migration hangs on `CREATE INDEX`** +Solution: Check for long-running transactions holding locks on `thoughts` (look at `pg_stat_activity`). The index build needs to acquire a `SHARE` lock on the table. If you can't stop the blocking transaction, switch to `CREATE INDEX CONCURRENTLY` (see Tradeoffs). + +## References + +- [PostgreSQL `pg_trgm` documentation](https://www.postgresql.org/docs/current/pgtrgm.html) -- official reference for trigram matching and index operator classes +- [`schemas/enhanced-thoughts` (PR #191)](https://github.com/NateBJones-Projects/OB1/pull/191) -- defines `search_thoughts_text`, the consumer of this index diff --git a/schemas/text-search-trgm/metadata.json b/schemas/text-search-trgm/metadata.json new file mode 100644 index 000000000..0a24c7b42 --- /dev/null +++ b/schemas/text-search-trgm/metadata.json @@ -0,0 +1,20 @@ +{ + "name": "Text Search Trigram Index", + "description": "pg_trgm GIN index on public.thoughts.content to accelerate search_thoughts_text ILIKE fallback by ~50x on rare-word queries.", + "category": "schemas", + "author": { + "name": "Alan Shurafa", + "github": "alanshurafa" + }, + "version": "1.0.0", + "requires": { + "open_brain": true, + "services": [], + "tools": [] + }, + "tags": ["performance", "search", "pg_trgm", "indexing"], + "difficulty": "beginner", + "estimated_time": "5 minutes", + "created": "2026-04-18", + "updated": "2026-04-18" +} diff --git a/schemas/text-search-trgm/schema.sql b/schemas/text-search-trgm/schema.sql new file mode 100644 index 000000000..b4c0b492e --- /dev/null +++ b/schemas/text-search-trgm/schema.sql @@ -0,0 +1,39 @@ +-- Add pg_trgm trigram GIN index to accelerate search_thoughts_text ILIKE fallback. +-- +-- Context: search_thoughts_text (from schemas/enhanced-thoughts) has a tsvector +-- phase (fast) and an ILIKE '%...%' fallback. The ILIKE fallback triggers for +-- most real queries -- tsvector usually returns fewer hits than requested and +-- the function fills in from ILIKE. Leading-wildcard ILIKE can't use the +-- tsvector GIN index, so without a trigram index ILIKE seq-scans the whole +-- thoughts table. On an 89K-row brain, that's 7-8s per rare-word query. +-- +-- Fix: pg_trgm provides trigram-based indexing that GIN can use for ILIKE +-- patterns. No changes to search_thoughts_text needed -- the Postgres planner +-- picks up the new index automatically once it exists. Rare-word queries drop +-- from ~8s to ~100-150ms. +-- +-- Prerequisites: enhanced-thoughts schema (PR #191) must be installed first. +-- This migration adds only the trigram index; tsvector index lives in +-- enhanced-thoughts. +-- +-- Tradeoffs: +-- - Storage: ~20-40MB on a 90K-thought brain; scales linearly with content size. +-- - Build lock: regular (non-CONCURRENT) CREATE INDEX briefly locks the +-- thoughts table against writes during the build (~1-2 min at 90K rows). +-- Switch to CREATE INDEX CONCURRENTLY if you're running live capture and +-- can tolerate migration-outside-transaction semantics. +-- - Write-amp: small INSERT/UPDATE overhead on content changes. Imperceptible +-- at typical personal-brain write rates. + +BEGIN; + +CREATE EXTENSION IF NOT EXISTS pg_trgm; + +CREATE INDEX IF NOT EXISTS idx_thoughts_content_trgm + ON public.thoughts + USING gin (content gin_trgm_ops); + +COMMENT ON INDEX public.idx_thoughts_content_trgm IS + 'Trigram GIN index on content for ILIKE ''%foo%'' patterns. Accelerates search_thoughts_text ILIKE fallback from ~8s to ~150ms on rare-word queries.'; + +COMMIT;