From 54d29567927cea01c22ec7f3a732a9db96546513 Mon Sep 17 00:00:00 2001
From: Alan Shurafa
Date: Sat, 18 Apr 2026 18:44:07 -0400
Subject: [PATCH 1/4] [schemas] Text search trigram index for ILIKE fallback

---
 schemas/text-search-trgm/README.md     | 112 +++++++++++++++++++++++++
 schemas/text-search-trgm/metadata.json |  20 +++++
 schemas/text-search-trgm/schema.sql    |  39 +++++++++
 3 files changed, 171 insertions(+)
 create mode 100644 schemas/text-search-trgm/README.md
 create mode 100644 schemas/text-search-trgm/metadata.json
 create mode 100644 schemas/text-search-trgm/schema.sql

diff --git a/schemas/text-search-trgm/README.md b/schemas/text-search-trgm/README.md
new file mode 100644
index 00000000..117671b8
--- /dev/null
+++ b/schemas/text-search-trgm/README.md
@@ -0,0 +1,112 @@
+# Text Search Trigram Index
+
+> Adds a `pg_trgm` GIN index on `public.thoughts.content` so `search_thoughts_text` ILIKE fallback queries run in ~150ms instead of ~8s.
+
+## What It Does
+
+Installs the `pg_trgm` extension and creates a trigram GIN index on `public.thoughts.content`. The `search_thoughts_text` RPC from the enhanced-thoughts schema runs a tsvector phase first, then falls back to `ILIKE '%query%'` whenever tsvector returns fewer hits than requested -- which happens for most real-world queries. Without a trigram index that fallback sequential-scans the entire table.
+
+**Before/after (89K-thought brain):**
+
+| Query | Before | After |
+|-------|--------|-------|
+| Rare-word ILIKE fallback | ~8s (seq scan) | ~100-150ms (bitmap index scan) |
+| Common-word tsvector hit | unchanged | unchanged |
+
+## Why It Matters
+
+`search_thoughts_text` powers every text search that goes through Open Brain's MCP layer. Leading-wildcard patterns like `ILIKE '%foo%'` cannot use the existing tsvector GIN index (tsvector is word-level, ILIKE is substring-level), so the planner defaults to a full sequential scan. At ~90K rows that's a 7-8 second wait on every rare-word lookup.
+
+`pg_trgm` breaks text into 3-character trigrams and builds a GIN index the planner *can* use for substring matching. No changes to `search_thoughts_text` are needed -- the planner picks up the new index automatically. Queries that previously seq-scanned now run as bitmap index scans.
+
+## Prerequisites
+
+- Working Open Brain setup ([guide](../../docs/01-getting-started.md))
+- [`schemas/enhanced-thoughts`](https://github.com/NateBJones-Projects/OB1/pull/191) installed (defines `search_thoughts_text` and the base tsvector index)
+- Supabase project with write access to run migrations
+
+## Credential Tracker
+
+Copy this block into a text editor and fill it in as you go.
+
+```text
+TEXT SEARCH TRIGRAM INDEX -- CREDENTIAL TRACKER
+--------------------------------------
+
+SUPABASE (from your Open Brain setup)
+  Project URL: ____________
+  Secret key: ____________
+
+--------------------------------------
+```
+
+## Steps
+
+1. Open your Supabase dashboard and navigate to the **SQL Editor**
+2. Create a new query and paste the full contents of `schema.sql`
+3. Click **Run** to execute the migration (the `CREATE INDEX` will briefly lock the `thoughts` table against writes; ~1-2 minutes at 90K rows)
+4. Navigate to **Database > Extensions** and confirm `pg_trgm` is enabled
+5. Navigate to **Database > Indexes** (or run the verification query below) and confirm `idx_thoughts_content_trgm` exists on `public.thoughts`
+
+## Expected Outcome
+
+After running the migration:
+
+- The `pg_trgm` extension is installed in the database.
+- A GIN trigram index named `idx_thoughts_content_trgm` exists on `public.thoughts(content)`.
+- The next `search_thoughts_text` call whose ILIKE fallback fires will complete in ~100-150ms instead of ~8s.
+
+## Verification
+
+Run the following in the SQL Editor. The `Bitmap Index Scan on idx_thoughts_content_trgm` line in the plan confirms the planner is using the new index:
+
+```sql
+EXPLAIN ANALYZE
+SELECT id
+FROM public.thoughts
+WHERE content ILIKE '%somerarewordfromyourbrain%'
+LIMIT 25;
+```
+
+Expected plan (abbreviated):
+
+```
+Limit
+  ->  Bitmap Heap Scan on thoughts
+        Recheck Cond: (content ~~* '%somerarewordfromyourbrain%'::text)
+        ->  Bitmap Index Scan on idx_thoughts_content_trgm
+              Index Cond: (content ~~* '%somerarewordfromyourbrain%'::text)
+Execution Time: ~100-200 ms
+```
+
+If you instead see `Seq Scan on thoughts`, the index was not created or the planner has stale statistics -- run `ANALYZE public.thoughts;` and try again.
+
+## Rollback
+
+```sql
+DROP INDEX IF EXISTS public.idx_thoughts_content_trgm;
+```
+
+The `pg_trgm` extension is left installed; it is harmless on its own and may be used by other contributions.
+
+## Tradeoffs
+
+- **Storage:** ~20-40MB on a 90K-thought brain. Scales linearly with total content size.
+- **Build lock:** Regular (non-CONCURRENT) `CREATE INDEX` briefly locks `public.thoughts` against writes during the build (~1-2 minutes at 90K rows). If you're running live capture and can't tolerate a brief write pause, switch the statement to `CREATE INDEX CONCURRENTLY` and remove the surrounding `BEGIN/COMMIT` -- concurrent index builds cannot run inside a transaction.
+- **Write amplification:** Small per-row overhead on `INSERT` and `UPDATE` of `content` (the index needs to be maintained). Imperceptible at typical personal-brain write rates.
+
+## Troubleshooting
+
+**Issue: "extension pg_trgm does not exist" error**
+Solution: Your Supabase project predates automatic extension availability. In the SQL Editor, run `CREATE EXTENSION pg_trgm;` as a superuser or contact Supabase support. The migration uses `CREATE EXTENSION IF NOT EXISTS`, which works on all current Supabase projects.
+
+**Issue: `EXPLAIN ANALYZE` still shows `Seq Scan on thoughts`**
+Solution: Run `ANALYZE public.thoughts;` to refresh planner statistics, then retry. The planner needs accurate row counts before it will choose an index scan over a seq scan on small tables.
+
+**Issue: Migration hangs on `CREATE INDEX`**
+Solution: Check for long-running transactions holding locks on `thoughts` (look at `pg_stat_activity`). The index build needs to acquire a `SHARE` lock on the table. If you can't stop the blocking transaction, switch to `CREATE INDEX CONCURRENTLY` (see Tradeoffs).
+
+## References
+
+- [PostgreSQL `pg_trgm` documentation](https://www.postgresql.org/docs/current/pgtrgm.html) -- official reference for trigram matching and index operator classes
+- [`schemas/enhanced-thoughts`](../enhanced-thoughts/README.md) -- defines `search_thoughts_text`, the consumer of this index
diff --git a/schemas/text-search-trgm/metadata.json b/schemas/text-search-trgm/metadata.json
new file mode 100644
index 00000000..0a24c7b4
--- /dev/null
+++ b/schemas/text-search-trgm/metadata.json
@@ -0,0 +1,20 @@
+{
+  "name": "Text Search Trigram Index",
+  "description": "pg_trgm GIN index on public.thoughts.content to accelerate search_thoughts_text ILIKE fallback by ~50x on rare-word queries.",
+  "category": "schemas",
+  "author": {
+    "name": "Alan Shurafa",
+    "github": "alanshurafa"
+  },
+  "version": "1.0.0",
+  "requires": {
+    "open_brain": true,
+    "services": [],
+    "tools": []
+  },
+  "tags": ["performance", "search", "pg_trgm", "indexing"],
+  "difficulty": "beginner",
+  "estimated_time": "5 minutes",
+  "created": "2026-04-18",
+  "updated": "2026-04-18"
+}
diff --git a/schemas/text-search-trgm/schema.sql b/schemas/text-search-trgm/schema.sql
new file mode 100644
index 00000000..b4c0b492
--- /dev/null
+++ b/schemas/text-search-trgm/schema.sql
@@ -0,0 +1,39 @@
+-- Add pg_trgm trigram GIN index to accelerate search_thoughts_text ILIKE fallback.
+--
+-- Context: search_thoughts_text (from schemas/enhanced-thoughts) has a tsvector
+-- phase (fast) and an ILIKE '%...%' fallback. The ILIKE fallback triggers for
+-- most real queries -- tsvector usually returns fewer hits than requested and
+-- the function fills in from ILIKE. Leading-wildcard ILIKE can't use the
+-- tsvector GIN index, so without a trigram index ILIKE seq-scans the whole
+-- thoughts table. On an 89K-row brain, that's 7-8s per rare-word query.
+--
+-- Fix: pg_trgm provides trigram-based indexing that GIN can use for ILIKE
+-- patterns. No changes to search_thoughts_text needed -- the Postgres planner
+-- picks up the new index automatically once it exists. Rare-word queries drop
+-- from ~8s to ~100-150ms.
+--
+-- Prerequisites: enhanced-thoughts schema (PR #191) must be installed first.
+-- This migration adds only the trigram index; tsvector index lives in
+-- enhanced-thoughts.
+--
+-- Tradeoffs:
+--   - Storage: ~20-40MB on a 90K-thought brain; scales linearly with content size.
+--   - Build lock: regular (non-CONCURRENT) CREATE INDEX briefly locks the
+--     thoughts table against writes during the build (~1-2 min at 90K rows).
+--     Switch to CREATE INDEX CONCURRENTLY if you're running live capture and
+--     can tolerate migration-outside-transaction semantics.
+--   - Write-amp: small INSERT/UPDATE overhead on content changes. Imperceptible
+--     at typical personal-brain write rates.
+
+BEGIN;
+
+CREATE EXTENSION IF NOT EXISTS pg_trgm;
+
+CREATE INDEX IF NOT EXISTS idx_thoughts_content_trgm
+  ON public.thoughts
+  USING gin (content gin_trgm_ops);
+
+COMMENT ON INDEX public.idx_thoughts_content_trgm IS
+  'Trigram GIN index on content for ILIKE ''%foo%'' patterns. Accelerates search_thoughts_text ILIKE fallback from ~8s to ~150ms on rare-word queries.';
+
+COMMIT;

From 2a8c6340193fac07fbf559d069a048fead3365bd Mon Sep 17 00:00:00 2001
From: Alan Shurafa
Date: Sat, 18 Apr 2026 18:56:20 -0400
Subject: [PATCH 2/4] [schemas] Fix CI Rule 13: convert broken enhanced-thoughts link to external PR URL

---
 schemas/text-search-trgm/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/schemas/text-search-trgm/README.md b/schemas/text-search-trgm/README.md
index 117671b8..52caefc9 100644
--- a/schemas/text-search-trgm/README.md
+++ b/schemas/text-search-trgm/README.md
@@ -109,4 +109,4 @@ Solution: Check for long-running transactions holding locks on `thoughts` (look
 ## References
 
 - [PostgreSQL `pg_trgm` documentation](https://www.postgresql.org/docs/current/pgtrgm.html) -- official reference for trigram matching and index operator classes
-- [`schemas/enhanced-thoughts`](../enhanced-thoughts/README.md) -- defines `search_thoughts_text`, the consumer of this index
+- [`schemas/enhanced-thoughts` (PR #191)](https://github.com/NateBJones-Projects/OB1/pull/191) -- defines `search_thoughts_text`, the consumer of this index

From d5d24d654a820a5ab126fb6c4fe2bf99edfa3460 Mon Sep 17 00:00:00 2001
From: Alan Shurafa
Date: Sat, 18 Apr 2026 19:35:54 -0400
Subject: [PATCH 3/4] [schemas] Fix REVIEW-CLAUDE-MEDIUM-1: ANALYZE as explicit post-migration step

---
 schemas/text-search-trgm/README.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/schemas/text-search-trgm/README.md b/schemas/text-search-trgm/README.md
index 52caefc9..4771819a 100644
--- a/schemas/text-search-trgm/README.md
+++ b/schemas/text-search-trgm/README.md
@@ -25,6 +25,8 @@ Installs the `pg_trgm` extension and creates a trigram GIN index on `public.thou
 - [`schemas/enhanced-thoughts`](https://github.com/NateBJones-Projects/OB1/pull/191) installed (defines `search_thoughts_text` and the base tsvector index)
 - Supabase project with write access to run migrations
 
+This migration installs without error on stock OB1 (without PR #191), but provides no measurable benefit unless `search_thoughts_text` is installed. Install PR #191 first for the full effect.
+
 ## Credential Tracker
 
 Copy this block into a text editor and fill it in as you go.
@@ -45,8 +47,9 @@ SUPABASE (from your Open Brain setup)
 1. Open your Supabase dashboard and navigate to the **SQL Editor**
 2. Create a new query and paste the full contents of `schema.sql`
 3. Click **Run** to execute the migration (the `CREATE INDEX` will briefly lock the `thoughts` table against writes; ~1-2 minutes at 90K rows)
-4. Navigate to **Database > Extensions** and confirm `pg_trgm` is enabled
-5. Navigate to **Database > Indexes** (or run the verification query below) and confirm `idx_thoughts_content_trgm` exists on `public.thoughts`
+4. In a new query, run `ANALYZE public.thoughts;` to refresh planner statistics so the new index is picked up immediately. `schema.sql` does not run `ANALYZE` itself, so run it as a separate command.
+5. Navigate to **Database > Extensions** and confirm `pg_trgm` is enabled
+6. Navigate to **Database > Indexes** (or run the verification query below) and confirm `idx_thoughts_content_trgm` exists on `public.thoughts`
 
 ## Expected Outcome
 
@@ -54,7 +57,7 @@ After running the migration:
 
 - The `pg_trgm` extension is installed in the database.
 - A GIN trigram index named `idx_thoughts_content_trgm` exists on `public.thoughts(content)`.
-- The next `search_thoughts_text` call whose ILIKE fallback fires will complete in ~100-150ms instead of ~8s.
+- With `ANALYZE public.thoughts;` run (Step 4), the next `search_thoughts_text` call whose ILIKE fallback fires completes in ~100-150ms instead of ~8s.
 
 ## Verification
 
@@ -101,7 +104,7 @@ The `pg_trgm` extension is left installed; it is harmless on its own and may be
 Solution: Your Supabase project predates automatic extension availability. In the SQL Editor, run `CREATE EXTENSION pg_trgm;` as a superuser or contact Supabase support. The migration uses `CREATE EXTENSION IF NOT EXISTS`, which works on all current Supabase projects.
 
 **Issue: `EXPLAIN ANALYZE` still shows `Seq Scan on thoughts`**
-Solution: Run `ANALYZE public.thoughts;` to refresh planner statistics, then retry. The planner needs accurate row counts before it will choose an index scan over a seq scan on small tables.
+Solution: If you skipped Step 4, run `ANALYZE public.thoughts;` to refresh planner statistics, then retry. The planner needs accurate row counts before it will choose an index scan over a seq scan on small tables.
 
 **Issue: Migration hangs on `CREATE INDEX`**
 Solution: Check for long-running transactions holding locks on `thoughts` (look at `pg_stat_activity`). The index build needs to acquire a `SHARE` lock on the table. If you can't stop the blocking transaction, switch to `CREATE INDEX CONCURRENTLY` (see Tradeoffs).

From 5388acb064ab6db49ae4a18ad6511eec558a8f0d Mon Sep 17 00:00:00 2001
From: Jonathan Edwards
Date: Mon, 1 Jun 2026 14:27:44 -0400
Subject: [PATCH 4/4] docs: add community credit badge to text search trgm

---
 schemas/text-search-trgm/README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/schemas/text-search-trgm/README.md b/schemas/text-search-trgm/README.md
index 4771819a..9a2a677e 100644
--- a/schemas/text-search-trgm/README.md
+++ b/schemas/text-search-trgm/README.md
@@ -1,5 +1,9 @@
 # Text Search Trigram Index
 
+![Community Contribution](https://img.shields.io/badge/OB1_COMMUNITY-Approved_Contribution-2ea44f?style=for-the-badge&logo=github)
+
+**Created by [@alanshurafa](https://github.com/alanshurafa)**
+
 > Adds a `pg_trgm` GIN index on `public.thoughts.content` so `search_thoughts_text` ILIKE fallback queries run in ~150ms instead of ~8s.
 
 ## What It Does
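
The Tradeoffs section of the README describes the `CREATE INDEX CONCURRENTLY` alternative in prose only. A minimal sketch of that variant follows, assuming the same index name, table, and operator class as `schema.sql`. It has not been run against an Open Brain database, and whether your SQL client wraps statements in an implicit transaction is something to confirm first. The validity check against `pg_index` is an addition for illustration and is not part of the original migration.

```sql
-- Non-blocking variant of schema.sql (sketch; names copied from the migration).
-- Run these statements WITHOUT a surrounding BEGIN/COMMIT:
-- PostgreSQL rejects CREATE INDEX CONCURRENTLY inside a transaction block.

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_thoughts_content_trgm
  ON public.thoughts
  USING gin (content gin_trgm_ops);

-- A concurrent build that fails or is cancelled leaves an INVALID index behind.
-- Addition (not in the original migration): confirm the index is usable.
SELECT c.relname AS index_name,
       i.indisvalid
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
WHERE c.relname = 'idx_thoughts_content_trgm'
  AND c.relnamespace = 'public'::regnamespace;

-- Refresh planner statistics, matching Step 4 of the README.
ANALYZE public.thoughts;
```

If `indisvalid` comes back `false`, drop the index with the Rollback statement and rebuild it. Re-running the `CREATE INDEX CONCURRENTLY IF NOT EXISTS` statement alone would skip the rebuild, because the invalid index still exists under that name.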