Merged
schemas/enhanced-thoughts/README.md (41 changes: 32 additions, 9 deletions)
@@ -1,14 +1,23 @@
# Enhanced Thoughts Columns and Utility RPCs

<div align="center">

![Community Contribution](https://img.shields.io/badge/OB1_COMMUNITY-Approved_Contribution-2ea44f?style=for-the-badge&logo=github)

**Created by [@alanshurafa](https://github.com/alanshurafa)**

</div>

> Adds structured columns and utility functions to the Open Brain thoughts table for richer classification, full-text search, statistics, and connection discovery.

## What It Does

This schema extension adds six new columns to the `thoughts` table (`type`, `sensitivity_tier`, `importance`, `quality_score`, `source_type`, `enriched`) so thoughts can be classified, filtered, and ranked without parsing the metadata JSONB every time. It also upgrades `upsert_thought` so metadata-backed writes keep those structured columns in sync. It installs three utility RPC functions:
This schema extension adds six new columns to the `thoughts` table (`type`, `sensitivity_tier`, `importance`, `quality_score`, `source_type`, `enriched`) so thoughts can be classified, filtered, and ranked without parsing the metadata JSONB every time. It also installs four RPC functions:

- **`search_thoughts_text`** -- Full-text search with boolean operators, ILIKE fallback, pagination, and result counts.
- **`brain_stats_aggregate`** -- Returns total thought count, top types, and top topics as a single JSONB payload.
- **`get_thought_connections`** -- Finds thoughts that share metadata topics or people with a given thought.
- **`backfill_thought_types(p_allowed_types TEXT[])`** -- Populates the new top-level `type` column from `metadata->>'type'` for rows where `type` is still NULL, and returns the number of rows updated. The default allowlist covers the canonical eight values (`idea`, `task`, `person_note`, `reference`, `decision`, `lesson`, `meeting`, `journal`). Pass a custom array to accept additional values, or pass `NULL` to backfill whatever `metadata->>'type'` contains.
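A minimal sketch of calling each RPC from the Supabase SQL Editor, assuming the migration has already run. The `p_limit`, `p_filter`, `p_offset`, and `p_since_days` names come from the function body and the troubleshooting notes; that the first `TEXT` argument of `search_thoughts_text` is the query string, that the `INT` argument of `get_thought_connections` is a result limit, and that the `BOOLEAN` argument of `brain_stats_aggregate` has a default are assumptions. The UUID is a placeholder. `:=` is PostgreSQL's named-argument notation.

```sql
-- Full-text search: first ten hits, no metadata filter.
SELECT *
FROM search_thoughts_text(
  'weekly review',          -- query string (assumed to be the first parameter)
  p_limit  := 10,
  p_filter := '{}'::jsonb,  -- compared with metadata @> p_filter
  p_offset := 0
);

-- All-time stats; relies on a default for the BOOLEAN argument (assumption).
SELECT brain_stats_aggregate(p_since_days := 0);

-- Thoughts that share topics or people with a given thought.
-- Placeholder UUID; 10 is assumed to be a result limit, true excludes restricted rows.
SELECT *
FROM get_thought_connections('00000000-0000-0000-0000-000000000000'::uuid, 10, true);

-- Backfill with the default canonical allowlist; returns the number of rows updated.
SELECT backfill_thought_types();
```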

## Prerequisites

@@ -36,19 +45,33 @@ SUPABASE (from your Open Brain setup)
2. Create a new query and paste the full contents of `schema.sql`
3. Click **Run** to execute the migration
4. Open **Table Editor** and select the `thoughts` table to confirm the new columns appear: `type`, `sensitivity_tier`, `importance`, `quality_score`, `source_type`, `enriched`
5. Navigate to **Database > Functions** and verify three new functions exist: `search_thoughts_text`, `brain_stats_aggregate`, `get_thought_connections`
6. Verify `upsert_thought` still exists. The enhanced version mirrors `metadata.type`, `metadata.source`, `metadata.importance`, `metadata.quality_score`, `metadata.sensitivity_tier`, and task/idea status into top-level columns.
7. If you have existing thoughts with `type` or `source` values stored in the metadata JSONB, the backfill statements at the bottom of the script will have populated the new columns automatically
5. Navigate to **Database > Functions** and verify the new functions exist: `search_thoughts_text`, `brain_stats_aggregate`, `get_thought_connections`, `backfill_thought_types`
6. If you have existing thoughts with `type` or `source` values stored in the metadata JSONB, the script backfills them automatically: `type` through `backfill_thought_types()` with the default canonical allowlist, and `source_type` through an inline `UPDATE`. If your brain uses non-canonical `type` values, run `SELECT backfill_thought_types(ARRAY['your','custom','types']);`, or `SELECT backfill_thought_types(NULL);` to accept any value. Only rows whose `type` is still NULL are updated, so re-running is safe.
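Before choosing a custom allowlist, it can help to see which `metadata->>'type'` values the default backfill skipped. The sketch below uses only the columns this migration adds; the extra values `article` and `quote` in the second statement are illustrative, not part of the canonical vocabulary.

```sql
-- Metadata types that the canonical allowlist rejected (type column still NULL).
SELECT metadata->>'type' AS metadata_type,
       count(*)          AS n
FROM thoughts
WHERE type IS NULL
  AND metadata->>'type' IS NOT NULL
GROUP BY metadata->>'type'
ORDER BY n DESC;

-- Accept the canonical eight plus two illustrative extras.
-- Only rows whose type is still NULL are touched, so this is safe to re-run.
SELECT backfill_thought_types(ARRAY[
  'idea', 'task', 'person_note', 'reference',
  'decision', 'lesson', 'meeting', 'journal',
  'article', 'quote'
]);
```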

## Expected Outcome

After running the migration:

- The `thoughts` table has six new columns with dashboard-friendly defaults.
- The `thoughts` table has six new columns with sensible defaults:
- `sensitivity_tier TEXT DEFAULT 'standard'` (canonical values: `'standard'`, `'personal'`, `'restricted'`)
- `importance SMALLINT DEFAULT 3` (scale: 1-5, where 3 is the default)
- `quality_score NUMERIC(5,2) DEFAULT 50` (scale: 0-100, where 50 is the default)
- `enriched BOOLEAN DEFAULT false`
- `type TEXT` (nullable; populated by backfill or writers)
- `source_type TEXT` (nullable; populated by backfill or writers)
- New indexes on `type`, `importance`, `source_type`, and a GIN tsvector index on `content` for fast full-text search.
- Three new RPC functions callable via the Supabase client or REST API.
- `upsert_thought` remains the canonical write path, but now keeps structured dashboard columns synchronized with metadata payloads.
- Any existing thoughts with `type` or `source` in their metadata JSONB will have those values copied into the new top-level columns.
- Four new RPC functions callable via the Supabase client or REST API (`search_thoughts_text`, `brain_stats_aggregate`, `get_thought_connections`, `backfill_thought_types`).
- Any existing thoughts with `type` or `source` in their metadata JSONB will have those values copied into the new top-level columns (via `backfill_thought_types()` for `type` with the canonical allowlist, plus an inline `UPDATE` for `source_type`).
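One way to confirm the outcome from SQL rather than the Table Editor is to read `information_schema.columns`, then run a query against the structured columns instead of the metadata JSONB. This is a sketch; the filter values in the second query are only an example.

```sql
-- The six new columns with their types and defaults.
SELECT column_name, data_type, column_default, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name   = 'thoughts'
  AND column_name IN ('type', 'sensitivity_tier', 'importance',
                      'quality_score', 'source_type', 'enriched')
ORDER BY column_name;

-- Example: high-importance decisions, skipping restricted rows
-- (IS DISTINCT FROM keeps rows whose tier is NULL).
SELECT id, type, importance, quality_score
FROM thoughts
WHERE type = 'decision'
  AND importance >= 4
  AND sensitivity_tier IS DISTINCT FROM 'restricted'
ORDER BY quality_score DESC
LIMIT 20;
```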

## Security

This schema follows stock Open Brain's "service_role only" posture:

- `brain_stats_aggregate` and `get_thought_connections` are `SECURITY DEFINER` with `SET search_path = public` (defense in depth against search-path hijacks). They can read the full `thoughts` table regardless of RLS.
- `search_thoughts_text` is `SECURITY INVOKER` and respects RLS.
- **None of these RPCs, nor `backfill_thought_types`, is granted to `anon`.** `schema.sql` grants execute privilege only to `authenticated` and `service_role`; the publishable anon key is not among the granted roles.

If you want to expose any of these to `anon` (for example, a public-read dashboard), add your own `GRANT EXECUTE ... TO anon;` in a follow-up migration and confirm that `p_exclude_restricted := true` (the default) plus your sensitivity-tier hygiene gives you the exposure surface you actually want. This is an explicit opt-in: the default stance is private.
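In PostgreSQL, a newly created function is executable by `PUBLIC` unless that privilege is revoked, and Supabase projects typically also carry default privileges on the `public` schema that grant function execution to `anon`. Whether `anon` can actually call these RPCs therefore depends on more than the `GRANT` statements above. A sketch of a check, run as an administrative role; the `REVOKE` statements are a follow-up to use only if the check reports `true` and you want the private default.

```sql
-- true means anon can execute the function, whether via PUBLIC,
-- default privileges, or an explicit grant.
SELECT p.proname,
       has_function_privilege('anon', p.oid, 'EXECUTE') AS anon_can_execute
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
  AND p.proname IN ('search_thoughts_text', 'brain_stats_aggregate',
                    'get_thought_connections', 'backfill_thought_types');

-- Close the gap explicitly; explicit grants to authenticated and service_role are unaffected.
REVOKE EXECUTE ON FUNCTION search_thoughts_text(TEXT, INTEGER, JSONB, INTEGER) FROM PUBLIC, anon;
REVOKE EXECUTE ON FUNCTION brain_stats_aggregate(INTEGER, BOOLEAN) FROM PUBLIC, anon;
REVOKE EXECUTE ON FUNCTION get_thought_connections(UUID, INT, BOOLEAN) FROM PUBLIC, anon;
REVOKE EXECUTE ON FUNCTION backfill_thought_types(TEXT[]) FROM PUBLIC, anon;
```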

## Troubleshooting

@@ -59,4 +82,4 @@ Solution: These are safe to ignore. The `ADD COLUMN IF NOT EXISTS` syntax preven
Solution: Confirm your thoughts have content populated. Try a simple query first (a single word, no operators). If you use operators, make sure the query follows websearch syntax: "quoted phrases", `word OR word`, and `-excluded`; space-separated terms are ANDed implicitly.
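To see how PostgreSQL parses a query string before debugging the RPC itself, you can run it through `websearch_to_tsquery` (PostgreSQL 11+) directly. This sketch assumes the search uses the `english` text search configuration; if `schema.sql` uses a different one, substitute it. Note that the RPC's ILIKE fallback can still return rows the tsquery misses.

```sql
-- Quoted phrase, implicit AND, OR, and an excluded term.
SELECT websearch_to_tsquery('english', '"project plan" budget OR forecast -draft');

-- Which recent thoughts match a given query string.
SELECT id,
       to_tsvector('english', content)
         @@ websearch_to_tsquery('english', 'budget -draft') AS matches
FROM thoughts
ORDER BY created_at DESC
LIMIT 20;
```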

**Issue: brain_stats_aggregate returns empty types or topics**
Solution: The function filters by `created_at`. Pass `p_since_days := 0` for all-time stats. Also confirm that your thoughts have the `type` column populated (run the backfill UPDATE if needed).
Solution: The function filters by `created_at`. Pass `p_since_days := 0` for all-time stats. Also confirm that your thoughts have the `type` column populated. If you use non-canonical type values in `metadata->>'type'` (anything outside `idea`, `task`, `person_note`, `reference`, `decision`, `lesson`, `meeting`, `journal`), call the backfill RPC with your own allowlist, e.g. `SELECT backfill_thought_types(ARRAY['idea','task','article','quote']);`, or `SELECT backfill_thought_types(NULL);` to accept whatever is present.
schemas/enhanced-thoughts/metadata.json (2 changes: 1 addition, 1 deletion)
@@ -14,5 +14,5 @@
"difficulty": "beginner",
"estimated_time": "15 minutes",
"created": "2026-04-06",
"updated": "2026-04-06"
"updated": "2026-04-17"
}
schemas/enhanced-thoughts/schema.sql (69 changes: 57 additions, 12 deletions)
@@ -55,7 +55,7 @@ RETURNS TABLE (
total_count BIGINT
)
LANGUAGE plpgsql
VOLATILE
STABLE
SET statement_timeout = '25s'
AS $$
BEGIN
@@ -84,7 +84,7 @@ BEGIN
AND (SELECT count(*) FROM tsvector_hits) < (p_limit + p_offset)
AND t.content ILIKE '%' || q.raw_query || '%'
AND t.metadata @> coalesce(p_filter, '{}'::jsonb)
AND t.id NOT IN (SELECT th.hit_id FROM tsvector_hits th)
AND NOT EXISTS (SELECT 1 FROM tsvector_hits th WHERE th.hit_id = t.id)
LIMIT 500
),
all_hits AS (
@@ -118,8 +118,10 @@ BEGIN
ELSE 0
END
)
+ (coalesce(t.importance, 5) / 20.0)::real
+ (coalesce(t.quality_score, 0.50) / 500.0)::real
-- importance is 1..5; max bonus 5/20 = 0.25
+ (coalesce(t.importance, 3) / 20.0)::real
-- quality_score is 0..100; max bonus 100/500 = 0.20
+ (coalesce(t.quality_score, 50) / 500.0)::real
)::real AS rank
FROM public.thoughts t
CROSS JOIN query_input q
@@ -138,8 +140,12 @@ BEGIN
END;
$$;

-- Do NOT grant to `anon`. Stock Open Brain keeps `thoughts` behind RLS
-- (service_role only). Broadening execution to the publishable anon key
-- would expose the entire brain to anyone who knows the project URL.
-- See README "Security" section.
GRANT EXECUTE ON FUNCTION search_thoughts_text(TEXT, INTEGER, JSONB, INTEGER)
TO authenticated, anon, service_role;
TO authenticated, service_role;

-- ============================================================
-- 3. BRAIN STATS AGGREGATE RPC
@@ -195,8 +201,10 @@ BEGIN
END;
$$;

-- Do NOT grant to `anon`. This RPC is SECURITY DEFINER and would bypass
-- RLS on the thoughts table. See README "Security" section.
GRANT EXECUTE ON FUNCTION brain_stats_aggregate(INTEGER, BOOLEAN)
TO authenticated, anon, service_role;
TO authenticated, service_role;

-- ============================================================
-- 4. THOUGHT CONNECTIONS RPC
@@ -220,6 +228,7 @@ RETURNS TABLE (
overlap_count INT
)
LANGUAGE plpgsql
STABLE
SECURITY DEFINER
SET search_path = public
AS $$
@@ -266,7 +275,7 @@ BEGIN
) AS shared_people
FROM thoughts bt
WHERE bt.id != p_thought_id
AND (NOT p_exclude_restricted OR bt.sensitivity_tier != 'restricted')
AND (NOT p_exclude_restricted OR bt.sensitivity_tier IS DISTINCT FROM 'restricted')
AND (
EXISTS (
SELECT 1 FROM jsonb_array_elements_text(bt.metadata->'topics') val
@@ -288,19 +297,55 @@ BEGIN
END;
$$;

-- Do NOT grant to `anon`. This RPC is SECURITY DEFINER and exposes
-- a 200-char content preview plus metadata for any thought by UUID;
-- granting to anon would let anyone with the project URL pull content.
-- See README "Security" section.
GRANT EXECUTE ON FUNCTION get_thought_connections(UUID, INT, BOOLEAN)
TO authenticated, anon, service_role;
TO authenticated, service_role;

-- ============================================================
-- 5. BACKFILL EXISTING DATA
-- Populates new columns from metadata for rows that already
-- exist. Safe to run multiple times (WHERE ... IS NULL guard).
-- ============================================================

-- Backfill type from metadata
UPDATE thoughts SET type = metadata->>'type'
WHERE type IS NULL AND metadata->>'type' IS NOT NULL
AND metadata->>'type' IN ('idea','task','person_note','reference','decision','lesson','meeting','journal');
-- Backfill `type` from metadata. Wrapped in an RPC so callers can
-- override the allowlist. Default allowlist matches the canonical
-- Open Brain type vocabulary; pass NULL to accept any string value
-- present in metadata->>'type'.
CREATE OR REPLACE FUNCTION backfill_thought_types(
p_allowed_types TEXT[] DEFAULT ARRAY[
'idea','task','person_note','reference',
'decision','lesson','meeting','journal'
]
)
RETURNS BIGINT
LANGUAGE plpgsql
VOLATILE
SET search_path = public
AS $$
DECLARE
v_updated BIGINT;
BEGIN
UPDATE public.thoughts
SET type = metadata->>'type'
WHERE type IS NULL
AND metadata->>'type' IS NOT NULL
AND (p_allowed_types IS NULL OR metadata->>'type' = ANY(p_allowed_types));

GET DIAGNOSTICS v_updated = ROW_COUNT;
RETURN v_updated;
END;
$$;

-- Do NOT grant to `anon`. This RPC writes to the thoughts table.
GRANT EXECUTE ON FUNCTION backfill_thought_types(TEXT[])
TO authenticated, service_role;

-- Run the backfill with the default allowlist so the paste-and-run
-- flow still auto-populates `type` for canonical values.
SELECT backfill_thought_types();

-- Backfill source_type from metadata
UPDATE thoughts SET source_type = metadata->>'source'