
fix(search): match snake_case identifiers in BM25 queries#326

Closed

sonwr wants to merge 1 commit into tobi:main from sonwr:fix/search-snake-case-identifiers

sonwr commented Mar 8, 2026

This fixes BM25 search for snake_case identifiers such as atomic_write_json.

What changed:

  • replace non-alphanumeric characters with spaces instead of removing them during FTS5 term sanitization
  • align the CLI-side sanitization logic with the store implementation
  • add an end-to-end CLI test covering snake_case identifier search
  • update structured search tests for the new tokenization behavior
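The sanitization change can be sketched as below. This is a hypothetical reconstruction, not the code in this PR: the names sanitizeTermBefore and sanitizeTermAfter, and the choice of Unicode letters and digits as the "alphanumeric" class, are assumptions. The real sanitizeFTS5Term may differ in details such as quoting or case handling.

```typescript
// Hypothetical before/after sketches of an FTS5 term sanitizer.
// Assumption: "alphanumeric" means Unicode letters (\p{L}) and digits (\p{N}).

// Old behaviour (sketch): delete every non-alphanumeric character,
// which glues the parts of a snake_case identifier together.
function sanitizeTermBefore(term: string): string {
  return term.replace(/[^\p{L}\p{N}]/gu, "");
}

// New behaviour (sketch): collapse each run of non-alphanumeric characters
// into a single space, so the word boundaries survive sanitization.
function sanitizeTermAfter(term: string): string {
  return term.replace(/[^\p{L}\p{N}]+/gu, " ").trim();
}

console.log(sanitizeTermBefore("atomic_write_json")); // atomicwritejson
console.log(sanitizeTermAfter("atomic_write_json")); // atomic write json
console.log(sanitizeTermAfter("parse_http_response")); // parse http response
```

Replacing a whole run (the `+`) rather than each character keeps inputs like "foo--bar" from producing doubled spaces.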

Why:
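SQLite FTS5 with the unicode61 tokenizer treats the underscore as a separator, so at index time a snake_case identifier such as atomic_write_json is split into the separate tokens atomic, write, and json. The previous query sanitization removed underscores, turning atomic_write_json into the single term atomicwritejson, which matches none of the indexed tokens.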
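To illustrate the mismatch, here is a rough approximation of unicode61 tokenization. It is a simplification, not SQLite's implementation: it treats Unicode letters and digits as token characters and everything else (including "_") as a separator, and it lowercases text, but it ignores diacritic folding and the tokenchars/separators options. The helper names approxUnicode61Tokens and allTokensIndexed are hypothetical.

```typescript
// Rough approximation of FTS5's unicode61 tokenizer with default options.
// Assumption: diacritic folding and custom tokenchars/separators are not modelled.
function approxUnicode61Tokens(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((t) => t.length > 0);
}

// True if the query has at least one token and every query token
// appears among the document's indexed tokens (an implicit-AND model;
// phrase and prefix query semantics are ignored).
function allTokensIndexed(doc: string, query: string): boolean {
  const indexed = new Set(approxUnicode61Tokens(doc));
  const queryTokens = approxUnicode61Tokens(query);
  return queryTokens.length > 0 && queryTokens.every((t) => indexed.has(t));
}

const doc = "Use atomic_write_json to persist state.";
console.log(approxUnicode61Tokens(doc).join(" ")); // use atomic write json to persist state
console.log(allTokensIndexed(doc, "atomicwritejson")); // false: old sanitized query
console.log(allTokensIndexed(doc, "atomic write json")); // true: new sanitized query
```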

How I tested it:

  • reproduced locally before the change:
    • search "atomic_write_json" returned []
    • search "atomic write json" returned the expected document
  • verified locally after the change:
    • search "atomic_write_json" returns the expected document
    • search "parse_http_response" returns the expected document
  • ran targeted tests:
    • test/structured-search.test.ts
    • test/cli.test.ts

Fixes #305

sonwr commented Mar 12, 2026

Thanks. This change is small and still looks directionally fine, but the branch has gone stale and now conflicts with the base branch. Rather than keep a conflicting PR open without active reviewer traction, I am closing it. If the fix still makes sense against current main later, I would prefer to open a fresh, rebased PR or a smaller follow-up.

sonwr closed this Mar 12, 2026


Linked issue: BM25 search fails on snake_case identifiers (sanitizeFTS5Term strips underscores)
