
feat(filter): add batch filtering and confidence calibration #25

Merged
foundatron merged 1 commit into main from issue-7 on Mar 11, 2026
Conversation

@foundatron (Owner)

Closes #7

Changes

tentacle/llm/prompts.py — Add FILTER_BATCH_SYSTEM and FILTER_BATCH_USER prompt templates

  • FILTER_BATCH_SYSTEM: Instructs the LLM to score multiple articles, returning a JSON array of {"index": N, "relevance": 0.XX, "reasoning": "..."} with 1-based indices.
  • FILTER_BATCH_USER: Formats a numbered list of title+abstract pairs ([1] Title: ... / Abstract: ...).
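The two templates described above might be sketched as follows. This is an illustrative guess at their shape, not the exact wording in `tentacle/llm/prompts.py`; the hypothetical `format_batch` helper shows how the numbered title+abstract list could be rendered:

```python
# Hypothetical sketch of the batch-filter prompt templates; the exact
# wording in tentacle/llm/prompts.py may differ.
FILTER_BATCH_SYSTEM = (
    "You are a relevance filter. Score each numbered article for relevance. "
    "Respond with ONLY a JSON array of objects: "
    '[{"index": N, "relevance": 0.XX, "reasoning": "..."}]. '
    "Indices are 1-based and must match the input numbering."
)

FILTER_BATCH_USER = "Score the following articles:\n\n{articles}"


def format_batch(articles: list[tuple[str, str]]) -> str:
    """Render (title, abstract) pairs as the 1-based numbered list
    the system prompt expects."""
    return "\n\n".join(
        f"[{i}] Title: {title}\nAbstract: {abstract}"
        for i, (title, abstract) in enumerate(articles, start=1)
    )
```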

tentacle/llm/filter.py — Add filter_batch() function; filter_article() unchanged

  • filter_batch(client, articles, *, model, threshold, batch_size=10) -> list[tuple[float, str]]: Chunks articles into batches of batch_size, sends each batch in a single LLM call, parses JSON array response.
  • On JSON parse failure for entire batch: fall back to filter_article() individually for every article in that batch.
  • On partial parse failure (valid JSON but missing/malformed/out-of-range entries): use parsed results for successful entries, fall back to filter_article() for missing ones.
  • Returns results in input order.
  • Set max_tokens proportionally (e.g., batch_size * 100).
  • Log a warning on any fallback so degraded batches are visible in production logs.
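The chunking, parsing, and two-tier fallback described above can be sketched as below. The real `filter_batch` takes a `client` and `model`; here the LLM call and the per-article fallback are injected as plain callables (`score_batch`, `score_one`) so the sketch stays self-contained. `_parse_batch` is a hypothetical helper name:

```python
import json
import logging

logger = logging.getLogger(__name__)


def _parse_batch(raw: str, n: int) -> dict[int, tuple[float, str]]:
    """Parse the JSON array response into {0-based index: (score, reasoning)}.

    Malformed or out-of-range entries are skipped (treated as missing).
    Raises ValueError if the payload is not a JSON array at all.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    out: dict[int, tuple[float, str]] = {}
    for entry in data:
        try:
            idx = int(entry["index"]) - 1  # prompts use 1-based indices
            score = float(entry["relevance"])
            reasoning = str(entry["reasoning"])
        except (TypeError, KeyError, ValueError):
            continue
        if 0 <= idx < n:
            out[idx] = (score, reasoning)
    return out


def filter_batch(articles, score_batch, score_one, *, batch_size=10):
    """Chunk articles, score each chunk in one call, and fall back to
    per-article scoring when the batch response is unusable.

    Returns (relevance, reasoning) tuples in input order.
    """
    results: list[tuple[float, str]] = []
    for start in range(0, len(articles), batch_size):
        chunk = articles[start:start + batch_size]
        try:
            parsed = _parse_batch(score_batch(chunk), len(chunk))
        except ValueError:
            logger.warning("batch parse failed; falling back per-article")
            parsed = {}
        for i, article in enumerate(chunk):
            if i in parsed:
                results.append(parsed[i])
            else:
                if parsed:  # partial failure: only some entries missing
                    logger.warning("missing entry %d; falling back", start + i + 1)
                results.append(score_one(article))
    return results
```

An empty `parsed` dict covers the full-failure case (every article falls back), while a partially populated one triggers per-entry fallback only for the gaps, matching the behavior described in the bullets above.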

tentacle/cli.py — Update _run_scan() to use filter_batch()

  • Replace the per-article filter_article() loop with a filter_batch() call over new_articles, then build relevant_articles from the results using the same threshold check.
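The threshold step that builds `relevant_articles` could look like the sketch below. `select_relevant` is an illustrative helper, not a name from the PR; it assumes `filter_batch` returns `(relevance, reasoning)` tuples in input order, as stated above:

```python
def select_relevant(articles, scores, threshold):
    """Keep articles whose relevance meets the threshold.

    `scores` is the list of (relevance, reasoning) tuples that
    filter_batch returns, aligned with `articles` by position.
    """
    return [
        article
        for article, (relevance, _reasoning) in zip(articles, scores)
        if relevance >= threshold
    ]
```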

Review Findings

  • Errors: 0
  • Warnings: 3
  • Nits: 4
  • Assessment: NEEDS CHANGES

The most impactful fix is #4 (clamp relevance scores in the batch path to match filter_article behavior). #2 (token budget) is a latent reliability issue that will cause silent cost waste. #1 is worth a defensive assertion. The code is otherwise well-structured with solid test coverage and good fallback design.

Replace per-article filter_article() loop in cli.py with filter_batch(),
which sends article batches in a single LLM call. Falls back to individual
filter_article() calls on full JSON parse failure or missing/out-of-range
entries. Uses 1-based indices in prompts for LLM reliability.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@foundatron foundatron merged commit 3d5b909 into main Mar 11, 2026
1 check passed
@foundatron foundatron deleted the issue-7 branch March 11, 2026 02:22
