
Rule Batching Optimization (Implemented & Disabled) #54

Draft
ayo6706 wants to merge 11 commits into main from ft/rule-batch-optimaztion

Conversation


ayo6706 commented Jan 13, 2026

This PR implements the infrastructure for Rule Batching Optimization, a strategy to evaluate multiple rules in a single LLM call to reduce token usage and latency.

Status: The feature is DISABLED BY DEFAULT (BatchRules=false).

Reason: Extensive validation confirmed that batching significantly degrades accuracy due to LLM cognitive load limitations ("Lost in the Middle" phenomenon).


Implementation Details

1. New Core Components

  • BatchedCheckEvaluator: A new evaluator class that:
    • Chunks content once.
    • Constructs a single "Multi-Rule" prompt.
    • Parses a new JSON schema (BatchedCheckLLMResult) to map violations back to specific Rule IDs.
  • BatchedPromptBuilder: Dynamically assembles multiple rule definitions into a single system prompt using a "Task-Based" structure.
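To make the "Task-Based" structure concrete, here is a minimal sketch of how a multi-rule prompt can be assembled. The names (`RuleContext`, `buildBatchedPrompt`) and the exact preamble wording are illustrative, not the actual `BatchedPromptBuilder` API.

```typescript
// Illustrative sketch of task-based multi-rule prompt assembly.
// Interface and function names are assumptions, not the real API.
interface RuleContext {
  id: string;
  name: string;
  body: string;
}

function buildBatchedPrompt(rules: RuleContext[]): string {
  const tasks = rules
    .map((r, i) => `## Task ${i + 1}: ${r.name} (rule id: ${r.id})\n${r.body}`)
    .join("\n\n");
  return [
    "You are evaluating a document against several independent rules.",
    "Treat each task in isolation; do not let one rule influence another.",
    "Report every violation together with the rule id it belongs to.",
    tasks,
  ].join("\n\n");
}

const prompt = buildBatchedPrompt([
  { id: "ai-pattern", name: "AIPattern", body: "Flag negation-contrast phrasing." },
  { id: "repetition", name: "Repetition", body: "Flag repeated sentence openers." },
]);
```

The "treat each task in isolation" instruction is exactly the part that, per the validation below, the model fails to honor at larger batch sizes.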

2. Orchestrator Updates

  • Modified evaluateFile in src/cli/orchestrator.ts to:
    • Partition rules into Batchable (Check rules, Base evaluator) and Non-Batchable (Judge rules, Custom evaluators).
    • Route batchable rules to the new evaluator.
    • Gracefully fall back to individual evaluation if the batch request fails.
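The partition-and-fallback flow can be sketched as follows; the `Rule` shape and evaluator signatures are simplified assumptions, not the real orchestrator types.

```typescript
// Sketch of the orchestrator changes: partition rules, then batch with
// per-rule fallback on failure. Types here are illustrative.
interface Rule {
  id: string;
  kind: "check" | "judge";
  evaluator: "base" | "custom";
}

function partitionRules(rules: Rule[]): { batchable: Rule[]; individual: Rule[] } {
  const isBatchable = (r: Rule) => r.kind === "check" && r.evaluator === "base";
  return {
    batchable: rules.filter(isBatchable),
    individual: rules.filter((r) => !isBatchable(r)),
  };
}

async function evaluateWithFallback(
  rules: Rule[],
  batchEval: (rs: Rule[]) => Promise<string[]>,
  singleEval: (r: Rule) => Promise<string[]>,
): Promise<string[]> {
  try {
    return await batchEval(rules);
  } catch {
    // Batch request failed: degrade gracefully to one call per rule.
    const results = await Promise.all(rules.map(singleEval));
    return results.flat();
  }
}
```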

3. Configuration

  • Added BatchRules (boolean) and MaxRulesPerBatch (number) to .vectorlint.ini.
  • Default: false (Safety first).
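For reference, a `.vectorlint.ini` fragment exercising these keys might look like this (the key names and the 1-20 range come from this PR; the comments are mine):

```ini
; Batched rule evaluation. Disabled by default pending accuracy fixes.
BatchRules = false
; Rules per LLM call when batching is enabled (valid range: 1-20).
MaxRulesPerBatch = 5
```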

Validation & Analysis

Conducted validation experiments to measure accuracy before enabling the feature.

Methodology

  • Test File: tests/fixtures/ai-pattern/negation-pattern.md (complex content with negation patterns)
  • Model: gpt-5.1
  • Process: Compared BatchRules=false (Baseline) vs BatchRules=true (Experiment)
  • Metric: Intersection of findings by Rule ID + Line Number
  • Targets: >95% overlap with baseline, >50% token reduction
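The overlap metric above can be sketched as a set intersection keyed on `ruleId:line`; the key format is an assumption based on the "Rule ID + Line Number" description, and the real logic lives in `scripts/measure-batching-accuracy.ts`.

```typescript
// Sketch of the overlap metric: what fraction of baseline findings
// the batched run also reports, keyed by rule id and line number.
interface Finding {
  ruleId: string;
  line: number;
}

function overlapPercent(baseline: Finding[], batched: Finding[]): number {
  const key = (f: Finding) => `${f.ruleId}:${f.line}`;
  const batchedKeys = new Set(batched.map(key));
  const hits = baseline.filter((f) => batchedKeys.has(key(f))).length;
  return baseline.length === 0 ? 100 : (hits / baseline.length) * 100;
}
```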

Experiments & Findings

| Report | Batch Size | Overlap | Token Reduction | Hallucinations | Status |
|---|---|---|---|---|---|
| V5 | 4 rules | 37.5% | 37% | ~9% | FAIL |
| V6 | 2 rules | 59% | 21% | ~5% | FAIL |
| Baseline | 1 rule | 100% | 0% | 0% | PASS |

Key Metrics Comparison

| Metric | Baseline | Batch=4 (V5) | Batch=2 (V6) |
|---|---|---|---|
| Warnings Found | 32 | 34 | 37 |
| Input Tokens | 50,570 | 31,996 | 39,868 |
| LLM Requests | 24 | 6 | 12 |
| Missed Findings | 0 | 18 | 13 |
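As a sanity check, the token-reduction percentages in the experiment summary follow directly from the input-token counts above:

```typescript
// Token reduction relative to the baseline input-token count.
const reduction = (baseline: number, batched: number): number =>
  Math.round(((baseline - batched) / baseline) * 100);

reduction(50570, 31996); // Batch=4 (V5) → 37
reduction(50570, 39868); // Batch=2 (V6) → 21
```

Both fall short of the >50% reduction target, because each batch still pays the full content-chunk cost plus a longer multi-rule system prompt.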

Root Cause Analysis

Why Batching Fails

  1. Lost in the Middle: Complex rules (negation-contrast patterns, structural analysis) are systematically missed when multiple rules compete for attention in one prompt.

  2. Context Bleed (Hallucination): The model applies the logic of one rule to another, creating false positives where the "sentiment" of Rule A bleeds into Rule B's evaluation.

  3. Inconsistent Rule Application: The Repetition rule found 7 issues in batched mode vs. 2 in the baseline; the model's interpretation varies significantly with prompt structure.

Batch Size Impact

| Batch Size | Observation |
|---|---|
| 4 rules | Missed 18 findings (56% loss), especially negation patterns on lines 3, 37, 58, 130, 136 |
| 2 rules | Missed 13 findings (41% loss); recovered the negation pattern on line 3, but still missed lines 37, 58, 130, 136 |
| 1 rule (baseline) | All findings recovered; no "lost in the middle" effect |

Detailed Examples of Missed Findings

| Line:Col | Rule | Quoted Text | Description | Missed By |
|---|---|---|---|---|
| 1:102 | AIPattern | "don't just need tools, they need integrated platforms" | Rhetorical structure adds flair but no substance | Both V5 & V6 |
| 3:15 | AIPattern | "doesn't simply improve productivity" | Introduces and dismisses an idea never discussed | V5 only |
| 37:1 | AIPattern | "isn't just a comment bot" | Negation-contrast without prior setup | Both V5 & V6 |
| 58:323 | AIPattern | "doesn't have X, but lacks Y" | Redundant negative contrasts | Both V5 & V6 |
| 130:50 | AIPattern | "Instead of trying to do everything" | Artificial contrast | Both V5 & V6 |
| 136:31 | AIPattern | "doesn't generate, doesn't provide, doesn't help" | Repeated "doesn'ts", templated AI phrasing | Both V5 & V6 |

Process Note: "Implement to Validate"

The original plan required validation before core engineering. However, to scientifically validate the batching hypothesis, the batching infrastructure (Evaluator, Prompt Builder, Schema) had to be built first to run the experiments.

Therefore, this PR includes the full implementation of the optimization, but respects the quality gate by shipping it in a disabled state. This preserves the engineering work for future R&D (e.g., when models improve) without risking production quality.


Future Improvements

If batching is revisited, consider:

  1. Rule-type-aware batching: Only batch simple buzzword rules together; keep complex structural rules individual
  2. Hybrid approach: Use batching for first-pass scanning, then verify edge cases individually
  3. Prompt engineering: Experiment with stronger rule separation in the prompt format
  4. Model improvements: Test with newer models that may have better "lost in the middle" handling
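Improvement #1 could be sketched as a batch planner that only groups "simple" rules and gives structural rules dedicated calls. The `complexity` tag is hypothetical; nothing in the current rule schema carries it.

```typescript
// Hedged sketch of rule-type-aware batching (future improvement #1).
// The complexity classification is an assumption, not an existing field.
interface TaggedRule {
  id: string;
  complexity: "simple" | "structural";
}

function planBatches(rules: TaggedRule[], maxPerBatch: number): TaggedRule[][] {
  const simple = rules.filter((r) => r.complexity === "simple");
  const structural = rules.filter((r) => r.complexity === "structural");
  const batches: TaggedRule[][] = [];
  // Simple buzzword-style rules share LLM calls, up to maxPerBatch each.
  for (let i = 0; i < simple.length; i += maxPerBatch) {
    batches.push(simple.slice(i, i + maxPerBatch));
  }
  // Complex structural rules (e.g. negation-contrast) always run alone.
  for (const r of structural) batches.push([r]);
  return batches;
}
```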

Artifacts

  • BATCHING_COMPARISON_REPORT_V5.md: Batch size 4 analysis with full findings and descriptions
  • BATCHING_COMPARISON_REPORT_V6.md: Batch size 2 analysis showing improved overlap (59% vs 37.5%)
  • scripts/measure-batching-accuracy.ts: Reusable validation tool

Summary

| Criterion | Target | V5 (Batch=4) | V6 (Batch=2) | Verdict |
|---|---|---|---|---|
| Overlap with Baseline | >95% | 37.5% | 59% | FAIL |
| Token Reduction | >50% | 37% | 21% | FAIL |
| Hallucination Rate | 0% | ~9% | ~5% | MARGINAL |

Recommendation: Feature remains DISABLED (BatchRules=false). The infrastructure is preserved for future experimentation when LLM capabilities improve.

- Add batchRules and maxRulesPerBatch options to EvaluationOptions interface
- Create EvaluateBatchedRulesParams interface for batched evaluation parameters
- Create EvaluateBatchedRulesResult interface to track batched evaluation results
- Implement buildBatchedCheckLLMSchema() function to generate JSON schema for multiple rules
- Add BatchedCheckLLMResult type to represent LLM output for batched evaluations
- Enable evaluating multiple rules in a single LLM call for improved efficiency
- Create new BatchedPromptBuilder module to combine multiple Check rules into single prompts
- Implement formatRuleForBatch() to format individual rules with clear task numbering
- Add buildBatchedCheckPrompt() to construct system prompts with rule preamble and verification
- Implement extractBatchedRuleContexts() to extract essential rule metadata from PromptFile objects
- Add groupIntoBatches() utility to partition rules into configurable batch sizes
- Define BatchedRuleContext interface for rule metadata (id, name, body)
- Include comprehensive system preamble with mission, protocol, and output format instructions
- Enables efficient batch evaluation of multiple rules in a single LLM call
- Implement BatchedCheckEvaluator class to process multiple Check-type rules in single LLM calls
- Add support for batching rules to reduce token usage by sending content only once per batch
- Implement rule batching logic with configurable max rules per batch to mitigate "lost in the middle" problem
- Add content chunking with configurable thresholds (600 word threshold, 500 word max chunk size)
- Integrate with batched prompt builder and schema for structured LLM responses
- Add violation merging and score calculation across multiple chunks and batches
- Include comprehensive TODO noting that batching showed 60-90% accuracy loss in validation and is currently disabled by default
- Add token usage aggregation and distribution across evaluated rules
- Support document-level evaluation mode that disables chunking when required
- Add `batchRules` boolean flag to enable/disable batched rule evaluation
- Add `maxRulesPerBatch` integer option to control batch size (1-20, default 5)
- These options allow users to optimize rule evaluation performance by processing multiple rules in parallel batches
- Add BatchRules and MaxRulesPerBatch configuration keys to config loader
- Parse BatchRules as boolean from config file (accepts "true", "1", or "false")
- Parse MaxRulesPerBatch as integer with validation (must be between 1 and 20)
- Propagate batch configuration options through CLI commands to evaluation context
- Export BatchedCheckEvaluator and related utilities from evaluators module
- Update module documentation to reference batched rule evaluation capability
- Enables users to configure batch evaluation behavior via configuration file
- Create new script to compare batched vs non-batched rule evaluation results
- Implement violation key normalization for accurate comparison across runs
- Add support for automatic baseline and batched evaluation execution
- Include detailed accuracy reporting with overlap percentage metrics
- Support both auto mode (runs evaluations) and manual mode (provides instructions)
- Add JSON output format for programmatic result processing
- Include verbose mode for detailed violation comparison analysis
- Validate that batching optimization doesn't degrade evaluation quality
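The chunking thresholds mentioned in the commit notes (chunk only above 600 words, at most 500 words per chunk) can be illustrated with a simple word-based splitter; the real logic inside `BatchedCheckEvaluator` may differ in how it counts and splits.

```typescript
// Illustrative word-count chunker using the thresholds from the commit
// notes. This is a sketch, not the evaluator's actual implementation.
function chunkWords(text: string, threshold = 600, maxChunk = 500): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  // Documents at or under the threshold are evaluated whole.
  if (words.length <= threshold) return [text];
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxChunk) {
    chunks.push(words.slice(i, i + maxChunk).join(" "));
  }
  return chunks;
}
```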

coderabbitai bot commented Jan 13, 2026

Important

Review skipped

Draft detected.

