Rule Batching Optimization (Implemented & Disabled)#54
Draft
- Add batchRules and maxRulesPerBatch options to the EvaluationOptions interface
- Create EvaluateBatchedRulesParams interface for batched evaluation parameters
- Create EvaluateBatchedRulesResult interface to track batched evaluation results
- Implement buildBatchedCheckLLMSchema() function to generate a JSON schema for multiple rules
- Add BatchedCheckLLMResult type to represent LLM output for batched evaluations
- Enable evaluating multiple rules in a single LLM call for improved efficiency
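The options above could look roughly like the following sketch. The field names `batchRules` and `maxRulesPerBatch` come from the PR; the defaults/clamping helper is illustrative, based on the documented default of 5 and the 1-20 range:

```typescript
// Hypothetical sketch; only batchRules / maxRulesPerBatch are from the PR.
interface EvaluationOptions {
  batchRules?: boolean;      // evaluate multiple rules per LLM call (default: false)
  maxRulesPerBatch?: number; // batch size, validated to the 1-20 range (default: 5)
}

function resolveBatchOptions(
  opts: EvaluationOptions
): { batchRules: boolean; maxRulesPerBatch: number } {
  const requested = opts.maxRulesPerBatch ?? 5;
  return {
    batchRules: opts.batchRules ?? false,
    // clamp to the documented 1-20 range
    maxRulesPerBatch: Math.min(20, Math.max(1, requested)),
  };
}
```

Resolving defaults in one place keeps the evaluator free of option-handling logic.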
- Create new BatchedPromptBuilder module to combine multiple Check rules into single prompts
- Implement formatRuleForBatch() to format individual rules with clear task numbering
- Add buildBatchedCheckPrompt() to construct system prompts with rule preamble and verification
- Implement extractBatchedRuleContexts() to extract essential rule metadata from PromptFile objects
- Add groupIntoBatches() utility to partition rules into configurable batch sizes
- Define BatchedRuleContext interface for rule metadata (id, name, body)
- Include comprehensive system preamble with mission, protocol, and output format instructions
- Enables efficient batch evaluation of multiple rules in a single LLM call
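A minimal sketch of the `groupIntoBatches()` utility and the `BatchedRuleContext` shape described above (the `id`/`name`/`body` fields come from the PR; the implementation is an assumption):

```typescript
// Rule metadata carried into a batched prompt (fields per the PR description).
interface BatchedRuleContext {
  id: string;   // rule identifier used to map violations back to their rule
  name: string; // human-readable rule name
  body: string; // the rule's instruction text
}

// Partition rules into batches of at most maxPerBatch, preserving order.
function groupIntoBatches<T>(items: T[], maxPerBatch: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += maxPerBatch) {
    batches.push(items.slice(i, i + maxPerBatch));
  }
  return batches;
}
```

Each batch then becomes one numbered "task" list inside a single system prompt.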
- Implement BatchedCheckEvaluator class to process multiple Check-type rules in single LLM calls
- Add support for batching rules to reduce token usage by sending content only once per batch
- Implement rule batching logic with a configurable max rules per batch to mitigate the "lost in the middle" problem
- Add content chunking with configurable thresholds (600-word threshold, 500-word max chunk size)
- Integrate with the batched prompt builder and schema for structured LLM responses
- Add violation merging and score calculation across multiple chunks and batches
- Include comprehensive TODO noting that batching showed 60-90% accuracy loss in validation and is currently disabled by default
- Add token usage aggregation and distribution across evaluated rules
- Support document-level evaluation mode that disables chunking when required
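The chunking step could be sketched as below, using the two thresholds stated above (content under 600 words stays whole; longer content is split into chunks of at most 500 words). The word-based splitting strategy is an assumption:

```typescript
// Thresholds from the PR description; the splitting logic is illustrative.
const CHUNK_THRESHOLD_WORDS = 600; // below this, evaluate the document whole
const MAX_CHUNK_WORDS = 500;       // upper bound per chunk once chunking kicks in

function chunkByWords(content: string): string[] {
  const words = content.split(/\s+/).filter(Boolean);
  if (words.length <= CHUNK_THRESHOLD_WORDS) return [content];
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += MAX_CHUNK_WORDS) {
    chunks.push(words.slice(i, i + MAX_CHUNK_WORDS).join(" "));
  }
  return chunks;
}
```

Per-chunk violations would then be merged and de-duplicated before scoring, as the bullet list describes.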
- Add `batchRules` boolean flag to enable/disable batched rule evaluation
- Add `maxRulesPerBatch` integer option to control batch size (1-20, default 5)
- These options let users tune rule evaluation performance by processing multiple rules per LLM call
- Add BatchRules and MaxRulesPerBatch configuration keys to the config loader
- Parse BatchRules as a boolean from the config file (accepts "true", "1", or "false")
- Parse MaxRulesPerBatch as an integer with validation (must be between 1 and 20)
- Propagate batch configuration options through CLI commands to the evaluation context
- Export BatchedCheckEvaluator and related utilities from the evaluators module
- Update module documentation to reference the batched rule evaluation capability
- Enables users to configure batch evaluation behavior via the configuration file
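The parsing rules above (accepted boolean spellings, 1-20 integer range) can be sketched like this; the function names are hypothetical:

```typescript
// "true" and "1" enable batching; anything else (including "false") disables it,
// matching the accepted spellings listed in the description above.
function parseBatchRules(raw: string | undefined): boolean {
  if (raw === undefined) return false; // feature is off by default
  return raw === "true" || raw === "1";
}

// MaxRulesPerBatch must be an integer in [1, 20]; default of 5 is an assumption
// carried over from the CLI option description.
function parseMaxRulesPerBatch(raw: string | undefined): number {
  if (raw === undefined) return 5;
  const n = Number.parseInt(raw, 10);
  if (Number.isNaN(n) || n < 1 || n > 20) {
    throw new Error(`MaxRulesPerBatch must be between 1 and 20, got "${raw}"`);
  }
  return n;
}
```

Failing fast on an out-of-range value surfaces config mistakes before any LLM calls are made.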
- Create new script to compare batched vs non-batched rule evaluation results
- Implement violation key normalization for accurate comparison across runs
- Add support for automatic baseline and batched evaluation execution
- Include detailed accuracy reporting with overlap percentage metrics
- Support both auto mode (runs evaluations) and manual mode (provides instructions)
- Add JSON output format for programmatic result processing
- Include verbose mode for detailed violation comparison analysis
- Validate that the batching optimization doesn't degrade evaluation quality
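The overlap metric could be computed roughly as follows: normalize each violation to a comparable key, then measure what fraction of baseline violations the batched run also found. The key fields (`ruleId`, `line`, `message`) and normalization steps are assumptions, not the script's actual schema:

```typescript
// Hypothetical violation shape for the comparison; the real script's fields may differ.
interface Violation {
  ruleId: string;
  line: number;
  message: string;
}

// Normalize to a key so trivially different wording/casing still matches.
function violationKey(v: Violation): string {
  return `${v.ruleId}:${v.line}:${v.message.trim().toLowerCase()}`;
}

// Percentage of baseline violations also present in the batched run.
function overlapPercent(baseline: Violation[], batched: Violation[]): number {
  if (baseline.length === 0) return 100;
  const batchedKeys = new Set(batched.map(violationKey));
  const hits = baseline.filter((v) => batchedKeys.has(violationKey(v))).length;
  return (hits / baseline.length) * 100;
}
```

Treating the non-batched run as ground truth makes the metric a recall-style measure: it penalizes findings the batched mode misses, which is exactly the failure mode reported below.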
This PR implements the infrastructure for Rule Batching Optimization, a strategy to evaluate multiple rules in a single LLM call to reduce token usage and latency.
Implementation Details
1. New Core Components
- BatchedCheckEvaluator: A new evaluator class that returns a structured result (BatchedCheckLLMResult) to map violations back to specific Rule IDs.
- BatchedPromptBuilder: Dynamically assembles multiple rule definitions into a single system prompt using a "Task-Based" structure.

2. Orchestrator Updates
- Updated `evaluateFile` in `src/cli/orchestrator.ts`.

3. Configuration
- Added `BatchRules` (boolean) and `MaxRulesPerBatch` (number) to `.vectorlint.ini`.
- Default is `false` (safety first).

Validation & Analysis
Conducted validation experiments to measure accuracy before enabling the feature.
Methodology
- Test file: `tests/fixtures/ai-pattern/negation-pattern.md` (complex content with negation patterns)
- Compared `BatchRules=false` (Baseline) vs `BatchRules=true` (Experiment)

Experiments & Findings
Key Metrics Comparison
Root Cause Analysis
Why Batching Fails
Lost in the Middle: Complex rules (negation-contrast patterns, structural analysis) are systematically missed when multiple rules compete for attention in one prompt.
Context Bleed (Hallucination): The model applies the logic of one rule to another, creating false positives where the "sentiment" of Rule A bleeds into Rule B's evaluation.
Inconsistent Rule Application: The Repetition rule found 7 issues in batched mode vs 2 in baseline; the model's interpretation varies significantly based on prompt structure.
Batch Size Impact
Detailed Examples of Missed Findings
Process Note: "Implement to Validate"
The original plan required validation before core engineering. However, to scientifically validate the batching hypothesis, the batching infrastructure (Evaluator, Prompt Builder, Schema) had to be built first to run the experiments.
Therefore, this PR includes the full implementation of the optimization, but respects the quality gate by shipping it in a disabled state. This preserves the engineering work for future R&D (e.g., when models improve) without risking production quality.
Future Improvements
If batching is revisited, consider:
Artifacts
- BATCHING_COMPARISON_REPORT_V5.md: Batch size 4 analysis with full findings and descriptions
- BATCHING_COMPARISON_REPORT_V6.md: Batch size 2 analysis showing improved overlap (59% vs 37.5%)
- scripts/measure-batching-accuracy.ts: Reusable validation tool

Summary
Recommendation: Feature remains DISABLED (`BatchRules=false`). The infrastructure is preserved for future experimentation when LLM capabilities improve.