feat: Implement content hashing and result caching with full output replay#42

Open
ayo6706 wants to merge 11 commits into main from ft/content-cache

Conversation


@ayo6706 ayo6706 commented Dec 18, 2025

This PR introduces a caching mechanism to significantly reduce LLM token costs and improve execution speed for unchanged content.

Key Changes

  • Cache Module: Added src/cache/ to handle SHA256 hashing of file content and prompt configurations (including metadata).
  • Orchestrator Integration:
    • evaluateFile now checks .vectorlint/cache.json before triggering LLM calls.
    • On cache hits, the system performs a full-fidelity replay, printing all original issues (errors/warnings) and quality scores exactly as they appeared in the initial run.
    • On cache misses, results (including granular issue details and quality scores) are collected and persisted to disk.
  • CLI Options:
    • Added --full: Forces a full re-evaluation, ignoring the cache.
    • Added --no-cache: Disables reading from or writing to the cache.
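The lookup flow described above can be sketched in a few lines. This is a hedged illustration, not the PR's actual code: a plain Map stands in for CacheStore, and the hash/key helpers mirror the names in src/cache/ but their real signatures may differ.

```typescript
import { createHash } from 'node:crypto';

// Stand-in for src/cache/content-hasher.ts (real truncation/format may differ).
function hashContent(content: string): string {
  const normalized = content.replace(/\r\n/g, '\n').trim();
  return createHash('sha256').update(normalized).digest('hex').slice(0, 16);
}

function createCacheKeyString(filePath: string, contentHash: string, promptsHash: string): string {
  return `${filePath}:${contentHash}:${promptsHash}`;
}

interface CachedResult { errors: number; warnings: number; }

// Replay on hit; evaluate and store on miss (the core of the evaluateFile change).
function evaluateWithCache(
  file: string,
  content: string,
  promptsHash: string,
  cache: Map<string, CachedResult>,
  evaluate: () => CachedResult,
): { result: CachedResult; wasCacheHit: boolean } {
  const key = createCacheKeyString(file, hashContent(content), promptsHash);
  const cached = cache.get(key);
  if (cached) return { result: cached, wasCacheHit: true };
  const result = evaluate();
  cache.set(key, result);
  return { result, wasCacheHit: false };
}
```

Because the hash is computed over normalized text, a line-ending-only change replays the cached result instead of triggering a new LLM call.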

Impact

  • Drastically reduces API costs for repeated runs on large documentation sets.
  • Maintains identical CLI output (including colors and formatting) regardless of whether a result is live or cached.

Summary by CodeRabbit

  • New Features

    • Persistent, versioned disk-backed result cache with safe load/save, replayed cached results, and cache-hit indicators.
    • Deterministic content and prompt hashing that produces compact cache keys.
    • CLI flags to force full evaluations or disable caching.
  • Types & Validation

    • Structured cache types and a comprehensive validation schema for cached data.
    • Minor widening of a public score field type for reporting.
  • Tests

    • Comprehensive tests for hashing, cache-key generation, persistence, and cache operations.


- Create cache module with index, types, and store implementations
- Add CacheKey interface for unique cache entry identification using file path and content/prompt hashes
- Add CachedIssue interface for minimal issue data storage and replay
- Add CachedResult interface to store evaluation results with errors, warnings, and operational status
- Add CacheData interface for versioned cache entry management
- Add CacheOptions interface for cache configuration (enabled, forceFullRun, cacheDir)
- Update EvaluationOptions to include cacheEnabled and forceFullRun flags
- Update ProcessViolationsParams, ProcessCriterionParams, and ProcessPromptResultParams with optional issueCollector
- Update EvaluateFileParams to accept cacheStore, promptsHash, and useCache options
- Update EvaluateFileResult to include wasCacheHit flag for cache hit detection
- Add CLI schema options for --full and --noCache flags
- Enable caching support for improved performance on repeated evaluations
- Add CacheStore class for persistent cache management with disk I/O
- Stores cache in .vectorlint/cache.json by default
- Implements get, set, has, clear, and size operations
- Includes version checking for future cache migrations
- Gracefully handles corrupted cache files by starting fresh
- Only writes to disk when cache is modified (dirty flag)
- Add content hashing utilities for cache key generation
- Implement hashContent() to compute SHA256 of normalized file content
- Implement hashPrompts() to hash prompt configurations for rule change detection
- Implement createCacheKeyString() to combine file path and hashes into cache keys
- Normalize content by trimming whitespace and standardizing line endings
- Add error handling to prevent cache failures from blocking execution
- Add --full and --no-cache CLI options to control cache behavior
- Pass cacheEnabled and forceFullRun flags to orchestrator
- Implement cache store integration in evaluateFile function
- Add issueCollector parameter to reportIssue and related functions
- Filter runRules from scanPaths configuration before passing to orchestrator
- Create cache key from content and prompts hash for cache lookups
- Store and retrieve cached issues and scores from cache store
- Skip evaluation for cached files and merge cached results with new ones
- Add comprehensive cache store tests for validation and edge cases
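Taken together, the hashing bullets above amount to two small utilities. The sketch below approximates src/cache/content-hasher.ts from this description; the field names on Prompt and the 16-character truncation are assumptions, not the PR's exact code.

```typescript
import { createHash } from 'node:crypto';

const sha256 = (s: string) => createHash('sha256').update(s).digest('hex').slice(0, 16);

// Content hash: normalize line endings and trim before hashing, so
// CRLF-vs-LF or surrounding-whitespace changes do not invalidate the cache.
function hashContent(content: string): string {
  return sha256(content.replace(/\r\n/g, '\n').trim());
}

// Prompt hash: sort by id first so configuration order does not matter,
// then hash a canonical serialization of the relevant fields.
interface Prompt { id: string; body: string; meta?: Record<string, unknown>; pack?: string; }

function hashPrompts(prompts: Prompt[]): string {
  const sorted = [...prompts].sort((a, b) => a.id.localeCompare(b.id));
  const parts = sorted.map(p => ({
    id: p.id,
    meta: JSON.stringify(p.meta ?? {}),
    body: p.body.trim(),
    pack: p.pack ?? '',
  }));
  return sha256(JSON.stringify(parts));
}
```

Any edit to a prompt body or its metadata changes the prompts hash, which changes every cache key and forces re-evaluation, exactly the rule-change detection the commits describe.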
coderabbitai bot commented Dec 18, 2025

📝 Walkthrough

Adds a disk-backed, versioned cache with content/prompt hashing, cache-key generation, new CLI flags (--full, --no-cache), and orchestrator integration to check, replay, and persist cached evaluation results (versioned .vectorlint/cache.json).

Changes

  • Cache types (src/cache/types.ts): New exported interfaces: CacheKey, CachedIssue, CachedEvaluationSummary, CachedScore (optional components), CachedResult, CacheData, CacheOptions.
  • Cache storage implementation (src/cache/cache-store.ts): New exported CacheStore class (constructor (cwd?: string, cacheDir?: string)) with get, set, has, clear, size, save; versioned cache file, dirty-tracking, dir creation, resilient load/save and pretty JSON persistence.
  • Content & prompt hashing (src/cache/content-hasher.ts): New exports: hashContent(content), hashPrompts(prompts), createCacheKeyString(filePath, contentHash, promptsHash); normalizes input, SHA256 hex, truncates to 16 chars.
  • Cache barrel export (src/cache/index.ts): New barrel re-exporting ./types, ./content-hasher, and ./cache-store.
  • Orchestrator integration (src/cli/orchestrator.ts): Threads cache through evaluation: computes content/prompts hashes, builds cache keys, checks/replays cached issues & scores (adds wasCacheHit), sets cache on miss, persists cache after run; many helper signatures extended to accept issueCollector and cache params.
  • CLI wiring & commands (src/cli/commands.ts, src/schemas/cli-schemas.ts): Adds CLI options full and noCache (schema + flags), maps to forceFullRun and cacheEnabled, adjusts output format handling.
  • CLI types extended (src/cli/types.ts): Adds cacheEnabled?, forceFullRun? to EvaluationOptions; adds issueCollector?: CachedIssue[] to processing params; extends EvaluateFileParams (cacheStore, promptsHash, useCache) and EvaluateFileResult (wasCacheHit); adds scoreComponents? to ErrorTrackingResult.
  • Cache schema validation (src/schemas/cache-schema.ts): New exported CACHE_SCHEMA (zod) with nested schemas for issues, evaluation summaries, score components, cached scores, and cached results; top-level { version, entries }.
  • Reporter type tweak (src/output/reporter.ts): Broadened EvaluationSummary.score type to explicitly allow undefined (`score?: number | undefined`).
  • Tests (tests/cache-store.test.ts): New tests for hashContent, hashPrompts, createCacheKeyString, and CacheStore behaviors (persistence, dir creation, get/set/has/clear/size, and hashing invariants).
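The dirty-flag and resilient-load behavior attributed to CacheStore above can be illustrated with a stripped-down version. This is a sketch under assumptions, not the PR's class: MiniCacheStore is a hypothetical name, and the real implementation adds has/clear/size, version-mismatch warnings, and richer error handling.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync, rmSync } from 'node:fs';
import * as path from 'node:path';

const CACHE_VERSION = 1;

interface CacheData { version: number; entries: Record<string, unknown>; }

class MiniCacheStore {
  private data: CacheData;
  private dirty = false;
  private file: string;

  constructor(private dir: string) {
    this.file = path.join(dir, 'cache.json');
    this.data = this.load();
  }

  // Resilient load: a missing, corrupted, or wrong-version file starts fresh.
  private load(): CacheData {
    try {
      if (existsSync(this.file)) {
        const parsed = JSON.parse(readFileSync(this.file, 'utf-8')) as CacheData;
        if (parsed.version === CACHE_VERSION) return parsed;
      }
    } catch { /* corrupted cache: fall through and start fresh */ }
    return { version: CACHE_VERSION, entries: {} };
  }

  get(key: string): unknown { return this.data.entries[key]; }

  set(key: string, value: unknown): void {
    this.data.entries[key] = value;
    this.dirty = true; // mark for persistence
  }

  // Dirty flag: save() touches disk only if something changed.
  save(): void {
    if (!this.dirty) return;
    if (!existsSync(this.dir)) mkdirSync(this.dir, { recursive: true });
    writeFileSync(this.file, JSON.stringify(this.data, null, 2), 'utf-8');
    this.dirty = false;
  }
}
```

The dirty flag keeps read-only runs (all cache hits) from rewriting an identical cache.json on every invocation.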

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as CLI
    participant Orch as Orchestrator
    participant Hasher as Hasher
    participant Cache as CacheStore
    participant Eval as Evaluator

    CLI->>Orch: evaluateFiles(targets, options)
    Orch->>Cache: instantiate CacheStore (if cacheEnabled)
    Orch->>Hasher: hashPrompts(prompts)
    loop per file
        Orch->>Eval: read file content
        Eval-->>Orch: fileContent
        Orch->>Hasher: hashContent(fileContent)
        Orch->>Hasher: createCacheKeyString(filePath, contentHash, promptsHash)
        Orch->>Cache: has(cacheKey)?
        alt cache hit (useCache && !forceFullRun)
            Cache-->>Orch: true
            Orch->>Cache: get(cacheKey)
            Cache-->>Orch: CachedResult
            Orch->>Orch: replay issues & scores (wasCacheHit = true)
        else cache miss
            Orch->>Eval: run evaluation (prompts, file)
            Eval-->>Orch: issues, scores
            Orch->>Orch: collect issue data (issueCollector)
            Orch->>Cache: set(cacheKey, CachedResult)
        end
    end
    Orch->>Cache: save()
    Cache-->>Orch: persisted

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hop through files and hash each line,

I tuck results safe in a cache of mine,
Prompts and content snug, I store them tight,
Replay a whisper — or run it fresh tonight,
Nibbling bytes and saving time. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: The title clearly and accurately summarizes the main objective: implementing a content hashing and caching mechanism with full output replay functionality, which aligns with the substantial changes across the cache module, orchestrator, and CLI integration.
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7325aad and b77a753.

📒 Files selected for processing (1)
  • src/cli/commands.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cli/commands.ts



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (7)
src/cache/content-hasher.ts (3)

9-9: Consider handling standalone CR characters in line ending normalization.

The current regex only converts CRLF (\r\n) to LF, but standalone CR characters (\r) would remain. For comprehensive normalization, consider replacing all \r occurrences:

-    const normalized = content.replace(/\r\n/g, '\n').trim();
+    const normalized = content.replace(/\r\n/g, '\n').replace(/\r/g, '\n').trim();

22-27: JSON.stringify on meta could produce non-deterministic output.

JSON.stringify(p.meta) may produce different string representations if object key ordering varies across JavaScript engines or versions, potentially causing cache misses for semantically identical metadata.

Consider using a deterministic JSON serialization approach or sorting keys explicitly:

🔎 View suggested fix
     const parts = sorted.map(p => ({
         id: p.id,
-        meta: JSON.stringify(p.meta),
+        meta: JSON.stringify(p.meta, Object.keys(p.meta).sort()),
         body: p.body.trim(),
         pack: p.pack || '',
     }));

43-43: Hash truncation reduces collision resistance.

Truncating SHA256 hashes to 16 characters (64 bits) significantly reduces collision resistance from 2^128 to 2^32 for birthday attacks. While this may be acceptable for a local cache with moderate file counts, consider the risk as the repository scales.

For repositories with thousands of files, consider increasing the truncation length to 24-32 characters for better collision resistance.
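To put that tradeoff in numbers, the standard birthday approximation P ≈ n² / 2^(b+1) estimates the probability of any collision among n distinct keys under a b-bit hash. A quick back-of-the-envelope helper (illustrative only, not part of the PR):

```typescript
// Birthday bound: P(collision) ~ n^2 / 2^(b+1) for n keys hashed to b bits.
function collisionProbability(nKeys: number, bits: number): number {
  return (nKeys * nKeys) / Math.pow(2, bits + 1);
}
```

Even 10,000 cached files stay far below one-in-a-billion at 64 bits (16 hex chars), so the truncation is reasonable for a local cache; lengthening to 24-32 chars, as suggested, simply widens that margin.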

src/cli/commands.ts (1)

140-143: Extract duplicated scanPaths transformation logic.

The same scanPaths transformation pattern appears twice in this file. Consider extracting it into a helper function to reduce duplication and improve maintainability.

🔎 View suggested refactor

Add a helper function near the top of the file:

function normalizeScanPaths(scanPaths: FilePatternConfig[]) {
  return scanPaths.map(({ runRules, ...rest }) => ({
    ...rest,
    ...(runRules !== undefined ? { runRules } : {})
  }));
}

Then replace both occurrences:

-        scanPaths: config.scanPaths.map(({ runRules, ...rest }) => ({
-          ...rest,
-          ...(runRules !== undefined ? { runRules } : {})
-        })),
+        scanPaths: normalizeScanPaths(config.scanPaths),

Also applies to: 175-178

src/cache/cache-store.ts (2)

28-48: Add runtime validation for parsed cache data.

The JSON.parse result is cast directly to CacheData without runtime validation. If the cache file is corrupted or manually edited, this could cause runtime errors downstream.

Consider adding a validation function or using a schema validator (e.g., Zod) to ensure the loaded data conforms to the expected structure before casting:

🔎 View suggested improvement
private load(): CacheData {
    try {
        if (existsSync(this.cacheFile)) {
            const raw = readFileSync(this.cacheFile, 'utf-8');
            const parsed = JSON.parse(raw);
            
            // Basic validation
            if (!parsed || typeof parsed !== 'object' || 
                typeof parsed.version !== 'number' || 
                !parsed.entries || typeof parsed.entries !== 'object') {
                console.warn(`[vectorlint] Invalid cache structure, starting fresh`);
                return { version: CACHE_VERSION, entries: {} };
            }
            
            const cacheData = parsed as CacheData;

            // Version check for future migrations
            if (cacheData.version !== CACHE_VERSION) {
                console.warn(`[vectorlint] Cache version mismatch, clearing cache`);
                return { version: CACHE_VERSION, entries: {} };
            }

            return cacheData;
        }
    } catch (e) {
        // If cache is corrupted, start fresh
        const msg = e instanceof Error ? e.message : String(e);
        console.warn(`[vectorlint] Could not read cache (${msg}), starting fresh`);
    }

    return { version: CACHE_VERSION, entries: {} };
}

59-61: Consider using hasOwnProperty for cleaner key checking.

While the in operator works correctly for Records, using hasOwnProperty or Object.prototype.hasOwnProperty.call() is more explicit about checking own properties:

 has(key: string): boolean {
-    return key in this.data.entries;
+    return Object.prototype.hasOwnProperty.call(this.data.entries, key);
 }
src/cli/orchestrator.ts (1)

648-700: Duplicate hashContent computation.

hashContent(content) is computed at line 650 (cache lookup) and again at line 829 (cache storage). Compute it once and reuse:

🔎 Suggested fix
   const content = readFileSync(file, 'utf-8');
   const relFile = path.relative(process.cwd(), file) || file;
+  const contentHash = promptsHash ? hashContent(content) : '';

   // Check cache before running evaluations
   if (useCache && cacheStore && promptsHash) {
-    const contentHash = hashContent(content);
     const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
     const cached = cacheStore.get(cacheKey);
     // ... cache hit handling
   }

   // ... evaluation logic ...

   // Store result in cache
   if (cacheStore && promptsHash) {
-    const contentHash = hashContent(content);
     const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
     // ... cache storage
   }

Also applies to: 826-847

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 659ea2e and f523402.

📒 Files selected for processing (9)
  • src/cache/cache-store.ts (1 hunks)
  • src/cache/content-hasher.ts (1 hunks)
  • src/cache/index.ts (1 hunks)
  • src/cache/types.ts (1 hunks)
  • src/cli/commands.ts (3 hunks)
  • src/cli/orchestrator.ts (18 hunks)
  • src/cli/types.ts (6 hunks)
  • src/schemas/cli-schemas.ts (1 hunks)
  • tests/cache-store.test.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/cache-store.test.ts (3)
src/cache/content-hasher.ts (3)
  • hashContent (8-11)
  • hashPrompts (17-31)
  • createCacheKeyString (37-44)
src/cache/cache-store.ts (1)
  • CacheStore (13-90)
src/cache/types.ts (1)
  • CachedResult (37-47)
src/cli/types.ts (2)
src/cache/types.ts (1)
  • CachedIssue (16-25)
src/cache/cache-store.ts (1)
  • CacheStore (13-90)
src/cache/cache-store.ts (1)
src/cache/types.ts (2)
  • CacheData (49-52)
  • CachedResult (37-47)
src/cli/orchestrator.ts (5)
src/cli/types.ts (2)
  • ReportIssueParams (58-70)
  • EvaluateFileResult (147-150)
src/cache/types.ts (2)
  • CachedIssue (16-25)
  • CachedScore (33-36)
src/cache/content-hasher.ts (3)
  • hashContent (8-11)
  • createCacheKeyString (37-44)
  • hashPrompts (17-31)
src/output/reporter.ts (3)
  • printFileHeader (23-29)
  • EvaluationSummary (6-10)
  • printEvaluationSummaries (137-174)
src/cache/cache-store.ts (1)
  • CacheStore (13-90)
🔇 Additional comments (12)
src/cache/types.ts (1)

1-58: LGTM! Well-structured cache type definitions.

The cache type system is comprehensive and well-designed:

  • Clear separation between cache keys, cached data, and cache configuration
  • CachedIssue captures all necessary fields for replay
  • CachedResult includes both granular details and summary metrics
  • Version field in CacheData enables future migrations
  • Optional fields allow flexibility in what gets cached
src/cache/index.ts (1)

1-3: LGTM! Clean barrel export.

src/schemas/cli-schemas.ts (1)

13-14: LGTM! CLI options properly defined.

The new full and noCache options are correctly defined with appropriate boolean defaults.

src/cli/commands.ts (2)

30-31: LGTM! Cache control flags properly added.

The new CLI options align with the caching feature requirements.


179-180: LGTM! Cache options correctly propagated.

The inversion of noCache to cacheEnabled and direct mapping of full to forceFullRun are correct.

src/cache/cache-store.ts (1)

72-89: LGTM! Save method has appropriate error handling.

The save method correctly:

  • Checks dirty flag to avoid unnecessary writes
  • Creates cache directory with recursive option
  • Formats JSON with indentation for readability
  • Handles errors gracefully without failing the main run
tests/cache-store.test.ts (2)

9-103: LGTM! Comprehensive hashing utility tests.

The test suite thoroughly validates:

  • Hash consistency for identical inputs
  • Line ending normalization (CRLF vs LF)
  • Whitespace trimming
  • Hash differences for different content
  • Prompt ordering invariance
  • Detection of changes in prompt body and metadata
  • Cache key truncation behavior

105-256: LGTM! Thorough CacheStore integration tests.

The test suite provides excellent coverage of CacheStore functionality:

  • Retrieval of missing keys
  • Store and retrieve operations
  • Disk persistence and reload
  • Automatic cache directory creation
  • Clear operations
  • Size reporting
  • Key existence checking

The proper setup/teardown ensures test isolation and prevents side effects.

src/cli/types.ts (1)

12-12: LGTM! Cache integration types are well-structured.

The type extensions cleanly integrate caching into the evaluation pipeline:

  • issueCollector is threaded through the processing pipeline for cache replay
  • EvaluateFileParams gains the necessary cache-related fields (cacheStore, promptsHash, useCache)
  • wasCacheHit flag enables tracking and reporting of cache effectiveness
  • All additions are optional, maintaining backward compatibility

The type design supports the full cache functionality while keeping the changes minimal and focused.

Also applies to: 30-31, 95-95, 104-104, 122-122, 142-144, 149-149

src/cli/orchestrator.ts (3)

59-85: LGTM!

The optional issueCollector pattern is clean and non-intrusive. The collected issue data correctly matches the CachedIssue interface.


827-847: Verify: --full updates cache despite bypassing reads.

When --full is used, useCache is false (line 872), so cache lookups are skipped. However, results are still written to cache (line 828 only checks cacheStore && promptsHash). If the intent is for --full to refresh the cache, this is correct. If --full should bypass caching entirely, add a guard:

-  if (cacheStore && promptsHash) {
+  if (useCache && cacheStore && promptsHash) {

857-935: LGTM!

The cache initialization logic correctly handles the three modes:

  • --no-cache: No CacheStore created, caching fully disabled.
  • Normal: Full cache read/write.
  • --full: Cache reads bypassed, but fresh results still written.

The verbose logging and cache hit tracking are well implemented.
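The three modes can be condensed into one small decision helper. This function is hypothetical (the PR threads these decisions through EvaluationOptions rather than a single call), but the semantics match the summary above:

```typescript
interface CacheFlags { full: boolean; noCache: boolean; }

// --no-cache disables both reads and writes; --full bypasses reads only.
function resolveCacheMode(flags: CacheFlags): { useCache: boolean; writeCache: boolean } {
  const cacheEnabled = !flags.noCache;
  return {
    useCache: cacheEnabled && !flags.full, // replay cached results?
    writeCache: cacheEnabled,              // persist fresh results?
  };
}
```

Keeping writeCache true under --full is what makes the flag a cache refresh rather than a cache bypass.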


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/cache/cache-store.ts (1)

76-93: Consider atomic write pattern for cache persistence.

The current implementation writes directly to cache.json (lines 85-86), which could leave the cache file in a corrupted state if the process is interrupted (e.g., Ctrl+C) or crashes during the write operation.

Using a write-then-rename pattern would make the save operation atomic and prevent cache corruption.

🔎 View atomic write pattern
 save(): void {
     if (!this.dirty) return;

     try {
         // Create cache directory if missing
         if (!existsSync(this.cacheDir)) {
             mkdirSync(this.cacheDir, { recursive: true });
         }

         const json = JSON.stringify(this.data, null, 2);
-        writeFileSync(this.cacheFile, json, 'utf-8');
+        // Atomic write: write to temp file, then rename
+        const tempFile = `${this.cacheFile}.tmp`;
+        writeFileSync(tempFile, json, 'utf-8');
+        
+        // Rename is atomic on most filesystems
+        // (add renameSync to the file's existing fs import)
+        renameSync(tempFile, this.cacheFile);
+        
         this.dirty = false;
     } catch (e) {
         // Don't fail the run if cache can't be written
         const msg = e instanceof Error ? e.message : String(e);
         console.warn(`[vectorlint] Warning: Could not save cache: ${msg}`);
     }
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f523402 and eef5729.

📒 Files selected for processing (1)
  • src/cache/cache-store.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cache/cache-store.ts (1)
src/cache/types.ts (2)
  • CacheData (49-52)
  • CachedResult (37-47)
🔇 Additional comments (3)
src/cache/cache-store.ts (3)

1-11: LGTM! Clean setup with versioning support.

The imports and constants are well-structured. The cache versioning mechanism (line 9) is a good practice for future-proofing schema changes.


23-27: Constructor properly initializes paths and state.

The use of path.resolve and path.join ensures safe path handling, and loading the cache during construction is appropriate for this use case.


54-74: CRUD methods are well-implemented.

The cache interface methods are clean and correctly manage the dirty flag. The use of the in operator for has() (line 64) properly checks for key existence including keys with undefined values.

- Add CachedScoreComponent interface to store minimal score data for replay
- Extend CachedScore interface with optional components field for JSON output
- Add scoreComponents field to ErrorTrackingResult for tracking score data
- Update routePromptResult to initialize and populate scoreComponents
- Modify evaluateFile to handle cached score replay for both Line and JSON formats
- Refactor allScores map to use CachedScore objects instead of raw summaries
- Simplify cache storage by converting map to array using Array.from()
- Enable JSON formatter to output evaluation scores from cached data
- Improve cache replay logic to support multiple output formats

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eef5729 and 219d150.

📒 Files selected for processing (3)
  • src/cache/types.ts (1 hunks)
  • src/cli/orchestrator.ts (20 hunks)
  • src/cli/types.ts (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/cli/types.ts
  • src/cache/types.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/cli/orchestrator.ts (7)
src/cli/types.ts (2)
  • ReportIssueParams (59-71)
  • EvaluateFileResult (148-151)
src/cache/types.ts (2)
  • CachedIssue (16-25)
  • CachedScore (49-53)
src/cache/content-hasher.ts (3)
  • hashContent (8-11)
  • createCacheKeyString (37-44)
  • hashPrompts (17-31)
src/output/reporter.ts (2)
  • printFileHeader (23-29)
  • EvaluationSummary (6-10)
src/output/json-formatter.ts (1)
  • JsonFormatter (54-94)
src/output/rdjson-formatter.ts (1)
  • RdJsonFormatter (54-134)
src/cache/cache-store.ts (1)
  • CacheStore (17-94)
🪛 GitHub Actions: Lint
src/cli/orchestrator.ts

[error] 822-822: ESLint: Unsafe assignment of an any value. @typescript-eslint/no-unsafe-assignment. Command failed: npm run lint.

🪛 GitHub Check: ESLint
src/cli/orchestrator.ts

[failure] 822-822:
Unexpected any. Specify a different type


[failure] 822-822:
Unsafe assignment of an any value

🔇 Additional comments (6)
src/cli/orchestrator.ts (6)

16-16: LGTM - Cache module integration.

The new cache imports are properly structured and all items are actively used throughout the orchestrator for the caching mechanism.


59-85: Well-structured dual-purpose reporting.

The modification to accept an optional issueCollector cleanly separates caching concerns from reporting, allowing the same function to both report issues and accumulate them for cache storage.


182-408: Correct parameter threading for cache collection.

The issueCollector parameter is properly threaded through the helper functions and consistently passed to all reportIssue calls, maintaining clean data flow for cache accumulation.


482-595: Proper integration of cache collection in result routing.

The function correctly propagates issueCollector through all processing paths and extends return values to include scoreComponents, enabling complete cache storage of evaluation results.


650-711: Cache replay implementation looks complete.

The cache check and replay logic properly handles both Line and JSON output formats. The JSON format score replay (lines 690-699) addresses the previous review comment by replaying cached components via addEvaluationScore, ensuring JSON output includes evaluation scores on cache hits.


876-950: Solid cache management implementation.

The cache initialization, file processing loop, and persistence logic are well-structured:

  • Conditional initialization based on cacheEnabled flag
  • Efficient single computation of promptsHash for all files
  • useCache properly respects both cacheEnabled and forceFullRun flags
  • Cache persistence after processing all files
  • Good observability with verbose logging


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
src/schemas/cache-schema.ts (1)

3-6: Define a proper schema for cache entries instead of using z.any().

Using z.any() for the entries values defeats type safety and prevents validation of the cached data structure. Based on the cache types, entries should validate the CachedResult structure (errors, warnings, issues, scores, etc.).

🔎 Proposed schema with proper validation
+const CACHED_ISSUE_SCHEMA = z.object({
+  line: z.number(),
+  column: z.number(),
+  severity: z.string(),
+  summary: z.string(),
+  ruleName: z.string(),
+  suggestion: z.string().optional(),
+  scoreText: z.string().optional(),
+  match: z.string().optional()
+});
+
+const CACHED_SCORE_COMPONENT_SCHEMA = z.object({
+  criterion: z.string(),
+  rawScore: z.number(),
+  maxScore: z.number(),
+  weightedScore: z.number(),
+  weightedMaxScore: z.number(),
+  normalizedScore: z.number(),
+  normalizedMaxScore: z.number()
+});
+
+const CACHED_EVALUATION_SUMMARY_SCHEMA = z.object({
+  id: z.string(),
+  scoreText: z.string(),
+  score: z.number().optional()
+});
+
+const CACHED_SCORE_SCHEMA = z.object({
+  ruleName: z.string(),
+  items: z.array(CACHED_EVALUATION_SUMMARY_SCHEMA),
+  components: z.array(CACHED_SCORE_COMPONENT_SCHEMA).optional()
+});
+
+const CACHED_RESULT_SCHEMA = z.object({
+  errors: z.number(),
+  warnings: z.number(),
+  hadOperationalErrors: z.boolean(),
+  hadSeverityErrors: z.boolean(),
+  requestFailures: z.number(),
+  issues: z.array(CACHED_ISSUE_SCHEMA),
+  scores: z.array(CACHED_SCORE_SCHEMA),
+  timestamp: z.number()
+});
+
 export const CACHE_SCHEMA = z.object({
     version: z.number(),
-    entries: z.record(z.string(), z.any())
+    entries: z.record(z.string(), CACHED_RESULT_SCHEMA)
 });

This ensures cached data integrity and catches corruption or schema mismatches during load.

src/cli/orchestrator.ts (1)

650-711: Optimize: Avoid duplicate cache key calculation.

The cache key (contentHash and cacheKey) is computed twice: once for the cache check (lines 652-653) and again for cache storage (lines 847-848). Consider computing once and reusing.

🔎 Suggested optimization
  const { file, options, jsonFormatter, cacheStore, promptsHash, useCache } = params;
  const { prompts, provider, searchProvider, concurrency, scanPaths, outputFormat = OutputFormat.Line } = options;

  const content = readFileSync(file, 'utf-8');
  const relFile = path.relative(process.cwd(), file) || file;

+ // Compute cache key once if caching is enabled
+ let contentHash: string | undefined;
+ let cacheKey: string | undefined;
+ if (cacheStore && promptsHash) {
+   contentHash = hashContent(content);
+   cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
+ }
+
  // Check cache before running evaluations
- if (useCache && cacheStore && promptsHash) {
-   const contentHash = hashContent(content);
-   const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
+ if (useCache && cacheStore && cacheKey) {
    const cached = cacheStore.get(cacheKey);

    if (cached) {
      // ... cache replay logic ...
    }
  }

  // ... evaluation logic ...

  // Store result in cache
- if (cacheStore && promptsHash) {
-   const contentHash = hashContent(content);
-   const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
-
+ if (cacheStore && cacheKey) {
    const cachedScores: CachedScore[] = Array.from(allScores.values());

    cacheStore.set(cacheKey, {
      // ... cached result ...
    });
  }

Also applies to: 845-862

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 219d150 and 994a77a.

📒 Files selected for processing (4)
  • src/cache/cache-store.ts (1 hunks)
  • src/cache/types.ts (1 hunks)
  • src/cli/orchestrator.ts (20 hunks)
  • src/schemas/cache-schema.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/cache/cache-store.ts
  • src/cache/types.ts
🔇 Additional comments (6)
src/cli/orchestrator.ts (6)

16-16: Cache imports look good.

The imports are appropriate for the caching functionality being added.


59-85: Issue collection implementation is clean.

The optional issueCollector parameter maintains backward compatibility while enabling cache functionality. The collected fields align with the CachedIssue type.


183-227: Consistent threading of issue collector.

The issueCollector parameter is properly threaded through the evaluation pipeline (locateAndReportViolations, extractAndReportCriterion, routePromptResult, and reportIssue calls), maintaining its optional nature throughout.

Also applies to: 239-387, 483-594


650-711: Cache replay logic is well-implemented.

The cache hit path properly:

  • Computes the cache key from content and prompts hashes
  • Replays cached issues with all original fields
  • Handles score replay for both Line and JSON output formats
  • Returns the cached result with wasCacheHit: true
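
A stripped-down sketch of that replay path might look like this. The type and field names follow the review's CachedIssue/CachedResult vocabulary, but the exact fields and output format are assumptions rather than the PR's real code.

```typescript
// Hypothetical shape of the cache-hit path described above.
interface CachedIssue {
  line: number;
  column: number;
  severity: 'error' | 'warning';
  summary: string;
  ruleName: string;
}

interface CachedResult {
  issues: CachedIssue[];
  errors: number;
  warnings: number;
}

function replayCachedResult(
  cached: CachedResult,
  log: (line: string) => void,
): CachedResult & { wasCacheHit: true } {
  // Re-print every stored issue exactly as the live run would have.
  for (const issue of cached.issues) {
    log(`${issue.line}:${issue.column} ${issue.severity} ${issue.summary} (${issue.ruleName})`);
  }
  return { ...cached, wasCacheHit: true };
}
```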

718-825: Cache data collection is correctly implemented.

The code properly:

  • Initializes the issueCollector array for capturing issues
  • Types allScores as Map<string, CachedScore> to store score data
  • Threads issueCollector through the evaluation pipeline
  • Stores score components alongside score entries for complete cache fidelity

Note: the past review comment about an unsafe type assertion at line 822 appears to have been resolved; the current code uses proper TypeScript optional spreading without as any.
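
The optional-collector pattern this comment describes could be sketched as below: reportIssue always prints, and additionally records the issue when a collector is supplied. The names and fields here are assumptions inferred from the review; the real reportIssue signature in the PR is richer.

```typescript
type IssueSeverity = 'error' | 'warning';

interface CollectedIssue {
  line: number;
  column: number;
  severity: IssueSeverity;
  summary: string;
}

function reportIssue(
  issue: CollectedIssue,
  print: (line: string) => void,
  issueCollector?: CollectedIssue[],
): void {
  print(`${issue.line}:${issue.column} ${issue.severity}: ${issue.summary}`);
  // Optional parameter keeps existing call sites source-compatible.
  issueCollector?.push(issue);
}
```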


876-950: Cache initialization and persistence are well-structured.

The implementation correctly:

  • Determines useCache based on cacheEnabled and forceFullRun flags
  • Initializes the CacheStore and computes promptsHash once for all files
  • Threads cache parameters through to evaluateFile
  • Tracks cache hit statistics
  • Persists the cache after all evaluations complete
  • Provides verbose logging when enabled
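
The initialization-and-persistence flow above implies a small disk-backed store. A minimal sketch follows; the version constant, error handling, and file layout are assumptions, and the PR's CacheStore additionally validates loaded data against the Zod cache schema.

```typescript
import { readFileSync, writeFileSync, existsSync, mkdirSync } from 'node:fs';
import { dirname } from 'node:path';

// Assumed cache format version; bumping it invalidates old caches.
const CACHE_VERSION = 1;

class CacheStore {
  private entries = new Map<string, unknown>();

  constructor(private readonly path: string) {}

  load(): void {
    if (!existsSync(this.path)) return;
    try {
      const parsed = JSON.parse(readFileSync(this.path, 'utf8')) as {
        version?: number;
        entries?: Record<string, unknown>;
      };
      // An incompatible or malformed cache is treated as empty.
      if (parsed.version !== CACHE_VERSION || !parsed.entries) return;
      this.entries = new Map(Object.entries(parsed.entries));
    } catch {
      // Corrupt JSON: start fresh rather than failing the run.
    }
  }

  get(key: string): unknown {
    return this.entries.get(key);
  }

  set(key: string, value: unknown): void {
    this.entries.set(key, value);
  }

  save(): void {
    mkdirSync(dirname(this.path), { recursive: true });
    const payload = { version: CACHE_VERSION, entries: Object.fromEntries(this.entries) };
    writeFileSync(this.path, JSON.stringify(payload, null, 2), 'utf8');
  }
}
```

Treating a corrupt or version-mismatched cache as empty, rather than erroring, keeps the CLI usable even if the cache file is damaged.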

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
src/output/reporter.ts (1)

9-9: Redundant explicit undefined in optional property type.

The optional property syntax score?: number already permits undefined as a value, so the explicit | undefined is redundant under default compiler settings. If the project enables exactOptionalPropertyTypes, however, the two forms do differ (only the explicit | undefined allows assigning undefined directly), so confirm that flag is off before simplifying.

🔎 Simplify the type annotation
-  score?: number | undefined;
+  score?: number;
src/schemas/cache-schema.ts (4)

6-15: Consider adding validation constraints to line and column numbers.

The line and column fields accept any number, including negatives or zero. Adding minimum constraints would prevent semantically invalid values.

🔎 Suggested refinements
 const CACHED_ISSUE_SCHEMA = z.object({
-    line: z.number(),
-    column: z.number(),
+    line: z.number().int().min(1),
+    column: z.number().int().min(1),
     severity: z.nativeEnum(Severity),
     summary: z.string(),
     ruleName: z.string(),
     suggestion: z.string().optional(),
     scoreText: z.string().optional(),
     match: z.string().optional(),
 });

17-40: Score schemas are well-structured.

The evaluation summary, score component, and grouped score schemas provide good granularity for cached results. The hierarchical composition is clean.

If you want additional runtime safety, consider adding range constraints to numeric score fields (e.g., z.number().min(0) for scores that shouldn't be negative), but this is not essential if upstream logic already enforces valid ranges.


42-54: CACHED_RESULT_SCHEMA structure aligns well with PR objectives.

The schema correctly captures evaluation results including errors, warnings, operational state, and optional detailed outputs. Using z.unknown() for jsonOutput is the right approach for arbitrary JSON structures.

Consider adding z.number().int().nonnegative() constraints to errors, warnings, and requestFailures to prevent invalid negative counts. For timestamp, you could add .positive() to ensure it's a valid Unix timestamp, but these are defensive refinements rather than critical fixes.


56-59: CACHE_SCHEMA provides a clean, versioned cache structure.

The top-level schema with version and entries map is well-designed for managing cached evaluation results. Exporting only the top-level schema maintains good encapsulation.

Consider adding .int().positive() to the version field to enforce valid version numbers:

🔎 Optional refinement
 export const CACHE_SCHEMA = z.object({
-    version: z.number(),
+    version: z.number().int().positive(),
     entries: z.record(z.string(), CACHED_RESULT_SCHEMA)
 });
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 994a77a and 8608786.

📒 Files selected for processing (5)
  • src/cache/cache-store.ts (1 hunks)
  • src/cache/content-hasher.ts (1 hunks)
  • src/cache/types.ts (1 hunks)
  • src/output/reporter.ts (1 hunks)
  • src/schemas/cache-schema.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/cache/cache-store.ts
  • src/cache/types.ts
  • src/cache/content-hasher.ts

@@ -0,0 +1,59 @@
import { z } from 'zod';

⚠️ Potential issue | 🟠 Major

Zod 3.25.76 has critical TypeScript compatibility issues; consider downgrading or upgrading to Zod 4.

Zod 3.25.76 introduced breaking changes that require TypeScript 5+ and drop support for TypeScript 4.9. No direct security vulnerabilities exist in this version, but install size grows significantly because Zod 4 is bundled inside the Zod 3 package. Either downgrade to 3.22.4 (the last version compatible with TypeScript 4.9) if your project targets that compiler, or upgrade to Zod 4 for the latest stability and performance improvements.

🤖 Prompt for AI Agents
In src/schemas/cache-schema.ts around line 1, the project currently imports Zod;
the review warns that Zod 3.25.76 is incompatible with TypeScript <5 and bloated
due to bundled Zod 4 — either pin Zod to 3.22.4 if you must remain on TypeScript
4.9 or upgrade to Zod 4.x: update package.json (change version), run npm/yarn
install to refresh lockfile, then if upgrading to Zod 4, update any Zod API
changes in code (adjust imports/validation calls/types as needed) and ensure
CI/tsconfig target is set to TS5+ before running tests and typechecks.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between bfd1a53 and 7325aad.

📒 Files selected for processing (1)
  • src/cli/commands.ts
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no any; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks use unknown type

Files:

  • src/cli/commands.ts
🧠 Learnings (4)
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/boundaries/env-parser.ts : All environment variables must be validated via Zod schemas in `src/boundaries/env-parser.ts`

Applied to files:

  • src/cli/commands.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Recursively load all `.md` files from within each rule pack; multiple packs can be used simultaneously

Applied to files:

  • src/cli/commands.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Organize rules into subdirectories (packs) within RulesPath; use company names (e.g., Acme, TechCorp, Stripe) to indicate which style guide the rules implement

Applied to files:

  • src/cli/commands.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to .vectorlint.ini : Use `[glob/pattern]` sections in `.vectorlint.ini` to specify which packs run on which files

Applied to files:

  • src/cli/commands.ts
🧬 Code graph analysis (1)
src/cli/commands.ts (5)
src/config/constants.ts (1)
  • DEFAULT_CONFIG_FILENAME (6-6)
src/boundaries/cli-parser.ts (1)
  • parseCliOptions (4-15)
src/errors/index.ts (1)
  • handleUnknownError (46-51)
src/providers/search-provider.ts (1)
  • SearchProvider (5-7)
src/providers/perplexity-provider.ts (1)
  • PerplexitySearchProvider (11-54)
🔇 Additional comments (2)
src/cli/commands.ts (2)

180-188: LGTM: OutputFormat mapping is well-implemented.

The mapping from CLI string options to the OutputFormat enum is clean and includes a sensible default fallback to OutputFormat.Line. This provides type safety and integrates well with the conditional summary printing logic.


203-204: LGTM: Cache integration is correctly implemented.

The new cache-related options (cacheEnabled and forceFullRun) are properly derived from CLI flags and passed to the orchestrator. The conditional summary printing for OutputFormat.Line ensures structured output formats remain clean.

Also applies to: 208-208
