feat: Implement content hashing and result caching with full output replay#42

Open
ayo6706 wants to merge 11 commits into main from ft/content-cache

Conversation


@ayo6706 ayo6706 commented Dec 18, 2025

This PR introduces a caching mechanism to significantly reduce LLM token costs and improve execution speed for unchanged content.

Key Changes

  • Cache Module: Added src/cache/ to handle SHA256 hashing of file content and prompt configurations (including metadata).
  • Orchestrator Integration:
    • evaluateFile now checks .vectorlint/cache.json before triggering LLM calls.
    • On cache hits, the system performs a full-fidelity replay, printing all original issues (errors/warnings) and quality scores exactly as they appeared in the initial run.
    • On cache misses, results (including granular issue details and quality scores) are collected and persisted to disk.
  • CLI Options:
    • Added --full: Forces a full re-evaluation, ignoring the cache.
    • Added --no-cache: Disables reading from or writing to the cache.
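The lookup flow described above can be sketched in a few lines. This is a hedged illustration, not the PR's actual code: a plain Map stands in for CacheStore, and the hash/key helpers mirror the names in src/cache/ but their real signatures may differ.

```typescript
import { createHash } from 'node:crypto';

// Stand-in for src/cache/content-hasher.ts (real truncation/format may differ).
function hashContent(content: string): string {
  const normalized = content.replace(/\r\n/g, '\n').trim();
  return createHash('sha256').update(normalized).digest('hex').slice(0, 16);
}

function createCacheKeyString(filePath: string, contentHash: string, promptsHash: string): string {
  return `${filePath}:${contentHash}:${promptsHash}`;
}

interface CachedResult { errors: number; warnings: number; }

// Replay on hit; evaluate and store on miss (the core of the evaluateFile change).
function evaluateWithCache(
  file: string,
  content: string,
  promptsHash: string,
  cache: Map<string, CachedResult>,
  evaluate: () => CachedResult,
): { result: CachedResult; wasCacheHit: boolean } {
  const key = createCacheKeyString(file, hashContent(content), promptsHash);
  const cached = cache.get(key);
  if (cached) return { result: cached, wasCacheHit: true };
  const result = evaluate();
  cache.set(key, result);
  return { result, wasCacheHit: false };
}
```

Because the hash is computed over normalized text, a line-ending-only change replays the cached result instead of triggering a new LLM call.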

Impact

  • Drastically reduces API costs for repeated runs on large documentation sets.
  • Maintains identical CLI output (including colors and formatting) regardless of whether a result is live or cached.

Summary by CodeRabbit

  • New Features

    • Persistent, versioned disk-backed result cache with safe load/save, replayed cached results, and cache-hit indicators.
    • Deterministic content and prompt hashing that produces compact cache keys.
    • CLI flags to force full evaluations or disable caching.
  • Types & Validation

    • Structured cache types and a comprehensive validation schema for cached data.
    • Minor widening of a public score field type for reporting.
  • Tests

    • Comprehensive tests for hashing, cache-key generation, persistence, and cache operations.


- Create cache module with index, types, and store implementations
- Add CacheKey interface for unique cache entry identification using file path and content/prompt hashes
- Add CachedIssue interface for minimal issue data storage and replay
- Add CachedResult interface to store evaluation results with errors, warnings, and operational status
- Add CacheData interface for versioned cache entry management
- Add CacheOptions interface for cache configuration (enabled, forceFullRun, cacheDir)
- Update EvaluationOptions to include cacheEnabled and forceFullRun flags
- Update ProcessViolationsParams, ProcessCriterionParams, and ProcessPromptResultParams with optional issueCollector
- Update EvaluateFileParams to accept cacheStore, promptsHash, and useCache options
- Update EvaluateFileResult to include wasCacheHit flag for cache hit detection
- Add CLI schema options for --full and --noCache flags
- Enable caching support for improved performance on repeated evaluations
- Add CacheStore class for persistent cache management with disk I/O
- Stores cache in .vectorlint/cache.json by default
- Implements get, set, has, clear, and size operations
- Includes version checking for future cache migrations
- Gracefully handles corrupted cache files by starting fresh
- Only writes to disk when cache is modified (dirty flag)
- Add content hashing utilities for cache key generation
- Implement hashContent() to compute SHA256 of normalized file content
- Implement hashPrompts() to hash prompt configurations for rule change detection
- Implement createCacheKeyString() to combine file path and hashes into cache keys
- Normalize content by trimming whitespace and standardizing line endings
- Add error handling to prevent cache failures from blocking execution
- Add --full and --no-cache CLI options to control cache behavior
- Pass cacheEnabled and forceFullRun flags to orchestrator
- Implement cache store integration in evaluateFile function
- Add issueCollector parameter to reportIssue and related functions
- Filter runRules from scanPaths configuration before passing to orchestrator
- Create cache key from content and prompts hash for cache lookups
- Store and retrieve cached issues and scores from cache store
- Skip evaluation for cached files and merge cached results with new ones
- Add comprehensive cache store tests for validation and edge cases
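Taken together, the hashing bullets above amount to two small utilities. The sketch below approximates src/cache/content-hasher.ts from this description; the field names on Prompt and the 16-character truncation are assumptions, not the PR's exact code.

```typescript
import { createHash } from 'node:crypto';

const sha256 = (s: string) => createHash('sha256').update(s).digest('hex').slice(0, 16);

// Content hash: normalize line endings and trim before hashing, so
// CRLF-vs-LF or surrounding-whitespace changes do not invalidate the cache.
function hashContent(content: string): string {
  return sha256(content.replace(/\r\n/g, '\n').trim());
}

// Prompt hash: sort by id first so configuration order does not matter,
// then hash a canonical serialization of the relevant fields.
interface Prompt { id: string; body: string; meta?: Record<string, unknown>; pack?: string; }

function hashPrompts(prompts: Prompt[]): string {
  const sorted = [...prompts].sort((a, b) => a.id.localeCompare(b.id));
  const parts = sorted.map(p => ({
    id: p.id,
    meta: JSON.stringify(p.meta ?? {}),
    body: p.body.trim(),
    pack: p.pack ?? '',
  }));
  return sha256(JSON.stringify(parts));
}
```

Any edit to a prompt body or its metadata changes the prompts hash, which changes every cache key and forces re-evaluation, exactly the rule-change detection the commits describe.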
coderabbitai bot commented Dec 18, 2025

📝 Walkthrough

Adds a disk-backed, versioned cache with content/prompt hashing, cache-key generation, new CLI flags (--full, --no-cache), and orchestrator integration to check, replay, and persist cached evaluation results (versioned .vectorlint/cache.json).

Changes

  • Cache types (src/cache/types.ts): New exported interfaces: CacheKey, CachedIssue, CachedEvaluationSummary, CachedScore (optional components), CachedResult, CacheData, CacheOptions.
  • Cache storage implementation (src/cache/cache-store.ts): New exported CacheStore class (constructor (cwd?: string, cacheDir?: string)) with get, set, has, clear, size, save; versioned cache file, dirty-tracking, dir creation, resilient load/save and pretty JSON persistence.
  • Content & prompt hashing (src/cache/content-hasher.ts): New exports: hashContent(content), hashPrompts(prompts), createCacheKeyString(filePath, contentHash, promptsHash); normalizes input, SHA256 hex, truncates to 16 chars.
  • Cache barrel export (src/cache/index.ts): New barrel re-exporting ./types, ./content-hasher, and ./cache-store.
  • Orchestrator integration (src/cli/orchestrator.ts): Threads cache through evaluation: computes content/prompts hashes, builds cache keys, checks/replays cached issues & scores (adds wasCacheHit), sets cache on miss, persists cache after run; many helper signatures extended to accept issueCollector and cache params.
  • CLI wiring & commands (src/cli/commands.ts, src/schemas/cli-schemas.ts): Adds CLI options full and noCache (schema + flags), maps to forceFullRun and cacheEnabled, adjusts output format handling.
  • CLI types extended (src/cli/types.ts): Adds cacheEnabled?, forceFullRun? to EvaluationOptions; adds issueCollector?: CachedIssue[] to processing params; extends EvaluateFileParams (cacheStore, promptsHash, useCache) and EvaluateFileResult (wasCacheHit); adds scoreComponents? to ErrorTrackingResult.
  • Cache schema validation (src/schemas/cache-schema.ts): New exported CACHE_SCHEMA (zod) with nested schemas for issues, evaluation summaries, score components, cached scores, and cached results; top-level { version, entries }.
  • Reporter type tweak (src/output/reporter.ts): Broadened EvaluationSummary.score type to explicitly allow undefined (`score?: number | undefined`).
  • Tests (tests/cache-store.test.ts): New tests for hashContent, hashPrompts, createCacheKeyString, and CacheStore behaviors (persistence, dir creation, get/set/has/clear/size, and hashing invariants).
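The dirty-flag and resilient-load behavior attributed to CacheStore above can be illustrated with a stripped-down version. This is a sketch under assumptions, not the PR's class: MiniCacheStore is a hypothetical name, and the real implementation adds has/clear/size, version-mismatch warnings, and richer error handling.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync, rmSync } from 'node:fs';
import * as path from 'node:path';

const CACHE_VERSION = 1;

interface CacheData { version: number; entries: Record<string, unknown>; }

class MiniCacheStore {
  private data: CacheData;
  private dirty = false;
  private file: string;

  constructor(private dir: string) {
    this.file = path.join(dir, 'cache.json');
    this.data = this.load();
  }

  // Resilient load: a missing, corrupted, or wrong-version file starts fresh.
  private load(): CacheData {
    try {
      if (existsSync(this.file)) {
        const parsed = JSON.parse(readFileSync(this.file, 'utf-8')) as CacheData;
        if (parsed.version === CACHE_VERSION) return parsed;
      }
    } catch { /* corrupted cache: fall through and start fresh */ }
    return { version: CACHE_VERSION, entries: {} };
  }

  get(key: string): unknown { return this.data.entries[key]; }

  set(key: string, value: unknown): void {
    this.data.entries[key] = value;
    this.dirty = true; // mark for persistence
  }

  // Dirty flag: save() touches disk only if something changed.
  save(): void {
    if (!this.dirty) return;
    if (!existsSync(this.dir)) mkdirSync(this.dir, { recursive: true });
    writeFileSync(this.file, JSON.stringify(this.data, null, 2), 'utf-8');
    this.dirty = false;
  }
}
```

The dirty flag keeps read-only runs (all cache hits) from rewriting an identical cache.json on every invocation.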

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as CLI
    participant Orch as Orchestrator
    participant Hasher as Hasher
    participant Cache as CacheStore
    participant Eval as Evaluator

    CLI->>Orch: evaluateFiles(targets, options)
    Orch->>Cache: instantiate CacheStore (if cacheEnabled)
    Orch->>Hasher: hashPrompts(prompts)
    loop per file
        Orch->>Eval: read file content
        Eval-->>Orch: fileContent
        Orch->>Hasher: hashContent(fileContent)
        Orch->>Hasher: createCacheKeyString(filePath, contentHash, promptsHash)
        Orch->>Cache: has(cacheKey)?
        alt cache hit (useCache && !forceFullRun)
            Cache-->>Orch: true
            Orch->>Cache: get(cacheKey)
            Cache-->>Orch: CachedResult
            Orch->>Orch: replay issues & scores (wasCacheHit = true)
        else cache miss
            Orch->>Eval: run evaluation (prompts, file)
            Eval-->>Orch: issues, scores
            Orch->>Orch: collect issue data (issueCollector)
            Orch->>Cache: set(cacheKey, CachedResult)
        end
    end
    Orch->>Cache: save()
    Cache-->>Orch: persisted

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hop through files and hash each line,

I tuck results safe in a cache of mine,
Prompts and content snug, I store them tight,
Replay a whisper — or run it fresh tonight,
Nibbling bytes and saving time. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: The title clearly and accurately summarizes the main objective: implementing a content hashing and caching mechanism with full output replay functionality, which aligns with the substantial changes across the cache module, orchestrator, and CLI integration.
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7325aad and b77a753.

📒 Files selected for processing (1)
  • src/cli/commands.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cli/commands.ts



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (7)
src/cache/content-hasher.ts (3)

9-9: Consider handling standalone CR characters in line ending normalization.

The current regex only converts CRLF (\r\n) to LF, but standalone CR characters (\r) would remain. For comprehensive normalization, consider replacing all \r occurrences:

-    const normalized = content.replace(/\r\n/g, '\n').trim();
+    const normalized = content.replace(/\r\n/g, '\n').replace(/\r/g, '\n').trim();

22-27: JSON.stringify on meta could produce non-deterministic output.

JSON.stringify(p.meta) may produce different string representations if object key ordering varies across JavaScript engines or versions, potentially causing cache misses for semantically identical metadata.

Consider using a deterministic JSON serialization approach or sorting keys explicitly:

🔎 View suggested fix
     const parts = sorted.map(p => ({
         id: p.id,
-        meta: JSON.stringify(p.meta),
+        meta: JSON.stringify(p.meta, Object.keys(p.meta).sort()),
         body: p.body.trim(),
         pack: p.pack || '',
     }));

43-43: Hash truncation reduces collision resistance.

Truncating SHA256 hashes to 16 characters (64 bits) significantly reduces collision resistance from 2^128 to 2^32 for birthday attacks. While this may be acceptable for a local cache with moderate file counts, consider the risk as the repository scales.

For repositories with thousands of files, consider increasing the truncation length to 24-32 characters for better collision resistance.
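To put that tradeoff in numbers, the standard birthday approximation P ≈ n² / 2^(b+1) estimates the probability of any collision among n distinct keys under a b-bit hash. A quick back-of-the-envelope helper (illustrative only, not part of the PR):

```typescript
// Birthday bound: P(collision) ~ n^2 / 2^(b+1) for n keys hashed to b bits.
function collisionProbability(nKeys: number, bits: number): number {
  return (nKeys * nKeys) / Math.pow(2, bits + 1);
}
```

Even 10,000 cached files stay far below one-in-a-billion at 64 bits (16 hex chars), so the truncation is reasonable for a local cache; lengthening to 24-32 chars, as suggested, simply widens that margin.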

src/cli/commands.ts (1)

140-143: Extract duplicated scanPaths transformation logic.

The same scanPaths transformation pattern appears twice in this file. Consider extracting it into a helper function to reduce duplication and improve maintainability.

🔎 View suggested refactor

Add a helper function near the top of the file:

function normalizeScanPaths(scanPaths: FilePatternConfig[]) {
  return scanPaths.map(({ runRules, ...rest }) => ({
    ...rest,
    ...(runRules !== undefined ? { runRules } : {})
  }));
}

Then replace both occurrences:

-        scanPaths: config.scanPaths.map(({ runRules, ...rest }) => ({
-          ...rest,
-          ...(runRules !== undefined ? { runRules } : {})
-        })),
+        scanPaths: normalizeScanPaths(config.scanPaths),

Also applies to: 175-178

src/cache/cache-store.ts (2)

28-48: Add runtime validation for parsed cache data.

The JSON.parse result is cast directly to CacheData without runtime validation. If the cache file is corrupted or manually edited, this could cause runtime errors downstream.

Consider adding a validation function or using a schema validator (e.g., Zod) to ensure the loaded data conforms to the expected structure before casting:

🔎 View suggested improvement
private load(): CacheData {
    try {
        if (existsSync(this.cacheFile)) {
            const raw = readFileSync(this.cacheFile, 'utf-8');
            const parsed = JSON.parse(raw);
            
            // Basic validation
            if (!parsed || typeof parsed !== 'object' || 
                typeof parsed.version !== 'number' || 
                !parsed.entries || typeof parsed.entries !== 'object') {
                console.warn(`[vectorlint] Invalid cache structure, starting fresh`);
                return { version: CACHE_VERSION, entries: {} };
            }
            
            const cacheData = parsed as CacheData;

            // Version check for future migrations
            if (cacheData.version !== CACHE_VERSION) {
                console.warn(`[vectorlint] Cache version mismatch, clearing cache`);
                return { version: CACHE_VERSION, entries: {} };
            }

            return cacheData;
        }
    } catch (e) {
        // If cache is corrupted, start fresh
        const msg = e instanceof Error ? e.message : String(e);
        console.warn(`[vectorlint] Could not read cache (${msg}), starting fresh`);
    }

    return { version: CACHE_VERSION, entries: {} };
}

59-61: Consider using hasOwnProperty for cleaner key checking.

While the in operator works correctly for Records, using hasOwnProperty or Object.prototype.hasOwnProperty.call() is more explicit about checking own properties:

 has(key: string): boolean {
-    return key in this.data.entries;
+    return Object.prototype.hasOwnProperty.call(this.data.entries, key);
 }
src/cli/orchestrator.ts (1)

648-700: Duplicate hashContent computation.

hashContent(content) is computed at line 650 (cache lookup) and again at line 829 (cache storage). Compute it once and reuse:

🔎 Suggested fix
   const content = readFileSync(file, 'utf-8');
   const relFile = path.relative(process.cwd(), file) || file;
+  const contentHash = promptsHash ? hashContent(content) : '';

   // Check cache before running evaluations
   if (useCache && cacheStore && promptsHash) {
-    const contentHash = hashContent(content);
     const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
     const cached = cacheStore.get(cacheKey);
     // ... cache hit handling
   }

   // ... evaluation logic ...

   // Store result in cache
   if (cacheStore && promptsHash) {
-    const contentHash = hashContent(content);
     const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
     // ... cache storage
   }

Also applies to: 826-847

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 659ea2e and f523402.

📒 Files selected for processing (9)
  • src/cache/cache-store.ts (1 hunks)
  • src/cache/content-hasher.ts (1 hunks)
  • src/cache/index.ts (1 hunks)
  • src/cache/types.ts (1 hunks)
  • src/cli/commands.ts (3 hunks)
  • src/cli/orchestrator.ts (18 hunks)
  • src/cli/types.ts (6 hunks)
  • src/schemas/cli-schemas.ts (1 hunks)
  • tests/cache-store.test.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/cache-store.test.ts (3)
src/cache/content-hasher.ts (3)
  • hashContent (8-11)
  • hashPrompts (17-31)
  • createCacheKeyString (37-44)
src/cache/cache-store.ts (1)
  • CacheStore (13-90)
src/cache/types.ts (1)
  • CachedResult (37-47)
src/cli/types.ts (2)
src/cache/types.ts (1)
  • CachedIssue (16-25)
src/cache/cache-store.ts (1)
  • CacheStore (13-90)
src/cache/cache-store.ts (1)
src/cache/types.ts (2)
  • CacheData (49-52)
  • CachedResult (37-47)
src/cli/orchestrator.ts (5)
src/cli/types.ts (2)
  • ReportIssueParams (58-70)
  • EvaluateFileResult (147-150)
src/cache/types.ts (2)
  • CachedIssue (16-25)
  • CachedScore (33-36)
src/cache/content-hasher.ts (3)
  • hashContent (8-11)
  • createCacheKeyString (37-44)
  • hashPrompts (17-31)
src/output/reporter.ts (3)
  • printFileHeader (23-29)
  • EvaluationSummary (6-10)
  • printEvaluationSummaries (137-174)
src/cache/cache-store.ts (1)
  • CacheStore (13-90)
🔇 Additional comments (12)
src/cache/types.ts (1)

1-58: LGTM! Well-structured cache type definitions.

The cache type system is comprehensive and well-designed:

  • Clear separation between cache keys, cached data, and cache configuration
  • CachedIssue captures all necessary fields for replay
  • CachedResult includes both granular details and summary metrics
  • Version field in CacheData enables future migrations
  • Optional fields allow flexibility in what gets cached
src/cache/index.ts (1)

1-3: LGTM! Clean barrel export.

src/schemas/cli-schemas.ts (1)

13-14: LGTM! CLI options properly defined.

The new full and noCache options are correctly defined with appropriate boolean defaults.

src/cli/commands.ts (2)

30-31: LGTM! Cache control flags properly added.

The new CLI options align with the caching feature requirements.


179-180: LGTM! Cache options correctly propagated.

The inversion of noCache to cacheEnabled and direct mapping of full to forceFullRun are correct.

src/cache/cache-store.ts (1)

72-89: LGTM! Save method has appropriate error handling.

The save method correctly:

  • Checks dirty flag to avoid unnecessary writes
  • Creates cache directory with recursive option
  • Formats JSON with indentation for readability
  • Handles errors gracefully without failing the main run
tests/cache-store.test.ts (2)

9-103: LGTM! Comprehensive hashing utility tests.

The test suite thoroughly validates:

  • Hash consistency for identical inputs
  • Line ending normalization (CRLF vs LF)
  • Whitespace trimming
  • Hash differences for different content
  • Prompt ordering invariance
  • Detection of changes in prompt body and metadata
  • Cache key truncation behavior

105-256: LGTM! Thorough CacheStore integration tests.

The test suite provides excellent coverage of CacheStore functionality:

  • Retrieval of missing keys
  • Store and retrieve operations
  • Disk persistence and reload
  • Automatic cache directory creation
  • Clear operations
  • Size reporting
  • Key existence checking

The proper setup/teardown ensures test isolation and prevents side effects.

src/cli/types.ts (1)

12-12: LGTM! Cache integration types are well-structured.

The type extensions cleanly integrate caching into the evaluation pipeline:

  • issueCollector is threaded through the processing pipeline for cache replay
  • EvaluateFileParams gains the necessary cache-related fields (cacheStore, promptsHash, useCache)
  • wasCacheHit flag enables tracking and reporting of cache effectiveness
  • All additions are optional, maintaining backward compatibility

The type design supports the full cache functionality while keeping the changes minimal and focused.

Also applies to: 30-31, 95-95, 104-104, 122-122, 142-144, 149-149

src/cli/orchestrator.ts (3)

59-85: LGTM!

The optional issueCollector pattern is clean and non-intrusive. The collected issue data correctly matches the CachedIssue interface.


827-847: Verify: --full updates cache despite bypassing reads.

When --full is used, useCache is false (line 872), so cache lookups are skipped. However, results are still written to cache (line 828 only checks cacheStore && promptsHash). If the intent is for --full to refresh the cache, this is correct. If --full should bypass caching entirely, add a guard:

-  if (cacheStore && promptsHash) {
+  if (useCache && cacheStore && promptsHash) {

857-935: LGTM!

The cache initialization logic correctly handles the three modes:

  • --no-cache: No CacheStore created, caching fully disabled.
  • Normal: Full cache read/write.
  • --full: Cache reads bypassed, but fresh results still written.

The verbose logging and cache hit tracking are well implemented.
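The three modes can be condensed into one small decision helper. This function is hypothetical (the PR threads these decisions through EvaluationOptions rather than a single call), but the semantics match the summary above:

```typescript
interface CacheFlags { full: boolean; noCache: boolean; }

// --no-cache disables both reads and writes; --full bypasses reads only.
function resolveCacheMode(flags: CacheFlags): { useCache: boolean; writeCache: boolean } {
  const cacheEnabled = !flags.noCache;
  return {
    useCache: cacheEnabled && !flags.full, // replay cached results?
    writeCache: cacheEnabled,              // persist fresh results?
  };
}
```

Keeping writeCache true under --full is what makes the flag a cache refresh rather than a cache bypass.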


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/cache/cache-store.ts (1)

76-93: Consider atomic write pattern for cache persistence.

The current implementation writes directly to cache.json (lines 85-86), which could leave the cache file in a corrupted state if the process is interrupted (e.g., Ctrl+C) or crashes during the write operation.

Using a write-then-rename pattern would make the save operation atomic and prevent cache corruption.

🔎 View atomic write pattern
 save(): void {
     if (!this.dirty) return;

     try {
         // Create cache directory if missing
         if (!existsSync(this.cacheDir)) {
             mkdirSync(this.cacheDir, { recursive: true });
         }

         const json = JSON.stringify(this.data, null, 2);
-        writeFileSync(this.cacheFile, json, 'utf-8');
+        // Atomic write: write to temp file, then rename
+        const tempFile = `${this.cacheFile}.tmp`;
+        writeFileSync(tempFile, json, 'utf-8');
+        
+        // Rename is atomic on most filesystems
+        // (add renameSync to the file's existing fs import)
+        renameSync(tempFile, this.cacheFile);
+        
         this.dirty = false;
     } catch (e) {
         // Don't fail the run if cache can't be written
         const msg = e instanceof Error ? e.message : String(e);
         console.warn(`[vectorlint] Warning: Could not save cache: ${msg}`);
     }
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f523402 and eef5729.

📒 Files selected for processing (1)
  • src/cache/cache-store.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cache/cache-store.ts (1)
src/cache/types.ts (2)
  • CacheData (49-52)
  • CachedResult (37-47)
🔇 Additional comments (3)
src/cache/cache-store.ts (3)

1-11: LGTM! Clean setup with versioning support.

The imports and constants are well-structured. The cache versioning mechanism (line 9) is a good practice for future-proofing schema changes.


23-27: Constructor properly initializes paths and state.

The use of path.resolve and path.join ensures safe path handling, and loading the cache during construction is appropriate for this use case.


54-74: CRUD methods are well-implemented.

The cache interface methods are clean and correctly manage the dirty flag. The use of the in operator for has() (line 64) properly checks for key existence including keys with undefined values.

- Add CachedScoreComponent interface to store minimal score data for replay
- Extend CachedScore interface with optional components field for JSON output
- Add scoreComponents field to ErrorTrackingResult for tracking score data
- Update routePromptResult to initialize and populate scoreComponents
- Modify evaluateFile to handle cached score replay for both Line and JSON formats
- Refactor allScores map to use CachedScore objects instead of raw summaries
- Simplify cache storage by converting map to array using Array.from()
- Enable JSON formatter to output evaluation scores from cached data
- Improve cache replay logic to support multiple output formats

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eef5729 and 219d150.

📒 Files selected for processing (3)
  • src/cache/types.ts (1 hunks)
  • src/cli/orchestrator.ts (20 hunks)
  • src/cli/types.ts (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/cli/types.ts
  • src/cache/types.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/cli/orchestrator.ts (7)
src/cli/types.ts (2)
  • ReportIssueParams (59-71)
  • EvaluateFileResult (148-151)
src/cache/types.ts (2)
  • CachedIssue (16-25)
  • CachedScore (49-53)
src/cache/content-hasher.ts (3)
  • hashContent (8-11)
  • createCacheKeyString (37-44)
  • hashPrompts (17-31)
src/output/reporter.ts (2)
  • printFileHeader (23-29)
  • EvaluationSummary (6-10)
src/output/json-formatter.ts (1)
  • JsonFormatter (54-94)
src/output/rdjson-formatter.ts (1)
  • RdJsonFormatter (54-134)
src/cache/cache-store.ts (1)
  • CacheStore (17-94)
🪛 GitHub Actions: Lint
src/cli/orchestrator.ts

[error] 822-822: ESLint: Unsafe assignment of an any value. @typescript-eslint/no-unsafe-assignment. Command failed: npm run lint.

🪛 GitHub Check: ESLint
src/cli/orchestrator.ts

[failure] 822-822:
Unexpected any. Specify a different type


[failure] 822-822:
Unsafe assignment of an any value

🔇 Additional comments (6)
src/cli/orchestrator.ts (6)

16-16: LGTM - Cache module integration.

The new cache imports are properly structured and all items are actively used throughout the orchestrator for the caching mechanism.


59-85: Well-structured dual-purpose reporting.

The modification to accept an optional issueCollector cleanly separates caching concerns from reporting, allowing the same function to both report issues and accumulate them for cache storage.


182-408: Correct parameter threading for cache collection.

The issueCollector parameter is properly threaded through the helper functions and consistently passed to all reportIssue calls, maintaining clean data flow for cache accumulation.


482-595: Proper integration of cache collection in result routing.

The function correctly propagates issueCollector through all processing paths and extends return values to include scoreComponents, enabling complete cache storage of evaluation results.


650-711: Cache replay implementation looks complete.

The cache check and replay logic properly handles both Line and JSON output formats. The JSON format score replay (lines 690-699) addresses the previous review comment by replaying cached components via addEvaluationScore, ensuring JSON output includes evaluation scores on cache hits.


876-950: Solid cache management implementation.

The cache initialization, file processing loop, and persistence logic are well-structured:

  • Conditional initialization based on cacheEnabled flag
  • Efficient single computation of promptsHash for all files
  • useCache properly respects both cacheEnabled and forceFullRun flags
  • Cache persistence after processing all files
  • Good observability with verbose logging


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
src/schemas/cache-schema.ts (1)

3-6: Define a proper schema for cache entries instead of using z.any().

Using z.any() for the entries values defeats type safety and prevents validation of the cached data structure. Based on the cache types, entries should validate the CachedResult structure (errors, warnings, issues, scores, etc.).

🔎 Proposed schema with proper validation
+const CACHED_ISSUE_SCHEMA = z.object({
+  line: z.number(),
+  column: z.number(),
+  severity: z.string(),
+  summary: z.string(),
+  ruleName: z.string(),
+  suggestion: z.string().optional(),
+  scoreText: z.string().optional(),
+  match: z.string().optional()
+});
+
+const CACHED_SCORE_COMPONENT_SCHEMA = z.object({
+  criterion: z.string(),
+  rawScore: z.number(),
+  maxScore: z.number(),
+  weightedScore: z.number(),
+  weightedMaxScore: z.number(),
+  normalizedScore: z.number(),
+  normalizedMaxScore: z.number()
+});
+
+const CACHED_EVALUATION_SUMMARY_SCHEMA = z.object({
+  id: z.string(),
+  scoreText: z.string(),
+  score: z.number().optional()
+});
+
+const CACHED_SCORE_SCHEMA = z.object({
+  ruleName: z.string(),
+  items: z.array(CACHED_EVALUATION_SUMMARY_SCHEMA),
+  components: z.array(CACHED_SCORE_COMPONENT_SCHEMA).optional()
+});
+
+const CACHED_RESULT_SCHEMA = z.object({
+  errors: z.number(),
+  warnings: z.number(),
+  hadOperationalErrors: z.boolean(),
+  hadSeverityErrors: z.boolean(),
+  requestFailures: z.number(),
+  issues: z.array(CACHED_ISSUE_SCHEMA),
+  scores: z.array(CACHED_SCORE_SCHEMA),
+  timestamp: z.number()
+});
+
 export const CACHE_SCHEMA = z.object({
     version: z.number(),
-    entries: z.record(z.string(), z.any())
+    entries: z.record(z.string(), CACHED_RESULT_SCHEMA)
 });

This ensures cached data integrity and catches corruption or schema mismatches during load.

src/cli/orchestrator.ts (1)

650-711: Optimize: Avoid duplicate cache key calculation.

The cache key (contentHash and cacheKey) is computed twice: once for the cache check (lines 652-653) and again for cache storage (lines 847-848). Consider computing once and reusing.

🔎 Suggested optimization
  const { file, options, jsonFormatter, cacheStore, promptsHash, useCache } = params;
  const { prompts, provider, searchProvider, concurrency, scanPaths, outputFormat = OutputFormat.Line } = options;

  const content = readFileSync(file, 'utf-8');
  const relFile = path.relative(process.cwd(), file) || file;

+ // Compute cache key once if caching is enabled
+ let contentHash: string | undefined;
+ let cacheKey: string | undefined;
+ if (cacheStore && promptsHash) {
+   contentHash = hashContent(content);
+   cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
+ }
+
  // Check cache before running evaluations
- if (useCache && cacheStore && promptsHash) {
-   const contentHash = hashContent(content);
-   const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
+ if (useCache && cacheStore && cacheKey) {
    const cached = cacheStore.get(cacheKey);

    if (cached) {
      // ... cache replay logic ...
    }
  }

  // ... evaluation logic ...

  // Store result in cache
- if (cacheStore && promptsHash) {
-   const contentHash = hashContent(content);
-   const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
-
+ if (cacheStore && cacheKey) {
    const cachedScores: CachedScore[] = Array.from(allScores.values());

    cacheStore.set(cacheKey, {
      // ... cached result ...
    });
  }

Also applies to: 845-862

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 219d150 and 994a77a.

📒 Files selected for processing (4)
  • src/cache/cache-store.ts (1 hunks)
  • src/cache/types.ts (1 hunks)
  • src/cli/orchestrator.ts (20 hunks)
  • src/schemas/cache-schema.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/cache/cache-store.ts
  • src/cache/types.ts
🔇 Additional comments (6)
src/cli/orchestrator.ts (6)

16-16: Cache imports look good.

The imports are appropriate for the caching functionality being added.


59-85: Issue collection implementation is clean.

The optional issueCollector parameter maintains backward compatibility while enabling cache functionality. The collected fields align with the CachedIssue type.


183-227: Consistent threading of issue collector.

The issueCollector parameter is properly threaded through the evaluation pipeline (locateAndReportViolations, extractAndReportCriterion, routePromptResult, and reportIssue calls), maintaining its optional nature throughout.

Also applies to: 239-387, 483-594


650-711: Cache replay logic is well-implemented.

The cache hit path properly:

  • Computes the cache key from content and prompts hashes
  • Replays cached issues with all original fields
  • Handles score replay for both Line and JSON output formats
  • Returns the cached result with wasCacheHit: true
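
A stripped-down sketch of that replay path might look like this. The type and field names follow the review's CachedIssue/CachedResult vocabulary, but the exact fields and output format are assumptions rather than the PR's real code.

```typescript
// Hypothetical shape of the cache-hit path described above.
interface CachedIssue {
  line: number;
  column: number;
  severity: 'error' | 'warning';
  summary: string;
  ruleName: string;
}

interface CachedResult {
  issues: CachedIssue[];
  errors: number;
  warnings: number;
}

function replayCachedResult(
  cached: CachedResult,
  log: (line: string) => void,
): CachedResult & { wasCacheHit: true } {
  // Re-print every stored issue exactly as the live run would have.
  for (const issue of cached.issues) {
    log(`${issue.line}:${issue.column} ${issue.severity} ${issue.summary} (${issue.ruleName})`);
  }
  return { ...cached, wasCacheHit: true };
}
```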

718-825: Cache data collection is correctly implemented.

The code properly:

  • Initializes the issueCollector array for capturing issues
  • Types allScores as Map<string, CachedScore> to store score data
  • Threads issueCollector through the evaluation pipeline
  • Stores score components alongside score entries for complete cache fidelity

Note: the past review comment about an unsafe type assertion at line 822 appears to have been resolved; the current code uses proper TypeScript optional spreading without as any.
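
The optional-collector pattern this comment describes could be sketched as below: reportIssue always prints, and additionally records the issue when a collector is supplied. The names and fields here are assumptions inferred from the review; the real reportIssue signature in the PR is richer.

```typescript
type IssueSeverity = 'error' | 'warning';

interface CollectedIssue {
  line: number;
  column: number;
  severity: IssueSeverity;
  summary: string;
}

function reportIssue(
  issue: CollectedIssue,
  print: (line: string) => void,
  issueCollector?: CollectedIssue[],
): void {
  print(`${issue.line}:${issue.column} ${issue.severity}: ${issue.summary}`);
  // Optional parameter keeps existing call sites source-compatible.
  issueCollector?.push(issue);
}
```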


876-950: Cache initialization and persistence are well-structured.

The implementation correctly:

  • Determines useCache based on cacheEnabled and forceFullRun flags
  • Initializes the CacheStore and computes promptsHash once for all files
  • Threads cache parameters through to evaluateFile
  • Tracks cache hit statistics
  • Persists the cache after all evaluations complete
  • Provides verbose logging when enabled
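
The initialization-and-persistence flow above implies a small disk-backed store. A minimal sketch follows; the version constant, error handling, and file layout are assumptions, and the PR's CacheStore additionally validates loaded data against the Zod cache schema.

```typescript
import { readFileSync, writeFileSync, existsSync, mkdirSync } from 'node:fs';
import { dirname } from 'node:path';

// Assumed cache format version; bumping it invalidates old caches.
const CACHE_VERSION = 1;

class CacheStore {
  private entries = new Map<string, unknown>();

  constructor(private readonly path: string) {}

  load(): void {
    if (!existsSync(this.path)) return;
    try {
      const parsed = JSON.parse(readFileSync(this.path, 'utf8')) as {
        version?: number;
        entries?: Record<string, unknown>;
      };
      // An incompatible or malformed cache is treated as empty.
      if (parsed.version !== CACHE_VERSION || !parsed.entries) return;
      this.entries = new Map(Object.entries(parsed.entries));
    } catch {
      // Corrupt JSON: start fresh rather than failing the run.
    }
  }

  get(key: string): unknown {
    return this.entries.get(key);
  }

  set(key: string, value: unknown): void {
    this.entries.set(key, value);
  }

  save(): void {
    mkdirSync(dirname(this.path), { recursive: true });
    const payload = { version: CACHE_VERSION, entries: Object.fromEntries(this.entries) };
    writeFileSync(this.path, JSON.stringify(payload, null, 2), 'utf8');
  }
}
```

Treating a corrupt or version-mismatched cache as empty, rather than erroring, keeps the CLI usable even if the cache file is damaged.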

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
src/output/reporter.ts (1)

9-9: Redundant explicit undefined in optional property type.

The optional property syntax score?: number already permits undefined as a value, so the explicit | undefined is redundant under default compiler settings. If the project enables exactOptionalPropertyTypes, however, the two forms do differ (only the explicit | undefined allows assigning undefined directly), so confirm that flag is off before simplifying.

🔎 Simplify the type annotation
-  score?: number | undefined;
+  score?: number;
src/schemas/cache-schema.ts (4)

6-15: Consider adding validation constraints to line and column numbers.

The line and column fields accept any number, including negatives or zero. Adding minimum constraints would prevent semantically invalid values.

🔎 Suggested refinements
 const CACHED_ISSUE_SCHEMA = z.object({
-    line: z.number(),
-    column: z.number(),
+    line: z.number().int().min(1),
+    column: z.number().int().min(1),
     severity: z.nativeEnum(Severity),
     summary: z.string(),
     ruleName: z.string(),
     suggestion: z.string().optional(),
     scoreText: z.string().optional(),
     match: z.string().optional(),
 });

17-40: Score schemas are well-structured.

The evaluation summary, score component, and grouped score schemas provide good granularity for cached results. The hierarchical composition is clean.

If you want additional runtime safety, consider adding range constraints to numeric score fields (e.g., z.number().min(0) for scores that shouldn't be negative), but this is not essential if upstream logic already enforces valid ranges.


42-54: CACHED_RESULT_SCHEMA structure aligns well with PR objectives.

The schema correctly captures evaluation results including errors, warnings, operational state, and optional detailed outputs. Using z.unknown() for jsonOutput is the right approach for arbitrary JSON structures.

Consider adding z.number().int().nonnegative() constraints to errors, warnings, and requestFailures to prevent invalid negative counts. For timestamp, you could add .positive() to ensure it's a valid Unix timestamp, but these are defensive refinements rather than critical fixes.


56-59: CACHE_SCHEMA provides a clean, versioned cache structure.

The top-level schema with version and entries map is well-designed for managing cached evaluation results. Exporting only the top-level schema maintains good encapsulation.

Consider adding .int().positive() to the version field to enforce valid version numbers:

🔎 Optional refinement
 export const CACHE_SCHEMA = z.object({
-    version: z.number(),
+    version: z.number().int().positive(),
     entries: z.record(z.string(), CACHED_RESULT_SCHEMA)
 });
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 994a77a and 8608786.

📒 Files selected for processing (5)
  • src/cache/cache-store.ts (1 hunks)
  • src/cache/content-hasher.ts (1 hunks)
  • src/cache/types.ts (1 hunks)
  • src/output/reporter.ts (1 hunks)
  • src/schemas/cache-schema.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/cache/cache-store.ts
  • src/cache/types.ts
  • src/cache/content-hasher.ts

@@ -0,0 +1,59 @@
import { z } from 'zod';

⚠️ Potential issue | 🟠 Major

Zod 3.25.76 has critical TypeScript compatibility issues; consider downgrading or upgrading to Zod 4.

Zod 3.25.76 introduced breaking changes that require TypeScript 5+ and drop support for TypeScript 4.9. No direct security vulnerabilities exist in this version, but install size grows significantly because Zod 4 is bundled inside the Zod 3 package. Either downgrade to 3.22.4 (the last version compatible with TypeScript 4.9) if your project targets that compiler, or upgrade to Zod 4 for the latest stability and performance improvements.

🤖 Prompt for AI Agents
In src/schemas/cache-schema.ts around line 1, the project currently imports Zod;
the review warns that Zod 3.25.76 is incompatible with TypeScript <5 and bloated
due to bundled Zod 4 — either pin Zod to 3.22.4 if you must remain on TypeScript
4.9 or upgrade to Zod 4.x: update package.json (change version), run npm/yarn
install to refresh lockfile, then if upgrading to Zod 4, update any Zod API
changes in code (adjust imports/validation calls/types as needed) and ensure
CI/tsconfig target is set to TS5+ before running tests and typechecks.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between bfd1a53 and 7325aad.

📒 Files selected for processing (1)
  • src/cli/commands.ts
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no any; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks use unknown type

Files:

  • src/cli/commands.ts
🧠 Learnings (4)
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/boundaries/env-parser.ts : All environment variables must be validated via Zod schemas in `src/boundaries/env-parser.ts`

Applied to files:

  • src/cli/commands.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Recursively load all `.md` files from within each rule pack; multiple packs can be used simultaneously

Applied to files:

  • src/cli/commands.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Organize rules into subdirectories (packs) within RulesPath; use company names (e.g., Acme, TechCorp, Stripe) to indicate which style guide the rules implement

Applied to files:

  • src/cli/commands.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to .vectorlint.ini : Use `[glob/pattern]` sections in `.vectorlint.ini` to specify which packs run on which files

Applied to files:

  • src/cli/commands.ts
🧬 Code graph analysis (1)
src/cli/commands.ts (5)
src/config/constants.ts (1)
  • DEFAULT_CONFIG_FILENAME (6-6)
src/boundaries/cli-parser.ts (1)
  • parseCliOptions (4-15)
src/errors/index.ts (1)
  • handleUnknownError (46-51)
src/providers/search-provider.ts (1)
  • SearchProvider (5-7)
src/providers/perplexity-provider.ts (1)
  • PerplexitySearchProvider (11-54)
🔇 Additional comments (2)
src/cli/commands.ts (2)

180-188: LGTM: OutputFormat mapping is well-implemented.

The mapping from CLI string options to the OutputFormat enum is clean and includes a sensible default fallback to OutputFormat.Line. This provides type safety and integrates well with the conditional summary printing logic.


203-204: LGTM: Cache integration is correctly implemented.

The new cache-related options (cacheEnabled and forceFullRun) are properly derived from CLI flags and passed to the orchestrator. The conditional summary printing for OutputFormat.Line ensures structured output formats remain clean.

Also applies to: 208-208
