feat: Implement content hashing and result caching with full output replay #42
- Create cache module with index, types, and store implementations
- Add `CacheKey` interface for unique cache entry identification using file path and content/prompt hashes
- Add `CachedIssue` interface for minimal issue data storage and replay
- Add `CachedResult` interface to store evaluation results with errors, warnings, and operational status
- Add `CacheData` interface for versioned cache entry management
- Add `CacheOptions` interface for cache configuration (`enabled`, `forceFullRun`, `cacheDir`)
- Update `EvaluationOptions` to include `cacheEnabled` and `forceFullRun` flags
- Update `ProcessViolationsParams`, `ProcessCriterionParams`, and `ProcessPromptResultParams` with optional `issueCollector`
- Update `EvaluateFileParams` to accept `cacheStore`, `promptsHash`, and `useCache` options
- Update `EvaluateFileResult` to include `wasCacheHit` flag for cache hit detection
- Add CLI schema options for `full` and `noCache` flags
- Enable caching support for improved performance on repeated evaluations
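As a rough sketch of how the interfaces listed above might fit together (field lists are abbreviated and partly assumed; the authoritative definitions live in `src/cache/types.ts`):

```typescript
// Hypothetical sketch of the cache types described above; field sets are
// abbreviated and the exact shapes in src/cache/types.ts may differ.
interface CachedIssue {
  line: number;
  column: number;
  severity: string;
  summary: string;
  ruleName: string;
}

interface CachedResult {
  errors: number;
  warnings: number;
  hadOperationalErrors: boolean;
  issues: CachedIssue[];
  timestamp: number;
}

interface CacheData {
  version: number;                        // enables future cache migrations
  entries: Record<string, CachedResult>;  // keyed by a cache-key string
}

// Minimal usage: record a clean evaluation for one file.
const entry: CachedResult = {
  errors: 0,
  warnings: 0,
  hadOperationalErrors: false,
  issues: [],
  timestamp: Date.now(),
};
const cache: CacheData = {
  version: 1,
  entries: { 'README.md::abc123::def456': entry },
};
console.log(Object.keys(cache.entries).length); // 1
```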
- Add `CacheStore` class for persistent cache management with disk I/O
- Stores cache in `.vectorlint/cache.json` by default
- Implements `get`, `set`, `has`, `clear`, and `size` operations
- Includes version checking for future cache migrations
- Gracefully handles corrupted cache files by starting fresh
- Only writes to disk when cache is modified (dirty flag)
- Add content hashing utilities for cache key generation
- Implement `hashContent()` to compute SHA-256 of normalized file content
- Implement `hashPrompts()` to hash prompt configurations for rule change detection
- Implement `createCacheKeyString()` to combine file path and hashes into cache keys
- Normalize content by trimming whitespace and standardizing line endings
- Add error handling to prevent cache failures from blocking execution
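A minimal sketch of such hashing utilities using Node's built-in `crypto` module; the truncation length and `::` separator here are assumptions, and the real `src/cache/content-hasher.ts` may differ:

```typescript
import { createHash } from 'node:crypto';

// Sketch: normalize line endings and whitespace, then take a SHA-256 digest.
// The 16-character truncation is an assumption, not the PR's exact choice.
export function hashContent(content: string): string {
  const normalized = content.replace(/\r\n/g, '\n').replace(/\r/g, '\n').trim();
  return createHash('sha256').update(normalized).digest('hex').slice(0, 16);
}

// Sketch: combine file path and hashes into a single cache-key string.
export function createCacheKeyString(
  filePath: string,
  contentHash: string,
  promptsHash: string
): string {
  return `${filePath}::${contentHash}::${promptsHash}`;
}

// CRLF and LF versions of the same content hash identically.
console.log(hashContent('a\r\nb') === hashContent('a\nb')); // true
```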
- Add `--full` and `--no-cache` CLI options to control cache behavior
- Pass `cacheEnabled` and `forceFullRun` flags to orchestrator
- Implement cache store integration in `evaluateFile` function
- Add `issueCollector` parameter to `reportIssue` and related functions
- Filter `runRules` from `scanPaths` configuration before passing to orchestrator
- Create cache key from content and prompts hash for cache lookups
- Store and retrieve cached issues and scores from cache store
- Skip evaluation for cached files and merge cached results with new ones
- Add comprehensive cache store tests for validation and edge cases
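The flag wiring described above can be sketched as follows (a simplified model of the mapping, not the actual `src/cli/commands.ts` code):

```typescript
// Sketch of the --full / --no-cache wiring; helper names here are
// illustrative and do not appear verbatim in the PR.
interface CacheCliFlags {
  full: boolean;     // --full
  noCache: boolean;  // --no-cache
}

function toCacheOptions(flags: CacheCliFlags) {
  return {
    cacheEnabled: !flags.noCache,  // --no-cache disables caching entirely
    forceFullRun: flags.full,      // --full bypasses cache reads
  };
}

// Cache reads only happen when caching is enabled and no full run is forced.
function shouldUseCache(opts: { cacheEnabled: boolean; forceFullRun: boolean }): boolean {
  return opts.cacheEnabled && !opts.forceFullRun;
}

console.log(shouldUseCache(toCacheOptions({ full: false, noCache: false }))); // true
console.log(shouldUseCache(toCacheOptions({ full: true, noCache: false })));  // false
```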
📝 Walkthrough

Adds a disk-backed, versioned cache with content/prompt hashing, cache-key generation, new CLI flags (`--full`, `--no-cache`), and orchestrator integration to check, replay, and persist cached evaluation results (versioned `.vectorlint/cache.json`).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant CLI as CLI
    participant Orch as Orchestrator
    participant Hasher as Hasher
    participant Cache as CacheStore
    participant Eval as Evaluator
    CLI->>Orch: evaluateFiles(targets, options)
    Orch->>Cache: instantiate CacheStore (if cacheEnabled)
    Orch->>Hasher: hashPrompts(prompts)
    loop per file
        Orch->>Eval: read file content
        Eval-->>Orch: fileContent
        Orch->>Hasher: hashContent(fileContent)
        Orch->>Hasher: createCacheKeyString(filePath, contentHash, promptsHash)
        Orch->>Cache: has(cacheKey)?
        alt cache hit (useCache && !forceFullRun)
            Cache-->>Orch: true
            Orch->>Cache: get(cacheKey)
            Cache-->>Orch: CachedResult
            Orch->>Orch: replay issues & scores (wasCacheHit = true)
        else cache miss
            Orch->>Eval: run evaluation (prompts, file)
            Eval-->>Orch: issues, scores
            Orch->>Orch: collect issue data (issueCollector)
            Orch->>Cache: set(cacheKey, CachedResult)
        end
    end
    Orch->>Cache: save()
    Cache-->>Orch: persisted
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (7)
src/cache/content-hasher.ts (3)
9-9: Consider handling standalone CR characters in line ending normalization.

The current regex only converts CRLF (`\r\n`) to LF, but standalone CR characters (`\r`) would remain. For comprehensive normalization, consider replacing all `\r` occurrences:

```diff
- const normalized = content.replace(/\r\n/g, '\n').trim();
+ const normalized = content.replace(/\r\n/g, '\n').replace(/\r/g, '\n').trim();
```
22-27: `JSON.stringify` on meta could produce non-deterministic output.

`JSON.stringify(p.meta)` serializes keys in insertion order, so semantically identical metadata objects whose keys were inserted in different orders produce different strings, potentially causing spurious cache misses. Consider a deterministic serialization approach, such as sorting keys explicitly:

```diff
 const parts = sorted.map(p => ({
   id: p.id,
-  meta: JSON.stringify(p.meta),
+  meta: JSON.stringify(p.meta, Object.keys(p.meta).sort()),
   body: p.body.trim(),
   pack: p.pack || '',
 }));
```
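Note that passing a sorted key array to `JSON.stringify` only covers top-level keys. A recursive stable serializer (a generic sketch, not code from this PR) also handles nested metadata objects:

```typescript
// Generic sketch: deterministic JSON serialization that sorts object keys
// at every nesting level, so key insertion order never affects the hash input.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  const obj = value as Record<string, unknown>;
  const body = Object.keys(obj)
    .sort()
    .map(k => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
    .join(',');
  return `{${body}}`;
}

// Same data, different insertion order, identical output.
console.log(stableStringify({ b: 1, a: { d: 2, c: 3 } }) ===
            stableStringify({ a: { c: 3, d: 2 }, b: 1 })); // true
```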
43-43: Hash truncation reduces collision resistance.

Truncating SHA-256 hashes to 16 hex characters (64 bits) significantly reduces collision resistance, from 2^128 to 2^32 for birthday attacks. While this may be acceptable for a local cache with moderate file counts, consider the risk as the repository scales.
For repositories with thousands of files, consider increasing the truncation length to 24-32 characters for better collision resistance.
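To put numbers on this, the birthday bound gives p ≈ 1 - exp(-n^2 / 2^(b+1)) for n items hashed to b bits. An illustrative helper (not code from the PR) evaluates it:

```typescript
// Approximate birthday-bound collision probability for n random b-bit hashes:
// p ≈ 1 - exp(-n^2 / 2^(b+1)). Illustrative only; not part of the PR.
function collisionProbability(nItems: number, bits: number): number {
  return 1 - Math.exp(-(nItems * nItems) / Math.pow(2, bits + 1));
}

// 10,000 files with a 64-bit (16 hex char) hash: negligible risk.
console.log(collisionProbability(10_000, 64) < 1e-9); // true

// The same 10,000 files at 32 bits would already be risky.
console.log(collisionProbability(10_000, 32) > 0.01); // true
```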
src/cli/commands.ts (1)
140-143: Extract duplicated scanPaths transformation logic.

The same scanPaths transformation pattern appears twice in this file. Consider extracting it into a helper function to reduce duplication and improve maintainability.

Add a helper function near the top of the file:

```ts
function normalizeScanPaths(scanPaths: FilePatternConfig[]) {
  return scanPaths.map(({ runRules, ...rest }) => ({
    ...rest,
    ...(runRules !== undefined ? { runRules } : {})
  }));
}
```

Then replace both occurrences:

```diff
- scanPaths: config.scanPaths.map(({ runRules, ...rest }) => ({
-   ...rest,
-   ...(runRules !== undefined ? { runRules } : {})
- })),
+ scanPaths: normalizeScanPaths(config.scanPaths),
```

Also applies to: 175-178
src/cache/cache-store.ts (2)
28-48: Add runtime validation for parsed cache data.

The `JSON.parse` result is cast directly to `CacheData` without runtime validation. If the cache file is corrupted or manually edited, this could cause runtime errors downstream. Consider adding a validation function or using a schema validator (e.g., Zod) to ensure the loaded data conforms to the expected structure before casting:

```ts
private load(): CacheData {
  try {
    if (existsSync(this.cacheFile)) {
      const raw = readFileSync(this.cacheFile, 'utf-8');
      const parsed = JSON.parse(raw);

      // Basic structural validation before the cast
      if (!parsed || typeof parsed !== 'object' ||
          typeof parsed.version !== 'number' ||
          !parsed.entries || typeof parsed.entries !== 'object') {
        console.warn(`[vectorlint] Invalid cache structure, starting fresh`);
        return { version: CACHE_VERSION, entries: {} };
      }

      const cacheData = parsed as CacheData;

      // Version check for future migrations
      if (cacheData.version !== CACHE_VERSION) {
        console.warn(`[vectorlint] Cache version mismatch, clearing cache`);
        return { version: CACHE_VERSION, entries: {} };
      }

      return cacheData;
    }
  } catch (e) {
    // If cache is corrupted, start fresh
    const msg = e instanceof Error ? e.message : String(e);
    console.warn(`[vectorlint] Could not read cache (${msg}), starting fresh`);
  }
  return { version: CACHE_VERSION, entries: {} };
}
```
59-61: Consider using `hasOwnProperty` for cleaner key checking.

While the `in` operator works correctly for Records, using `Object.prototype.hasOwnProperty.call()` is more explicit about checking own properties:

```diff
 has(key: string): boolean {
-  return key in this.data.entries;
+  return Object.prototype.hasOwnProperty.call(this.data.entries, key);
 }
```

src/cli/orchestrator.ts (1)
648-700: Duplicate `hashContent` computation.

`hashContent(content)` is computed at line 650 (cache lookup) and again at line 829 (cache storage). Compute it once and reuse:

```diff
 const content = readFileSync(file, 'utf-8');
 const relFile = path.relative(process.cwd(), file) || file;
+const contentHash = promptsHash ? hashContent(content) : '';

 // Check cache before running evaluations
 if (useCache && cacheStore && promptsHash) {
-  const contentHash = hashContent(content);
   const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
   const cached = cacheStore.get(cacheKey);
   // ... cache hit handling
 }

 // ... evaluation logic ...

 // Store result in cache
 if (cacheStore && promptsHash) {
-  const contentHash = hashContent(content);
   const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
   // ... cache storage
 }
```

Also applies to: 826-847
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- src/cache/cache-store.ts (1 hunks)
- src/cache/content-hasher.ts (1 hunks)
- src/cache/index.ts (1 hunks)
- src/cache/types.ts (1 hunks)
- src/cli/commands.ts (3 hunks)
- src/cli/orchestrator.ts (18 hunks)
- src/cli/types.ts (6 hunks)
- src/schemas/cli-schemas.ts (1 hunks)
- tests/cache-store.test.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/cache-store.test.ts (3)
- src/cache/content-hasher.ts (3): hashContent (8-11), hashPrompts (17-31), createCacheKeyString (37-44)
- src/cache/cache-store.ts (1): CacheStore (13-90)
- src/cache/types.ts (1): CachedResult (37-47)

src/cli/types.ts (2)
- src/cache/types.ts (1): CachedIssue (16-25)
- src/cache/cache-store.ts (1): CacheStore (13-90)

src/cache/cache-store.ts (1)
- src/cache/types.ts (2): CacheData (49-52), CachedResult (37-47)

src/cli/orchestrator.ts (5)
- src/cli/types.ts (2): ReportIssueParams (58-70), EvaluateFileResult (147-150)
- src/cache/types.ts (2): CachedIssue (16-25), CachedScore (33-36)
- src/cache/content-hasher.ts (3): hashContent (8-11), createCacheKeyString (37-44), hashPrompts (17-31)
- src/output/reporter.ts (3): printFileHeader (23-29), EvaluationSummary (6-10), printEvaluationSummaries (137-174)
- src/cache/cache-store.ts (1): CacheStore (13-90)
🔇 Additional comments (12)
src/cache/types.ts (1)
1-58: LGTM! Well-structured cache type definitions.

The cache type system is comprehensive and well-designed:

- Clear separation between cache keys, cached data, and cache configuration
- `CachedIssue` captures all necessary fields for replay
- `CachedResult` includes both granular details and summary metrics
- Version field in `CacheData` enables future migrations
- Optional fields allow flexibility in what gets cached
src/cache/index.ts (1)
1-3: LGTM! Clean barrel export.

src/schemas/cli-schemas.ts (1)
13-14: LGTM! CLI options properly defined.

The new `full` and `noCache` options are correctly defined with appropriate boolean defaults.

src/cli/commands.ts (2)
30-31: LGTM! Cache control flags properly added.

The new CLI options align with the caching feature requirements.

179-180: LGTM! Cache options correctly propagated.

The inversion of `noCache` to `cacheEnabled` and direct mapping of `full` to `forceFullRun` are correct.

src/cache/cache-store.ts (1)
72-89: LGTM! Save method has appropriate error handling.

The save method correctly:
- Checks dirty flag to avoid unnecessary writes
- Creates cache directory with recursive option
- Formats JSON with indentation for readability
- Handles errors gracefully without failing the main run
tests/cache-store.test.ts (2)
9-103: LGTM! Comprehensive hashing utility tests.

The test suite thoroughly validates:
- Hash consistency for identical inputs
- Line ending normalization (CRLF vs LF)
- Whitespace trimming
- Hash differences for different content
- Prompt ordering invariance
- Detection of changes in prompt body and metadata
- Cache key truncation behavior
105-256: LGTM! Thorough CacheStore integration tests.

The test suite provides excellent coverage of CacheStore functionality:
- Retrieval of missing keys
- Store and retrieve operations
- Disk persistence and reload
- Automatic cache directory creation
- Clear operations
- Size reporting
- Key existence checking
The proper setup/teardown ensures test isolation and prevents side effects.
src/cli/types.ts (1)
12-12: LGTM! Cache integration types are well-structured.

The type extensions cleanly integrate caching into the evaluation pipeline:

- `issueCollector` is threaded through the processing pipeline for cache replay
- `EvaluateFileParams` gains the necessary cache-related fields (`cacheStore`, `promptsHash`, `useCache`)
- `wasCacheHit` flag enables tracking and reporting of cache effectiveness
- All additions are optional, maintaining backward compatibility
The type design supports the full cache functionality while keeping the changes minimal and focused.
Also applies to: 30-31, 95-95, 104-104, 122-122, 142-144, 149-149
src/cli/orchestrator.ts (3)
59-85: LGTM!

The optional `issueCollector` pattern is clean and non-intrusive. The collected issue data correctly matches the `CachedIssue` interface.
827-847: Verify: `--full` updates cache despite bypassing reads.

When `--full` is used, `useCache` is false (line 872), so cache lookups are skipped. However, results are still written to cache (line 828 only checks `cacheStore && promptsHash`). If the intent is for `--full` to refresh the cache, this is correct. If `--full` should bypass caching entirely, add a guard:

```diff
- if (cacheStore && promptsHash) {
+ if (useCache && cacheStore && promptsHash) {
```
857-935: LGTM!

The cache initialization logic correctly handles the three modes:

- `--no-cache`: No `CacheStore` created, caching fully disabled.
- Normal: Full cache read/write.
- `--full`: Cache reads bypassed, but fresh results still written.

The verbose logging and cache hit tracking are well implemented.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/cache/cache-store.ts (1)
76-93: Consider atomic write pattern for cache persistence.

The current implementation writes directly to `cache.json` (lines 85-86), which could leave the cache file in a corrupted state if the process is interrupted (e.g., Ctrl+C) or crashes during the write operation. Using a write-then-rename pattern would make the save operation atomic and prevent cache corruption.
🔎 View atomic write pattern

```diff
 save(): void {
   if (!this.dirty) return;

   try {
     // Create cache directory if missing
     if (!existsSync(this.cacheDir)) {
       mkdirSync(this.cacheDir, { recursive: true });
     }

     const json = JSON.stringify(this.data, null, 2);
-    writeFileSync(this.cacheFile, json, 'utf-8');
+    // Atomic write: write to temp file, then rename
+    // (add renameSync to the existing fs imports)
+    const tempFile = `${this.cacheFile}.tmp`;
+    writeFileSync(tempFile, json, 'utf-8');
+
+    // Rename is atomic on most filesystems
+    renameSync(tempFile, this.cacheFile);
+
     this.dirty = false;
   } catch (e) {
     // Don't fail the run if cache can't be written
     const msg = e instanceof Error ? e.message : String(e);
     console.warn(`[vectorlint] Warning: Could not save cache: ${msg}`);
   }
 }
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/cache/cache-store.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cache/cache-store.ts (1)
- src/cache/types.ts (2): CacheData (49-52), CachedResult (37-47)
🔇 Additional comments (3)
src/cache/cache-store.ts (3)
1-11: LGTM! Clean setup with versioning support.

The imports and constants are well-structured. The cache versioning mechanism (line 9) is a good practice for future-proofing schema changes.

23-27: Constructor properly initializes paths and state.

The use of `path.resolve` and `path.join` ensures safe path handling, and loading the cache during construction is appropriate for this use case.

54-74: CRUD methods are well-implemented.

The cache interface methods are clean and correctly manage the dirty flag. The use of the `in` operator for `has()` (line 64) properly checks for key existence, including keys with undefined values.
- Add `CachedScoreComponent` interface to store minimal score data for replay
- Extend `CachedScore` interface with optional `components` field for JSON output
- Add `scoreComponents` field to `ErrorTrackingResult` for tracking score data
- Update `routePromptResult` to initialize and populate `scoreComponents`
- Modify `evaluateFile` to handle cached score replay for both Line and JSON formats
- Refactor `allScores` map to use `CachedScore` objects instead of raw summaries
- Simplify cache storage by converting map to array using `Array.from()`
- Enable JSON formatter to output evaluation scores from cached data
- Improve cache replay logic to support multiple output formats
Actionable comments posted: 1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/cache/types.ts (1 hunks)
- src/cli/orchestrator.ts (20 hunks)
- src/cli/types.ts (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/cli/types.ts
- src/cache/types.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/cli/orchestrator.ts (7)
- src/cli/types.ts (2): ReportIssueParams (59-71), EvaluateFileResult (148-151)
- src/cache/types.ts (2): CachedIssue (16-25), CachedScore (49-53)
- src/cache/content-hasher.ts (3): hashContent (8-11), createCacheKeyString (37-44), hashPrompts (17-31)
- src/output/reporter.ts (2): printFileHeader (23-29), EvaluationSummary (6-10)
- src/output/json-formatter.ts (1): JsonFormatter (54-94)
- src/output/rdjson-formatter.ts (1): RdJsonFormatter (54-134)
- src/cache/cache-store.ts (1): CacheStore (17-94)
🪛 GitHub Actions: Lint
src/cli/orchestrator.ts
[error] 822-822: ESLint: Unsafe assignment of an any value. @typescript-eslint/no-unsafe-assignment. Command failed: npm run lint.
🪛 GitHub Check: ESLint
src/cli/orchestrator.ts
[failure] 822-822:
Unexpected any. Specify a different type
[failure] 822-822:
Unsafe assignment of an any value
🔇 Additional comments (6)
src/cli/orchestrator.ts (6)
16-16: LGTM - Cache module integration.

The new cache imports are properly structured and all items are actively used throughout the orchestrator for the caching mechanism.

59-85: Well-structured dual-purpose reporting.

The modification to accept an optional `issueCollector` cleanly separates caching concerns from reporting, allowing the same function to both report issues and accumulate them for cache storage.

182-408: Correct parameter threading for cache collection.

The `issueCollector` parameter is properly threaded through the helper functions and consistently passed to all `reportIssue` calls, maintaining clean data flow for cache accumulation.

482-595: Proper integration of cache collection in result routing.

The function correctly propagates `issueCollector` through all processing paths and extends return values to include `scoreComponents`, enabling complete cache storage of evaluation results.

650-711: Cache replay implementation looks complete.

The cache check and replay logic properly handles both Line and JSON output formats. The JSON format score replay (lines 690-699) addresses the previous review comment by replaying cached `components` via `addEvaluationScore`, ensuring JSON output includes evaluation scores on cache hits.

876-950: Solid cache management implementation.

The cache initialization, file processing loop, and persistence logic are well-structured:

- Conditional initialization based on the `cacheEnabled` flag
- Efficient single computation of `promptsHash` for all files
- `useCache` properly respects both `cacheEnabled` and `forceFullRun` flags
- Cache persistence after processing all files
- Good observability with verbose logging
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/schemas/cache-schema.ts (1)
3-6: Define a proper schema for cache entries instead of using `z.any()`.

Using `z.any()` for the entries values defeats type safety and prevents validation of the cached data structure. Based on the cache types, entries should validate the `CachedResult` structure (errors, warnings, issues, scores, etc.).

```diff
+const CACHED_ISSUE_SCHEMA = z.object({
+  line: z.number(),
+  column: z.number(),
+  severity: z.string(),
+  summary: z.string(),
+  ruleName: z.string(),
+  suggestion: z.string().optional(),
+  scoreText: z.string().optional(),
+  match: z.string().optional()
+});
+
+const CACHED_SCORE_COMPONENT_SCHEMA = z.object({
+  criterion: z.string(),
+  rawScore: z.number(),
+  maxScore: z.number(),
+  weightedScore: z.number(),
+  weightedMaxScore: z.number(),
+  normalizedScore: z.number(),
+  normalizedMaxScore: z.number()
+});
+
+const CACHED_EVALUATION_SUMMARY_SCHEMA = z.object({
+  id: z.string(),
+  scoreText: z.string(),
+  score: z.number().optional()
+});
+
+const CACHED_SCORE_SCHEMA = z.object({
+  ruleName: z.string(),
+  items: z.array(CACHED_EVALUATION_SUMMARY_SCHEMA),
+  components: z.array(CACHED_SCORE_COMPONENT_SCHEMA).optional()
+});
+
+const CACHED_RESULT_SCHEMA = z.object({
+  errors: z.number(),
+  warnings: z.number(),
+  hadOperationalErrors: z.boolean(),
+  hadSeverityErrors: z.boolean(),
+  requestFailures: z.number(),
+  issues: z.array(CACHED_ISSUE_SCHEMA),
+  scores: z.array(CACHED_SCORE_SCHEMA),
+  timestamp: z.number()
+});
+
 export const CACHE_SCHEMA = z.object({
   version: z.number(),
-  entries: z.record(z.string(), z.any())
+  entries: z.record(z.string(), CACHED_RESULT_SCHEMA)
 });
```

This ensures cached data integrity and catches corruption or schema mismatches during load.
src/cli/orchestrator.ts (1)
650-711: Optimize: avoid duplicate cache key calculation.

The cache key (`contentHash` and `cacheKey`) is computed twice: once for the cache check (lines 652-653) and again for cache storage (lines 847-848). Consider computing once and reusing:

```diff
 const { file, options, jsonFormatter, cacheStore, promptsHash, useCache } = params;
 const { prompts, provider, searchProvider, concurrency, scanPaths, outputFormat = OutputFormat.Line } = options;

 const content = readFileSync(file, 'utf-8');
 const relFile = path.relative(process.cwd(), file) || file;

+// Compute cache key once if caching is enabled
+let contentHash: string | undefined;
+let cacheKey: string | undefined;
+if (cacheStore && promptsHash) {
+  contentHash = hashContent(content);
+  cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
+}
+
 // Check cache before running evaluations
-if (useCache && cacheStore && promptsHash) {
-  const contentHash = hashContent(content);
-  const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
+if (useCache && cacheStore && cacheKey) {
   const cached = cacheStore.get(cacheKey);
   if (cached) {
     // ... cache replay logic ...
   }
 }

 // ... evaluation logic ...

 // Store result in cache
-if (cacheStore && promptsHash) {
-  const contentHash = hashContent(content);
-  const cacheKey = createCacheKeyString(relFile, contentHash, promptsHash);
-
+if (cacheStore && cacheKey) {
   const cachedScores: CachedScore[] = Array.from(allScores.values());
   cacheStore.set(cacheKey, {
     // ... cached result ...
   });
 }
```

Also applies to: 845-862
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/cache/cache-store.ts (1 hunks)
- src/cache/types.ts (1 hunks)
- src/cli/orchestrator.ts (20 hunks)
- src/schemas/cache-schema.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/cache/cache-store.ts
- src/cache/types.ts
🔇 Additional comments (6)
src/cli/orchestrator.ts (6)
16-16: Cache imports look good.

The imports are appropriate for the caching functionality being added.

59-85: Issue collection implementation is clean.

The optional `issueCollector` parameter maintains backward compatibility while enabling cache functionality. The collected fields align with the `CachedIssue` type.

183-227: Consistent threading of issue collector.

The `issueCollector` parameter is properly threaded through the evaluation pipeline (`locateAndReportViolations`, `extractAndReportCriterion`, `routePromptResult`, and `reportIssue` calls), maintaining its optional nature throughout.

Also applies to: 239-387, 483-594

650-711: Cache replay logic is well-implemented.

The cache hit path properly:

- Computes the cache key from content and prompts hashes
- Replays cached issues with all original fields
- Handles score replay for both Line and JSON output formats
- Returns the cached result with `wasCacheHit: true`

718-825: Cache data collection is correctly implemented.

The code properly:

- Initializes the `issueCollector` array for capturing issues
- Types `allScores` as `Map<string, CachedScore>` to store score data
- Threads `issueCollector` through the evaluation pipeline
- Stores score components alongside score entries for complete cache fidelity

Note: the past review comment about an unsafe type assertion at line 822 appears to have been resolved; the current code uses proper TypeScript optional spreading without `as any`.

876-950: Cache initialization and persistence are well-structured.

The implementation correctly:

- Determines `useCache` based on `cacheEnabled` and `forceFullRun` flags
- Initializes the `CacheStore` and computes `promptsHash` once for all files
- Threads cache parameters through to `evaluateFile`
- Tracks cache hit statistics
- Persists the cache after all evaluations complete
- Provides verbose logging when enabled
Actionable comments posted: 1
🧹 Nitpick comments (5)
src/output/reporter.ts (1)
9-9: Redundant explicit `undefined` in optional property type.

The optional property syntax `score?: number` already permits `undefined` as a value. Explicitly adding `| undefined` is redundant in TypeScript and doesn't change the type's behavior.

```diff
- score?: number | undefined;
+ score?: number;
```

src/schemas/cache-schema.ts (4)
6-15: Consider adding validation constraints to line and column numbers.

The `line` and `column` fields accept any number, including negatives or zero. Adding minimum constraints would prevent semantically invalid values.

```diff
 const CACHED_ISSUE_SCHEMA = z.object({
-  line: z.number(),
-  column: z.number(),
+  line: z.number().int().min(1),
+  column: z.number().int().min(1),
   severity: z.nativeEnum(Severity),
   summary: z.string(),
   ruleName: z.string(),
   suggestion: z.string().optional(),
   scoreText: z.string().optional(),
   match: z.string().optional(),
 });
```
17-40: Score schemas are well-structured.

The evaluation summary, score component, and grouped score schemas provide good granularity for cached results. The hierarchical composition is clean.

If you want additional runtime safety, consider adding range constraints to numeric score fields (e.g., `z.number().min(0)` for scores that shouldn't be negative), but this is not essential if upstream logic already enforces valid ranges.
42-54: CACHED_RESULT_SCHEMA structure aligns well with PR objectives.

The schema correctly captures evaluation results including errors, warnings, operational state, and optional detailed outputs. Using `z.unknown()` for `jsonOutput` is the right approach for arbitrary JSON structures.

Consider adding `z.number().int().nonnegative()` constraints to `errors`, `warnings`, and `requestFailures` to prevent invalid negative counts. For `timestamp`, you could add `.positive()` to ensure it's a valid Unix timestamp, but these are defensive refinements rather than critical fixes.
56-59: CACHE_SCHEMA provides a clean, versioned cache structure.

The top-level schema with version and entries map is well-designed for managing cached evaluation results. Exporting only the top-level schema maintains good encapsulation.

Consider adding `.int().positive()` to the `version` field to enforce valid version numbers:

```diff
 export const CACHE_SCHEMA = z.object({
-  version: z.number(),
+  version: z.number().int().positive(),
   entries: z.record(z.string(), CACHED_RESULT_SCHEMA)
 });
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/cache/cache-store.ts (1 hunks)
- src/cache/content-hasher.ts (1 hunks)
- src/cache/types.ts (1 hunks)
- src/output/reporter.ts (1 hunks)
- src/schemas/cache-schema.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/cache/cache-store.ts
- src/cache/types.ts
- src/cache/content-hasher.ts
```diff
@@ -0,0 +1,59 @@
+import { z } from 'zod';
```
Zod 3.25.76 has critical TypeScript compatibility issues; consider downgrading or upgrading to Zod 4.
Zod version 3.25.76 introduced breaking changes that require TypeScript 5+ and drop support for TypeScript 4.9. No direct security vulnerabilities exist in this version, but the bundle size is significantly increased because Zod 4 is bundled within Zod 3. Either downgrade to 3.22.4 (the last compatible version) if your project targets TypeScript 4.9, or upgrade to Zod 4 for the latest stability and performance improvements.
🤖 Prompt for AI Agents
In src/schemas/cache-schema.ts around line 1, the project currently imports Zod;
the review warns that Zod 3.25.76 is incompatible with TypeScript <5 and bloated
due to bundled Zod 4 — either pin Zod to 3.22.4 if you must remain on TypeScript
4.9 or upgrade to Zod 4.x: update package.json (change version), run npm/yarn
install to refresh lockfile, then if upgrading to Zod 4, update any Zod API
changes in code (adjust imports/validation calls/types as needed) and ensure
CI/tsconfig target is set to TS5+ before running tests and typechecks.
Actionable comments posted: 2
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/cli/commands.ts
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no `any`; use `unknown` + schema validation for external data
Use custom error types with proper inheritance; catch blocks use `unknown` type
Files:
src/cli/commands.ts
🧠 Learnings (4)
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/boundaries/env-parser.ts : All environment variables must be validated via Zod schemas in `src/boundaries/env-parser.ts`
Applied to files:
src/cli/commands.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Recursively load all `.md` files from within each rule pack; multiple packs can be used simultaneously
Applied to files:
src/cli/commands.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Organize rules into subdirectories (packs) within RulesPath; use company names (e.g., Acme, TechCorp, Stripe) to indicate which style guide the rules implement
Applied to files:
src/cli/commands.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to .vectorlint.ini : Use `[glob/pattern]` sections in `.vectorlint.ini` to specify which packs run on which files
Applied to files:
src/cli/commands.ts
🧬 Code graph analysis (1)
src/cli/commands.ts (5)
- src/config/constants.ts (1): DEFAULT_CONFIG_FILENAME (6-6)
- src/boundaries/cli-parser.ts (1): parseCliOptions (4-15)
- src/errors/index.ts (1): handleUnknownError (46-51)
- src/providers/search-provider.ts (1): SearchProvider (5-7)
- src/providers/perplexity-provider.ts (1): PerplexitySearchProvider (11-54)
🔇 Additional comments (2)
src/cli/commands.ts (2)
180-188: LGTM: OutputFormat mapping is well-implemented.

The mapping from CLI string options to the `OutputFormat` enum is clean and includes a sensible default fallback to `OutputFormat.Line`. This provides type safety and integrates well with the conditional summary printing logic.

203-204: LGTM: Cache integration is correctly implemented.

The new cache-related options (`cacheEnabled` and `forceFullRun`) are properly derived from CLI flags and passed to the orchestrator. The conditional summary printing for `OutputFormat.Line` ensures structured output formats remain clean.

Also applies to: 208-208
This PR introduces a caching mechanism to significantly reduce LLM token costs and improve execution speed for unchanged content.
Key Changes
- Cache module under `src/cache/` to handle SHA-256 hashing of file content and prompt configurations (including metadata).
- `evaluateFile` now checks `.vectorlint/cache.json` before triggering LLM calls.
- `--full`: Forces a full re-evaluation, ignoring the cache.
- `--no-cache`: Disables reading from or writing to the cache.

Impact