feat: add negative data cache for repeatedly not-found data IDs (PE-8985)#630
feat: add negative data cache for repeatedly not-found data IDs (PE-8985)#630
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a NegativeDataCache feature: new NEGATIVE_CACHE_* env configs, header constant, Prometheus metrics, a NegativeDataCache class (miss-tracker + negative-cache with LRU, TTL, promotion/eviction), unit tests, and wiring into system and data handlers to short‑circuit repeated misses and evict on success. Changes
Sequence DiagramsequenceDiagram
actor Client
participant DataHandler as Data Handler
participant NegCache as NegativeDataCache
participant MissTracker as Miss Tracker (LRU)
participant NegCacheStore as Negative Cache (LRU)
participant Metrics as Metrics Registry
Client->>DataHandler: Request data (ID)
DataHandler->>NegCache: isNegativelyCached(ID)?
NegCache->>NegCacheStore: Check entry
alt Cache Hit
NegCacheStore-->>NegCache: true
NegCache->>Metrics: Increment hits_total
NegCache-->>DataHandler: true
DataHandler-->>Client: 404 + X-AR-IO-Negative-Cache
else Cache Miss
NegCacheStore-->>NegCache: false
NegCache->>Metrics: Increment misses_total
NegCache-->>DataHandler: false
DataHandler->>DataHandler: Attempt retrieval
alt Retrieval Fails
DataHandler->>NegCache: recordMiss(ID)
NegCache->>MissTracker: Update/create miss entry
alt Promotion thresholds met
MissTracker-->>NegCache: promote
NegCache->>NegCacheStore: Add to negative cache
NegCache->>Metrics: Increment promotions_total
end
DataHandler-->>Client: 404
else Retrieval Succeeds
DataHandler->>NegCache: evict(ID)
NegCache->>NegCacheStore: Remove if present
NegCache->>Metrics: Increment evictions_total
DataHandler-->>Client: 200 + data
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~35 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/system.ts (1)
224-232: Add TSDoc for the shared negative-cache singleton.This export is now part of the app-wide system surface, so it should document that it is process-wide and shared across handlers.
As per coding guidelines, "Add or improve TSDoc comments when modifying code to enhance documentation".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/system.ts` around lines 224 - 232, Add a TSDoc block above the exported negativeDataCache constant (the NegativeDataCache singleton) that clearly states this instance is process-wide, shared across all request handlers/modules, and that its behavior is controlled by the listed config options (NEGATIVE_CACHE_ENABLED, NEGATIVE_CACHE_MAX_SIZE, NEGATIVE_CACHE_TTL_MS, NEGATIVE_CACHE_MISS_COUNT_THRESHOLD, NEGATIVE_CACHE_MISS_THRESHOLD_MS); include brief notes on its purpose (caching negative lookup results), intended lifetime, and any concurrency/side-effect considerations so callers understand that this is a global singleton used by the application.src/data/negative-data-cache.test.ts (1)
155-191: Avoid asserting onprom-clientinternals.These checks reach into
Counter.hashMap, which is not a public API. Prefer the public metric getters/registry so this test does not break on aprom-clientupgrade.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/data/negative-data-cache.test.ts` around lines 155 - 191, The test is inspecting prom-client internals (Counter.hashMap) which is fragile; instead retrieve metric values via the prom-client public API (register.getSingleMetric or the metric's .get() method) for the counters used in this file (metrics.negativeCacheHitsTotal, metrics.negativeCacheMissesTotal, metrics.negativeCachePromotionsTotal, metrics.negativeCacheEvictionsTotal). Update the assertions to call the public getter (e.g., register.getSingleMetric('negative_cache_hits_total')?.get() or metrics.negativeCacheHitsTotal.get()) and extract the value for the no-label series (or matching label set) to compute deltas after calling createCache, cache.isNegativelyCached, cache.recordMiss, and cache.evict, rather than reaching into .hashMap.src/data/negative-data-cache.ts (1)
18-116: Add TSDoc for the new public cache API.This introduces a reusable class with a non-trivial constructor contract and three public entry points, but none of the promotion/eviction semantics are documented. A short TSDoc block on the class plus
isNegativelyCached,recordMiss, andevictwould make the threshold/TTL behavior much easier to consume correctly. As per coding guidelines,**/*.{ts,tsx,js}: Add or improve TSDoc comments when modifying code to enhance documentation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/data/negative-data-cache.ts` around lines 18 - 116, Add TSDoc to the NegativeDataCache class and its public API (constructor, isNegativelyCached, recordMiss, evict) describing the constructor contract (parameters: enabled, maxSize, ttlMs, missCountThreshold, missDurationMs, optional now) and the semantics: isNegativelyCached returns whether an id is in the negative cache (and increments metrics), recordMiss updates the missTracker (firstSeenAt/lastSeenAt/count), promotes an id into negativeCache when count >= missCountThreshold AND now - firstSeenAt >= missDurationMs (then increments promotion metric and logs), and evict removes id from negativeCache and missTracker (increments eviction metric and logs); also mention LRU cache behavior, TTL for negativeCache, and that methods are no-ops when enabled is false where applicable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/data/negative-data-cache.ts`:
- Around line 60-71: In isNegativelyCached(id: string) replace
negativeCache.has(id) with a negativeCache.get(id) lookup so access updates
recency and prunes stale entries; treat a truthy get result as a hit (increment
metrics.negativeCacheHitsTotal and return true) and for misses increment
metrics.negativeCacheMissesTotal and return false; after any get that may have
removed stale entries, call updateGauges() only when negativeCache.size has
actually changed (compare before/after) to keep the negativeCacheSize gauge in
sync.
In `@src/routes/data/handlers.ts`:
- Line 1421: When evicting the original id you must also evict the
manifest-resolved target to avoid stale 404s: after calling
negativeDataCache?.evict(id) add logic to check for the manifest-resolved id
(the value used in sendManifestResponse(), e.g. resolvedId or however the
resolved target is named in the handler) and call
negativeDataCache?.evict(resolvedId) as well (guarding for undefined). Update
the code path that serves manifests (sendManifestResponse) to ensure the
resolved identifier used there is the same symbol you evict so both the request
id and its resolved manifest id are removed from the negative cache.
- Around line 849-850: The handler is learning misses for any exception from
getDataAttributes()/getData() by unconditionally calling
negativeDataCache.recordMiss(id) and setting span http.status_code to 404;
change this so you only treat authoritative "not found" responses as
negative-cacheable: in the catch blocks around getDataAttributes() and getData()
check the error type or status (e.g., an explicit NotFoundError, a 404 HTTP
error, or the repository layer's sentinel for missing entities) and only then
call negativeDataCache.recordMiss(id) and span.setAttribute('http.status_code',
404); for all other exceptions (transient DB errors, timeouts, 5xxs) rethrow or
handle without recording a miss. Ensure the same conditional logic is applied to
the other similar catch sites that currently call recordMiss(id).
---
Nitpick comments:
In `@src/data/negative-data-cache.test.ts`:
- Around line 155-191: The test is inspecting prom-client internals
(Counter.hashMap) which is fragile; instead retrieve metric values via the
prom-client public API (register.getSingleMetric or the metric's .get() method)
for the counters used in this file (metrics.negativeCacheHitsTotal,
metrics.negativeCacheMissesTotal, metrics.negativeCachePromotionsTotal,
metrics.negativeCacheEvictionsTotal). Update the assertions to call the public
getter (e.g., register.getSingleMetric('negative_cache_hits_total')?.get() or
metrics.negativeCacheHitsTotal.get()) and extract the value for the no-label
series (or matching label set) to compute deltas after calling createCache,
cache.isNegativelyCached, cache.recordMiss, and cache.evict, rather than
reaching into .hashMap.
In `@src/data/negative-data-cache.ts`:
- Around line 18-116: Add TSDoc to the NegativeDataCache class and its public
API (constructor, isNegativelyCached, recordMiss, evict) describing the
constructor contract (parameters: enabled, maxSize, ttlMs, missCountThreshold,
missDurationMs, optional now) and the semantics: isNegativelyCached returns
whether an id is in the negative cache (and increments metrics), recordMiss
updates the missTracker (firstSeenAt/lastSeenAt/count), promotes an id into
negativeCache when count >= missCountThreshold AND now - firstSeenAt >=
missDurationMs (then increments promotion metric and logs), and evict removes id
from negativeCache and missTracker (increments eviction metric and logs); also
mention LRU cache behavior, TTL for negativeCache, and that methods are no-ops
when enabled is false where applicable.
In `@src/system.ts`:
- Around line 224-232: Add a TSDoc block above the exported negativeDataCache
constant (the NegativeDataCache singleton) that clearly states this instance is
process-wide, shared across all request handlers/modules, and that its behavior
is controlled by the listed config options (NEGATIVE_CACHE_ENABLED,
NEGATIVE_CACHE_MAX_SIZE, NEGATIVE_CACHE_TTL_MS,
NEGATIVE_CACHE_MISS_COUNT_THRESHOLD, NEGATIVE_CACHE_MISS_THRESHOLD_MS); include
brief notes on its purpose (caching negative lookup results), intended lifetime,
and any concurrency/side-effect considerations so callers understand that this
is a global singleton used by the application.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 670458fc-cd4b-4077-95d4-94249e7b30f8
📒 Files selected for processing (9)
docs/envs.mdsrc/config.tssrc/constants.tssrc/data/negative-data-cache.test.tssrc/data/negative-data-cache.tssrc/metrics.tssrc/routes/data/handlers.tssrc/routes/data/index.tssrc/system.ts
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #630 +/- ##
===========================================
+ Coverage 76.63% 76.82% +0.19%
===========================================
Files 104 105 +1
Lines 35572 35885 +313
Branches 2587 2618 +31
===========================================
+ Hits 27259 27569 +310
- Misses 8278 8281 +3
Partials 35 35 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds a negative data cache to reduce repeated upstream work for consistently missing data IDs by short-circuiting future requests with 404s, with new configuration and Prometheus observability.
Changes:
- Introduces
NegativeDataCachewith miss-tracking → promotion → TTL-based negative caching. - Integrates negative-cache check / miss recording / eviction into raw and standard data handlers.
- Adds new env vars, response header, and Prometheus counters/gauges for the feature.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/system.ts | Instantiates and exports the NegativeDataCache singleton from config. |
| src/routes/data/index.ts | Wires negativeDataCache into handler factories. |
| src/routes/data/handlers.ts | Uses negative cache to short-circuit 404s, record misses, and evict on success. |
| src/metrics.ts | Adds negative-cache counters/gauges. |
| src/data/negative-data-cache.ts | Implements two-phase miss tracker + negative cache with TTL and metrics/logging. |
| src/data/negative-data-cache.test.ts | Adds unit tests for promotion, eviction, TTL, LRU behavior, and metrics. |
| src/constants.ts | Adds X-AR-IO-Negative-Cache header name constant. |
| src/config.ts | Adds NEGATIVE_CACHE_* env var configuration. |
| docs/envs.md | Documents new negative-cache env vars. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| message: error.message, | ||
| stack: error.stack, | ||
| }); | ||
| negativeDataCache?.recordMiss(id); |
There was a problem hiding this comment.
negativeDataCache.recordMiss(id) runs on any exception from getDataAttributes(). Since CompositeDataAttributesSource can throw on upstream errors (not just “not found”), this can poison the negative cache and return persistent 404s for IDs that do exist. Only record a miss when the error is a definite not-found signal (e.g., 404/status code or a dedicated error type).
| negativeDataCache?.recordMiss(id); | |
| const isNotFoundError = | |
| error?.status === 404 || | |
| error?.statusCode === 404 || | |
| error?.code === 'NOT_FOUND'; | |
| if (isNotFoundError) { | |
| negativeDataCache?.recordMiss(id); | |
| } |
There was a problem hiding this comment.
Already addressed in the current code — both getDataAttributes() catch blocks explicitly avoid calling recordMiss() with the comment: "Don't record a negative cache miss here — getDataAttributes throwing indicates a transient infrastructure error, not that the data doesn't exist." (lines 902-904 and 1386-1388).
| message: error.message, | ||
| stack: error.stack, | ||
| }); | ||
| negativeDataCache?.recordMiss(id); |
There was a problem hiding this comment.
negativeDataCache.recordMiss(id) is called for any getData() exception (excluding AbortError). If getData() fails transiently (network, circuit breaker, peer outages), this will incorrectly negative-cache the ID and short-circuit subsequent requests with 404. Limit recordMiss to confirmed “not found” failures (e.g., a typed not-found error or explicit 404 response) and don’t negative-cache generic retrieval failures.
| negativeDataCache?.recordMiss(id); | |
| if ( | |
| error?.status === 404 || | |
| error?.statusCode === 404 || | |
| error?.name === 'NotFoundError' || | |
| error?.code === 'DATA_NOT_FOUND' | |
| ) { | |
| negativeDataCache?.recordMiss(id); | |
| } |
There was a problem hiding this comment.
Not changing this — same rationale as the earlier CodeRabbit discussion. The promotion thresholds (10 misses over 4 hours) provide sufficient safety margin against sporadic transient errors. Fast re-promotion after eviction uses short initial TTLs with exponential backoff, and evict-on-success self-heals when data becomes available. Adding error-type discrimination would require threading error semantics through the composite data source layers for marginal benefit.
| this.updateGauges(); | ||
| } | ||
|
|
||
| private updateGauges(): void { |
There was a problem hiding this comment.
updateGauges() reports cache sizes via LRUCache.size, but expired TTL entries may still be counted until the cache is accessed/purged (lru-cache only evicts stale entries opportunistically). This can make negativeCacheSize/missTrackerSize misleading. Consider purging stale entries before reading size (e.g., purgeStale()) or enabling an approach that keeps the gauge accurate with TTL-based eviction.
| private updateGauges(): void { | |
| private updateGauges(): void { | |
| // Ensure TTL-expired entries are removed before reporting sizes | |
| this.negativeCache.purgeStale(); | |
| this.missTracker.purgeStale(); |
There was a problem hiding this comment.
Not changing this. With ttlResolution: 0, staleness is checked on every get() call, so expired entries are pruned on access. The gauge will be slightly overstated between accesses but purgeStale() has a meaningful performance cost (iterates the full cache) and updateGauges() is called on every recordMiss() and evict(). The marginal metric accuracy improvement doesn't justify the overhead.
| this.negativeCache.set(id, true); | ||
| this.missTracker.delete(id); | ||
| metrics.negativeCachePromotionsTotal.inc(); | ||
| this.log.info('ID promoted to negative cache', { id }); | ||
| } | ||
|
|
||
| this.updateGauges(); | ||
| } | ||
|
|
||
| evict(id: string): void { | ||
| if (this.negativeCache.has(id)) { | ||
| metrics.negativeCacheEvictionsTotal.inc(); | ||
| this.log.info('ID evicted from negative cache', { id }); | ||
| } |
There was a problem hiding this comment.
This emits info-level logs on every promotion/eviction. Depending on traffic patterns, this can become high-volume and add significant log noise/cost. Consider lowering to debug, sampling, or gating behind a separate verbosity flag while keeping metrics as the primary signal.
There was a problem hiding this comment.
Keeping this at info. The promotion thresholds (10 misses over 4 hours by default) make these events infrequent enough that info is appropriate — these are operationally significant events that operators should see without enabling debug logging.
| it('metrics increment correctly', () => { | ||
| const hitsBefore = | ||
| (metrics.negativeCacheHitsTotal as any).hashMap[''].value ?? 0; | ||
| const missesBefore = | ||
| (metrics.negativeCacheMissesTotal as any).hashMap[''].value ?? 0; | ||
| const promotionsBefore = | ||
| (metrics.negativeCachePromotionsTotal as any).hashMap[''].value ?? 0; | ||
| const evictionsBefore = | ||
| (metrics.negativeCacheEvictionsTotal as any).hashMap[''].value ?? 0; |
There was a problem hiding this comment.
The metrics assertions read prom-client internals via (counter as any).hashMap, which is not part of the public API and is prone to break across prom-client versions. Prefer using the public metric APIs (e.g., Counter.get()/register.getSingleMetric().get()) to fetch the current value so the test remains stable.
There was a problem hiding this comment.
Already addressed — the tests no longer use (counter as any).hashMap. They use the public .get() API (see getValue helper in the current test file).
| negativeDataCache?.recordMiss(id); | ||
| span.setAttribute('http.status_code', 404); | ||
| sendNotFound(res); |
There was a problem hiding this comment.
negativeDataCache.recordMiss(id) is called on any exception from getDataAttributes(). That catch can represent transient upstream/DB failures (not a confirmed 404), so this can promote valid IDs into the negative cache and force 404s for the TTL. Consider recording a miss only when the failure is definitively “not found” (e.g., error has a 404 status / specific not-found error type), and avoid negative-caching on generic/internal errors (potentially return 5xx instead).
| negativeDataCache?.recordMiss(id); | |
| span.setAttribute('http.status_code', 404); | |
| sendNotFound(res); | |
| const status = error?.statusCode ?? error?.status; | |
| const code = error?.code; | |
| const isNotFoundError = | |
| status === 404 || | |
| code === 'NOT_FOUND' || | |
| code === 'NotFoundError'; | |
| if (isNotFoundError) { | |
| negativeDataCache?.recordMiss(id); | |
| span.setAttribute('http.status_code', 404); | |
| sendNotFound(res); | |
| } else { | |
| span.setAttribute('http.status_code', 502); | |
| res | |
| .status(502) | |
| .send('Upstream error while retrieving data attributes'); | |
| } |
There was a problem hiding this comment.
Same as above — already addressed in the current code. The getDataAttributes() catch blocks do not call recordMiss().
| log.warn('Unable to retrieve contiguous data:', { | ||
| dataId: id, | ||
| message: error.message, | ||
| stack: error.stack, | ||
| }); | ||
| negativeDataCache?.recordMiss(id); | ||
| sendNotFound(res); |
There was a problem hiding this comment.
negativeDataCache.recordMiss(id) is invoked for any getData() failure (other than AbortError). getData() can fail for transient reasons (timeouts, upstream errors), and negative-caching those failures can incorrectly short-circuit future requests with 404s. Record misses only for confirmed not-found cases (e.g., explicit not-found error / 404), and avoid negative-caching generic retrieval failures.
There was a problem hiding this comment.
Same as above — intentionally not changing. See discussion on the CodeRabbit thread for full rationale.
…985) Two-phase cache that tracks IDs consistently missing over configurable count and duration thresholds, then short-circuits future requests with 404 responses. Reduces unnecessary upstream load from repeated requests for non-existent data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
has() does not update LRU recency, so frequently checked negative IDs could be evicted despite being hot. get() properly updates recency and prunes stale TTL entries on access. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
b50f7e0 to
f1bba3b
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/data/negative-data-cache.test.ts (1)
155-192: Consider using a more stable approach for metric assertions.The test accesses the internal
hashMapstructure of prom-client counters (e.g.,(metrics.negativeCacheHitsTotal as any).hashMap[''].value). This relies on prom-client internals that could change between versions.Consider using the public
.get()method on the metric if available, or accepting this as a pragmatic approach for internal testing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/data/negative-data-cache.test.ts` around lines 155 - 192, The test reads prom-client internals (hashMap) to assert metric deltas; replace those fragile accesses by calling the public get() on each counter (metrics.negativeCacheHitsTotal.get(), metrics.negativeCacheMissesTotal.get(), metrics.negativeCachePromotionsTotal.get(), metrics.negativeCacheEvictionsTotal.get()) and extract the matching metric value (e.g., find the entry with labels {} or label '' then use its .value) to compute before/after deltas for assertions; update the four places in the test (hitsBefore/hitsAfter, missesBefore/missesAfter, promotionsBefore/promotionsAfter, evictionsBefore/evictionsAfter) to use this stable .get() approach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/data/negative-data-cache.test.ts`:
- Around line 155-192: The test reads prom-client internals (hashMap) to assert
metric deltas; replace those fragile accesses by calling the public get() on
each counter (metrics.negativeCacheHitsTotal.get(),
metrics.negativeCacheMissesTotal.get(),
metrics.negativeCachePromotionsTotal.get(),
metrics.negativeCacheEvictionsTotal.get()) and extract the matching metric value
(e.g., find the entry with labels {} or label '' then use its .value) to compute
before/after deltas for assertions; update the four places in the test
(hitsBefore/hitsAfter, missesBefore/missesAfter,
promotionsBefore/promotionsAfter, evictionsBefore/evictionsAfter) to use this
stable .get() approach.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8bc66ab8-83ff-4ae6-ba95-0ff9eefd166f
📒 Files selected for processing (9)
docs/envs.mdsrc/config.tssrc/constants.tssrc/data/negative-data-cache.test.tssrc/data/negative-data-cache.tssrc/metrics.tssrc/routes/data/handlers.tssrc/routes/data/index.tssrc/system.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- src/routes/data/handlers.ts
- docs/envs.md
- src/config.ts
Inject time into lru-cache via perf.now and ttlResolution: 0 so TTL tests don't rely on setTimeout. Use public prom-client API for metrics assertions instead of internal hashMap access. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/data/negative-data-cache.ts (2)
18-42: Add TSDoc for the exported cache API.This introduces a new exported class and public methods, but none of the promotion/lookup/eviction behavior is documented yet. Please add TSDoc on the class and each public method so the thresholds and side effects are discoverable.
As per coding guidelines,
**/*.{ts,tsx,js}: Add or improve TSDoc comments when modifying code to enhance documentation.Also applies to: 62-124
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/data/negative-data-cache.ts` around lines 18 - 42, Add TSDoc comments to the exported NegativeDataCache class and every public method (including the constructor and any methods that perform promotion, lookup, add, eviction or clearing) describing purpose, parameters, return values, thresholds (missCountThreshold, missDurationMs, ttlMs, maxSize), side effects (when entries are promoted to negativeCache, when missTracker is incremented, eviction behavior) and any important usage notes; specifically document the NegativeDataCache class, its constructor signature, and each public method so callers can discover the cache semantics, threshold behavior and observable side effects.
54-59: Remove hardcodedttlResolution: 0from production cache and use the default or make it configurable.The
ttlResolution: 0setting forceslru-cacheto check time on every staleness test, reducing performance. Per the lru-cache documentation,ttlResolution: 0is "theoretically unnecessary" and the defaultttlResolution: 1is recommended for production caches. The commit message shows this setting was added for deterministic testing with an injectednow()function, but it applies unconditionally to the production code path as well.Either:
- Use the default
ttlResolution(1) in production and only set to 0 in tests, or- Make
ttlResolutiona constructor parameter (likenowalready is) to allow test control without hardcoding in production.Since
isNegativelyCached()runs on the request path, the performance overhead is avoidable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/data/negative-data-cache.ts` around lines 54 - 59, The LRUCache instantiation in the NegativeDataCache (this.negativeCache) currently hardcodes ttlResolution: 0 which hurts production performance; change the constructor for NegativeDataCache to accept an optional ttlResolution (or default to undefined/1) alongside the existing now param and pass that value into the LRUCache options (or omit the option so the library default is used), ensuring tests can pass 0 while production uses the default resolution; update the constructor signature and the LRUCache options creation in the class that initializes this.negativeCache to reference the new ttlResolution parameter instead of the hardcoded 0.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/data/negative-data-cache.ts`:
- Around line 116-123: evict() still performs has/delete and updates gauges even
when the negative cache is disabled, causing unnecessary per-request overhead;
modify the evict(id: string) method to return early when the cache is disabled
(same condition used by isNegativelyCached()/recordMiss()), so that it skips
checking this.negativeCache.has(id), this.negativeCache.delete(id),
this.missTracker.delete(id) and this.updateGauges() when disabled; ensure the
early-return check references the same disabled flag or helper used elsewhere to
keep behavior consistent.
---
Nitpick comments:
In `@src/data/negative-data-cache.ts`:
- Around line 18-42: Add TSDoc comments to the exported NegativeDataCache class
and every public method (including the constructor and any methods that perform
promotion, lookup, add, eviction or clearing) describing purpose, parameters,
return values, thresholds (missCountThreshold, missDurationMs, ttlMs, maxSize),
side effects (when entries are promoted to negativeCache, when missTracker is
incremented, eviction behavior) and any important usage notes; specifically
document the NegativeDataCache class, its constructor signature, and each public
method so callers can discover the cache semantics, threshold behavior and
observable side effects.
- Around line 54-59: The LRUCache instantiation in the NegativeDataCache
(this.negativeCache) currently hardcodes ttlResolution: 0 which hurts production
performance; change the constructor for NegativeDataCache to accept an optional
ttlResolution (or default to undefined/1) alongside the existing now param and
pass that value into the LRUCache options (or omit the option so the library
default is used), ensuring tests can pass 0 while production uses the default
resolution; update the constructor signature and the LRUCache options creation
in the class that initializes this.negativeCache to reference the new
ttlResolution parameter instead of the hardcoded 0.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 52db6677-7981-40a5-a007-9c99b681e6eb
📒 Files selected for processing (2)
src/data/negative-data-cache.test.tssrc/data/negative-data-cache.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/data/negative-data-cache.test.ts
…ent errors The miss tracker had no TTL, so stale entries could persist indefinitely and cause incorrect promotions on a single new miss. Adding a TTL equal to missDurationMs ensures entries expire naturally. Also stops recording negative cache misses when getDataAttributes throws, since exceptions indicate transient infrastructure errors, not missing data. Bumps the default NEGATIVE_CACHE_MISS_THRESHOLD_MS from 4h to 8h. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/routes/data/handlers.ts (2)
1440-1456:⚠️ Potential issue | 🟠 MajorEvict the manifest request ID on the index-resolution success path.
When
resolveFromIndex()+sendManifestResponse()succeeds, this branch returns without clearing any existing miss-tracker state forid. The directgetData()success path already self-heals at Line 1478, but this success path does not, so a later failure can still promote a now-valid manifest ID into the negative cache.Suggested change
if ( await sendManifestResponse({ log, req, res, dataAttributesSource, dataSource, requestAttributes, rateLimiter, paymentProcessor, parentSpan: span, ...manifestResolution, }) ) { + negativeDataCache?.evict(id); span.setAttribute('http.status_code', res.statusCode); span.addEvent('Manifest response sent successfully'); // Manifest response successfully sent return; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/data/handlers.ts` around lines 1440 - 1456, On the successful index-resolution + sendManifestResponse() path in handlers.ts (the branch that currently returns after span.addEvent('Manifest response sent successfully')), add the same negative-cache eviction used on the getData() success path so the manifest request ID is removed from the miss-tracker; locate the getData() success code to copy the exact eviction call (e.g., missTracker.evict(id) or evictMissingManifest(id)) and invoke it just before returning from the resolveFromIndex()/sendManifestResponse() success branch.
890-907:⚠️ Potential issue | 🟠 MajorDon't turn transient attribute-store failures into 404s.
The new comment says
getDataAttributes()throwing is an infrastructure error, but Line 905 and Line 1389 still returnNot found. That hides real 5xx conditions and can send false 404s for data that would otherwise be retrievable. Re-throw here or surface a 500 instead of callingsendNotFound(res).Suggested change
- span.setAttribute('http.status_code', 404); - sendNotFound(res); - return; + throw error;Apply the same change to both catch blocks.
Also applies to: 1374-1391
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/data/handlers.ts` around lines 890 - 907, The catch blocks that handle errors from getDataAttributes() currently log the error but then mark the span with http.status_code=404 and call sendNotFound(res), which masks transient infrastructure failures as 404s; instead, for the catch blocks around getDataAttributes() (referenced by getDataAttributes, span.setAttribute('http.status_code', ...), and sendNotFound), either re-throw the error or send a 500 response: set the span http.status_code to 500, record the exception, log the error (keep existing log.error), and call sendServerError(res) or res.status(500).send(...) rather than sendNotFound(res); make the same change in both catch blocks mentioned in the comment so transient errors surface as 5xx rather than false 404s.
🧹 Nitpick comments (2)
src/routes/data/handlers.ts (1)
796-812: Document the newnegativeDataCachedependency on the exported handlers.
createRawDataHandlerandcreateDataHandlerboth gained an optional parameter that changes 404 behavior, but the exported factories still have no TSDoc describing the short-circuit / record / evict lifecycle. Please add a brief doc block or shared interface comment.As per coding guidelines,
**/*.{ts,tsx,js}: Add or improve TSDoc comments when modifying code to enhance documentation.Also applies to: 1271-1289
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/data/handlers.ts` around lines 796 - 812, Add a TSDoc comment to the exported handler factories (createRawDataHandler and createDataHandler) that documents the new optional negativeDataCache parameter and its lifecycle: explain that when provided the handler will short-circuit 404 / negative lookups, record negative entries on misses, and evict/refresh entries when writes or TTLs occur; mention it is optional, how it alters 404 behavior, and any expected interface/semantics of NegativeDataCache (e.g., record/evict/check methods). Place the doc block immediately above each factory export so callers and generated typings clearly convey the short-circuit/record/evict behavior.src/data/negative-data-cache.ts (1)
18-63: Add TSDoc for the public cache contract.This new exported class has non-obvious behavior around promotion thresholds, TTLs, and what
evict()clears, but there’s no API-level documentation yet. A short TSDoc block on the class and its public methods would make the handler integration and tests much easier to reason about.As per coding guidelines,
**/*.{ts,tsx,js}: Add or improve TSDoc comments when modifying code to enhance documentation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/data/negative-data-cache.ts` around lines 18 - 63, Add a TSDoc block for the exported class NegativeDataCache and TSDoc for each public method (including constructor, evict, and any other public API) that documents the cache contract: explain the enabled flag behavior, how missTracker and negativeCache interact (promotion threshold via missCountThreshold and time window via missDurationMs), what TTLs (ttlMs for negativeCache and missDurationMs for missTracker) control, what evict() clears (both missTracker and negativeCache) and any side effects, and note the purpose of the now callback parameter; place the docs immediately above the class declaration and each public method so callers and tests can understand promotion, TTL and eviction semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/routes/data/handlers.ts`:
- Around line 1440-1456: On the successful index-resolution +
sendManifestResponse() path in handlers.ts (the branch that currently returns
after span.addEvent('Manifest response sent successfully')), add the same
negative-cache eviction used on the getData() success path so the manifest
request ID is removed from the miss-tracker; locate the getData() success code
to copy the exact eviction call (e.g., missTracker.evict(id) or
evictMissingManifest(id)) and invoke it just before returning from the
resolveFromIndex()/sendManifestResponse() success branch.
- Around line 890-907: The catch blocks that handle errors from
getDataAttributes() currently log the error but then mark the span with
http.status_code=404 and call sendNotFound(res), which masks transient
infrastructure failures as 404s; instead, for the catch blocks around
getDataAttributes() (referenced by getDataAttributes,
span.setAttribute('http.status_code', ...), and sendNotFound), either re-throw
the error or send a 500 response: set the span http.status_code to 500, record
the exception, log the error (keep existing log.error), and call
sendServerError(res) or res.status(500).send(...) rather than sendNotFound(res);
make the same change in both catch blocks mentioned in the comment so transient
errors surface as 5xx rather than false 404s.
---
Nitpick comments:
In `@src/data/negative-data-cache.ts`:
- Around line 18-63: Add a TSDoc block for the exported class NegativeDataCache
and TSDoc for each public method (including constructor, evict, and any other
public API) that documents the cache contract: explain the enabled flag
behavior, how missTracker and negativeCache interact (promotion threshold via
missCountThreshold and time window via missDurationMs), what TTLs (ttlMs for
negativeCache and missDurationMs for missTracker) control, what evict() clears
(both missTracker and negativeCache) and any side effects, and note the purpose
of the now callback parameter; place the docs immediately above the class
declaration and each public method so callers and tests can understand
promotion, TTL and eviction semantics.
In `@src/routes/data/handlers.ts`:
- Around line 796-812: Add a TSDoc comment to the exported handler factories
(createRawDataHandler and createDataHandler) that documents the new optional
negativeDataCache parameter and its lifecycle: explain that when provided the
handler will short-circuit 404 / negative lookups, record negative entries on
misses, and evict/refresh entries when writes or TTLs occur; mention it is
optional, how it alters 404 behavior, and any expected interface/semantics of
NegativeDataCache (e.g., record/evict/check methods). Place the doc block
immediately above each factory export so callers and generated typings clearly
convey the short-circuit/record/evict behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b2679502-4327-4026-8f24-65b3d3fe9607
📒 Files selected for processing (5)
docs/envs.mdsrc/config.tssrc/data/negative-data-cache.test.tssrc/data/negative-data-cache.tssrc/routes/data/handlers.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- docs/envs.md
- src/config.ts
- src/data/negative-data-cache.test.ts
Remove dead `lastSeenAt` field from MissTrackerEntry (never read). Use `delete()` return value instead of `has()` + `delete()` in evict() for a single atomic check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cache Reduce duration threshold default to 5 minutes so bursts trigger promotion. After first promotion, a single retry failure immediately re-promotes with doubled TTL (capped at configurable max). Eviction on successful retrieval clears promotion history entirely so the ID starts fresh. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avoid unnecessary LRU delete and gauge updates on every successful data fetch when the feature is off. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…es during outages Suppress negative cache promotions when the system-wide failure rate exceeds 80%, preventing transient errors (network timeouts, upstream 500s) from causing long-lived false 404s. Also gate recordMiss on dataAttributes being absent, so IDs with known local metadata are never negatively cached. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
NEGATIVE_CACHE_*), enabled by defaultTest plan
yarn test:file src/data/negative-data-cache.test.ts— 10 unit tests passyarn test— full suite passes (1486 pass, 0 fail)yarn build— compiles cleanlyyarn lint:check— no issuesyarn duplicate:check— no new duplicates🤖 Generated with Claude Code