[ALS-10714] Return ObfuscatedCount objects for cross-count types#248
Conversation
…-10714)
aggregate-data-sharing now emits a rich {count, display} object per
value in CATEGORICAL_CROSS_COUNT and CONTINUOUS_CROSS_COUNT responses
instead of just the obfuscated display string. The visualization
resource (and any downstream consumer that wants to plot the values)
gets the numeric count needed for chart math without having to parse
the display string.
- New ObfuscatedCount value class with int count + String display.
- aggregateCount now returns Optional<ObfuscatedCount>; the
below-threshold case carries threshold-1 as the numeric so the
bar still renders at a visible height while the display reads
"< threshold".
- randomize returns ObfuscatedCount carrying both the variance-
randomized numeric and the formatted display string.
- obfuscatedCrossCount writes ObfuscatedCount into the per-value
Map<String, Object> for CATEGORICAL_CROSS_COUNT and CONTINUOUS_CROSS_COUNT,
serialized by Jackson as {count, display}.
- COUNT and CROSS_COUNT (single-value and per-concept widgets used
outside the viz path) keep their plain-string contracts by mapping
ObfuscatedCount::display at the call sites.
- Same refactor applied to AggregateDataSharingResourceRSV3 to keep
the V3 parallel implementation consistent.
- Methods made package-private to enable direct unit tests.
- New ObfuscatedCountShapeTest covers the threshold, variance,
zero-treatment, and non-numeric cases for both V1 and V3.
…JSON-shape test Polish pass addressing self-review items on the prior commit: - ObfuscatedCount.ofInt(n) static factory replaces the inline new ObfuscatedCount(n, Integer.toString(n)) pattern. The "how do you wrap a plain int" rule belongs on the type, not scattered across callers. - Rename aggregateCount → applyThresholdFloor. The method's semantic is "if below threshold, return the floored display" — the old name suggested unrelated aggregation. Two overloads: int (the natural form, used by cross-counts) and String (kept for COUNT/CROSS_COUNT call sites that hold the value as JSON text). The Integer-vs-String aggregateCountHelper is gone. - randomize now takes int instead of String. The Integer.parseInt concern moves to the call site that actually has the string. - obfuscatedCrossCount renamed to obfuscateCrossCount, no longer mutates a Map<String, Map<String, Object>>; instead returns a fresh Map<String, Map<String, ObfuscatedCount>>. Output is strongly typed; the Object loosening that the original needed for in-place mutation is gone. - toInt(Object) helper handles the Integer-or-String input that Jackson hands us, inline at the one call site that needs it. - Comment on the threshold-1 numeric component explains why we pick that value (Plotly bar visible at the threshold ceiling) rather than 0 or threshold/2. - New ObfuscatedCountShapeTest covers the factory and pins the JSON wire shape via a round-trip test. The test failed immediately on the first run — Jackson couldn't serialize the manual class without @JsonCreator/@JsonProperty annotations. Without this test the cross-repo contract would have silently broken in production. Annotations now in place. - Same refactor applied to AggregateDataSharingResourceRSV3 to keep the parallel implementations consistent.
- Rewrite the stale processContinuousCrossCounts javadoc on V1 and V3. It described an "if obfuscation is not required, return data in string format" branch that no longer exists — both arms now unconditionally obfuscate. - Widen toInt from Integer/String to Number/String. The previous innerValue.toString() shape silently accepted Long/Double (Jackson picks Long for counts >= 2^31 and Double when JSON includes decimals). Narrowing to Integer would have crashed on those inputs; Number covers the full set without re-introducing the parse hop. Patient counts realistically don't hit Long range today, but the type at the boundary isn't pinned so don't assume. - toInt promoted to package-private + new test pinning all of Integer/Long/Double/Short/String to confirm the widening.
…ALS-10714) Add a nullable variance field to ObfuscatedCount: the half-width of the uncertainty band around count, rendered by consumers as [max(0, count - variance), count + variance]. randomize() carries the configured variance; applyThresholdFloor() now encodes below-threshold values as count 0 with variance threshold-1 (was count threshold-1) so the band rule reproduces the 0..threshold-1 display uniformly. Exact (authorized) values keep a null variance.
|
Warning Review limit reached
More reviews will be available in 20 minutes. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR introduces ChangesObfuscation refactoring
Sequence Diagram(s)Not applicable. This change refactors internal obfuscation logic without introducing new inter-component flows or user-facing interactions. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/test/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCountShapeTest.java (1)
136-140: ⚡ Quick winUse JSON-structure assertions instead of exact serialized string order.
Line 140, Line 155, and Line 158 currently make field order load-bearing despite the comment saying it isn’t. This can create brittle failures across mapper/version changes.
Proposed test-hardening diff
@@ - String json = mapper.writeValueAsString(value); - - // Exact-match assertion on serialized form. Field order is conventional for clarity but not load-bearing - // for consumers; we accept either ordering by JSON equality below. - assertEquals("{\"count\":222,\"display\":\"222 ±3\",\"variance\":3}", json); + String json = mapper.writeValueAsString(value); + assertEquals(mapper.readTree("{\"count\":222,\"display\":\"222 ±3\",\"variance\":3}"), mapper.readTree(json)); @@ - assertEquals("{\"count\":45000,\"display\":\"45000\",\"variance\":null}", mapper.writeValueAsString(ObfuscatedCount.ofInt(45000))); + assertEquals( + mapper.readTree("{\"count\":45000,\"display\":\"45000\",\"variance\":null}"), + mapper.readTree(mapper.writeValueAsString(ObfuscatedCount.ofInt(45000)))); @@ - assertEquals("{\"count\":0,\"display\":\"< 10\",\"variance\":9}", mapper.writeValueAsString(belowThreshold)); + assertEquals( + mapper.readTree("{\"count\":0,\"display\":\"< 10\",\"variance\":9}"), + mapper.readTree(mapper.writeValueAsString(belowThreshold)));Also applies to: 155-158
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/test/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCountShapeTest.java` around lines 136 - 140, The test in ObfuscatedCountShapeTest is asserting the exact serialized JSON string which makes field order load-bearing; instead deserialize the produced json (from mapper.writeValueAsString(value)) into a JSON structure (e.g., JsonNode or Map) and assert structural equality to an expected JSON structure so field order is ignored—replace the assertEquals("{\"count\":222,\"display\":\"222 ±3\",\"variance\":3}", json) (and the similar assertions at the other locations) with an assertion that compares mapper.readTree(json) (or equivalent) to an expected JsonNode built from the same fields.pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCount.java (1)
23-32: 💤 Low valueConstructor accepts null
displaywithout validation.The constructor doesn't validate that
displayis non-null. WhileofIntalways passes a valid string, direct constructor calls (e.g., from deserialization of malformed JSON or fromapplyThresholdFloor/randomizein the resource classes) could theoretically pass null. Theequalsmethod usesObjects.equalswhich handles null gracefully, so this won't crash, but a null display would be semantically invalid for this domain object.Given that all current call sites in V1/V3 construct with non-null display strings, this is low risk.
🛡️ Optional: Add null check for display
`@JsonCreator` public ObfuscatedCount( `@JsonProperty`("count") int count, `@JsonProperty`("display") String display, `@JsonProperty`("variance") Integer variance ) { this.count = count; - this.display = display; + this.display = Objects.requireNonNull(display, "display must not be null"); this.variance = variance; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCount.java` around lines 23 - 32, The ObfuscatedCount constructor currently allows a null display; add a non-null validation in the ObfuscatedCount(`@JsonProperty`("count") int count, `@JsonProperty`("display") String display, `@JsonProperty`("variance") Integer variance) constructor (used by deserialization and called from ofInt/applyThresholdFloor/randomize) and throw an explicit exception (e.g., NullPointerException or IllegalArgumentException) with a clear message when display is null to enforce the invariant that display must be non-null.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRS.java`:
- Line 533: The variable obfuscatedParents computed via
obfuscatedKeys.stream().flatMap(this::generateParents).collect(Collectors.toSet())
is never used; remove that dead computation from AggregateDataSharingResourceRS
(delete the obfuscatedParents declaration and stream call) so no unused local
remains, or if intended behavior was to include parent keys, instead merge the
result into obfuscatedKeys (e.g., addAll on obfuscatedKeys) — locate the code
around obfuscatedKeys and generateParents to apply the chosen fix.
- Around line 367-369: The code double-parses entityString which can cause an
uncaught NumberFormatException in the orElseGet lambda; modify the logic in
AggregateDataSharingResourceRS so entityString is parsed exactly once (or
parsing is attempted and its result cached) before calling
applyThresholdFloor(entityString) and before invoking
randomize/Integer.parseInt, e.g., attempt Integer.parseInt(entityString) into a
local int (or try parse and record failure), pass that value (or the
parse-failure state) into applyThresholdFloor/obfuscation logic (or fall back to
randomize using the cached int), or if prefered handle NumberFormatException
inside the orElseGet; ensure references include applyThresholdFloor,
ObfuscatedCount.display, randomize, entityString, and requestVariance so the
parse result is reused and no second parse can throw.
In
`@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRSV3.java`:
- Around line 343-345: The code re-parses entityString in the orElseGet
fallback, causing an uncaught NumberFormatException when COUNT is non-numeric;
fix by parsing entityString once into a local int (inside a try/catch) and reuse
that parsed value for the randomize fallback used by the existing expression
involving applyThresholdFloor and randomize (refer to
applyThresholdFloor(String), ObfuscatedCount::display, and randomize(...)).
Specifically, call applyThresholdFloor(entityString) first; if it returns empty,
use the already-parsed int for randomize; if parsing fails catch
NumberFormatException, log/handle it and return the safe fallback (e.g., the
original entityString or the applyThresholdFloor result) instead of letting
parse propagate.
---
Nitpick comments:
In
`@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCount.java`:
- Around line 23-32: The ObfuscatedCount constructor currently allows a null
display; add a non-null validation in the ObfuscatedCount(`@JsonProperty`("count")
int count, `@JsonProperty`("display") String display, `@JsonProperty`("variance")
Integer variance) constructor (used by deserialization and called from
ofInt/applyThresholdFloor/randomize) and throw an explicit exception (e.g.,
NullPointerException or IllegalArgumentException) with a clear message when
display is null to enforce the invariant that display must be non-null.
In
`@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/test/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCountShapeTest.java`:
- Around line 136-140: The test in ObfuscatedCountShapeTest is asserting the
exact serialized JSON string which makes field order load-bearing; instead
deserialize the produced json (from mapper.writeValueAsString(value)) into a
JSON structure (e.g., JsonNode or Map) and assert structural equality to an
expected JSON structure so field order is ignored—replace the
assertEquals("{\"count\":222,\"display\":\"222 ±3\",\"variance\":3}", json) (and
the similar assertions at the other locations) with an assertion that compares
mapper.readTree(json) (or equivalent) to an expected JsonNode built from the
same fields.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: bba9bde8-b476-456d-b7c6-15ec5c8700fe
📒 Files selected for processing (4)
pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRS.javapic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRSV3.javapic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCount.javapic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/test/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCountShapeTest.java
…ode (ALS-10714) Parse the COUNT entity string once so a non-numeric upstream response falls through unchanged instead of throwing an uncaught NumberFormatException in the orElseGet re-parse. Remove the unused obfuscatedParents computation and the generateParents helper it orphaned (dead since before this branch).
TL;DR: Moves obfuscation ownership into the API layer.
For CATEGORICAL_CROSS_COUNT / CONTINUOUS_CROSS_COUNT result types, agg-data-sharing now returns rich {count, display, variance} objects instead of bare display strings: count is the numeric bar value, display the human-readable label ("45000", "222 ±3", "< 10"), and variance the half-width of the uncertainty band (null for exact values; below-threshold encodes as count: 0, variance: threshold−1).
NOTE: Plain COUNT/CROSS_COUNT responses are unchanged (still strings). Uses the existing target.picsure.obfuscation_* properties — no new config.
Summary by CodeRabbit
New Features
Improvements
Tests