Skip to content

[ALS-10714] Return ObfuscatedCount objects for cross-count types#248

Merged
JamesPeck merged 5 commits into
mainfrom
ALS-10714-obf-in-api
Jun 11, 2026
Merged

[ALS-10714] Return ObfuscatedCount objects for cross-count types#248
JamesPeck merged 5 commits into
mainfrom
ALS-10714-obf-in-api

Conversation

@JamesPeck

@JamesPeck JamesPeck commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

TL;DR: Moves obfuscation ownership into the API layer.

For CATEGORICAL_CROSS_COUNT / CONTINUOUS_CROSS_COUNT result types, agg-data-sharing now returns rich {count, display, variance} objects instead of bare display strings: count is the numeric bar value, display the human-readable label ("45000", "222 ±3", "< 10"), and variance the half-width of the uncertainty band (null for exact values; below-threshold encodes as count: 0, variance: threshold−1).

NOTE: Plain COUNT/CROSS_COUNT responses are unchanged (still strings). Uses the existing target.picsure.obfuscation_* properties — no new config.

Summary by CodeRabbit

  • New Features

    • Enhanced privacy controls for data query responses with improved threshold-based obfuscation that consistently masks counts below privacy thresholds.
  • Improvements

    • Refined variance randomization mechanisms for consistent and predictable privacy band handling across different response types.
  • Tests

    • Comprehensive test suite added to validate privacy obfuscation behavior and data accuracy.

JamesPeck added 4 commits June 3, 2026 21:14
…-10714)

aggregate-data-sharing now emits a rich {count, display} object per
value in CATEGORICAL_CROSS_COUNT and CONTINUOUS_CROSS_COUNT responses
instead of just the obfuscated display string. The visualization
resource (and any downstream consumer that wants to plot the values)
gets the numeric count needed for chart math without having to parse
the display string.

- New ObfuscatedCount value class with int count + String display.
- aggregateCount now returns Optional<ObfuscatedCount>; the
  below-threshold case carries threshold-1 as the numeric so the
  bar still renders at a visible height while the display reads
  "< threshold".
- randomize returns ObfuscatedCount carrying both the variance-
  randomized numeric and the formatted display string.
- obfuscatedCrossCount writes ObfuscatedCount into the per-value
  Map<String, Object> for CATEGORICAL_CROSS_COUNT and CONTINUOUS_CROSS_COUNT,
  serialized by Jackson as {count, display}.
- COUNT and CROSS_COUNT (single-value and per-concept widgets used
  outside the viz path) keep their plain-string contracts by mapping
  ObfuscatedCount::display at the call sites.
- Same refactor applied to AggregateDataSharingResourceRSV3 to keep
  the V3 parallel implementation consistent.
- Methods made package-private to enable direct unit tests.
- New ObfuscatedCountShapeTest covers the threshold, variance,
  zero-treatment, and non-numeric cases for both V1 and V3.
…JSON-shape test

Polish pass addressing self-review items on the prior commit:

- ObfuscatedCount.ofInt(n) static factory replaces the inline
  new ObfuscatedCount(n, Integer.toString(n)) pattern. The
  "how do you wrap a plain int" rule belongs on the type, not
  scattered across callers.
- Rename aggregateCount → applyThresholdFloor. The method's
  semantic is "if below threshold, return the floored display"
  — the old name suggested unrelated aggregation. Two overloads:
  int (the natural form, used by cross-counts) and String (kept
  for COUNT/CROSS_COUNT call sites that hold the value as JSON
  text). The Integer-vs-String aggregateCountHelper is gone.
- randomize now takes int instead of String. The Integer.parseInt
  concern moves to the call site that actually has the string.
- obfuscatedCrossCount renamed to obfuscateCrossCount, no longer
  mutates a Map<String, Map<String, Object>>; instead returns a
  fresh Map<String, Map<String, ObfuscatedCount>>. Output is
  strongly typed; the Object loosening that the original needed
  for in-place mutation is gone.
- toInt(Object) helper handles the Integer-or-String input that
  Jackson hands us, inline at the one call site that needs it.
- Comment on the threshold-1 numeric component explains why we
  pick that value (Plotly bar visible at the threshold ceiling)
  rather than 0 or threshold/2.
- New ObfuscatedCountShapeTest covers the factory and pins the
  JSON wire shape via a round-trip test. The test failed
  immediately on the first run — Jackson couldn't serialize
  the manual class without @JsonCreator/@JsonProperty
  annotations. Without this test the cross-repo contract would
  have silently broken in production. Annotations now in place.
- Same refactor applied to AggregateDataSharingResourceRSV3 to
  keep the parallel implementations consistent.
- Rewrite the stale processContinuousCrossCounts javadoc on V1 and V3.
  It described an "if obfuscation is not required, return data in
  string format" branch that no longer exists — both arms now
  unconditionally obfuscate.
- Widen toInt from Integer/String to Number/String. The previous
  innerValue.toString() shape silently accepted Long/Double (Jackson
  picks Long for counts >= 2^31 and Double when JSON includes
  decimals). Narrowing to Integer would have crashed on those inputs;
  Number covers the full set without re-introducing the parse hop.
  Patient counts realistically don't hit Long range today, but the
  type at the boundary isn't pinned so don't assume.
- toInt promoted to package-private + new test pinning all of
  Integer/Long/Double/Short/String to confirm the widening.
…ALS-10714)

Add a nullable variance field to ObfuscatedCount: the half-width of the
uncertainty band around count, rendered by consumers as
[max(0, count - variance), count + variance]. randomize() carries the
configured variance; applyThresholdFloor() now encodes below-threshold
values as count 0 with variance threshold-1 (was count threshold-1) so
the band rule reproduces the 0..threshold-1 display uniformly. Exact
(authorized) values keep a null variance.
@JamesPeck JamesPeck requested review from Gcolon021 and ramari16 June 5, 2026 05:06
@JamesPeck JamesPeck added the enhancement New feature or request label Jun 5, 2026
@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@JamesPeck, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 20 minutes. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d86e0434-b99b-4bc8-8472-a7908faffc53

📥 Commits

Reviewing files that changed from the base of the PR and between d2109df and d7ed9ac.

📒 Files selected for processing (2)
  • pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRS.java
  • pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRSV3.java
📝 Walkthrough

Walkthrough

This PR introduces ObfuscatedCount, a typed value class combining count, display, and optional variance, then refactors COUNT and cross-count obfuscation across V1 and V3 resources to use shared threshold-floor and variance-randomization helpers instead of string-based mutations, with comprehensive test coverage.

Changes

Obfuscation refactoring

Layer / File(s) Summary
ObfuscatedCount value type
src/main/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCount.java
New immutable data class with count, display, and optional variance fields; Jackson-annotated for serialization; includes ofInt factory for exact values and standard equals/hashCode/toString.
Shared typed obfuscation helpers
src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRS.java, AggregateDataSharingResourceRSV3.java
Both V1 and V3 implement identical helpers: applyThresholdFloor(int/String) returns Optional<ObfuscatedCount> for privacy floor; randomize(int, variance) computes and returns ObfuscatedCount; obfuscateCrossCount(...) builds fresh nested map of obfuscated counts; toInt(Object) safely converts JSON numbers/strings.
V1 resource refactoring
src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRS.java
COUNT path now applies applyThresholdFloor and randomize; cross-count maps use per-key threshold logic and variance randomization; continuous and categorical variants build and serialize obfuscated maps via obfuscateCrossCount, replacing prior in-place string mutations.
V3 resource refactoring
src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRSV3.java
Parallel refactoring to V1: COUNT and cross-count variants integrate identical threshold-floor and randomization helpers; continuous cross-count Javadoc updated to document binning/fallback behavior; old string-based obfuscation methods removed.
Test validation
src/test/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCountShapeTest.java
Comprehensive test suite: threshold-floor contracts (below-threshold returns 0 with "< threshold" display; at/above returns empty); variance formatting and numeric conversion; JSON serialization shape and round-trip contracts; parallel V1/V3 behavior assertions.

Sequence Diagram(s)

Not applicable. This change refactors internal obfuscation logic without introducing new inter-component flows or user-facing interactions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit refactors with care and delight,
Trading mutations for types, oh what a sight!
ThresholdFloors whisper, ObfuscatedCounts stand,
Both V1 and V3 now beautifully planned! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.68% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: introducing ObfuscatedCount return types for cross-count API responses.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ALS-10714-obf-in-api

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/test/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCountShapeTest.java (1)

136-140: ⚡ Quick win

Use JSON-structure assertions instead of exact serialized string order.

Line 140, Line 155, and Line 158 currently make field order load-bearing despite the comment saying it isn’t. This can create brittle failures across mapper/version changes.

Proposed test-hardening diff
@@
-        String json = mapper.writeValueAsString(value);
-
-        // Exact-match assertion on serialized form. Field order is conventional for clarity but not load-bearing
-        // for consumers; we accept either ordering by JSON equality below.
-        assertEquals("{\"count\":222,\"display\":\"222 ±3\",\"variance\":3}", json);
+        String json = mapper.writeValueAsString(value);
+        assertEquals(mapper.readTree("{\"count\":222,\"display\":\"222 ±3\",\"variance\":3}"), mapper.readTree(json));
@@
-        assertEquals("{\"count\":45000,\"display\":\"45000\",\"variance\":null}", mapper.writeValueAsString(ObfuscatedCount.ofInt(45000)));
+        assertEquals(
+                mapper.readTree("{\"count\":45000,\"display\":\"45000\",\"variance\":null}"),
+                mapper.readTree(mapper.writeValueAsString(ObfuscatedCount.ofInt(45000))));
@@
-        assertEquals("{\"count\":0,\"display\":\"< 10\",\"variance\":9}", mapper.writeValueAsString(belowThreshold));
+        assertEquals(
+                mapper.readTree("{\"count\":0,\"display\":\"< 10\",\"variance\":9}"),
+                mapper.readTree(mapper.writeValueAsString(belowThreshold)));

Also applies to: 155-158

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/test/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCountShapeTest.java`
around lines 136 - 140, The test in ObfuscatedCountShapeTest is asserting the
exact serialized JSON string which makes field order load-bearing; instead
deserialize the produced json (from mapper.writeValueAsString(value)) into a
JSON structure (e.g., JsonNode or Map) and assert structural equality to an
expected JSON structure so field order is ignored—replace the
assertEquals("{\"count\":222,\"display\":\"222 ±3\",\"variance\":3}", json) (and
the similar assertions at the other locations) with an assertion that compares
mapper.readTree(json) (or equivalent) to an expected JsonNode built from the
same fields.
pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCount.java (1)

23-32: 💤 Low value

Constructor accepts null display without validation.

The constructor doesn't validate that display is non-null. While ofInt always passes a valid string, direct constructor calls (e.g., from deserialization of malformed JSON or from applyThresholdFloor/randomize in the resource classes) could theoretically pass null. The equals method uses Objects.equals which handles null gracefully, so this won't crash, but a null display would be semantically invalid for this domain object.

Given that all current call sites in V1/V3 construct with non-null display strings, this is low risk.

🛡️ Optional: Add null check for display
 `@JsonCreator`
 public ObfuscatedCount(
     `@JsonProperty`("count") int count,
     `@JsonProperty`("display") String display,
     `@JsonProperty`("variance") Integer variance
 ) {
     this.count = count;
-    this.display = display;
+    this.display = Objects.requireNonNull(display, "display must not be null");
     this.variance = variance;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCount.java`
around lines 23 - 32, The ObfuscatedCount constructor currently allows a null
display; add a non-null validation in the ObfuscatedCount(`@JsonProperty`("count")
int count, `@JsonProperty`("display") String display, `@JsonProperty`("variance")
Integer variance) constructor (used by deserialization and called from
ofInt/applyThresholdFloor/randomize) and throw an explicit exception (e.g.,
NullPointerException or IllegalArgumentException) with a clear message when
display is null to enforce the invariant that display must be non-null.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRS.java`:
- Line 533: The variable obfuscatedParents computed via
obfuscatedKeys.stream().flatMap(this::generateParents).collect(Collectors.toSet())
is never used; remove that dead computation from AggregateDataSharingResourceRS
(delete the obfuscatedParents declaration and stream call) so no unused local
remains, or if intended behavior was to include parent keys, instead merge the
result into obfuscatedKeys (e.g., addAll on obfuscatedKeys) — locate the code
around obfuscatedKeys and generateParents to apply the chosen fix.
- Around line 367-369: The code double-parses entityString which can cause an
uncaught NumberFormatException in the orElseGet lambda; modify the logic in
AggregateDataSharingResourceRS so entityString is parsed exactly once (or
parsing is attempted and its result cached) before calling
applyThresholdFloor(entityString) and before invoking
randomize/Integer.parseInt, e.g., attempt Integer.parseInt(entityString) into a
local int (or try parse and record failure), pass that value (or the
parse-failure state) into applyThresholdFloor/obfuscation logic (or fall back to
randomize using the cached int), or if prefered handle NumberFormatException
inside the orElseGet; ensure references include applyThresholdFloor,
ObfuscatedCount.display, randomize, entityString, and requestVariance so the
parse result is reused and no second parse can throw.

In
`@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRSV3.java`:
- Around line 343-345: The code re-parses entityString in the orElseGet
fallback, causing an uncaught NumberFormatException when COUNT is non-numeric;
fix by parsing entityString once into a local int (inside a try/catch) and reuse
that parsed value for the randomize fallback used by the existing expression
involving applyThresholdFloor and randomize (refer to
applyThresholdFloor(String), ObfuscatedCount::display, and randomize(...)).
Specifically, call applyThresholdFloor(entityString) first; if it returns empty,
use the already-parsed int for randomize; if parsing fails catch
NumberFormatException, log/handle it and return the safe fallback (e.g., the
original entityString or the applyThresholdFloor result) instead of letting
parse propagate.

---

Nitpick comments:
In
`@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCount.java`:
- Around line 23-32: The ObfuscatedCount constructor currently allows a null
display; add a non-null validation in the ObfuscatedCount(`@JsonProperty`("count")
int count, `@JsonProperty`("display") String display, `@JsonProperty`("variance")
Integer variance) constructor (used by deserialization and called from
ofInt/applyThresholdFloor/randomize) and throw an explicit exception (e.g.,
NullPointerException or IllegalArgumentException) with a clear message when
display is null to enforce the invariant that display must be non-null.

In
`@pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/test/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCountShapeTest.java`:
- Around line 136-140: The test in ObfuscatedCountShapeTest is asserting the
exact serialized JSON string which makes field order load-bearing; instead
deserialize the produced json (from mapper.writeValueAsString(value)) into a
JSON structure (e.g., JsonNode or Map) and assert structural equality to an
expected JSON structure so field order is ignored—replace the
assertEquals("{\"count\":222,\"display\":\"222 ±3\",\"variance\":3}", json) (and
the similar assertions at the other locations) with an assertion that compares
mapper.readTree(json) (or equivalent) to an expected JsonNode built from the
same fields.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: bba9bde8-b476-456d-b7c6-15ec5c8700fe

📥 Commits

Reviewing files that changed from the base of the PR and between 60b9ac5 and d2109df.

📒 Files selected for processing (4)
  • pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRS.java
  • pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/AggregateDataSharingResourceRSV3.java
  • pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/main/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCount.java
  • pic-sure-resources/pic-sure-aggregate-data-sharing-resource/src/test/java/edu/harvard/hms/dbmi/avillach/ObfuscatedCountShapeTest.java

…ode (ALS-10714)

Parse the COUNT entity string once so a non-numeric upstream response
falls through unchanged instead of throwing an uncaught
NumberFormatException in the orElseGet re-parse. Remove the unused
obfuscatedParents computation and the generateParents helper it
orphaned (dead since before this branch).
@JamesPeck JamesPeck merged commit e1673f6 into main Jun 11, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant