fix(metrics-authenticity): correct authenticity logic to match paper definition. #86
base: main
Conversation
📝 Walkthrough
This pull request refactors the authenticity metric calculation in the AlphaPrecision class within the statistical evaluation module. The calculation shifts from a direct distance comparison to a nearest-neighbor-based approach that determines, for each synthetic point, whether its nearest real point is closer to another real point than to the synthetic point. The implementation removes the previous default centering behavior for the embedding center parameter. Corresponding unit tests are updated with adjusted tolerance values and one modified expected value to reflect the new calculation logic.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
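To make the described check concrete, here is a minimal sketch of that nearest-neighbor authenticity logic, assuming scikit-learn's NearestNeighbors; the helper and variable names are illustrative and this is not the PR's exact implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def authenticity_sketch(x_real: np.ndarray, x_syn: np.ndarray) -> float:
    """Illustrative nearest-neighbor authenticity check (not the module's exact code)."""
    nn = NearestNeighbors(n_neighbors=2).fit(x_real)

    # Distance from each real point to its nearest *other* real point
    # (column 0 is the zero self-distance, so take column 1).
    real_to_real = nn.kneighbors(x_real)[0][:, 1]

    # For every synthetic point, find its nearest real point and the distance to it.
    real_to_synth, real_to_synth_idx = nn.kneighbors(x_syn, n_neighbors=1)
    real_to_synth = real_to_synth.squeeze(axis=1)
    real_to_synth_idx = real_to_synth_idx.squeeze(axis=1)

    # A synthetic point counts as authentic when its nearest real point is closer
    # to another real point than it is to the synthetic point.
    authentic = real_to_real[real_to_synth_idx] <= real_to_synth
    return float(np.mean(authentic))
```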
Pre-merge checks and finishing touches: ✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/midst_toolkit/evaluation/quality/synthcity/statistical_eval.py (2)
120-120: Minor: Remove unnecessary whitespace change. This appears to be an unintentional whitespace-only change.
164-171: Consider vectorizing the authenticity calculation for better performance. The new authenticity logic correctly implements the paper definition by considering all synthetic samples. However, the explicit Python loop can be replaced with vectorized NumPy operations for significantly better performance, especially with large datasets.
Apply this diff to vectorize the calculation:
```diff
-        authen = []
-        for syn_idx in range(x_syn.shape[0]):
-            d_real_to_synthetic = closest_real_to_synth_distance[syn_idx]
-            d_real_to_real = real_to_real[closest_real_to_synthetic_idx_list[syn_idx]]
-            is_authentic = d_real_to_real <= d_real_to_synthetic
-            authen.append(is_authentic)
-
-        authenticity = np.mean(authen)
+        authen = real_to_real[closest_real_to_synthetic_idx_list] <= closest_real_to_synth_distance
+        authenticity = np.mean(authen)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/midst_toolkit/evaluation/quality/synthcity/statistical_eval.py (2 hunks)
- tests/unit/evaluation/quality/test_alpha_precision.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-code-check
- GitHub Check: unit-tests
- GitHub Check: integration-tests
🔇 Additional comments (1)
tests/unit/evaluation/quality/test_alpha_precision.py (1)
57-62: Based on my investigation, I cannot run the test directly due to missing dependencies in the sandbox environment, but the concerns raised here are valid based on testing best practices.
The pytest.approx default tolerance is a relative tolerance of 1e-6 (one part in a million), and the change from abs=1e-8 to abs=1e-2 represents a 1,000,000× relaxation of the precision requirement. Absolute and relative tolerances matter for CI across different systems and architectures, with larger tolerances typically needed for single-precision versus double-precision arithmetic.
The authenticity_naive change from 0.9905 to 0.7695 (a ~22% drop) is substantial and requires justification. Without access to the implementation changes that prompted this adjustment, I cannot definitively verify whether the tolerance increase is justified.
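For reference, a small illustration of how those absolute tolerances behave, reusing a couple of the values mentioned above (the assertions below are a standalone sketch, not lines from the test suite):

```python
import pytest

# pytest.approx accepts a result when its difference from the expected value is
# within the tolerance band; abs= widens that band to an absolute amount
# (the default relative tolerance of 1e-6 still applies as a floor).
assert 0.9905185186 == pytest.approx(0.9905185185185185, abs=1e-8)  # diff ~1e-10: passes
assert 0.7695 != pytest.approx(0.9905185185185185, abs=1e-8)        # diff ~0.22: well outside the band
assert 0.9850 == pytest.approx(0.9905185185185185, abs=1e-2)        # diff ~0.0055: inside the ±0.01 band
```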
Overall, great catch on this bug. Really silly error on their part. Just a few suggestions.
```python
assert pytest.approx(0.05994074074074074, abs=1e-8) == quality_results["delta_precision_alpha_naive"]
assert pytest.approx(0.005229629629629584, abs=1e-8) == quality_results["delta_coverage_beta_naive"]
assert pytest.approx(0.9905185185185185, abs=1e-8) == quality_results["authenticity_naive"]
assert pytest.approx(0.9732668369518944, abs=1e-2) == quality_results["delta_precision_alpha_OC"]
```
These are still very loose. I think we talked about just going back to the original values I had proposed and not worrying about the cluster variations, but let me know if I'm misremembering.
Yes right. It's a mistake on my part.
```python
d_real_to_synthetic = closest_real_to_synth_distance[syn_idx]
d_real_to_real = real_to_real[closest_real_to_synthetic_idx_list[syn_idx]]
is_authentic = d_real_to_real <= d_real_to_synthetic
authen.append(is_authentic)
```
I may be wrong, but I believe we can forgo the for loop here by just doing

```python
is_authentic = closest_real_to_synth_distance < real_to_real[closest_real_to_synthetic_idx_list]
authenticity = np.mean(is_authentic.astype(int))
```
You may have to do a touch of reshaping on these tensors but multi-indexing on numpy arrays normally works I think.
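A quick standalone check of that indexing idea with made-up arrays (the names mirror the snippet above, but the data is random and this is not the module's actual tensors):

```python
import numpy as np

rng = np.random.default_rng(0)
closest_real_to_synth_distance = rng.random(5)              # one distance per synthetic point
closest_real_to_synthetic_idx_list = rng.integers(0, 3, 5)  # index of each synthetic point's nearest real point
real_to_real = rng.random(3)                                # each real point's nearest-real distance

# Loop version, mirroring the comparison done in the current implementation.
loop_result = np.array([
    real_to_real[closest_real_to_synthetic_idx_list[i]] <= closest_real_to_synth_distance[i]
    for i in range(5)
])

# Fancy-indexing version: gather real_to_real at the nearest-real indices in one shot.
vectorized_result = real_to_real[closest_real_to_synthetic_idx_list] <= closest_real_to_synth_distance

assert np.array_equal(loop_result, vectorized_result)
```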
```python
# See which one is bigger
authen = real_to_real[real_to_synth_args] < real_to_synth
```
This error brought to you by someone who couldn't be bothered to actually fully understand what the nearest neighbor function does...yikes.
```python
else:
    assert pytest.approx(0.9732668369518944, abs=1e-8) == quality_results["delta_precision_alpha_OC"]
    assert pytest.approx(0.47238271604938276, abs=1e-8) == quality_results["delta_coverage_beta_OC"]
    assert pytest.approx(0.5102592592592593, abs=1e-8) == quality_results["authenticity_OC"]
```
This is probably a naive question, but does your change affect the authenticity_OC metric as well? If so, then removing this makes sense. Just want to check.
Yes, exactly. The process for calculating authenticity is shared between them; the only difference is that for the OC variant the data is first embedded using the one-layer NN.
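As a hedged sketch of that relationship (random toy data; `embed` is a hypothetical stand-in for the trained one-layer network, and `authenticity_sketch` is the illustrative helper from earlier in this thread, not the toolkit's real entry point):

```python
import numpy as np

rng = np.random.default_rng(1)
x_real = rng.normal(size=(100, 4))
x_syn = rng.normal(size=(100, 4))

# Hypothetical one-layer embedding standing in for the OC model's forward pass.
weights, bias = rng.normal(size=(4, 8)), rng.normal(size=8)

def embed(x: np.ndarray) -> np.ndarray:
    return np.tanh(x @ weights + bias)

# The same routine computes both metrics; only the inputs differ.
authenticity_naive = authenticity_sketch(x_real, x_syn)
authenticity_oc = authenticity_sketch(embed(x_real), embed(x_syn))
```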
```python
# Check naive authenticity as the _OC metric depends on a 1-layer NN training
# which may give different results on different architectures
expected_authenticity = 0.0
```
Based on a scan of the tests (correct me if I'm wrong), none of them have an expected authenticity greater than 0. I think we want to see at least one where we get something non-zero.
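One hedged way to get a non-zero expectation with toy data (not the project's actual fixtures; `authenticity_sketch` again refers to the illustrative helper sketched earlier): real points clustered tightly and synthetic points shifted far away should score close to 1, since no synthetic sample is a near-copy of a real one.

```python
import numpy as np
import pytest


def test_authenticity_nonzero_for_distant_synthetic_points():
    # Real data tightly clustered at the origin; synthetic data shifted far away,
    # so no synthetic sample sits closer to its nearest real point than that real
    # point sits to another real point.
    rng = np.random.default_rng(42)
    x_real = rng.normal(loc=0.0, scale=0.1, size=(100, 2))
    x_syn = rng.normal(loc=5.0, scale=0.1, size=(100, 2))

    assert authenticity_sketch(x_real, x_syn) == pytest.approx(1.0)
```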
fix(metrics): correct authenticity calculation to align with formal definition
PR Type
Fix
Short Description
This PR corrects the authenticity metric calculation so that it considers all synthetic samples, not only those that are closest to a real data point.