
Conversation

@bzamanlooy
Collaborator

fix(metrics): correct authenticity calculation to align with formal definition

PR Type

Fix

Short Description

This PR corrects the authenticity metric calculation so that it considers all synthetic samples, not only those that are closest to a real data point.

@coderabbitai

coderabbitai bot commented Nov 4, 2025

📝 Walkthrough

This pull request refactors the authenticity metric calculation in the AlphaPrecision class within the statistical evaluation module. The calculation shifts from a direct distance comparison to a nearest-neighbor-based approach that determines, for each synthetic point, whether its nearest real point is at least as close to another real point as it is to the synthetic point. The implementation also removes the previous default centering behavior for the embedding center parameter. Corresponding unit tests are updated with adjusted tolerance values and one modified expected value to reflect the new calculation logic.
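
For orientation, here is a minimal, self-contained sketch of that comparison, assuming scikit-learn's NearestNeighbors; the function name authenticity_sketch and the variable names are illustrative, not the module's actual API:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def authenticity_sketch(x_real: np.ndarray, x_syn: np.ndarray) -> float:
    # Distance from each real point to its nearest *other* real point
    # (column 0 of kneighbors is the self-distance, so take column 1).
    real_to_real = NearestNeighbors(n_neighbors=2).fit(x_real).kneighbors(x_real)[0][:, 1]

    # For each synthetic point, the distance to and index of its closest real point.
    dist, idx = NearestNeighbors(n_neighbors=1).fit(x_real).kneighbors(x_syn)
    real_to_synth = dist.squeeze()
    real_to_synth_idx = idx.squeeze()

    # A synthetic point counts as authentic when its nearest real point is at least
    # as close to another real point as it is to the synthetic point.
    return float(np.mean(real_to_real[real_to_synth_idx] <= real_to_synth))

Every synthetic point contributes one boolean to the mean, which is the behavior this fix introduces.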

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Key areas requiring attention:
    • The new authenticity calculation logic using nearest neighbors: verify correctness of distance comparisons and ensure the boolean array computation aligns with intended metric semantics
    • Impact of removing the default emb_center centering: confirm this doesn't break downstream logic or introduce unintended behavioral changes
    • Test value changes, particularly the authenticity_OC value shift from 0.5102592592592593 to 0.4962962962962963: validate that this reflects the intended new behavior rather than a regression
    • Tolerance relaxation from 1e-8 to 1e-2: assess whether the broader tolerance is justified by inherent algorithmic differences or numerical stability considerations

Pre-merge checks

✅ Passed checks (2 passed)
  • Title check — Passed: The title clearly identifies this as a fix to the authenticity logic in metrics, specifically addressing alignment with the paper definition.
  • Description check — Passed: The description provides the PR type and a short description explaining the change, but is missing the Tests Added section and the ClickUp ticket link specified in the template.

Comment @coderabbitai help to get the list of available commands and usage tips.


coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/midst_toolkit/evaluation/quality/synthcity/statistical_eval.py (2)

120-120: Minor: Remove unnecessary whitespace change.

This appears to be an unintentional whitespace-only change.


164-171: Consider vectorizing the authenticity calculation for better performance.

The new authenticity logic correctly implements the paper definition by considering all synthetic samples. However, the explicit Python loop can be replaced with vectorized NumPy operations for significantly better performance, especially with large datasets.

Apply this diff to vectorize the calculation:

-        authen = []
-        for syn_idx in range(x_syn.shape[0]):
-            d_real_to_synthetic = closest_real_to_synth_distance[syn_idx]
-            d_real_to_real = real_to_real[closest_real_to_synthetic_idx_list[syn_idx]]
-            is_authentic = d_real_to_real <= d_real_to_synthetic
-            authen.append(is_authentic)
-
-        authenticity = np.mean(authen)
+        authen = real_to_real[closest_real_to_synthetic_idx_list] <= closest_real_to_synth_distance
+        authenticity = np.mean(authen)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2dbe6d and b1d942f.

📒 Files selected for processing (2)
  • src/midst_toolkit/evaluation/quality/synthcity/statistical_eval.py (2 hunks)
  • tests/unit/evaluation/quality/test_alpha_precision.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-code-check
  • GitHub Check: unit-tests
  • GitHub Check: integration-tests
🔇 Additional comments (1)
tests/unit/evaluation/quality/test_alpha_precision.py (1)

57-62: The tolerance relaxation and changed expected value warrant justification.

I could not run the test directly because of missing dependencies in the sandbox environment, but the concerns about these changes hold based on testing best practices.

The pytest.approx default is a relative tolerance of 1e-6 (one part in a million), and the change from 1e-8 to 1e-2 is a 1,000,000× relaxation of the precision requirement. Absolute and relative tolerances matter for CI across different systems and architectures, and larger tolerances are typically needed for single-precision than for double-precision arithmetic.

The authenticity_naive change from 0.9905 to 0.7695 (a drop of roughly 22 percentage points) is substantial and requires justification. Without access to the implementation changes that prompted this adjustment, I cannot definitively verify whether the tolerance increase is justified.
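
For reference, a small illustration of the pytest.approx tolerance semantics discussed above; the numeric values are hypothetical:

import pytest

# With no tolerance given, the default is a relative tolerance of 1e-6.
assert 0.5102592 == pytest.approx(0.5102593)

# abs=1e-8 demands agreement to roughly eight decimal places...
assert 0.9905185185 == pytest.approx(0.9905185186, abs=1e-8)

# ...while abs=1e-2 accepts differences as large as 0.01.
assert 0.7695 == pytest.approx(0.7750, abs=1e-2)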


Collaborator

@emersodb left a comment


Overall, great catch on this bug. Really silly error on their part. Just a few suggestions.

assert pytest.approx(0.05994074074074074, abs=1e-8) == quality_results["delta_precision_alpha_naive"]
assert pytest.approx(0.005229629629629584, abs=1e-8) == quality_results["delta_coverage_beta_naive"]
assert pytest.approx(0.9905185185185185, abs=1e-8) == quality_results["authenticity_naive"]
assert pytest.approx(0.9732668369518944, abs=1e-2) == quality_results["delta_precision_alpha_OC"]
Collaborator


These are still very loose. I think we talked about just going back to the original values I had proposed and not worrying about the cluster variations, but let me know if I'm misremembering.

Collaborator Author


Yes, right. It's a mistake on my part.

d_real_to_synthetic = closest_real_to_synth_distance[syn_idx]
d_real_to_real = real_to_real[closest_real_to_synthetic_idx_list[syn_idx]]
is_authentic = d_real_to_real <= d_real_to_synthetic
authen.append(is_authentic)
Collaborator


I may be wrong, but I believe we can forgo the for loop here by just doing

is_authentic = closest_real_to_synth_distance >= real_to_real[closest_real_to_synthetic_idx_list]
authenticity = np.mean(is_authentic.astype(int))

You may have to do a touch of reshaping on these tensors, but multi-indexing on numpy arrays normally works, I think (see the small sketch below).
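
For the reshaping, a tiny self-contained check of the multi-indexing involved, with made-up values and assuming the kneighbors outputs are (n, 1)-shaped:

import numpy as np

real_to_real = np.array([0.5, 0.2, 0.9])                    # nearest-real distance per real point
closest_real_to_synth_distance = np.array([[0.1], [1.0]])   # (n_syn, 1) output of kneighbors
closest_real_to_synthetic_idx_list = np.array([[1], [2]])   # (n_syn, 1) output of kneighbors

d_synth = closest_real_to_synth_distance.squeeze()
idx = closest_real_to_synthetic_idx_list.squeeze()
is_authentic = real_to_real[idx] <= d_synth
print(np.mean(is_authentic.astype(int)))  # 0.5: only the second synthetic point is authentic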


# See which one is bigger

authen = real_to_real[real_to_synth_args] < real_to_synth
Collaborator


This error brought to you by someone who couldn't be bothered to actually fully understand what the nearest neighbor function does...yikes.

else:
assert pytest.approx(0.9732668369518944, abs=1e-8) == quality_results["delta_precision_alpha_OC"]
assert pytest.approx(0.47238271604938276, abs=1e-8) == quality_results["delta_coverage_beta_OC"]
assert pytest.approx(0.5102592592592593, abs=1e-8) == quality_results["authenticity_OC"]
Collaborator


This is probably a naive question, but does your change affect the authenticity_OC metric as well? If so, then removing this makes sense. Just want to check.

Collaborator Author


Yes, exactly. The process for calculating authenticity is shared between them; the only difference is that, for the OC variant, the data is first embedded using the one-layer NN.
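
For concreteness, the relationship might be sketched like this, reusing the hypothetical authenticity_sketch from the walkthrough above; oc_embed stands in for the one-layer-NN embedding and is not the module's actual function name:

authenticity_naive = authenticity_sketch(x_real, x_syn)                    # raw feature space
authenticity_oc = authenticity_sketch(oc_embed(x_real), oc_embed(x_syn))   # one-class embedding space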


# Check naive authenticity as the _OC metric depends on a 1-layer NN training
# which may give different results on different architectures
expected_authenticity = 0.0
Collaborator


Based on a scan of the tests (correct me if I'm wrong), none of them have an expected authenticity greater than 0. I think we want to see at least one where we get something non-zero.
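
One way to construct such a case, sketched against the hypothetical authenticity_sketch from the walkthrough above (the real test would go through the AlphaPrecision evaluator, whose API is not reproduced here):

import numpy as np

def test_authenticity_nonzero():
    rng = np.random.default_rng(0)
    x_real = rng.normal(size=(200, 4))
    # Synthetic points placed far from the real cloud are never near-copies of real
    # points, so nearly all of them should be counted as authentic.
    x_syn = rng.normal(loc=10.0, size=(200, 4))
    assert authenticity_sketch(x_real, x_syn) > 0.9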

