Skip to content

Refactor synthetic_data to use variable_elimination with evidence#93

Open
ryan112358 wants to merge 3 commits intomasterfrom
refactor-synthetic-data-ve-evidence-12391338670753120305
Open

Refactor synthetic_data to use variable_elimination with evidence#93
ryan112358 wants to merge 3 commits intomasterfrom
refactor-synthetic-data-ve-evidence-12391338670753120305

Conversation

@ryan112358
Copy link
Owner

Refactor synthetic_data to use variable_elimination with evidence.

The synthetic_data method was manually computing conditional probabilities by projecting onto parent cliques and performing division and cumsum. This refactor replaces that logic with marginal_oracles.variable_elimination, which now supports an evidence parameter (including vector evidence).

To prevent Out-Of-Memory errors when slicing large potentials with vector evidence (which expands the domain by N), the implementation first projects the model onto the relevant clique (target + parents) using self.project. This reduces the factor size before applying the evidence via variable_elimination.

The sampling logic for method='sample' and method='round' is preserved, ensuring that the statistical properties of the synthetic data (especially conditional correlations) are maintained.


PR created automatically by Jules for task 12391338670753120305 started by @ryan112358

This change refactors `MarkovRandomField.synthetic_data` to use `variable_elimination` with the `evidence` parameter for generating conditional distributions.

It implements a hybrid approach to avoid OOM errors on large potentials:
1. First, `self.project` is used to obtain the joint marginal of the target column and its parents. This leverages cached marginals and efficient elimination.
2. Then, `variable_elimination` is called on this projected factor with the vector evidence. This correctly handles slicing and normalization per row.

This simplifies the code by removing manual conditional probability calculations and ensures consistency with `variable_elimination` logic. It also preserves the stratified sampling logic for `method='round'`.

Co-authored-by: ryan112358 <8495634+ryan112358@users.noreply.github.com>
@google-labs-jules
Copy link
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

google-labs-jules bot and others added 2 commits February 11, 2026 18:04
This commit refactors `MarkovRandomField.synthetic_data` to use `variable_elimination` directly with the `evidence` parameter, replacing the previous manual conditional probability logic. This is cleaner and leverages existing marginal oracles.

To support this efficiently with vector evidence (large N), `variable_elimination` in `src/mbi/marginal_oracles.py` has been improved to use a lazy slicing strategy. Instead of immediately slicing all potentials (which can explode memory usage to N * prod(domain)), it treats evidence variables as "retained" and delays slicing until it is beneficial (when the factor size reduction outweighs the cost of introducing the N dimension) or unavoidable. It also enforces consistency to ensure sliced and unsliced versions of the same variable are not mixed.

Additionally, `test/test_markov_random_field.py` was updated to correctly pass log-space potentials to `MarkovRandomField`, as `variable_elimination` expects log-potentials. This fixes a latent bug exposed by the refactor.

Co-authored-by: ryan112358 <8495634+ryan112358@users.noreply.github.com>
This commit fixes CI failures in `pytype` for `src/mbi/marginal_oracles.py`.
1. Added missing `import numpy as np`.
2. Updated `variable_elimination` type hint for `evidence` to allow `jax.Array` and `np.ndarray`.
3. Changed `evidence = evidence or {}` to explicit `if evidence is None: evidence = {}` to help pytype's inference engine understand that `evidence` is never None when accessed later.

Co-authored-by: ryan112358 <8495634+ryan112358@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant