Skip to content

Conversation

@filippozacchei
Copy link

Introduction of Weighted Samples in SINDy.fit()

This pull request introduces support for weighted samples in both standard and weak formulations of SINDy.
It enables users to provide per-sample or per-trajectory weights that influence model fitting, improving control over data importance and uncertainty propagation.

Main Changes

_core._expand_sample_weights

  • Introduces a unified weight-expansion utility to validate and broadcast user-provided sample weights.
  • In standard SINDy, weights are expanded to match the total number of samples (n_samples × 1).
  • In weak SINDy, each trajectory’s scalar weight is repeated according to the number of weak test functions (`(
  • (n_trajectories \cdot K) × 1`).
  • Strict validation ensures that:
    • Weak mode accepts exactly one scalar weight per trajectory.
    • Standard mode allows either scalars or per-sample arrays.

weighted_weak_pde_library

  • Implements Generalized Least Squares (GLS) whitening for weak-form regression.

  • When non-uniform spatiotemporal weights are provided, the feature matrix (Θ) and right-hand side (V) are prewhitened using the Cholesky factor of the covariance matrix: $\mathrm{Cov}[V] = V' \Sigma (V')^\top$

    where:

    • Σ represents the pointwise sample weights.
    • V′ are the weak integration matrices for the right-hand side.

Automated Tests

  • Added two new test suites to validate the weighting behavior.

@Jacob-Stevens-Haas Jacob-Stevens-Haas linked an issue Oct 14, 2025 that may be closed by this pull request
@Jacob-Stevens-Haas
Copy link
Member

Jacob-Stevens-Haas commented Oct 14, 2025

Hey, thanks for this PR! Some quick thoughts:

  • Weak form is going through a big refactor, and the WeakPDELibrary will be deprecated in the next release. In it's place, WeakSINDy will be a subclass of _BaseSINDy. This will allow removing a lot of if weak: that is sprinkled throughout the package, as well as enable prediction and simulation of weak models and pave the way for people to implement different basis functions. IDK how you want to proceed - wait to see what weak branch comes up with in a couple weeks, or just continue with strict "1-weight-per-trajectory" for now so it's the same for weak and regular?
  • On a related note, why is GLS whitening desired?
  • I know I steered you away from SampleConcatter before, but maybe calls to _expand_sample_weights should go there, rather than in fit() and score. It might be able to reduce redundant code. It helps if the logic to handle flattening axes only occurs in one place, since otherwise, developers need to keep multiple axis layout conventions in mind.
  • Is STLSQ the only optimizer that supports sample weight?
  • Add type annotations wherever you touch a function signature

@filippozacchei
Copy link
Author

Thanks for your comment.

I would say then to just use sample weights for only standard Sindy then. The GLS whitening is needed for Weak SINDy if we have samples of the same trajectory of different importance, but I would avoid for the time being.

Will look into SampleConcatter and update!

Copy link
Member

@Jacob-Stevens-Haas Jacob-Stevens-Haas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay! I can finally begin to prioritize this PR. As I mentioned, it's best to leave Weak for another day, and to put the sample array alignment in the same place as trajectories are already being flattened and combined (SampleConcatter). Other than that, there's a few tweaks to _expand... and to the tests

from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_is_fitted

set_config(enable_metadata_routing=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave this up to the user. scikit-learn documentation explains that if they're using a pipeline or composite transform, they'll need this.

Comment on lines +299 to +301
self.set_fit_request(sample_weight=True)
self.set_score_request(sample_weight=True)
self.optimizer.set_fit_request(sample_weight=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, leave this up to the user

x_dot=None,
u=None,
feature_names: Optional[list[str]] = None,
sample_weight=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provide static type

Names for the input features (e.g. :code:`['x', 'y', 'z']`).
If None, will use :code:`['x0', 'x1', ...]`.
sample_weight : float or array-like of shape (n_samples,), optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why allow a single float?

Also, shouldn't this be (*n_spatial, n_time)? Or a list of those arrays, for when multiple trajectories are set?

x_dot = concat_sample_axis(x_dot)
self.model = Pipeline(steps)
self.model.fit(x, x_dot)
self.model.fit(x, x_dot, sample_weight=sample_weight)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the Pipeline and directly call the components. That way, we don't need to rely on metadata routing internally


# --- Fit without explicit sample weights ---
sindy.fit(X_trajs, t=0.1, x_dot=Xdot_trajs)
coef_unweighted = np.copy(sindy.model.named_steps["model"].coef_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use sindy.optimizer.coef_

# --- Fit with sample weights to emphasize trajectory 3 (different system) ---
sample_weight = [np.ones(len(x_a)), np.ones(len(x_a)), 10 * np.ones(len(x_b))]
sindy.fit(X_trajs, t=0.1, x_dot=Xdot_trajs, sample_weight=sample_weight)
coef_weighted = np.copy(sindy.model.named_steps["model"].coef_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Comment on lines +58 to +64
# 3. Weighted model should bias toward system B coefficients
# since trajectory B had much higher weight
# True systems differ by factor of 2
ratio = np.mean(np.abs(coef_weighted / coef_unweighted))
assert (
ratio > 1.05
), "Weighted coefficients should reflect stronger influence from system B"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need both an error message and description (prefer the error message). Also, make ratio > 1.5, since it should be substantially towards system B

Suggested change
# 3. Weighted model should bias toward system B coefficients
# since trajectory B had much higher weight
# True systems differ by factor of 2
ratio = np.mean(np.abs(coef_weighted / coef_unweighted))
assert (
ratio > 1.05
), "Weighted coefficients should reflect stronger influence from system B"
# True systems differ by factor of 2
ratio = np.mean(np.abs(coef_weighted / coef_unweighted))
fail_msg = "Weighted coefficients should reflect stronger influence from system B"
assert ratio > 1.5, fail_msg

Comment on lines +51 to +56
# --- Assertions ---
# 1. Shapes are consistent
assert coef_weighted.shape == coef_unweighted.shape

# 2. The coefficients must differ when weighting is applied
assert not np.allclose(coef_weighted, coef_unweighted)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are implied by the following tests. That's justified if these are more specific. However, shape errors will show explicitly in the below code. Additionally, not np.allcolse is a less-specific version of the next assertion

Comment on lines +70 to +72
assert np.linalg.norm(coef_weighted - 2 * coef_unweighted) < np.linalg.norm(
coef_unweighted
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this is correct? It would make more sense to explicitly form coef_a and coef_b, then check the equations that you added as comments.

@Jacob-Stevens-Haas
Copy link
Member

As a heads up, we're changing the default branch from master to main, so you'll need to change the target of the PR to main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Introduce sample_weights inside fit function

2 participants