@AntoinePassemiers
Contributor

"recovery_2" metric:

  • Found three critical bugs, so I reimplemented the metric from scratch.
  • The new version is a consistency score, defined from the spread between the maximum and minimum Spearman correlation coefficients across groups (donors).
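A minimal sketch of what a spread-based consistency score could look like. This is not the code from this PR: the function names, the choice of which two vectors are correlated per donor, and the rescaling `1 - spread / 2` are all assumptions made for illustration.

```python
import numpy as np

def rank(x):
    # Ordinal ranks; adequate for continuous values without ties.
    return np.argsort(np.argsort(x)).astype(float)

def spearman(x, y):
    # Spearman correlation = Pearson correlation of the ranks.
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def consistency_score(pred, obs, groups):
    """Hypothetical score: 1 when all donors agree, 0 at maximal spread."""
    rhos = [
        spearman(pred[groups == g], obs[groups == g])
        for g in np.unique(groups)
    ]
    spread = max(rhos) - min(rhos)  # lies in [0, 2]
    return 1.0 - spread / 2.0       # assumed rescaling to [0, 1]

# Toy usage with two donors of four observations each.
pred = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
obs = np.array([10.0, 20.0, 30.0, 40.0, 40.0, 30.0, 20.0, 10.0])
groups = np.array(["a"] * 4 + ["b"] * 4)
score = consistency_score(pred, obs, groups)
```

A small spread means the correlation is stable across donors, which is what the score rewards under these assumptions.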

"anchor_regression" metric:

  • Removed it.

"regression_3" metric:

  • Added a new "regression_3" metric to replace "anchor_regression". It is based on residual correlations. The core idea, in short: after regressing gene expression on the expression of the selected TFs, the less the residuals correlate with the non-selected TFs, the better.
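The residual-correlation idea can be sketched as follows. This is an illustrative assumption, not the implementation in this PR: the function name, the use of ordinary least squares, and the aggregation `1 - mean(|corr|)` are all hypothetical choices.

```python
import numpy as np

def residual_correlation_score(X_tf, y, selected):
    """Hypothetical score: high when residuals are uncorrelated
    with the TFs that the GRN did NOT select for this gene."""
    selected = np.asarray(selected, dtype=bool)
    # Fit gene expression on the selected TFs (with intercept).
    X_sel = np.column_stack([np.ones(len(y)), X_tf[:, selected]])
    coef, *_ = np.linalg.lstsq(X_sel, y, rcond=None)
    resid = y - X_sel @ coef
    others = X_tf[:, ~selected]
    if others.shape[1] == 0:
        return 1.0  # nothing left to explain the residuals
    corrs = [
        abs(np.corrcoef(resid, others[:, j])[0, 1])
        for j in range(others.shape[1])
    ]
    return 1.0 - float(np.mean(corrs))

# Toy data: the gene depends on TFs 0 and 2 only.
rng = np.random.default_rng(0)
X_tf = rng.normal(size=(200, 4))
y = 2.0 * X_tf[:, 0] - X_tf[:, 2] + 0.1 * rng.normal(size=200)

good = residual_correlation_score(X_tf, y, [True, False, True, False])
bad = residual_correlation_score(X_tf, y, [False, True, False, True])
```

In this toy setup, selecting the true regulators leaves residuals that are close to noise, so `good` should exceed `bad`.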

"vc" metric:

  • Fixed the same issue as in the "sem" metric: the inferred GRN was accidentally shuffled.
  • Predict perturbation effects (x_pert - x_control), rather than the expression of perturbed samples (x_pert).
  • Initialize the NN and baseline NN with the exact same parameters (except the GRN weights)
  • Use perturbation embeddings as linear operators to be applied in the latent space
  • Fixed mathematically incorrect implementation choices in GRNLayer
  • Use stratified group K-fold for CV, where groups are (perturbation, cell type) pairs
  • Improved the NN architecture and added strong regularization
  • Keep track of the best solution
  • IMPORTANT NOTE: I tested the metric on the small OP dataset. Predictive performance was very low and the final scores were very noisy. I attribute this to the small number of samples, and hope performance will scale with the dataset size.
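To illustrate the "perturbation embeddings as linear operators" item above, here is a minimal NumPy sketch. It is an assumption about the general technique, not the architecture in this PR: the names `embeddings` and `apply_perturbation`, the flattened square-matrix layout, and the residual form `z + A_p z` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_perturbations, latent_dim, n_cells = 4, 3, 5

# Hypothetical learnable table: one flattened (latent_dim x latent_dim)
# operator per perturbation.
embeddings = 0.01 * rng.normal(size=(n_perturbations, latent_dim * latent_dim))

def apply_perturbation(z, pert_ids, embeddings):
    """Apply each cell's perturbation operator to its latent vector."""
    d = z.shape[1]
    ops = embeddings[pert_ids].reshape(-1, d, d)  # (n_cells, d, d)
    # Assumed residual form: a zero embedding leaves the latent unchanged.
    return z + np.einsum("nij,nj->ni", ops, z)

z = rng.normal(size=(n_cells, latent_dim))   # latent control states
pert_ids = np.array([0, 1, 1, 2, 3])         # perturbation index per cell
z_pert = apply_perturbation(z, pert_ids, embeddings)

# With delta targets, a decoder would be trained to map the latent
# shift (z_pert - z) to x_pert - x_control rather than to x_pert itself.
latent_shift = z_pert - z
```

Because the operator acts linearly on the latent state, the same perturbation can have different effects in different cell states, which a purely additive embedding cannot express.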

@janursa janursa force-pushed the jalil branch 6 times, most recently from 66b19dd to 9903ec4 Compare October 28, 2025 13:14