@alexblnn commented Jul 28, 2022

As discussed with @tbng, @bthirion and @pneuvial, this PR aims to add knockoff aggregation using e-values, as described in Ren and Barber (2022) (https://arxiv.org/pdf/2205.15461.pdf).

@tbng (Collaborator) commented Jul 28, 2022

Thanks a lot! It would be good to also add a small test that checks the FDP and power of the method on simulated data (ideally FDP < target FDR and power > 0); see the existing examples in the test folder.
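Such a test could rely on a small helper that computes the FDP and power of a selection against the known support of the simulated data. The sketch below is a minimal version; the name `fdp_and_power` is hypothetical and not part of the knockoff module. A test would then assert something like `fdp <= fdr` and `power > 0` on the indices returned by the aggregation function under test. Since FDR control holds in expectation, a single simulated FDP can occasionally exceed the target, so averaging over a few seeds makes such a test less flaky.

```python
import numpy as np


def fdp_and_power(selected, true_support):
    """Return (FDP, power) of a selection given the true support.

    FDP   = #false selections / max(#selections, 1)
    power = #true selections  / max(#true features, 1)
    """
    # Deduplicate so repeated indices are not counted twice.
    selected = np.unique(np.asarray(selected, dtype=int))
    true_support = np.unique(np.asarray(true_support, dtype=int))
    n_true_selected = np.intersect1d(selected, true_support).size
    fdp = (selected.size - n_true_selected) / max(selected.size, 1)
    power = n_true_selected / max(true_support.size, 1)
    return fdp, power


# Hypothetical toy case: 4 selections, 3 of them in a support of size 5.
fdp, power = fdp_and_power(selected=[0, 1, 2, 7], true_support=[0, 1, 2, 3, 4])
# fdp = 0.25, power = 0.6
```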

Relatedly, we may need a general procedure for constructing e-values, since as far as I'm aware there is no such implementation in Python.
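In the derandomized procedure of Ren and Barber (2022), the e-values of one knockoff run are built from its statistics W and its knockoff threshold T as e_j = p · 1{W_j ≥ T} / (1 + #{k : W_k ≤ -T}), and the e-values of the M runs are averaged before applying e-BH. The sketch below assumes the per-run thresholds have already been computed (e.g. with the knockoff+ rule at a level α_kn chosen by the user); `knockoff_evalues` and `aggregate_evalues` are hypothetical names, not functions of the existing module.

```python
import numpy as np


def knockoff_evalues(W, threshold):
    """e-values of one knockoff run (Ren & Barber, 2022).

    e_j = p * 1{W_j >= T} / (1 + #{k : W_k <= -T})
    If T is np.inf (the run selects nothing), every e-value is 0.
    """
    W = np.asarray(W, dtype=float)
    p = W.size
    denom = 1.0 + np.sum(W <= -threshold)
    return p * (W >= threshold) / denom


def aggregate_evalues(W_runs, thresholds):
    """Average the e-values of M knockoff runs (derandomization step)."""
    evals = [knockoff_evalues(W, t) for W, t in zip(W_runs, thresholds)]
    return np.mean(evals, axis=0)


# Two toy runs over p = 5 features, with assumed per-run thresholds.
W_runs = [[3, -1, 2, 0.5, -2], [3, 1, -1, 0.5, 0.2]]
thresholds = [2.0, 0.5]
avg_evals = aggregate_evalues(W_runs, thresholds)
# run 1 -> [2.5, 0, 2.5, 0, 0], run 2 -> [2.5, 2.5, 0, 2.5, 0]
# avg_evals -> [2.5, 1.25, 1.25, 1.25, 0.0]
```

The averaged e-values are what the e-BH step then thresholds at the target FDR.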

Note that there is a lot of disorganized code in the knockoff module that I plan to refactor soon, but I have not found the time yet. That said, I will do it after this PR is merged.

@bthirion (Collaborator)

Indeed, this PR needs a test and an example.

@tbng (Collaborator) commented Jul 28, 2022

Also note that Zhimei Ren has an R implementation here: https://github.com/zhimeir/derandomized_knockoffs_fdr

@bthirion (Collaborator) left a comment

Sounds promising, thanks!

Nguyen et al. (2020) Aggregation of Multiple Knockoffs
https://arxiv.org/abs/2002.09269
To reduce the script's runtime, it is advisable to increase the n_jobs parameter.
Collaborator

What is the status of this 'examples_not_exhibited' directory? If these are non-working examples, it should be removed ;-)

Collaborator

This is one of my TODOs for the knockoff module: after moving the refactored knockoffs into the main branch, use a different example that builds fast (the current one uses n=500, p=1000 and 2500 simulations, so it is not the friendliest example to run).

```python
    print('Done!')


main()
```
Collaborator

We avoid this structure (wrapping the script body in a main() function) in examples, to improve readability.

Collaborator

Yes, all the examples need rework IMO.

```python
    return np.array(pvals)


def _empirical_eval(test_score, fdr=0.1, offset=1):
```
Collaborator

Why do you expose the offset parameter? I think it should be fixed.

Author

It is also exposed in the rest of the code (e.g. as a parameter of knockoff_aggregation). Should we remove it altogether?

Collaborator

I would, as I don't see any use case for changing it. @tbng, any opinion on this?

Collaborator

Yes, this was done a long time ago. Anyway, I think it's OK to remove the offset and simply use $1 + \#\{j : W_j \leq -W_k\}$ in the numerator, as written in the paper.
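With the offset removed, the knockoff+ threshold always uses the "1 +" numerator: T is the smallest t among the nonzero |W_j| such that (1 + #{j : W_j ≤ -t}) / max(1, #{j : W_j ≥ t}) ≤ fdr. A minimal loop-free sketch follows, assuming np.inf is returned when no t qualifies (the existing code may use another convention); `knockoff_plus_threshold` is a hypothetical name.

```python
import numpy as np


def knockoff_plus_threshold(W, fdr=0.1):
    """Knockoff+ threshold, always using the '1 +' numerator (no offset).

    T = min{ t in {|W_j| : W_j != 0} :
             (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= fdr }
    Returns np.inf when no candidate t satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    candidates = np.sort(np.abs(W[W != 0]))  # ascending
    # Row i counts the negatives / positives beyond the i-th candidate t.
    n_neg = np.sum(W[None, :] <= -candidates[:, None], axis=1)
    n_pos = np.sum(W[None, :] >= candidates[:, None], axis=1)
    fdp_hat = (1 + n_neg) / np.maximum(1, n_pos)
    passing = np.flatnonzero(fdp_hat <= fdr)
    # Candidates are ascending, so the first passing one is the minimum.
    return candidates[passing[0]] if passing.size else np.inf


W = np.arange(1, 11)  # ten positive statistics, no negatives
knockoff_plus_threshold(W, fdr=0.2)   # -> 1.0, since (1 + 0) / 10 = 0.1 <= 0.2
knockoff_plus_threshold(W, fdr=0.05)  # -> inf: the estimate never drops below 1/10
```

The second call shows the effect of the "1 +": with only ten features, no threshold can reach a target below 0.1. The broadcasting uses O(p²) memory; a version based on sorted cumulative counts would scale better for large p.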

```python
    evals_sorted = -np.sort(-evals)  # sort in descending order
    selected_index = 2 * n_features
    for i in range(n_features - 1, -1, -1):
        if evals_sorted[i] >= n_features / (fdr * (i + 1)):
```
Collaborator

I think that you can avoid the for loop.
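The step-up search can be vectorized: compare the descending-sorted e-values with the thresholds p / (fdr · k) for all ranks k at once and keep the largest passing rank. The sketch assumes the function returns the e-value at that rank (so that `evals >= threshold` gives the selection) and np.inf when no rank passes; the sentinel in the PR code (`selected_index = 2 * n_features`) may lead to a different return convention, and `ebh_threshold` is a hypothetical name.

```python
import numpy as np


def ebh_threshold(evals, fdr=0.1):
    """Loop-free e-BH threshold.

    With e_(1) >= ... >= e_(p) the e-values sorted in descending order,
    k_hat = max{k : e_(k) >= p / (fdr * k)} and the selected features are
    those with e_j >= e_(k_hat). Returns np.inf when no rank k qualifies.
    """
    evals = np.asarray(evals, dtype=float)
    p = evals.size
    evals_sorted = np.sort(evals)[::-1]  # descending order
    ranks = np.arange(1, p + 1)
    passing = np.flatnonzero(evals_sorted >= p / (fdr * ranks))
    if passing.size == 0:
        return np.inf
    return evals_sorted[passing[-1]]  # e-value at the largest passing rank


evals = np.array([0, 4, 11, 0, 4])
threshold = ebh_threshold(evals, fdr=0.5)      # -> 4.0
selected = np.flatnonzero(evals >= threshold)  # -> array([1, 2, 4])
```

In this example rank 2 fails (4 < 5) but rank 3 passes (4 ≥ 10/3), so three features are selected: the step-up rule keeps the largest passing rank rather than stopping at the first failure, which is what the reversed loop above does.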

```python
    else:
        return -1.0


def _ebh_threshold(evals, fdr=0.1):
```
Collaborator

This function should have a test.

@alexblnn (Author) commented Aug 3, 2022

Should I add a test_utils file to the test folder? The utils file is not tested as of now.

Collaborator

Yes, please.

@tbng (Collaborator) commented Sep 9, 2022

Resurrecting this PR, as I'm working on finishing it right now. Edit: together with @alexblnn.

