-
Notifications
You must be signed in to change notification settings - Fork 7
WIP add e-values support #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thanks a lot! I think it is better that you also create a small test case to check the FDP and Power of the method on simulated data (ideally FDP < threshold and Power is greater than 0); you can check examples in the test folder. A related thing is that maybe we need a general procedure for creating e-value, since as far as I'm aware there is no such implementation on Python. Note that there are many unorganized things with the knockoff module that I might plan to do a refactoring soon, but I have not found the time yet. That being said, I will do it after the merge of this PR. |
|
Indeed, this PR obviously needs test + example. |
|
Also note that Zhimei Ren has an R implementation here: https://github.com/zhimeir/derandomized_knockoffs_fdr |
bthirion
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds promising, thx !
| Nguyen et al. (2020) Aggregation of Multiple Knockoffs | ||
| https://arxiv.org/abs/2002.09269 | ||
| To reduce the script runtime it is desirable to increase n_jobs parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the status of this 'examples_not_exhibited' directory ? if these are non-working examples it should be removed ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of my TODOs for the knockoff module: after refactoring knockoffs to the main branch using different example for a fast build (the current one is with n=500, p=1000 and 2500 simulations, hence not really the most friendly example to run).
| print('Done!') | ||
|
|
||
|
|
||
| main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We avoid this structure in examples to improve readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the all the examples need rework IMO.
| return np.array(pvals) | ||
|
|
||
|
|
||
| def _empirical_eval(test_score, fdr=0.1, offset=1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you expose the offset parameter ? I think it should be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is shown in the rest of the code (i.e. it is a parameter of knockoff_aggregation etc), should we remove this altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would, as I don't see any use case for changing it. @tbng any opinion on this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is done long before, anyway I think it's ok we remove the offset and simple use
| evals_sorted = -np.sort(-evals) # sort in descending order | ||
| selected_index = 2 * n_features | ||
| for i in range(n_features - 1, -1, -1): | ||
| if evals_sorted[i] >= n_features / (fdr * (i + 1)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you can avoid the for loop.
| else: | ||
| return -1.0 | ||
|
|
||
| def _ebh_threshold(evals, fdr=0.1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function should have a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should i add a test_utils file to the test section? the utils file is not tested as of now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please.
|
Resurrecting this PR as I'm working to finish it right now. Edit: with @alexblnn |
as discussed with @tbng @bthirion @pneuvial, this PR aims at adding Knockoff aggregation using e-values as described in Ren and Barber 2022 (https://arxiv.org/pdf/2205.15461.pdf)