Debug/default value error #18
Conversation
Okay, I think I can replicate now. It happens when values are returned in windows. Working on a fix.
Force-pushed 7d423cb to a544afc
@mukamel-lab, I think this last commit should solve the problem. If it indeed does, I will merge this and create a new release on PyPI and the conda channel.

I've tried out the new branch
…tion outputs the same as the cp.nanmean on the full matrix
@mukamel-lab, I added tests that take bigwig files and take a batch from them with window_size=1 and window_size=128, and then compare whether the batch with window_size=128 is the same as the batch with window_size=1 after a nanmean over the window. When you create the BigWigDataset (or PytorchBigWigDataset), are you doing something like this?

from bigwig_loader.dataset import BigWigDataset
import cupy as cp

dataset_with_window = BigWigDataset(
    regions_of_interest=merged_intervals,
    collection=bigwig_path,
    reference_genome_path=reference_genome_path,
    sequence_length=2048,
    center_bin_to_predict=2048,
    window_size=128,
    batch_size=32,
    batches_per_epoch=10,
    maximum_unknown_bases_fraction=0.1,
    default_value=cp.nan,
    return_batch_objects=True,
)
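For clarity, the equivalence those tests check can be sketched roughly like this; the (batch, tracks, positions) layout and the variable names below are assumptions for illustration, not the library's actual internals:

import cupy as cp

# Toy stand-in for a window_size=1 batch of values (NaN where data is missing).
batch, tracks, positions, window_size = 2, 3, 2048, 128
values_full = cp.random.rand(batch, tracks, positions).astype(cp.float32)

def windowed_nanmean(values, window):
    # Fold the position axis into (n_windows, window) and average each window,
    # ignoring NaNs; this is what a window_size=128 batch should correspond to.
    b, t, p = values.shape
    return cp.nanmean(values.reshape(b, t, p // window, window), axis=-1)

expected = windowed_nanmean(values_full, window_size)
# Against a real window_size=128 batch, the check would be something like:
# assert cp.allclose(expected, values_windowed_from_dataset, equal_nan=True)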
Sorry to ask, but did you git pull the latest changes on this branch?
I found that your test (test_get_values_from_intervals_edge_case_1) fails if I use a default value other than 0 or nan. I realized this can be fixed by making sure default_value is np.float32. However, I'm still having trouble with PytorchBigWigDataset filling in 0 instead of default_value for bins that are not at the right-hand end of the query region. My code looks like this:
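As a small illustrative aside on the np.float32 point (a hypothetical sketch, not the construction code referred to above; normalize_default_value is not a library function): casting the fill value to a 32-bit scalar keeps it consistent with the float32 output matrix.

import numpy as np

def normalize_default_value(default_value):
    # Hypothetical helper: coerce whatever the caller passes (0, -1, np.nan,
    # a plain Python float, ...) to a float32 scalar before it is used as a
    # fill value.
    return np.float32(default_value)

print(normalize_default_value(np.nan).dtype)  # float32
print(normalize_default_value(-1).dtype)      # float32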
This is very helpful. My apologies. I am drilling into this now.
…ared to the output of pyBigWig with a window function applied afterwards. Parametrized with different default values.
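The comparison described just above (loader output versus pyBigWig with a window function applied afterwards) could be sketched roughly as follows; the file path, region, and exact window semantics are placeholders, and only pyBigWig's documented values() call is assumed:

import numpy as np
import pyBigWig

def reference_windowed_values(bigwig_path, chrom, start, end, window_size, default_value=np.nan):
    # Fetch per-base values from pyBigWig (NaN for missing bases), substitute
    # the default_value under test, then average over fixed-size windows.
    bw = pyBigWig.open(bigwig_path)
    values = bw.values(chrom, start, end, numpy=True).astype(np.float32)
    bw.close()
    values = np.where(np.isnan(values), np.float32(default_value), values)
    n_windows = (end - start) // window_size
    return np.nanmean(values[: n_windows * window_size].reshape(n_windows, window_size), axis=1)

# Hypothetical usage against a 2048 bp region with 128 bp windows:
# reference = reference_windowed_values("example.bw", "chr1", 1000000, 1002048, window_size=128)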
I came up with a fix (using ChatGPT! -- I have no experience with CUDA programming) which appears to work for me. Please check if this looks okay, and thanks! https://github.com/mukamel-lab/bigwig-loader/tree/mukamel-lab-patch-1
…stom_position_sampler and custom_track_sampler options to the dataset objects.
@mukamel-lab, I will look into your CUDA code. In the meantime I added more tests. There is also a test using the actual PytorchBigWigDataset (instead of the machinery underneath), and I still have trouble replicating what you're seeing (also on the bigwig file you sent me). To make things easier to replicate, I added a "custom_position_sampler" option to the dataset objects, which can simply be an Iterable of (chromosome, center position) tuples like [("chr1", 73674382), ("chr6", 725209), ...]. Could you maybe change this test so that it fails on your file:
Or tell me how what that test checks is not what you mean. I also wondered: when you figured out that the "default_value" was incorrectly not a float32 when it was not 0 or NaN (thanks again for that), did you also try cp.float32(cp.nan) as default_value? What could have happened is that positions in the float32 tensor were somehow being overwritten with 64 bits in a row, leaving the first 32 bits of the 64-bit NaN in each position. I am just guessing at this point what could be happening, but it is hard to debug since (unlike before) I am not finding those zeros.
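To make that concrete, a call using the new option might look roughly like this; merged_intervals, bigwig_path, and reference_genome_path are the same placeholders as in the earlier snippet, and the keyword names simply follow the description above:

import cupy as cp
from bigwig_loader.dataset import BigWigDataset

# An Iterable of (chromosome, center position) tuples, as described above.
positions = [("chr1", 73674382), ("chr6", 725209)]

dataset = BigWigDataset(
    regions_of_interest=merged_intervals,
    collection=bigwig_path,
    reference_genome_path=reference_genome_path,
    sequence_length=2048,
    center_bin_to_predict=2048,
    window_size=128,
    batch_size=32,
    batches_per_epoch=10,
    maximum_unknown_bases_fraction=0.1,
    default_value=cp.float32(cp.nan),   # 32-bit fill value, per the dtype fix discussed earlier
    custom_position_sampler=positions,  # fixed positions make runs easy to replicate
    return_batch_objects=True,
)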
It works for me now! I think I must have been loading the wrong version of the module. I am now able to get the expected behavior from PytorchBigWigDataset. Thank you!!
Great! I was beginning to fear we were chasing a Heisenbug.
See comment: #17 (comment)
I cannot replicate the behavior where there are zeros in the output matrix where there should not be.