
allen dataset: Create colab example with Allen demo dataset.#139

Open
abdelrahman725 wants to merge 1 commit into sensorium-competition:main from abdelrahman725:allen-dataset-colab-example

Conversation

Contributor

@abdelrahman725 abdelrahman725 commented Mar 17, 2026

Description

Create a Google Colab notebook to showcase the whole pipeline of downloading, exporting, and interpolating the Allen dataset.

Open In Colab

Follow up

Are there any visualizations you want in the notebook? If yes, please explain why they are needed; this helps me learn really quickly while building!

Closes #102


gitnotebooks bot commented Mar 17, 2026

Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/sensorium-competition/experanto/pull/139


codecov bot commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@abdelrahman725 abdelrahman725 force-pushed the allen-dataset-colab-example branch from 19b829d to 241f53f Compare March 17, 2026 05:03
Comment thread examples/allen_example.ipynb
Contributor

Copilot AI left a comment


Pull request overview

Adds a new Colab-oriented example notebook demonstrating how to export an Allen dataset via allen-exporter and then use Experanto to load and interpolate multiple modalities.

Changes:

  • Added examples/allen_example.ipynb with Micromamba-based setup for allensdk compatibility in Colab.
  • Added walkthrough code to export an Allen experiment, load it with experanto.Experiment, and run interpolation for all devices or a single device.


Comment thread examples/allen_example.ipynb Outdated
Comment thread examples/allen_example.ipynb Outdated
Comment thread examples/allen_example.ipynb Outdated
Comment thread examples/allen_example.ipynb
Comment thread examples/allen_example.ipynb Outdated
@schewskone schewskone self-requested a review March 17, 2026 08:16
@abdelrahman725 abdelrahman725 force-pushed the allen-dataset-colab-example branch from 241f53f to a19b38b Compare March 17, 2026 09:35
@abdelrahman725
Contributor Author

Updates

Addressed only the relevant Copilot comments

Collaborator

@schewskone schewskone left a comment


Thanks a lot for the PR, the Colab setup and environment instantiation look good.
Please address the comments to make the example even more intuitive.

Comment thread examples/allen_example.ipynb Outdated
Comment on lines +288 to +301
"import numpy as np\n",
"\n",
"# Query 100 time points spread evenly over 10 seconds\n",
"times = np.linspace(0, 10, 100)\n",
"\n",
"\n",
"# device can be: screen, treadmill, eye_tracker, or responses\n",
"screen = exp.interpolate(times, device=\"screen\")\n",
"treadmill = exp.interpolate(times, device=\"treadmill\")\n",
"eye_tracker = exp.interpolate(times, device=\"eye_tracker\")\n",
"responses = exp.interpolate(times, device=\"responses\")\n",
"\n",
"# Change here what device you want to see its interpolated signals\n",
"print(responses)"
Collaborator


A visualization of the results would be great here. Printing the screen values is not very intuitive.
Here is some code I use in one of my notebooks to check the interpolation in an interactive video plot:

# get the experiment class from experanto 
from experanto.experiment import Experiment

# set experiment folder as root
root_folder = '../../data/allen_data/experiment_951980471'

# initialize experiment object
e = Experiment(root_folder)

%load_ext autoreload
%autoreload 2
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

times = np.arange(310., 340., 0.5)
video = e.interpolate(times, device="screen")  # shape: (T, C, H, W)

# Clip and convert to uint8 to avoid matplotlib clipping warnings
video = np.clip(video, 0, 255).astype(np.uint8)
video = video.transpose(0, 2, 3, 1)  # (T, H, W, C)

n_frames, height, width, channels = video.shape
print(f"Video shape: {video.shape}")

# Handle grayscale vs color
is_grayscale = (channels == 1)
if is_grayscale:
    video = video[..., 0]  # Now shape: (T, H, W)

fig, ax = plt.subplots()

# Initialize with appropriate cmap
if is_grayscale:
    img = ax.imshow(video[0], cmap='gray', vmin=0, vmax=255)
else:
    img = ax.imshow(video[0])

ax.axis('off')

def update(frame):
    img.set_array(video[frame])
    ax.set_title(f'Frame {frame}')
    return [img]

ani = animation.FuncAnimation(fig, update, frames=n_frames, interval=50, blit=True)

plt.close(fig)
HTML(ani.to_jshtml())

Contributor Author

@abdelrahman725 abdelrahman725 Mar 19, 2026


I believe there is something wrong with the timestamp-to-frame mapping:

Running only the following:

times = np.arange(310., 340., 0.5) 
video = e.interpolate(times, device="screen")

tries to open the file data/allen_data/experiment_951980471/screen/data/00061.npy, which I can clearly see doesn't exist. The same happens with the video timestamps 4484., 4500 you mentioned below: it tries to open another non-existent file, 09631.npy

Worth mentioning that this issue occurs in both cases:

  • on my local machine, where allensdk is installed from PyPI, which is quite outdated (Nov 2023)
  • on the Colab notebook, where allensdk is installed from GitHub and should be up to date

Hmm, should we adjust the arguments to multi_session_export, since this is the function responsible for exporting the data?

cache, ids = multi_session_export(1, val_rate= 0.2, subsample_frac=1)

Contributor


@abdelrahman725 please read the code above.

You need to modify it, specifically the root_folder.
Tom just gave an example of how he visualized things - you need to match this example with the experiment created from the Allen data

Contributor Author

@abdelrahman725 abdelrahman725 Mar 19, 2026


@pollytur

I would never use that code without reading and modifying it first. root_folder is not relevant here; I've already loaded the experiment. The problem is only with the timestamps.

As I explained, those specific .npy files shown in the logs are not physically in /data; I searched for them.

Contributor Author


Don't worry, I'm still investigating and working on it. I just wanted to share this in case @schewskone knows something.

Collaborator


Hi, sorry for the late response, this is my bad. I always used #81 when working with the Allen data and forgot that it implements a small change to data loading that is not yet in the main branch. Hence the Allen data is not compatible with the main branch. Sorry for the confusion, I will address this in a small PR later today.

Contributor Author

@abdelrahman725 abdelrahman725 Mar 20, 2026


This is interesting.

I actually solved the problem by modifying _parse_trials():

...
modality = metadata.get("modality", "")
data_file_name = self.root_folder / "data" / f"{image_name}{file_format}"
...

The main issue is that the field image_name inside experiment_951980471/screen/combined_meta.json was actually ignored, and the consecutive JSON file keys (e.g. "00000", "00001") were used instead, leading it to load the wrong file path later.

While exp.interpolate then ran successfully, that introduced another issue with np.clip(video, 0, 255).astype(np.uint8).

Anyway, thanks @schewskone for letting me know, I think I will use your branch until it gets merged to main, for the sake of completing the notebook.

Ah, ok, if it's going to be a small PR today, that's good.
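The key-vs-image_name mismatch described above can be illustrated with a small self-contained sketch (the metadata entries and paths here are hypothetical; the real parsing lives in experanto's _parse_trials):

```python
from pathlib import Path

# Hypothetical combined_meta.json content: the dict keys are consecutive
# indices, while "image_name" holds the actual file stem on disk.
combined_meta = {
    "00000": {"image_name": "00012", "modality": "image"},
    "00001": {"image_name": "00013", "modality": "image"},
}

root_folder = Path("experiment_951980471/screen")
file_format = ".npy"

for key, metadata in combined_meta.items():
    # buggy behaviour: build the path from the consecutive key,
    # e.g. .../data/00000.npy, which does not exist on disk
    wrong_path = root_folder / "data" / f"{key}{file_format}"
    # fixed behaviour: build the path from the image_name field,
    # e.g. .../data/00012.npy, which is the file actually exported
    image_name = metadata.get("image_name", key)
    right_path = root_folder / "data" / f"{image_name}{file_format}"
    print(f"{wrong_path.name} -> {right_path.name}")
```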

Contributor

@pollytur pollytur Apr 14, 2026


@abdelrahman725 should this thread be resolved now?
If yes - please resolve it

Comment thread examples/allen_example.ipynb Outdated
Comment on lines +296 to +298
"treadmill = exp.interpolate(times, device=\"treadmill\")\n",
"eye_tracker = exp.interpolate(times, device=\"eye_tracker\")\n",
"responses = exp.interpolate(times, device=\"responses\")\n",
Collaborator


Maybe it would be nice to have one cell per modality besides screen, where you briefly sanity-check the shapes and print/plot a few values if it makes sense
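A sketch of what such per-modality cells could print, using synthetic stand-in arrays (in the notebook the arrays would come from exp.interpolate(times, device=...); the shapes here are made up):

```python
import numpy as np

# Synthetic stand-ins for interpolated outputs; shapes are hypothetical
times = np.linspace(0, 10, 100)
outputs = {
    "treadmill": np.random.rand(len(times), 1),
    "eye_tracker": np.random.rand(len(times), 4),
    "responses": np.random.rand(len(times), 50),
}

def summarize(name, arr):
    """Return a compact sanity check instead of dumping the raw array."""
    return f"{name}: shape={arr.shape}, NaNs={int(np.isnan(arr).sum())}"

for name, arr in outputs.items():
    print(summarize(name, arr))
```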

Comment thread examples/allen_example.ipynb Outdated
"import numpy as np\n",
"\n",
"# Query 100 time points spread evenly over 10 seconds\n",
"times = np.linspace(0, 10, 100)\n",
Collaborator


It would be nice to give the option to query over multiple time ranges by defining this cell as a function, i.e.
def interpolate_screen(experiment, timestamps) -> np.ndarray

For the Allen dataset this would be nice, as initially there are only images and later on there are videos.
Sample video time for experiment 951980471: times = np.arange(4484., 4500., 0.5)
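A minimal sketch of the suggested wrapper (the Experiment.interpolate call signature is taken from the snippets above; everything else is illustrative):

```python
import numpy as np

def interpolate_screen(experiment, timestamps) -> np.ndarray:
    """Interpolate screen frames at the given timestamps.

    Wrapping the query in a function makes it easy to re-run over
    different windows: an image block early in the session, or a
    movie block later on.
    """
    times = np.asarray(timestamps, dtype=float)
    return experiment.interpolate(times, device="screen")

# Hypothetical usage with an experiment loaded earlier in the notebook:
# image_frames = interpolate_screen(exp, np.linspace(0, 10, 100))
# movie_frames = interpolate_screen(exp, np.arange(4484., 4500., 0.5))
```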

Comment thread examples/allen_example.ipynb Outdated
Comment thread examples/allen_example.ipynb Outdated
Comment thread examples/allen_example.ipynb
@pollytur
Contributor

@abdelrahman725 please merge current main into your branch :)
The recent changes from @schewskone might also help with the issue you saw before

@abdelrahman725 abdelrahman725 force-pushed the allen-dataset-colab-example branch from a19b38b to e1d31b2 Compare April 6, 2026 15:42
@abdelrahman725
Contributor Author

Updates

Addressed review comments.

I decided to leave the eye_tracker and treadmill visualizations for now, because I wanted first to make sure that the current visualizations are what's intended.

@abdelrahman725 abdelrahman725 force-pushed the allen-dataset-colab-example branch from e1d31b2 to 42e7c07 Compare April 6, 2026 21:48
@pollytur
Contributor

pollytur commented Apr 13, 2026

@abdelrahman725 please take a look at why the tests are failing. It's a bit suspicious, because adding a notebook should not change the tests, and they were fine before. Maybe merging current main in there would be enough to solve it.

Also please re-request review from me and Tom, when the PR is ready for another review

@pollytur
Contributor

@abdelrahman725 we found the issue with the tests - it's now fixed.
Please merge current main into your branch - then the tests should be fine and not fail anymore.
Sorry about it

@abdelrahman725 abdelrahman725 force-pushed the allen-dataset-colab-example branch from 42e7c07 to 418c801 Compare April 14, 2026 10:43
@pollytur
Contributor

@abdelrahman725 At the end of the notebook, in the sections Eye tracker Interpolation and Treadmill Interpolation, there are still TODOs - could these please be addressed? Do you need our input to address them somehow?

@abdelrahman725
Contributor Author

@pollytur
I should proceed with the right visualizations for eye_tracker and treadmill after your review, and I need an explanation for this:

please print shapes only and use plots for the rest

@abdelrahman725
Contributor Author

@abdelrahman725 At the end of the notebook, in the sections Eye tracker Interpolation and Treadmill Interpolation, there are still TODOs - could these please be addressed? Do you need our input to address them somehow?

Yes, as I mentioned above, I'm not sure what "please print shapes only and use plots for the rest" means, and for which devices?

@pollytur
Contributor

@abdelrahman725 At the end of the notebook, in the sections Eye tracker Interpolation and Treadmill Interpolation, there are still TODOs - could these please be addressed? Do you need our input to address them somehow?

Yes, as I mentioned above, I'm not sure what "please print shapes only and use plots for the rest" means, and for which devices?

It means that it's nice to have plots of eye tracker / treadmill channel activity over time, same as you have above for neurons.
At some point a bunch of numbers was displayed in the cell output - which is not great; hence, I asked to print shapes only.
Does it make sense now?

@pollytur
Contributor

@abdelrahman725 like also here:
it's a bunch of numbers, which makes the output huge but not very meaningful.
It would be much better to print some shapes and maybe stats / number of NaNs instead of displaying millions of numbers.
Or am I missing something?
image

@pollytur
Contributor

For plotting both neuron activity and behaviour activity, please use the '-o' setting (e.g. ax.plot(times, responses[:, neuron_idx], '-o', linewidth=0.8, color='steelblue')) - this helps to see where the actual data points are (not so easy with just a solid line)
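For illustration, a minimal headless sketch of the '-o' style on a synthetic trace (the variable names mirror the snippet above; the data is made up, standing in for exp.interpolate(times, device="responses")):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Synthetic single-neuron response trace (hypothetical data)
times = np.linspace(0, 10, 100)
responses = np.sin(times)[:, None]
neuron_idx = 0

fig, ax = plt.subplots()
# '-o' draws a marker at every queried time point on top of the line,
# which makes the actual sample locations visible
ax.plot(times, responses[:, neuron_idx], '-o',
        linewidth=0.8, markersize=2, color='steelblue')
ax.set_xlabel("time (s)")
ax.set_ylabel("response")
```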

@pollytur
Contributor

pollytur commented Apr 14, 2026

@schewskone you downloaded the Visual Coding 2P Allen dataset right? https://allenswdb.github.io/physiology/ophys/visual-coding/vc2p-background.html

There was no water licking or behaviour in there, or was there?

@abdelrahman725 wait for @schewskone's reply, but I think the explanation you give at the top is for the wrong dataset (e.g. Visual Behavior, while it should be the Visual Coding one)

Contributor


in the first markdown there is

Cheat sheet
See this helpful [cheat sheet](https://github.com/trendinafrica/TReND-CaMinA/blob/main/notebooks/Zambia25/07-to-10-Mon-toThu-AllenTutorial/Visual%20Coding%202P%20Cheat%20Sheet%20October2018.pdf) from CAMINA school

Cheat sheet for what? Give one sentence of context on what they would find there.

Also add brief explanations in most of the markdown sections:

  • for example Download and export Allen dataset - what is the difference between download and export?
  • Load experiment - what happens there (e.g. the data was downloaded into this folder and we are now loading it with the help of experanto) - yes, I know it's trivial from a coding perspective, but the goal of example notebooks is to be extremely clear and self-explanatory
  • it would be nice to have a comment on the tiers split and what these ids mean here
Image

@abdelrahman725
Contributor Author

abdelrahman725 commented Apr 14, 2026

@abdelrahman725 like also here:
it's a bunch of numbers, which makes the output huge but not very meaningful.
It would be much better to print some shapes and maybe stats / number of NaNs instead of displaying millions of numbers.
Or am I missing something?

@pollytur I agree, I can print the shapes in the "Interpolate from all devices" cell and maybe other minimal stats, but I prefer to include other details like NaNs in each corresponding modality cell below, before plotting.

@pollytur
Contributor

@abdelrahman725 like also here:
it's a bunch of numbers, which makes the output huge but not very meaningful.
It would be much better to print some shapes and maybe stats / number of NaNs instead of displaying millions of numbers.
Or am I missing something?

@pollytur I agree, I can print the shapes in the "Interpolate from all devices" cell and maybe other minimal stats, but I prefer to include other details like NaNs in each corresponding modality cell below, before plotting.

Sure, you are welcome to plot some details as suggested above. But these should be minimal, meaningful prints instead of a plain display of array parts.

Collaborator


Screen and response interpolation lgtm. Please address the printing of the interpolation output and finish the TODOs.

@schewskone
Collaborator

Visual Behavior is the correct dataset, not Visual Coding. I'm not entirely sure about the difference, but I followed the tutorials from here, which are for the Visual Behavior dataset.


Development

Successfully merging this pull request may close these issues.

Make a Colab example with Allen demo dataset

4 participants