#944 longitudinal normalization #958
base: main
for more information, see https://pre-commit.ci
if group_key is None:
    var_values = scale_func(var_values)

if hasattr(edata, "R") and edata.R is not None and edata.R.ndim == 3:
If the edata object has R, it will be used regardless of what the layer argument specifies.
This line is a good place to set a breakpoint with a debugger, to see which branch of the if/else statement the code actually enters, and whether that matches what you think it should :)
@agerardy this hasn't been addressed, right? I mean the first part of Eljas's comment.
I've moved this logic to its own function _get_target_layer and changed it to check for layers first.
    if layer is None:
        if hasattr(edata, "R") and edata.R is not None:
            return edata.R, "R"
        else:
            return edata.X, "X"
    else:
        return edata.layers[layer], layer
I hope this is how it needs to work.
… maxabs_norm and robust_scale_norm
…ehrapy into 944-longitudinal-normalization
…oved old 3d tests that only raised valueErrors
Thank you! Already looks pretty good.
- Many of my comments are repetitive so I stopped repeating them after some time 😄
- Many of your tests have tons of useless comments. Let the code speak for itself and clean up any LLM leftovers, please.
- Please also follow the comments that I make in Öyku's PRs. One of them is to improve the PR description and add some usage examples.
Just a first quick pass. I'll let @eroell have a go and then I might have a look again.
Thanks!
>>> edata = ed.dt.mimic_2()
>>> edata_norm = ep.pp.scale_norm(edata, copy=True)
>>> # Works automatically with both 2D and 3D data
>>> edata_3d_norm = ep.pp.scale_norm(edata_3d, copy=True)
Let's keep it simple and not distinguish between 2D and 3D. We should rather finally have a proper 3D test dataset, @eroell.
Gathering a few comments from other places here, to make it less dispersed:
a) Showing some output is very helpful, see e.g. here
b) One test of 2D or 3D is enough; can you use a 3D one here, ideally the physionet2012 dataset?
c) That way, the edata_3d variable would never have to be introduced
I'll just reply to this one and mark the duplicates resolved :) I've written the examples with fake numbers for now, but will try to get them actually running with the physionet2012 dataset. It certainly looks better but doesn't work yet.
… properly handle NaN values
    assert np.array_equal(expected_adata.X, ep.pp.log_norm(to_normalize_adata, copy=True).X)


def test_scale_norm_3d(edata_blob_small_3d):
Every changed function should be tested. Do you prefer to make one classic example here, which you then expand to every other normalization function once we have iterated, or to create them all now already? Both options are OK.
I'm not sure what you mean by this. I have written a 3D test function for every normalization function?
PR Checklist
- docs is updated
Description of changes
#944
This PR implements normalization support for 3D EHRData objects. The implementation enables all existing normalization functions to work with longitudinal data of shape (n_obs, n_var, n_timestamps) while maintaining backward compatibility with 2D data.
Technical details
Treats .R as a named layer with 3D structure. Uses helper functions (_get_target_layer, _set_target_layer, normalize_3d_data, and _normalize_2d_data) to avoid code duplication.
Each variable is processed independently by flattening the time dimension (n_obs x n_timestamps), applying the sklearn normalization function, then reshaping to 3D.
Added tests for the new functions, including group functionality and NaN cases.
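The flatten, scale, reshape step described under "Technical details" can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `normalize_3d` and `zscore_2d` are hypothetical names, and `zscore_2d` is a NumPy-only stand-in for a scaler's `fit_transform` so that the snippet runs without scikit-learn.

```python
import numpy as np


def zscore_2d(values):
    """NaN-aware column-wise z-score; stand-in for a scaler's fit_transform."""
    mean = np.nanmean(values, axis=0)
    std = np.nanstd(values, axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant columns
    return (values - mean) / std


def normalize_3d(data, scale_func=zscore_2d):
    """Scale each variable of an (n_obs, n_var, n_timestamps) array independently."""
    n_obs, n_var, n_t = data.shape
    out = np.empty_like(data, dtype=float)
    for j in range(n_var):
        # Flatten observations and time points of variable j into one column.
        flat = data[:, j, :].reshape(-1, 1)
        # Scale the column, then restore the (n_obs, n_timestamps) layout.
        out[:, j, :] = scale_func(flat).reshape(n_obs, n_t)
    return out


rng = np.random.default_rng(0)
demo = rng.normal(size=(5, 2, 4)) * 3 + 7
demo[0, 1, 2] = np.nan  # missing values stay missing after scaling
demo_norm = normalize_3d(demo)
print(demo_norm.shape)
```

Because all time points of a variable share one mean and one standard deviation, trends over time within a variable are preserved rather than flattened out.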
Examples: