#944 longitudinal normalization #958

agerardy · 2025-10-07T13:42:57Z

PR Checklist

This comment contains a description of changes (with reason)
Referenced issue is linked
If you've fixed a bug or added code that should be tested, add tests!
Documentation in docs is updated

Description of changes
#944
This PR implements normalization support for 3D EHRData objects. It enables all existing normalization functions to work with longitudinal data of shape (n_obs, n_var, n_timestamps) while maintaining backward compatibility with 2D data.

Technical details
Treats .R as a named layer with 3D structure. Uses helper functions (_get_target_layer, _set_target_layer, normalize_3d_data, and _normalize_2d_data) to avoid code duplication.
Each variable is processed independently: its observation and time dimensions are flattened into a single axis of length n_obs x n_timestamps, the sklearn normalization function is applied, and the result is reshaped back to 3D.
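
The flatten, normalize, reshape step described above could look roughly like the sketch below. It is an illustration only, not the code from this PR: `normalize_3d` and `standard_scale` are hypothetical names, and `standard_scale` is a NumPy stand-in for the sklearn function so that the snippet runs without sklearn installed.

```python
import numpy as np


def standard_scale(column: np.ndarray) -> np.ndarray:
    """NaN-aware stand-in for an sklearn scaler; expects shape (n_samples, 1)."""
    mean = np.nanmean(column, axis=0)
    std = np.nanstd(column, axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid dividing constant columns by zero
    return (column - mean) / std


def normalize_3d(data: np.ndarray, scale_func=standard_scale) -> np.ndarray:
    """Normalize each variable of an (n_obs, n_var, n_timestamps) array independently."""
    n_obs, n_var, n_timestamps = data.shape
    result = np.empty(data.shape, dtype=float)
    for var_idx in range(n_var):
        # Collapse observations and time points into one column of samples.
        flat = data[:, var_idx, :].reshape(-1, 1)
        # Scale the column, then restore the (n_obs, n_timestamps) layout.
        result[:, var_idx, :] = scale_func(flat).reshape(n_obs, n_timestamps)
    return result


rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(10, 3, 4))
data[0, 1, 2] = np.nan  # a missing value stays missing after scaling

normalized = normalize_3d(data)
```

With this layout, the statistics of each variable are computed over all observations and all time points together, so a variable ends up with one mean and one standard deviation rather than one per time point.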

Added tests for the new functions, including group functionality and NaN cases

Examples:

import ehrapy as ep
import ehrdata as ed

edata = ed.dt.ehrdata_blobs(n_observations=100, base_timepoints=24, cluster_std=0.5, n_centers=3, seasonality=True,
    time_shifts=True, variable_length=False)

# standard scaling
ep.pp.scale_norm(edata)

# log transformation
ep.pp.offset_negative_values(edata)
ep.pp.log_norm(edata)

eroell · 2025-10-07T19:33:49Z

ehrapy/preprocessing/_normalization.py

-    if group_key is None:
-        var_values = scale_func(var_values)
-
+    if hasattr(edata, "R") and edata.R is not None and edata.R.ndim == 3:


If the edata object has R, then R will be used regardless of what the layer argument specifies.

This line is a good place to set a breakpoint with a debugger, to see which branch of the if/else statement the code actually enters, and whether it matches what you think it should :)

@agerardy this hasn't been addressed, right? I mean the first part of Eljas' comment.

I've moved this logic to its own function _get_target_layer and changed it to check for layers first.

    if layer is None:
        if hasattr(edata, "R") and edata.R is not None:
            return edata.R, "R"
        else:
            return edata.X, "X"
    else:
        return edata.layers[layer], layer

I hope this is how it needs to work
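
To check the precedence discussed in this thread (explicit layer first, then R, then X), a small self-contained sketch like the following can help. `get_target_layer` is a simplified, hypothetical mirror of the helper quoted above, and the `SimpleNamespace` object is only a stand-in for an EHRData instance.

```python
from types import SimpleNamespace

import numpy as np


def get_target_layer(edata, layer=None):
    """Return (array, name): an explicit layer wins, then R, then X."""
    if layer is not None:
        return edata.layers[layer], layer
    if getattr(edata, "R", None) is not None:
        return edata.R, "R"
    return edata.X, "X"


# Hypothetical stand-in object, not a real EHRData instance.
edata = SimpleNamespace(
    X=np.zeros((2, 3)),
    R=np.ones((2, 3, 4)),
    layers={"raw": np.full((2, 3), 7.0)},
)

_, name_default = get_target_layer(edata)  # no layer given, R is present
_, name_layer = get_target_layer(edata, layer="raw")  # explicit layer given

edata.R = None
_, name_without_r = get_target_layer(edata)  # no layer given, R is absent
```

The point of the ordering is that an existing R can no longer silently override a layer that the caller asked for.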

Zethson

Thank you! Already looks pretty good.

Many of my comments are repetitive so I stopped repeating them after some time 😄
Many of your tests have tons of useless comments. Let the code speak for itself and clean up any LLM leftovers, please.
Please also follow the comments that I make in Öyku's PRs. One of them is to improve the PR description and add some usage examples.

Just a first quick pass. I'll let @eroell have a go and then I might have a look again.

Thanks!

Zethson · 2025-10-20T10:27:55Z

ehrapy/preprocessing/_normalization.py

        >>> edata = ed.dt.mimic_2()
        >>> edata_norm = ep.pp.scale_norm(edata, copy=True)
+        >>> # Works automatically with both 2D and 3D data
+        >>> edata_3d_norm = ep.pp.scale_norm(edata_3d, copy=True)


Let's keep it simple and not distinguish between 2D and 3D. Instead, we should finally have a proper 3D test dataset, @eroell.

Gathering a few comments from other places here, to make it less dispersed
a) Showing some output is very helpful, see e.g. here
b) One test of 2D or 3D is enough; can you use 3D here, ideally the physionet2012 dataset?
c) edata_3d variable would never have been introduced

I'll just reply to this one and mark the duplicates resolved :) I've written the examples with fake numbers for now, but will try to get them actually running with the physionet2012 dataset. It certainly looks better but doesn't work yet.

eroell · 2025-10-20T15:02:33Z

tests/preprocessing/test_normalization.py

    assert np.array_equal(expected_adata.X, ep.pp.log_norm(to_normalize_adata, copy=True).X)
+
+
+def test_scale_norm_3d(edata_blob_small_3d):


Every changed function should be tested. Do you prefer to write one representative example here and extend it to every other normalization function once we have iterated, or to create them all now? Both options are OK.

I'm not sure what you mean by this. I have written a 3D test function for every normalization function?

agerardy added 4 commits September 25, 2025 17:47

changed _scale_func_group to also work with 3D objects, updated doc c…

0623b9b

…omments for affected functions

added 3d object fixture to conftest

9ef4818

attempted 3D version of scale_norm

4d5873c

test for scale_norm

0247e5a

agerardy linked an issue Oct 7, 2025 that may be closed by this pull request

Longitudinal normalization #944

Open

14 tasks

[pre-commit.ci] auto fixes from pre-commit.com hooks

f579bc6

for more information, see https://pre-commit.ci

eroell reviewed Oct 7, 2025

agerardy and others added 5 commits October 9, 2025 14:40

fixed scale_norm and added 3D functionality and tests for minmax_norm…

7fd87aa

… maxabs_norm and robust_scale_norm

Merge branch '944-longitudinal-normalization' of github.com:theislab/…

a6117ad

…ehrapy into 944-longitudinal-normalization

[pre-commit.ci] auto fixes from pre-commit.com hooks

07d187c

for more information, see https://pre-commit.ci

added 3D functionality and tests for all normalization functions. rem…

2260102

…oved old 3d tests that only raised valueErrors

[pre-commit.ci] auto fixes from pre-commit.com hooks

fefda75

for more information, see https://pre-commit.ci

Zethson mentioned this pull request Oct 16, 2025

norm axis #913

Closed

agerardy and others added 5 commits October 20, 2025 11:42

updated normalization to correctly work with selected variables. adde…

799e84f

…d more tests

Merge branch 'main' into 944-longitudinal-normalization

654d355

[pre-commit.ci] auto fixes from pre-commit.com hooks

e7887d8

for more information, see https://pre-commit.ci

fixed small 3d fixture not returning anything

155a72f

minor comment edits

0efca8a

agerardy marked this pull request as ready for review October 20, 2025 10:16

agerardy requested a review from Zethson October 20, 2025 10:17

[pre-commit.ci] auto fixes from pre-commit.com hooks

cbb07d6

for more information, see https://pre-commit.ci

Zethson requested changes Oct 20, 2025

agerardy and others added 5 commits October 20, 2025 14:16

Changed logic to work with layers and R as just a layer

cd26ac2

removed unnecessary comments and a nonfunctional test

1f3c554

3D normalization tests now work with edata_blobs_timeseries_small and…

f7e6bfa

… properly handle NaN values

removed unnecessary copy and fixed docstrings

da9cc22

[pre-commit.ci] auto fixes from pre-commit.com hooks

82bd238

for more information, see https://pre-commit.ci