/!\ Don't waste your time rereading. Need to tidy all this up :)
This PR introduces multiprocessing and xarray/dask backends for the raster apply_matrix function, improving memory efficiency and processing speed.
Development
To develop these two backends, I was inspired by the implementation of the reproject function in geoutils: multiprocessing here and dask here.
Both backends rely on the same logic:
Dispatch to one backend or the other:
if a multiprocessing config is given => mp backend
if dask is installed and the input elevation is a chunked RasterAccessor => dask backend
Note that both cannot be used simultaneously (ValueError)
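The dispatch rule above could be sketched as follows. This is only an illustration: select_backend, its arguments and the "in_memory" fallback name are hypothetical and are not taken from the PR code.

```python
from typing import Any, Optional


def select_backend(mp_config: Optional[Any], is_chunked_dask: bool) -> str:
    """Pick a backend name from the inputs (hypothetical helper, not the PR's code).

    mp_config: a multiprocessing configuration object, or None if not given.
    is_chunked_dask: True if dask is installed and the input is a chunked RasterAccessor.
    """
    # Both backends requested at once is ambiguous, so it is rejected.
    if mp_config is not None and is_chunked_dask:
        raise ValueError("Give either a multiprocessing config or a chunked dask input, not both.")
    if mp_config is not None:
        return "multiprocessing"
    if is_chunked_dask:
        return "dask"
    # Assumed fallback: the regular in-memory code path.
    return "in_memory"
```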
Georeferenced tiling to identify the source/destination blocks (chunk size) and the links between them. To know which source blocks are needed to compute which destination blocks, the idea is to:
For each source block:
-> get the projected bounds
-> apply_matrix of the four corners => polygon (*)
Later, check the intersection with the projected bounds polygon of each destination block.
(*) As we need to know the elevation to compute the real output point after apply_matrix, the trick is to:
-> find z_min and z_max on each source block (_wrapper_multiproc_zmin_zmax_per_blocks for the multiprocessing case, _delayed_zmin_zmax for the dask case)
-> compute apply_matrix(corners_x, corners_y, z_min)
-> compute apply_matrix(corners_x, corners_y, z_max)
-> destination bounds = intersection of the two polygons to obtain the maximum area
Some other things are done for this step in the function _build_geotiling_and_meta_apply_matrix, but this is the tricky part.
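To illustrate the z_min/z_max trick, here is a minimal sketch that transforms the four corners of a source block at both elevations. It assumes a plain 4x4 homogeneous affine matrix applied directly to (x, y, z, 1), with no centroid handling; transform_corners and footprints_for_block are hypothetical names, not functions from this PR, and the step that combines the two polygons is left out.

```python
import numpy as np


def transform_corners(matrix, bounds, z):
    """Apply a 4x4 affine matrix to the four corners of `bounds` at elevation `z`.

    bounds: (left, bottom, right, top) of a source block.
    Returns a (4, 2) array with the transformed x/y of each corner.
    """
    left, bottom, right, top = bounds
    corners = np.array(
        [
            [left, bottom, z, 1.0],
            [right, bottom, z, 1.0],
            [right, top, z, 1.0],
            [left, top, z, 1.0],
        ]
    )
    # Each row becomes matrix @ corner (homogeneous coordinates).
    transformed = corners @ np.asarray(matrix, dtype=float).T
    return transformed[:, :2]


def footprints_for_block(matrix, bounds, z_min, z_max):
    """Return the two corner polygons obtained at z_min and at z_max."""
    return transform_corners(matrix, bounds, z_min), transform_corners(matrix, bounds, z_max)
```

When the matrix couples z into x or y (for example with a rotation around a horizontal axis), the two polygons differ, which is why a single elevation is not enough to locate the destination footprint.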
Launch tasks/Build the task graph
Multiprocessing: for each destination block, a task is submitted to the cluster via mp_config.cluster.launch_task(), calling _wrapper_multiproc_apply_matrix_per_block(). This function:
Extracts the source blocks that intersect this destination block
Merges them into a single rectangular array (comb_src_arr)
Calls apply_matrix() on that sub-array ..... TO EXPLAIN
Dask: for each destination block, a task (node) is created for _delayed_apply_matrix_per_block(), which calls _apply_matrix_per_block(); the latter does the same as _wrapper_multiproc_apply_matrix_per_block().
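The merge step could look like the sketch below. It is an assumption about the general idea only: merge_source_blocks and the pixel-offset dictionary are hypothetical, and the actual construction of comb_src_arr in the PR may differ.

```python
import numpy as np


def merge_source_blocks(blocks):
    """Merge source tiles into one rectangular array (hypothetical helper).

    blocks: dict mapping (row_offset, col_offset), in pixels of the full raster,
            to the 2D array of that tile.
    Returns the merged array and the (row, col) offset of its upper-left pixel.
    """
    row0 = min(r for r, _ in blocks)
    col0 = min(c for _, c in blocks)
    row1 = max(r + arr.shape[0] for (r, _), arr in blocks.items())
    col1 = max(c + arr.shape[1] for (_, c), arr in blocks.items())

    # Pixels not covered by any selected tile stay NaN.
    merged = np.full((row1 - row0, col1 - col0), np.nan, dtype=float)
    for (r, c), arr in blocks.items():
        rr, cc = r - row0, c - col0
        merged[rr : rr + arr.shape[0], cc : cc + arr.shape[1]] = arr
    return merged, (row0, col0)
```

The returned offset is what would let the caller rebuild the georeferencing of the merged sub-array before transforming it.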
Write/Construct results
Multiprocessing: run _write_multiproc_result() with the results of each task => file that will be read => Raster
Dask: concatenate the results => array that will be used in gu.raster.xr_accessor.RasterAccessor.from_array() => RasterAccessor
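As a small illustration of the concatenation idea, the sketch below splits an array into tiles and reassembles it with numpy.block. It uses NumPy only for simplicity (dask.array provides a block function with similar semantics); split_into_tiles and assemble_tiles are hypothetical helpers, not the PR's code.

```python
import numpy as np


def split_into_tiles(arr, chunk):
    """Split a 2D array into a nested list of tiles of at most chunk x chunk pixels."""
    return [
        [arr[r : r + chunk, c : c + chunk] for c in range(0, arr.shape[1], chunk)]
        for r in range(0, arr.shape[0], chunk)
    ]


def assemble_tiles(tile_rows):
    """Concatenate a nested list of tiles (rows of tiles) back into one 2D array."""
    return np.block(tile_rows)
```

With a 10x12 input and a chunk size of 4, the last row of tiles is only 2 pixels high, and the reassembly still works because tiles in the same row share their height and tiles in the same column share their width.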
After replacing RegularGridInterpolator with Raster.interp_points(): tz = dem_rst.interp_points(points=(tx, ty), method=resampling)["z"]
I saw that some of my tests no longer pass.
After some checks, I saw that the resulting tz can differ substantially from tz = z_interp((ty, tx)), sometimes by more than about 1.5!
Here is an example: pytest tests/test_coreg/test_base.py::TestAffineManipulation::test_apply_matrix_dask_multi[None-True-None-True-0-matrix4-real]
I have another error: "ValueError: cannot convert float NaN to integer" when tx or ty contain NaN values. I don't know how to change the inputs properly without introducing more tz errors.
Example: pytest tests/test_coreg/test_base.py::TestAffineManipulation::test_apply_matrix_dask_multi[None-False-quintic-True-0-matrix4-fake]
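One possible workaround for the NaN error, untested against Raster.interp_points(), is to interpolate only the finite coordinates and leave NaN elsewhere. In this sketch, interp_ignoring_nan and the interp callable are hypothetical stand-ins; it does not address the tz discrepancy described above.

```python
import numpy as np


def interp_ignoring_nan(interp, tx, ty):
    """Interpolate only where both coordinates are finite (hypothetical helper).

    interp: any callable taking (x, y) arrays and returning z values of the same length.
    Returns an array shaped like tx, with NaN where tx or ty was not finite.
    """
    tx = np.asarray(tx, dtype=float)
    ty = np.asarray(ty, dtype=float)
    valid = np.isfinite(tx) & np.isfinite(ty)

    tz = np.full(tx.shape, np.nan, dtype=float)
    if valid.any():
        # Only finite points reach the interpolator, so no NaN-to-int conversion occurs.
        tz[valid] = interp(tx[valid], ty[valid])
    return tz
```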
Optimization of apply_matrix_rst()
I started to move (*) reproject_horizontal_shift_samecrs into _apply_matrix_rst(), but as it is called in the apply process here, I introduced a wrapper to do this here
(*) This part needs to be optimized once all the problems are resolved (also in apply_matrix())
Core functions
Tests
Apply Matrix
Parameters
See: docstring
Input data:
The inputs have size 10x12 (not square, so that the block sizes are not always the same); chunk_size varies with:
Tests
Compute z min and z max
Input data:
Different chunk sizes:
Tests:
Old problem corrected
Remaining problems
Question dev here
Dev (easy)