This repository was archived by the owner on Oct 1, 2020. It is now read-only.

File based UDF support

Florian Lahn edited this page Jun 29, 2018 · 1 revision

Starting with version 0.2.5, we integrated a first prototype of file-based UDF (user-defined function) support. "File-based" refers to the way data is exchanged between the back-end and the UDF unit.

The R scripts use a UDF client library [1] that loads the incoming raster data into a stars object. The data are then processed with st_apply from the stars package, so the user defines a function that is applied to the data collection.
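The file-based round trip can be pictured with the following minimal sketch in base R. It is only an illustration under stated assumptions: the exchange directory, the RDS file format, and the helper udf_step are hypothetical and not part of openeo.R.UDF, and a plain 4-D array (assumed order x, y, band, time) stands in for the stars object. Base apply is used in place of st_apply because both take a MARGIN argument listing the dimensions to keep.

```r
# Hypothetical exchange directory shared by back-end and UDF process;
# the actual file layout and formats used by openeo.R.UDF are not shown here.
exchange_dir <- file.path(tempdir(), "udf_exchange")
dir.create(exchange_dir, showWarnings = FALSE)
in_file  <- file.path(exchange_dir, "input.rds")
out_file <- file.path(exchange_dir, "result.rds")

# 1. Back-end side: write the prepared data cube to a file
#    (dimension order x, y, band, time is assumed).
cube <- array(1:24, dim = c(3, 2, 1, 4))
saveRDS(cube, in_file)

# 2. UDF side (hypothetical helper, not the library's API):
#    read the input file, reduce dimension `drop_dim` with `fun`,
#    and write the result to a second file.
udf_step <- function(in_path, out_path, fun, drop_dim) {
  data <- readRDS(in_path)
  # keep every dimension except the one being reduced
  keep <- setdiff(seq_along(dim(data)), drop_dim)
  saveRDS(apply(data, MARGIN = keep, FUN = fun), out_path)
  invisible(out_path)
}
udf_step(in_file, out_file, fun = median, drop_dim = 4)

# 3. Back-end side: read the result file back in.
result <- readRDS(out_file)
dim(result)  # 3 2 1
```

Because every step communicates only through files, the back-end and the UDF process never share memory; this is also why data preparation and metadata exchange become the hard parts of the approach.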

Usage

A description of how to set up the R script can also be found at [1]. An example script could look like this:

cat("Loading openeo.R.UDF package\n")
library(openeo.R.UDF)

cat("Defining median function\n")
myfunc = function(obj) {
  median(obj)
}

cat("Running UDF function\n")
# drop dimension is time = 4, input is default x,y,b,t,raster, function
tryCatch({
  run_UDF(drop_dim = 4, function_name = myfunc)
}, error = function(e) {
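  # print only the message; pasting the condition object itself also prints its call
  cat(paste0("UDF-EXEC-ERROR: ", conditionMessage(e), "\n"))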
  stop(e)
})

cat("Finished running UDF function\n")

The script calculates the median value of each pixel by applying the function along the time dimension of the given raster collection. In other words, it performs a temporal reduction (dimension 4 is time).
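What "reducing dimension 4" means can be checked by hand on a tiny cube. The sketch below is a hedged illustration in base R, not the library's implementation: it assumes the x, y, band, time order from the script comment and assumes that drop_dim = 4 corresponds to keeping dimensions 1 to 3 (MARGIN = c(1, 2, 3)), which is how st_apply and apply interpret MARGIN.

```r
# Hypothetical cube: x = 2, y = 2, band = 1, time = 3 (values chosen by hand)
cube <- array(c(1, 2, 3, 4,   # time step 1
                5, 6, 7, 8,   # time step 2
                9, 0, 2, 1),  # time step 3
              dim = c(2, 2, 1, 3))

# Same function as in the example script
myfunc <- function(obj) median(obj)

# Keep x, y and band; collapse each pixel's time series into one median
reduced <- apply(cube, MARGIN = c(1, 2, 3), FUN = myfunc)
reduced[, , 1]
# pixel (1,1): median(1, 5, 9) = 5    pixel (1,2): median(3, 7, 2) = 3
# pixel (2,1): median(2, 6, 0) = 2    pixel (2,2): median(4, 8, 1) = 4
```

The output keeps the spatial and band dimensions while the time dimension disappears, which is the shape a time reduce operation is expected to return.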

The important parts are:
• loading the library,
• defining the function as a variable,
• calling run_UDF with the specific set of parameters.

To run the UDF, it has to be integrated into an offered process. This means that several UDF processes are defined, e.g. aggregate_time, which can then be used to execute a user-defined function. An example process graph, in which script holds the path to the R script, could look like this:

{
  "process_id": "aggregate_time",
  "args": {
    "imagery": {
      "process_id": "filter_daterange",
      "args": {
        "imagery": {
          "product_id": "sentinel2_subset"
        },
        "from": "2017-05-01",
        "to": "2017-05-30"
      }
    },
    "script": "/udf/script.R"
  }
} 

Further development

The file-based approach was a first, simple approach, but it already proved challenging in terms of data preparation, metadata exchange and executing the UDF. Going forward, we will catch up with the recent UDF developments in the openeo project by developing an R UDF web service, as suggested by openeo-udf [3]. This will shift data preparation and serialization to the back-end, leaving the web service with the sole task of processing the data passed or streamed to it, either as a native byte object or in a general JSON representation.

References:
