[FEATURE] Make currentness of used input files configurable #103

Description

@magnuspalmblad

Feature Description

When chained tools pass along multiple files, and a file of a given type and format is generated by more than one tool or is also provided as workflow input, there is currently no way to specify or require that the most recently generated file of that type and format is used. For example, mass spectrometry imaging workflows may pass along both an imzML metadata file (http://edamontology.org/format_3682) and an ibd file (http://edamontology.org/format_3839) and perform multiple operations (e.g. alignment, filtering) that affect one or both of these files.

Motivation

I am not requesting a feature that solves this issue for every use case (likely impossible), but in most cases using the most recent version of the file(s) makes the most sense, and it would be nice to see this reflected in the generated workflows presented to the user.

Proposed Solution

This issue may warrant further discussion, but possible solutions would be either to make the necessary changes to APE itself, or to filter the generated workflows afterwards and keep only those that always use the most recent version of a file. This filter could be optional, but on by default.
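The post-filtering option can be sketched as follows. This is only an illustration under stated assumptions, not APE code: the `DataItem` and `Step` classes, the `uses_latest_only` function, the `filter` tool name and the order of the msiwarp outputs are all hypothetical, and a workflow is simplified to a linear sequence of steps.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (data type, data format), e.g. ("Mass spectrometry data", "imzML metadata file")
Kind = Tuple[str, str]


@dataclass(frozen=True)
class DataItem:
    item_id: str  # hypothetical unique id of one concrete file in the workflow
    kind: Kind


@dataclass
class Step:
    tool: str
    inputs: List[DataItem]
    outputs: List[DataItem]


def uses_latest_only(workflow_inputs: List[DataItem], steps: List[Step]) -> bool:
    """Return True if every step input is the newest file of its kind."""
    # Simplification: if several workflow inputs share a kind, the last one counts as newest.
    latest: Dict[Kind, str] = {d.kind: d.item_id for d in workflow_inputs}
    for step in steps:
        for d in step.inputs:
            if latest.get(d.kind) != d.item_id:
                return False  # a stale (or unknown) file is consumed
        for d in step.outputs:
            latest[d.kind] = d.item_id  # this output supersedes older files of the same kind
    return True


# Illustrative data only; the kinds mirror the imzML/ibd example in this issue.
IMZML = ("Mass spectrometry data", "imzML metadata file")
IBD = ("Mass spectrometry data", "ibd")

imzml_0, ibd_0 = DataItem("imzml_0", IMZML), DataItem("ibd_0", IBD)
imzml_1, ibd_1 = DataItem("imzml_1", IMZML), DataItem("ibd_1", IBD)

align = Step("msiwarp", [imzml_0, ibd_0], [ibd_1, imzml_1])
fresh = [align, Step("filter", [imzml_1, ibd_1], [])]  # consumes the aligned files
stale = [align, Step("filter", [imzml_0, ibd_1], [])]  # consumes the original imzML

candidates = [fresh, stale]
kept = [wf for wf in candidates if uses_latest_only([imzml_0, ibd_0], wf)]
```

Here `kept` contains only the `fresh` workflow. An optional, on-by-default filter as proposed above would amount to applying such a predicate to the candidate workflows before presenting them to the user.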

APE implementation clarification

Currently, APE supports only hard constraints. In the configuration, the user can specify whether they want to enforce usage of ALL, at least ONE or NONE of the outputs per tool. See use_all_generated_data under runtime configuration. The problem occurs when we would prefer a second output to be used.
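To make the limitation concrete, the three settings can be modelled with a small predicate. This is a hedged illustration of the semantics described above, not APE's actual implementation; the function `output_usage_ok` and the list-of-booleans representation are assumptions made for this sketch.

```python
from typing import List


def output_usage_ok(policy: str, used: List[bool]) -> bool:
    """Check one tool's outputs against a usage policy.

    used[i] is True if the tool's i-th output is consumed later in the workflow.
    """
    if not used:
        return True  # a tool without outputs has nothing to enforce
    if policy == "ALL":
        return all(used)
    if policy == "ONE":
        return any(used)
    if policy == "NONE":
        return True
    raise ValueError(f"unknown policy: {policy}")


# A tool whose first output is consumed while its second output is ignored.
second_output_ignored = [True, False]

accepted_under_one = output_usage_ok("ONE", second_output_ignored)
accepted_under_all = output_usage_ok("ALL", second_output_ignored)
```

Under `"ONE"` the workflow is accepted even though the second output is never used, while `"ALL"` rejects it but also forces every output of every tool to be consumed. Neither setting expresses "prefer the most recent file of this type and format".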

Example

Image

Here is an example of a workflow in which the msiwarp step generates a "Mass spectrometry data, imzML metadata file" output. Because it is the second output of the tool, its usage is not enforced by APE.

Note:
I have manually added the second output to the image, because the "taverna-style visualisation" does not show unused outputs. To see them, a user can request the "data-flow visualisation" instead.

Labels

enhancement (New feature or request)
