TempResultsManager deletes results prematurely if multiple top-level variables point to the same DataBag #226

@ggevay

For example, the following code fails with Flink:

var v = DataBag()
val r = v
v = DataBag()
r

The problem is that the TempResultsManager garbage-collects the temp result of line 1 after executing line 3, but line 4 then looks for the deleted file.
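
To make the failure mode concrete, here is a minimal toy model of it. This is not Emma's actual TempResultsManager: the `AliasingBug` object, its maps, and the `assignNew`/`assignAlias`/`read` helpers are all hypothetical, and the only assumption carried over from the description above is that reassigning a variable deletes the temp result it previously pointed to without checking for aliases.

```scala
import scala.collection.mutable

// Hypothetical toy model; NOT Emma's actual TempResultsManager.
object AliasingBug {
  // Simulated temp files: temp result id -> contents.
  val files = mutable.Map.empty[Int, Seq[Int]]
  // Top-level variable name -> id of the temp result it refers to.
  val vars = mutable.Map.empty[String, Int]
  private var nextId = 0

  // Models `name = DataBag(...)`: if the variable already held a temp
  // result, delete it right away, then write a fresh one.
  def assignNew(name: String, data: Seq[Int]): Unit = {
    // Premature if another variable still refers to the same id.
    vars.get(name).foreach(id => files.remove(id))
    nextId += 1
    files(nextId) = data
    vars(name) = nextId
  }

  // Models `val alias = source`: both names now share one temp result.
  def assignAlias(alias: String, source: String): Unit =
    vars(alias) = vars(source)

  // Models reading a variable; None means the backing file is gone.
  def read(name: String): Option[Seq[Int]] = files.get(vars(name))

  def main(args: Array[String]): Unit = {
    assignNew("v", Seq(1, 2, 3))   // line 1: var v = DataBag()
    assignAlias("r", "v")          // line 2: val r = v
    assignNew("v", Seq.empty[Int]) // line 3: v = DataBag(), deletes what r needs
    println(read("r"))             // line 4: prints None
  }
}
```

In this sketch the deletion is keyed on the variable being overwritten rather than on whether any other variable still refers to the same result, which is the behavior the four-line example triggers.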

(A real-life example of similar code is the inner loop of KMeans, where the last line resembles line 2 here. If the solution = ... line used centroids as a TempSource rather than from the closure, the problem would occur there.)

A solution would be to translate the val r = v line into a TempSource and an immediate TempSink.
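
A rough sketch of that idea, under the same kind of toy assumptions: the `AliasWithCopy` object and its `sink`/`source` helpers are hypothetical stand-ins for TempSink and TempSource, not Emma's real operators. The alias assignment reads the existing temp result and immediately writes it out again, so each variable owns its own result and deleting one cannot invalidate the other.

```scala
import scala.collection.mutable

// Hypothetical toy model of the proposed fix; names are illustrative only.
object AliasWithCopy {
  // Simulated temp files: temp result id -> contents.
  val files = mutable.Map.empty[Int, Seq[Int]]
  // Top-level variable name -> id of the temp result it owns.
  val vars = mutable.Map.empty[String, Int]
  private var nextId = 0

  // Stand-in for a TempSink: materialize data as a fresh temp result.
  private def sink(data: Seq[Int]): Int = {
    nextId += 1
    files(nextId) = data
    nextId
  }

  // Stand-in for a TempSource: read an existing temp result back.
  private def source(id: Int): Seq[Int] = files(id)

  // Models `name = DataBag(...)`: delete the old result, write a new one.
  def assignNew(name: String, data: Seq[Int]): Unit = {
    // Safe here: no other variable shares this id.
    vars.get(name).foreach(id => files.remove(id))
    vars(name) = sink(data)
  }

  // Models `val alias = src` as a source followed by an immediate sink.
  def assignAlias(alias: String, src: String): Unit =
    vars(alias) = sink(source(vars(src)))

  def read(name: String): Option[Seq[Int]] = files.get(vars(name))

  def main(args: Array[String]): Unit = {
    assignNew("v", Seq(1, 2, 3))   // var v = DataBag()
    assignAlias("r", "v")          // val r = v, now with its own copy
    assignNew("v", Seq.empty[Int]) // v = DataBag(), deletes only v's old result
    println(read("r"))             // prints Some(List(1, 2, 3))
  }
}
```

The cost in this sketch is one extra materialization per alias assignment; reference counting the shared result would avoid the copy, but that is a different design from the one proposed here.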

I guess we don't want to fix this for the old backend. We will close this issue once the backend for the new IR is done and the problem doesn't occur there.
