[Improvement]: Implement heap-based flush mechanism for SortedPosDeleteWriter to prevent OOM #4166

@slfan1989

Description

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently, SortedPosDeleteWriter only flushes buffered position deletes based on a record count threshold. There is a TODO comment in the code indicating the need for a heap memory-based flush policy:

// TODO Flush buffer based on the policy that checking whether whole heap memory size exceed the
// threshold.
if (records >= recordsNumThreshold) {
  flushDeletes();
}

Problem: When processing large-scale position deletes, the in-memory buffer in SortedPosDeleteWriter can grow without bound if the record threshold is set very high or to Long.MAX_VALUE. This can cause an OutOfMemoryError (OOM), especially in memory-constrained environments.

Current behavior:

  • Only flushes when record count reaches recordsNumThreshold
  • No protection against heap memory pressure
  • Can lead to OOM when processing large delete operations

How should we improve?

Implement a heap memory-based flush mechanism with the following features:

1. New table properties:

  • pos-delete.flush.heap.ratio (default: 0.8) - Heap usage ratio threshold to trigger flush
  • pos-delete.flush.records (default: Long.MAX_VALUE) - Record count threshold
  • pos-delete.flush.heap.min-records (default: 1000) - Minimum records before heap-based flush kicks in
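
A minimal sketch of how the three properties above might be read with their proposed defaults. The class name PosDeleteFlushConfig, its fields, and the fromProperties helper are hypothetical and only illustrate the idea; the actual property plumbing in the project may look different.

```java
import java.util.Map;

// Hypothetical holder for the three proposed table properties.
// Only the property keys and default values come from this issue;
// the class, field, and method names are assumptions for illustration.
final class PosDeleteFlushConfig {
  static final String FLUSH_HEAP_RATIO = "pos-delete.flush.heap.ratio";
  static final double FLUSH_HEAP_RATIO_DEFAULT = 0.8;

  static final String FLUSH_RECORDS = "pos-delete.flush.records";
  static final long FLUSH_RECORDS_DEFAULT = Long.MAX_VALUE;

  static final String FLUSH_HEAP_MIN_RECORDS = "pos-delete.flush.heap.min-records";
  static final long FLUSH_HEAP_MIN_RECORDS_DEFAULT = 1000L;

  final double heapRatio;
  final long recordsThreshold;
  final long heapMinRecords;

  private PosDeleteFlushConfig(double heapRatio, long recordsThreshold, long heapMinRecords) {
    this.heapRatio = heapRatio;
    this.recordsThreshold = recordsThreshold;
    this.heapMinRecords = heapMinRecords;
  }

  // Reads each property from the map, falling back to the proposed default when absent.
  static PosDeleteFlushConfig fromProperties(Map<String, String> props) {
    String ratio = props.get(FLUSH_HEAP_RATIO);
    String records = props.get(FLUSH_RECORDS);
    String minRecords = props.get(FLUSH_HEAP_MIN_RECORDS);
    return new PosDeleteFlushConfig(
        ratio == null ? FLUSH_HEAP_RATIO_DEFAULT : Double.parseDouble(ratio),
        records == null ? FLUSH_RECORDS_DEFAULT : Long.parseLong(records),
        minRecords == null ? FLUSH_HEAP_MIN_RECORDS_DEFAULT : Long.parseLong(minRecords));
  }
}
```

With an empty property map, this sketch yields a ratio of 0.8, a record threshold of Long.MAX_VALUE, and a minimum of 1000 records, so by default only the heap-based policy would be expected to trigger a flush.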

2. Implementation details:

  • Add HeapUsageProvider interface to monitor JVM heap usage
  • Implement shouldFlushByHeap() method to check if heap usage exceeds threshold
  • Modify flush logic to: if (records >= recordsNumThreshold || shouldFlushByHeap())
  • Ensure backward compatibility through constructor overloads

3. Safety guards:

  • Prevent frequent small flushes with minimum record count
  • Allow disabling heap-based flush by setting the ratio outside the open interval (0, 1), i.e. ≤0 or ≥1
  • Non-intrusive monitoring (no forced GC)
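
The implementation details and safety guards above could be sketched roughly as follows. This is not the actual patch: HeapUsageProvider and shouldFlushByHeap() are named in this issue, but their signatures, the JvmHeapUsageProvider and HeapFlushPolicy classes, and the choice of >= for the ratio comparison are assumptions made for illustration. The JVM-backed provider relies on the standard MemoryMXBean API and never requests a GC.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Abstraction over heap usage so tests can inject fixed values (signature is an assumption).
interface HeapUsageProvider {
  long usedBytes();
  long maxBytes();
}

// Reads heap usage from the JVM without forcing a GC.
final class JvmHeapUsageProvider implements HeapUsageProvider {
  private static MemoryUsage heap() {
    return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
  }

  @Override
  public long usedBytes() {
    return heap().getUsed();
  }

  @Override
  public long maxBytes() {
    return heap().getMax(); // -1 when the maximum heap size is undefined
  }
}

// Hypothetical policy object combining the record-count and heap-based checks.
final class HeapFlushPolicy {
  private final HeapUsageProvider provider;
  private final double heapRatio;
  private final long recordsNumThreshold;
  private final long heapMinRecords;

  HeapFlushPolicy(HeapUsageProvider provider, double heapRatio,
                  long recordsNumThreshold, long heapMinRecords) {
    this.provider = provider;
    this.heapRatio = heapRatio;
    this.recordsNumThreshold = recordsNumThreshold;
    this.heapMinRecords = heapMinRecords;
  }

  boolean shouldFlushByHeap(long records) {
    // A ratio outside (0, 1) disables the heap-based check entirely.
    if (!(heapRatio > 0 && heapRatio < 1)) {
      return false;
    }
    // Guard against frequent small flushes.
    if (records < heapMinRecords) {
      return false;
    }
    long max = provider.maxBytes();
    if (max <= 0) {
      return false; // max heap unknown, so no ratio can be computed
    }
    // Assumption: reaching the ratio exactly also counts as exceeding the threshold.
    return (double) provider.usedBytes() / max >= heapRatio;
  }

  // Mirrors the proposed condition: records >= recordsNumThreshold || shouldFlushByHeap()
  boolean shouldFlush(long records) {
    return records >= recordsNumThreshold || shouldFlushByHeap(records);
  }
}
```

Injecting the provider through the constructor keeps the heap check deterministic in unit tests, and an existing constructor could delegate to a new overload with a JvmHeapUsageProvider to preserve backward compatibility.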

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response
