Search before asking
What would you like to be improved?
Currently, SortedPosDeleteWriter only flushes buffered position deletes based on a record count threshold. There is a TODO comment in the code indicating the need for a heap memory-based flush policy:
```java
// TODO Flush buffer based on the policy that checking whether whole heap memory size exceed the
// threshold.
if (records >= recordsNumThreshold) {
  flushDeletes();
}
```
Problem: When processing large-scale position deletes, the in-memory buffer in SortedPosDeleteWriter can grow without bound (if the record threshold is set very high or to Long.MAX_VALUE), potentially causing an OutOfMemoryError (OOM), especially in memory-constrained environments.
Current behavior:
- Flushes only when the record count reaches recordsNumThreshold
- No protection against heap memory pressure
- Can lead to OOM when processing large delete operations
How should we improve?
Implement a heap memory-based flush mechanism with the following features:
1. New table properties:
- pos-delete.flush.heap.ratio (default: 0.8): heap usage ratio that triggers a flush
- pos-delete.flush.records (default: Long.MAX_VALUE): record count threshold
- pos-delete.flush.heap.min-records (default: 1000): minimum number of buffered records before the heap-based flush can kick in
2. Implementation details:
- Add a HeapUsageProvider interface to monitor JVM heap usage
- Implement a shouldFlushByHeap() method that checks whether heap usage exceeds the threshold
- Modify the flush logic to:
  if (records >= recordsNumThreshold || shouldFlushByHeap())
- Ensure backward compatibility through constructor overloads
3. Safety guards:
- Prevent frequent small flushes by requiring a minimum record count before a heap-based flush
- Allow disabling the heap-based flush by setting an out-of-range ratio (≤ 0 or ≥ 1)
- Keep monitoring non-intrusive (no forced GC)
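A rough sketch of how the proposal could fit together is shown below. It is a minimal, self-contained illustration under stated assumptions, not the actual SortedPosDeleteWriter code. Only the three property names and defaults, the HeapUsageProvider and shouldFlushByHeap() names, and the combined flush condition come from the proposal. PosDeleteFlushPolicy, JvmHeapUsageProvider, heapUsageRatio(), onRecordBuffered() and the Map-based property lookup are hypothetical. Heap usage is read from the standard MemoryMXBean, which only observes the heap and does not trigger a GC.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Map;

// Heap monitoring abstraction named in the proposal; the method is an assumption.
interface HeapUsageProvider {
  // Used heap as a fraction of the maximum heap, or -1.0 if the maximum is unknown.
  double heapUsageRatio();
}

// Hypothetical default provider backed by the JVM's MemoryMXBean.
// It only reads current usage and never calls System.gc().
final class JvmHeapUsageProvider implements HeapUsageProvider {
  @Override
  public double heapUsageRatio() {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    long max = heap.getMax(); // -1 when the maximum heap size is undefined
    return max > 0 ? (double) heap.getUsed() / max : -1.0;
  }
}

// Hypothetical stand-in for the flush bookkeeping inside SortedPosDeleteWriter.
final class PosDeleteFlushPolicy {
  static final String HEAP_RATIO = "pos-delete.flush.heap.ratio";
  static final String RECORDS = "pos-delete.flush.records";
  static final String HEAP_MIN_RECORDS = "pos-delete.flush.heap.min-records";

  private final long recordsNumThreshold;
  private final double heapRatio;
  private final long heapMinRecords;
  private final HeapUsageProvider heapUsage;
  private long records = 0;
  private int flushes = 0;

  // Backward-compatible overload: callers that only pass a record threshold
  // get the proposed heap defaults (ratio 0.8, min records 1000).
  PosDeleteFlushPolicy(long recordsNumThreshold) {
    this(recordsNumThreshold, 0.8, 1000L, new JvmHeapUsageProvider());
  }

  PosDeleteFlushPolicy(
      long recordsNumThreshold, double heapRatio, long heapMinRecords, HeapUsageProvider heapUsage) {
    this.recordsNumThreshold = recordsNumThreshold;
    this.heapRatio = heapRatio;
    this.heapMinRecords = heapMinRecords;
    this.heapUsage = heapUsage;
  }

  // Reads the three proposed table properties, falling back to their defaults.
  static PosDeleteFlushPolicy fromProperties(Map<String, String> props, HeapUsageProvider heapUsage) {
    return new PosDeleteFlushPolicy(
        Long.parseLong(props.getOrDefault(RECORDS, String.valueOf(Long.MAX_VALUE))),
        Double.parseDouble(props.getOrDefault(HEAP_RATIO, "0.8")),
        Long.parseLong(props.getOrDefault(HEAP_MIN_RECORDS, "1000")),
        heapUsage);
  }

  boolean shouldFlushByHeap() {
    if (heapRatio <= 0 || heapRatio >= 1) {
      return false; // out-of-range ratio disables the heap-based flush
    }
    if (records < heapMinRecords) {
      return false; // avoid frequent tiny flushes
    }
    double usage = heapUsage.heapUsageRatio();
    return usage >= 0 && usage > heapRatio; // skip the check if usage is unknown
  }

  // Hypothetical hook, called after each position delete is buffered.
  void onRecordBuffered() {
    records++;
    if (records >= recordsNumThreshold || shouldFlushByHeap()) {
      flushDeletes();
    }
  }

  private void flushDeletes() {
    // A real writer would sort and write the buffered deletes here.
    records = 0;
    flushes++;
  }

  long bufferedRecords() {
    return records;
  }

  int flushCount() {
    return flushes;
  }
}
```

Injecting the HeapUsageProvider keeps the heap check testable: a lambda such as `() -> 0.9` simulates memory pressure without allocating real memory. Note that the "used" heap figure also counts unreachable objects that have not been collected yet, so a flush may trigger somewhat earlier than strictly necessary; that is the trade-off of not forcing a GC.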
Are you willing to submit PR?
Subtasks
No response
Code of Conduct