feat: Support DELETE operation on Lance datasets #19

Open
fightBoxing wants to merge 1 commit into main from feature/delete-support

@fightBoxing
Collaborator

Summary

Implements DELETE support for Lance datasets, addressing #8.

Inspired by the lance-spark DELETE implementation, this PR adds predicate-based row deletion using Lance SDK's native Dataset.delete(String predicate) method.

Changes

New Files

  • LanceDeleteExecutor: Standalone executor class that wraps Lance SDK's Dataset.delete() method
    • Supports predicate-based deletion (SQL-like syntax)
    • Provides delete(), deleteAndCount(), and countRows() methods
    • Implements Closeable for proper resource management
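
A minimal sketch of how such an executor could be shaped, assuming the dataset is reached through a hypothetical `DatasetHandle` interface rather than the real Lance SDK class. The names `DatasetHandle` and `DeleteExecutorSketch` are illustrative only and are not taken from this PR; the row count in `deleteAndCount` is derived from a before/after difference, which is an assumption and would not be exact under concurrent writers.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for the dataset API; not the Lance SDK type. */
interface DatasetHandle extends Closeable {
    void delete(String predicate);
    long countRows();
}

/** Illustrative executor: validates the predicate, then delegates. */
final class DeleteExecutorSketch implements Closeable {
    private final DatasetHandle dataset;

    DeleteExecutorSketch(DatasetHandle dataset) {
        if (dataset == null) {
            throw new IllegalArgumentException("dataset must not be null");
        }
        this.dataset = dataset;
    }

    /** Rejects null, empty, and blank predicates before touching the dataset. */
    private static void validate(String predicate) {
        if (predicate == null || predicate.trim().isEmpty()) {
            throw new IllegalArgumentException("predicate must not be null or blank");
        }
    }

    void delete(String predicate) {
        validate(predicate);
        dataset.delete(predicate);
    }

    /** Returns the number of rows removed, computed as before minus after. */
    long deleteAndCount(String predicate) {
        validate(predicate);
        long before = dataset.countRows();
        dataset.delete(predicate);
        return before - dataset.countRows();
    }

    long countRows() {
        return dataset.countRows();
    }

    @Override
    public void close() throws IOException {
        dataset.close(); // release the underlying handle
    }
}
```

Validating before delegating keeps an accidental empty predicate from ever reaching the dataset, and implementing `Closeable` lets callers rely on try-with-resources.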

Modified Files

  • LanceSink: Enhanced to handle DELETE RowKind
    • Separate delete buffer for batch processing DELETE operations
    • Builds delete predicates from RowData fields automatically
    • Supports common data types: BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, CHAR, VARCHAR
    • Properly handles NULL values in predicates
    • Flushes deletes before inserts at checkpoint boundaries
  • LanceDynamicTableSink: Updated ChangelogMode to support DELETE when requested
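
The buffering and flush ordering described for LanceSink can be sketched roughly as below. This is an assumption-laden illustration, not the PR's code: rows are plain strings, `ChangeBufferSketch` is a hypothetical name, and the "writes" are only recorded in a log list so the ordering is visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative two-buffer sink: deletes and inserts are batched separately. */
final class ChangeBufferSketch {
    private final int batchSize;
    private final List<String> pendingDeletes = new ArrayList<>();
    private final List<String> pendingInserts = new ArrayList<>();
    /** Records flushes in the order they happen (stand-in for real I/O). */
    final List<String> log = new ArrayList<>();

    ChangeBufferSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Routes a row to the matching buffer and flushes that buffer when full. */
    void add(boolean isDelete, String row) {
        if (isDelete) {
            pendingDeletes.add(row);
            if (pendingDeletes.size() >= batchSize) flushDeletes();
        } else {
            pendingInserts.add(row);
            if (pendingInserts.size() >= batchSize) flushInserts();
        }
    }

    private void flushDeletes() {
        if (pendingDeletes.isEmpty()) return;
        log.add("DELETE " + pendingDeletes);
        pendingDeletes.clear();
    }

    private void flushInserts() {
        if (pendingInserts.isEmpty()) return;
        log.add("INSERT " + pendingInserts);
        pendingInserts.clear();
    }

    /** At a checkpoint boundary, deletes are flushed before inserts. */
    void checkpoint() {
        flushDeletes();
        flushInserts();
    }
}
```

In this sketch, a checkpoint that holds both a buffered insert and a buffered delete always applies the delete batch first, matching the ordering stated in the list above.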

How DELETE Works

When a RowData with RowKind.DELETE is received:

  1. The row is buffered in a separate delete buffer
  2. When the buffer is full (or at checkpoint/close), a predicate is built from all buffered rows
  3. Within each row, the per-field equality conditions are joined with AND; the resulting per-row clauses are joined with OR
  4. Example: For rows (id=1, name='Alice') and (id=2, name='Bob'):
    (id = 1 AND name = 'Alice') OR (id = 2 AND name = 'Bob')
  5. The predicate is executed via Dataset.delete(predicate)
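
The predicate construction in the steps above can be sketched as follows. This is a simplified illustration under stated assumptions: rows are modeled as ordered `Map<String, Object>` instead of Flink `RowData`, the class name `DeletePredicateSketch` is hypothetical, string literals are escaped by doubling single quotes (standard SQL, assumed to be accepted by the Lance predicate parser), and NULL values are rendered as `IS NULL`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative predicate builder; not the PR's actual implementation. */
final class DeletePredicateSketch {

    /** Renders a non-null value as a SQL literal. */
    static String literal(Object value) {
        if (value instanceof String) {
            // Double embedded single quotes so the literal stays well-formed.
            return "'" + ((String) value).replace("'", "''") + "'";
        }
        return String.valueOf(value); // numbers and booleans
    }

    /** Joins one row's field conditions with AND, wrapped in parentheses. */
    static String rowPredicate(Map<String, Object> row) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Object> field : row.entrySet()) {
            terms.add(field.getValue() == null
                    ? field.getKey() + " IS NULL"
                    : field.getKey() + " = " + literal(field.getValue()));
        }
        return "(" + String.join(" AND ", terms) + ")";
    }

    /** Joins the per-row clauses with OR to form the batch predicate. */
    static String buildPredicate(List<Map<String, Object>> rows) {
        List<String> clauses = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            clauses.add(rowPredicate(row));
        }
        return String.join(" OR ", clauses);
    }
}
```

Using `IS NULL` rather than `= NULL` matters because an equality comparison against NULL never evaluates to true in SQL, so such rows would otherwise never match.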

Usage Example

Standalone DELETE:

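// try-with-resources closes the executor (and its dataset handle) automatically
try (LanceDeleteExecutor executor = new LanceDeleteExecutor("/path/to/dataset")) {
    // Delete a single row by key
    executor.delete("id = 1");
    // Delete every row matching a compound predicate
    executor.delete("age > 30 AND status = 'inactive'");
}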

Testing

  • Added 15 unit tests covering:
    • LanceDeleteExecutor creation and validation
    • Predicate validation (null, empty, blank)
    • ChangelogMode support (INSERT-only and INSERT+DELETE)
    • LanceSink builder validation
    • Copy and summary methods

Closes #8

- Add LanceDeleteExecutor for predicate-based row deletion
- Update LanceSink to handle DELETE RowKind with buffered batch processing
- Update LanceDynamicTableSink to support DELETE in ChangelogMode
- Build delete predicates from RowData fields (supports common types)
- Add comprehensive unit tests for DELETE functionality
- Inspired by lance-spark DELETE implementation
Development

Successfully merging this pull request may close these issues.

Support DELETE