
CTTY (Collaborator) commented Nov 6, 2025

Which issue does this PR close?

What changes are included in this PR?

  • Use project to calculate partition values for record batches
  • Repartition inputs for table_provider::insert_into
  • Initialize partition_splitter in TaskWriter's constructor
  • Use TaskWriter in IcebergWriteExec to support partitioned data
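The write path described in these bullets can be sketched in plain Rust, without Arrow or the iceberg crate. The sketch covers three steps:

1. Compute a partition value per row by projecting the partition source column and applying a transform.
2. Split a batch into one group per partition value, which is roughly what a partition splitter hands to per-partition writers.
3. Route each value to a writer task by hashing it. This is one plausible repartitioning strategy that keeps all rows for a given value in the same task. The PR text does not say which scheme `insert_into` actually uses.

All of the following are hypothetical illustration names, not the iceberg-rust or DataFusion API: `Row`, `PartitionValue`, `identity_category`, `truncate_id`, `split_by_partition`, and `route_to_task`. The real code operates on Arrow `RecordBatch`es.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for one row of an Arrow RecordBatch.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: i64,
    pub category: String,
}

/// Hypothetical partition value produced by a partition transform.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum PartitionValue {
    Str(String),
    Long(i64),
}

/// Identity transform applied to the projected `category` column.
pub fn identity_category(row: &Row) -> PartitionValue {
    PartitionValue::Str(row.category.clone())
}

/// Integer truncate transform applied to the projected `id` column, using the
/// Iceberg spec formula `v - (((v % w) + w) % w)` so negatives round down.
pub fn truncate_id(row: &Row, width: i64) -> PartitionValue {
    assert!(width > 0, "truncate width must be positive");
    let v = row.id;
    PartitionValue::Long(v - (((v % width) + width) % width))
}

/// Split a batch into one group per partition value, keeping the original
/// row order inside each group.
pub fn split_by_partition<F>(rows: &[Row], partition_fn: F) -> BTreeMap<PartitionValue, Vec<Row>>
where
    F: Fn(&Row) -> PartitionValue,
{
    let mut groups: BTreeMap<PartitionValue, Vec<Row>> = BTreeMap::new();
    for row in rows {
        groups.entry(partition_fn(row)).or_default().push(row.clone());
    }
    groups
}

/// Hypothetical hash routing: choose a writer task for a partition value so
/// that every row with that value is handled by the same task.
pub fn route_to_task(value: &PartitionValue, num_tasks: usize) -> usize {
    assert!(num_tasks > 0, "need at least one task");
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    (hasher.finish() % num_tasks as u64) as usize
}
```

With rows whose `id`s are `1`, `12`, and `-3`, `truncate_id(row, 10)` yields the values `0`, `10`, and `-10`, so `split_by_partition` produces three groups.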

Are these changes tested?

Added a unit test.

CTTY marked this pull request as ready for review November 6, 2025 15:25
liurenjie1024 (Contributor) left a comment

Thanks @CTTY for this PR!

```rust
}

#[tokio::test]
async fn test_insert_into_partitioned() -> Result<()> {
```
Contributor

Could we move this test to the newly added sqllogictests? I think we should be able to both read and write now.

Collaborator Author

I think this makes sense. I'll work on adding sqllogictests for INSERT INTO overall.

This test case still has value, because it validates that the inserted data files are written under the correct partitioned path. I'll create a tracking issue as a follow-up.
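A check of this kind can be sketched as follows. The sketch assumes the common Iceberg layout, where data files land under `<table location>/data/<partition path>/`. It also assumes the partition path is built from `<partition field name>=<value>` segments joined by `/`.

`partition_path`, `all_files_under_partition`, and the example file paths are hypothetical. The test in this PR may perform the check differently, for example by reading file paths from the table's manifests.

```rust
/// Build a partition path such as `category=a/id_trunc=10` from
/// (field name, value) pairs. Hypothetical helper; escaping of special
/// characters in values is deliberately omitted.
pub fn partition_path(fields: &[(&str, &str)]) -> String {
    fields
        .iter()
        .map(|(name, value)| format!("{name}={value}"))
        .collect::<Vec<_>>()
        .join("/")
}

/// Return true if at least one data file was written and every data file
/// lies under `<table_location>/data/<partition path>/`.
pub fn all_files_under_partition(
    table_location: &str,
    fields: &[(&str, &str)],
    data_files: &[&str],
) -> bool {
    let prefix = format!(
        "{}/data/{}/",
        table_location.trim_end_matches('/'),
        partition_path(fields)
    );
    !data_files.is_empty() && data_files.iter().all(|path| path.starts_with(&prefix))
}
```

Requiring at least one file avoids a vacuous pass when the insert wrote nothing.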

Collaborator Author

Created #1835

Contributor

I think the overall direction is to remove integration tests like this and move them into sqllogictests. But we could delete this one after we fix #1835.

> This test case still has its value as it validates the inserted data files are put under the partitioned path correctly.

I'm not convinced. Such a detailed check could be done in unit tests of the low-level APIs rather than in integration tests.
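Moving the check down to a unit test of a low-level API could look like the sketch below. `data_file_location` is a hypothetical location helper, not an actual iceberg-rust function. The `<table location>/data/<partition path>/<file name>` layout it encodes is the same assumption about Iceberg's default data layout.

Testing the helper directly covers both the partitioned and unpartitioned cases, without running a DataFusion query or listing files on storage.

```rust
/// Hypothetical low-level helper: where a data file should be written, given
/// the table location, an optional partition path (e.g. `category=a`), and
/// the file name. A trailing `/` on the table location is ignored.
pub fn data_file_location(table_location: &str, partition_path: Option<&str>, file_name: &str) -> String {
    let base = table_location.trim_end_matches('/');
    match partition_path {
        Some(p) if !p.is_empty() => format!("{base}/data/{p}/{file_name}"),
        _ => format!("{base}/data/{file_name}"),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn partitioned_file_goes_under_partition_dir() {
        assert_eq!(
            data_file_location("/tmp/t", Some("category=a"), "f.parquet"),
            "/tmp/t/data/category=a/f.parquet"
        );
    }

    #[test]
    fn unpartitioned_file_goes_under_data_dir() {
        assert_eq!(
            data_file_location("/tmp/t/", None, "f.parquet"),
            "/tmp/t/data/f.parquet"
        );
    }
}
```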


liurenjie1024 merged commit 1fcad93 into apache:main Nov 12, 2025
16 checks passed
CTTY deleted the ctty/apply-task branch November 12, 2025 12:54

Development

Successfully merging this pull request may close these issues.

Support INSERT INTO partitioned data for DataFusion

2 participants