feat(spark): add Iceberg + MinIO integration example by digvijay-y · Pull Request #501 · kubeflow/sdk

digvijay-y · 2026-05-24T08:34:34Z

What this PR does / why we need it:
Adds a runnable local development example demonstrating SparkClient with Apache Iceberg and MinIO as S3-compatible object storage.

Includes:

iceberg_minio.py — end-to-end: create namespace, table, write and read data
docker-compose-iceberg-minio.yml — one-command local setup
Updated README.md with setup instructions

Tested with:

iceberg-spark-runtime 1.9.1
AWS SDK v2 (software.amazon.awssdk:bundle:2.26.24)
PySpark 3.5.0
MinIO + tabulario/iceberg-rest

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Fixes #500

Checklist:

Docs included if any changes are user facing

Signed-off-by: digvijay-y <144053736+digvijay-y@users.noreply.github.com>

google-oss-prow · 2026-05-24T08:34:40Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign electronic-waste for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a local-development example for using Spark with Apache Iceberg backed by MinIO (S3-compatible) and an Iceberg REST catalog.

Changes:

Added a runnable PySpark example that creates, writes, and reads an Iceberg table using MinIO + REST catalog.
Added a docker-compose stack to spin up MinIO, initialize a warehouse bucket, and run the Iceberg REST catalog.
Documented setup/run/config/teardown steps in the Spark examples README.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| examples/spark/iceberg_minio.py | New end-to-end Iceberg + MinIO example script with Spark session configuration and basic table operations |
| examples/spark/docker-compose-iceberg-minio.yml | New docker-compose stack for MinIO + bucket init + Iceberg REST catalog |
| examples/spark/README.md | Documents how to run the Iceberg + MinIO example and its configuration |

Signed-off-by: digvijay-y <144053736+digvijay-y@users.noreply.github.com>

tariq-hasan

Hi @digvijay-y! Thanks for raising the PR. This is a great start. I left a few high-level comments for now. We can go deeper on the review once we align on the approach.

tariq-hasan · 2026-05-28T08:36:24Z

+#!/usr/bin/env python3
+# Copyright 2025 The Kubeflow Authors.
+#


The copyright should be year-less as per the new guidelines.

Suggested change

-#!/usr/bin/env python3
-# Copyright 2025 The Kubeflow Authors.
-#
+#!/usr/bin/env python3
+# Copyright The Kubeflow Authors.
+#

tariq-hasan · 2026-05-28T08:42:06Z

+        .config("spark.sql.catalog.lakehouse.s3.secret-access-key", MINIO_SECRET_KEY)
+        .getOrCreate()
+    )
+


The test uses pyspark directly. We need to run the test through the Kubeflow Spark SDK.

The value of this example for #470 is proving the Iceberg path works through the SDK's SparkConnect path end-to-end. A local SparkSession validates Iceberg+Spark but exercises no Kubeflow code, so it can't go in the example harness.
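To make the point above concrete: the Iceberg catalog settings are plain key/value Spark properties, so they can be kept separate from whichever code path creates the session. The sketch below collects them in a dictionary. Only the `lakehouse` catalog name and the `s3.secret-access-key` key appear in this PR's diff; the other keys follow the Iceberg Spark and S3FileIO property names as I understand them, and the URIs, bucket, and credentials are hypothetical placeholders. How such a dictionary would be handed to the Kubeflow Spark SDK is not shown here, since that API is not quoted in this thread.

```python
def iceberg_rest_catalog_conf(
    catalog: str,
    rest_uri: str,
    warehouse: str,
    s3_endpoint: str,
    access_key: str,
    secret_key: str,
) -> dict[str, str]:
    """Build Spark properties for an Iceberg REST catalog backed by S3-compatible storage."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg's SQL extensions (procedures, MERGE INTO, and so on).
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.endpoint": s3_endpoint,
        # MinIO is usually addressed path-style rather than virtual-host-style.
        f"{prefix}.s3.path-style-access": "true",
        f"{prefix}.s3.access-key-id": access_key,
        f"{prefix}.s3.secret-access-key": secret_key,
    }


# Hypothetical local values; adjust to match the docker-compose stack.
conf = iceberg_rest_catalog_conf(
    catalog="lakehouse",
    rest_uri="http://localhost:8181",
    warehouse="s3://warehouse/",
    s3_endpoint="http://localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
)

for key in sorted(conf):
    print(key)
```

Keeping the properties in one function means the same settings could be reused unchanged if the example moves from a local session to a Spark Connect session created through the SDK.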

tariq-hasan · 2026-05-28T08:46:33Z

+
+```bash
+pip install pyspark==3.5.0
+```


The pyspark version should flow directly from pyproject.toml. Since the SDK targets Spark 4.0 (Scala 2.13), the Iceberg runtime must be iceberg-spark-runtime-4.0_2.13, and there should be no separate pyspark install at all.
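The artifact name in the comment above follows Iceberg's Maven naming pattern, `iceberg-spark-runtime-<spark major.minor>_<scala binary version>`. A minimal sketch of deriving the coordinate from the Spark version, so it does not drift from the pinned dependency, might look like this. The helper name is hypothetical, and the Iceberg version `1.10.0` in the usage is an assumption: confirm which Iceberg releases actually publish a Spark 4.0 artifact before relying on it.

```python
def iceberg_runtime_coordinate(
    spark_version: str, scala_binary: str, iceberg_version: str
) -> str:
    """Return the Maven coordinate of the Iceberg Spark runtime for a Spark version.

    Only the major.minor part of the Spark version is used in the artifact name.
    """
    major, minor = spark_version.split(".")[:2]
    artifact = f"iceberg-spark-runtime-{major}.{minor}_{scala_binary}"
    return f"org.apache.iceberg:{artifact}:{iceberg_version}"


# Spark 4.0 with Scala 2.13, as the review comment requires (Iceberg version assumed).
spark4 = iceberg_runtime_coordinate("4.0.0", "2.13", "1.10.0")
print(spark4)

# The combination this PR was tested with, assuming Scala 2.12 for PySpark 3.5.0.
spark35 = iceberg_runtime_coordinate("3.5.0", "2.12", "1.9.1")
print(spark35)
```

In a real example the `spark_version` argument could come from the installed `pyspark` package rather than a literal, which would keep the runtime jar aligned with whatever pyproject.toml resolves.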

tariq-hasan · 2026-05-28T09:16:22Z

+
+
+if __name__ == "__main__":
+    main()


The example needs to be integrated with the test harness to verify that it works as part of the e2e.

tariq-hasan · 2026-05-28T09:22:33Z

+      CATALOG_S3_PATH__STYLE__ACCESS: "true"
+    depends_on:
+      minio:
+        condition: service_healthy


Ideally we would like to have Iceberg/MinIO as part of e2e setup or similar.
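For anyone porting this compose service into an e2e setup, the double underscores in `CATALOG_S3_PATH__STYLE__ACCESS` are significant. As I understand the tabulario/iceberg-rest image, it strips the `CATALOG_` prefix, turns `__` into `-` and `_` into `.`, and lowercases the result to get a catalog property name. The sketch below mirrors that convention as an assumption about the image's behavior, not a copy of its code; the function name is hypothetical.

```python
def env_to_catalog_property(env_name: str, prefix: str = "CATALOG_") -> str:
    """Translate a CATALOG_* environment variable name into a catalog property key."""
    if not env_name.startswith(prefix):
        raise ValueError(f"{env_name!r} does not start with {prefix!r}")
    stripped = env_name[len(prefix):]
    # Order matters: replace double underscores first, then single ones.
    return stripped.replace("__", "-").replace("_", ".").lower()


prop = env_to_catalog_property("CATALOG_S3_PATH__STYLE__ACCESS")
print(prop)
```

Under this convention the variable in the snippet above maps to `s3.path-style-access`, the same property the Spark side sets on its catalog.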

tariq-hasan · 2026-05-28T09:34:18Z

+    print(f"\nGenerated range with {df.count()} rows across {num_executors} executors")
    print(f"Session name: {session_name}")
    df.show(10)



We should split the changes to spark_connect_simple.py into its own PR as they do not touch the Iceberg MinIO example.

examples/spark: add Iceberg + MinIO example

a3565c2

Signed-off-by: digvijay-y <144053736+digvijay-y@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 24, 2026 08:34

google-oss-prow Bot requested review from Electronic-Waste, astefanutti and szaher May 24, 2026 08:34

google-oss-prow Bot added the size/L label May 24, 2026

Copilot AI reviewed May 24, 2026

update review changes

1271a0f

Signed-off-by: digvijay-y <144053736+digvijay-y@users.noreply.github.com>

tariq-hasan suggested changes May 28, 2026

google-oss-prow Bot assigned tariq-hasan May 28, 2026

digvijay-y wants to merge 2 commits into kubeflow:main from digvijay-y:examples/iceberg-minio

3 participants
