fix: port-forward process cleanup in connect() failure paths #461

Open
Goku2099 wants to merge 7 commits into kubeflow:main from Goku2099:fix-port-forward-leak

@Goku2099

Problem

In kubeflow/spark/backends/kubernetes/backend.py inside connect(), the port-forward process created via subprocess.Popen is not properly cleaned up in failure scenarios.

Specifically:

  • During retry scenarios (when port-forward dies and is recreated), the previous process is not terminated and reaped.
  • In exception and timeout paths, the process may remain running or not be reaped (wait() not called).

Fix

  • Added cleanup before reassigning pf_proc in retry paths
  • Ensured wait() is always called to properly reap subprocesses
  • Added a try/finally block to guarantee cleanup in failure scenarios
  • Preserved active connection on success by avoiding termination of the running process
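
The cleanup pattern described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `stop_process` and `start_forward_and_connect` are hypothetical names, and the `connect` callable stands in for whatever connection logic runs once the port-forward is up.

```python
import subprocess


def stop_process(proc, timeout=5.0):
    """Terminate a child process and reap it so it cannot linger as a zombie."""
    if proc is None:
        return
    if proc.poll() is None:  # still running, ask it to exit
        proc.terminate()
    try:
        proc.wait(timeout=timeout)  # reap, even if it had already exited
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate if terminate() was ignored
        proc.wait()


def start_forward_and_connect(cmd, connect):
    """Start `cmd`, call `connect(proc)`, and clean up unless connect succeeds.

    Hypothetical helper: on success the live process is returned to the
    caller, who becomes responsible for stopping it later.
    """
    proc = subprocess.Popen(cmd)
    succeeded = False
    try:
        result = connect(proc)
        succeeded = True
        return result, proc
    finally:
        if not succeeded:
            stop_process(proc)
```

The `succeeded` flag is one way to keep the process alive on the success path while still guaranteeing cleanup when `connect` raises or times out. In a retry loop, the same `stop_process` call would run on the old process before `pf_proc` is reassigned.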

Impact

  • Prevents orphaned kubectl port-forward processes
  • Avoids zombie subprocesses
  • Improves subprocess lifecycle handling in failure cases

Related

Fixes #460

Copilot AI review requested due to automatic review settings April 15, 2026 12:36
@google-oss-prow
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign electronic-waste for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

Copilot AI left a comment

Pull request overview

Fixes subprocess lifecycle handling for kubectl port-forward created during Spark Connect setup, aiming to prevent orphaned/zombie processes when connect() fails or retries.

Changes:

  • Wraps connect() port-forward setup/connection logic in try/finally to ensure cleanup on failure paths.
  • Adds explicit termination/reaping of prior port-forward processes before restarting in retry scenarios.
  • Adds a new test_pf_leak.py script intended to demonstrate subprocess leakage.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| kubeflow/spark/backends/kubernetes/backend.py | Adds cleanup logic for port-forward subprocesses across failure/retry/timeout paths in connect(). |
| test_pf_leak.py | Introduces a standalone subprocess-leak reproduction script (currently conflicts with pytest collection). |

Comment thread test_pf_leak.py Outdated
Comment thread kubeflow/spark/backends/kubernetes/backend.py Outdated
@Goku2099
Author

cc @andreyvelich @Shekharrajak @tariq-hasan
As discussed on Slack, I've raised this PR. Happy to get feedback if anyone is available.

Member

@Shekharrajak Shekharrajak left a comment

We should refine this for production; the current method is quite cluttered and could use some simplification.

Always add some unit tests (even if none are present yet) so that we can validate the changes and requirements.
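
As a rough illustration of the kind of unit test this asks for, the sketch below checks the retry path with `unittest.mock` instead of a real `kubectl` process. `restart_forward` is a hypothetical stand-in for the retry step, not a function from backend.py, and the command list is a placeholder that is never executed.

```python
import subprocess
from unittest import mock


def restart_forward(old_proc, cmd, popen=subprocess.Popen):
    """Hypothetical retry step: reap the old process before starting a new one."""
    if old_proc is not None:
        if old_proc.poll() is None:  # still running
            old_proc.terminate()
        old_proc.wait()  # always reap, even if it already exited
    return popen(cmd)


def test_restart_reaps_dead_process():
    old = mock.MagicMock()
    old.poll.return_value = 1  # simulate a port-forward that already died
    fake_popen = mock.MagicMock()
    new = restart_forward(old, ["kubectl", "port-forward"], popen=fake_popen)
    old.terminate.assert_not_called()
    old.wait.assert_called_once()
    fake_popen.assert_called_once_with(["kubectl", "port-forward"])
    assert new is fake_popen.return_value


def test_restart_terminates_running_process():
    old = mock.MagicMock()
    old.poll.return_value = None  # simulate a process that is still alive
    restart_forward(old, ["kubectl", "port-forward"], popen=mock.MagicMock())
    old.terminate.assert_called_once()
    old.wait.assert_called_once()
```

Injecting `popen` as a parameter is only one option; patching `subprocess.Popen` with `mock.patch` in the module under test would work as well.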

@Goku2099
Author

> We should refine this for production; the current method is quite cluttered and could use some simplification.
>
> Always add some unit tests (even if none are present yet) so that we can validate the changes and requirements.

Thanks for the review, I will simplify the logic and add a basic unit test to validate the failure scenarios.

@Goku2099
Author

@Shekharrajak
I have updated the PR; please review whenever you have time. Going forward, I will keep the code cleaner and include tests.

@Goku2099
Author

@Shekharrajak @tariq-hasan gentle ping on this PR, would appreciate a review when you have time.

Member

@tariq-hasan tariq-hasan left a comment

@Goku2099 Thanks for raising the PR. I have left a few comments.

/cc @Shekharrajak @andreyvelich

Comment thread kubeflow/spark/backends/kubernetes/backend.py Outdated
Comment thread kubeflow/spark/backends/kubernetes/backend.py Outdated
Comment thread kubeflow/spark/backends/kubernetes/backend.py
Comment thread kubeflow/spark/backends/kubernetes/backend.py
Signed-off-by: Sameer_yadav <159073326+Goku2099@users.noreply.github.com>
Signed-off-by: Sameer_yadav <159073326+Goku2099@users.noreply.github.com>
Signed-off-by: Sameer_yadav <159073326+Goku2099@users.noreply.github.com>
…ailure path

Signed-off-by: Sameer_yadav <159073326+Goku2099@users.noreply.github.com>
Signed-off-by: Sameer_yadav <159073326+Goku2099@users.noreply.github.com>
@Goku2099 force-pushed the fix-port-forward-leak branch from 2958fb3 to 253d6f6 April 29, 2026 06:11
Member

@tariq-hasan tariq-hasan left a comment

@Goku2099 Thanks for addressing the review comments. I have left a few nits and additional comments on the tests.

Comment thread kubeflow/spark/backends/kubernetes/backend.py
Comment thread kubeflow/spark/backends/kubernetes/backend.py
Comment thread kubeflow/spark/backends/kubernetes/backend.py
Comment thread kubeflow/spark/backends/kubernetes/backend.py Outdated
Comment thread kubeflow/spark/backends/kubernetes/backend.py Outdated
Comment thread kubeflow/spark/backends/kubernetes/backend.py
Comment thread kubeflow/spark/backends/kubernetes/backend_test.py Outdated
Comment thread kubeflow/spark/backends/kubernetes/backend.py
Comment thread kubeflow/spark/backends/kubernetes/backend.py
… fix tests

Signed-off-by: Sameer_yadav <159073326+Goku2099@users.noreply.github.com>
Comment thread kubeflow/spark/backends/kubernetes/backend_test.py Outdated
Signed-off-by: Sameer_yadav <159073326+Goku2099@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Port-forward process not cleaned up in connect() failure paths

4 participants