fix(trainer): changed ValueError to raise RuntimeError in localprocess and container in backends #433#469
Conversation
|
🎉 Welcome to the Kubeflow SDK! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
Aligns LocalProcess and Container backends with the documented TrainerClient error contract by raising RuntimeError (instead of ValueError) for operational “not found” conditions such as missing jobs, containers, networks, or runtimes.
Changes:
- LocalProcess backend: switch job-not-found and runtime-not-found errors to
RuntimeErrorwhile keepingValueErrorfor input validation (e.g.,polling_interval > timeout). - Container backend: switch job/container/network/runtime lookup failures to
RuntimeErrorand update related docstrings. - Update unit tests to assert
RuntimeErrorfor these not-found scenarios.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
kubeflow/trainer/backends/localprocess/backend.py |
Raises RuntimeError for runtime/job not found to match the public API contract. |
kubeflow/trainer/backends/localprocess/backend_test.py |
Updates expectations to RuntimeError for missing runtime/job cases. |
kubeflow/trainer/backends/container/backend.py |
Raises RuntimeError for missing containers/network/runtime during job reconstruction and lookup. |
kubeflow/trainer/backends/container/backend_test.py |
Updates expectations to RuntimeError for missing job cases. |
kramaranya
left a comment
There was a problem hiding this comment.
Minor formatting issue found — see inline comment.
| local_runtime = next((rt for rt in local_runtimes if rt.name == runtime.name), None) | ||
| if not local_runtime: | ||
| raise ValueError(f"Runtime '{runtime.name}' not found.") | ||
| raise RuntimeError(f"Runtime '{runtime.name}' not found.") |
There was a problem hiding this comment.
| raise RuntimeError(f"Runtime '{runtime.name}' not found.") | |
| raise RuntimeError(f"Runtime '{runtime.name}' not found.") |
|
Thanks @kramaranya for the review I've fixed the line now |
|
Could you rebase the changes and sign your commits please? |
fb06144 to
811cfb9
Compare
| config={"name": BASIC_TRAIN_JOB_NAME, "timeout": 10, "polling_interval": -1}, | ||
| expected_error=ValueError, | ||
| config={"job_name": "nonexistent-job"}, | ||
| expected_error=RuntimeError, | ||
| ), |
There was a problem hiding this comment.
This TestCase currently passes config and expected_error twice, which is a Python syntax error and will prevent the test module from importing; split the job-not-found assertion into its own TestCase entry (using config={"name": "nonexistent-job"} and expected_error=RuntimeError, since wait_for_job_status(**config) expects name).
811cfb9 to
fb1c808
Compare
| if not containers: | ||
| raise ValueError(f"No TrainJob with name {name}") | ||
| raise RuntimeError(f"No TrainJob with name {name}") |
| network_id = containers[0]["labels"].get(f"{self.label_prefix}/network-id") | ||
| if not network_id: | ||
| raise ValueError(f"TrainJob {job_name} is missing network metadata") | ||
| raise RuntimeError(f"TrainJob {job_name} is missing network metadata") | ||
|
|
||
| network_info = self._adapter.get_network(network_id) | ||
| if not network_info: | ||
| raise ValueError(f"TrainJob {job_name} network not found") | ||
| raise RuntimeError(f"TrainJob {job_name} network not found") |
| try: | ||
| job_runtime = self.get_runtime(runtime_name) if runtime_name else None | ||
| except Exception as e: | ||
| raise ValueError(f"Runtime {runtime_name} not found for job {job_name}") from e | ||
| raise RuntimeError(f"Runtime {runtime_name} not found for job {job_name}") from e | ||
|
|
||
| if not job_runtime: | ||
| raise ValueError(f"Runtime {runtime_name} not found for job {job_name}") | ||
| raise RuntimeError(f"Runtime {runtime_name} not found for job {job_name}") |
fb1c808 to
41eba7d
Compare
…s and container backends Signed-off-by: aashvgit <167199295+aashvgit@users.noreply.github.com>
41eba7d to
912039f
Compare
What this PR does / why we need it:
backend now raise RuntimeError for job-not-found
backend now raise RuntimeError for job/network/runtime not found
Fixes #
#433