Fix/preflight: suppress distributed init warning during local checks #604
alexsu52 wants to merge 5 commits into AMD-AGI:main
Pull request overview
This PR adds an expect_distributed flag to the preflight info/network collection path so local-only preflight runs can avoid emitting a misleading “process group not initialized” warning before torch.distributed is intentionally initialized.
Changes:
- Add `expect_distributed: bool = True` parameter to `run_preflight_info` and pass it through to network info collection.
- Thread `expect_distributed` into `collect_network_info` → `run_network_full_checks`.
- Downgrade the “Runtime process group not initialized” finding from WARN → INFO when `expect_distributed=False`, and use this mode for the initial local-only preflight runs.
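The severity rule in the last bullet can be sketched as follows. This is a minimal illustration and not the code from `network_full.py`: the `Finding` dataclass, the `pg_not_initialized_finding` helper, and the `"warn"`/`"info"` level strings are assumptions made for the example. Only the `expect_distributed` name, its default, and the finding text come from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; the real preflight type may differ."""

    level: str
    message: str


def pg_not_initialized_finding(expect_distributed: bool = True) -> Finding:
    """Build the finding reported when the process group is not initialized.

    The level is "warn" when a distributed runtime is expected (the default)
    and "info" when the caller has declared the run local-only.
    """
    level = "warn" if expect_distributed else "info"
    return Finding(level=level, message="Runtime process group not initialized")
```

Keeping the default at `True` means existing callers keep the WARN behavior, and only call sites that opt in with `expect_distributed=False` see the downgrade.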
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| primus/tools/preflight/preflight_perf_test.py | Adds `expect_distributed` to `run_preflight_info` and uses `False` for local-only preflight runs to suppress early PG-init warnings. |
| primus/tools/preflight/network/info.py | Threads `expect_distributed` through network info collection into full checks. |
| primus/tools/preflight/network/network_full.py | Makes the PG-not-initialized finding severity conditional on `expect_distributed` (WARN vs INFO). |
Comments suppressed due to low confidence (2)
primus/tools/preflight/preflight_perf_test.py:65
`run_preflight_info` now accepts `expect_distributed`, but the docstring’s Args section doesn’t document this parameter or its effect (downgrading distributed/PG-init findings during local-only runs). Please update the docstring so callers understand when to pass `expect_distributed=False` and what behavior changes.
def run_preflight_info(args: Any, expect_distributed: bool = True) -> int:
    """
    Run lightweight preflight info collection (host/gpu/network), aggregate across ranks,
    and write Markdown/PDF report on rank0.
    Args:
        args: Namespace with optional fields:
            - check_host (bool)
            - check_gpu (bool)
            - check_network (bool)
            - dump_path (str)
            - report_file_name (str)
            - save_pdf (bool)
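One possible shape for the requested docstring update is sketched below. The wording of the `expect_distributed` entry is a suggestion based on this review, not text from the PR, and the function body is a placeholder that returns 0 without collecting anything so the snippet stays self-contained.

```python
from typing import Any


def run_preflight_info(args: Any, expect_distributed: bool = True) -> int:
    """
    Run lightweight preflight info collection (host/gpu/network), aggregate across ranks,
    and write Markdown/PDF report on rank0.

    Args:
        args: Namespace with optional fields:
            - check_host (bool)
            - check_gpu (bool)
            - check_network (bool)
            - dump_path (str)
            - report_file_name (str)
            - save_pdf (bool)
        expect_distributed: Whether a torch.distributed process group is
            expected to be initialized when this runs. With the default
            (True), a missing process group is reported as WARN. Pass False
            for local-only runs that happen before distributed init, in
            which case the same finding is reported as INFO.
    """
    # Placeholder body for illustration only; the real function collects
    # host/gpu/network info and writes the report.
    return 0
```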
primus/tools/preflight/network/network_full.py:20
`run_network_full_checks` now takes `expect_distributed`, but the function docstring doesn’t mention the parameter or the changed severity behavior. Please document the flag so callers know how to control the PG-init finding severity.
def run_network_full_checks(expect_distributed: bool = True) -> Dict[str, Any]:
    """
    Level: full
    Verify runtime process group sanity (best-effort).
    """
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
Adds an explicit “distributed expected” flag to preflight info collection so local-only runs don’t emit a misleading “process group not initialized” warning when torch.distributed hasn’t been initialized yet.
Changes:
- Add `expect_distributed: bool = True` to `run_preflight_info` and plumb it into network info collection.
- Downgrade “Runtime process group not initialized” from WARN → INFO when `expect_distributed=False`.
- Configure initial local-only preflight info runs to call `run_preflight_info(..., expect_distributed=False)`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| primus/tools/preflight/preflight_perf_test.py | Adds `expect_distributed` to info collection and uses `False` for the initial local-only report before attempting distributed init. |
| primus/tools/preflight/network/network_full.py | Makes the “PG not initialized” finding INFO when distributed isn’t expected. |
| primus/tools/preflight/network/info.py | Threads `expect_distributed` into the full network checks call. |
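The plumbing summarized in the table can be pictured with the sketch below. The three function names and the `expect_distributed` parameter come from the PR; everything else is an assumption for illustration: the bodies are stand-ins, `collect_network_info` is shown with no other parameters, the dictionary layout is invented, and the stand-in check always treats the process group as uninitialized.

```python
from typing import Any, Dict


def run_network_full_checks(expect_distributed: bool = True) -> Dict[str, Any]:
    # Stand-in for network_full.py: pretend the process group is not
    # initialized and pick the severity from the flag.
    level = "warn" if expect_distributed else "info"
    finding = {"level": level, "message": "Runtime process group not initialized"}
    return {"findings": [finding]}


def collect_network_info(expect_distributed: bool = True) -> Dict[str, Any]:
    # Stand-in for network/info.py: forward the flag unchanged.
    return {"full": run_network_full_checks(expect_distributed=expect_distributed)}


def run_preflight_info(args: Any, expect_distributed: bool = True) -> int:
    # Stand-in for preflight_perf_test.py: forward the flag and list findings.
    info = collect_network_info(expect_distributed=expect_distributed)
    for finding in info["full"]["findings"]:
        print(f"{finding['level'].upper()}: {finding['message']}")
    return 0  # placeholder return value


# Initial local-only pass, before distributed init is attempted.
run_preflight_info(args=None, expect_distributed=False)
```

Passing the flag by keyword at every hop keeps each layer's default (`True`) independent, so a caller that skips the argument still gets the original WARN behavior.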
Comments suppressed due to low confidence (2)
primus/tools/preflight/preflight_perf_test.py:65
`run_preflight_info` now accepts `expect_distributed`, but the docstring’s Args section doesn’t mention it. Please document what this flag controls (e.g., whether missing `torch.distributed` PG init should be treated as WARN vs INFO) so callers understand when to pass `False`.
def run_preflight_info(args: Any, expect_distributed: bool = True) -> int:
    """
    Run lightweight preflight info collection (host/gpu/network), aggregate across ranks,
    and write Markdown/PDF report on rank0.
    Args:
        args: Namespace with optional fields:
            - check_host (bool)
            - check_gpu (bool)
            - check_network (bool)
            - dump_path (str)
            - report_file_name (str)
            - save_pdf (bool)
primus/tools/preflight/network/network_full.py:20
`run_network_full_checks` added an `expect_distributed` parameter, but the docstring doesn’t describe how it changes behavior (WARN vs INFO when PG isn’t initialized). Please update the docstring to clarify intended usage and default semantics.
def run_network_full_checks(expect_distributed: bool = True) -> Dict[str, Any]:
    """
    Level: full
    Verify runtime process group sanity (best-effort).
    """
Pull request overview
This PR adds an expect_distributed flag to preflight network/runtime checks so local-only preflight runs don’t emit a “process group not initialized” warning before distributed initialization is attempted.
Changes:
- Add `expect_distributed: bool = True` to `run_preflight_info`, threading it through to network info collection.
- Add `expect_distributed: bool = True` to `collect_network_info` and `run_network_full_checks`.
- Run initial local-only preflight checks with `expect_distributed=False` to downgrade the PG-not-initialized warning to an info-level finding.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| primus/tools/preflight/preflight_perf_test.py | Threads `expect_distributed` through `run_preflight_info` and uses `False` for the initial local-only report runs. |
| primus/tools/preflight/network/info.py | Adds `expect_distributed` passthrough to runtime/full network checks. |
| primus/tools/preflight/network/network_full.py | Downgrades the “process group not initialized” finding from WARN→INFO when distributed runtime is not expected. |
Comments suppressed due to low confidence (1)
primus/tools/preflight/preflight_perf_test.py:56
`run_preflight_info` gained the `expect_distributed` parameter, but the docstring’s Args section still only documents `args`. Please document `expect_distributed` (what it controls and when to set it to `False`) so the CLI/report behavior is clear to callers.
def run_preflight_info(args: Any, expect_distributed: bool = True) -> int:
    """
    Run lightweight preflight info collection (host/gpu/network), aggregate across ranks,
    and write Markdown/PDF report on rank0.
Pull request overview
This PR adds an expect_distributed switch to preflight info collection so that local-only preflight runs don’t emit the “process group not initialized” warning before distributed initialization is attempted.
Changes:
- Add `expect_distributed: bool = True` to `run_preflight_info` and plumb it into network info collection.
- Update network runtime checks to downgrade the “PG not initialized” finding from WARN → INFO when distributed runtime is not expected.
- Configure initial local-only preflight report generation to call `run_preflight_info(..., expect_distributed=False)`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| primus/tools/preflight/preflight_perf_test.py | Adds `expect_distributed` flag and uses it to suppress PG warnings during the first local-only info pass. |
| primus/tools/preflight/network/info.py | Threads `expect_distributed` through `collect_network_info` into full runtime network checks. |
| primus/tools/preflight/network/network_full.py | Uses `expect_distributed` to choose WARN vs INFO for “Runtime process group not initialized”. |
Comments suppressed due to low confidence (1)
primus/tools/preflight/network/network_full.py:19
`run_network_full_checks` introduces the `expect_distributed` parameter, but the docstring doesn’t explain it. Please document how this flag affects the severity of the “process group not initialized” finding so the behavior is clear to future callers.
def run_network_full_checks(expect_distributed: bool = True) -> Dict[str, Any]:
    """
    Level: full
    Verify runtime process group sanity (best-effort).
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
This PR adds an explicit “distributed expected” toggle to preflight’s info/network collection so local-only preflight runs don’t emit the distributed process-group initialization warning.
Changes:
- Add `expect_distributed: bool = True` to `run_preflight_info` and thread it into network info collection.
- Add `expect_distributed: bool = True` to `collect_network_info` and `run_network_full_checks`.
- Invoke `run_preflight_info(..., expect_distributed=False)` for the initial local-only report paths.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| primus/tools/preflight/preflight_perf_test.py | Adds `expect_distributed` to info collection and uses `False` during local-only preflight report generation. |
| primus/tools/preflight/network/info.py | Threads `expect_distributed` into the full network runtime checks. |
| primus/tools/preflight/network/network_full.py | Downgrades “PG not initialized” from WARN→INFO when distributed runtime is not expected. |
Pull request overview
This PR reduces noise in Primus preflight “info-only” runs by suppressing the “Runtime process group not initialized” warning when the invocation is intentionally local-only (i.e., before distributed init is attempted).
Changes:
- Add an `expect_distributed: bool = True` option to `run_preflight_info`.
- Thread `expect_distributed` through `collect_network_info` → `run_network_full_checks`.
- Downgrade “process group not initialized” from WARN to INFO when `expect_distributed=False` to avoid stderr warning spam during local checks.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| primus/tools/preflight/preflight_perf_test.py | Adds `expect_distributed` to `run_preflight_info` and uses `False` for initial local-only reporting passes. |
| primus/tools/preflight/network/info.py | Adds `expect_distributed` parameter and forwards it into the full network checks. |
| primus/tools/preflight/network/network_full.py | Uses `expect_distributed` to emit INFO instead of WARN when PG isn’t initialized but a local-only run is expected. |
Changes:
- Add `expect_distributed = True` to `run_preflight_info`.
- Run `preflight` with `expect_distributed=False` during initial local-only checks.

Reason for changes:
Warning suppression:
[Primus:Preflight] WARN: Runtime process group not initialized
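To illustrate why the downgrade removes this line from local runs, here is a hedged sketch of a reporter that writes only WARN findings to stderr. The `emit_finding` and `report_pg_not_initialized` helpers are hypothetical, and the assumption that INFO findings are not written to stderr is inferred from the review summaries rather than taken from the Primus source. Only the message format is copied from the log line above.

```python
import io
import sys
from typing import TextIO

PREFIX = "[Primus:Preflight]"


def emit_finding(level: str, message: str, stream: TextIO = sys.stderr) -> bool:
    """Write WARN findings to `stream`; return True if a line was written."""
    if level.lower() != "warn":
        # Assumption: INFO findings are kept for the report and not printed here.
        return False
    stream.write(f"{PREFIX} WARN: {message}\n")
    return True


def report_pg_not_initialized(expect_distributed: bool, stream: TextIO = sys.stderr) -> bool:
    """Report the uninitialized process group with the severity the flag selects."""
    level = "warn" if expect_distributed else "info"
    return emit_finding(level, "Runtime process group not initialized", stream)


# Local-only pass: nothing reaches the stream.
local_out = io.StringIO()
report_pg_not_initialized(expect_distributed=False, stream=local_out)

# Distributed pass: the warning is written as before.
dist_out = io.StringIO()
report_pg_not_initialized(expect_distributed=True, stream=dist_out)
```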