Fix/preflight: suppress distributed init warning during local checks #604

Open
alexsu52 wants to merge 5 commits into AMD-AGI:main from alexsu52:fix/preflight/init_dist

Conversation

@alexsu52
Contributor

Changes:

  • Add an optional argument expect_distributed (default True) to run_preflight_info
  • Configure preflight with expect_distributed=False during initial local-only checks.

Reason for changes:

Suppress the following warning during local-only checks:
[Primus:Preflight] WARN: Runtime process group not initialized
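
The change can be pictured as a severity switch on a single finding. The following is a minimal sketch, not the PR's actual code: `check_process_group`, the `pg_initialized` argument, and the finding dictionary layout are assumptions made for illustration. Only `expect_distributed` and the message text come from this PR.

```python
from typing import Any, Dict, List

PG_NOT_INITIALIZED = "Runtime process group not initialized"


def check_process_group(
    pg_initialized: bool, expect_distributed: bool = True
) -> List[Dict[str, Any]]:
    """Return findings for the process-group check (hypothetical helper)."""
    findings: List[Dict[str, Any]] = []
    if not pg_initialized:
        # In a local-only run a missing process group is expected,
        # so it is reported as INFO instead of WARN.
        level = "WARN" if expect_distributed else "INFO"
        findings.append({"level": level, "message": PG_NOT_INITIALIZED})
    return findings
```

With the default `expect_distributed=True`, existing callers would keep seeing the WARN finding; only callers that opt in with `False` get the quieter INFO level.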

Copilot AI review requested due to automatic review settings March 16, 2026 03:40
Contributor

Copilot AI left a comment

Pull request overview

This PR adds an expect_distributed flag to the preflight info/network collection path so local-only preflight runs can avoid emitting a misleading “process group not initialized” warning before torch.distributed is intentionally initialized.

Changes:

  • Add expect_distributed: bool = True parameter to run_preflight_info and pass it through to network info collection.
  • Thread expect_distributed into collect_network_info → run_network_full_checks.
  • Downgrade the “Runtime process group not initialized” finding from WARN → INFO when expect_distributed=False, and use this mode for the initial local-only preflight runs.
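
The passthrough described in the list above might look roughly like the sketch below. The three function names match the PR, but the bodies, the dictionary return types, and the `PG_INITIALIZED` constant (a stand-in for `torch.distributed.is_initialized()`) are simplifying assumptions so the example runs without PyTorch.

```python
from typing import Any, Dict

# Stand-in for torch.distributed.is_initialized(); False models a local-only run.
PG_INITIALIZED = False


def run_network_full_checks(expect_distributed: bool = True) -> Dict[str, Any]:
    findings = []
    if not PG_INITIALIZED:
        # Severity depends on whether the caller expected distributed init.
        level = "WARN" if expect_distributed else "INFO"
        findings.append(
            {"level": level, "message": "Runtime process group not initialized"}
        )
    return {"findings": findings}


def collect_network_info(expect_distributed: bool = True) -> Dict[str, Any]:
    # Pure passthrough: this layer does not interpret the flag itself.
    return run_network_full_checks(expect_distributed=expect_distributed)


def run_preflight_info(args: Any, expect_distributed: bool = True) -> Dict[str, Any]:
    # Simplified: the real function returns an int; a dict is returned here
    # only so the resulting finding can be inspected.
    return collect_network_info(expect_distributed=expect_distributed)
```

Because every layer defaults to `True`, the flag only changes behavior for callers that explicitly pass `expect_distributed=False`.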

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| primus/tools/preflight/preflight_perf_test.py | Adds expect_distributed to run_preflight_info and uses False for local-only preflight runs to suppress early PG-init warnings. |
| primus/tools/preflight/network/info.py | Threads expect_distributed through network info collection into full checks. |
| primus/tools/preflight/network/network_full.py | Makes the PG-not-initialized finding severity conditional on expect_distributed (WARN vs INFO). |
Comments suppressed due to low confidence (2)

primus/tools/preflight/preflight_perf_test.py:65

  • run_preflight_info now accepts expect_distributed, but the docstring’s Args section doesn’t document this parameter or its effect (downgrading distributed/PG-init findings during local-only runs). Please update the docstring so callers understand when to pass expect_distributed=False and what behavior changes.
def run_preflight_info(args: Any, expect_distributed: bool = True) -> int:
    """
    Run lightweight preflight info collection (host/gpu/network), aggregate across ranks,
    and write Markdown/PDF report on rank0.

    Args:
        args: Namespace with optional fields:
            - check_host (bool)
            - check_gpu (bool)
            - check_network (bool)
            - dump_path (str)
            - report_file_name (str)
            - save_pdf (bool)
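
One way the requested docstring update could read is sketched below. The wording of the `expect_distributed` entry is an assumption based on the behavior described in this review, and the function body is a placeholder so the example runs on its own.

```python
from typing import Any


def run_preflight_info(args: Any, expect_distributed: bool = True) -> int:
    """
    Run lightweight preflight info collection (host/gpu/network), aggregate across ranks,
    and write Markdown/PDF report on rank0.

    Args:
        args: Namespace with optional fields:
            - check_host (bool)
            - check_gpu (bool)
            - check_network (bool)
            - dump_path (str)
            - report_file_name (str)
            - save_pdf (bool)
        expect_distributed: Whether torch.distributed is expected to be
            initialized when this runs. With True (default), a missing
            process group is reported as WARN. Pass False for local-only
            runs before distributed init, where it is reported as INFO.
    """
    # Placeholder body for the sketch; the real implementation is not shown here.
    return 0
```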

primus/tools/preflight/network/network_full.py:20

  • run_network_full_checks now takes expect_distributed, but the function docstring doesn’t mention the parameter or the changed severity behavior. Please document the flag so callers know how to control the PG-init finding severity.
def run_network_full_checks(expect_distributed: bool = True) -> Dict[str, Any]:
    """
    Level: full

    Verify runtime process group sanity (best-effort).
    """

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 16, 2026 03:44
Contributor

Copilot AI left a comment

Pull request overview

Adds an explicit “distributed expected” flag to preflight info collection so local-only runs don’t emit a misleading “process group not initialized” warning when torch.distributed hasn’t been initialized yet.

Changes:

  • Add expect_distributed: bool = True to run_preflight_info and plumb it into network info collection.
  • Downgrade “Runtime process group not initialized” from WARN → INFO when expect_distributed=False.
  • Configure initial local-only preflight info runs to call run_preflight_info(..., expect_distributed=False).
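
The call-site pattern in the last bullet can be illustrated with a small sketch. `local_report_then_init` and both fake callables are hypothetical names introduced here; the only detail taken from the PR is that the initial local-only pass is invoked with `expect_distributed=False` before distributed init is attempted.

```python
from typing import Callable, List, Tuple


def local_report_then_init(
    run_info: Callable[..., int],
    init_distributed: Callable[[], None],
) -> int:
    """Run the local-only info pass first, then attempt distributed init."""
    # The process group is intentionally not initialized yet, so tell the
    # checks not to treat its absence as a warning.
    rc = run_info(expect_distributed=False)
    init_distributed()
    return rc


# Usage with recording fakes in place of the real functions.
calls: List[Tuple] = []


def fake_run_info(expect_distributed: bool = True) -> int:
    calls.append(("info", expect_distributed))
    return 0


def fake_init_distributed() -> None:
    calls.append(("init",))


rc = local_report_then_init(fake_run_info, fake_init_distributed)
```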

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| primus/tools/preflight/preflight_perf_test.py | Adds expect_distributed to info collection and uses False for the initial local-only report before attempting distributed init. |
| primus/tools/preflight/network/network_full.py | Makes the “PG not initialized” finding INFO when distributed isn’t expected. |
| primus/tools/preflight/network/info.py | Threads expect_distributed into the full network checks call. |
Comments suppressed due to low confidence (2)

primus/tools/preflight/preflight_perf_test.py:65

  • run_preflight_info now accepts expect_distributed, but the docstring’s Args section doesn’t mention it. Please document what this flag controls (e.g., whether missing torch.distributed PG init should be treated as WARN vs INFO) so callers understand when to pass False.
def run_preflight_info(args: Any, expect_distributed: bool = True) -> int:
    """
    Run lightweight preflight info collection (host/gpu/network), aggregate across ranks,
    and write Markdown/PDF report on rank0.

    Args:
        args: Namespace with optional fields:
            - check_host (bool)
            - check_gpu (bool)
            - check_network (bool)
            - dump_path (str)
            - report_file_name (str)
            - save_pdf (bool)

primus/tools/preflight/network/network_full.py:20

  • run_network_full_checks added an expect_distributed parameter, but the docstring doesn’t describe how it changes behavior (WARN vs INFO when PG isn’t initialized). Please update the docstring to clarify intended usage and default semantics.
def run_network_full_checks(expect_distributed: bool = True) -> Dict[str, Any]:
    """
    Level: full

    Verify runtime process group sanity (best-effort).
    """

Contributor

Copilot AI left a comment

Pull request overview

This PR adds an expect_distributed flag to preflight network/runtime checks so local-only preflight runs don’t emit a “process group not initialized” warning before distributed initialization is attempted.

Changes:

  • Add expect_distributed: bool = True to run_preflight_info, threading it through to network info collection.
  • Add expect_distributed: bool = True to collect_network_info and run_network_full_checks.
  • Run initial local-only preflight checks with expect_distributed=False to downgrade the PG-not-initialized warning to an info-level finding.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| primus/tools/preflight/preflight_perf_test.py | Threads expect_distributed through run_preflight_info and uses False for the initial local-only report runs. |
| primus/tools/preflight/network/info.py | Adds expect_distributed passthrough to runtime/full network checks. |
| primus/tools/preflight/network/network_full.py | Downgrades the “process group not initialized” finding from WARN → INFO when distributed runtime is not expected. |
Comments suppressed due to low confidence (1)

primus/tools/preflight/preflight_perf_test.py:56

  • run_preflight_info gained the expect_distributed parameter, but the docstring’s Args section still only documents args. Please document expect_distributed (what it controls and when to set it to False) so the CLI/report behavior is clear to callers.
def run_preflight_info(args: Any, expect_distributed: bool = True) -> int:
    """
    Run lightweight preflight info collection (host/gpu/network), aggregate across ranks,
    and write Markdown/PDF report on rank0.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds an expect_distributed switch to preflight info collection so that local-only preflight runs don’t emit the “process group not initialized” warning before distributed initialization is attempted.

Changes:

  • Add expect_distributed: bool = True to run_preflight_info and plumb it into network info collection.
  • Update network runtime checks to downgrade the “PG not initialized” finding from WARN → INFO when distributed runtime is not expected.
  • Configure initial local-only preflight report generation to call run_preflight_info(..., expect_distributed=False).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| primus/tools/preflight/preflight_perf_test.py | Adds expect_distributed flag and uses it to suppress PG warnings during the first local-only info pass. |
| primus/tools/preflight/network/info.py | Threads expect_distributed through collect_network_info into full runtime network checks. |
| primus/tools/preflight/network/network_full.py | Uses expect_distributed to choose WARN vs INFO for “Runtime process group not initialized”. |
Comments suppressed due to low confidence (1)

primus/tools/preflight/network/network_full.py:19

  • run_network_full_checks introduces the expect_distributed parameter, but the docstring doesn’t explain it. Please document how this flag affects the severity of the “process group not initialized” finding so the behavior is clear to future callers.
def run_network_full_checks(expect_distributed: bool = True) -> Dict[str, Any]:
    """
    Level: full

    Verify runtime process group sanity (best-effort).
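
A docstring along the following lines would cover what the comment above asks for. This is a sketch only: the Args wording is an assumption derived from the WARN vs INFO behavior described in this review, and the body is a placeholder so the example is runnable.

```python
from typing import Any, Dict


def run_network_full_checks(expect_distributed: bool = True) -> Dict[str, Any]:
    """
    Level: full

    Verify runtime process group sanity (best-effort).

    Args:
        expect_distributed: Whether the caller expects torch.distributed to
            be initialized already. With True (default), "Runtime process
            group not initialized" is reported as WARN. With False
            (local-only runs), the same finding is reported as INFO.
    """
    # Placeholder body for the sketch; the real checks are not reproduced here.
    return {}
```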

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 16, 2026 05:18
Contributor

Copilot AI left a comment

Pull request overview

This PR adds an explicit “distributed expected” toggle to preflight’s info/network collection so local-only preflight runs don’t emit the distributed process-group initialization warning.

Changes:

  • Add expect_distributed: bool = True to run_preflight_info and thread it into network info collection.
  • Add expect_distributed: bool = True to collect_network_info and run_network_full_checks.
  • Invoke run_preflight_info(..., expect_distributed=False) for the initial local-only report paths.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| primus/tools/preflight/preflight_perf_test.py | Adds expect_distributed to info collection and uses False during local-only preflight report generation. |
| primus/tools/preflight/network/info.py | Threads expect_distributed into the full network runtime checks. |
| primus/tools/preflight/network/network_full.py | Downgrades “PG not initialized” from WARN → INFO when distributed runtime is not expected. |

Contributor

Copilot AI left a comment

Pull request overview

This PR reduces noise in Primus preflight “info-only” runs by suppressing the “Runtime process group not initialized” warning when the invocation is intentionally local-only (i.e., before distributed init is attempted).

Changes:

  • Add an expect_distributed: bool = True option to run_preflight_info.
  • Thread expect_distributed through collect_network_info → run_network_full_checks.
  • Downgrade “process group not initialized” from WARN to INFO when expect_distributed=False to avoid stderr warning spam during local checks.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| primus/tools/preflight/preflight_perf_test.py | Adds expect_distributed to run_preflight_info and uses False for initial local-only reporting passes. |
| primus/tools/preflight/network/info.py | Adds expect_distributed parameter and forwards it into the full network checks. |
| primus/tools/preflight/network/network_full.py | Uses expect_distributed to emit INFO instead of WARN when PG isn’t initialized but a local-only run is expected. |


3 participants