
Conversation

@alinah-amd commented Sep 23, 2025

Describe your changes

Overview

Added the EstimateNPULatency pass under olive/onnx/vitis_ai/estimate_latency.py.
EstimateNPULatency uses the NPU Perf Estimator tool to predict the computational performance of workloads given a set of parameters.

This is an analysis pass: it does not transform the graph at all and is used only for performance analysis.
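For orientation, here is a minimal sketch of what an analysis-only pass of this shape looks like. This is not the PR's implementation: the base-class import and the _run_for_config signature follow Olive's usual Pass interface as I understand it, and the actual call into the estimator is elided.

    import logging

    from olive.passes import Pass

    logger = logging.getLogger(__name__)

    try:
        from estimator.run import run_perf_estimate  # noqa: F401 -- availability probe
        perf_installed = True
    except ImportError:
        perf_installed = False


    class EstimateNPULatency(Pass):
        """Analysis-only pass: estimates NPU latency and returns the model unchanged."""

        def _run_for_config(self, model, config, output_model_path):
            if perf_installed:
                # Run the NPU Perf Estimator and write the concise_summary
                # artifacts; the exact call and arguments are elided in this sketch.
                ...
            return model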

Installation

To install (if not installed through requirements.txt), run the following:
pip install [placeholder for wheel]

Confirm that the installed Python version is >= 3.10 for compatibility.

If the perf estimator package is not installed, the following warning is shown and the pass is simply bypassed:
[screenshot: warning that the estimator module was not found and the EstimateNPULatency pass is being skipped]

Usage

Inputs
EstimateNPULatency takes the model as an OnnxModelHandler object and optional parameters as a dict of PassConfigParams (consistent with all other passes).

Optional Parameters
To pass in optional parameters, list each parameter name and value as a key-value pair in the JSON file. See the example below:

[screenshot: example pass config JSON with optional parameters]
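For instance, a hypothetical pass entry might look like the following (the instance name "estimate_npu_latency" is arbitrary, and the exact nesting should be checked against the actual recipe):

    "passes": {
        "estimate_npu_latency": {
            "type": "EstimateNPULatency",
            "target_device": "stx"
        }
    }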
  1. target_device: Target device type. This is used to provide a default config specific to that device type. Currently, only Strix is supported; support for other devices will be added in the future.
  • Type: str
  • Default: stx
  • Allowed Values: ["stx"]

Adding Pass to Config File
This pass should ideally run last and be listed last in the <model>.json file. For example:

[screenshot: example <model>.json pass list with EstimateNPULatency listed last]
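A sketch of that ordering, with a hypothetical earlier conversion pass shown purely for illustration:

    "passes": {
        "conversion": { "type": "OnnxConversion" },
        "estimate_npu_latency": {
            "type": "EstimateNPULatency",
            "target_device": "stx"
        }
    }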

Output
Generates a concise_summary directory within the run directory with the following files:

[screenshot: files generated in the concise_summary directory]

{model_name}_concise_summary.txt displays the following info on roofline latency, total compute ops, and the conclusion that can be drawn about the performance bottleneck (whether it is DDR Bandwidth bound or Compute bound):

[screenshot: example concise summary text output]
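For context on how that conclusion is drawn, the classification follows the standard roofline argument. The snippet below is illustrative arithmetic only, not code from the estimator:

    def roofline(compute_ops, bytes_moved, peak_ops_per_s, ddr_bytes_per_s):
        # Latency is bounded below by both the compute time and the DDR
        # transfer time; whichever is larger identifies the bottleneck.
        compute_time = compute_ops / peak_ops_per_s
        memory_time = bytes_moved / ddr_bytes_per_s
        bound = "DDR Bandwidth bound" if memory_time > compute_time else "Compute bound"
        return max(compute_time, memory_time), bound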

{model_name}_concise_summary.csv displays the same info broken down per op. Ops are listed in descending order of latency:

[screenshot: example per-op concise summary CSV]

Known Passing Tests

ResNet w/ Perf Estimator
Refer to Olive Recipes Repo

MobileNet w/ Perf Estimator
Refer to Olive Recipes Repo

Unit Test

  • Run python -m pytest test/unit_test/passes/vitis_ai/test_estimate_latency.py
  • This unit test runs a dummy model through the estimate-latency pass (calling it directly instead of via an end-to-end flow) and asserts that a concise_summary directory with dummy csv and txt files is generated (see the sketch after this list).
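For reference, the shape such a test takes, with hypothetical helper names standing in for the actual test utilities:

    def test_estimate_latency_basic(tmp_path):
        model = make_dummy_onnx_model(tmp_path)    # hypothetical helper
        run_estimate_npu_latency(model, tmp_path)  # hypothetical direct call into the pass

        summary_dir = tmp_path / "concise_summary"
        assert summary_dir.is_dir()
        assert any(p.suffix == ".txt" for p in summary_dir.iterdir())
        assert any(p.suffix == ".csv" for p in summary_dir.iterdir())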

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

@alinah-amd (Author)

@microsoft-github-policy-service agree company="Microsoft"

@microsoft-github-policy-service agree company="AMD"

@jambayk (Contributor) commented Oct 1, 2025

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

try:
    from estimator.run import run_perf_estimate
    perf_installed = True
except ImportError:
    perf_installed = False
    logger.warning("Estimator module not found. Skipping EstimateNPULatency pass.")
Contributor:

I think instead of raising a warning, it might be better to fail with a helpful import error which tells the user what package to install. Since Olive caches runs, to rerun the pass with the dependency installed, they would have to clean the cache or delete the cached run that skipped the estimation.
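A sketch of the suggested behavior (the install hint is a placeholder, since the wheel name is not given in this PR):

    try:
        from estimator.run import run_perf_estimate
    except ImportError as e:
        raise ImportError(
            "EstimateNPULatency requires the NPU Perf Estimator package. "
            "Install it with: pip install <estimator-wheel>"  # placeholder name
        ) from e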

"supported_accelerators": [ "*" ],
"supported_precisions": [ "*" ],
"supported_algorithms": [ ],
"supported_quantization_encodings": [ ]
@jambayk (Contributor) commented Oct 3, 2025:

Could you add a "module_dependencies" option, like the one under the AutoAWQQuantizer pass, for the package required to run this estimation?
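For reference, a sketch of how that option could sit next to the fields quoted above (the dependency name is a placeholder, since the wheel name is not given in this PR):

    "module_dependencies": [ "<estimator-package>" ],
    "supported_accelerators": [ "*" ],
    "supported_precisions": [ "*" ],
    "supported_algorithms": [ ],
    "supported_quantization_encodings": [ ]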

class TestEstimateNPULatency:
    """Test cases for EstimateNPULatency pass."""

    def test_estimate_latency_basic(self, tmp_path):
Contributor:

Please also add the required dependency to requirements-test.txt under test/.
