Conversation

@zolkis
Collaborator

@zolkis zolkis commented Oct 8, 2025

Fixes #815

As explained in #884, add context options and attributes/internal slots that can be used for conveying application hints about the preferred acceleration type (CPU or massively parallel processing, i.e. NPU or GPU).

This is a minimal change; we might want to further refine the algorithms w.r.t. context power preferences and acceleration options (currently not addressed). That could be done in this PR, or in a separate follow-up PR.
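For orientation, the additions under discussion could be sketched in WebIDL roughly as follows; the default value and exact member placement are assumptions here, and the normative text is what the spec diff defines:

```webidl
// Illustrative sketch only; the default value is an assumption.
partial dictionary MLContextOptions {
  boolean accelerated = true;
};

partial interface MLContext {
  readonly attribute boolean accelerated;
};
```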



…d the poll CPU fallback status steps. Invoke it from graph.dispatch().

Signed-off-by: Zoltan Kis <[email protected]>
@anssiko
Member

anssiko commented Oct 21, 2025

@zolkis thank you for formalizing the group’s current thinking into this PR!

@huningxin @RafaelCintron, this spec PR is on the WebML WG Teleconference – 23 October 2025 agenda. Reviews, comments, and questions in this PR ahead of the call are appreciated.

@handellm to check we remain aligned with Google Meet requirements.

FYI @mtavenrath who expressed interest in this space.

@handellm

Seems good!

Co-authored-by: Reilly Grant <[email protected]>
index.bs Outdated
1. Enqueue the following steps to |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[timeline]]}}:
1. Run these steps, but [=/abort when=] [=this=] [=MLContext/is lost=]:
1. Issue a compute request to |graph|.{{MLGraph/[[implementation]]}} given |inputs| and |outputs|.
1. Issue a compute request to |graph|.{{MLGraph/[[implementation]]}} given |inputs| and |outputs|, as well as |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[powerPreference]]}} and |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[accelerated]]}}.
Contributor

I suppose powerPreference and accelerated options should be used by build steps rather than dispatch?

Collaborator Author

These steps were meant for the dispatch phase, when the actual accelerators are selected.
If underlying accelerators cannot be modified during dispatch, then yes, these could be in the build steps, for static preparation.
However, if supported, they should also be included in the dispatch steps, which is the final decision point in dynamic execution.
I guess for now we could just move it to the build phase.

Collaborator Author

Done.

index.bs Outdated
1. Run these steps, but [=/abort when=] [=this=] [=MLContext/is lost=]:
1. Issue a compute request to |graph|.{{MLGraph/[[implementation]]}} given |inputs| and |outputs|.
1. Issue a compute request to |graph|.{{MLGraph/[[implementation]]}} given |inputs| and |outputs|, as well as |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[powerPreference]]}} and |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[accelerated]]}}.
1. Run the steps to [=poll CPU fallback status=] for |graph|.{{MLGraph/[[context]]}}.
Contributor

This step seems to be unnecessary because the cpuFallbackActive getter already runs it?

Collaborator Author

Right, these would only be needed if there were an event (discussed earlier; we agreed that polling is enough for now).
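The polling design agreed here — the getter itself runs the poll steps, so no event and no extra call in dispatch() is needed — can be sketched with a stand-in object. FakeContext and queryPlatformUsesCpu are hypothetical names for illustration, not part of the spec:

```javascript
// Minimal sketch of a getter that runs the "poll CPU fallback status"
// steps on demand. FakeContext and queryPlatformUsesCpu are hypothetical
// stand-ins; a real implementation would query the underlying platform.
class FakeContext {
  constructor({ accelerated = true, queryPlatformUsesCpu = () => false } = {}) {
    this._accelerated = accelerated;
    this._queryPlatformUsesCpu = queryPlatformUsesCpu;
  }
  // Reading the attribute polls the current status; no event is involved.
  get cpuFallbackActive() {
    if (!this._accelerated) return true; // non-accelerated contexts run on CPU anyway
    return this._queryPlatformUsesCpu(); // synchronous snapshot of the platform state
  }
}

const ctx = new FakeContext({ accelerated: true, queryPlatformUsesCpu: () => true });
console.log(ctx.cpuFallbackActive); // snapshot at read time
```

Because the getter re-runs the poll on every read, callers always observe the current status without the spec having to add polling steps to dispatch() or build().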

index.bs Outdated
</summary>
1. If [=this=].{{MLContext/[[accelerated]]}} is `false`, then:
1. Set [=this=].{{MLContext/[[cpuFallbackActive]]}} to `true` and return.
1. If the underlying execution device is available, then:
Contributor

Is it worth adding a definition for "underlying execution device"?

Collaborator Author

They are mentioned in the device selection section, though no formal definition is given.

If we wanted to give one, it's important to stress that it's not a single device, but the final, possibly heterogeneous execution plan that maps specific parts of the model graph to the best available combination of accelerators at the exact moment of inference.

During the build phase, we should not select a device, but define preferences (e.g. prioritized list of execution providers/delegates), which the runtime / underlying platform uses for the actual decisions.
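The build-phase idea above — the application states preferences, the runtime keeps the final decision — could be sketched as a function that maps context options to a prioritized provider list. The provider names and the ordering policy are illustrative assumptions, not spec behavior:

```javascript
// Hypothetical sketch: derive a prioritized list of execution providers
// from context options at build time; the runtime / underlying platform
// makes the final choice at dispatch. The names "npu"/"gpu"/"cpu" and
// the ordering policy are assumptions for illustration.
function preferredExecutionProviders({ accelerated = true, powerPreference = "default" } = {}) {
  if (!accelerated) return ["cpu"];
  switch (powerPreference) {
    case "low-power":        // prefer NPU; GPU is left out as a high-power device
      return ["npu", "cpu"];
    case "high-performance": // prefer GPU, then NPU
      return ["gpu", "npu", "cpu"];
    default:                 // any accelerator, CPU as the last resort
      return ["npu", "gpu", "cpu"];
  }
}

console.log(preferredExecutionProviders({ accelerated: true, powerPreference: "low-power" }));
// → ["npu", "cpu"]
```

Keeping CPU at the tail of every list mirrors the default CPU-fallback behavior of the major native runtimes; an application that cares about fallback would inspect the resulting plan rather than forbid CPU outright.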

Collaborator Author

This opens again the discussion on the relationship between context and underlying execution device(s). I think we should not refer to a single device here, in the light of past discussions.

In general we should bind the context not to a device, but to the execution plan (prioritized list of execution providers) mentioned above. Then, a separate concept (internal slot) would be the actual execution plan at the moment of inference. The text formulation should allow for anything from a single device per context to heterogeneous sub-graph execution on different devices.

I think we could track that in a separate issue. In this PR, I have just removed the text "currently only the {{MLPowerPreference}} option" in line 751, and used the term from the device selection section in this algorithm.

For this PR, I modify the text so that it's compatible with my explanation above.

index.bs Outdated
The {{MLContext}}'s processing type (CPU or massively parallel processing).
: <dfn>\[[cpuFallbackActive]]</dfn> of type {{boolean}}.
::
The {{MLContext}}'s CPU fallback status.
Contributor

AFAIK, the major native ML runtimes, including Core ML, Windows ML (ONNX Runtime) and TFLite, enable CPU fallback by default. Some runtimes, e.g. ONNX Runtime, allow developers to disable CPU fallback explicitly through the session option disable_cpu_ep_fallback. Without CPU fallback, model compilation may fail if the accelerator cannot execute all ops. The Chromium prototype has a switch for that, but only for debugging purposes. What are the other cases where a WebNN implementation may set this to false?

Collaborator Author

@zolkis zolkis Oct 26, 2025

Setting the CPU fallback option to false is for when the application wants an (error) indication if massively parallel execution is not guaranteed with high chance (not an exact thing, but among many contradicting options, it's good enough). The use case is laid out in issue #815, see e.g. comment, and the following discussion.
(Feel free to suggest other solutions.)

EDIT (w.r.t. where to check for CPU fallback): this use case would prefer early warning of CPU fallback likelihood (to be able to choose another inference path), so for that the checks make more sense in the build steps, indeed.

Contributor

application wants to have an (error) indication if massively parallel execution is not guaranteed with high chance

How could an application indicate that? Should MLContextOptions add another property, something like boolean cpuFallback, default to true? An application can set contextOptions.cpuFallback to false for this use case.

Collaborator Author

That was discussed in earlier calls (in the explainer-related discussions): exposing a context option for setting CPU fallback to false hits some constraints, and the same goal can be accomplished with the accelerated option, hence it was discarded as an approach.

In #884 there is a code example for this use case:

```js
// create a context that should use massively parallel processing (e.g. GPU/NPU)
context = await navigator.ml.createContext({accelerated: true});
if (context.accelerated) {
    // the context will mostly use GPU/NPU, but CPU fallback may happen
} else {
    // the platform tells it likely cannot provide NPU or GPU, so try something else
}

// create a context that should preferably use NPU
context = await navigator.ml.createContext({accelerated: true, powerPreference: 'low-power'});
if (context.accelerated) {
    // NPU is likely used -- further requirements could be set by opSupportLimitsPerDevice
} else {
    // NPU is likely not available, and since GPU needs high power, it is not used
}
```

Contributor

Thanks for the code example. I understand an implementation should preferably use GPU/NPU if accelerated option is set to true. However, as I shared, the CPU fallback is enabled by default by major native ML runtimes. It's not clear to me how an implementation can tell an application wants to disable the CPU fallback.

could be accomplished with the accelerated option

Do you mean the implementation should disable CPU fallback if accelerated option is set to true? Then how could an application indicate it is fine with CPU fallback while preferring GPU/NPU execution?

Member

This sounds reasonable to me.

@zolkis if you agree and are available, please open a separate issue for cpuFallbackActive and seed it with your insights. If you also update this PR accordingly we should be able to merge this PR by the end of the week.

Collaborator Author

We have already removed the context option for preventing CPU fallback.

I'd like to understand the concerns with the cpuFallbackActive attribute. If it is because of the polling steps, I already removed calling them from graph.dispatch() and didn't include them in build(), so there is only the getter, which @handellm said would be good enough for Meet (instead of an event, which would present more issues).

Are there any further issues to be clarified, @huningxin , @reillyeon?

Contributor

@huningxin huningxin Oct 29, 2025

According to the offline discussion, cpuFallbackActive seems to be a useful attribute of MLGraph (maybe coordinating with @philloooo 's proposal #854) rather than MLContext. I'll let @reillyeon and @philloooo chime in and share more thoughts.

Collaborator Author

cpuFallbackActive seems to be a useful attribute of MLGraph

I agree, that makes a lot of sense. Moreover, even a sub-graph or individual ops might fall back to CPU (as mentioned before, a context / graph should be associated with an execution plan, not only with underlying execution devices).

For this PR, exposing cpuFallbackActive on context was chosen for the "simplicity" argument, also because a context still is associated with an underlying execution device -- for which we opened another issue in #897. Once we relax that and work with these terms, I think we should properly address CPU fallback as well.

This is a good development, so I will remove it from this PR.

Member

I agree with @zolkis's comments above that an MLContext represents a preferred order of execution providers (determined by power/acceleration preference) while only when you construct an MLGraph do you know what the actual execution plan for a given graph will look like.

…he steps checking CPU fallback

Signed-off-by: Zoltan Kis <[email protected]>
index.bs Outdated
</summary>
1. If [=this=].{{MLContext/[[accelerated]]}} is `false`, then:
1. Set [=this=].{{MLContext/[[cpuFallbackActive]]}} to `true` and return.
1. Issue a request to check whether the underlying platform uses CPU as the main underlying execution device for inference. If yes, then implementations <span class=allow-2119>should</span> set [=this=].{{MLContext/[[cpuFallbackActive]]}} to `true` and return.


Is "issue a request" achievable in a synchronous manner? If not perhaps this should be an async method.

Collaborator Author

True, the wording needs changing to avoid confusion. The underlying platform already knows this information, so the assumption is that it can be obtained synchronously (even if the operation would be asynchronous at the OS level). Again, this speaks for an event-based design.
Nevertheless, looks like we are going to drop cpuFallbackActive for now.

@zolkis zolkis force-pushed the simple-device-selection branch from 0af64f8 to a73a443 on October 29, 2025 20:29
@zolkis
Collaborator Author

zolkis commented Oct 29, 2025

@huningxin your feedback has been addressed:

  • removed cpuFallbackActive from this PR (for later introduction);
  • moved the accelerated consideration from the dispatch() steps to the build() steps.

When merging this PR, please keep the cpuFallbackActive commit in history.

Contributor

@huningxin huningxin left a comment

LGTM, thanks @zolkis !

Member

@anssiko anssiko left a comment

@zolkis, thank you for addressing the review feedback swiftly and professionally.

@huningxin @fdwr, as the editors you have my mandate to merge this PR at your earliest opportunity. Please check that any remaining non-blocking feedback for future enhancements, informed by the PR review comments, is recorded in issues, and open new issues as appropriate.

@anssiko
Member

anssiko commented Oct 31, 2025

I will now merge this and open a new issue, #900, for a more narrowly scoped discussion of the CPU fallback mechanism.

Again, thank you @zolkis and everyone who contributed to this feature through issue discussions, design proposals and review. This establishes a baseline for future device selection hints and enhancements.

@anssiko anssiko merged commit 0ce9f32 into webmachinelearning:main Oct 31, 2025
2 checks passed
github-actions bot added a commit that referenced this pull request Oct 31, 2025
SHA: 0ce9f32
Reason: push, by anssiko

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
anssiko pushed a commit that referenced this pull request Oct 31, 2025
* Add design rationale, background and examples for the `accelerated` hint introduced in #895
* Add a design proposal for the proposed future enhancement, a CPU fallback hint
* Document new requirements for post-compile query

Signed-off-by: Zoltan Kis <[email protected]>
@anssiko anssiko mentioned this pull request Oct 31, 2025
@fdwr
Collaborator

fdwr commented Nov 5, 2025

This establishes a baseline for future device selection hints and enhancements.

👍

I've mostly been observing the chatter on this one, but it seems we have multiple independent attributes that can contribute to device selection, including: how big a workload is, how immediately you want the result (a GPU can finish a large load faster than the CPU, but for low latency with small workloads the CPU is often better), whether it's continuous repeating work or single-shot, desired power usage...

  • acceptable latency: low, medium, high
  • continuity: continuous, moderate, single shot
  • workload size: small, medium, large
  • power usage: low, medium, high
  • ...

e.g.

| App scenario | Preferred latency | Workload size | Continuity | Ideal device |
| --- | --- | --- | --- | --- |
| Realtime audio filtering | low | small | continuous | CPU/NPU (not GPU due to overhead of readback time) |
| Audio to text | ignorable | medium | continuous | Any |
| Realtime video processing | low | large | continuous | GPU (since likely to be displayed afterward anyway) |
| Offline video processing | ignorable | large | continuous | GPU/NPU (since large continuous workload) |
| Image generation | medium | large | multiple executions | GPU/NPU (much faster than CPU) |
| Image recognition | ignorable | small | single execution | Any (CPU is fast enough for simple recognition) |
| Code editor prediction | low | small | single execution | CPU (small workload; avoid pegging the GPU while typing) |

It's tempting to try to smash those all down into a concise set of enums, but I don't know if we can. With separate attributes, it reminds me conceptually of the font selection problem (family & weight & width & slope ...). 🤔
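As a thought experiment, the scenario table above can be encoded as data and matched against a request's attribute tuple; the field names and value encodings (e.g. "single" for "single execution") are assumptions made for this sketch:

```javascript
// Sketch: the scenario table above, encoded as data. A chooser could match
// a request's (latency, workload, continuity) tuple against such rows;
// the field names and the matching policy are illustrative assumptions.
const scenarioHints = [
  { latency: "low",       workload: "small",  continuity: "continuous", ideal: "CPU/NPU" }, // realtime audio filtering
  { latency: "ignorable", workload: "medium", continuity: "continuous", ideal: "Any"     }, // audio to text
  { latency: "low",       workload: "large",  continuity: "continuous", ideal: "GPU"     }, // realtime video processing
  { latency: "ignorable", workload: "large",  continuity: "continuous", ideal: "GPU/NPU" }, // offline video processing
  { latency: "medium",    workload: "large",  continuity: "multiple",   ideal: "GPU/NPU" }, // image generation
  { latency: "ignorable", workload: "small",  continuity: "single",     ideal: "Any"     }, // image recognition
  { latency: "low",       workload: "small",  continuity: "single",     ideal: "CPU"     }, // code editor prediction
];

function idealDevice(latency, workload, continuity) {
  const row = scenarioHints.find(r =>
    r.latency === latency && r.workload === workload && r.continuity === continuity);
  return row ? row.ideal : "Any"; // unknown combinations: let the runtime decide
}

console.log(idealDevice("low", "small", "single")); // → "CPU"
```

A table lookup like this only covers enumerated combinations, which is exactly the "smash into enums" concern; independent attributes combined by a scoring function would scale better, much like font matching across family, weight, width and slope.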

Update 2025-11-09: issue #902 created.

@anssiko
Member

anssiko commented Nov 5, 2025

@fdwr, thank you, this is great input. I agree this is analogous to font selection.

Would you mind opening a new issue for this, seed it with your comment? I'd like to put this on our F2F agenda to capture feedback on use case-driven priorities.

@mtavenrath

@fdwr I really like that list.

Still, I am curious how to really determine the perfect device and latency requirements. What if someone wants to launch a low-latency, compute-intensive audio network? How is the WebNN backend supposed to know which dGPU to use if there are multiple dGPUs? What if the dGPU is slower than the iGPU, but has just been added for certain display features like full sync or 4/8/16 displays? What if the dGPU is used by a WebGPU rendering process, which would affect everything? What if a website wants to run 2 large models at once on two different dGPUs?

Do we want WebNN to be limited to the simple use cases only, or do we want to allow the complex use cases as well? There are ISVs in the professional space who port their software to the web and might be interested in choosing a specific device. For those use cases it would be great if explicit device selection were possible.

This leaves the question: how do we avoid making such an API available for generic fingerprinting? At the risk of slightly lowering usability, would it make sense to make this feature available upon user request only, like webcams, audio, or the GPU position?

By default only the simple API would be available. If a more complex web application requires the advanced API, it could request access and make the user aware that they are exposing more information about their system.

Successfully merging this pull request may close these issues.

Query supported devices before graph compilation

8 participants