@RyanMetcalfeInt8 RyanMetcalfeInt8 commented Oct 30, 2025

Description

For the stateful NPUW flow, remove the workaround (WA) that offloads SinCos to the CPU when MAX_PROMPT_LEN + MIN_RESPONSE_LEN >= 2048.

Motivation and Context

The workaround is no longer needed, as accuracy fixes have since landed in the NPU plugin and NPU drivers.

ref. CVS-173936

@RyanMetcalfeInt8 RyanMetcalfeInt8 marked this pull request as draft October 30, 2025 13:29
@RyanMetcalfeInt8 RyanMetcalfeInt8 marked this pull request as ready for review October 30, 2025 21:23
@RyanMetcalfeInt8 RyanMetcalfeInt8 marked this pull request as draft October 30, 2025 22:26
@RyanMetcalfeInt8 RyanMetcalfeInt8 marked this pull request as ready for review November 7, 2025 22:07
@MayureshV1 MayureshV1 requested a review from Copilot November 7, 2025 22:13
Copilot AI left a comment

Pull Request Overview

This PR removes a workaround in the NPUW (NPU Workload) stateful flow that offloaded SinCos operations to the CPU when the combined context length (MAX_PROMPT_LEN + MIN_RESPONSE_LEN) was 2048 tokens or more. The workaround is no longer necessary due to accuracy fixes in the NPU plugin and drivers.

Key Changes:

  • Removed conditional logic that configured CPU fallback for SinCos operations in contexts >= 2048 tokens
  • Simplified the UpdateNPUConfig function by eliminating the threshold check and associated configuration overrides
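
For readers without the diff at hand, the shape of the removed check can be sketched roughly as follows. This is an illustration only, not the actual code from the PR: the function names, the `HYPOTHETICAL_SINCOS_DEVICE` key, and the default values for the two length settings are all assumptions made for the example. Only the `MAX_PROMPT_LEN + MIN_RESPONSE_LEN >= 2048` condition comes from the PR description.

```cpp
#include <map>
#include <string>

// Simplified stand-in for the NPU configuration map (assumption).
using Config = std::map<std::string, std::string>;

// Threshold taken from the PR description.
constexpr int kSinCosCpuThreshold = 2048;

// Reads an integer setting, falling back to a default when the key is absent.
int GetInt(const Config& cfg, const std::string& key, int fallback) {
  auto it = cfg.find(key);
  return it == cfg.end() ? fallback : std::stoi(it->second);
}

// Before (sketch): once the prompt and response budget reaches the threshold,
// an override pins SinCos to the CPU. The key name is hypothetical.
void UpdateConfigWithWorkaround(Config& cfg) {
  // Default values here are assumptions for the example.
  const int max_prompt = GetInt(cfg, "MAX_PROMPT_LEN", 1024);
  const int min_response = GetInt(cfg, "MIN_RESPONSE_LEN", 128);
  if (max_prompt + min_response >= kSinCosCpuThreshold) {
    cfg["HYPOTHETICAL_SINCOS_DEVICE"] = "CPU";
  }
}

// After (sketch): no threshold check and no override; the configuration
// passes through untouched, so SinCos stays on the NPU.
void UpdateConfigWithoutWorkaround(Config& cfg) {
  (void)cfg;  // intentionally a no-op
}
```

With the workaround in place, a configuration of 1920 + 128 tokens would hit the threshold exactly and trigger the override; after the change, the same configuration is left as the user supplied it.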

@MayureshV1 MayureshV1 requested a review from ankitm3k November 7, 2025 22:16
@MayureshV1
The changes look functionally harmless, but they need a compatibility assessment.
