Conversation

@chraac (Contributor) commented Nov 28, 2025

Changes

Fix the RoPE op implementation. With these changes, the op is handled correctly and all ROPE backend tests pass.

Before

[ROPE] NMSE = 1.881769338 > 0.000000100   ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=0,v=0,inplace=0): FAIL
[ROPE] NMSE = 1.916275620 > 0.000000100   ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=1,v=0,inplace=0): FAIL
[ROPE] NMSE = 1.883398782 > 0.000000100   ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=0,inplace=0): FAIL
[ROPE] NMSE = 1.873313624 > 0.000000100   ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=0,inplace=0): FAIL
[ROPE] NMSE = 1.924327319 > 0.000000100   ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=0,inplace=1): FAIL
[ROPE] NMSE = 1.974410582 > 0.000000100   ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=0,inplace=1): FAIL

After

  ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=0,v=0,inplace=0): OK
  ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=1,v=0,inplace=0): OK
  ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=0,inplace=0): OK
  ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=0,inplace=0): OK
  ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=0,inplace=1): OK
  ROPE(type=f32,ne_a=[128,32,2,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=0,inplace=1): OK

@github-actions bot added the `ggml` label (changes relating to the ggml tensor library for machine learning) on Nov 28, 2025
}

// TODO: use simd to speed up the remaining elements copy
memcpy(dst_data_loc, src_loc, (ne0 - rope_ctx->n_dims) * sizeof(float));
@chraac (Contributor, author):
QQ: do we get SIMD acceleration in this memcpy?

const uint64_t name##_end_cycles = HAP_perf_get_qtimer_count(); \
FARF(HIGH, __VA_ARGS__, (unsigned) HAP_perf_qtimer_count_to_us(name##_end_cycles - name##_start_cycles)); \
} while (0)

@chraac (Contributor, author):

This macro provides a convenient way to emit profiling logs from the NPU; the logging can be disabled entirely via a compiler flag.

}
if (ir > ir1) {
break;
}
@chraac (Contributor, author) commented Nov 30, 2025:

Those two inner `if` statements can be merged into the `for` loop's condition.
