
Conversation

@morgolock
Contributor

- Updated the convolution reference to branch the epilogue (see the first sketch below):
  * TO = float: int32-to-float dequantization (acc * sA * sB + bias_f32)
  * TO != float: the usual quantize_down_scale_by_fixedpoint with an int32 bias
- Changed the fixture to use an F32 bias tensor for Q->F32 runs (instead of S32), matching the arm_gemm dequant epilogue, which only supports a float bias.
- Added explicit template instantiations of convolution_layer with TBias=float, TO=float to fix linker errors in validation.
- Disabled activation in the arm_gemm dequant path: offsets are applied afterwards by CpuGemmLowpOffsetContributionKernel, so the activation must run there to see the correct final accumulator (see the second sketch below).
- In src/cpu/kernels/gemmlowp/generic/neon/impl.h, neon_run_offset_contribution_float() now computes the per-batch offset for vector_sum_col from the W stride instead of the Y stride.

This aligns the target and reference for quantized-to-F32 convolution tests and prevents premature clamping before the offset contributions are applied.
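
A minimal scalar sketch of the two reference epilogue branches. All names here (epilogue_f32, epilogue_q8, sA, sB, the multiplier/shift pair) are illustrative, not the reference's actual signatures; the fixed-point path is only a stand-in for what quantize_down_scale_by_fixedpoint does:

```cpp
#include <algorithm>
#include <cstdint>

// Scalar model of the branched reference epilogue (illustrative only).
// acc is the raw int32 GEMM accumulator; sA/sB are the source and weights
// quantization scales.

// TO = float: dequantize the accumulator and add a float bias.
float epilogue_f32(int32_t acc, float sA, float sB, float bias_f32)
{
    return static_cast<float>(acc) * sA * sB + bias_f32;
}

// TO != float: add the int32 bias, then requantize with a fixed-point
// multiplier/shift pair -- a sketch of the quantize_down_scale_by_fixedpoint
// behaviour (positive result shift only, for brevity).
int8_t epilogue_q8(int32_t acc, int32_t bias_s32, int32_t multiplier, int shift, int32_t out_offset)
{
    // Saturating rounding doubling high multiply (gemmlowp-style).
    int64_t v = (static_cast<int64_t>(acc + bias_s32) * multiplier + (int64_t{1} << 30)) >> 31;
    // Rounding right shift by the result shift.
    if (shift > 0)
    {
        v = (v + (int64_t{1} << (shift - 1))) >> shift;
    }
    v += out_offset;
    return static_cast<int8_t>(std::clamp<int64_t>(v, -128, 127));
}
```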

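A second sketch of why activation has to wait for the offset contribution. The function name, parameter layout, and stride comment are assumptions about the shape of the fix, not the kernel's real API; the contribution term is the standard expansion of (A - a_offset) * (B - b_offset):

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative scalar model of the float offset-contribution epilogue.
// acc_f32 is the partially dequantized value coming out of arm_gemm
// (activation was deliberately left disabled there); sum_row_y/sum_col_x are
// the per-row LHS and per-column RHS sums; scale is sA * sB.
float offset_contribution_f32(float acc_f32, int32_t sum_row_y, int32_t sum_col_x,
                              int32_t a_offset, int32_t b_offset, int32_t depth,
                              float scale, float act_min, float act_max)
{
    // Fold in the offset terms of (A - a_offset) * (B - b_offset).
    const int32_t contrib = -a_offset * sum_col_x - b_offset * sum_row_y
                            + a_offset * b_offset * depth;
    const float v = acc_f32 + static_cast<float>(contrib) * scale;

    // Activation (clamping) must run here, on the final accumulator;
    // clamping inside arm_gemm would act on a value still missing `contrib`.
    return std::clamp(v, act_min, act_max);
}

// Per-batch addressing (conceptual): vector_sum_col holds one int32 per output
// column per batch, so its batch base must advance by the W stride; advancing
// by the Y stride picked up the wrong batch's column sums.
```
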
Change-Id: I6fffc98dc0798542a2702e6a593b850c16561e3b
Signed-off-by: Pablo Marquez Tello <[email protected]>

@morgolock requested a review from gunes-arm on September 19, 2025 10:56
@morgolock force-pushed the pr/conv_f32_dequant branch 4 times, most recently from 02cc3e6 to 4c780b3, on October 7, 2025 08:45
@morgolock force-pushed the pr/conv_f32_dequant branch 3 times, most recently from 6a8b64a to 88c1594, on October 7, 2025 17:22
@morgolock force-pushed the pr/conv_f32_dequant branch 3 times, most recently from 780b190 to 950a00b, on October 13, 2025 21:26
@morgolock force-pushed the pr/conv_f32_dequant branch from 950a00b to b8824c1 on October 16, 2025 09:07
@morgolock merged commit a977868 into main on October 16, 2025 (2 checks passed)
@morgolock deleted the pr/conv_f32_dequant branch on October 16, 2025 14:06