Add a dequantize operator that converts int32 accumulator outputs to bfloat16 with per-group scale factor multiplication. This is needed to use the output of the INT8 GEMM operator (#93) in the rest of a model's forward pass.
Motivation:
The INT8 GEMM produces i32 accumulators, but the rest of the inference pipeline (RMSNorm, SiLU, RoPE, residual adds) operates in bf16. Without this conversion step, INT8 GEMM results can't feed back into the model, blocking end-to-end W8A8 quantized inference.
The existing dequant operator (iron/operators/dequant/) handles int4→bf16 for GPTQ/AWQ-style weight loading, which is a different use case. This operator would handle the GEMM accumulator conversion path.
Proposed behavior:
Input: int32 accumulator tensor + bfloat16 scale factors (one per group, precomputed as scale_activations * scale_weights)
Output: bfloat16 tensor of the same shape as the input, with each group multiplied by its scale factor
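A minimal reference sketch of the proposed behavior, for discussion only: NumPy has no native bfloat16, so float32 stands in for the output dtype, and the function name, signature, and group layout (groups of consecutive elements along the last axis) are assumptions, not the operator's actual API.

```python
import numpy as np

def dequantize_i32(acc: np.ndarray, scales: np.ndarray, group_size: int) -> np.ndarray:
    """Reference dequantize: int32 accumulators -> float, per-group scaling.

    `scales` holds one precomputed factor (scale_activations * scale_weights)
    per group of `group_size` consecutive elements along the last axis.
    float32 stands in for bfloat16 here, since NumPy lacks bf16 natively.
    """
    *lead, n = acc.shape
    assert n % group_size == 0, "last axis must divide evenly into groups"
    assert scales.shape[-1] == n // group_size, "one scale per group"
    # View the last axis as (num_groups, group_size), then broadcast the
    # per-group scale over each group.
    grouped = acc.reshape(*lead, n // group_size, group_size).astype(np.float32)
    out = grouped * scales[..., :, None].astype(np.float32)
    return out.reshape(*lead, n)
```

A real kernel would fuse this with the GEMM epilogue or stream the accumulators through vectorized int-to-float conversion, but the per-group broadcast above captures the intended numerics.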
Implementation approach:
Related: