Add dequantize operator (int32 → bfloat16) #95

@albiol2004

Description:

Add a dequantize operator that converts int32 accumulator outputs to bfloat16 with per-group scale factor multiplication. This is needed to use the output of the INT8 GEMM operator (#93) in the rest of a model's forward pass.

Motivation:

The INT8 GEMM produces i32 accumulators, but the rest of the inference pipeline (RMSNorm, SiLU, RoPE, residual adds) operates in bf16. Without this conversion step, INT8 GEMM results can't feed back into the model, blocking end-to-end W8A8 quantized inference.

The existing dequant operator (iron/operators/dequant/) handles int4→bf16 for GPTQ/AWQ-style weight loading, which is a different use case. This operator would handle the GEMM accumulator conversion path.

Proposed behavior:

  • Input: int32 tensor + bfloat16 scale factors (one per group, precomputed as scale_activations * scale_weights)
  • Output: bfloat16 tensor
  • Operation: output_bf16 = int32_value * combined_scale
  • Group size configurable (default 32 or 128, matching quantization granularity)
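
As a rough illustration of the proposed semantics, here is a minimal pure-Python sketch. The function names (`combine_scales`, `dequantize`) are hypothetical, and it assumes groups are contiguous runs of `group_size` elements in a flattened tensor, which the issue does not specify. The final rounding to bfloat16 is left out so the arithmetic stays easy to follow.

```python
def combine_scales(act_scales, weight_scales):
    """Precompute one combined scale per group: scale_activations * scale_weights."""
    if len(act_scales) != len(weight_scales):
        raise ValueError("scale lists must have the same length")
    return [a * w for a, w in zip(act_scales, weight_scales)]


def dequantize(acc_i32, combined_scales, group_size=32):
    """Multiply each int32 accumulator by the scale of the group it belongs to.

    Assumption (not stated in the issue): element i belongs to group
    i // group_size, i.e. groups are contiguous in the flattened tensor.
    """
    if len(acc_i32) != len(combined_scales) * group_size:
        raise ValueError("expected exactly one scale per group")
    return [v * combined_scales[i // group_size] for i, v in enumerate(acc_i32)]


# Two groups of two elements each, chosen so every product is exact.
scales = combine_scales([0.5, 0.25], [0.5, 2.0])
out = dequantize([4, -8, 100, 3], scales, group_size=2)
print(scales, out)
```

Folding the activation and weight scales into one factor ahead of time keeps the per-element work to a single multiply, which is presumably why the proposal takes a precomputed combined scale as input.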

Implementation approach:

  • C++ kernel: similar structure to expand.cc. Load an i32 vector, convert to float, multiply by the broadcast scale, cast to bf16, and store
  • Python operator: custom MLIROperator subclass following the existing dequant/ pattern (mixed input/output dtypes)
  • Reuse the same ObjectFIFO data movement pattern from the existing dequant design
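
The kernel steps listed above (load, convert to float, multiply by a broadcast scale, cast to bf16, store) could be checked against a scalar reference along these lines. This is only a sketch for validating outputs, not the proposed C++ kernel: the helper names are hypothetical, bf16 values are represented as raw 16-bit integers, and round-to-nearest-even is an assumed rounding mode that the real hardware conversion may or may not match.

```python
import struct


def f32_to_bf16_bits(x):
    """Round a finite float to bfloat16 bits (assumed: round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # rounds x to float32 first
    # Add 0x7FFF plus the lowest kept bit so that ties round to an even mantissa.
    # NaN inputs are not handled; int32 values times finite scales never produce them.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(b):
    """Widen bfloat16 bits to a float by placing them in the upper half of a float32."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def dequant_kernel_reference(acc_i32, scale_bf16_bits, group_size=32):
    """Scalar reference: i32 -> f32, multiply by the group scale, cast to bf16 bits."""
    if len(acc_i32) != len(scale_bf16_bits) * group_size:
        raise ValueError("expected exactly one scale per group")
    out = []
    for g, scale_bits in enumerate(scale_bf16_bits):
        scale = bf16_bits_to_f32(scale_bits)  # one scale broadcast over the group
        for v in acc_i32[g * group_size:(g + 1) * group_size]:  # "load" one group
            as_f32 = struct.unpack("<f", struct.pack("<f", float(v)))[0]  # i32 -> f32
            out.append(f32_to_bf16_bits(as_f32 * scale))  # multiply, cast, "store"
    return out


# 0x3F00 is 0.5 and 0x3F80 is 1.0 in bfloat16.
result = dequant_kernel_reference([3, -2, 257, 259], [0x3F00, 0x3F80], group_size=2)
print([bf16_bits_to_f32(b) for b in result])
```

The last two inputs are deliberate ties: with 7 mantissa bits, bf16 cannot represent 257 or 259, so a test suite for the real kernel would want cases like these to pin down the rounding behavior.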

Related:

  • INT8 GEMM support (#93): produces the i32 output this operator will consume
  • Existing iron/operators/dequant/ (int4→bf16): different use case, but useful as an implementation reference
  • Future: quantize operator (bf16→i8) to complete the W8A8 pipeline
