
Update documentation of input_embedding_scale parameter in TransformerDecoderV1Config #91

Merged
curufinwe merged 3 commits into main from moritz-trafo-docs-scale on Apr 17, 2026

Conversation

NeoLegends (Contributor) commented Dec 22, 2025

Mohammad did not use the scale in a pure LM setup.

NeoLegends self-assigned this Dec 22, 2025

albertz (Member) commented Dec 22, 2025

> I think in a pure LM setup you don't use the scale.

I don't think this is true in general. I have seen both variants. Also for LMs. For example, Gemma3:

# (Hugging Face transformers, Gemma3 text model; the scaled embedding module
# multiplies the token embeddings by embed_scale in its forward pass)
self.embed_tokens = Gemma3TextScaledWordEmbedding(
    config.vocab_size, config.hidden_size, self.padding_idx, embed_scale=self.config.hidden_size**0.5
)

Note, there are some other things to consider:

If you don't apply the scale in the forward pass, people instead apply the scale during init, or make the random init very large. E.g. nanochat:

# (nanochat weight init: embeddings get std=1.0 instead of a small std
# like 0.02, effectively baking the scale into the weights)
elif isinstance(module, nn.Embedding):
    torch.nn.init.normal_(module.weight, mean=0.0, std=1.0)

I also saw that some people use custom (much larger) LRs for embeddings, which again might compensate for not using a scale. E.g. see nanochat.
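
For illustration, such a setup could look like this in PyTorch (the toy model, the name-based parameter split, and the LR values are all made up for the example; they are not nanochat's actual settings):

import torch
from torch import nn

# toy model standing in for a Transformer LM
model = nn.ModuleDict({
    "embed": nn.Embedding(1000, 64),
    "out": nn.Linear(64, 1000),
})

# embeddings get their own, much larger learning rate
embed_params = [p for n, p in model.named_parameters() if "embed" in n]
other_params = [p for n, p in model.named_parameters() if "embed" not in n]

optimizer = torch.optim.AdamW([
    {"params": other_params, "lr": 3e-4},
    {"params": embed_params, "lr": 3e-3},  # e.g. 10x the base LR
])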

If you share the embedding weights with the LM head, this might affect whether you want such a scale or not (I'm not sure in what way, though...). Most LMs do this.
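
For reference, the usual weight-tying pattern looks like this (a generic sketch, not code from any particular model):

from torch import nn

vocab_size, model_dim = 32000, 512
embed_tokens = nn.Embedding(vocab_size, model_dim)
lm_head = nn.Linear(model_dim, vocab_size, bias=False)
# one shared parameter: nn.Linear stores its weight as (out_features, in_features),
# i.e. (vocab_size, model_dim), the same shape as the embedding matrix
lm_head.weight = embed_tokens.weight

With tying, the same matrix both looks up inputs and produces output logits, which is why an input-side scale can interact with the output side.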

input_dropout: Dropout applied to the input embedding.
input_embedding_scale: Scale applied to the input embedding.
-    Set to `None` to apply a (tuned) default.
+    Set to `None` to apply a default that is suitable for ASR AED decoder models.

Member:

I would not mention any specific model at all here. I think this is just confusing. I would instead just say what default you use.
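
For context, the parameter is just a multiplicative factor on the token embeddings; a minimal sketch of the usual pattern (the names are illustrative, and sqrt(d_model) is the common Transformer convention, not necessarily this repository's actual default):

import torch
from torch import nn

input_embedding_scale = None  # the config value under discussion
embedding = nn.Embedding(1000, 512)
input_dropout = nn.Dropout(p=0.1)

tokens = torch.randint(0, 1000, (2, 7))  # [batch, time]
scale = input_embedding_scale
if scale is None:
    scale = embedding.embedding_dim ** 0.5  # assumed default: sqrt(d_model)
x = input_dropout(embedding(tokens) * scale)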

curufinwe changed the title from "Transformer Decoder: extend docs on input embedding scale" to "Update documentation of input_embedding_scale parameter in TransformerDecoderV1Config" on Apr 17, 2026
curufinwe merged commit f347906 into main on Apr 17, 2026
2 checks passed