chethanuk

Add ModelConfig.gemma3_270m() classmethod with architecture specs:

  • 18 layers, 640 embed_dim, 2048 hidden_dim
  • 4 heads (256 head_dim), 1 KV head (GQA)
  • 512 sliding window, 262144 vocab size

Add checkpoint path constants (GEMMA3_270M_PT, GEMMA3_270M_IT).
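
For readers skimming the description, the specs above can be sketched as a small config class. This is only an illustration under stated assumptions: the `ModelConfig` below is a hypothetical stand-in, not the library's actual class, and every field name (`num_layers`, `sliding_window_size`, `vocab_size`, and so on) is assumed for the example. Only the numeric values come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical stand-in for the real config class; field names are assumed."""

    num_layers: int
    embed_dim: int
    hidden_dim: int
    num_heads: int
    head_dim: int
    num_kv_heads: int
    sliding_window_size: int
    vocab_size: int

    @classmethod
    def gemma3_270m(cls) -> "ModelConfig":
        # Values taken from the architecture specs listed in this PR.
        return cls(
            num_layers=18,
            embed_dim=640,
            hidden_dim=2048,
            num_heads=4,
            head_dim=256,
            num_kv_heads=1,
            sliding_window_size=512,
            vocab_size=262144,
        )


config = ModelConfig.gemma3_270m()

# Derived attention shapes implied by the specs.
q_width = config.num_heads * config.head_dim        # total query projection width
kv_width = config.num_kv_heads * config.head_dim    # total key/value projection width
queries_per_kv = config.num_heads // config.num_kv_heads  # GQA group size
```

With these numbers the query projection is 4 × 256 = 1024 wide, which is larger than the 640-wide embedding, and all four query heads share the single KV head.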

Resolves #500


It's a good idea to open an issue first for discussion.

Reference

Colab Notebook

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and all unit tests pass.
  • I have added all appropriate doc-strings/documentation.
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have signed the Contributor License Agreement.
  • I have followed Contribution Guidelines.

- Add ModelConfig.gemma3_270m() classmethod with architecture specs:
  * 18 layers, 640 embed_dim, 2048 hidden_dim
  * 4 heads (256 head_dim), 1 KV head (GQA)
  * 512 sliding window, 262144 vocab size
- Add checkpoint path constants (GEMMA3_270M_PT, GEMMA3_270M_IT)
- Add test coverage for gemma3-270m model routing
- Verified against official HuggingFace config

Resolves google#500
Collaborator

@tianshub left a comment

Thanks for adding the support! By the way, you don't need to merge manually from GitHub; the PR will be merged automatically once the internal tests pass.

Contributor

@abheesht17 left a comment

@chethanuk - do you have a notebook comparing outputs with the reference model?

Successfully merging this pull request may close these issues.

Can you add gemma3_model_lib.ModelConfig for Gemma3-270m models?