
Transformer


A polished PyTorch implementation of a modern state-of-the-art (SOTA) Transformer. Designed for clarity, reproducibility, and interoperability with HuggingFace Transformers, this repository provides a fully configurable, robust baseline for research and engineering. The codebase emphasizes readable, well-documented components so you can iterate on feed-forward, attention, normalization, and other architectural variants with minimal friction.

Features

  • Fully configurable architecture (layers, heads, model dimensions, dropout, etc.)
  • HuggingFace-compatible API alignment.
  • Compact, easily extensible design for rapid prototyping and research experiments.
  • Clear, well-documented modules that facilitate experimentation with attention, FFNs, and more.
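For orientation, the kind of block you would iterate on — pre-norm residual structure with RMS normalization and a SwiGLU feed-forward network, matching this repository's defaults — can be sketched in plain PyTorch. This is an illustrative sketch only, not the repository's actual module; the class names here are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization with a learned scale."""
    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(d_model))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    """Gated feed-forward: down-project SiLU(x W_gate) * (x W_up)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class PreNormBlock(nn.Module):
    """Pre-norm block: x + Attn(norm(x)), then x + FFN(norm(x))."""
    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.attn_norm = RMSNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn_norm = RMSNorm(d_model)
        self.ffn = SwiGLU(d_model, d_ff)

    def forward(self, x):
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ffn(self.ffn_norm(x))

block = PreNormBlock(d_model=64, n_heads=4, d_ff=172)
y = block(torch.randn(2, 16, 64))  # (batch, seq, d_model) in and out
```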

Download the code

git clone --depth=1 https://github.com/lof310/transformer
cd transformer

Installation

# Install dependencies
pip install -r requirements.txt

# Install in development mode (recommended)
pip install -e .

# Install normally
pip install .

Quick Start

import torch

from transformer import Transformer, TransformerConfig

# Configure the model
config = TransformerConfig(
    n_layers = 12,
    n_heads = 32,
    d_model = 1536,
    attn_qk_norm = False,
    tied_weights = False,
    seq_len = 1024,
    max_seq_len = 4096,
)

# Initialize the model
model = Transformer(config)

# Forward pass
B, N = 16, 1024
input_ids = torch.randint(low=0, high=config.vocab_size, size=(B, N))
output = model(input_ids, return_states=False)

Default Configuration

The default configuration implements a modern SOTA Transformer design.

from transformer import TransformerConfig

TransformerConfig(
    n_layers = 12,
    d_model = 1536,
    n_heads = 32,
    n_kv_heads = None, # GQA disabled
    vocab_size = 50000,
    d_ff = None, # Chosen automatically, ratio 8/3 ≈ 2.67
    norm_design = "pre_norm",
    norm_class = "rms_norm",
    ffn_class = "SwiGLU",
    attn_class = "MHA",
    block_class = None, # defaults to transformer.TransformerBlock
    attn_bias = False,
    ffn_bias = True,
    lm_head_bias = False,
    attn_qk_norm = True,
    attn_dropout = 0.0,
    tied_weights = False,
    seq_len = 1024,
    pos_encoding = "RoPE",
    rope_base = 10000.0,
    max_seq_len = 4096
)
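When `d_ff = None`, the feed-forward hidden width is derived from `d_model` with the 8/3 ratio common in SwiGLU models. The repository's exact rounding rule is not documented here; the sketch below assumes LLaMA-style rounding up to a multiple of 256, a common hardware-friendly convention:

```python
def auto_d_ff(d_model: int, multiple_of: int = 256) -> int:
    """Pick a SwiGLU hidden width of about (8/3) * d_model,
    rounded up to a multiple of `multiple_of` (assumed convention)."""
    d_ff = int(8 * d_model / 3)
    return multiple_of * ((d_ff + multiple_of - 1) // multiple_of)

print(auto_d_ff(1536))  # (8/3) * 1536 = 4096, already a multiple of 256
```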

Documentation

Full documentation is available at This Page.

Contributing

Contributions are welcome!

License

Distributed under the Apache License 2.0. See LICENSE for more information.

Citation

If you use transformer in your research, please cite:

@software{transformer2026,
  author = {Leinier Orama},
  title = {transformer: PyTorch implementation of the current State-Of-The-Art (SOTA) Transformer},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/lof310/transformer}
}