
Transformer


A polished PyTorch implementation of a modern state-of-the-art (SOTA) Transformer. Designed for clarity, reproducibility, and interoperability with HuggingFace Transformers, this repository provides a fully configurable, robust baseline for research and engineering. The codebase emphasizes readable, well-documented components so you can iterate on feed-forward, attention, normalization, and other architectural variants with minimal friction.

Features

  • Fully configurable architecture (layers, heads, model dimensions, dropout, etc.)
  • HuggingFace-compatible API alignment.
  • Compact, easily extensible design for rapid prototyping and research experiments.
  • Clear, well-documented modules that facilitate experimentation with attention, FFNs, and more.
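For orientation, the kind of block you would iterate on — pre-norm residual structure with RMS normalization and a SwiGLU feed-forward network, matching this repository's defaults — can be sketched in plain PyTorch. This is an illustrative sketch only, not the repository's actual module; the class names here are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization with a learned scale."""
    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(d_model))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    """Gated feed-forward: down-project SiLU(x W_gate) * (x W_up)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class PreNormBlock(nn.Module):
    """Pre-norm block: x + Attn(norm(x)), then x + FFN(norm(x))."""
    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.attn_norm = RMSNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn_norm = RMSNorm(d_model)
        self.ffn = SwiGLU(d_model, d_ff)

    def forward(self, x):
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ffn(self.ffn_norm(x))

block = PreNormBlock(d_model=64, n_heads=4, d_ff=172)
y = block(torch.randn(2, 16, 64))  # (batch, seq, d_model) in and out
```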

Download the code

git clone --depth=1 https://github.com/lof310/transformer
cd transformer

Installation

# Install dependencies
pip install -r requirements.txt

# Install in development mode (recommended)
pip install -e .

# Install normally
pip install .

Quick Start

import torch

from transformer import Transformer, TransformerConfig

# Configure the model
config = TransformerConfig(
    n_layers = 12,
    n_heads = 32,
    d_model = 1536,
    attn_qk_norm = False,
    tied_weights = False,
    seq_len = 1024,
    max_seq_len = 4096,
)

# Initialize the model
model = Transformer(config)

# Forward pass
B, N = 16, 1024
input_ids = torch.randint(low=0, high=config.vocab_size, size=(B, N))
output = model(input_ids, return_states=False)

Default Configuration

The default configuration implements a modern SOTA Transformer design.

from transformer import TransformerConfig

TransformerConfig(
    n_layers = 12,
    d_model = 1536,
    n_heads = 32,
    n_kv_heads = None, # GQA disabled
    vocab_size = 50000,
    d_ff = None, # Chosen automatically, ratio 8/3 ≈ 2.67
    norm_design = "pre_norm",
    norm_class = "rms_norm",
    ffn_class = "SwiGLU",
    attn_class = "MHA",
    block_class = None, # defaults to transformer.TransformerBlock
    attn_bias = False,
    ffn_bias = True,
    lm_head_bias = False,
    attn_qk_norm = True,
    attn_dropout = 0.0,
    tied_weights = False,
    seq_len = 1024,
    pos_encoding = "RoPE",
    rope_base = 10000.0,
    max_seq_len = 4096
)
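When `d_ff = None`, the feed-forward hidden width is derived from `d_model` with the 8/3 ratio common in SwiGLU models. The repository's exact rounding rule is not documented here; the sketch below assumes LLaMA-style rounding up to a multiple of 256, a common hardware-friendly convention:

```python
def auto_d_ff(d_model: int, multiple_of: int = 256) -> int:
    """Pick a SwiGLU hidden width of about (8/3) * d_model,
    rounded up to a multiple of `multiple_of` (assumed convention)."""
    d_ff = int(8 * d_model / 3)
    return multiple_of * ((d_ff + multiple_of - 1) // multiple_of)

print(auto_d_ff(1536))  # (8/3) * 1536 = 4096, already a multiple of 256
```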

Documentation

Full documentation is available at This Page.

Contributing

Contributions are welcome!

License

Distributed under the Apache License 2.0. See LICENSE for more information.

Citation

If you use transformer in your research, please cite:

@software{transformer2026,
  author = {Leinier Orama},
  title = {transformer: PyTorch implementation of the current State-Of-The-Art (SOTA) Transformer},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/lof310/transformer}
}