Do dense transformers, without routers, develop sparse, modular structure that becomes more specialized as model size grows? We:
- Measure activation sparsity (AS), feature Specialization Index (SI), and graph modularity (Q) across a consistent scaling suite (a sketch of the AS metric appears below the TL;DR).
- Explain features via Sparse Autoencoders (SAEs) to reveal monosemantic circuits.
- Exploit the structure using dynamic-k MLP execution for real FLOPs savings at fixed quality.
TL;DR — Specialization scales with size; you can cash it out for speed.
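For concreteness, here is a minimal sketch of the AS metric from the first bullet, assuming AS is the fraction of post-nonlinearity MLP activations whose magnitude falls below a small threshold, averaged over tokens. The function name `activation_sparsity` and the threshold `tau` are illustrative, not this repo's API.

```python
# Illustrative AS computation (not the repo's implementation): the fraction
# of post-GELU MLP hidden activations with magnitude below a threshold tau.
import torch

def activation_sparsity(h: torch.Tensor, tau: float = 1e-3) -> float:
    """h: [tokens, d_ff] hidden activations from one MLP layer."""
    return (h.abs() < tau).float().mean().item()

h = torch.nn.functional.gelu(torch.randn(8, 2048))  # stand-in for captured activations
print(f"AS = {activation_sparsity(h):.3f}")
```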
```bash
python -m venv .venv && source .venv/bin/activate
pip install -U pip
pip install -e .
```
```bash
# list MLP-ish layers in a checkpoint
python - <<'PY'
from sdlms.activations import list_layers
print(list_layers("EleutherAI/pythia-410m-deduped"))
PY
```
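The heredoc above uses the repo's `sdlms.activations.list_layers`. If you only want to see which submodules count as "MLP-ish" in a Hugging Face checkpoint, a plain `named_modules()` walk gives the same picture; the snippet below is an illustrative standalone equivalent, not the repo's code, and `list_layers` may filter or normalize names differently.

```python
# Standalone sketch: print every submodule whose qualified name ends in ".mlp".
# (The repo's list_layers may normalize these names differently.)
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m-deduped")
for name, _ in model.named_modules():
    if name.endswith(".mlp"):
        print(name)
```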
```bash
# 1) Capture activations on small probe tasks
python scripts/run_capture.py --model EleutherAI/pythia-410m-deduped --task-id ioi_minimal --layers model.layers.10.mlp
# 2) Train SAEs (separate tool) and export features
# 3) Compute metrics (AS, SI, Q)
python scripts/run_metrics.py
# 4) Dynamic-k eval (throughput vs perplexity)
python scripts/run_dynamick_eval.py --k 0.35
```
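Step 2 above hands SAE training off to a separate tool. As a rough sketch of what a sparse autoencoder over captured MLP activations looks like (the architecture, widths, and L1 coefficient below are assumptions, not that tool's defaults):

```python
# Minimal SAE sketch: reconstruct MLP activations through an overcomplete
# ReLU bottleneck, with an L1 penalty encouraging sparse, monosemantic features.
import torch
import torch.nn as nn

class TinySAE(nn.Module):
    def __init__(self, d_in: int, d_feat: int):
        super().__init__()
        self.enc = nn.Linear(d_in, d_feat)
        self.dec = nn.Linear(d_feat, d_in)

    def forward(self, x):
        f = torch.relu(self.enc(x))          # sparse feature activations
        return self.dec(f), f

sae = TinySAE(d_in=2048, d_feat=8192)
x = torch.randn(16, 2048)                    # captured activations would go here
x_hat, f = sae(x)
loss = ((x_hat - x) ** 2).mean() + 1e-3 * f.abs().mean()  # reconstruction + L1
```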
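Step 4's `--k 0.35` keeps roughly the top 35% of MLP hidden units per token. Below is a minimal sketch of the dynamic-k idea, assuming top-k-by-magnitude selection on the post-nonlinearity hidden state; the function and weight names are illustrative, and the real `scripts/run_dynamick_eval.py` may select and fuse the computation differently.

```python
# Dynamic-k MLP sketch: per token, keep only the top k-fraction of hidden
# units by |activation| and zero the rest, so the down-projection only needs
# the surviving rows. A fused kernel that skips the zeroed rows is where the
# FLOPs savings at (approximately) fixed quality come from.
import torch

def dynamic_k_mlp(x, w_in, w_out, k=0.35):
    """x: [tokens, d_model]; w_in: [d_model, d_ff]; w_out: [d_ff, d_model]."""
    h = torch.nn.functional.gelu(x @ w_in)              # [tokens, d_ff]
    n_keep = max(1, int(k * h.shape[-1]))
    idx = h.abs().topk(n_keep, dim=-1).indices          # per-token top-k units
    mask = torch.zeros_like(h).scatter_(-1, idx, 1.0)
    return (h * mask) @ w_out

x = torch.randn(4, 512)
w_in, w_out = torch.randn(512, 2048), torch.randn(2048, 512)
print(dynamic_k_mlp(x, w_in, w_out).shape)              # torch.Size([4, 512])
```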
```bash
# install dependencies through uv (preferred)
uv sync --all-groups
# measure activation sparsity with a prompt (writes CSV to artifacts/sparsity)
uv run sparsity --model EleutherAI/pythia-70m-deduped --probe-manifest data/probe_tasks.jsonl --task-id toy_arithmetic
# launch a notebook to inspect results
uvx jupyter lab
```

- Deterministic seeds where possible (a minimal seeding sketch follows this list)
- Configs + exact prompts for probe tasks
- All figures generated from notebooks/
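For the seeding bullet above, a minimal sketch of what "deterministic where possible" usually amounts to in PyTorch; the repo's actual seeding may be handled elsewhere.

```python
# Illustrative seeding helper; not necessarily how this repo sets seeds.
import os
import random

import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)      # no-op on CPU-only machines
    os.environ["PYTHONHASHSEED"] = str(seed)
```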
MIT — see LICENSE.
