
Language-Model-SAEs

Important

Currently, the examples are outdated and some parallelism strategies are not working, due to limited maintainer bandwidth. We are working on better organizing recent updates and will make everything work as soon as possible.

Language-Model-SAEs is a comprehensive, fully-distributed framework designed for training, analyzing and visualizing Sparse Autoencoders (SAEs), empowering scalable and systematic Mechanistic Interpretability research.

News

Features

  • Scalability: Our framework is fully distributed with arbitrary combinations of data, model, and head parallelism for both training and analysis. Enjoy training SAEs with millions of features!
  • Flexibility: We support a wide range of SAE variants, including vanilla SAEs, Lorsa (Low-rank Sparse Attention), CLT (Cross-layer Transcoder), MoLT (Mixture of Linear Transforms), CrossCoder, and more. Each variant can be combined with different activation functions (e.g., ReLU, JumpReLU, TopK, BatchTopK) and sparsity penalties (e.g., L1, Tanh).
  • Easy to Use: We provide high-level runner APIs to quickly launch experiments with simple configurations. Check our examples for verified hyperparameters.
  • Visualization: We provide a unified web interface to visualize learned SAE variants and their features.

Installation

We use uv to manage the dependencies; it is an alternative to poetry or pdm. First, install uv by following its official instructions.
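For example, on Linux or macOS, uv's standalone installer can be used (one common installation method from the uv documentation; any other supported method works as well):

curl -LsSf https://astral.sh/uv/install.sh | sh

Once uv is available, run the following command from the repository root: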

uv sync --extra default

This will install all the required packages for the codebase into the .venv directory. For Ascend NPU support, run

uv sync --extra npu

A forked version of TransformerLens is also included in the dependencies to provide the necessary tools for analyzing features.

If you want to use the visualization tools, you also need to install the required packages for the frontend, which uses bun for dependency management.
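If you do not already have bun installed, its official installer can be used (the command below is taken from bun's installation instructions; follow the instructions on its website if you prefer another method):

curl -fsSL https://bun.sh/install | bash

Then run the following commands: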

cd ui
bun install

bun is not well-supported on Windows, so you may need to use WSL or other Linux-based solutions to run the frontend, or consider using a different package manager, such as pnpm or yarn.
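For example, with pnpm (a sketch assuming pnpm is already installed; note that it may generate its own lockfile rather than reusing bun's):

cd ui
pnpm install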

Launch an Experiment

The guidelines and examples for launching experiments are generally outdated. At this moment, you may explore the src/lm_saes/runners folder for the interfaces for generating activations and for training and analyzing SAE variants. For analyzing SAEs, a running MongoDB instance is required. More instructions will be provided in the near future.
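If you do not have a MongoDB instance available, one quick way to start a local, unauthenticated instance for development is via Docker (the container name and port below are illustrative; adjust them, and add authentication, for any shared deployment):

docker run -d --name lm-saes-mongo -p 27017:27017 mongo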

Visualizing the Learned Dictionary

The analysis results are saved in MongoDB, and you can use the provided visualization tools to explore the learned dictionary. First, start the FastAPI server by running the following command:

uvicorn server.app:app --port 24577 --env-file server/.env

Then, copy the ui/.env.example file to ui/.env and set VITE_BACKEND_URL to match your server settings (by default, it's http://localhost:24577).
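For example (assuming the backend runs locally on the default port from the previous step):

cp ui/.env.example ui/.env
# in ui/.env
VITE_BACKEND_URL=http://localhost:24577

With the backend URL configured, start the frontend by running the following command: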

cd ui
bun dev --port 24576

That's it! You can now go to http://localhost:24576 to visualize the learned dictionary and its features.

Development

We warmly welcome contributions to this project. If you have any questions or suggestions, feel free to open an issue or a pull request. We look forward to hearing from you!

TODO: Add development guidelines

Acknowledgement

The design of the pipeline (including the configuration and some training details) is heavily inspired by the mats_sae_training project (now known as SAELens) and relies heavily on the TransformerLens library. We thank the authors for their great work.

Citation

Please cite this library as:

@misc{Ge2024OpenMossSAEs,
    title  = {OpenMoss Language Model Sparse Autoencoders},
    author = {Xuyang Ge and Wentao Shu and Junxuan Wang and Guancheng Zhou and Jiaxing Wu and Fukang Zhu and Lingjie Chen and Zhengfu He},
    url    = {https://github.com/OpenMOSS/Language-Model-SAEs},
    year   = {2024}
}
