Conversation

@bryce13950
Collaborator

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

bryce13950 and others added 8 commits September 21, 2025 17:57
* created individual processing functions

* extracted state dict and inserted back into instance after processing

* created weight processing shared class

* added test coverage for new functions

* updated hooked transformer to use new shared functions

* created test

* moved over weight processing

* replaced keys

* used the correct function

* created test for making sure path translation works correctly

* fixed weight processing

* added additional tests

* formatted tests a bit

* cleaned up

* fixed unit test

* fixed indentation

* fixed doc string

* fixed unit test

* fixed type

* fixed some tests

* fixed test

* fixed setup of tests

* cleaned up test

* started working through individual matches

* added test coverage

* tested function a bit

* integrated weight conversion into weight processing

* simplified functions

* identified individual problem lines

* identified divergences more clearly

* brought back error lines
* improved accuracy a bit

* got models to match

* removed forward pass stuff

* cleaned up weight processing a bit

* removed working attention

* restored files

* created loop to verify weight conversion

* finished compatibility layer

* finished testing hugging face weights

* setup correct init

* added some tests

* removed separate component

* fixed some integration tests
@bryce13950 changed the title from "Dev 3.x folding" to "Transformer bridge layer norm folding" on Sep 27, 2025
bryce13950 and others added 21 commits September 29, 2025 22:33

* fixed typing issue

* fixed typing and format issues

* fixed ci issues

* ran format

* fixed mypy issues

* removed extra file

* removed old scripts

* tested format

* fixed some tests

* ran format

* fixed tests

* fixed acceptance tests

* fixed some more tests

* synced functionality completely

* reduced old references

* removed remaining references

* moved forward functions

* removed forward

* tested various forwards

* worked on getting original forwards back into place

* added more coverage

* cleaned up model

* git status

* Fix automatic weight extraction to use reference HookedTransformer

This restores the working weight extraction mechanism that creates a reference
HookedTransformer internally and extracts exact processed weights for perfect
compatibility with ablation studies.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
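
The extraction mechanism described above can be sketched as follows. `HookedTransformer.from_pretrained` and its processing flags (`fold_ln`, `center_writing_weights`, `center_unembed`) are real TransformerLens APIs; the wrapper function itself is an illustrative assumption, not the actual bridge code.

```python
# Sketch of the reference-model idea: build a HookedTransformer with weight
# processing enabled, then hand its processed state dict to the bridge.
def extract_processed_weights(model_name: str) -> dict:
    """Load a reference HookedTransformer with standard weight processing
    applied and return its processed state dict."""
    # Imported lazily so the sketch can be read without loading the library.
    from transformer_lens import HookedTransformer

    reference = HookedTransformer.from_pretrained(
        model_name,
        fold_ln=True,                 # fold LayerNorm into adjacent weights
        center_writing_weights=True,  # center weights writing to the residual stream
        center_unembed=True,          # center the unembedding matrix
    )
    # Clone so the extracted tensors do not alias the reference model.
    return {k: v.clone() for k, v in reference.state_dict().items()}
```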

* moved embed stuff from bridge

* moved MLP stuff

* cleaned up a bit

* cleaned up a bit

* removed extra block

* created pos embed bridge

* fixed unembed

---------

Co-authored-by: Claude <[email protected]>
* moved final layer norm

* moved layer norm forward

* cleaned up more things

* updated attention weight loading

* fixed function names
* fixed some ci issues

* fixed type issues

* ran format

* fixed test

* fixed type issues

* fixed type issue

* fixed type issue

* fixed test

* fixed test

* fixed issues

* ran format

* fixed typing

* fixed tests

* fixed tests

* simplified test

* sped up tests

* added check for kv cache

* ran format

* skipped some tests

* marked a couple tests to skip

* ran some more optimizations

* ran poetry lock

* regenerated lock

* fixed commands

* set random seed

* updated parallelism prop

* updated command

* reverted some changes

* updated notebook settings

* updated verbosity

* removed extra test

* cleaned up tests some more

* marked test as skipped

* fixed more tests

* sped up CI

* reverted CI changes

* reverted actions changes

* improved cache

* sped up some tests

* optimized more tests

* sped up some more tests

* made more speed improvements

* fixed error

* fixed typing
* cleaned up some debug points

* fixed attention hooks

* enabled hooks in test
* split out some tasks into their own jobs

* removed bad file

* updated name
* fixed batch dimension

* removed log point

* fixed potential error

* sped up load

* ran format

* improved hf cache handling

* fixed bridge

* fixed cache again

* added more checks

* removed parallel execution
* fixed cache hooks

* fixed test and typing

* fixed test
* fixed bias displaying

* fixed ablation issue

* fixed type issue
* setup new hooks properly

* fixed type checks
* fixed alias hook props

* ran format
* made all hooks show properly

* ran format

* fixed type checks
* updated loading in main demo to use transformers bridge

* updated model name

* updated imports

* updated some cells

* reran demo

* updated some cells

* reran some cells

* reran demo

* ran demo again

* finished generating new cells
* Update README.md (#957)

Update link to Streamlit tutorial and guide.

Co-authored-by: Bryce Meyer <[email protected]>

* improve model properties table in docs (#769)

* add static to gitignore

* making a meaningless change to see if tests pass at all

* making a meaningless change to see if tests pass at all

* add interactive table static html only

adding things one at a time to see what causes things to break

* run poetry update with no changes to deps

* revert lockfile change

* add tiktoken >=0.7.0 to group docs

* add dep muutils >=0.6.15 to group docs

* add improved interactive table generation

we still generate a plain markdown table

code is from the old PR: https://github.com/mivanit/TransformerLens/blob/add-better-model-properties-table/docs/make_docs.py
which is in turn a modified version of https://github.com/mivanit/transformerlens-model-table

* fix format -- missing trailing newline

* fix type hints for compatibility

* fix torch device meta in make docs script, also improved hot reload

* TEMPORARY: allow_except when getting models to deal with mixtral HF_TOKEN issue

* added simple test for get_model_info

* context manager for controlling device, tests were breaking due to default device meta

* formatted with wrong version of black, oops

* fix path to generated model_properties_table

* fix md table header, add title in yaml frontmatter

* add line to frontmatter yaml, re-run tests bc huggingface down?

* do not allow exceptions when getting models

* re-run poetry lock

* attempt fix lockfile

* re-run poetry lock

---------

Co-authored-by: Bryce Meyer <[email protected]>

* switch pyproject from toml to uv, generate lockfile

also update tiktoken dep for 3.13 compatibility

* update makefile to use uv

* update actions

* hack to get version to work

* wip

* make dep

* update contributing.md to reflect switch from poetry to uv

* add type hints to supported_models

* fix paths in make_docs.py

* docs group not in default, update install instructions for docs

* POETRY_PYPI_TOKEN_PYPI -> PYPI_TOKEN_PYPI

* make format

* fix default groups, re-add docs

* add some deps needed in notebooks

* removed use of torchtyping in othello_GPT.ipynb and deps

- torchtyping causes various issues if it's imported
- presumably jaxtyping should be used instead??
- othello GPT notebook doesn't actually use the imported TT
  - shouldn't a linter/formatter catch this sort of unused import?

* fix: add pythonpath "." to pytest config for test imports

Configure pytest to include project root in Python path, enabling
`from tests.foo import bar`
style imports, which were broken by switching to uv
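
The fix can be expressed as a small config entry; a sketch assuming the pytest options live in `pyproject.toml` (the `pythonpath` ini option requires pytest >= 7.0):

```toml
[tool.pytest.ini_options]
# Add the project root to sys.path so `from tests.foo import bar` resolves
# without an editable install.
pythonpath = ["."]
```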

* attempt jupyter issue fix

* issue ref explaining ipython version restriction

* updated ci commands after recent work

* fixed more setup items

* added tabulate dependency

* updated make docs command

* updated dependencies

* fixed docs

---------

Co-authored-by: jmole <[email protected]>
Co-authored-by: Bryce Meyer <[email protected]>
* setup tests for hooks

* ran format

* merged legacy hooks tests

* ran format

* enabled compatibility mode

* added remaining hooks

* fixed type issue

* added main demo cached output

* removed debug items

* reran notebook

* marked cell for skipping

* reran notebook

* regenerated demo

* regenerated notebook
* updated loading in arena content demo to use transformer bridge

* updated install reference

* removed extra params

* ran some cells

* updated arena notebook

---------

Co-authored-by: Bryce Meyer <[email protected]>
* regenerated with new hooks

* ran first cell
* added test coverage for ensuring compatibility

* ran format

* fixed unit tests

* resolved type issue

* added init files

* added init file

* fixed tokenize function

* fixed attention mask issues

* reverted invalid change to test
bryce13950 and others added 29 commits November 6, 2025 04:02
* finalized benchmark logic

* ran format
* improved various models

* improved llama

* fixed benchmark utils now

* fixed test

* fixed opt adapter

* fixed phi-3

* fixed format

* fixed type issues

* added line break

* fixed phi-3
* improved benchmarking tools

* added individual benchmarks

* ran format

* ran format

* revised gpt2 again
* resolved experts mapping issues

* ran format

* Fix GPT-OSS JointGateUpMLPBridge hook alias resolution and add tests

This commit addresses hook alias resolution issues for GPT-OSS MoE models
and adds comprehensive unit tests.

Changes:
1. Fixed JointGateUpMLPBridge hook_aliases to use gate.hook_out instead of
   in.hook_out/input.hook_out, which don't exist in this bridge type
2. Added 7 comprehensive unit tests in test_gpt_oss_moe.py that verify:
   - Model loads without downloading weights (using meta device)
   - Bridge creation works correctly
   - MLP uses JointGateUpMLPBridge (not regular MLPBridge)
   - Compatibility mode hooks are accessible
   - Experts structure is correct (batched tensors, not iterable modules)
   - Hook aliases resolve correctly
   - No incorrect BlockBridge wrapper around experts

Root cause:
- JointGateUpMLPBridge inherits from MLPBridge which has hook_aliases
  expecting in.hook_out or input.hook_out submodules
- JointGateUpMLPBridge creates gate and up submodules instead, causing
  AttributeError when resolving aliases
- Solution: Override hook_aliases at class level to use gate.hook_out

Testing:
All 7 tests pass, verifying GPT-OSS loads correctly and hooks work in
compatibility mode without downloading the full 20B parameter model.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
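
The class-level override described in the root cause can be sketched with simplified stand-ins; the real bridge classes live in TransformerLens, and these only mirror the alias-resolution shape of the fix.

```python
class MLPBridge:
    # Parent maps the compatibility-mode hook name to an `in` submodule hook.
    hook_aliases = {"hook_pre": "in.hook_out"}


class JointGateUpMLPBridge(MLPBridge):
    # This bridge has `gate`/`up` submodules instead of `in`, so resolving the
    # inherited alias would raise AttributeError. Override at class level to
    # point at a hook that actually exists on this bridge.
    hook_aliases = {"hook_pre": "gate.hook_out"}


print(JointGateUpMLPBridge.hook_aliases["hook_pre"])  # gate.hook_out
```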

* added model name to available models

* added eps config

* updated to rotary bridge

* passed through to parent

* fixed tuple passing

* changed oss bridge

* fixed gpt oss activations

* ran format

* removed colab compat from checks

* decoupling weight processing completely from hooked transformer

* fixed weight processing issues

* fixed test

* fixed tests

* fixed forward tests

* updated test

* fixed last test

* fixed format

* ran format

* added whitespace

* finished weight processing generalization

* fixed type check

* fixed test

* fixed format

* made checks continue after first failure

* fixed test g

---------

Co-authored-by: Claude <[email protected]>
* fixed tensor storing

* fixed type issue
* disabled patched forwards

* fixed mlp linear for gpt2 for simpler implementation

* rearranged some forward pass items

* fixed type issue and updated demo run sequence

* reran arena notebook

* reran main demo

* regenerated demo

* fixed mid hook issue

* regenerated demo

* fixed test

* regenerated demo
* disabled patched forwards

* fixed mlp linear for gpt2 for simpler implementation

* rearranged some forward pass items

* got gemma 3 a lot closer

* updated gemma 3

* got gemma 3 to pass perfectly

* finished gemma 3 match

* fixed type issue and updated demo run sequence

* reran arena notebook

* reran main demo

* regenerated demo

* fixed mid hook issue

* regenerated demo

* fixed test

* regenerated demo

* fixed type checks

* fixed check

* fixed format

* fixed test

* fixed precision

* updated test

* updated window
* setup real aliases

* fixed format

* set submodules as props

* fixed submodule setting

* cut out the last bit of extras

* fixed type check

* removed startup alias function

* fixed test
* disabled patched forwards

* fixed mlp linear for gpt2 for simpler implementation

* rearranged some forward pass items

* got gemma 3 a lot closer

* updated gemma 3

* got gemma 3 to pass perfectly

* finished gemma 3 match

* fixed type issue and updated demo run sequence

* reran arena notebook

* reran main demo

* regenerated demo

* fixed mid hook issue

* regenerated demo

* matched os

* fixed test

* regenerated demo

* fixed type checks

* fixed check

* fixed format

* fixed test

* fixed precision

* updated test

* updated window

* ran format

* fixed type check

* fixed test

* updated weight processing to extract properly

* started moving around weight processing

* fixed v bias folding

* updated weight loading

* debugged joint qkv a bit further

* fixed gemma 3 init

* fixed hook_z

* fixed mypy

* ran format

* removed extra markdown

* updated key checking to use unified structure

* matched shapes in hooks

* got gpt2 to pass completely again

* ran format

* fixed some shapes

* fixed typing

* updated shapes

* fixed tests

* fixed test

* fixed kv cache issue

* made weight processing more generic

* fixed test issue

* removed oss

* updated test

* fixed test

* got tests to run again

* cleaned up component

* fixed test

* fixed tests and regenerated arena notebook

* regenerated demo
* trimmed memory a bit

* fixed type checks
…1123)

* created benchmark suite for unsupported models in hooked transformer

* fixed function registration

* Add hook_structure.py module for cross-model validation

This module was missing from the repository, causing CI failures.
It provides structure-only validation of hooks that can work across
different model architectures.

Key functions:
- validate_hook_shape_compatibility() - Cross-model shape validation
- benchmark_forward_hooks_structure() - Forward hook structure tests
- benchmark_backward_hooks_structure() - Backward hook structure tests
- benchmark_activation_cache_structure() - Cache structure tests

These enable validation of models without HookedTransformer references
by comparing their hook structure against GPT-2.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
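
The structure-only validation idea can be illustrated as below. The function name matches the commit message, but the body is an assumption, not the repository code: it compares hook names and tensor ranks between a reference model (e.g. GPT-2) and a candidate, without requiring identical dimensions.

```python
def validate_hook_shape_compatibility(
    reference_hooks: dict[str, tuple[int, ...]],
    candidate_hooks: dict[str, tuple[int, ...]],
) -> list[str]:
    """Return a list of mismatch descriptions; an empty list means the
    candidate's hook structure is compatible with the reference."""
    problems = []
    for name, ref_shape in reference_hooks.items():
        if name not in candidate_hooks:
            problems.append(f"missing hook: {name}")
        elif len(candidate_hooks[name]) != len(ref_shape):
            # Compare ranks only, so models of different widths still pass.
            problems.append(f"rank mismatch at {name}")
    return problems


ref = {"blocks.0.hook_resid_pre": (1, 8, 64)}
cand = {"blocks.0.hook_resid_pre": (1, 8, 128)}
print(validate_hook_shape_compatibility(ref, cand))  # []
```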

* fixed memory issues

* ran format

* reran main demo

---------

Co-authored-by: Claude <[email protected]>
* fixed remaining gemma 3 benchmarks

* fixed type check
* got a bunch of models closer

* fixed trust remote issue

* fixed phi after merge

* fixed bos token

* fixed arena issue

* fixed typing

* fixed gqa

* fixed issue

* ran format

* fixed benchmark util and demo

* fixed some hooks

* added bos support

* fixed gemma again

* fixed llama

* skipped t5 in ci

* fixed qwen loading

* removed phi-2 from ci

* fixed type check

* fixed type issues

* fixed some more attention things

* fixed no processing check

* fixed type and ran format

* reverted generalized components changes

* restored tests

* reverted more changes

* reverted transformers changes

* skipped test

* fixed skip if check
* set up benchmark suite, and trimmed out extra tests

* ran format
* fixed remaining gemma 3 benchmarks

* fixed type check

* fixed adapter

* cleaned up attention component

* cleaned up processed weight setting

* ran format

* fixed type issues
* made cross model comparisons more explicit

* finished updating benchmarks for cross model

* finished testing new benchmarks

* fixed typing

* synced attention params
* made cross model comparisons more explicit

* finished updating benchmarks for cross model

* finished testing new benchmarks

* fixed typing

* synced attention params

* wrapped up initial oss work

* skipped irrelevant tests for moe

* ran format
* removed legacy private functions

* cleaned up adapter

* expanded benchmarks

* got gemma 2 closer

* added processing step to architecture adapter

* setup rotary to sub component properly

* cleaned up block

* added recursive test of models

* got gemma 2 closer

* removed extras

* ran format

* restored files

* removed some dead code

* fixed test

* fixed typing

* fixed test

* fixed more tests

* removed more dead code

* added xet to cache

* fixed potential recursion issue

* reverted bad change

* bumped cache

* added verbose output

* Add --tb=short to coverage-report-test for better error visibility

* added more verbose output

* configured output to display errors right away

* skipped problematic tests

* removed extra flags

* skipped hooked encoder decoder

* skipped failing test
* finished matching 2 & 3

* cleaned up components

* cleaned up weight processing

* fixed embed key mapping

* cleaned up key processing

* removed extra functions

* cleaned up a bit more

* cleaned up bridge a bit

* generalized weight processing a bit more

* allowed attention to be passed through

* restored gemma 3

* fixed gpt 2 again

* ran format

* cleaned up a couple things

* refactored weight processing to not use split weights

* fixed granular tests

* added more granular tests

* removed invalid import

* cleaned up a couple things

* removed extra function call

* moved weight integration

* cleaned up a bit more

* cleaned up some more

* added verbose mode

* ran format

* fixed doc string and mypy issues

* fixed tests

* fixed more tests

* fixed issue

* added function calls again

* updated format

* restored last ln step

* moved around stop at layer

* retested some tests
* updated joint component

* cleaned up a bit

* added working bridge

* removed some detection

* fixed state dict filtering

* removed extra function

* cleaned up a bit

* updated process to use main keys

* Revert "updated process to use main keys"

This reverts commit cacf71a.

* cleaned up some things

* removed extra junk

* tested more models

* fixed gemma 3 ln issue

* refactored conversions

* integrated conversion throughout

* configured gpt2 some more

* cleaned up conversion and reversions

* removed extra param

* untied embed from state dict directly

* broke tying

* validated ship

* cleaned up components

* cleaned up a bit

* resolved some gpt2 items

* fixed unembed bias

* fixed final check

* ran format

* fixed gemma 3

* cleaned up extra stuff

* cleaned up linear bridge a bit

* fixed keys

* fixed unit tests

* fixed tests

* removed extra function

* moved function

* regenerated demo

* ran format

* regenerated demo
* removed merge file

* cleaned up imports

* cleaned up some adapters

* fixed up some adapters

* added ability to use parent paths when mapping components

* cleaned up gpt

* cleaned up more architecture adapters

* ran format

* fixed type issues
* Removed all attributes of  which directly mapped keys. These attributes are now handled by the component mapping Bridge classes

* Formatting update

* Removed additional missed key
* Removed all attributes of  which directly mapped keys. These attributes are now handled by the component mapping Bridge classes

* Remove source keys where they have been made redundant by the bridges

* Formatting update

* Remove source keys where they have been made redundant by the bridges

* created qwen 3 adapter

---------

Co-authored-by: jlarson <[email protected]>

---------

Co-authored-by: Bryce Meyer <[email protected]>
* cleaned up a lot of things

* removed extra function

* fixed typing

* fixed index bug

* removed extra stuff

* fixed main demo

* removed bad chunk

* removed attention check

* fixed cache

* fixed type check

* fixed demo issue

* fixed test

* fixed typing

* updated type

* restored patched function

* fixed gemma 3

* fixed test

* fixed typing

* continued working through gemma compat

* fixed more issues

* got closer

* ran format

* fixed extra config

* grouped results by phase

* cleaned up adapter

* fixed weight processing issue

* revised hooks

* fixed phase 2

* set flags correctly

* revised benchmarks for granularity

* improved gemma compatibility

* cleaned up memory

* revised architecture adapters

* cleaned up memory

* improved some models

* used correct component

* verified more architectures

* fixed more models

* ran format

* fixed typing

* fixed typing

* fixed test

* fixed t5

* fixed format

* fixed test
@bryce13950 merged commit e4d9784 into dev-3.x on Dec 7, 2025 (25 of 30 checks passed)