Attention and control #4

Open
AI-ELka wants to merge 1 commit into main from Attention_ctrl_ab
Conversation

AI-ELka commented Sep 28, 2025

Collaborator

No description provided.

@AI-ELka AI-ELka requested a review from orichardson September 28, 2025 20:19
@josephdviviano

Collaborator

can you resolve merge conflicts?

AI-ELka commented Oct 1, 2025

Collaborator Author

> can you resolve merge conflicts?

I noticed that Mehran's attention code was merged into main (that's where the conflict came from), so I guess I can keep this attention implementation just in case we need it. I wrote it because I wasn't able to solve a problem in the other attention implementation.

@orichardson

Owner

@AI-ELka Can you help us determine whether or not that problem persists in the implementation that's currently on the main branch? I've forgotten the details of the issue.

AI-ELka commented Oct 2, 2025

Collaborator Author

> @AI-ELka Can you help us determine whether or not that problem persists in the implementation that's currently on the main branch? I've forgotten the details of the issue.

The main problem we had was that the loss remained constant (in a case where it should decrease), but after testing the code again, this problem seems to be gone.
One thing that pops up now when testing with "uniform" (and not with "from_cpd" or "random") is an assertion error coming from:

print(f"Any unfrozen edge changed? {any_changed}")
assert any_changed, "No learnable edges changed; attention/control masks may be misapplied."

So the main problem we had seems to have been solved, but we now have this issue with the assertion error when using uniform initialization.
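
To narrow this down, it may help to report which unfrozen edges moved rather than a single boolean. The following is a minimal sketch, not the project's actual check: the function name `changed_unfrozen_edges`, the dict-of-lists parameter layout, the edge names, and the tolerance are all assumptions made for illustration.

```python
import math


def changed_unfrozen_edges(before, after, frozen, tol=1e-8):
    """Return names of unfrozen edges whose parameters moved by more than tol.

    before/after: hypothetical snapshots mapping edge name -> list of floats.
    frozen: set of edge names that are expected to stay fixed.
    """
    changed = []
    for name, old in before.items():
        if name in frozen:
            continue  # frozen edges are not expected to move, so skip them
        new = after[name]
        moved = any(
            not math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
            for a, b in zip(old, new)
        )
        if moved:
            changed.append(name)
    return changed


# Toy snapshots (made-up values): every edge starts at a uniform distribution.
before = {"A->B": [0.5, 0.5], "B->C": [0.5, 0.5], "C->D": [0.5, 0.5]}
after = {"A->B": [0.5, 0.5], "B->C": [0.7, 0.3], "C->D": [0.9, 0.1]}
frozen = {"C->D"}

changed = changed_unfrozen_edges(before, after, frozen)
any_changed = bool(changed)
print(f"Any unfrozen edge changed? {any_changed} ({changed})")
```

If the list comes back empty only with "uniform", comparing it against the gradients of the unfrozen edges could help tell a misapplied mask apart from an update that is genuinely zero at the uniform starting point.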
