docs(gradmax): rewrite GradMax algorithm page #16

Open
maxencelebaron wants to merge 2 commits into growingnet:main from maxencelebaron:maxence/gradmax-doc

@maxencelebaron
Collaborator

This PR adds the documentation page for the GradMax algorithm.

Content:

  • Introduction: GradMax focuses solely on the "how to grow" question, initializing new neurons by maximizing gradient norms rather than greedily minimizing the loss
  • Theory: general optimization problem
  • Experiments: two result tables on CIFAR-10/100 and ImageNet comparing GradMax against random initialization, Firefly, and baselines

Files changed:

  • docs/algorithms/gradmax.rst: new algorithm page
  • docs/_static/gradmax.png: figure illustrating neuron addition in GradMax

@TheoRudkiewicz
Collaborator

Thank you for this PR. Here are a few comments:

  • Please assume that readers have already read the introduction, in particular https://growingnet.github.io/growing_wiki/overview/neuron_addition_problem.html. It would therefore be better to use notation consistent with that page (Psi / Omega).
  • Even though it is not the goal of GradMax, it is important to report how the authors handle When, Where, and How many, so that the method can be compared with others.
  • Could you give at least a one-line explanation of "Random"? (I think the paper is unclear; hopefully the code is clearer.) In particular, if it is a Kaiming-style init, which fan-in size is used?

Comment on lines +119 to +121

The solution to this maximization problem is found in closed-form by setting the columns of :math:`W_{\ell+1}^{\mathrm{new}}` as the top-:math:`k` left-singular vectors of the matrix

Collaborator

I think this is false (it's a mistake in the paper). The hypothesis that the different components are orthogonal should be added.

Collaborator Author

@maxencelebaron May 27, 2026

Thanks for the review! If I understand correctly, without an orthogonality constraint on the columns of $W_{\ell+1}^{\text{new}}$, the SVD solution (top-k left singular vectors) is not the actual optimum. Is that what you mean?
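
A minimal NumPy sketch of the point under discussion. It assumes (this is not taken from the doc page) that the objective is $\|W^\top M\|_F^2$ under the norm constraint $\|W\|_F \le c$, and it uses a random stand-in for the matrix $M$. The names `M`, `k`, `c`, and `gradient_norm_objective` are hypothetical and only serve the illustration.

```python
import numpy as np


def gradient_norm_objective(M, W):
    """Squared Frobenius norm of W^T M (assumed form of the objective)."""
    return float(np.linalg.norm(W.T @ M, "fro") ** 2)


rng = np.random.default_rng(0)
M = rng.standard_normal((6, 8))  # random stand-in, not the real gradient matrix
k, c = 3, 1.0                    # hypothetical number of new neurons and norm budget

U, S, _ = np.linalg.svd(M, full_matrices=False)

# Candidate A: top-k left-singular vectors, rescaled so that ||W||_F = c.
W_svd = U[:, :k] * (c / np.sqrt(k))

# Candidate B: k copies of the top left-singular vector, same Frobenius norm.
# Its columns are not orthogonal.
W_rank1 = np.tile(U[:, :1], (1, k)) * (c / np.sqrt(k))

# Candidate C: a random set of orthogonal, equal-norm columns, same Frobenius norm.
Q, _ = np.linalg.qr(rng.standard_normal((M.shape[0], k)))
W_rand_orth = Q * (c / np.sqrt(k))

obj_svd = gradient_norm_objective(M, W_svd)
obj_rank1 = gradient_norm_objective(M, W_rank1)
obj_rand_orth = gradient_norm_objective(M, W_rand_orth)

print(f"top-k singular vectors : {obj_svd:.4f}")
print(f"k copies of top vector : {obj_rank1:.4f}")
print(f"random orthogonal cols : {obj_rand_orth:.4f}")
```

Under the norm constraint alone, the repeated top vector reaches $c^2 \sigma_1^2$, which is at least the $(c^2/k) \sum_{i \le k} \sigma_i^2$ attained by the top-$k$ vectors. Among candidates with orthogonal, equal-norm columns, the top-$k$ choice is the best one (Ky Fan's maximum principle), which is why the orthogonality hypothesis matters.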

Comment on lines +164 to +166

- Optimizer: SGD with momentum 0.9, weight decay :math:`0.2`, base learning rate :math:`\eta_0 = 0.1` for Wide-ResNet an with cosine decay and :math:`\eta_0 = 0.05` for VGG

Collaborator

I am a bit skeptical about the weight decay value.

Collaborator Author

Yes, you're right. That's a mistake. The correct value is 2e-4.
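
For reference, a small sketch of the corrected settings together with a plain cosine decay schedule. The schedule form (no warmup, no restarts, decaying to zero) and the names `cosine_decay_lr`, `total_steps`, and `sgd_config` are assumptions made for illustration; only the numeric values come from the line under review and the correction above.

```python
import math


def cosine_decay_lr(step, total_steps, base_lr=0.1, final_lr=0.0):
    """Learning rate at `step` under a plain cosine decay (assumed form)."""
    # Clamp the step into [0, total_steps] before computing the progress ratio.
    progress = min(max(step, 0), total_steps) / total_steps
    return final_lr + 0.5 * (base_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


# Hyperparameters as discussed in this thread, with the corrected weight decay.
sgd_config = {
    "momentum": 0.9,
    "weight_decay": 2e-4,
    "base_lr_wide_resnet": 0.1,
    "base_lr_vgg": 0.05,
}

# total_steps=100 is a hypothetical value chosen only to show the shape of the curve.
for step in (0, 50, 100):
    lr = cosine_decay_lr(step, total_steps=100, base_lr=sgd_config["base_lr_wide_resnet"])
    print(step, lr)
```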
