% abstract.tex
We present a \SemiEmpirical Theory of Learning (\SETOL)
that explains the remarkable performance of \StateOfTheArt (SOTA) Neural Networks (NNs).
We provide a formal explanation of the origin of the
fundamental quantities in the phenomenological theory of \HeavyTailedSelfRegularization (\HTSR), the
\HeavyTailed \PowerLaw \LayerQuality metrics,
\ALPHA $(\alpha)$ and \ALPHAHAT $(\hat{\alpha})$.
In prior work, these metrics have been shown to predict trends in the test accuracies of pretrained SOTA NN models,
and, importantly, to do so without access to the testing or even the training data.
Our \SETOL
uses techniques from \StatisticalMechanics (\STATMECH) as well as advanced methods from \RandomMatrixTheory (\RMT) and Quantum Chemistry. Our derivation suggests new mathematical preconditions for \emph{\Ideal} learning, including the new \TRACELOG metric (which is equivalent to applying a single step of the Wilson Exact Renormalization Group).
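As an illustrative sketch only (the precise definition and normalization belong to the paper, not to this abstract), a TraceLog-style quantity can be computed as the mean log-eigenvalue of a layer's correlation matrix; the function name and the $X = \mathbf{W}^{T}\mathbf{W}/N$ convention here are assumptions for demonstration:

```python
import numpy as np

def trace_log(W):
    """Illustrative TraceLog-style metric (a sketch, not the paper's exact
    formula): the mean log-eigenvalue of the layer correlation matrix
    X = W^T W / N, for a weight matrix W of shape (N, M)."""
    N = W.shape[0]
    X = W.T @ W / N                     # layer correlation matrix
    evals = np.linalg.eigvalsh(X)       # real eigenvalues, ascending
    evals = evals[evals > 1e-12]        # drop numerical zeros
    return float(np.mean(np.log(evals)))
```

Under this convention, the quantity vanishing (the mean log-eigenvalue near zero) would signal the kind of precondition the text alludes to; again, the exact condition is the paper's, not this sketch's.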
We test the assumptions and predictions of our \SETOL on a simple 3-layer
\MultiLayer \Perceptron (MLP) and find excellent agreement with the key theoretical assumptions.
For SOTA NN models, we show how to estimate the individual layer Qualities of a trained NN by simply computing the \EmpiricalSpectralDensity (ESD) of the layer weight matrices and
then plugging this ESD into our \SETOL formulae.
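The two-step recipe described above, computing a layer's ESD and then extracting a heavy-tailed exponent from it, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the median-based tail cutoff and the continuous power-law MLE used here are stand-in assumptions for whatever fitting procedure the \SETOL formulae actually prescribe.

```python
import numpy as np

def esd(W):
    """Empirical Spectral Density: eigenvalues of the layer correlation
    matrix X = W^T W / N for a weight matrix W of shape (N, M)."""
    N = W.shape[0]
    return np.linalg.eigvalsh(W.T @ W / N)

def fit_alpha(evals, xmin=None):
    """Tail exponent alpha via the continuous power-law MLE
    (Clauset-style); the default xmin heuristic is an assumption."""
    evals = np.asarray(evals, dtype=float)
    evals = evals[evals > 0]
    if xmin is None:
        xmin = np.median(evals)         # crude tail cutoff, illustrative only
    tail = evals[evals >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))
```

A usage sketch: `fit_alpha(esd(W))` for each layer weight matrix `W` yields a per-layer exponent of the kind the \HTSR phenomenology associates with layer quality, without ever touching the training or test data.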
Notably, we examine the performance of the \HTSR $\alpha$ and the \SETOL \TRACELOG \LayerQuality metrics, and find that they align
remarkably well, both on our MLP and on SOTA NNs.