hillbig: Theoretical analysis of deep learning by Dr. Taiji Suzuki, covering representation ability, generalization ability, and optimization theory. He covers a wide range of important topics, including the recent Neural Tangent Kernel and double descent. I don't think there is anything this comprehensive in English.
I get an error when I access the original SlideShare, but I can still view it via X/Twitter. A cached copy, perhaps?
As an easy-to-understand concrete example: for a function whose value is determined only by the distance from the origin, a four-layer network needs only polynomially many units in the number of dimensions (in fact, I think it's essentially linear).
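A hedged reconstruction of why depth helps here (my paraphrase of the standard depth-separation argument, not the slide's exact statement):

```latex
% Radial target: the value depends only on the distance from the origin.
\[
  f(x) = g(\lVert x \rVert_2), \qquad x \in \mathbb{R}^d .
\]
% A deeper network can first compute \lVert x \rVert_2^2 = \sum_{i=1}^d x_i^2
% with O(d) units (a small squaring sub-network per coordinate), then apply a
% one-dimensional approximation of g, so the total size is polynomial
% (essentially linear) in d. Shallow two-layer networks, by contrast, are
% known to need exponentially many units for some radial g
% (Eldan & Shamir, 2016).
```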
Reproducing kernel Hilbert space
Kernel ridge regression is re-described in terms of a reproducing kernel Hilbert space (RKHS), but I'll skip that part.
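Since that part is skipped, here is a minimal kernel ridge regression sketch in NumPy for reference (the RBF kernel and the hyperparameters gamma and lam are my illustrative choices, not from the slides): solve (K + λnI)α = y, then predict f(x) = Σ_i α_i k(x_i, x).

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam=1e-2, gamma=1.0):
    """Solve (K + lam * n * I) alpha = y for the representer coefficients."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def krr_predict(X_train, alpha, X_test, gamma=1.0):
    """Predict f(x) = sum_i alpha_i * k(x_i, x)."""
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Toy usage: regress a noisy sine.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = krr_fit(X, y)
print(krr_predict(X, alpha, np.array([[0.0], [1.5]])))
```

Note that the kernel k is fixed in advance here; the contrast with deep learning is exactly the next point.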
Deep learning can be interpreted as learning the kernel function itself, adaptively to the data.
...
Approximation performance by function class
The various function classes mentioned in past discussions are special cases of Besov spaces.
→Sparsity.
Deep learning is superior when the smoothness of the target function is spatially non-uniform (see the Besov-norm sketch below).
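For concreteness, the standard definition of the Besov norm (my addition for reference; the slides' notation may differ):

```latex
% With \omega_r(f, t)_p the r-th modulus of smoothness in L^p
% and r = \lfloor s \rfloor + 1:
\[
  \lvert f \rvert_{B^s_{p,q}}
    = \left( \int_0^\infty \bigl( t^{-s}\, \omega_r(f, t)_p \bigr)^q
      \frac{dt}{t} \right)^{1/q},
  \qquad
  \lVert f \rVert_{B^s_{p,q}}
    = \lVert f \rVert_{L^p} + \lvert f \rvert_{B^s_{p,q}} .
\]
```

As I understand Suzuki's results, small p (below 2) permits spatially inhomogeneous smoothness, and that is the regime where sparse, adaptive estimators such as deep networks provably beat linear estimators like fixed-kernel methods.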
The non-stochastic (deterministic) gradient method can take exponential time to escape saddle points.
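A toy illustration of this (my own minimal sketch, not the construction from the literature; it shows the milder fact that unperturbed gradient descent can fail to escape at all, while a tiny perturbation escapes quickly): on f(x, y) = (x² − y²)/2 the origin is a saddle, and plain gradient descent started exactly on the stable manifold y = 0 converges to the saddle.

```python
import numpy as np

def grad(p):
    """Gradient of the saddle function f(x, y) = (x**2 - y**2) / 2."""
    x, y = p
    return np.array([x, -y])

eta, steps = 0.1, 300

# Plain GD started exactly on the stable manifold (y = 0): y stays 0 forever.
p = np.array([1.0, 0.0])
for _ in range(steps):
    p = p - eta * grad(p)
print("plain GD:    ", p)  # ~ (0, 0): converges to the saddle, never escapes

# Perturbed GD: tiny noise seeds the unstable direction, which then grows
# geometrically (factor 1 + eta per step) and leaves the saddle.
rng = np.random.default_rng(0)
p = np.array([1.0, 0.0])
for _ in range(steps):
    p = p - eta * grad(p) + 1e-8 * rng.standard_normal(2)
print("perturbed GD:", p)  # |y| is large: the iterate has escaped
```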
Neural Tangent Kernel
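As a reference for what the NTK computes, a minimal NumPy sketch of the empirical NTK of a two-layer ReLU network (my own illustration using the usual 1/√m NTK scaling; nothing here is taken from the slides): the kernel is the Gram matrix of per-parameter gradients, Θ(x, x′) = ⟨∇_θ f(x), ∇_θ f(x′)⟩.

```python
import numpy as np

rng = np.random.default_rng(0)

def ntk_features(X, W, a):
    """Per-parameter gradients of f(x) = a @ relu(W @ x) / sqrt(m).

    Rows index inputs, columns index parameters; the empirical NTK
    Gram matrix is then J @ J.T.
    """
    m = W.shape[0]
    pre = X @ W.T                             # (n, m) pre-activations w_j . x
    d_a = np.maximum(pre, 0.0) / np.sqrt(m)   # grads w.r.t. output weights a_j
    mask = (pre > 0).astype(float)            # ReLU derivative
    # Grad w.r.t. w_j is (a_j / sqrt(m)) * 1[w_j . x > 0] * x.
    d_W = (mask * (a / np.sqrt(m)))[:, :, None] * X[:, None, :]  # (n, m, d)
    return np.concatenate([d_a, d_W.reshape(len(X), -1)], axis=1)

n, d, m = 5, 3, 10000                         # a few inputs, one wide layer
X = rng.standard_normal((n, d))
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

J = ntk_features(X, W, a)
K = J @ J.T                                   # empirical NTK Gram matrix (n, n)
print(np.round(K, 3))
```

As the width m grows, this Gram matrix concentrates around a fixed kernel, which is why sufficiently wide networks trained with small learning rates behave like kernel regression.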
Mean Field
This page is auto-translated from /nishio/鈴木大慈-深層学習の数理 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.