Deep neural nets have revolutionized many classification tasks. A related, ongoing revolution, also not understood theoretically, concerns their ability to serve as generative models for complicated types of data such as images and text. These models are trained using ideas like variational autoencoders and Generative Adversarial Networks.
We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with \(n\) hidden layers. A key ingredient is Barron’s Theorem [Barron1993], which gives a Fourier criterion for approximability of a function by a neural network with one hidden layer. We show that a composition of \(n\) functions that satisfy certain Fourier conditions (“Barron functions”) can be approximated by an \((n+1)\)-layer neural network.
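For reference, Barron’s Theorem is commonly stated along the following lines. The notation here (\(C_f\), \(B_r\), \(\mu\), \(\sigma\), \(k\)) is our own choice for illustration and may differ from the conventions used in the paper. Write \(f(x) = \int_{\mathbb{R}^d} e^{i\omega\cdot x}\,\hat f(\omega)\,d\omega\) and define the Barron constant

\[
C_f = \int_{\mathbb{R}^d} \|\omega\|\,|\hat f(\omega)|\,d\omega .
\]

If \(C_f\) is finite, then for every sigmoidal activation \(\sigma\), every probability measure \(\mu\) on the ball \(B_r = \{x : \|x\| \le r\}\), and every \(k \ge 1\), there is a network with one hidden layer of \(k\) units,

\[
f_k(x) = c_0 + \sum_{i=1}^{k} c_i\,\sigma(a_i \cdot x + b_i),
\qquad\text{such that}\qquad
\int_{B_r} \bigl(f(x) - f_k(x)\bigr)^2\,\mu(dx) \le \frac{(2 r C_f)^2}{k}.
\]

The point of the bound is that the error decays like \(1/k\) with a constant that depends on the dimension only through \(C_f\).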
For probability distributions, this translates into a criterion for a probability distribution to be approximable in Wasserstein distance (a natural metric on probability distributions) by a neural network applied to a fixed base distribution (e.g., a multivariate Gaussian). Building on recent lower-bound work, we also give an example function showing that compositions of Barron functions are more expressive than Barron functions alone.
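To make the setup concrete, here is a minimal one-dimensional sketch of what "a neural network applied to a fixed base distribution" and "closeness in Wasserstein distance" mean empirically. It is an illustration under simplifying assumptions, not the paper's construction: the function names, the tanh activation, and the restriction to scalar inputs are all hypothetical choices, and the distance computed is the empirical 1-Wasserstein distance between equal-size samples.

```python
import math
import random


def one_hidden_layer(x, weights, biases, out_weights):
    """Scalar one-hidden-layer network: sum_i c_i * tanh(a_i * x + b_i).

    The tanh activation is an illustrative choice, not taken from the paper.
    """
    return sum(
        c * math.tanh(a * x + b)
        for a, b, c in zip(weights, biases, out_weights)
    )


def push_forward(n_samples, weights, biases, out_weights, seed=0):
    """Draw standard Gaussian samples and map each one through the network."""
    rng = random.Random(seed)
    return [
        one_hidden_layer(rng.gauss(0.0, 1.0), weights, biases, out_weights)
        for _ in range(n_samples)
    ]


def wasserstein1_1d(xs, ys):
    """Empirical W1 distance between two equal-size 1-D samples.

    In one dimension the optimal coupling matches sorted samples, so the
    distance is the mean absolute difference of the order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    # Two hypothetical generators sharing the same base Gaussian samples.
    gen_a = push_forward(1000, [1.0, -2.0], [0.0, 0.5], [0.3, 0.7])
    gen_b = push_forward(1000, [1.1, -2.0], [0.0, 0.5], [0.3, 0.7])
    print(wasserstein1_1d(gen_a, gen_b))
```

In higher dimensions the sorted-sample shortcut no longer applies and computing the distance requires solving an optimal transport problem, which is one reason the one-dimensional case is used here.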
Expository note on Barron’s Theorem.