(CHMAL15) The Loss Surfaces of Multilayer Networks
Posted: 2016-03-07 , Modified: 2016-03-07
Tags: none
Phenomenon: while multilayer nets do have many local minima, repeated training experiments consistently give very similar performance.
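A quick sanity check of this phenomenon (my own toy experiment, not from the paper): train the same small ReLU net from several random initializations and compare the final training losses. The dataset and architecture here are arbitrary choices for illustration.

```python
# Toy illustration (not from the paper): different random inits of the same
# small ReLU net tend to end at similar final loss values.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for seed in range(5):
    net = MLPClassifier(hidden_layer_sizes=(25, 25), activation="relu",
                        max_iter=1000, random_state=seed)
    net.fit(X, y)
    print(f"init {seed}: final training loss = {net.loss_:.4f}")
```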
We first establish that the loss function of a typical multilayer net with ReLUs can be expressed as a polynomial function of the weights in the network, whose degree is the number of layers, and whose number of monomials is the number of paths from inputs to output.
More precisely, the loss is a piecewise, continuous polynomial whose monomials are switched in and out at the boundaries between pieces.
This is the first work to give a “theoretical description of the optimization paradigm with neural networks in the presence of large number of parameters.”
Let \(\si(x)=\max(0,x)\). The random network is \[\begin{align} Y&= q \si(W_H^T\si(\cdots \si(W_1^TX)\cdots))\\ &=q\sumo i{n_0} \sum_{j=1}^{\ga} X_{i,j}A_{i,j}\prod_{k=1}^H w_{i,j}^{(k)}, \end{align}\] where \(\ga=\prod_{k=1}^H n_k\) is the number of paths from a given input to the output, \(A_{i,j}\in\{0,1\}\) indicates whether path \(j\) from input \(i\) is active, \(X_{i,j}=X_i\) is the input where the path starts, and \(w_{i,j}^{(k)}\) is the weight on the \(k\)-th segment of the path.
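To see the path-sum formula concretely, here is a minimal numpy check on a 2-layer net (my own choice of sizes \(n_0=2\), \(n_1=3\), one output, \(q=1\)): the forward pass and the sum over active paths give the same output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny net: n0 inputs -> n1 hidden ReLU units -> 1 ReLU output (H = 2).
n0, n1 = 2, 3
W1 = rng.normal(size=(n0, n1))   # layer-1 weights
w2 = rng.normal(size=n1)         # layer-2 weights (single output unit)
X = rng.normal(size=n0)
q = 1.0                          # normalization factor, taken as 1 here

relu = lambda z: np.maximum(0, z)

# Forward pass: Y = q * si(w2^T si(W1^T X)).
h_pre = W1.T @ X
y_pre = w2 @ relu(h_pre)
Y = q * relu(y_pre)

# Path sum: each path (input i -> hidden j -> output) contributes
# X_i * A_ij * W1[i, j] * w2[j], where A_ij = 1 iff every ReLU on the
# path (hidden unit j and the output unit) fires.
Y_paths = 0.0
for i in range(n0):
    for j in range(n1):
        A_ij = (h_pre[j] > 0) and (y_pre > 0)
        Y_paths += X[i] * A_ij * W1[i, j] * w2[j]
Y_paths *= q

assert np.isclose(Y, Y_paths)    # the two expressions agree
print(Y, Y_paths)
```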
Assumptions
\((s,\ep)\)-reduction image: a network with only \(s\) unique weights whose prediction accuracy differs from the original's by at most \(\ep\). Some kind of approximation! (See the sketch after this list.)
ReLU activations.
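A rough illustration of the reduction (my own sketch, not the paper's construction): compress a weight vector down to \(s\) unique values with 1-D k-means, using the weight distortion as a crude stand-in for the \(\ep\) accuracy gap, which is not quite the paper's definition.

```python
import numpy as np

def reduce_to_s_weights(w, s, iters=50):
    """1-D k-means (Lloyd's algorithm): snap the entries of w to s values."""
    centers = np.quantile(w, np.linspace(0, 1, s))
    for _ in range(iters):
        labels = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
        for c in range(s):
            if np.any(labels == c):
                centers[c] = w[labels == c].mean()
    return centers[labels]

rng = np.random.default_rng(0)
w = rng.normal(size=200)               # stand-in for a network's weights
w_s = reduce_to_s_weights(w, s=8)
print(len(np.unique(w_s)))             # at most 8 unique weights
print(np.max(np.abs(w - w_s)))         # distortion, a proxy for eps
```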