(JSA15) Beating the Perils of Non-Convexity - Guaranteed Training of Neural Networks using Tensor Methods
Posted: 2016-06-27 , Modified: 2016-06-27
Tags: neural net
Posted: 2016-06-27 , Modified: 2016-06-27
Tags: neural net
Gives the first provable risk bounds for 2-layer neural nets via an efficient algorithm. Assumes that the generative distribution for the inputs \(x\)’s is known or estimable (the algorithm uses the 3rd order score function). Algorithm is based on tensor decomposition.
Remarks:
See [JSA14].
Lemma (Stein’s identity): Under some regularity conditions, \[\EE_{x\sim p} [G(x) \ot \nb_x\ln p(x)] = \EE_{x\sim p} \nb_x G(x).\] (For example, for \(p(x)\) Gaussian, \(\nb_x \ln p(x) = -x\).)
Proof. By IbP, \[\int G(x)_i \pdd{x_j} \ln p(x) p(x) \dx = \int G(x)_i \pdd{x_j}p(x) = -\int \pdd{x_j} G(x)_i p(x)\dx.\]