Neural net learning, Anthony and Bartlett

Posted: 2017-02-28 , Modified: 2017-02-28

Tags: neural nets

9 Classification with real-valued functions

9.1 Intro

9.2 Large margin classifiers
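Notation for what follows (writing it with the convention \(y\in\{-1,1\}\); the book's conventions may differ slightly): on a sample \(z = ((x_1,y_1),\dots,(x_m,y_m))\), the empirical margin error of \(f\) at margin \(\ga\) is

\[ \wh{err}_z^\ga(f) = \rc{m}\left|\{i : y_i f(x_i) < \ga\}\right|, \]

the fraction of sample points that \(f\) fails to classify with margin at least \(\ga\).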

10 Covering numbers and uniform convergence

10.3 Uniform convergence

\[ \Pj^m [\exists f\in F, err_P(f) \ge \wh{err}_z^\ga (f) + \ep] \le 2N_\iy\left(\fc{\ga}{2}, F, 2m\right) e^{-\fc{\ep^2m}{8}}. \] (The \(\iy\) signals that \(L^\iy\) covering — max distance over the sample points — is the right notion here: a cover element within \(\fc{\ga}{2}\) of \(f\) in this metric shifts each margin by at most \(\fc{\ga}{2}\), which the margin parameter absorbs.)

If the covering number grows as \(\poly(m)\), then this bound approaches 0 exponentially fast in \(m\).
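For instance, under a hypothetical polynomial bound \(N_\iy(\fc{\ga}{2}, F, 2m)\le (am)^b\) for constants \(a,b\) (not from the book, just to make the decay explicit):

\[ 2(am)^b e^{-\fc{\ep^2m}{8}} = 2\exp\left(b\ln(am) - \fc{\ep^2 m}{8}\right) \to 0, \]

since the \(\fc{\ep^2m}{8}\) term grows linearly in \(m\) and eventually dominates the logarithmic term \(b\ln(am)\).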

11 Pseudo-dimension and fat-shattering dimension

Is there a combinatorial measure, analogous to the VC dimension, that bounds these covering numbers?

11.2 Pseudo-dimension
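A sketch of the standard definition: \(F\) pseudo-shatters \(x_1,\dots,x_d\) if there are witnesses \(r_1,\dots,r_d\in\mathbb{R}\) such that every sign pattern is realizable, i.e., for each \(b\in\{0,1\}^d\) there is \(f_b\in F\) with

\[ f_b(x_i) > r_i \iff b_i = 1. \]

\(Pdim(F)\) is the largest such \(d\); for \(\{0,1\}\)-valued \(F\) it reduces to the VC dimension.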

11.3 Fat-shattering dimension
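Same shape of definition, but the shattering must happen with margin \(\ga\): \(F\) \(\ga\)-shatters \(x_1,\dots,x_d\) if there are witnesses \(r_1,\dots,r_d\) such that for each \(b\in\{0,1\}^d\) there is \(f_b\in F\) with

\[ f_b(x_i) \ge r_i + \ga \text{ if } b_i = 1, \qquad f_b(x_i) \le r_i - \ga \text{ if } b_i = 0. \]

\(fat_F(\ga)\) is the largest such \(d\). It is nonincreasing in \(\ga\), always at most \(Pdim(F)\), and approaches \(Pdim(F)\) as \(\ga\to 0\).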

12 Bounding covering numbers with dimensions
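The rough shape of the answer, up to constants and log factors (see the book for the precise statement): for \(F\) mapping into a bounded interval,

\[ \log N_\iy(\ep, F, m) = O\left(fat_F(\ep/4)\, \log^2 \fc{m}{\ep}\right), \]

so finite fat-shattering dimension at every scale yields covering numbers polynomial in \(m\), which is exactly what the uniform convergence bound above needs.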

13 Sample complexity of classification learning

Large-margin sample error minimization (SEM) algorithm: \(L\) outputs a function with minimal empirical margin error, \(\wh{err}_z^\ga (L(\ga, z)) = \min_{f\in F}\wh{err}_z^\ga (f)\).
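Combining this with the chapter-10 bound: with probability at least \(1-\delta\), such an \(L\) satisfies \(err_P(L(\ga,z)) \le \min_{f\in F}\wh{err}_z^\ga(f) + \ep\), provided

\[ 2N_\iy\left(\fc{\ga}{2}, F, 2m\right) e^{-\fc{\ep^2 m}{8}} \le \delta, \quad\text{i.e.}\quad m \ge \fc{8}{\ep^2}\ln \fc{2N_\iy(\fc{\ga}{2}, F, 2m)}{\delta} \]

(an implicit condition on \(m\), since the covering number itself depends on \(m\)).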

14 Dimensions of neural networks

We bound the covering numbers and fat-shattering dimensions for networks that are fully connected between adjacent layers, that have units with a bounded activation function satisfying a Lipschitz constraint, and that have all weights (or all weights in certain layers) constrained to be small.

Two types of bounds, corresponding to two notions of complexity: the number of parameters, and the size (magnitude) of the parameters.

14.3 In terms of number of parameters

14.4 Fat-shattering dimension bounds independent of the number of parameters

\(fat_F(\ga)\) increases with \(\rc{\ga}\), with the bound on the weights, and with the number of layers, but is independent of the number of parameters; the simplest instance is sketched below.
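The simplest instance of such a parameter-independent bound (a standard result for the linear case; the multilayer bounds compose estimates of this flavor layer by layer): for

\[ F = \{x\mapsto w\cdot x : \|w\|_2 \le B\} \text{ on } \{x : \|x\|_2\le R\}, \qquad fat_F(\ga) \le \left(\fc{BR}{\ga}\right)^2, \]

which depends on the weight bound \(B\) and margin \(\ga\) but not on the input dimension, i.e., not on the number of parameters.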