Gradient descent

Posted: 2016-03-04 , Modified: 2016-03-04

Tags: gradient

(See 10/15 notebook for detailed notes.)

See also AO15 and mirror descent.

Summary

| Algorithm | General | \(\al\)-strongly convex | \(\be\)-smooth | \(\ga\)-convex |
| --- | --- | --- | --- | --- |
| Gradient descent | \(\rc{\sqrt{T}}\) | \(\rc{\al T}\) | \(\fc{\be}T\) | \(e^{-\ga T}\) |
| Accelerated gradient descent | \(\fc{d}{T}\) | \(\rc{\al T^2}\) | \(\fc{\be}{T^2}\) | \(e^{-\sqrt{\ga} T}\) |
| GD (average regret) | N/A | \(\rc{\al} \ln T\) | \(\rc{\sqrt T}\) | \(\fc{n}{\ga} \ln T\) |

“\(\ga\)-convex” is also called “condition number \(\ka\)” (\(\ka = \ga\)).
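As a quick sanity check on the first two rows, here is a minimal sketch (not from these notes; the quadratic, dimension, and seed are arbitrary illustrative choices) comparing plain gradient descent with step size \(\rc\be\) against Nesterov's accelerated scheme on a \(\be\)-smooth convex quadratic.

```python
import numpy as np

def grad_descent(grad, x0, beta, T):
    # Plain gradient descent with the standard 1/beta step size.
    x = x0.copy()
    for _ in range(T):
        x = x - grad(x) / beta
    return x

def accelerated_gd(grad, x0, beta, T):
    # Nesterov's accelerated scheme for beta-smooth convex f:
    #   y_{s+1} = x_s - (1/beta) * grad f(x_s)
    #   x_{s+1} = (1 - gamma_s) y_{s+1} + gamma_s y_s
    # with lambda_0 = 0, lambda_s = (1 + sqrt(1 + 4 lambda_{s-1}^2)) / 2,
    # gamma_s = (1 - lambda_s) / lambda_{s+1}.
    x, y, lam = x0.copy(), x0.copy(), 0.0
    for _ in range(T):
        lam_curr = (1 + np.sqrt(1 + 4 * lam ** 2)) / 2
        lam_next = (1 + np.sqrt(1 + 4 * lam_curr ** 2)) / 2
        gamma = (1 - lam_curr) / lam_next
        y_next = x - grad(x) / beta
        x = (1 - gamma) * y_next + gamma * y
        y, lam = y_next, lam_curr
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 50))
    A = A.T @ A                          # PSD, so f is convex
    beta = np.linalg.eigvalsh(A).max()   # smoothness constant of f
    f = lambda x: 0.5 * x @ A @ x        # minimizer x* = 0, f(x*) = 0
    grad = lambda x: A @ x
    x0 = rng.standard_normal(50)
    for T in (10, 100, 1000):
        print(T, f(grad_descent(grad, x0, beta, T)),
              f(accelerated_gd(grad, x0, beta, T)))
```

The printed suboptimality gaps should shrink roughly like \(\fc\be T\) for plain GD and \(\fc{\be}{T^2}\) for the accelerated method, matching the table.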

Gradient descent main points

Proofs

Gradient descent lemma: Let \(D=\nb f(x)\), and translate so that \(x\) is at the origin and \(f(x)=0\). Smoothness gives the upper bound \[f(y) \le \fc{L}2 y^2 + Dy.\] This quadratic is minimized at \(y=-\fc{D}{L}\), with value \(\fc{-b^2}{4a} = -\fc{D^2}{2L}\) (here \(a=\fc L2\), \(b=D\)).
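Spelled out (the standard conclusion this is deriving): a gradient step with step size \(\rc L\) satisfies \[f\left(x - \rc L \nb f(x)\right) \le f(x) - \fc{\|\nb f(x)\|^2}{2L}.\]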

For strongly convex: choose the step size \(s\) to maximize the guaranteed (worst-case) progress, balancing the step \(\rc l\) to the minimum of the lower-bounding quadratic against the step \(\rc L\) to the minimum of the upper-bounding one: \[\fc{s-\rc{l}}{\rc{l}} = \fc{\rc L-s}{\rc L} \implies s = \fc{2}{L+l}.\]
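For reference (a standard fact, not derived in these notes), this step size contracts the distance to the minimizer \(x^*\) by a fixed factor each step, \[\|x_{t+1} - x^*\| \le \fc{L-l}{L+l}\,\|x_t - x^*\|,\] which gives the geometric rate in the table.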

Backtracking analysis:
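Since the analysis is not written out here, a minimal sketch of the backtracking (Armijo) rule itself; the constants below are illustrative defaults, not taken from these notes.

```python
import numpy as np

def backtracking_step(f, x, fx, g, eta0=1.0, shrink=0.5, c=0.5):
    """Shrink the step size eta until the sufficient-decrease (Armijo)
    condition  f(x - eta*g) <= f(x) - c*eta*||g||^2  holds, where
    g = grad f(x) and fx = f(x).  Returns the accepted step size."""
    eta = eta0
    g2 = np.dot(g, g)
    while f(x - eta * g) > fx - c * eta * g2:
        eta *= shrink
    return eta
```

For an \(L\)-smooth \(f\) and \(c=\rc 2\), the loop accepts some \(\eta \ge \min(\eta_0, \fc{\text{shrink}}{L})\), so each step still decreases \(f\) by on the order of \(\fc{\|\nb f(x)\|^2}{L}\), and the rates above go through up to constants.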

Different settings

Regret bounds

Review


  1. Think of this as the harmonic average of the step sizes needed to reach the minima of the upper- and lower-bounding quadratics, i.e., the harmonic mean of \(\rc L\) and \(\rc l\).