(SV16) The mixing time of the Dikin walk in a polytope - a simple proof

Posted: 2016-09-07 , Modified: 2016-09-07

Tags: gradient

Problem: sample a point from the uniform distribution in a polytope \(K\subeq \R^n\). (This is used in many algorithms.)

Main theorem (informal): From a warm start (i.e., density \(\Pj(A)\le C\Pj_{U_K}(A)\) where \(U_K\) is the uniform distribution on \(K\)), the Dikin walk mixes in time \(O(mn)\) for a \(n\)-dimensional polytope described with \(m\) inequalities.

Dikin walk

For \(K\) defined by \(a_i^Tx\ge b\), let \[\begin{align} F(x) :&= -\sum_{i\in [m]} \ln (a_i^Tx-b_i)\\ H(x) = \nb^2F(x) &= \sum_{i\in [m]} \rc{(a_i^Tx-b_i)^2}a_ia_i^T\\ \ve{v}_x^2 :&= v^TH(x)v\\ \text{Dikin ellipsoid }\mathcal E_x:&=\set{z}{\ve{z-x}_x\le 1}. \end{align}\]

Consider the random variable \(g_x\) given by \[ z = x + \fc{r}{\sqrt n} H(x)^{-\rc 2}g,\quad g\sim N(0,1),\] i.e. it is a Gaussian with covariance matrix \(\fc{r^2}{n}H(x)^{-1}\) centered at \(x\).

Let \(p_x\) be the \(g_x\) with the Metropolis filter applied, \[ p_x(z) = \min\{g_x(z),g_z(x)\}\] (stay at \(x\) with remaining probability).

Theorem and Proof

Define the cross-ratio distance \[\si(x,y) = \fc{|\ol{xy}||\ol{pq}|}{|\ol{px}||\ol{qy}|},\] where \(xy\) hits the boundary of \(K\) at \(p,q\) (\(pxyq\) are in that order). \(\ln (1+\si(x,y))\) is a metric on \(K\).

Theorem 1: Given a reversible random walk in \(K\) with stationary distribution uniform, suppose \(\exists \De>0\) such that for all \(x,y\in K\) with \(\si(x,y)\le \De\), \(\ve{p_x-p_y}_1\le 1-\Om(1)\). Then after \(O(\De^{-2})\), the lazy version of the walk from a warm start is within \(\rc4\) TV distance from the uniform distribution.

(This is saying something about uniform continuity of \(x\mapsto p_x\). There’s a “smoothing out” effect.)

Proof of main theorem.

  1. Theorem 2: For the Gaussian Dikin walk with \(r\le \fc{\ep}{400}\pa{\ln \pf{200}{\ep}}^{-\fc 32}\), for \(x,y\in K\) with \(\ve{x-y}_x\le \fc{r}{\sqrt n}\), \(\ve{p_x-p_y}\le \ep\).
    • Lemma 3: Let \(r\le 1, c\ge 0, c\le \min\{r,\rc 3\}\), \(x,y\in K\). If \(\ve{x-y}_x\le \fc{c}{\sqrt n}\), then \(\ve{g_x-g_y}_1\le 3c\). (Proof: KL divergence between Gaussians, Pinsker’s inequality.)
    • Lemma 4: Given \(\ep\in [0,\rc 2]\), for \(r\le \fc{\ep}{100}(\ln \fc{50}{\ep})^{-\fc 32}\), we have \(\ve{p_x-g_x}_1\le \ep\). (Proof: Hypercontractivity)
    • Use triangle inequality.
  2. For any polytope \(K\) described using \(m\) inequalities, \(\si(x,y) \ge \rc{\sqrt m}\ve{x-y}_x\).
  3. Use Theorem 1. \(\si(x,y)\le \De\) implies \(\ve{x-y}_x\le De\sqrt m\). Take \(\De =O\prc{\sqrt{mn}}\).