Censored block model
Posted: 2016-02-28, Modified: 2016-02-28
Tags: research
Model: Graph \(G\) is given. Each node is independently colored red or blue with probability \(\rc2\) each. An edge is labeled \[\begin{cases}+&\text{nodes same color}\\-&\text{nodes different color}\end{cases}.\] Each edge label is flipped independently with probability \(\ep\).
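For concreteness, a minimal sketch of sampling from this model (function and variable names are mine; \(G\) is assumed given as an edge list):

```python
import numpy as np

def censored_labels(n, edges, eps, rng=np.random.default_rng(0)):
    """Sample node colors and noisy edge labels from the censored block model."""
    x = rng.choice([1, -1], size=n)                # iid uniform colors (+1 = red, -1 = blue)
    labels = {}
    for (i, j) in edges:
        true_label = x[i] * x[j]                   # + if same color, - if different
        flip = rng.random() < eps                  # flip independently with probability eps
        labels[(i, j)] = -true_label if flip else true_label
    return x, labels
```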
Question: Under what conditions on \(G\) can the communities be recovered (up to sign)?
Past work:
Exact recovery occurs with high probability only if \(\fc{d}{\ln n} \ge 2\fc{1-\fc{3\tau}2}{(1-2\ep)^2}\).
Proof idea: If more than half of the edges coming out of a node are noisy, then we fail at that node. If the average degree is small, find a large independent set (nodes whose incident edges do not overlap); the failure events at these nodes are independent, so multiply their probabilities.
Let \(\rh_{ij}\) be the edge labels. Relax \[ \min_{g_i\in \{\pm 1\}} \sum_{ij\in E}(g_i-\rh_{ij}g_j)^2 \] to \[\max_{X_{ii}=1,X\succeq0}\Tr(WX),\; W=A_G-2A_H\] where \(A_G\) is the adjacency matrix of \(G\) and \(A_H\) is the adjacency matrix of \(H\), the subgraph of edges labeled \(-1\).
(Details: since \(g_i^2=\rh_{ij}^2=1\), minimizing is the same as maximizing \(\sum_{ij\in E}\rh_{ij}g_ig_j = \rc2\Tr(Wgg^T)\) over \(g\in \{\pm 1\}^n\); now replace \(gg^T\) with \(X\).)
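A sketch of this relaxation in cvxpy (assuming \(A_G\), \(A_H\) are given as numpy arrays; rounding by the top eigenvector is just one plausible choice, not prescribed above):

```python
import cvxpy as cp
import numpy as np

def sdp_recover(A_G, A_H):
    """Relaxation: max Tr(WX) s.t. X_ii = 1, X PSD, with W = A_G - 2*A_H."""
    n = A_G.shape[0]
    W = A_G - 2 * A_H
    X = cp.Variable((n, n), symmetric=True)
    prob = cp.Problem(cp.Maximize(cp.trace(W @ X)),
                      [cp.diag(X) == 1, X >> 0])
    prob.solve()
    # If the optimum is (close to) rank one, the signs of the top eigenvector
    # recover the colors up to a global sign flip.
    _, vecs = np.linalg.eigh(X.value)
    return np.sign(vecs[:, -1])
```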
By weak duality, this is \[\le \min_{\sum_i \nu_i E_{ii} \succeq W}\sum_i \nu_i = \min_{Q\text{ diagonal},\, Q\succeq W} \Tr(Q).\]
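To spell out the weak duality step (standard argument): if \(X\) is feasible (\(X_{ii}=1\), \(X\succeq0\)) and \(Q=\sum_i \nu_i E_{ii}\succeq W\), then \[\Tr(WX)\le \Tr(QX)=\sum_i \nu_i X_{ii}=\sum_i \nu_i=\Tr(Q),\] where the first inequality is \(\Tr((Q-W)X)\ge0\), which holds because both \(Q-W\) and \(X\) are PSD.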
WLOG (flipping signs so the true coloring is \(g=1\)), we can take the planted solution to be \(X=11^T\).
By SDP duality, \(X=11^T\) is the optimal solution iff there is a feasible dual \(Q\) with zero gap: \[\begin{align} \Tr(Q)&=\Tr(W11^T),\\ \text{equivalently}\quad \Tr(11^T(Q-W))&=0. \end{align}\]I’m pretty confused here. If \(L_G-2L_H\succeq0\) then we can let \(Q=D_G-2D_H\), since then \(Q-W=L_G-2L_H\) and \((L_G-2L_H)1=0\). Claim: If \(L_G-2L_H\succeq 0\) and \(\la_2(L_G-2L_H)>0\) then the unique solution is \(gg^T\). The first condition makes sure that \(gg^T\) is an optimal solution; the second makes sure it is unique. (Proof sketch: The gap between a feasible solution \(X\) and the dual \(Q\) is \(\Tr((Q-W)X)\); here \(\Tr(QX)=\Tr(Q)\) because \(X_{ii}=1\) and \(Q\) is diagonal. For \(11^T\) to be the only feasible \(X\) making this gap 0, the second smallest eigenvalue of \(Q-W\) must be \(>0\).)
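A quick numerical check of this certificate (a sketch; \(A_G\), \(A_H\) are numpy adjacency matrices and \(L=D-A\) is the combinatorial Laplacian):

```python
import numpy as np

def certificate_holds(A_G, A_H, tol=1e-9):
    """Check L_G - 2 L_H >= 0 and lambda_2(L_G - 2 L_H) > 0 (unique recovery of 11^T)."""
    L = (np.diag(A_G.sum(1)) - A_G) - 2 * (np.diag(A_H.sum(1)) - A_H)
    eig = np.sort(np.linalg.eigvalsh(L))
    # eig[0] should be ~0 (the all-ones vector is always in the kernel); eig[1] is lambda_2.
    return eig[0] >= -tol and eig[1] > tol
```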
For a randomly chosen \(H\subeq G\), when is \(\Pj(L_G-2L_H\succeq0)\) large? Write this as a sum of independent random matrices (subtract the mean so each term has expectation 0, then apply matrix Bernstein): \[ L_G-2L_H = \sum_{i<j} \al_{ij} \La_{ij}, \quad \al_{ij}=\begin{cases}0&\text{w.p. }1-p\\1&\text{w.p. }p(1-\ep)\\-1&\text{w.p. }p\ep,\end{cases} \] where \(\La_{ij}=(e_i-e_j)(e_i-e_j)^T\) is the Laplacian of the single edge \(ij\). Now use matrix Bernstein \[ \Pj(\la_{\max}(A)\ge t) \le ne^{-\fc{t^2/2}{\si^2+Rt/3}}. \] For \(\ep\to \rc2\), we get: when \(\fc{d}{\ln n}\ge 4\fc{1+\de}{(1-2\ep)^2}+o\prc{(1-2\ep)^2}\), exact recovery holds w.p. \(\ge 1-n^{-\de}\).
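For concreteness, the tail bound and the leading-order degree condition as functions (a sketch with my function names; the \(o\prc{(1-2\ep)^2}\) term is dropped):

```python
import numpy as np

def matrix_bernstein_tail(n, t, sigma2, R):
    """Right-hand side of the matrix Bernstein bound on P(lambda_max(A) >= t)."""
    return n * np.exp(-(t**2 / 2) / (sigma2 + R * t / 3))

def degree_threshold(n, eps, delta):
    """Leading-order average degree for exact recovery w.p. >= 1 - n^{-delta}."""
    return 4 * (1 + delta) / (1 - 2 * eps)**2 * np.log(n)

print(degree_threshold(n=10_000, eps=0.4, delta=1.0))  # required average degree d
```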
\(G\) is now fixed; all the randomness comes from \(A_H\). The eigenvalues of \(G\) enter the bound; see Theorem 5.3. Notably, we need \(\la_{\max}\) to be bounded away from 1 (?). This is equivalent, up to parameters, to the Cheeger constant being lower-bounded by a constant.
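A sketch of checking such a spectral condition numerically (my reading of the condition: the spectral gap \(\la_2\) of the normalized Laplacian should be bounded below, which controls the Cheeger constant via \(\la_2/2\le h(G)\le\sqrt{2\la_2}\)):

```python
import numpy as np

def normalized_spectral_gap(A_G):
    """lambda_2 of the normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    d = A_G.sum(1)
    Dinv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = np.eye(len(d)) - Dinv_sqrt @ A_G @ Dinv_sqrt
    return np.sort(np.linalg.eigvalsh(L_norm))[1]

# Cheeger: lam2/2 <= h(G) <= sqrt(2*lam2), so a constant lower bound on lam2
# (equivalently, the nontrivial adjacency eigenvalues bounded away from 1)
# gives a constant lower bound on the Cheeger constant.
```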
Suggestion: Let \(X\) be the node colors and \(Y_\ep\) be the observed edge labels. Consider the conditional entropy \(H(X|Y_\ep)\). This is like a measure of connectivity “on average.” Ex. When \(\ep=0\), \(H(X|Y_\ep)\) is the number of connected components (in bits), since each component’s coloring is determined up to a global sign flip. Think of it as an average-case Cheeger constant: rather than look at the proportionally minimal cut, we’re looking at an average cut size. (Related is the probability of success for ML decoding, which is \(\E_{X,Y_\ep}P(X|Y_\ep)\).)
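A brute-force computation of \(H(X|Y_\ep)\) for a tiny graph, to make the suggestion concrete (exponential-time sketch; function names are mine):

```python
import itertools
import numpy as np

def conditional_entropy(edges, n, eps):
    """Brute-force H(X | Y_eps) in bits: X uniform in {+1,-1}^n, each edge label
    sign(x_i * x_j) flipped independently with probability eps."""
    colorings = list(itertools.product([1, -1], repeat=n))
    H = 0.0
    for y in itertools.product([1, -1], repeat=len(edges)):   # all observed labelings
        joint = []
        for x in colorings:                                    # P(X = x, Y = y)
            p = 2.0 ** (-n)
            for (i, j), label in zip(edges, y):
                p *= (1 - eps) if label == x[i] * x[j] else eps
            joint.append(p)
        joint = np.array(joint)
        py = joint.sum()
        if py == 0:
            continue
        post = joint / py                                      # P(X = x | Y = y)
        H += py * -(post * np.log2(post + 1e-300)).sum()
    return H

print(conditional_entropy([(0, 1), (1, 2)], 3, eps=0.0))  # ~1 bit: one component
print(conditional_entropy([(0, 1)], 3, eps=0.0))          # ~2 bits: two components
print(conditional_entropy([(0, 1), (1, 2)], 3, eps=0.1))  # between 1 and 3 bits
```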
Thoughts: Any nontrivial partition \(A\sqcup B= V\) of the vertices is associated with a cut. For a set \(S\), let \(X_S\) be \(X\) with the colors of vertices in \(S\) flipped. Given observed \(Y_\ep\), \(P(X_A|Y_\ep)\ge P(X|Y_\ep)\) iff at least half of the edges of the cut \((A,V\bs A)\) are flipped. Thus, \(X\) is an ML decoding iff in no cut is the majority of the edges wrong.
Let \(G_C\) be the (good) event that the majority of edges of cut \(C\) are correct. Letting \(|C|\) be the number of edges in the cut, \[\begin{align} P\pa{\bigwedge_C G_C} & \ge \prod_C P(G_C) & \text{Correlation inequality} \\ & = \prod_C P(\Binom(|C|, 1-\ep) > \fc{|C|}2)\\ & \ge \prod_C (1-e^{-K_\ep|C|})\\ & \ge 1-\sum_C e^{-K_\ep|C|}. \end{align}\](By Chernoff, we can take \(K_\ep = \fc{(\ep-\rc2)^2}{2\ep(1-\ep)}\).)
Question: is there an efficient algorithm to approximate \(\sum_C e^{-K|C|}\)? By Karger’s algorithm, we can estimate the number of minimum cuts. By sampling, we can estimate the number of cuts of a given size if they make up a \(>\rc{\poly}\) fraction of all cuts. But what about all the cuts in between?
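As a baseline, for a small graph the sum can be computed exactly by enumerating all \(2^{n-1}-1\) nontrivial cuts (brute force; this is the quantity an efficient algorithm would need to approximate):

```python
import itertools
import numpy as np

def cut_sum(n, edges, K):
    """Compute sum over nontrivial cuts C of exp(-K * |C|) by brute force."""
    total = 0.0
    for bits in itertools.product([0, 1], repeat=n - 1):
        side = (0,) + bits                       # fix vertex 0's side to count each cut once
        if all(b == 0 for b in bits):
            continue                             # skip the trivial partition (B empty)
        cut_size = sum(side[i] != side[j] for (i, j) in edges)
        total += np.exp(-K * cut_size)
    return total
```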
Note: The inequalities are not close to sharp, because the events are highly correlated when the cuts overlap a lot. The bound should really be driven by small cuts that don’t share many edges.