(BGKP16) Non-negative matrix factorization under heavy noise

Posted: 2016-06-27 , Modified: 2016-06-27

Tags: NMF

Summary

A provable algorithm for NMF \(A=BC\) without assuming that every column of \(A\) has small noise.

Under heavy noise \[\forall T\subeq [n],\ |T|\ge \ep n\implies \rc{|T|}\ve{\sum_{j\in T}N_{\bullet,j}}_1\le \ep,\] and in the dominant NMF model (dominant features, dominant basis vectors, nearly pure records), the TSVDNMF algorithm finds an approximation \(\wt B\) with \[\ve{B_{\bullet,l}-\wt B_{\bullet, l}}_1\le \ep_0\text{ for all }l.\]
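To make the heavy-noise condition concrete, here is a minimal sketch (mine, not from the paper) that heuristically tests it by sampling random large subsets \(T\). Checking all subsets is exponential, so a violation found is a certificate but a pass is only evidence; the helper name and defaults are hypothetical.

```python
import numpy as np

def find_heavy_noise_violation(N, eps, trials=1000, seed=0):
    """Heuristically test  (1/|T|) * || sum_{j in T} N[:, j] ||_1 <= eps
    for all T with |T| >= eps * n, by sampling random subsets T.
    Returns a violating subset if one is found, else None.
    (Hypothetical helper; not part of the paper.)"""
    rng = np.random.default_rng(seed)
    n = N.shape[1]
    min_size = max(1, int(np.ceil(eps * n)))
    for _ in range(trials):
        size = int(rng.integers(min_size, n + 1))
        T = rng.choice(n, size=size, replace=False)
        # average (in l1) of the noise columns indexed by T
        if np.abs(N[:, T].sum(axis=1)).sum() / size > eps:
            return T
    return None
```

Note that only averages of noise columns over large sets are constrained: an individual column \(N_{\bullet,j}\) may have large \(\ell_1\) norm as long as the noise cancels on average over every large subset.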

Under the dominant NMF assumptions D1 and D3, \(B\) is identifiable.

Assumptions

  • D1: dominant features (catchwords).
  • D2: dominant basis vectors.
  • D3: nearly pure records.

Algorithm

  1. Apply thresholding to get \(D\).
    • Initialize \(R=[d]\).
    • For each row \(i\), calculate a cutoff \(\ze_i\), and set \(D_{ij}=\mathbb 1[A_{ij}\ge \ze_i]\,\sqrt{\ze_i}\).
    • Sort the rows in ascending order and prune some of them (see the paper for the exact rule). (Why? We want to prune the non-catchwords: they may be associated with significantly more documents than the catchwords.)
  2. Take the rank-\(k\) SVD \(D^{(k)}\) of \(D\). (We hope that this is close to a block matrix with nonzeros in \(S_l\times T_l\).)

  3. Identify dominant basis vectors.
    • \(k\)-means clustering of columns of \(D^{(k)}\).
    • Apply Lloyd’s algorithm to \(D\) with this initialization.
    • Let \((R_l)_{l=1}^k\) be the resulting \(k\)-partition of \([n]\).
  4. Identify the dominant features (catchwords) \(J_l\) for each basis vector: for each \(l\), take the features \(i\) (words) with the largest average \(A_{ij}\) over documents \(j\in R_l\).
  5. Find the basis vectors by averaging the “purest” documents in each cluster \(R_l\): those with the largest total weight on the catchwords \(J_l\). (A schematic Python sketch of the whole pipeline follows this list.)
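Here is a minimal Python sketch of the pipeline, assuming numpy/scikit-learn. The cutoffs \(\ze_i\), the row-pruning rule, and the counts of catchwords and “purest” documents (`n_catchwords`, `n_purest`) are placeholders; the paper specifies these precisely.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

def tsvd_nmf_sketch(A, k, zeta, n_catchwords=10, n_purest=20):
    """Schematic TSVDNMF pipeline (cutoffs zeta and the omitted row-pruning
    rule are placeholders; see the paper for the actual choices).
    A: d x n nonnegative data matrix; k: number of basis vectors;
    zeta: length-d vector of per-row cutoffs zeta_i."""
    d, n = A.shape

    # 1. Threshold: D_ij = sqrt(zeta_i) if A_ij >= zeta_i, else 0.
    #    (Pruning of non-catchword rows is omitted in this sketch.)
    D = (A >= zeta[:, None]) * np.sqrt(zeta)[:, None]

    # 2. Rank-k truncated SVD of D; D_k should be near a block matrix.
    svd = TruncatedSVD(n_components=k)
    Dk = svd.inverse_transform(svd.fit_transform(D.T)).T  # rank-k approximation

    # 3. k-means on the columns of D_k, then Lloyd's algorithm on D
    #    initialized at those centers; cluster l plays the role of R_l.
    init = KMeans(n_clusters=k, n_init=10).fit(Dk.T).cluster_centers_
    labels = KMeans(n_clusters=k, init=init, n_init=1).fit(D.T).labels_

    B = np.zeros((d, k))
    for l in range(k):
        R_l = np.flatnonzero(labels == l)
        if R_l.size == 0:
            continue
        # 4. Catchwords J_l: words with largest average weight in cluster l.
        J_l = np.argsort(A[:, R_l].mean(axis=1))[-n_catchwords:]
        # 5. Average the "purest" documents: those with most mass on J_l.
        purity = A[np.ix_(J_l, R_l)].sum(axis=0)
        purest = R_l[np.argsort(purity)[-n_purest:]]
        B[:, l] = A[:, purest].mean(axis=1)
    return B
```

With `A` a \(d\times n\) term-document matrix, this returns a \(d\times k\) estimate of \(B\); the output quality of course depends on how `zeta`, `n_catchwords`, and `n_purest` are chosen.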


Remarks:

  1. The paper isn’t clear on the \(\ep\) dependence; see the supplement. In any case, the heavy-noise condition holds when the noise is a sum of many \(O(1)\) noise terms.

  2. The covariance matrix (of a single multinomial draw) is \(\operatorname{diag}(p)-pp^\top=(\mathbb 1_{i=j}p_i-p_ip_j)_{i,j}\). It is PSD, and its largest eigenvalue is at most \(\max_i p_i\), since subtracting the PSD matrix \(pp^\top\) can only decrease the eigenvalues of \(\operatorname{diag}(p)\).
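A quick numerical check of this remark (my sketch, not from the paper): draw a random probability vector \(p\) from a Dirichlet distribution and verify that \(\operatorname{diag}(p)-pp^\top\) is PSD with largest eigenvalue at most \(\max_i p_i\).

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(10))           # random probability vector
cov = np.diag(p) - np.outer(p, p)        # multinomial covariance (one draw)
eigs = np.linalg.eigvalsh(cov)
assert eigs.min() >= -1e-12              # PSD (up to numerical error)
assert eigs.max() <= p.max() + 1e-12     # lambda_max <= max_i p_i
```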