Posted: 2016-09-26
Modified: 2016-09-26
Tags: none
Threads
- PMI - get some results!
- CKN - understood theory, RKHS (9/26)
- SoS - lectures 2 and 3
- DL: do experiments suggested in DL generalization
- NN learns DL? Preliminary calculations.
- Stability of SGD
- @Elad: do local minima generalize?
- RL and cousins: see notes below.
Meeting with Arora
Directions
- Learning noisy-OR networks with 3-tensors (\(n^4\) time) (@Andrej, @Tengyu)
- Matrix factorization assuming separability
- Mike Collins NLP. Words as features.
- Loglinear/NN
- For bigram classifiers, it’s fine to use word embeddings. (@Tengyu)
- Systems with memory
- Learning HMMs with tensor methods
- Reinforcement learning (MDPs)
- Theoretical algorithms are polynomial in state size and mixing time.
- What if we have a succinct representation? @Mengdi Wang. Vector in \(\R^n\).
- Stochastic Primal-Dual Methods and Sample Complexity of Markov Decision Process. (recent results on learning in MDPs via a primal-dual approach, and new ideas on finding compact representations of MDPs via low-rank approximation.)
- cf. contextual bandits (@Elad, @Karan)
- Representation learning
- PMI for images
- Dictionary learning
- Relax incoherence: only correlated with \(n^{\delta}\) others.
- @Elad: Training on top of improperly learned dictionary.
Thoughts about PMI
(9/28)
In the CKN, given that one layer is \(x\), the next layer (before pooling) is computed as \(y = (e^{v_i^T x + b_i})_i\) for some \(v_i, b_i\). The dimension of \(y\) is larger than the dimension of \(x\).
This looks very much like PMI for word vectors, where the probability of a word with vector \(v\) given context \(x\) is proportional to \(e^{v^T x}\), and a low-rank approximation to the PMI matrix recovers the \(v\)'s.
But does that mean that applying weighted SVD to the PMI matrix of the CKN feature vectors is somehow just trying to recover the \(v_i\)? In that case the dimension reduction would just take us from \(y\) back to \(x\), which doesn't help classification.
(This doesn’t take into account the Gaussian pooling though.)
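A minimal sketch of what "weighted SVD for PMI on the CKN feature vectors" could look like; all names and data here are stand-ins, and a plain truncated SVD of an unweighted PMI estimate stands in for a weighted SVD:

```python
import numpy as np

def pmi_matrix(F, eps=1e-8):
    """PMI(i, j) = log( E[f_i f_j] / (E[f_i] E[f_j]) ), estimated over the rows of F."""
    co = F.T @ F / F.shape[0]          # E[f_i f_j]
    marg = F.mean(axis=0)              # E[f_i]
    return np.log(co + eps) - np.log(np.outer(marg, marg) + eps)

def candidate_vectors(P, k):
    """Rank-k truncated SVD of P; rows of U_k sqrt(s_k) are candidate v_i's."""
    U, s, _ = np.linalg.svd(P)
    return U[:, :k] * np.sqrt(s[:k])

F = np.abs(np.random.randn(1000, 128))      # stand-in for nonnegative CKN features
V_hat = candidate_vectors(pmi_matrix(F), k=32)
print(V_hat.shape)                          # (128, 32): one candidate vector per feature coordinate
```

The question above is then whether the rows of `V_hat`, computed on real CKN features, line up with the \(v_i\) of the layer or with anything more interpretable than the raw image.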
What would be the “test” for interpretability? For word embeddings, the test was analogy completion.
If the PMI matrix is low-rank, then what do we get beyond the fact that the feature vectors (7200-dim) came from a lower-dimensional (28x28) space? (In what sense would we expect the dimension-reduced feature vectors to be more interpretable than the original image?)
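One way to make the "nothing beyond the input dimension" worry concrete (my own back-of-envelope step, about the log-features rather than the PMI matrix itself, using the dimensions quoted above): since \(\log y_i^{(j)} = v_i^T x^{(j)} + b_i\) for example \(j\),
\[
\bigl[\log y_i^{(j)}\bigr]_{i,j} \;=\; V X + b\,\mathbf{1}^T, \qquad V \in \R^{7200 \times 784},\quad X \in \R^{784 \times m},
\]
so the log-feature matrix over any \(m\) examples has rank at most \(785\), no matter what the \(v_i\) are. Low rank by itself is therefore not evidence of anything beyond "the features came from a \(28\times 28\) image."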
TODO:
- Try thresholding + WSVD + SVM (see the sketch after this list).
- Run the pipeline (data exploration) on other MNIST and CIFAR architectures.
- Think about the interpretability test above.
- @Tengyu on interpreting CKN as NMF.
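A rough sketch of the first TODO item (thresholding + WSVD + SVM), with plain truncated SVD standing in for a weighted SVD and random stand-in data; every name and parameter here is a guess, not the actual pipeline:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC

def threshold(F, tau=1e-3):
    """Zero out small feature coordinates before factorizing."""
    return np.where(F > tau, F, 0.0)

# Stand-ins for CKN features and labels (MNIST would give 7200-dim features, 10 classes).
F_train = np.abs(np.random.randn(500, 7200))
y_train = np.random.randint(0, 10, size=500)

pipe = make_pipeline(
    FunctionTransformer(threshold),     # thresholding
    TruncatedSVD(n_components=100),     # stand-in for weighted SVD of the PMI matrix
    LinearSVC(C=1.0),                   # linear SVM on the reduced features
)
pipe.fit(F_train, y_train)
print(pipe.score(F_train, y_train))
```

The real experiment would swap in the actual CKN feature matrices and the PMI/weighted-SVD step sketched earlier.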