Posted: 2016-08-06, Modified: 2016-08-06
Tags: none
- Machine learning
    - PMI images
        - Understood the PMI idea more thoroughly.
        - Got code from Ben; having trouble running it. See:
            - https://www.mathworks.com/matlabcentral/newsreader/view_thread/345876#947272
            - https://askrc.princeton.edu/question/255/matlab-on-nobel-error-using-mex/
            - http://stackoverflow.com/questions/38723051/error-using-mex-g-error-no-such-file-or-directory
        - Brainstormed on convnets. Skimmed the following. TODO: understand more thoroughly.
            - Fastfood: computing Hilbert space expansions in loglinear time
            - Deep-fried convnets
            - Ailon-Chazelle
            - [HA16] Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition
            - [MKHS14] Convolutional Kernel Networks
        - Representation learning problem (see below)
    - Talked with Anand.
        - Kernel dictionary learning idea. TODO: try to make this rigorous, and test it out.
        - TODO: Think about AGD, talk with Cyril.
- Probability
- TT/ATP/PL
Representation learning
In dictionary learning, we assume we have samples \(y = Ax + e\), where \(x\) comes from a sparse distribution (e.g., the \(x_i\) are independent, each \(x_i\neq 0\) with probability \(s/n\), and each nonzero entry is drawn from some distribution not concentrated at 0) and \(e\) is error (e.g., Gaussian).
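A minimal sketch of this generative model (the standard-normal choice for the nonzero entries, the noise level \(\sigma\), and the sample dimension \(d\) are illustrative assumptions, not from the source):

```python
import numpy as np

def sample_dictionary_model(A, s, sigma, rng):
    """Draw one sample y = A x + e with Bernoulli-Gaussian sparse x.

    Each x_i is independently nonzero with probability s/n; nonzero
    entries are standard normal (a distribution not concentrated at 0),
    and e is Gaussian noise with standard deviation sigma.
    """
    d, n = A.shape
    support = rng.random(n) < s / n              # each coordinate active w.p. s/n
    x = np.where(support, rng.standard_normal(n), 0.0)
    e = sigma * rng.standard_normal(d)
    return A @ x + e, x

rng = np.random.default_rng(0)
d, n, s = 50, 100, 5                             # illustrative sizes
A = rng.standard_normal((d, n)) / np.sqrt(d)     # random dictionary, columns ~ unit norm
y, x = sample_dictionary_model(A, s, 0.01, rng)
print(np.count_nonzero(x))                       # about s nonzero coordinates
```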
The way we stated our problem is that \(x\cdot a_i\) is large for only a few \(i\). This is similar to dictionary learning with dictionary \((A^+)^T\), where the columns of \(A\) are the \(a_i\). (I.e., the \(x\)'s here play the role of the \(y\)'s in DL.)
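To see where \((A^+)^T\) comes from: write \(z = A^Tx = (x\cdot a_i)_i\) for the code. When \(A\) has full row rank (the overcomplete case), \(AA^+ = I\), so \((A^+)^Tz = (AA^+)^Tx = x\); i.e., each sample \(x\) is the dictionary \((A^+)^T\) applied to its code. A quick numerical check of this identity (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 100                           # overcomplete: k >= n vectors a_i in R^n
A = rng.standard_normal((n, k))          # columns are the a_i
x = rng.standard_normal(n)

z = A.T @ x                              # the code (x . a_i)_i
B = np.linalg.pinv(A).T                  # candidate dictionary (A^+)^T
# A has full row rank, so A A^+ = I and B z = (A A^+)^T x = x:
print(np.allclose(B @ z, x))             # True
```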
I may be wrong, but I think what's different is that
- Dictionary learning on \((A^+)^T\) would correspond to the case where \((x\cdot a_i)_i\) is sparse plus noise. Our assumption is a bit different: \(x\cdot a_i\) is large for only a few \(i\); e.g., large negative values don't count against the assumption.
- We're trying to relax the condition on the noise: e.g., instead of assuming that the noise in the other coordinates is random, we consider the worst case, or assume it is random-like in some way.
(Actually, I think the undercomplete case, where the number of \(a_i\) is less than the ambient dimension \(n\), doesn't quite correspond to DL, because the map \(x\mapsto (x\cdot a_i)_i\) is not invertible; see the sketch below.)
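A small numerical illustration of the non-invertibility (the parameter \(k\) for the number of \(a_i\) is my notation): with \(k < n\), two distinct \(x\)'s can produce exactly the same code \((x\cdot a_i)_i\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 50                           # undercomplete: fewer a_i than dimensions
A = rng.standard_normal((n, k))          # columns are the a_i

x = rng.standard_normal(n)
# Project out span{a_i} to get a direction the code map cannot see:
coef = np.linalg.lstsq(A, x, rcond=None)[0]
null_dir = x - A @ coef                  # component of x orthogonal to span{a_i}
x2 = x + null_dir                        # a different point in R^n ...
print(np.allclose(A.T @ x, A.T @ x2))    # ... with exactly the same code: True
```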