What are the convergence guarantees for dictionary learning? Consider the following settings:
- AGMM15 (Alternating minimization)
- 2-layer NN
- With \(b^Ty\)
- With \(\operatorname{sgn}(b^Ty)\)
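As a reference point, below is a minimal sketch of a generic alternating-minimization loop for dictionary learning (a hard-thresholding sparse-coding step followed by a gradient step on the dictionary). It is not the exact AGMM15 procedure, not the 2-layer-NN variants, and not the code in dl_convergence.py; the function name, the thresholding rule, and the use of alpha as a step size are assumptions.

```python
import numpy as np

def alt_min_step(A, Y, s, alpha=0.01):
    """One alternating-minimization step (sketch, not the AGMM15 algorithm).

    A : (n, m) current dictionary estimate, Y : (n, N) batch of samples,
    s : target sparsity of the hidden codes, alpha : step size (assumed).
    """
    # Sparse-coding step: least-squares codes, hard-thresholded to s entries per column.
    X = np.linalg.lstsq(A, Y, rcond=None)[0]
    small = np.argsort(-np.abs(X), axis=0)[s:, :]  # indices of the m - s smallest entries
    np.put_along_axis(X, small, 0.0, axis=0)
    # Dictionary-update step: gradient step on 0.5 * ||Y - A X||_F^2, then renormalize columns.
    A = A - alpha * (A @ X - Y) @ X.T
    return A / np.maximum(np.linalg.norm(A, axis=0, keepdims=True), 1e-12)
```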
Experiments
Code is in dl_convergence.py. Run on ionic.
Results
- s = 3
- m = 50 # hidden vector dimension
- n = 25 # observed vector dimension
- q = s/m
- alpha = .01
- batchsize = 1024
- vary sigma (how close the initialization is to the true dictionary)
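The sketch below shows one way the synthetic data and the sigma-close initialization could be generated under these parameters. The generative model (random unit-norm \(A\), s-sparse Gaussian codes) is an assumption, not necessarily what dl_convergence.py does.

```python
import numpy as np

rng = np.random.default_rng(0)
s, m, n, batchsize = 3, 50, 25, 1024
q = s / m  # sparsity fraction (unused in this sketch)

# Ground-truth dictionary with unit-norm columns (assumed generative model).
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0, keepdims=True)

# Batch of observations y = A x with s-sparse Gaussian codes x.
X = np.zeros((m, batchsize))
for j in range(batchsize):
    support = rng.choice(m, size=s, replace=False)
    X[support, j] = rng.standard_normal(s)
Y = A @ X

# sigma-close initialization: perturb A and renormalize the columns.
sigma = 0.5
A0 = A + sigma * rng.standard_normal((n, m))
A0 /= np.linalg.norm(A0, axis=0, keepdims=True)
```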
Next steps:
- add random initialization - check
- vary (s,m,n)
- check sparsity of learned vectors (do thresholding too) - check
- add initialization from samples - check (sketch after this list)
- try overcomplete initialization from samples - check
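One simple way the sample-based initialization items could be implemented: take random normalized samples as initial dictionary columns (each sample is a combination of only s columns), and request more than m columns for the overcomplete variant. This is an assumption about the experiment, with a hypothetical helper name.

```python
import numpy as np

def init_from_samples(Y, n_cols, rng=None):
    """Sample-based initialization (hypothetical): random normalized samples as columns.

    Passing n_cols > m gives the overcomplete variant.
    """
    rng = np.random.default_rng() if rng is None else rng
    cols = rng.choice(Y.shape[1], size=n_cols, replace=False)
    A0 = Y[:, cols].astype(float)
    return A0 / np.maximum(np.linalg.norm(A0, axis=0, keepdims=True), 1e-12)

# A0 = init_from_samples(Y, 50)        # same size as the true dictionary
# A0_over = init_from_samples(Y, 100)  # overcomplete initialization
```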
First observations
See am_dl_3_50_25.txt and slurm-1218768.out.
- Converges when close enough (as in AGMM15); for this, even 0.5 is close enough. Note it doesn't converge to \(A\) - it converges to something whose columns are \(\approx 0.1\) away from \(A\), a consistent bias. (This makes sense.) A column-matching check is sketched after this list.
- Random initialization does not converge to the global optimum. Initialization from samples seems to do slightly better.
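For the "columns \(\approx 0.1\) away" measurement, a natural check is the distance of each learned column to its best-matching true column, allowing sign flips and an arbitrary permutation. A minimal sketch using Hungarian matching; the function name is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def column_distances(A_true, A_learned):
    """Per-column distance of the learned dictionary to the true one,
    up to column permutation and sign flips."""
    A = A_true / np.linalg.norm(A_true, axis=0, keepdims=True)
    B = A_learned / np.linalg.norm(A_learned, axis=0, keepdims=True)
    cost = 1.0 - np.abs(A.T @ B)              # sign-invariant matching cost
    rows, cols = linear_sum_assignment(cost)  # best column permutation
    signs = np.sign(np.sum(A[:, rows] * B[:, cols], axis=0))
    return np.linalg.norm(A[:, rows] - signs * B[:, cols], axis=0)

# column_distances(A, A_learned).max()  # e.g. check the ~0.1 bias above
```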
Evaluation
How to evaluate?
- Closeness of columns.
- Loss: how sparse the codes are and how far off the reconstruction is (reconstruction error).
- How does the reconstruction error compare to SVD? (Make the dimensions comparable; see the sketch after this list.)
- Put a random SVM on top. Can it learn the SVM well?
- Check framework in [HM16].
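For the reconstruction-error vs. SVD item, one possible comparison is the relative Frobenius error of the learned factorization against the best rank-k approximation. The choice of k (e.g. k = s, the per-sample sparsity, to keep the per-sample degrees of freedom comparable) is an assumption.

```python
import numpy as np

def rel_reconstruction_error(Y, A, X):
    """Relative Frobenius reconstruction error of the learned factorization."""
    return np.linalg.norm(Y - A @ X) / np.linalg.norm(Y)

def svd_baseline_error(Y, k):
    """Relative error of the best rank-k approximation (SVD baseline)."""
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    Yk = (U[:, :k] * S[:k]) @ Vt[:k, :]
    return np.linalg.norm(Y - Yk) / np.linalg.norm(Y)

# print(rel_reconstruction_error(Y, A_learned, X_learned), svd_baseline_error(Y, k=3))
```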
Code
Data
In am_dls_data.pickle, data is stored as a list (m, n, s, st, loss, mins1, mins2, mins3, A, B, AB).
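A minimal sketch of reading the pickle back under the layout above. Whether the file holds a single such list or several is not stated here, so this assumes a single one; the meanings of st, mins1-3, B, and AB are also not spelled out in these notes.

```python
import pickle

# Stored layout, per the note above:
# (m, n, s, st, loss, mins1, mins2, mins3, A, B, AB)
with open("am_dls_data.pickle", "rb") as f:
    data = pickle.load(f)

m, n, s, st, loss, mins1, mins2, mins3, A, B, AB = data
print(m, n, s, st, loss)
```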