(HA16) Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition
Posted: 2016-08-02 , Modified: 2016-08-02
Tags: nlp, word embeddings
See also [HA15] Convolutional Dictionary Learning through Tensor Factorization.
Q: Why are we considering each dimension separately? Wouldn’t it make more sense to consider them together and write \[ X = \sum_{i\in [L]} F_i*w_i,\] where \(X\) is a matrix corresponding to a sentence and the convolution with \(w_i\) runs in the row direction? In [HA16]’s way of doing it, you’re learning how a generic subject/theme/atom modulates throughout the space (the sentence), not how combinations of subjects modulate. Is this what you want? It does need many fewer atoms, though; otherwise you can expect \(k\) times more.
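To make the contrast concrete, here is a minimal NumPy sketch of the two decompositions, under my own conventions rather than the paper’s: \(X\) is \(d \times n\) (embedding dimension by sentence length), \(*\) is circular convolution along the word axis, the per-dimension model shares scalar filters across dimensions, and the joint model uses matrix filters \(F_i\) with a single activation \(w_i\) per atom. All names and shapes below are illustrative assumptions.

```python
# A rough sketch of the two models, not the paper's implementation.
import numpy as np

def circ_conv(f, w):
    """Circular convolution of a length-m filter f with a length-n activation w (m <= n)."""
    n = len(w)
    f_pad = np.zeros(n)
    f_pad[:len(f)] = f
    return np.real(np.fft.ifft(np.fft.fft(f_pad) * np.fft.fft(w)))

def per_dimension_model(f_list, W):
    """Each embedding dimension j is reconstructed separately with shared scalar filters:
    X[j, :] = sum_i circ_conv(f_list[i], W[i, j, :]).
    f_list: L filters of length m; W: activations of shape (L, d, n)."""
    L, d, n = W.shape
    X = np.zeros((d, n))
    for j in range(d):
        for i in range(L):
            X[j] += circ_conv(f_list[i], W[i, j])
    return X

def joint_model(F_list, W):
    """The whole sentence matrix is reconstructed jointly:
    X = sum_i F_list[i] * W[i], convolving each row of the matrix filter
    F_list[i] (shape d x m) with a single activation W[i] (length n)."""
    d = F_list[0].shape[0]
    n = W.shape[1]
    X = np.zeros((d, n))
    for F, w in zip(F_list, W):
        for j in range(d):
            X[j] += circ_conv(F[j], w)
    return X

# Toy usage: d = 4 dims, n = 10 words, L = 3 atoms, filter width m = 5.
rng = np.random.default_rng(0)
d, n, L, m = 4, 10, 3, 5
f_list = [rng.standard_normal(m) for _ in range(L)]        # scalar filters, shared across dims
F_list = [rng.standard_normal((d, m)) for _ in range(L)]   # matrix filters
W_per_dim = rng.standard_normal((L, d, n))                 # d activations per atom
W_joint = rng.standard_normal((L, n))                      # one activation per atom
X1 = per_dimension_model(f_list, W_per_dim)
X2 = joint_model(F_list, W_joint)
```

The difference in parameter count is visible in the shapes: the per-dimension model carries \(d\) separate activation maps per atom, while the joint model ties them together through a richer (matrix-valued) atom.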
(T/F? Tensor algorithms are fragile in the sense that they depend on the model being exactly as specified. E.g., a tensor algorithm for a NN may fail if you change the NN a bit.)