Learning structured, robust, and multimodal deep models

Posted: 2016-10-21 , Modified: 2016-10-21

Tags: neural networks, deep learning, multimodal

Abstract

Building intelligent systems that are capable of extracting meaningful representations from high-dimensional data lies at the core of solving many Artificial Intelligence tasks, including visual object recognition, information retrieval, speech perception, and language understanding. In this talk I will first introduce a broad class of deep learning models and show that they can learn useful hierarchical representations from large volumes of high-dimensional data with applications in information retrieval, object recognition, and speech perception. I will next introduce deep models that are capable of extracting a unified representation that fuses together multiple data modalities, as well as present the Reverse Annealed Importance Sampling Estimator (RAISE) for evaluating these deep generative models. Finally, I will discuss models that can generate natural language descriptions (captions) of images, as well as generate images from captions using an attention mechanism. I will show that on several tasks, including modelling images and text, these models significantly improve upon many of the existing techniques.

Learning deep generative models

Learn feature representations!

Multi-modal learning

Open problems

Neural storytelling: take a corpus of (romance) novels and generate story-like captions about an image.

(Kiros et al., NIPS 2015)

One-shot learning: humans vs. machines. How can we learn a novel concept from only a few examples? (Lake, Salakhutdinov, Tenenbaum 2015, Science)

Questions

CNNs are better for supervised learning. We're trying to build a convolutional DBM (deep Boltzmann machine).

Versus variational autoencoders: the reparametrization trick lets you backpropagate through the whole model, so optimization is easier for VAEs. Both approaches are useful.
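The reparametrization trick mentioned above can be sketched in a few lines. This is a minimal illustrative example (not from the talk): instead of sampling the latent code z directly from N(mu, sigma^2), which is not differentiable with respect to mu and sigma, we sample noise from N(0, I) and shift/scale it, so gradients can flow back to the encoder parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    Sampling is pushed into the parameter-free noise eps, so z is a
    differentiable function of mu and log_var (the encoder outputs).
    """
    sigma = np.exp(0.5 * log_var)        # log-variance -> standard deviation
    eps = rng.standard_normal(mu.shape)  # noise independent of parameters
    return mu + sigma * eps

# Toy encoder outputs: a batch of 4 latent codes of dimension 2,
# with mu = 0 and sigma = 1 (log_var = 0).
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))
z = reparameterize(mu, log_var)
print(z.shape)  # (4, 2)
```

In a real VAE the same transformation is applied inside an autodiff framework, so the gradient of the reconstruction loss with respect to mu and log_var is computed automatically.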

Can we learn representations without language?

Microsoft COCO dataset: ~80,000 images, 5 captions each. Not big enough, but the captions are clean!

Topic-level coherence vs. a coherent model of sentences. What do we need: new architectures, new training sets?

Actor-Mimic model.

Transfer learning: learn new games faster by leveraging knowledge from previous games, e.g., Star Gunner (Atari).

Continuous state.