MDPs with continuous state space (scratch)

Posted: 2016-10-14 , Modified: 2016-10-25

Tags: none

written-up things

Kalman filter

Come up with a class of MDPs on an exponentially large/continuous state space that is interesting and tractable. Think of generalizing from contextual bandits.

  • Basically, we want a reasonable model of an MDP with a very large (exponential or continuous) state space, and to be able to do something with it. We wanted to include some dynamics as in Kalman filters, but we weren't sure whether Kalman filters are tractable.
  • Todo: learn about Kalman filters.
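As a reference point for the Kalman-filter dynamics mentioned above, here is a minimal predict/update sketch for a linear-Gaussian state-space model. The model, matrices, and the scalar random-walk demo are all illustrative assumptions, not something specified in these notes.

```python
import numpy as np

# Linear-Gaussian state-space model (assumed for illustration):
#   x_t = A x_{t-1} + w_t,  w_t ~ N(0, Q)   (hidden dynamics)
#   y_t = H x_t + v_t,      v_t ~ N(0, R)   (observations)

def kalman_step(x, P, y, A, Q, H, R):
    """One predict/update cycle; returns posterior mean and covariance."""
    # Predict
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Demo: track a scalar random walk observed with noise.
A = np.array([[1.0]]); Q = np.array([[0.01]])
H = np.array([[1.0]]); R = np.array([[0.1]])
x, P = np.array([0.0]), np.array([[1.0]])
rng = np.random.default_rng(0)
true_x = 0.0
for _ in range(50):
    true_x += rng.normal(0, 0.1)
    y = np.array([true_x + rng.normal(0, 0.3)])
    x, P = kalman_step(x, P, y, A, Q, H, R)
```

Inference (filtering) here is exact and cheap, which is one sense in which the Kalman setting is "tractable"; learning unknown dynamics matrices is the harder problem gestured at in the starting points below.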

Starting points

  1. HMMs have a discrete state space. What happens with a continuous state space? Suppose there are dynamics as in Kalman filters; infer the hidden state. References
    • Continuous HMM paper (RKHS)
    • Kalman filters (see examples)
    • Gradient descent for learning dynamical systems.
  2. Contextual bandits + MDPs. Don’t assume there’s a hidden state here; just assume the next state depends, say, linearly on the action plus noise.
  3. Context vector/random walk model for documents: transition probabilities \(\propto \exp(-\langle c_1,c_2\rangle)\) and observation probabilities \(\propto \exp(-\langle c_1,x\rangle)\).

Model

First try

This setting looks like reinforcement learning + control theory. Prior work? How is RL used in continuous systems right now? Basic control theory background?

Need the model to be a generalization of regular MDP.

(*) may be interesting from a control theory perspective, but it doesn’t generalize the discrete MDP. (It seems best to learn the dynamics, and then do the optimal thing from there…)

Second try

Captures deterministic MDPs, but not probabilistic ones, by letting \(A=\{e_i\}\).

Misc

Do as well as the best Bayes net? Actions in some class. Finite set of actions vs. exponential/continuous set of actions. In the latter case, this will depend on the optimizability of that set…

Example: the class is an SVM.

“Do as well as the best estimator of the \(q\) function in a certain class (assume convexity or something?)” (cf. contextual bandits first)
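One concrete instantiation of "do as well as the best \(q\)-function estimator in a class" is fitted Q-iteration with a linear class: each iteration regresses \(r + \gamma \max_{a'} q_\theta(s', a')\) onto features of \((s, a)\). Everything below (the random linear environment, the reward, the feature map) is an assumed toy setup for illustration, not a model from these notes.

```python
import numpy as np

# Fitted Q-iteration sketch with a linear function class.
# Each step solves the least-squares problem
#   q_theta(s, a) ≈ r + gamma * max_{a'} q_theta(s', a').

rng = np.random.default_rng(0)
d, n_actions, gamma, n = 4, 3, 0.9, 500

def feats(s, a):
    """Linear class: state features in the block for action a."""
    phi = np.zeros(d * n_actions)
    phi[a * d:(a + 1) * d] = s
    return phi

# Random transitions from an assumed near-identity linear dynamics.
S = rng.normal(size=(n, d))
A_idx = rng.integers(n_actions, size=n)
R = S[:, 0] + 0.1 * rng.normal(size=n)     # reward: first state coordinate
S_next = S + 0.1 * rng.normal(size=(n, d))

theta = np.zeros(d * n_actions)
Phi = np.array([feats(s, a) for s, a in zip(S, A_idx)])
for _ in range(20):                        # fitted-Q iterations
    q_next = np.array([max(feats(s2, a) @ theta for a in range(n_actions))
                       for s2 in S_next])
    targets = R + gamma * q_next
    theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
```

The "certain class" in the quote would be the span of these features; the convexity caveat matters because each regression step is convex, but the overall iteration with the max operator need not converge in general.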