Function approximation
Posted: 2016-11-01 , Modified: 2016-11-01
Tags: reinforcement learning
See also “Factored MDPs, MDPs with exponential/continuous state space” in refs.
\(F(\te)(x,u_j) = \phi^T(x,u_j)\te\), \(\phi\) normalized so entries sum to 1.
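A minimal sketch of this linear-in-parameters approximator, assuming Gaussian RBF features over state-action pairs (the feature form and centers are my assumption, not from the source):

```python
import numpy as np

def normalized_features(x, u, centers, width=1.0):
    """Gaussian RBF features over (state, action) pairs, normalized so
    the entries sum to 1. `centers` is a hypothetical list of (x_c, u_c)."""
    z = np.array([np.exp(-((x - xc) ** 2 + (u - uc) ** 2) / width)
                  for xc, uc in centers])
    return z / z.sum()

def F(theta, x, u, centers):
    """Linear approximator F(theta)(x, u) = phi(x, u)^T theta."""
    return normalized_features(x, u, centers) @ theta
```

With normalized features, \(F(\te)(x,u)\) is a convex combination of the entries of \(\te\), which is what makes nonexpansion arguments go through.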
Kernel-based approximator of \(Q\) function \(\ka: (X\times U)^2\to \R\).
Form and number of basis functions (BFs) not defined in advance; one weight per stored sample \[ \wh Q(x,u) = \sumo{l_s}{n_s} \ka((x,u), (x_{l_s}, u_{l_s}))\te_{l_s}. \]
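A sketch of the kernel expansion above, assuming a Gaussian kernel \(\ka\) (the kernel choice is an assumption for illustration):

```python
import numpy as np

def gaussian_kernel(sa1, sa2, width=1.0):
    """A hypothetical kernel kappa on state-action pairs (x, u)."""
    d = np.asarray(sa1, float) - np.asarray(sa2, float)
    return np.exp(-d @ d / width)

def Q_hat(x, u, samples, theta):
    """Nonparametric estimate: sum_l kappa((x,u), (x_l,u_l)) * theta_l.
    `samples` holds the stored (x_l, u_l) pairs; the number of terms
    grows with the data rather than being fixed in advance."""
    return sum(gaussian_kernel((x, u), s) * t for s, t in zip(samples, theta))
```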
(I haven’t been exposed to nonparametric methods - what guarantees do nonparametric methods have?)
In between the two: derive a small number of good BFs from the data.
Approximate Q-learning requires exploration.
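A sketch of one semi-gradient Q-learning step with \(\epsilon\)-greedy exploration, assuming one-hot features on a small finite MDP (the feature map and update rule are standard but not spelled out in the source):

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x, u, n_states, n_actions):
    """One-hot features for a small tabular MDP (illustrative assumption)."""
    f = np.zeros(n_states * n_actions)
    f[x * n_actions + u] = 1.0
    return f

def q_learning_step(theta, x, u, r, x_next, n_states, n_actions,
                    alpha=0.1, gamma=0.9):
    """Semi-gradient Q-learning update on the linear approximator."""
    q_next = max(phi(x_next, a, n_states, n_actions) @ theta
                 for a in range(n_actions))
    td = r + gamma * q_next - phi(x, u, n_states, n_actions) @ theta
    return theta + alpha * td * phi(x, u, n_states, n_actions)

def epsilon_greedy(theta, x, n_states, n_actions, eps=0.1):
    """Exploration: random action with prob eps, else greedy w.r.t. theta."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax([phi(x, a, n_states, n_actions) @ theta
                          for a in range(n_actions)]))
```

Without the exploration step, state-action pairs never visited get no updates and the approximation there stays at its initialization.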
Proofs for approximate value iteration rely on contraction mapping arguments, e.g. requiring \(F\) and the projection \(P\) to be nonexpansions.
Suboptimality of the convergence point \(\te^*\) is bounded in terms of the minimum distance between \(Q^*\) and a fixed point of \(F\circ P\), \(\ze_{QI}^*\).
(Ditto for nonparametric (kernel-based) approximators.)
Fitted \(Q\)-iteration using ensembles of extremely randomized trees.
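A sketch of fitted \(Q\)-iteration in the style of Ernst et al., using scikit-learn's `ExtraTreesRegressor` as the extremely randomized trees ensemble (the library choice, hyperparameters, and transition format are my assumptions):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, n_actions, n_iters=10, gamma=0.9):
    """Fitted Q-iteration: repeatedly regress TD targets with an ensemble
    of extremely randomized trees. `transitions` is a list of tuples
    (x, u, r, x_next) with x a 1-D feature vector (an assumption)."""
    X = np.array([np.append(x, u) for x, u, r, xn in transitions])
    r = np.array([t[2] for t in transitions])
    model = None
    for _ in range(n_iters):
        if model is None:
            y = r                      # Q_1: regress immediate rewards
        else:
            q_next = np.array([
                max(model.predict(np.append(xn, a).reshape(1, -1))[0]
                    for a in range(n_actions))
                for _, _, _, xn in transitions])
            y = r + gamma * q_next     # TD target for Q_{k+1}
        model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)
    return model
```

Because the tree ensemble is refit from scratch each iteration, this is a batch method: no contraction property of the regressor is assumed, which is part of why its convergence analysis differs from the projected-value-iteration arguments above.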