Weekly summary 2016-12-17
Posted: 2016-12-17, Modified: 2016-12-17
Tags: reinforcement learning, control theory
Last week
RL project
LQR
Understanding
Why \(\pi_t(x) = -P_t(x)\), i.e. why the optimal policy is linear in the state, and why there is no closed form in general. (What is the Hamiltonian?)
Fitted Q-iteration (a rough sketch on a toy linear system appears at the end of this section)
Add a linear term
Differential equation for \(P\) (the Riccati equation; written out just below this list)
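For reference, the standard LQR facts behind the items above (textbook form; this is my restatement, so the notation may not match the notes): with dynamics \(x_{t+1} = Ax_t + Bu_t\) and cost \(\sum_t (x_t^\top Q x_t + u_t^\top R u_t)\), the value function stays quadratic, \(V_t(x) = x^\top P_t x\), the backward recursion is
\[
P_t = Q + A^\top P_{t+1} A - A^\top P_{t+1} B\,(R + B^\top P_{t+1} B)^{-1} B^\top P_{t+1} A,
\]
and the optimal policy is linear, \(\pi_t(x) = -K_t x\) with \(K_t = (R + B^\top P_{t+1} B)^{-1} B^\top P_{t+1} A\). In continuous time the recursion becomes the Riccati differential equation
\[
-\dot P = A^\top P + PA - PBR^{-1}B^\top P + Q.
\]
The closed form exists because the quadratic family is preserved by the Bellman backup; for general costs or nonlinear dynamics no finite-dimensional family is preserved, which is why there is no closed form in general.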
Remaining questions
Prove convergence (check w/ linear term)
Hamiltonian: proof via Pontryagin's maximum principle (the statement is written out below)
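On the Hamiltonian question, the standard statement from Pontryagin's maximum principle, written here for a minimization problem (sign conventions vary). For \(\dot x = f(x,u)\) with running cost \(\ell(x,u)\),
\[
H(x,u,\lambda) = \ell(x,u) + \lambda^\top f(x,u), \qquad \dot\lambda = -\partial_x H, \qquad u^*(t) \in \arg\min_u H(x,u,\lambda).
\]
For LQR, \(\ell = x^\top Q x + u^\top R u\) and \(f = Ax + Bu\), so \(u^* = -\tfrac12 R^{-1} B^\top \lambda\); plugging in the ansatz \(\lambda = 2Px\) recovers the Riccati differential equation above.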
Approaches
Thinking of the minimum as a circuit: doesn't work.
Local steps using a quadratic approximation: how to analyze?
Using softmax and sampling; cf. [AH16]. (How to collapse multiple? cf. Toda.) A rough sketch of the sampling step follows this list.
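A minimal sketch of the softmax-and-sampling idea combined with a local quadratic model (my own illustration, not the construction from [AH16]; the cost function and all constants below are made up):

```python
import numpy as np

def softmax_sample(scores, beta=1.0, rng=None):
    """Sample an index with probability proportional to exp(-beta * score).

    Lower scores (costs) are more likely; large beta approaches the argmin."""
    rng = rng or np.random.default_rng()
    logits = -beta * np.asarray(scores)
    logits -= logits.max()            # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(scores), p=probs)

# Example: local quadratic approximation of a 1-d cost around x0,
# scored on a grid of candidate steps (all names/values are illustrative).
f = lambda x: np.cos(3 * x) + 0.1 * x ** 2
x0, h = 0.5, 1e-3
g = (f(x0 + h) - f(x0 - h)) / (2 * h)              # finite-difference gradient
H = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h ** 2    # finite-difference curvature
steps = np.linspace(-1.0, 1.0, 41)
quad_scores = f(x0) + g * steps + 0.5 * H * steps ** 2
i = softmax_sample(quad_scores, beta=5.0)
print("sampled step:", steps[i])
```

The temperature \(1/\beta\) trades off exploration against greediness of the local step.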
Planning in POMDPs
Collaboration
See list of papers.
Idea: relax memoryless POMDPs, e.g. to policies depending only on posterior probabilities (belief states)? A sketch of the belief update follows.
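A minimal sketch of the belief-state idea: the standard Bayes-filter update for a finite POMDP (the transition/observation arrays below are made-up placeholders). A "memoryless-in-belief" policy would act on this posterior vector alone.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One Bayes-filter step for a finite POMDP.

    b : (S,)      current belief over states
    T : (A, S, S) T[a, s, s'] = P(s' | s, a)
    O : (A, S, O) O[a, s', o] = P(o | s', a)
    Returns the posterior belief after taking action a and observing o."""
    predicted = b @ T[a]                 # P(s' | b, a)
    posterior = predicted * O[a][:, o]   # weight by observation likelihood
    return posterior / posterior.sum()

# Tiny 2-state example with made-up numbers: one action, two observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1, T=T, O=O))
```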
RL learning: tutorial
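A rough sketch of fitted Q-iteration with quadratic-plus-linear features on a made-up 1-d linear system (my guess at what "add a linear term" refers to; all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-d linear dynamics x' = a*x + b*u with quadratic cost (made-up constants).
a, b, gamma = 0.9, 0.5, 0.95
cost = lambda x, u: x ** 2 + 0.1 * u ** 2
step = lambda x, u: a * x + b * u + 0.05 * rng.standard_normal(np.shape(x))

# Features: quadratic terms plus the extra linear terms x, u, and a constant.
phi = lambda x, u: np.stack([x**2, u**2, x*u, x, u, np.ones_like(x)], axis=-1)

# Dataset of random transitions.
x = rng.uniform(-2, 2, 5000)
u = rng.uniform(-2, 2, 5000)
c, x2 = cost(x, u), step(x, u)

u_grid = np.linspace(-2, 2, 41)           # candidate actions for the min
w = np.zeros(6)
for _ in range(50):
    # Evaluate Q(x', u') for every candidate action, then min over actions.
    q_next = np.stack([phi(x2, np.full_like(x2, ug)) @ w for ug in u_grid])
    targets = c + gamma * q_next.min(axis=0)
    # Least-squares refit of the feature weights.
    w, *_ = np.linalg.lstsq(phi(x, u), targets, rcond=None)

print("fitted weights:", w)
```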
Alexa/NLP experiments
Started programming treeflow.
Minutes
To read, think about
Contextual MDPs: Bellman rank
[AH16]
[SB] Sections 10-13; continue RL tutorial
[LP16] reducing bias for Q-learning
SSGD theory (for planning in POMDPs)
Geometry (12/20)
Contextual MDP paper (spectral)
Restarts in tensor methods
To do over break
Write up notes on RL
Try OpenAI Gym
Read up on AI safety
More reading
Graphical models, exponential families
High-dimensional probability
Convex optimization
SoS
PDE
Alexa
LaTeX blog setup
Neural nets in Haskell