Weekly summary 2016-12-17
Posted: 2016-12-17, Modified: 2016-12-17
Tags: reinforcement learning, control theory
Last week
RL project
LQR
Understanding
Why \(\pi_t(x) = -P_t(x)\), i.e. why the optimal policy is linear in the state, and why there is no closed form in general. (What is the Hamiltonian?)
Fitted Q-iteration (a rough sketch on a toy linear system appears at the end of this section)
Add a linear term
Differential equation for \(P\) (the Riccati equation; written out just below this list)
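For reference, the standard LQR facts behind the items above (textbook form; this is my restatement, so the notation may not match the notes): with dynamics \(x_{t+1} = Ax_t + Bu_t\) and cost \(\sum_t (x_t^\top Q x_t + u_t^\top R u_t)\), the value function stays quadratic, \(V_t(x) = x^\top P_t x\), the backward recursion is
\[
P_t = Q + A^\top P_{t+1} A - A^\top P_{t+1} B\,(R + B^\top P_{t+1} B)^{-1} B^\top P_{t+1} A,
\]
and the optimal policy is linear, \(\pi_t(x) = -K_t x\) with \(K_t = (R + B^\top P_{t+1} B)^{-1} B^\top P_{t+1} A\). In continuous time the recursion becomes the Riccati differential equation
\[
-\dot P = A^\top P + PA - PBR^{-1}B^\top P + Q.
\]
The closed form exists because the quadratic family is preserved by the Bellman backup; for general costs or nonlinear dynamics no finite-dimensional family is preserved, which is why there is no closed form in general.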
Remaining questions
Prove convergence (check w/ linear term)
Hamiltonian: proof via Pontryagin's maximum principle (the statement is written out below)
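On the Hamiltonian question, the standard statement from Pontryagin's maximum principle, written here for a minimization problem (sign conventions vary). For \(\dot x = f(x,u)\) with running cost \(\ell(x,u)\),
\[
H(x,u,\lambda) = \ell(x,u) + \lambda^\top f(x,u), \qquad \dot\lambda = -\partial_x H, \qquad u^*(t) \in \arg\min_u H(x,u,\lambda).
\]
For LQR, \(\ell = x^\top Q x + u^\top R u\) and \(f = Ax + Bu\), so \(u^* = -\tfrac12 R^{-1} B^\top \lambda\); plugging in the ansatz \(\lambda = 2Px\) recovers the Riccati differential equation above.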
Approaches
Thinking of the minimum as a circuit: doesn't work.
Local steps using a quadratic approximation: how to analyze?
Using softmax and sampling; cf. [AH16]. (How to collapse multiple? cf. Toda.) A rough sketch of the sampling step follows this list.
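A minimal sketch of the softmax-and-sampling idea combined with a local quadratic model (my own illustration, not the construction from [AH16]; the cost function and all constants below are made up):

```python
import numpy as np

def softmax_sample(scores, beta=1.0, rng=None):
    """Sample an index with probability proportional to exp(-beta * score).

    Lower scores (costs) are more likely; large beta approaches the argmin."""
    rng = rng or np.random.default_rng()
    logits = -beta * np.asarray(scores)
    logits -= logits.max()            # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(scores), p=probs)

# Example: local quadratic approximation of a 1-d cost around x0,
# scored on a grid of candidate steps (all names/values are illustrative).
f = lambda x: np.cos(3 * x) + 0.1 * x ** 2
x0, h = 0.5, 1e-3
g = (f(x0 + h) - f(x0 - h)) / (2 * h)              # finite-difference gradient
H = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h ** 2    # finite-difference curvature
steps = np.linspace(-1.0, 1.0, 41)
quad_scores = f(x0) + g * steps + 0.5 * H * steps ** 2
i = softmax_sample(quad_scores, beta=5.0)
print("sampled step:", steps[i])
```

The temperature \(1/\beta\) trades off exploration against greediness of the local step.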
Planning in POMDPs
Collaboration
See list of papers.
Idea: relax memoryless POMDPs, e.g. to policies depending only on posterior probabilities (belief states)? A sketch of the belief update follows.
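A minimal sketch of the belief-state idea: the standard Bayes-filter update for a finite POMDP (the transition/observation arrays below are made-up placeholders). A "memoryless-in-belief" policy would act on this posterior vector alone.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One Bayes-filter step for a finite POMDP.

    b : (S,)      current belief over states
    T : (A, S, S) T[a, s, s'] = P(s' | s, a)
    O : (A, S, O) O[a, s', o] = P(o | s', a)
    Returns the posterior belief after taking action a and observing o."""
    predicted = b @ T[a]                 # P(s' | b, a)
    posterior = predicted * O[a][:, o]   # weight by observation likelihood
    return posterior / posterior.sum()

# Tiny 2-state example with made-up numbers: one action, two observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1, T=T, O=O))
```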
RL learning: tutorial
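A rough sketch of fitted Q-iteration with quadratic-plus-linear features on a made-up 1-d linear system (my guess at what "add a linear term" refers to; all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-d linear dynamics x' = a*x + b*u with quadratic cost (made-up constants).
a, b, gamma = 0.9, 0.5, 0.95
cost = lambda x, u: x ** 2 + 0.1 * u ** 2
step = lambda x, u: a * x + b * u + 0.05 * rng.standard_normal(np.shape(x))

# Features: quadratic terms plus the extra linear terms x, u, and a constant.
phi = lambda x, u: np.stack([x**2, u**2, x*u, x, u, np.ones_like(x)], axis=-1)

# Dataset of random transitions.
x = rng.uniform(-2, 2, 5000)
u = rng.uniform(-2, 2, 5000)
c, x2 = cost(x, u), step(x, u)

u_grid = np.linspace(-2, 2, 41)           # candidate actions for the min
w = np.zeros(6)
for _ in range(50):
    # Evaluate Q(x', u') for every candidate action, then min over actions.
    q_next = np.stack([phi(x2, np.full_like(x2, ug)) @ w for ug in u_grid])
    targets = c + gamma * q_next.min(axis=0)
    # Least-squares refit of the feature weights.
    w, *_ = np.linalg.lstsq(phi(x, u), targets, rcond=None)

print("fitted weights:", w)
```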
Alexa/NLP experiments
Started programming treeflow.
Minutes
To read, think about
Contextual MDPs: Bellman rank
[AH16]
[SB] Sections 10-13; continue RL tutorial
[LP16] reducing bias for Q-learning
SSGD theory (for planning in POMDPs)
Geometry (12/20)
Contextual MDP paper (spectral)
Restarts in tensor methods
To do over break
Write up notes on RL
Try OpenAI Gym
Read up on AI safety
More reading
Graphical models, exponential families
High-dimensional probability
Convex optimization
SoS
PDE
Alexa
LaTeX blog setup
Neural nets in Haskell