Linear convex regulator 2

Posted: 2017-01-10 , Modified: 2017-01-10

Tags: none

Part 1

Fix \(x_1\). Let the cost function, jointly convex in \((x,a)\), be \(c(x,a)\) (for example, \(g(x) + \ve{a}^2\)), let the actions be \(a_1,a_2,\ldots\), and let the dynamics be given by the recurrence \(x_{n+1} = Ax_n + Ba_n\). We want

\[\begin{align} &\quad \min_{a_1} [c(x_1,a_1) + \min_{a_2} [c(Ax_1+Ba_1,a_2) + ...]] \\ &= \min_{a_1,...,a_T} c(x_1,a_1) + c(Ax_1+Ba_1,a_2) + ... \end{align}\]

This objective is jointly convex because we are assuming \(c\) is convex, and a convex function composed with a linear function is convex. Letting \(f_a(x) = Ax + Ba\) be the function describing the dynamics and \(f_{a_1\cdots a_n} (x) = f_{a_n}\circ \cdots \circ f_{a_1}(x)\), we can write this as \[ \min_{a_1,...,a_T} \ub{c(x_1,a_1) + c(f_{a_1}(x_1),a_2) + c(f_{a_1a_2} (x_1),a_3)+\cdots}{F(a_1,\ldots, a_T)} \]
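For concreteness, here is a minimal sketch in Python/JAX of the unrolled objective \(F\). The dimensions, the matrices \(A,B\), and the particular quadratic cost \(c(x,a)=\ve{x}^2+\ve{a}^2\) are illustrative assumptions, not part of the setup above.

```python
import jax.numpy as jnp

n, T = 3, 10                       # state/action dimension and horizon (assumed)
A = 0.9 * jnp.eye(n)               # assumed dynamics matrices
B = 0.1 * jnp.eye(n)
x1 = jnp.ones(n)                   # fixed initial state

def cost(x, a):
    # c(x, a) = g(x) + ||a||^2 with g(x) = ||x||^2 (an illustrative convex choice)
    return jnp.sum(x**2) + jnp.sum(a**2)

def F(actions):
    """Unrolled objective F(a_1,...,a_T): roll the dynamics forward from x1,
    accumulating c(x_t, a_t) along the trajectory."""
    x, total = x1, 0.0
    for a in actions:              # actions has shape (T, n)
        total = total + cost(x, a)
        x = A @ x + B @ a          # x_{t+1} = A x_t + B a_t
    return total
```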

Convex optimization

What guarantees do we get from convex optimization? We can consider different settings:

Full information

This is just gradient descent. The decision variable \((a_1,\ldots,a_T)\) has dimension \(Tn\); by the chain rule the gradient is a sum of gradients of the \(c\)'s (see the sketch below). If \(c\) is stochastic, then this is SGD.
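Continuing the sketch above, full-information gradient descent on \(F\) could look as follows; the step size and iteration count are arbitrary choices.

```python
from jax import grad

grad_F = grad(F)                   # gradient of F, a sum of gradients of the c's
actions = jnp.zeros((T, n))        # decision variable of dimension T*n
eta = 0.05                         # step size (not tuned)
for _ in range(200):
    actions = actions - eta * grad_F(actions)
print(F(actions))                  # objective value after gradient descent
```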

See only costs of sampled trajectories

Theorem 6.6 in OCO: online gradient descent without a gradient attains the regret bound \[ \E R_T \le 9nDGT^{\fc 34}, \] where \(n\) is the dimension, \(D\) is the diameter of the decision set, and \(G\) is the Lipschitz/boundedness constant of the cost functions.

Applying this here, the decision variable \((a_1,\ldots,a_T)\) has dimension \(Tn\), so we get a bound of the form \(\E R_T \le 9(Tn)DGT^{\fc 34}\); a sketch of the gradient-free update is below.
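Continuing the sketch above, one way to implement "gradient descent without a gradient" is the one-point sphere-sampling estimator from the FKM-style analysis: perturb the current point, observe only the cost of the perturbed trajectory, and use it to form a gradient estimate. The smoothing radius \(\delta\) and step size \(\eta\) below are illustrative, and the projection onto a bounded decision set is omitted.

```python
from jax import random

key = random.PRNGKey(0)
actions = jnp.zeros((T, n))
delta, eta = 0.1, 0.01             # smoothing radius and step size (not tuned)
dim = T * n                        # dimension of the decision variable
for t in range(500):
    key, sub = random.split(key)
    u = random.normal(sub, (T, n))
    u = u / jnp.linalg.norm(u)     # uniform direction on the unit sphere
    fval = F(actions + delta * u)  # only feedback: cost of one sampled trajectory
    g_hat = (dim / delta) * fval * u   # one-point gradient estimate
    actions = actions - eta * g_hat    # projection step onto a bounded set omitted
```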

Unknown dynamics

Unknown dynamics introduce an exploration/exploitation tradeoff: we must balance learning the dynamics against choosing the best action given the information so far. Think of each linearly independent \(a\) as giving more information about the dynamics; a least-squares identification sketch is below.
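Continuing the sketch above, one simple way to learn unknown \(A,B\) is to play exploratory actions and fit the observed transitions by least squares (plain system identification, not a full exploration/exploitation scheme).

```python
from jax import random

key = random.PRNGKey(1)
xs, zs = [], []
x = x1
for t in range(50):
    key, sub = random.split(key)
    a = random.normal(sub, (n,))           # exploratory (random) action
    zs.append(jnp.concatenate([x, a]))     # regressor z_t = [x_t; a_t]
    x = A @ x + B @ a                      # observed next state (noiseless here)
    xs.append(x)
Z = jnp.stack(zs)                          # shape (N, 2n)
X_next = jnp.stack(xs)                     # shape (N, n)
W, _, _, _ = jnp.linalg.lstsq(Z, X_next)   # solves Z W ~ X_next, so W = [A B]^T
A_hat, B_hat = W[:n].T, W[n:].T            # recovered estimates of A and B
```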

Notes

Noisy dynamics

The problem changes when there is noise in the dynamics. \[ x_{n+1} = Ax_n+Ba_n + \ep_n. \]
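Continuing the sketch above, with noisy dynamics the cost of a fixed action sequence is random; one can estimate its expectation by Monte Carlo over rollouts. The Gaussian noise and its scale \(\sigma\) are assumptions for illustration.

```python
from jax import random

def F_noisy(actions, key, sigma=0.1):
    """Cost of one noisy rollout: x_{t+1} = A x_t + B a_t + eps_t."""
    x, total = x1, 0.0
    for a in actions:
        total = total + cost(x, a)
        key, sub = random.split(key)
        x = A @ x + B @ a + sigma * random.normal(sub, (n,))
    return total

actions = jnp.zeros((T, n))                # a fixed action sequence
keys = random.split(random.PRNGKey(2), 100)
est = jnp.mean(jnp.array([F_noisy(actions, kk) for kk in keys]))   # estimate of E[F]
```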

References

Might be useful.