Control theory
Posted: 2016-12-06 , Modified: 2016-12-06
Tags: control theory
where \(\al: [0,\iy)\to A\) is the control, \(r:\R^n\times A\to \R\) is the reward and \(g:\R^n\to \R\) is the terminal reward. The goal is to find the optimal \(\al\). (We can think of \(x\) as a function of \(t, \al, x^0\), \(x(t, \al, x^0)\).)
Example: Economics (investment) - \(x\) is output and \(\al\) is the fraction of output reinvested. \[\begin{align} \dot x &= k\al x\\ x(0)&=x^0\\ P(\al)&=\int_0^T (1-\al(t))x(t)\,dt. \end{align}\] Example: Stopping a train with rockets on both sides - Here \(T\) is not fixed; instead it is \(\tau\), the first time at which the state \(x=\coltwo qv\) (position and velocity) reaches \(0\). \(\al\in [-1,1]\). \[\begin{align} \dot x & = \matt 0100 x + \coltwo 01 \al\\ P(\al) & = -\tau \end{align}\]Let \(C(t)=\set{x}{\exists \al, x(t, \al, x^0)= x}\) and \(C=\bigcup_{t\ge 0} C(t)\).
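A quick numerical sanity check of the investment example (my sketch, not from the notes): the optimal control is known to be bang-bang, reinvesting everything (\(\al=1\)) until a switch time and then consuming (\(\al=0\)); for \(kT>1\) the switch is at \(t^*=T-1/k\). The parameters \(k=1\), \(T=2\) below are arbitrary.

```python
import numpy as np

def payoff(switch_t, k=1.0, T=2.0, x0=1.0, n=20000):
    """Forward-Euler simulation of x' = k*a*x with a bang-bang control:
    reinvest fully (a = 1) before switch_t, consume (a = 0) afterward.
    Returns P = integral of (1 - a(t)) * x(t) dt."""
    dt = T / n
    x, P = x0, 0.0
    for i in range(n):
        a = 1.0 if i * dt < switch_t else 0.0
        P += (1.0 - a) * x * dt
        x += k * a * x * dt
    return P

# With k = 1, T = 2 the switch time t* = T - 1/k = 1 beats both
# always-consuming (switch_t = 0) and always-reinvesting (switch_t = T).
best = payoff(1.0)
```

Switching at \(t^*=1\) gives payoff \(\approx e\): output grows to \(e\) by \(t=1\), then is consumed for one unit of time.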
Consider the linear system, with solution given by variation of parameters: \[\begin{align} \dot x &= Mx + \ub{N\al}{f}\\ X&=e^{tM}\\ x(t) &= X(t) x^0 + X(t) \int_0^t X^{-1}(s) f(s)\,ds. \end{align}\](If \(A=\R^n\), then \(\rank G=n \iff C=\R^n\), where \(G=[N\ MN\ \cdots\ M^{n-1}N]\) is the controllability matrix.)
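The rank condition is easy to check numerically, taking \(G\) to be the usual Kalman controllability matrix \([N\ MN\ \cdots\ M^{n-1}N]\). A sketch using the train/rocket system from above:

```python
import numpy as np

def controllability_matrix(M, N):
    """Kalman matrix G = [N, MN, ..., M^{n-1}N] for x' = Mx + N*alpha."""
    n = M.shape[0]
    blocks = [N]
    for _ in range(n - 1):
        blocks.append(M @ blocks[-1])
    return np.hstack(blocks)

# Train example: q'' = alpha, state x = (q, v).
M = np.array([[0.0, 1.0],
              [0.0, 0.0]])
N = np.array([[0.0],
              [1.0]])
G = controllability_matrix(M, N)
full_rank = np.linalg.matrix_rank(G) == M.shape[0]  # controllable: C = R^2
```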
Suppose we observe \(y=Nx\) where \(N\in \R^{m\times n}\). Think of \(m<n\).
Say the system is observable if, for any two solutions \(x_1,x_2\) of \(\dot x = Mx\), \(Nx_1\equiv Nx_2\) on \([0,t]\) implies \(x_1\equiv x_2\) (the output determines the state).
Duality. \(\dot x = Mx\), \(y=Nx\) is observable iff \(\dot z = M^Tz + N^T \al\), \(\al\in \R^m\) is controllable.
Proof.
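The duality can be illustrated numerically (my example system, not from the notes): the observability matrix of \((M,N)\) is the transpose of the controllability matrix of \((M^T,N^T)\), so one has full rank exactly when the other does.

```python
import numpy as np

def ctrb(M, B):
    """Controllability matrix [B, MB, ..., M^{n-1}B]."""
    n = M.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(M @ blocks[-1])
    return np.hstack(blocks)

def obsv(M, N):
    """Observability matrix [N; NM; ...; NM^{n-1}] for y = Nx."""
    n = M.shape[0]
    blocks = [N]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ M)
    return np.vstack(blocks)

M = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
N = np.array([[1.0, 0.0]])   # observe only the first coordinate

# obsv(M, N) == ctrb(M.T, N.T).T, hence equal ranks:
# (M, N) observable iff the dual system (M.T, N.T) controllable.
dual_ok = np.allclose(obsv(M, N), ctrb(M.T, N.T).T)
```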
Theorem. Any extreme point of the set of admissible controls \(\set{\al:[0,\iy)\to [-1,1]^n}{x(t,\al,x^0)=x}\) satisfies, for a.e. \(t\ge 0\) and each \(i\), \(|\al^i(t)|=1\) (is “bang-bang”). In particular, there always exists a bang-bang solution.
Proof.
For the linear system and \(A=[-1,1]^n\), there exists a time-optimal bang-bang solution. I.e., \(\tau^*=\inf \set{t}{x^0\in C(t)}\) is attained.
Proof. Take \(t_n\to \tau^*\) with controls \(\al_n\). By Banach–Alaoglu the unit ball of \(L^\iy\) is weak-* compact, so a subsequence of the \(\al_n\) converges weak-* to a control \(\al^*\) attaining \(\tau^*\).
Let the reachable set be \(K(t,x^0) = \set{x^1}{\exists \al, x(t, \al, x^0) = x^1}\). It is convex and closed (Pf. Alaoglu).
Theorem. There is \(h\) (depending on \(x^0\), but not on \(t\)) such that the optimal action is \[ \al^*(t) = \amax_{a\in A} [h^T X^{-1}(t)Na]. \]
Proof.
(Take \(p(0) = h\), i.e. \(p(t)^T=h^TX^{-1}(t)\).)
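For the train example, \(h^TX^{-1}(t)N = h_2 - h_1t\), so the optimal control switches sign at most once. A hedged numerical sketch (tolerances arbitrary) using the classical switching-curve feedback for the double integrator, \(\al = -\operatorname{sign}(q + v|v|/2)\):

```python
import numpy as np

def stop_time(q0=1.0, v0=0.0, dt=1e-4, tol=1e-2):
    """Time-optimal double integrator q'' = a, |a| <= 1.
    Bang-bang feedback a = -sign(s), s = q + v|v|/2 (the classical
    switching curve); integrate until (q, v) is near (0, 0)."""
    q, v, t = q0, v0, 0.0
    while np.hypot(q, v) > tol and t < 10.0:  # cap t to avoid looping forever
        s = q + v * abs(v) / 2.0
        a = -np.sign(s) if s != 0 else -np.sign(v)
        v += a * dt          # semi-implicit Euler: update v first
        q += v * dt
        t += dt
    return t

# From (q, v) = (1, 0) the optimal time is tau* = 2: decelerate (a = -1)
# on [0, 1], hit the switching curve at (1/2, -1), accelerate (a = +1)
# on [1, 2].
tau = stop_time()
```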
Let \(L:\R^n\times \R^n\to \R\) (a Lagrangian). Suppose we want to minimize the action \[ \min I[x], \quad I[x] = \int_0^T L(x,\dot x)\,dt. \] Assume that \(p=\nb_v L(x,v)\) can be solved for \(v\). (How important is this?) The solution satisfies the Euler-Lagrange equation \[ \ddd t \ub{[\nb_v L(x^*, \dot x^*)]}{p} = \nb_x L(x^*, \dot x^*). \]
Proof. Consider “differentiating” in the direction \(y:[0,T]\to \R^n\), \(y(0)=y(T) = 0\): let \(i(\tau) = I[x+\tau y]\). Since \(i(\tau)\ge i(0)\), \(i'(0)=0\). \[ i'(0) = \sumo in \int_0^T L_{x_i}(x,\dot x)y_i + L_{v_i} (x,\dot x) \dot y_i\,dt. \] Choose \(y = \psi(t) e_j\). Integrating by parts gives \(L_{x_j} - (L_{v_j})_t=0\).
The solution to EL satisfies the Hamiltonian system: let \(H=p^Tv - L(x, v(x,p))\), \[\begin{align} \dot x &= \nb_p H\\ \dot p &= -\nb_x H. \end{align}\] Proof. \[\begin{align} \nb_x H &= p^T\nb_x v - \nb_x L - (\nb_v L)^T \nb_x v = -\nb_xL\\ \nb_p H &= v(x,p) + p^T \fc{Dv}{Dp} - \nb_p L \\ &= v + p^T \fc{Dv}{Dp} - (\nb_v L)^T\fc{Dv}{Dp}=v=\dot x. \end{align}\](Here \(v\) is implicitly defined in terms of \((x,p)\) as the value such that \(p=\nb_v L(x,v)\); both cancellations use this identity, and \(L\) depends on \(p\) only through \(v\).)
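Hamilton's equations are easy to check numerically. A minimal sketch (my example, not from the notes) for the harmonic oscillator \(H=p^2/2m + x^2/2\), using symplectic Euler so the energy \(H\) stays approximately conserved:

```python
import numpy as np

def hamiltonian_flow(x0, p0, m=1.0, dt=1e-3, steps=10_000):
    """Symplectic Euler for xdot = dH/dp = p/m, pdot = -dH/dx = -x,
    i.e. H(x, p) = p^2/(2m) + x^2/2 (harmonic oscillator)."""
    x, p = x0, p0
    for _ in range(steps):
        p -= dt * x        # pdot = -dH/dx
        x += dt * p / m    # xdot = dH/dp (uses updated p: symplectic)
    return x, p

x, p = hamiltonian_flow(1.0, 0.0)   # exact solution: x = cos t, p = -sin t
energy = (p * p + x * x) / 2.0      # should stay near H(1, 0) = 1/2
```

Symplectic integrators preserve a perturbed Hamiltonian exactly, which is why the energy drift stays bounded instead of growing as it would for plain forward Euler.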
Example: \[\begin{align} L &= \fc{m|v|^2}{2} - V(x)\\ m\ddot x &= -\nb V(x(t))\\ p &= \nb_v L = mv\\ H(x,p) &= \fc{|p|^2}{2m} + V. \end{align}\]
4.2. Constraints create Lagrange multipliers, which contain valuable information. If \(x^*\in \pl R\), \(R=\{g\le 0\}\), \(x^*=\amax_R f\), then \(\nb f(x^*)\) and \(\nb g(x^*)\) are parallel: \(\mu \nb f(x^*) = \la \nb g(x^*)\).
The control-theory Hamiltonian (for dynamics \(\dot x = f(x,\al)\) and running reward \(r\)) is \[ H(x,p,a) = f(x,a)^Tp+r(x,a). \]
(See warning on p. 50.)
Methodology: solve for \(\al(x,p)\), substitute back, solve the DE, then substitute \(x,p\) into the expression for \(\al\). “Feedback controls”: set \(\al(t) = c(t)x(t)\) and write an equation for \(c(t)\). (Cf. eigenfunctions??)
Transversality: adding the condition that the trajectory start in \(X_0\) and end in \(X_1\), we have \(p^*(0)\perp T_0\) and \(p^*(\tau^*)\perp T_1\), where \(T_i\) is the tangent space to \(X_i\) at the corresponding endpoint.
Adding a variable can help (differentiate under the integral sign). Ex. \[ I(\al) = \iiy \redd{e^{-\al x}} \fc{\sin x}{x} \dx,\quad I'(\al) = -\rc{\al^2+1}. \]
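A numerical sanity check (my sketch; truncation point and grid size are arbitrary): the closed form is \(I(\al)=\fc{\pi}2-\arctan\al\), and a central difference reproduces \(I'(1)=-\rc 2\).

```python
import numpy as np

def I(alpha, upper=60.0, n=600_001):
    """Trapezoid-rule approximation of
    I(alpha) = int_0^infty e^(-alpha x) sin(x)/x dx,
    truncated at x = upper (fine for alpha around 1)."""
    x = np.linspace(0.0, upper, n)
    y = np.exp(-alpha * x) * np.sinc(x / np.pi)  # sinc gives sin(x)/x, = 1 at 0
    h = x[1] - x[0]
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))

# Closed form I(alpha) = pi/2 - arctan(alpha); central difference at
# alpha = 1 should give I'(1) = -1/(1^2 + 1) = -1/2.
val = I(1.0)
dI = (I(1.0 + 1e-4) - I(1.0 - 1e-4)) / 2e-4
```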
Fix \(T\). Vary starting time and point: \[ v(x,t) = \sup_{\al \in A} P_{x,t}[\al]. \]
Hamilton-Jacobi-Bellman equation \[\begin{align} v_t + \ub{\max_{a\in A}[f\cdot \nb_x v + r]}{a^*(x,\nb_x v)} &= 0\\ v(x,T) & = g(x). \end{align}\] Proof. Running any control on \([t,t+h]\) gives \[ v(x,t) \ge \int_t^{t+h} r\,ds + v(x(t+h), t+h); \] rearrange, divide by \(h\), and let \(h\to 0\) using the chain rule: \[ v_t + \nb_x v \cdot \dot x + r\le 0. \] Now take the max over \(a\in A\). Equality is attained at the optimal \(\al^*\).
General procedure:
For the linear-quadratic problem the value function is quadratic in \(x\), \(v(x,t)=x^TK(t)x\), where \(K\) satisfies the matrix Riccati equation.
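A scalar sketch (my example, not from the notes): for \(\dot x = x+\al\) with cost \(\int (x^2+\al^2)\,dt\), the value is \(v=K(t)x^2\) with \(\dot K = -(2K-K^2+1)\), \(K(T)=0\). Integrating backward in time, \(K(0)\) approaches the positive root \(1+\sqrt 2\) of the algebraic Riccati equation \(K^2-2K-1=0\).

```python
import numpy as np

def K0(a=1.0, b=1.0, q=1.0, r=1.0, T=10.0, n=100_000):
    """Integrate the scalar Riccati equation for xdot = a x + b alpha,
    cost = int (q x^2 + r alpha^2) dt:
        K' = -(2 a K - b^2 K^2 / r + q),  K(T) = 0.
    Forward Euler in reversed time s = T - t; returns K(0)."""
    dt = T / n
    K = 0.0
    for _ in range(n):
        K += dt * (2*a*K - b*b*K*K/r + q)
    return K

# For a = b = q = r = 1, the horizon T = 10 is long enough that K(0)
# is essentially the infinite-horizon (algebraic Riccati) value.
K = K0()
```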
5.3. HJ equations…