Linear quadratic regulator

Posted: 2016-12-09 , Modified: 2016-12-09

Solve the Hamiltonian
The algebraic Riccati equation
Solve the Riccati equation
Choose the stable solution

We derive the solution to the linear quadratic regulator problem in the continuous, infinite-horizon setting. Handwavy parts are in italics.

References: Ch. 5 of course notes and Riccati equation paper. Wikipedia on LQR and Riccati.

The problem:

\[\begin{align} \dot x & = Ax + Ba = f(x,a)\\ r(x,a) & = -c(x,a) = -\rc2 x^TQx - a^Ta\\ \pat{Total reward} &= \iiy r(x(t), a(t))\,dt. \end{align}\]

Here \(Q\) is positive definite. Write \(Q=C^TC\).

Solve the Hamiltonian

We write down the corresponding Hamiltonian system and then solve it. (See Control theory Section 5.) \[\begin{align} H(x,a) &= f(x,a)^T p + r(x,a) = f(x,a)^T p - \rc2 x^TQx - a^Ta\\ \dot x &= \nb_p H(x(t),p(t),a(t)) = Ax + Ba \\ \dot p &= -\nb_x H(x(t),p(t),a(t)) = -A^T p + Qx\\ a &= \amax_a H(x(t),p(t),a(t))\\ \iff 0 &= \nb_a H(x(t),p(t),a(t))\\ \iff B^T p - a &=0\\ \iff a &= B^T p \end{align}\] We get \[\begin{align} \dot x &= Ax + BB^Tp\\ \dot p &= -Qx + A^Tp\\ \ddd t \coltwo xp &= \matt{A}{BB^T}{-Q}{A^T} \coltwo xp. \end{align}\]

The algebraic Riccati equation

Look for a solution in the form \(p=-Px\). (Note that in the finite-horizon setting, solving the Hamiltonian system with the appropriate boundary condition \(p(T)=0\) gives that the solution is \(p=-P(t)x\) where \(P(t)\) satisfies a differential equation called the Matrix Riccati differential equation. In the infinite-horizon setting, \(P\) doesn’t change with time and we get a single \(P\).)

Substituting \(p=-Px\), the system becomes \[\begin{align} -(PA - PBB^TP) x &= -P \dot x = \dot p = (Q+A^TP)x\\ -PBB^TP + PA + A^TP + Q &= 0 \end{align}\]

This is called the algebraic Riccati equation.

Solve the Riccati equation

Write the Riccati equation as (set \(Q=C^TC\)) \[ (-P\; I) \matt{A}{-BB^T}{-CC^T}{-A^T}\coltwo IP = 0 \] Let \(K=A-BB^TP\). This is \[ \ub{\matt{A}{-BB^T}{-C^TC}{-A^T}}M\coltwo IP = \coltwo K{PK} = \coltwo IP K. \] Assume \(K\) is diagonalizable. Write \(K=XDX^{-1}\). \[\begin{align} M\coltwo IP &= \coltwo X{PX} DX^{-1}\\ M\coltwo{X}{PX} &= \coltwo{X}{PX}D. \end{align}\]

Hence the columns of \(\coltwo{X}{PX}\) are eigenvalues of \(M\).

So the \(P\) that solve the algebraic Riccati equation are \(P=YX^{-1}\) where the columns of \(\coltwo XY\) are \(n\) of the eigenvectors of \(M\).

Choose the stable solution

First some definitions.

\(\la\) is stable if \(\Re \la<0\). \(A\) is stable if all eigenvalues of \(A\) are stable. (This means that any solution to \(\dot x = Ax\) has \(x(t)\to 0\).)
\(\la\) is a controllable eigenvalue for \((A,B)\) if for all row eigenvectors of \(A\) corresponding to \(\la\) (\(wA=\la w, w\ne 0\)), \(wB\ne 0\). \((A,B)\) is controllable if all its eigenvalues are controllable. (Equivalently, \((A^T,B^T)\) is observable.)
\((A,C)\) is observable if for all (column) eigenvectors of \(A\), \(Cz\ne 0\).
\((A,B)\) is stabilizable if every unstable eigenvalue is controllable. (This is true for e.g. random matrices, because w.p. 1 a random vector is not in the left nullspace of \(B\).) Equivalently, there exists \(L\) (with real entries) such that \(A+BL\) is stable.

Theorem. The following are equivalent.

There exists a stable solution \(P\) to the algebraic Riccati equation, giving the solution \(p = -Px\), \(\boxed{a = B^Tp = -B^TPx}\).
All eigenvalues satisfy \(\Re \la \ne 0\) and \((A,B)\) is stabilizable.

The stable solution is unique.

Proof.

From the previous section, a solution \(P\) has to be in the form \(Y^{-1}X\) where \(X, Y\) are such that \(\coltwo XY\) are the eigenvectors corresponding to the stable eigenvalues. Note that the eigenvectors come in \(\pm\) pairs because letting \(J=\matt O{-I}{I}O\), \(M^T = J M J\), \(M^T \coltwo{-y}x = -JM\coltwo xy\). (\(M\) is a Hamiltonian matrix, i.e. \(JM\) is symmetric.) So exactly \(n\) of the \(2n\) eigenvalues are stable, which is the unique choice for \(\coltwo XY\) up to permutation.

\(\Rightarrow\): Take \(L=-B^TP\). Then \(A+B(-B^TP)=K\) is stable. (Its eigenvalues are exactly the eigenvalues of \(M\).)
\(\Leftarrow\): We just have to check that \(Y\) is invertible, which is implied by the stabilizable condition. Proof omitted.