Matrix perturbation
Posted: 2016-08-30 , Modified: 2016-08-30
Tags: matrices
(Theorem 4.8 here)
Original paper (In a different form here—how does it connect?)
Distance between eigenvectors.
Theorem 1 here, Theorem V.3.6 in Stewart and Sun
Let \(\Si,\wh \Si\) be symmetric with eigenvalues \(\la_1\ge \cdots \ge \la_p\), \(\wh \la_1\ge \cdots \ge \wh \la_p\) corresponding to eigenvectors \(v_i,\wh v_i\). Fix \(1\le r\le s\le p\) and let \(V = (v_r,\ldots, v_s)\), \(\wh V = (\wh v_r, \ldots, \wh v_s)\). Suppose \(\wh \la_{r-1}>\la_r \ge \la_s >\wh \la_{s+1}\), \(\de = \min (\wh \la_{r-1}-\la_r, \la_s - \wh \la_{s+1})\). (Let \(\wh \la_0=\iy, \wh \la_{p+1}=-\iy\).) Then \[\ve{\sin \Te(\wh V, V)}_F \le \fc{\ve{\wh \Si - \Si}_F}{\de}.\] (Here \(\Te(\wh V, V)\) is the diagonal matrix of canonical angles between the column spaces of \(\wh V\) and \(V\), with \(\sin\) applied entrywise; when there is a single angle, the LHS is just the sine of the angle between the spaces.)
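A quick numerical sanity check of the \(\sin\Te\) bound (a sketch of mine, not from the references): the cosines of the canonical angles are the singular values of \(\wh V^{\dagger}V\), so \(\ve{\sin\Te(\wh V,V)}_F\) can be computed directly. The helper `sin_theta_F` and the test matrices are hypothetical.

```python
import numpy as np

def sin_theta_F(V, Vh):
    # Cosines of canonical angles = singular values of Vh^T V;
    # ||sin Theta||_F = sqrt(sum of squared sines).
    cos = np.linalg.svd(Vh.T @ V, compute_uv=False)
    return np.sqrt(np.sum(1 - np.clip(cos, 0, 1) ** 2))

rng = np.random.default_rng(0)
p, r, s = 8, 3, 5                        # take eigenvectors r..s (1-indexed)
M = rng.standard_normal((p, p)); Sigma = (M + M.T) / 2
N = rng.standard_normal((p, p)); E = 1e-3 * (N + N.T) / 2
Sigma_h = Sigma + E

w, U = np.linalg.eigh(Sigma)             # eigh sorts ascending; flip
w, U = w[::-1], U[:, ::-1]
wh, Uh = np.linalg.eigh(Sigma_h)
wh, Uh = wh[::-1], Uh[:, ::-1]

V, Vh = U[:, r - 1:s], Uh[:, r - 1:s]
delta = min((wh[r - 2] if r > 1 else np.inf) - w[r - 1],
            w[s - 1] - (wh[s] if s < p else -np.inf))
lhs = sin_theta_F(V, Vh)
rhs = np.linalg.norm(Sigma_h - Sigma, 'fro') / delta
print(lhs, rhs, lhs <= rhs)              # the bound should hold
```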
References are to Stewart and Sun.
First, define the canonical angles and relate them to the projection matrices. The largest canonical angle can be thought of as the distance between the two subspaces; however, a single angle doesn’t characterize the relative position of the two subspaces. There are as many canonical angles as the dimension of the subspaces.
Definition (Theorem I.5.2). Let \(X_1,Y_1\in \C^{n\times l}\) have orthonormal columns, \(X_1^{\dagger}X_1=I\), \(Y_1^{\dagger}Y_1=I\). Let \(\si_1\ge \cdots \ge \si_l\) be the singular values of \(X_1^{\dagger}Y_1\). The canonical angles between the column spaces of \(X_1\) and \(Y_1\) are \(\te_i = \cos^{-1}\si_i\), and \(\Te(X_1,Y_1) = \diag(\te_1,\ldots,\te_l)\).
Proof. The CS decomposition block-diagonalizes a unitary matrix \(W\) with block-diagonal unitary matrices: \(\diag(U_{11},U_{22})^{\dagger} W \diag(V_{11},V_{22}) =\mattn{\Ga}{-\Si}O{\Si}{\Ga}OOOI\) where \(U_{11}\) is \(l\times l\), \(2l\le n\), and the partition is \(l,l,n-2l\). Use the CS decomposition on \(X^{\dagger}Y\), where \(X=(X_1\ X_2)\), \(Y=(Y_1\ Y_2)\) are unitary completions; then \(\Ga = \cos\Te\) and \(\Si = \sin\Te\).
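To see the CS decomposition concretely, here is a small sketch (assuming SciPy \(\ge\) 1.5, which provides `scipy.linalg.cossin`); the partition sizes \(p=q=l\) match the statement above.

```python
import numpy as np
from scipy.linalg import cossin, qr

rng = np.random.default_rng(1)
n, l = 7, 2                              # partition l, l, n-2l as above
W, _ = qr(rng.standard_normal((n, n)))   # a random orthogonal matrix

# W = diag(U11, U22) @ D @ diag(V11, V22)^H, where D holds the
# cosine (Gamma) and sine (Sigma) blocks.
u, cs, vdh = cossin(W, p=l, q=l)
print(np.allclose(u @ cs @ vdh, W))      # True
print(np.round(cs, 3))                   # Gamma / +-Sigma / I structure
```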
Use the CS decomposition to prove the following.
Lemma. Let \(P_X=XX^{\dagger}\), \(P_Y=YY^{\dagger}\) be the projections. Let \(k=\min \{l,n-l\}\). Then the \(k\) largest singular values of \((I-P_X)P_Y\) are the sines of the \(k\) largest canonical angles between the column spaces of \(X\) and \(Y\), and the remaining singular values are \(0\). In particular, \(\ve{\sin\Te(X,Y)}_F = \ve{(I-P_X)P_Y}_F\).
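A sketch checking the lemma numerically (numpy assumed): the top singular values of \((I-P_X)P_Y\) should match the sines of the canonical angles obtained from the SVD of \(X^{\dagger}Y\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, l = 6, 2
X, _ = np.linalg.qr(rng.standard_normal((n, l)))   # orthonormal basis
Y, _ = np.linalg.qr(rng.standard_normal((n, l)))

P_X, P_Y = X @ X.T, Y @ Y.T
sv = np.linalg.svd((np.eye(n) - P_X) @ P_Y, compute_uv=False)

cos = np.linalg.svd(X.T @ Y, compute_uv=False)     # cosines of canonical angles
sin = np.sqrt(1 - np.clip(cos, 0, 1) ** 2)

print(np.round(sv[:l], 6))                # top l singular values
print(np.round(np.sort(sin)[::-1], 6))    # the sines, largest first
```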
Consider the linear operator \(T:X\mapsto AX-XB\) (Sylvester's equation). If \(A,B\) are Hermitian and their spectra are separated by \(\de>0\), i.e. \(\min_{i,j}|\la_i(A)-\la_j(B)|\ge \de\), then \(T\) is invertible and \(\ve{X}_F\le \ve{AX-XB}_F/\de\).
Relate the canonical angles to Sylvester's equation: the \(\sin\Te\) theorem follows by applying this bound to the cross terms between the two eigenbases (as in the computation below).
(Can be generalized to other norms. More involved: need to show existence of a fixed point by a contractive iteration. V.2.1, V.2.11.)
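In the Frobenius norm the bound can be checked directly. A minimal sketch, assuming `scipy.linalg.solve_sylvester` (which solves \(AX+XB=C\), hence the sign flip on \(B\)):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(3)
A = np.diag([5.0, 6.0, 7.0])             # spectra separated by delta = 3
B = np.diag([1.0, 2.0])
delta = 5.0 - 2.0                        # min |lambda_i(A) - lambda_j(B)|

C = rng.standard_normal((3, 2))
X = solve_sylvester(A, -B, C)            # solves A X - X B = C

lhs = np.linalg.norm(X, 'fro')
rhs = np.linalg.norm(C, 'fro') / delta
print(lhs, rhs, lhs <= rhs)              # ||X||_F <= ||AX - XB||_F / delta
```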
Theorem: Let \(A\) have eigenvalues and eigenvectors \(\la_i, v_i\), and \(\wh A = A+E\) have eigenvalues and eigenvectors \(\wh \la_i, \wh v_i\). Fix \(s\le t\le p\) and suppose \(\la_s-\ve{E}_2>\la_{t+1}\). Then \[\sin \angle' (\spn (\wh v_1, \ldots, \wh v_s), \spn (v_1,\ldots, v_{t})) \le \fc{\ve{E}_2}{\la_s-\ve{E}_2 - \la_{t+1}}\] where \(\angle'(U,V)\) is asymmetrically defined as \[ \max_{u\in U}\min_{v\in V} \angle(u,v). \]
Proof. Note that \(X, Y\) don’t have to have the same number of columns in the lemma above.
Consider \((I-P_X)P_Y\) where \(X\) has \(m\) columns and \(Y\) has \(l\), with \(m>l\). Append to \(Y\) \(m-l\) orthonormal columns lying in \(\spn X \cap (\spn Y)^{\perp}\) (possible since this space has dimension \(\ge m-l\)); these columns are killed by \(I-P_X\), so we can apply the lemma with \(X,Y\) of the same dimension, noting we have \(m-l\) extra \(0\)'s among the singular values. These angles are now interpreted as the angles between \(X\) and the projection of \(X\) to \(Y\). (Abusing notation and identifying subspaces with matrices.)
Now the rest of the proof goes as in 3.4. We include the perturbation part. Since \(A X_{2.2} = X_{2.2} L_{2.2}\), where \(L_{2.2}\) is the diagonal matrix of eigenvalues \(\la_{t+1},\ldots,\la_p\), we have \(X_{2.2}^{\dagger} A = L_{2.2} X_{2.2}^{\dagger}\).
For simplicity consider \(A\) symmetric. (Otherwise, consider \(\matt OA{A^{\dagger}}O\).) Let \(\wh A=A+E\) be diagonalized by \(\wh X_1,\wh X_2\) (first \(s\) eigenvectors, the rest). Let \(A\) be diagonalized by \(X_1,X_{2.1},X_{2.2}\) (first \(s\), next \(t-s\), others). We have \[\begin{align} L_{2.2}X_{2.2}^{\dagger} \wh X_1 - X_{2.2}^{\dagger}\wh X_1 \wh X_1^{\dagger} (A+E) \wh X_1 &= X_{2.2}^{\dagger} (A\wh X_1 - (A+E) \wh X_1) \\ &=X_{2.2}^{\dagger} (-E\wh X_1). \end{align}\] The spectra of \(L_{2.2}\) and \(\wh X_1^{\dagger}(A+E)\wh X_1\) are \(\{\la_{t+1},\ldots,\la_p\}\) and \(\{\wh\la_1,\ldots,\wh\la_s\}\), so the Sylvester bound together with Weyl's inequality \(\wh\la_s \ge \la_s - \ve{E}_2\) gives \[\ve{X_{2.2}^{\dagger}\wh X_1} \le \fc{\ve{E}}{\wh\la_s-\la_{t+1}} \le \fc{\ve{E}_2}{\la_s - \ve{E}_2-\la_{t+1}}.\] Since \(\ve{X_{2.2}^{\dagger}\wh X_1}_2 = \sin\angle'(\spn \wh X_1, \spn(X_1, X_{2.1}))\), this is the desired bound. (Davis-Kahan for \(r=s=1\).)
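A numerical check of the theorem (my sketch, not from the source; it uses \(\sin\angle'(\spn \wh U_s, \spn V_t) = \ve{(I-P_{V_t})\wh U_s}_2\) for an orthonormal basis \(\wh U_s\)):

```python
import numpy as np

rng = np.random.default_rng(5)
n, s, t = 8, 2, 4
A = np.diag(np.arange(n, 0, -1.0))       # eigenvalues n, n-1, ..., 1
M = rng.standard_normal((n, n)); E = 1e-2 * (M + M.T) / 2

w, U = np.linalg.eigh(A);       w, U = w[::-1], U[:, ::-1]
wh, Uh = np.linalg.eigh(A + E); wh, Uh = wh[::-1], Uh[:, ::-1]

V_t, Uh_s = U[:, :t], Uh[:, :s]
# sin angle'(span Uh_s, span V_t) = ||(I - P_{V_t}) Uh_s||_2
sin_ang = np.linalg.norm(Uh_s - V_t @ (V_t.T @ Uh_s), 2)

E2 = np.linalg.norm(E, 2)
bound = E2 / (w[s - 1] - E2 - w[t])      # lambda_s - ||E||_2 - lambda_{t+1}
print(sin_ang, bound, sin_ang <= bound)
```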
Let \(v_1(A)\) be the top eigenvector of \(A\). If \(\de=|\la_1(A)-\la_2(A)|\), then \(\sin(\angle (v_1(A), v_1(A+E)))\le \fc{\ve{E}_2}{\de}\).
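A sketch checking this special case for a small random perturbation (numpy assumed; when \(\ve{E}_2\) is comparable to \(\de\) the bound degrades):

```python
import numpy as np

def top_eigvec(M):
    w, U = np.linalg.eigh(M)             # ascending order
    return w, U[:, -1]

rng = np.random.default_rng(4)
n = 10
M = rng.standard_normal((n, n)); A = (M + M.T) / 2
N = rng.standard_normal((n, n)); E = 1e-3 * (N + N.T) / 2

w, v1 = top_eigvec(A)
_, v1h = top_eigvec(A + E)
delta = w[-1] - w[-2]                    # gap lambda_1 - lambda_2

sin_angle = np.sqrt(max(0.0, 1 - (v1 @ v1h) ** 2))
bound = np.linalg.norm(E, 2) / delta
print(sin_angle, bound, sin_angle <= bound)
```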