Matrix & Vector Techniques

1. Quadratic Form as Matrix Representation

This representation is often useful when finding the maximum and minimum eigenvalues of a symmetric matrix, as shown below.

$$\begin{aligned} a x^2 + 2 b x y + c y^2 &= \left(\begin{array}{cc} x & y \end{array}\right) \left(\begin{array}{cc} a & b \\ b & c \end{array}\right) \left(\begin{array}{c} x \\ y \end{array}\right) \\\\ a x^2 + b y^2 + c z^2 + 2 d x y + 2 e x z + 2 f y z &= \left(\begin{array}{ccc} x & y & z \end{array}\right) \left(\begin{array}{ccc} a & d & e \\ d & b & f \\ e & f & c \end{array}\right) \left(\begin{array}{c} x \\ y \\ z \end{array}\right) \end{aligned}$$

A symmetric matrix is usually used as above, but note that the matrix does not have to be symmetric. Other matrices can represent the same quadratic form; the symmetric matrix is simply convenient because of its useful properties. Even when a symmetric matrix is not available, this form can still be derived. More specifically, the general representation is as follows.

$$\begin{aligned} a x^2 + b y^2 + c z^2 + 2 d x y + 2 e x z + 2 f y z &= \left(\begin{array}{ccc} x & y & z \end{array}\right) M \left(\begin{array}{c} x \\ y \\ z \end{array}\right), \quad M = \left(\begin{array}{ccc} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{array}\right) \\\\ \implies m_{11} &= a, \quad m_{22} = b, \quad m_{33} = c, \quad m_{12} + m_{21} = 2d, \quad m_{13} + m_{31} = 2e, \quad m_{23} + m_{32} = 2f \end{aligned}$$

From now on, let $A$ be the symmetric matrix from the representations above. Consider an eigenvector $v$ and an eigenvalue $\lambda$ of $A$, which means $Av = \lambda v$. Then, $v^t A v = v^t \lambda v = \lambda v^t v$. Therefore,

$$\lambda = \cfrac{v^t A v}{v^t v}$$

Especially, when $\Vert v \Vert = 1$, $\lambda = v^t A v$. As such, the maximum and minimum eigenvalues can be obtained from $v^t A v$ over unit vectors $v$. Meanwhile, just as a quadratic expression can be rotated to remove its cross-terms such as $xy$, diagonalizing the matrix acts as deleting these cross-terms. From this perspective comes the principal axis theorem, which transforms the quadratic forms introduced at the beginning. Let $\lambda_i$ be the eigenvalues of the symmetric matrix $A$. Then, the transformation is as follows, where the variables on the right-hand side are the coordinates along the rotated (principal) axes.

$$\begin{aligned} a x^2 + 2 b x y + c y^2 = k &\implies \lambda_1 x^2 + \lambda_2 y^2 = k \\ a x^2 + b y^2 + c z^2 + 2 d x y + 2 e x z + 2 f y z = k &\implies \lambda_1 x^2 + \lambda_2 y^2 + \lambda_3 z^2 = k \end{aligned}$$
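As a quick numerical check, the following minimal sketch (assuming numpy and arbitrary coefficient values) builds the symmetric matrix of a quadratic form and compares the extreme values of $v^t A v$ over unit vectors with the eigenvalues.

```python
import numpy as np

# Quadratic form 3x^2 + 4xy + 6y^2, i.e., a = 3, 2b = 4, c = 6 (arbitrary example values)
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])

eigvals = np.linalg.eigvalsh(A)                      # eigenvalues of the symmetric matrix
thetas = np.linspace(0.0, 2.0 * np.pi, 10000)
units = np.stack([np.cos(thetas), np.sin(thetas)])   # unit vectors on the circle
values = np.sum(units * (A @ units), axis=0)         # v^t A v for each unit vector v

print(eigvals)                       # [lambda_min, lambda_max]
print(values.min(), values.max())    # numerically close to the eigenvalues above
```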

2. Diagonal Matrix

For a matrix $A \in \mathbb{R^{n \times n}}$, $A$ is called diagonalizable if there exist an invertible matrix $P$ and a diagonal matrix $D$ such that $D = P^{-1}AP$. Such $P$ and $D$ are not unique, and $A$ and $D$ are called similar. The following conditions each imply that $A$ is diagonalizable.

  • $A$ is symmetric.
  • $A$ has $n$ linearly independent eigenvectors.
  • $A$ has $n$ different eigenvalues.
  • The minimal polynomial consists of distinct linear factors.
  • The algebraic multiplicity of each eigenvalue is equal to its geometric multiplicity.
    • Assume that the characteristic polynomial of $A$ is $f(\lambda) = (\lambda - \alpha)^m(\lambda - \beta) \cdots$ for $1 < m < n$. Although the algebraic multiplicity of $\lambda = \alpha$ is $m$, it should be checked whether $\dim \ker(A - \alpha I) = m$. Since $\dim \ker (A - \alpha I) + \text{rank}(A - \alpha I) = n$ as mentioned in this note, $A$ is diagonalizable if $n - \text{rank}(A - \alpha I) = m$ for every repeated eigenvalue.

Furthermore, $A$ and $D = P^{-1}AP$ have the following properties in common.

  • $\vert A \vert = \vert D \vert$,
  • $A$ and $D$ have the same eigenvalues, ranks, and invertibility,
  • $A$ and $D$ do not always have the same eigenvectors.
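A minimal numerical sketch of these facts, assuming numpy and an arbitrary symmetric example matrix (which is guaranteed to be diagonalizable):

```python
import numpy as np

# Arbitrary symmetric example matrix, so it is guaranteed to be diagonalizable
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, P = np.linalg.eig(A)          # columns of P are eigenvectors of A
D = np.linalg.inv(P) @ A @ P           # D = P^{-1} A P is (numerically) diagonal

print(np.round(D, 10))                                        # diag(eigenvalues)
print(np.isclose(np.linalg.det(A), np.linalg.det(D)))         # |A| = |D|
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(D))   # same rank
```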

3. Key Properties of Symmetric Matrices

For a symmetric matrix $A \in \mathbb{R^{n \times n}}$, all eigenvalues are real numbers. For example, let $A = \left(\begin{array}{cc} a & b \\ b & c \end{array}\right)$ and $I = \left(\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\right)$. Also, let $\lambda$ be an eigenvalue of $A$. Then, the discriminant $D$ of the characteristic equation can be obtained as follows.

$$\begin{aligned} \vert A - \lambda I \vert &= (a - \lambda)(c - \lambda) - b^2 = \lambda^2 - (a + c)\lambda + ac - b^2 = 0 \\\\ \implies D &= (a + c)^2 - 4(ac - b^2) = (a - c)^2 + 4b^2 \geq 0 \end{aligned}$$

Therefore, all eigenvalues of $A$ are real numbers. Meanwhile, the corresponding eigenvectors for different eigenvalues of $A$ are perpendicular, so $A$ can be diagonalized. Let $v_1$ and $v_2$ be the corresponding eigenvectors for $\lambda_1 \ne \lambda_2$, which means $A v_1 = \lambda_1 v_1$ and $A v_2 = \lambda_2 v_2$. Considering that $x \cdot y = x^t y$ for $x, y \in \mathbb{R^{n \times 1}}$ and $A^t = A$,

$$\lambda_1 (v_1 \cdot v_2) = (\lambda_1 v_1) \cdot v_2 = Av_1 \cdot v_2 = (v_1^t A^t)v_2 = v_1^t A v_2 = v_1^t (\lambda_2 v_2) = \lambda_2 (v_1^t v_2) = \lambda_2 (v_1 \cdot v_2)$$

This implies that $(\lambda_1 - \lambda_2)(v_1 \cdot v_2) = 0$. Since $\lambda_1 \ne \lambda_2$, $v_1 \cdot v_2 = 0$. As such, the corresponding eigenvectors are perpendicular. Other key properties can be listed as follows. Let $B$ also be a symmetric matrix. Then, the following statements hold.

  • $A^2$, $A^3$, and $A+B$ are also symmetric.

    $(A^2)^t = A^t A^t = AA = A^2$, $\quad (A+B)^t = A^t + B^t = A+B$.

  • $AB$ is not symmetric if $AB \ne BA$. Otherwise, $AB$ is symmetric.

    $(AB)^t = B^t A^t = BA \ne AB$.

  • If $A$ is invertible, $A^{-1}$ is symmetric.

    $(A^{-1})^t = (A^t)^{-1} = A^{-1}$.
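The properties above can be checked numerically; the sketch below assumes numpy and an arbitrary random symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                         # force a symmetric matrix

eigvals, eigvecs = np.linalg.eigh(A)      # eigh is specialized for symmetric matrices
print(eigvals)                            # all eigenvalues are real
print(np.round(eigvecs.T @ eigvecs, 10))  # identity: eigenvectors are orthonormal
print(np.allclose((A @ A).T, A @ A))      # A^2 is symmetric
print(np.allclose(np.linalg.inv(A).T, np.linalg.inv(A)))  # A^{-1} is symmetric
```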

4. Skew-Symmetric Matrix

For a matrix $A \in \mathbb{R^{n \times n}}$, $A$ is skew-symmetric when $A^t = -A$. Besides, if $A$ is skew-symmetric and invertible, then $A^{-1}$ is skew-symmetric as well. Other than these, there is a way to induce a skew-symmetric matrix from any square matrix.

$$(A - A^t)^t = A^t - A = -(A - A^t) \implies A - A^t \text{ is skew-symmetric}$$

Note that, similarly, $A + A^t$ and $A^t A$ are symmetric. Therefore, any square matrix $A$ can be represented as the sum of a symmetric matrix and a skew-symmetric matrix.

$$A = \frac{1}{2} (A + A^t) + \frac{1}{2} (A - A^t) = (\text{symmetric matrix}) + (\text{skew-symmetric matrix})$$
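A minimal sketch of this decomposition, assuming numpy and an arbitrary random square matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))        # arbitrary square matrix

S = (A + A.T) / 2                      # symmetric part
K = (A - A.T) / 2                      # skew-symmetric part

print(np.allclose(S, S.T))             # True
print(np.allclose(K, -K.T))            # True
print(np.allclose(A, S + K))           # True: A is recovered
```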

5. Trace Properties

Other than the obvious properties of the trace, there are a few things to remember. For square matrices $A$ and $B$, and an invertible square matrix $P$,

  • $\text{tr}(AB) = \text{tr}(BA)$,
  • $\text{tr}(A) = \text{tr}(B)$ when $B = P^{-1}AP$,
  • $\text{tr}(A) = \lambda_1 + \cdots + \lambda_n$, that is, the trace of $A$ is the sum of all eigenvalues of $A$.

Even when $AB \ne BA$, the traces of $AB$ and $BA$ are the same. Besides, when $A$ and $B$ are similar, their traces are the same as well. This implies that the trace of a transformation is preserved under a change of coordinates, as long as the result is brought back to the original coordinates.
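These trace facts are easy to verify numerically; the sketch below assumes numpy and arbitrary random matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
P = rng.standard_normal((3, 3))        # almost surely invertible

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))                  # tr(AB) = tr(BA)
print(np.isclose(np.trace(A), np.trace(np.linalg.inv(P) @ A @ P)))   # similar matrices
print(np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)).real))    # sum of eigenvalues
```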

6. Determinant Consistency

For a matrix $A \in \mathbb{R^{n \times n}}$, the determinant of $A$ represents the (signed) volume of the $n$-dimensional parallelepiped spanned by its columns. As mentioned in this note, shape deformation of this parallelepiped does not change its original volume. Let $A = (a_1 \cdots a_n)$ for column vectors $a_i \in \mathbb{R^{n \times 1}}$. If $a_i = v_1 + v_2$,

$$\det A = \vert A \vert = \vert (a_1 \cdots v_1 \cdots a_n) \vert + \vert (a_1 \cdots v_2 \cdots a_n) \vert$$
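The column-wise linearity above can be confirmed with a small numerical sketch, assuming numpy and arbitrary random vectors.

```python
import numpy as np

rng = np.random.default_rng(3)
a1 = rng.standard_normal(3)
v1 = rng.standard_normal(3)
v2 = rng.standard_normal(3)
a3 = rng.standard_normal(3)

A    = np.column_stack([a1, v1 + v2, a3])   # second column split as v1 + v2
A_v1 = np.column_stack([a1, v1, a3])
A_v2 = np.column_stack([a1, v2, a3])

print(np.isclose(np.linalg.det(A), np.linalg.det(A_v1) + np.linalg.det(A_v2)))  # True
```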

7. Determinant of a Block Matrix

Given block matrices $A \in \mathbb{R^{n \times n}}$, $B \in \mathbb{R^{m \times m}}$, $C \in \mathbb{R^{m \times n}}$, and $D \in \mathbb{R^{n \times m}}$,

  • $\left\vert \begin{array}{cc} A & O \\ C & B \end{array} \right\vert = \left\vert A \right\vert \left\vert B \right\vert$,
  • $\left\vert \begin{array}{cc} D & A \\ B & O \end{array} \right\vert = (-1)^{nm} \left\vert A \right\vert \left\vert B \right\vert$,

where $O$ is a zero matrix. Note that this does not imply $\left\vert \begin{array}{cc}A & B \\ C & D \end{array} \right\vert = \left\vert A \right\vert \left\vert D \right\vert - \left\vert B \right\vert \left\vert C \right\vert$.
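Both block-determinant formulas can be spot-checked numerically; the sketch below assumes numpy and arbitrary random blocks, with $nm$ odd so the sign factor actually matters.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 5                            # nm = 15 is odd, so the sign factor is -1
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((n, m))

M1 = np.block([[A, np.zeros((n, m))], [C, B]])   # lower block-triangular
M2 = np.block([[D, A], [B, np.zeros((m, n))]])   # zero block in the bottom-right corner

det_AB = np.linalg.det(A) * np.linalg.det(B)
print(np.isclose(np.linalg.det(M1), det_AB))                      # |A||B|
print(np.isclose(np.linalg.det(M2), (-1) ** (n * m) * det_AB))    # (-1)^{nm}|A||B|
```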

8. Area of Polygons

For a matrix $A = (a_1 \cdots a_n) \in \mathbb{R^{m \times n}}$ whose column vectors are $a_i \in \mathbb{R^{m \times 1}}$, the area of the parallelogram (more generally, the volume of the parallelepiped) spanned by these column vectors is

$$\sqrt{ \vert \det A^t A \vert }$$
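For two column vectors in $\mathbb{R}^3$, this Gram-determinant formula must agree with the familiar cross-product area. A minimal sketch, assuming numpy and arbitrary example vectors:

```python
import numpy as np

a1 = np.array([1.0, 2.0, 2.0])
a2 = np.array([3.0, 0.0, 1.0])
A = np.column_stack([a1, a2])          # 3x2 matrix of column vectors

gram_area = np.sqrt(abs(np.linalg.det(A.T @ A)))
cross_area = np.linalg.norm(np.cross(a1, a2))   # same parallelogram area in R^3

print(gram_area, cross_area)           # the two values agree
```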

9. Rank Properties

  • $\text{rank}(A^t) = \text{rank}(A) = \text{rank}(A^t A)$ for a matrix $A \in \mathbb{R^{m \times n}}$.
  • $\text{rank}(AB) = \text{rank}(B) = \text{rank}(BA)$ if a matrix $A$ is invertible for $A, B \in \mathbb{R^{n \times n}}$.
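A small numerical sketch of these rank properties, assuming numpy, a random invertible $A$, and a rank-deficient example $B$:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))        # almost surely invertible
B = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],    # dependent row, so rank(B) = 3
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 0.0]])

r = np.linalg.matrix_rank
print(r(B.T), r(B), r(B.T @ B))        # all equal
print(r(A @ B), r(B), r(B @ A))        # all equal since A is invertible
```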

10. Orthogonal Projection

Let $Proj_{\vec{a}} \vec{b}$ be the projection of $\vec{b}$ onto $\vec{a}$. Then,

$$Proj_{\vec{a}} \vec{b} = \Vert \vec{b} \Vert \cfrac{\vec{a} \cdot \vec{b}}{\Vert \vec{a} \Vert \Vert \vec{b} \Vert} \cfrac{\vec{a}}{\Vert \vec{a} \Vert} = \left( \cfrac{\vec{a} \cdot \vec{b}}{\Vert \vec{a} \Vert^2}\right) \vec{a}$$

Here, the coefficient of this projection vector is called the Fourier coefficient. Moreover, for a matrix $A \in \mathbb{R^{m \times n}}$ whose column vectors form a basis of a vector space $W$, the projection of $\vec{b}$ onto $W$ is

$$\begin{aligned} Proj_{W} \vec{b} &= A(A^t A)^{-1} A^t \vec{b} \\ &= (\vec{b} \cdot \vec{e_1})\vec{e_1} + \cdots + (\vec{b} \cdot \vec{e_n})\vec{e_n} \end{aligned}$$

where $\{ \vec{e_1}, \cdots, \vec{e_n} \}$ is an orthonormal basis of $W$. Note that $A(A^t A)^{-1} A^t$ is the standard matrix of this orthogonal projection. Let this standard matrix be $P$. Then $P^t = P$, which means $P$ is symmetric. Also, $P^2 = P$, which means the projected point stays in the same place after projecting again, as mentioned in this note. For example, for a vector $\vec{b} \in \mathbb{R^{3 \times 1}}$, its projection onto the plane $W$ is as in the figure below.

(Figure: VectorProjection, the vector $\vec{b}$ projected orthogonally onto the plane $W$)
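A minimal sketch of this projection matrix and its properties, assuming numpy and an arbitrary plane $W$ spanned by two example column vectors:

```python
import numpy as np

# Plane W spanned by two arbitrary column vectors in R^3
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([2.0, 3.0, 4.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T   # standard matrix of the orthogonal projection
proj_b = P @ b

print(np.allclose(P, P.T))             # P is symmetric
print(np.allclose(P @ P, P))           # P is idempotent: projecting twice changes nothing
print(np.allclose(A.T @ (b - proj_b), 0))  # the residual is orthogonal to W
```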

11. Cross Product

Other than the basic properties, there are a few things to remember.

  • The triple product $\vec{a} \cdot (\vec{b} \times \vec{c}) = \vec{b} \cdot (\vec{c} \times \vec{a}) = \vec{c} \cdot (\vec{a} \times \vec{b})$ gives, up to sign, the volume of the parallelepiped defined by the three vectors. So, the volume of the tetrahedron they determine is $\vert \vec{a} \cdot (\vec{b} \times \vec{c}) \vert / 6$.
  • $\vec{a} \times (\vec{b} \times \vec{c}) = (\vec{a} \cdot \vec{c})\vec{b} - (\vec{a} \cdot \vec{b})\vec{c} \ne (\vec{a} \times \vec{b}) \times \vec{c}$, so the cross product is not associative.
  • By Lagrange's identity, $(\vec{a} \times \vec{b}) \cdot (\vec{c} \times \vec{d}) = (\vec{a} \cdot \vec{c})(\vec{b} \cdot \vec{d}) - (\vec{a} \cdot \vec{d})(\vec{b} \cdot \vec{c})$.
  • Interestingly, $\Vert \vec{a} \times \vec{b} \Vert^2 + (\vec{a} \cdot \vec{b})^2 = \Vert \vec{a} \Vert^2 \Vert \vec{b} \Vert^2$.
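Each identity above can be spot-checked numerically; the sketch below assumes numpy and arbitrary random vectors.

```python
import numpy as np

rng = np.random.default_rng(6)
a, b, c, d = rng.standard_normal((4, 3))

triple = np.dot(a, np.cross(b, c))
print(np.isclose(triple, np.dot(b, np.cross(c, a))))    # cyclic symmetry of the triple product
print(abs(triple) / 6)                                  # volume of the tetrahedron

lhs = np.cross(a, np.cross(b, c))
rhs = np.dot(a, c) * b - np.dot(a, b) * c
print(np.allclose(lhs, rhs))                            # a x (b x c) = (a.c)b - (a.b)c

lagrange = np.dot(a, c) * np.dot(b, d) - np.dot(a, d) * np.dot(b, c)
print(np.isclose(np.dot(np.cross(a, b), np.cross(c, d)), lagrange))

print(np.isclose(np.linalg.norm(np.cross(a, b))**2 + np.dot(a, b)**2,
                 np.linalg.norm(a)**2 * np.linalg.norm(b)**2))
```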

12. Cayley-Hamilton Theorem

For a matrix $A \in \mathbb{R^{n \times n}}$ and the identity matrix $I$ of size $n$, consider the characteristic equation $f(\lambda) = \vert A - \lambda I \vert = 0$. The Cayley-Hamilton theorem states that $A$ satisfies its own characteristic equation, that is, $f(A) = 0$. This theorem leads to the inverse matrix of $A$ as follows. Write the monic characteristic polynomial as $f(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1 \lambda + a_0$, and suppose $a_0 \ne 0$, which is equivalent to $A$ being invertible. Then,

$$\begin{aligned} f(A) &= A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I = 0 \\ \implies A^{-1} f(A) &= A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1 I + a_0 A^{-1} = 0 \\ \implies A^{-1} &= \frac{1}{a_0} \left( -A^{n-1} - a_{n-1} A^{n-2} - \cdots - a_1 I \right) \end{aligned}$$
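A minimal sketch of both the theorem and the resulting inverse formula, assuming numpy and an arbitrary invertible example matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])             # arbitrary invertible example
n = A.shape[0]

coeffs = np.poly(A)                    # monic characteristic polynomial coefficients
# f(A) = A^n + a_{n-1} A^{n-1} + ... + a_1 A + a_0 I, evaluated with matrix powers
f_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
print(np.allclose(f_A, 0))             # Cayley-Hamilton: f(A) = 0

# Inverse from the theorem: A^{-1} = -(A^{n-1} + a_{n-1} A^{n-2} + ... + a_1 I) / a_0
a0 = coeffs[-1]
A_inv = -sum(c * np.linalg.matrix_power(A, n - 1 - k)
             for k, c in enumerate(coeffs[:-1])) / a0
print(np.allclose(A_inv, np.linalg.inv(A)))   # matches the usual inverse
```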

13. Transformations

  • Finding the reflection (symmetric point) of a point about the line $y = (\tan \theta)x$:
$$\left(\begin{array}{cc} \cos 2 \theta & \sin 2 \theta \\ \sin 2 \theta & -\cos 2 \theta \end{array}\right)$$
  • Finding the projection of a point onto the line $y = (\tan \theta)x$:
$$\left(\begin{array}{cc} \cos^2 \theta & \sin \theta \cos \theta \\ \sin \theta \cos \theta & \sin^2 \theta \end{array}\right)$$
  • Finding the reflection (symmetric point) of a point about the plane $n \cdot x = 0$ where $n$ is the normal vector:
$$I - 2 \cfrac{n n^t}{n^t n}$$
  • Finding the projection of a point onto the plane $n \cdot x = 0$ where $n$ is the normal vector:
$$I - \cfrac{n n^t}{n^t n}$$
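A minimal sketch applying these matrices, assuming numpy, an arbitrary angle $\theta$, an arbitrary point, and an arbitrary normal vector:

```python
import numpy as np

theta = np.pi / 6                        # arbitrary line y = tan(theta) x
p = np.array([2.0, 1.0])

R = np.array([[np.cos(2 * theta),  np.sin(2 * theta)],
              [np.sin(2 * theta), -np.cos(2 * theta)]])              # reflection about the line
P = np.array([[np.cos(theta) ** 2,            np.sin(theta) * np.cos(theta)],
              [np.sin(theta) * np.cos(theta), np.sin(theta) ** 2]])  # projection onto the line

print(R @ p)                                  # reflected point
print(np.allclose(P @ p, (p + R @ p) / 2))    # the projection is the midpoint of p and its reflection

n = np.array([[1.0], [2.0], [2.0]])      # normal vector of the plane n . x = 0
Q = np.eye(3) - n @ n.T / (n.T @ n)      # projection onto the plane
x = np.array([[1.0], [1.0], [1.0]])
print(np.isclose((n.T @ (Q @ x)).item(), 0.0))   # the projected point lies on the plane
```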

14. Jacobian

Just as the substitution method in a definite integral changes the scale of the variable, such as $x^2 = t \to 2xdx = dt$, the substitution method in a double integral changes the area of the region. Therefore, this difference must be compensated, which is what the Jacobian determinant does. Geometrically, the Jacobian determinant also means the instantaneous rate of change of area. Suppose $f: \mathbb{R}^n \to \mathbb{R}^m$ is a function such that each of its first-order partial derivatives exists on $\mathbb{R}^n$. This function takes a point $x \in \mathbb{R}^n$ as input and produces the vector $f(x) \in \mathbb{R}^m$ as output. Then the Jacobian matrix $J \in \mathbb{R}^{m \times n}$ is defined as follows.

$$J = \left(\begin{array}{c} \nabla^t f_1 \\ \vdots \\ \nabla^t f_m \end{array}\right) = \left(\begin{array}{ccc} \cfrac{\partial f_1}{\partial x_1} & \cdots & \cfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \cfrac{\partial f_m}{\partial x_1} & \cdots & \cfrac{\partial f_m}{\partial x_n} \end{array}\right)$$

where $\nabla^t f_i$ is the transpose of the gradient of the $i$-th component. Now, assume that $x = f(u, v)$ and $y = g(u, v)$. Then, the Jacobian determinant is

$$\vert J \vert = \left \vert \cfrac{\partial (x, y)}{\partial (u, v)} \right \vert = \left \vert \begin{array}{cc} x_u & x_v \\ y_u & y_v \end{array} \right \vert = \cfrac{1}{\left \vert \cfrac{\partial (u, v)}{\partial (x, y)} \right \vert} = \cfrac{1}{ \left \vert \begin{array}{cc} u_x & u_y \\ v_x & v_y \end{array} \right \vert }$$

Moreover, if $\vert J \vert = 0$, there exists a functional relationship between $x$ and $y$. In that case, the transformation is not invertible, which means that there is no way to recover $u$ and $v$ from $x$ and $y$. Similarly, assume that $x = f(u, v, t)$, $y = g(u, v, t)$, and $z = h(u, v, t)$. Then, the Jacobian determinant is

$$\vert J \vert = \left \vert \cfrac{\partial (x, y, z)}{\partial (u, v, t)} \right \vert = \left \vert \begin{array}{ccc} x_u & x_v & x_t \\ y_u & y_v & y_t \\ z_u & z_v & z_t \end{array} \right \vert$$

Again, if $\vert J \vert = 0$, there exists a functional relationship between $x$, $y$, and $z$. As such, the transformation is not invertible, which means that there is no way to recover $u$, $v$, and $t$ from $x$, $y$, and $z$.
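As a minimal symbolic sketch, assuming sympy and the familiar polar-coordinate substitution as the example, the Jacobian determinant and its reciprocal relationship can be computed directly:

```python
import sympy as sp

r, t = sp.symbols('r theta', positive=True)
x = r * sp.cos(t)                        # polar coordinates as an example substitution
y = r * sp.sin(t)

J = sp.Matrix([x, y]).jacobian(sp.Matrix([r, t]))
detJ = sp.simplify(J.det())
print(detJ)                              # r, the familiar factor in dx dy = r dr dtheta

# Reciprocal relationship: |d(u,v)/d(x,y)| = 1 / |d(x,y)/d(u,v)|
print(sp.simplify(detJ * J.inv().det()))     # 1
```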

