For an m×n matrix A and a vector b∈Rm, the linear system Ax=b is to find the optimal solution x∈Rn.
1. The Residual Minimization
The problem Ax=b can be replaced by the problem minimizing the residual, ∥b−Ax∥22. Let Ai be the i-th row vector of A. Then the cost function f(x) can be defined as follows: f(x)=∥b−Ax∥22=∥∥⎝⎛b1−A1tx⋮bm−Amtx⎠⎞∥∥22=k∑(bk−Aktx)2
For minimizing f, finding the critical point is required. Let aij be the (i,j) element of A, then ∂xi∂f=k∑2(bk−Aktx)∂xi∂(bk−Aktx)=2k∑(bk−Aktx)(−aki)=0⟹k∑akiAktx=k∑akibk=(a1i⋯ami)⎝⎛A1t⋮Amt⎠⎞x=(a1i⋯ami)⎝⎛b1⋮bm⎠⎞⟹⎝⎛a11a1n⋯⋮⋯am1amn⎠⎞⎝⎛A1t⋮Amt⎠⎞x=⎝⎛a11a1n⋯⋮⋯am1amn⎠⎞⎝⎛b1⋮bm⎠⎞⟹AtAx=Atb
This equation is called normal equation. Moreover, AtA is positive (semi-)definite because xtAtAx=(Ax)t(Ax)=∥Ax∥22≥0, so it minimizes f. Especially, if ∥Ax∥22=0, it tells that there is no x such that Ax=0 except x=0. It also means that A is nonsingular whose column vectors are linearly independent. Therefore, the solution of AtAx=Atb is the optimal solution of Ax=b when A is nonsingular.
2. Another Approach When m>n
In this case, A maps the lower dimension to the higher dimension. For example, when m=3 and n=2, there may be a point b in the objective space which Ax cannot cover.
So, the linear system Ax=b cannot be solved exactly. However, we can still find the point which is closest to b. Let y=Ax be the closest point to b.
When y is closest to b, the vector b−y=b−Ax is perpendicular to the normal of hyperplane ImA. Suppose that e1,⋯,en are the standard basis in the original space. Then Ae1,⋯,Aen consist of the basis of the objective space. In other words, each column vector of A consists of the hyperplane ImA, which means it is perpendicular to b−Ax. Therefore, At(b−Ax)=0, it yields the normal equation. Accordingly, the normal equation estimates the optimal solution of Ax=b even though the exact solution cannot be found.
3. Another Approach Using Projector
A square matrix P is idempotent if P2=P, and it is called projector. P projects any vector to ImP and if the vectors are already on ImP, they would be still after projected. If P is symmetric, it is called orthogonal projector, and I−P is also orthogonal projector to the hyperplane which is perpendicular to ImP. Of course, (I−P)2=I−2P+P2=I−P. For a vector v, (Pv)t((I−P)v)=vtPt(v−Pv)=vtP(v−Pv)=vtPv−vtP2v=vtPv−vtPv=0⟹Pv⊥(I−P)v⟹v=Pv+(I−P)v
Now, to apply this to solving Ax=b, assume that P is the orthogonal projector onto spanA, which means PA=A. Then the cost function f(x) can be rewritten as f(x)=∥b−Ax∥22=∥P(b−Ax)+(I−P)(b−Ax)∥22=∥P(b−Ax)∥22+∥(I−P)(b−Ax)∥22=∥Pb−PAx∥22+∥(I−P)b−(I−P)Ax∥22=∥Pb−Ax∥22+∥(I−P)b∥22
Therefore, for minimizing this cost function, ∥Pb−Ax∥22 should be zero. Pb−Ax=AtPb−AtAx=AtPtb−AtAx=(PA)tb−AtAx=Atb−AtAx=O
Note that it produces the normal equation as well.
4. Evaluation
The solution from the normal equation can be evaluated by the condition number and angle between b and Ax. The condition number of A tells how close to singular A is. However, if m=n, A is not invertible, so the pseudoinverse A+=(AtA)−1At should be introduced. In this case, the condition number of A is Cond2(A)=∥A∥2∥∥A+∥∥2
Meanwhile, as the angle between b and Ax is smaller, b is closer to Ax, so it is a good measure to find whether the solution is well-conditioned. This angle can be obtained as cosθ=∥b∥2∥Ax∥2.
Reference
[1] Michael T. Heath, Scientific Computing: An Introductory Survey. 2nd Edition, McGraw-Hill Higher Education.
Keep going!Keep going ×2!Give me more!Thank you, thank youFar too kind!Never gonna give me up?Never gonna let me down?Turn around and desert me!You're an addict!Son of a clapper!No wayGo back to work!This is getting out of handUnbelievablePREPOSTEROUSI N S A N I T YFEED ME A STRAY CAT