For an $m \times n$ matrix $A$ and a vector $b \in \mathbb{R}^m$, the linear system $Ax = b$ asks for the optimal solution $x \in \mathbb{R}^n$.
1. The Residual Minimization
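The problem $Ax = b$ can be replaced by the problem of minimizing the residual $r = b - Ax$. Let $a_i^T$ be the $i$-th row vector of $A$. Then the cost function can be defined as follows:

$$f(x) = \|b - Ax\|_2^2 = \sum_{i=1}^{m} \left(b_i - a_i^T x\right)^2$$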
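To minimize $f$, we need its critical point. Let $a_{ij}$ be the $(i, j)$ element of $A$, then

$$\frac{\partial f}{\partial x_j} = -2 \sum_{i=1}^{m} a_{ij} \left(b_i - a_i^T x\right) = 0 \quad (j = 1, \dots, n) \quad \Longrightarrow \quad A^T (b - Ax) = 0 \quad \Longrightarrow \quad A^T A x = A^T b$$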
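This equation is called the normal equation. Moreover, $A^TA$, which is the Hessian of $f$ up to a factor of 2, is positive (semi-)definite because $x^T A^T A x = \|Ax\|_2^2 \ge 0$, so the critical point minimizes $f$. In particular, if $A^TA$ is positive definite, there is no $x$ such that $Ax = 0$ except $x = 0$. It also means that the column vectors of $A$ are linearly independent and that $A^TA$ is nonsingular. Therefore, the solution of $A^TAx = A^Tb$ is the optimal solution of $Ax = b$ when $A^TA$ is nonsingular.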
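As a small illustration of the normal equation, the sketch below forms $A^TA$ and $A^Tb$ and solves the resulting square system with exact fractions. The matrix `A`, the vector `b`, and all helper names are made up for this example, and plain Python is used only to keep the sketch dependency-free. It assumes $A^TA$ is nonsingular.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    cols = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def solve(M, rhs):
    """Solve the square system M z = rhs by Gauss-Jordan elimination (exact fractions)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row with a nonzero pivot; StopIteration here means M is singular.
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def solve_normal_equation(A, b):
    """Return x with A^T A x = A^T b, assuming A^T A is nonsingular."""
    At = transpose(A)
    return solve(matmul(At, A), matvec(At, b))

# Hypothetical 3 x 2 example: no x satisfies A x = b exactly.
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
x = solve_normal_equation(A, b)
print([str(v) for v in x])  # ['5', '-3']
```

Here $A^TA = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}$ and $A^Tb = (6, 0)^T$, so the result can be checked by hand.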
2. Another Approach When $m > n$
In this case, $A$ maps a lower-dimensional space to a higher-dimensional one. For example, when $m = 3$ and $n = 2$, there may be a point $b$ in the objective space which $A$ cannot reach.
So, the linear system $Ax = b$ cannot be solved exactly. However, we can still find the point in $\text{span}(A)$ which is closest to $b$. Let $y = Ax$ be the closest point to $b$.
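When $y = Ax$ is closest to $b$, the residual vector $r = b - Ax$ is parallel to the normal of the hyperplane $\text{span}(A)$, that is, perpendicular to the hyperplane itself. Suppose that $e_1, \dots, e_n$ are the standard basis of the original space. Then $Ae_1, \dots, Ae_n$ span the range of $A$ in the objective space, and they form a basis of it when the columns of $A$ are linearly independent. In other words, each column vector of $A$ lies in the hyperplane $\text{span}(A)$, which means it is perpendicular to $r$. Therefore $A^T r = A^T (b - Ax) = 0$, which yields the normal equation. Accordingly, the normal equation gives the optimal solution of $Ax = b$ in the least-squares sense even though the exact solution cannot be found.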
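The orthogonality argument can be checked numerically. The following sketch uses a made-up $3 \times 2$ matrix `A` and vector `b` (hypothetical values, not from the text), takes the `x` that satisfies the normal equation for them, and confirms that the residual is perpendicular to every column of `A`. It also tries three arbitrary perturbations of `x` to show that they increase the squared residual.

```python
def residual(A, b, x):
    """Return r = b - A x for a matrix given as a list of rows."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[1, 0], [1, 1], [1, 2]]   # m = 3, n = 2
b = [6, 0, 0]
x = [5, -3]                    # satisfies A^T A x = A^T b for this A and b

r = residual(A, b, x)
columns = [list(col) for col in zip(*A)]
print(r)                                  # [1, -2, 1]
print([dot(col, r) for col in columns])   # [0, 0]

# Three arbitrary perturbations of x: each gives a larger squared residual.
best = dot(r, r)
for dx in ([1, 0], [0, 1], [-2, 3]):
    x_other = [xj + d for xj, d in zip(x, dx)]
    r_other = residual(A, b, x_other)
    print(dx, dot(r_other, r_other) > best)   # True for each dx
```

The increase is no accident: because $r \perp \text{span}(A)$, any perturbation $\Delta x$ gives $\|r - A\Delta x\|_2^2 = \|r\|_2^2 + \|A\Delta x\|_2^2$.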
3. Another Approach Using Projector
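A square matrix $P$ is idempotent if $P^2 = P$, and such a matrix is called a projector. $P$ projects any vector onto $\text{span}(P)$, and vectors that already lie in $\text{span}(P)$ are left unchanged by the projection. If $P$ is also symmetric, it is called an orthogonal projector, and $P_\perp = I - P$ is also an orthogonal projector, onto the hyperplane which is perpendicular to $\text{span}(P)$. Of course, $P + P_\perp = I$. For a vector $v$,

$$v = (P + (I - P))\,v = Pv + P_\perp v$$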
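To make the distinction between a projector and an orthogonal projector concrete, the sketch below compares two hypothetical $2 \times 2$ matrices chosen for illustration: `P`, which is idempotent and symmetric, and `Q`, which is idempotent but not symmetric. For each one it splits a sample vector `v` into the two components and prints their inner product.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    cols = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def complement(M):
    """Return I - M."""
    n = len(M)
    return [[(1 if i == j else 0) - M[i][j] for j in range(n)] for i in range(n)]

half = Fraction(1, 2)
P = [[half, half], [half, half]]   # orthogonal projector onto span{(1, 1)}
Q = [[1, 1], [0, 0]]               # idempotent but not symmetric
v = [3, 1]

for name, M in (("P", P), ("Q", Q)):
    idempotent = matmul(M, M) == M
    symmetric = transpose(M) == M
    # Inner product of the two components M v and (I - M) v.
    inner = dot(matvec(M, v), matvec(complement(M), v))
    print(name, idempotent, symmetric, inner)
# P True True 0
# Q True False -4
```

Both matrices split `v` into two parts that add back up to `v`, but only the symmetric one makes the parts perpendicular: `P` gives $(2, 2)$ and $(1, -1)$, while `Q` gives $(4, 0)$ and $(-1, 1)$.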
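Now, to apply this to solving $Ax \cong b$, assume that $P$ is the orthogonal projector onto $\text{span}(A)$, which means $PA = A$, and let $P_\perp = I - P$, so that $P_\perp A = 0$. Since $P(b - Ax)$ and $P_\perp (b - Ax)$ are perpendicular, the cost function can be rewritten as

$$\|b - Ax\|_2^2 = \|P(b - Ax) + P_\perp (b - Ax)\|_2^2 = \|P(b - Ax)\|_2^2 + \|P_\perp (b - Ax)\|_2^2 = \|Pb - Ax\|_2^2 + \|P_\perp b\|_2^2$$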
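Therefore, to minimize this cost function, $Pb - Ax$ should be zero, because the remaining term does not depend on $x$.

$$Ax = Pb \quad \Longrightarrow \quad A^T A x = A^T P b = A^T P^T b = (PA)^T b = A^T b$$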
Note that it produces the normal equation as well.
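The projector view can also be tried on a small example. The sketch below assumes the explicit formula $P = A (A^TA)^{-1} A^T$, which is not stated in the text above but is the usual expression for the orthogonal projector onto $\text{span}(A)$ when $A$ has full column rank. The matrix `A`, the vector `b`, and the helper names are hypothetical, and the $2 \times 2$ inverse limits the sketch to $n = 2$.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    cols = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inverse_2x2(M):
    """Inverse of a nonsingular 2 x 2 matrix, in exact fractions."""
    (p, q), (s, t) = M
    det = Fraction(p * t - q * s)
    return [[t / det, -q / det], [-s / det, p / det]]

def sq_norm(v):
    return sum(vi * vi for vi in v)

A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
At = transpose(A)

# Assumed formula: P = A (A^T A)^{-1} A^T (requires full column rank).
P = matmul(matmul(A, inverse_2x2(matmul(At, A))), At)

Pb = matvec(P, b)
P_perp_b = [bi - pi for bi, pi in zip(b, Pb)]
x = [5, -3]                     # solves A^T A x = A^T b for this A and b

print([str(v) for v in Pb])     # ['5', '2', '-1']
print(matvec(A, x))             # [5, 2, -1]
print(sq_norm(b) == sq_norm(Pb) + sq_norm(P_perp_b))   # True
```

In this example $Pb$ coincides with $Ax$, and $\|b\|_2^2 = 36$ splits into $\|Pb\|_2^2 = 30$ plus $\|P_\perp b\|_2^2 = 6$, which is the part of the cost that no choice of $x$ can remove.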
The solution from the normal equation can be evaluated by the condition number of $A$ and the angle between $b$ and $Ax$. The condition number of $A$ tells how close to singular $A$ is. However, if $m > n$, $A$ is not square and hence not invertible, so the pseudoinverse $A^+ = (A^TA)^{-1}A^T$ should be introduced. In this case, the condition number of $A$ is

$$\text{cond}(A) = \|A\|_2 \cdot \|A^+\|_2$$
Meanwhile, the smaller the angle $\theta$ between $b$ and $Ax$, the closer $Ax$ is to $b$, so it is a good measure of whether the solution is well-conditioned. This angle can be obtained from $\cos\theta = \|Ax\|_2 / \|b\|_2$.
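Both quantities can be computed for a small case. The sketch below is restricted to $m \times 2$ matrices with full column rank so that the eigenvalues of $A^TA$ have a closed form. It relies on the standard identities $\|A\|_2 = \sigma_{\max}$ and $\|A^+\|_2 = 1/\sigma_{\min}$, where the singular values $\sigma$ are the square roots of the eigenvalues of $A^TA$. The example data and function names are hypothetical, and the angle formula assumes that `x` is the least-squares solution.

```python
import math

def gram_2(A):
    """Return (p, q, s) such that A^T A = [[p, q], [q, s]] for an m x 2 matrix A."""
    c0, c1 = zip(*A)
    p = sum(u * u for u in c0)
    q = sum(u * v for u, v in zip(c0, c1))
    s = sum(v * v for v in c1)
    return p, q, s

def cond_2(A):
    """2-norm condition number sigma_max / sigma_min of an m x 2 full-rank matrix."""
    p, q, s = gram_2(A)
    mean = (p + s) / 2
    radius = math.hypot((p - s) / 2, q)
    # Eigenvalues of A^T A are mean +/- radius; singular values are their square roots.
    return math.sqrt((mean + radius) / (mean - radius))

def angle(A, b, x):
    """Angle between b and A x via cos(theta) = ||A x|| / ||b||."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    cos_theta = math.sqrt(sum(v * v for v in Ax)) / math.sqrt(sum(v * v for v in b))
    return math.acos(min(1.0, cos_theta))   # clamp against rounding error

A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
x = [5, -3]                   # least-squares solution for this A and b

kappa = cond_2(A)
theta = angle(A, b, x)
print(round(kappa, 3))        # about 2.924
print(round(theta, 3))        # about 0.421 (radians)
```

For this example $\text{cond}(A) = (4 + \sqrt{10})/\sqrt{6}$ and $\cos\theta = \sqrt{30}/6$, so the problem is well-conditioned and $Ax$ lies fairly close to $b$.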
 Michael T. Heath, Scientific Computing: An Introductory Survey. 2nd Edition, McGraw-Hill Higher Education.