
Normal equation

2022-06-24 10:18:00 [Wanderer001]

Reference: Normal equation - Tencent Cloud+ Community

Contents

1. What is the normal equation

2. Using the normal equation

3. The non-invertible case

4. Normal equation vs. gradient descent

1. What is the normal equation

Gradient descent finds the optimal parameters iteratively. It computes the partial derivative of the cost function with respect to each parameter and updates the parameters step by step until the iteration converges to the global minimum, which gives the optimal parameters.

The normal equation, by contrast, finds the optimal solution in a single step.

Idea: for a simple function, take the derivative with respect to the parameter, set it to 0, and solve for the value of the parameter.
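Here is an illustration. The specific function is our own assumption and is not taken from the original. Take a quadratic cost in a single parameter θ with a > 0:

```latex
J(\theta) = a\theta^{2} + b\theta + c,\qquad
\frac{dJ}{d\theta} = 2a\theta + b = 0
\;\Longrightarrow\; \theta = -\frac{b}{2a}
```

Because a > 0, this stationary point is the minimum of J.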

Real-world problems have many parameters. We take the partial derivative of the cost function with respect to every parameter and set each one to 0. Solving these equations gives the optimal value of every parameter, that is, the global optimum. The difficulty is that working through this parameter by parameter is time-consuming.
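For linear regression with the squared-error cost, which is the usual setting for the normal equation, all of these "set the partial derivative to 0" equations can be written at once in matrix form. Here X is the design matrix and y the vector of observations, as defined in the next section. The following sketch of the derivation assumes X^T X is invertible:

```latex
J(\theta) = \frac{1}{2m}\,(X\theta - y)^{T}(X\theta - y)

\nabla_{\theta} J(\theta) = \frac{1}{m}\,X^{T}(X\theta - y) = 0
\;\Longrightarrow\; X^{T}X\,\theta = X^{T}y
\;\Longrightarrow\; \theta = (X^{T}X)^{-1}X^{T}y
```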

2. Using the normal equation

Consider the following example. There are 4 samples with 4 feature variables x1, x2, x3, x4, and the observed value is y. When writing the cost function, we add an extra feature x0 to every sample. This feature is always 1 and represents the intercept term.

Next, store the features (including x0) in the matrix X, and store the observations in the vector y in the same way.
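The following is a minimal sketch of building X and y. The helper name `build_design_matrix` is hypothetical, and the numbers are made up for illustration only. Each row of X is one sample with x0 = 1 prepended, so 4 samples with 4 features give a 4 × 5 matrix.

```python
def build_design_matrix(samples):
    """Prepend x0 = 1 to every sample's features to form the rows of X."""
    return [[1.0] + [float(v) for v in features] for features in samples]


# Hypothetical data: 4 samples, each with features x1..x4.
raw_features = [
    [2.0, 5.0, 1.0, 4.0],
    [1.0, 3.0, 2.0, 4.0],
    [1.5, 3.0, 2.0, 3.0],
    [0.8, 2.0, 1.0, 3.0],
]
# Hypothetical observations, one per sample.
y = [46.0, 23.0, 31.0, 17.0]

X = build_design_matrix(raw_features)
for row in X:
    print(row)
```

With 4 samples and 5 columns, this X has fewer rows than columns. That is exactly the non-invertible situation (m ≤ n) discussed in the next section.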

The optimal θ is then obtained from the following formula:

$$\theta = (X^{T}X)^{-1}X^{T}y$$

About this formula:

All the features of the i-th training sample can be represented by the vector x^(i); remember to include x0^(i) = 1. The design matrix X has the transposed sample vectors (x^(i))^T as its rows, and y is the vector of observations. Written this way, the formula above computes the optimal θ directly.
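The formula can be sketched in pure Python. The helper names are our own hypothetical choices; in practice a linear-algebra library would be used. The sketch solves X^T X θ = X^T y by Gaussian elimination instead of forming the inverse explicitly. This gives the same θ whenever X^T X is invertible and is numerically preferable. The data are made up from y = 1 + 2x1 + 3x2 with no noise, so the expected answer is θ = (1, 2, 3).

```python
def transpose(A):
    """Return the transpose of matrix A (a list of rows)."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Return the matrix product A @ B for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [v[i]] for i in range(n)]  # augmented matrix [M | v]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))  # pivot row
        if abs(aug[p][k]) < 1e-12:
            raise ValueError("matrix is singular (not invertible)")
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= f * aug[k][c]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[k][c] * z[c] for c in range(k + 1, n))
        z[k] = (aug[k][n] - s) / aug[k][k]
    return z


def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, computed by solving X^T X theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Hypothetical noise-free data from y = 1 + 2*x1 + 3*x2; first column is x0 = 1.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
theta = normal_equation(X, y)
print(theta)
```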

3. The non-invertible case

Note that the normal equation requires inverting the matrix X^T X. When this matrix is not invertible, there are generally two causes:

Redundant features (linearly dependent features)
Too many features (e.g. m ≤ n, where m is the number of samples and n the number of features). Remedies: delete some features, or use regularization.
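The regularization remedy can be sketched as follows. The original does not say which regularization it means, so this is an assumption: we use the L2 (ridge) form commonly paired with the normal equation, θ = (X^T X + λL)^{-1} X^T y. Here L is the identity matrix with its top-left entry set to 0, so the intercept θ0 is not penalized. For any λ > 0 this matrix is invertible even when m ≤ n, provided the first column of X is all ones. Function names and data are hypothetical.

```python
def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [v[i]] for i in range(n)]  # augmented matrix [M | v]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))  # pivot row
        if abs(aug[p][k]) < 1e-12:
            raise ValueError("matrix is singular (not invertible)")
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= f * aug[k][c]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[k][c] * z[c] for c in range(k + 1, n))
        z[k] = (aug[k][n] - s) / aug[k][k]
    return z


def regularized_normal_equation(X, y, lam):
    """theta = (X^T X + lam * L)^{-1} X^T y, with L = identity except L[0][0] = 0."""
    n = len(X[0])
    # A = X^T X plus lam on the diagonal, skipping the intercept entry (0, 0).
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j and i > 0 else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]  # X^T y
    return solve(A, b)


# Hypothetical data: m = 2 samples, n = 2 features (columns x0, x1, x2).
X_small = [[1, 1, 2], [1, 2, 1]]
y_small = [3, 4]

# Without regularization (lam = 0), X^T X is singular because m <= n.
try:
    regularized_normal_equation(X_small, y_small, lam=0.0)
    singular_without_reg = False
except ValueError:
    singular_without_reg = True

theta_reg = regularized_normal_equation(X_small, y_small, lam=1.0)
print(singular_without_reg, theta_reg)
```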

In fact, the underlying reason comes from linear algebra.

First, let us see why each of these two situations makes the matrix non-invertible.

By the property r(A^T A) = r(A), where here A = X, the invertibility of A^T A reduces to a question about A. A^T A is invertible exactly when A has full column rank, i.e. when r(A) equals the number of columns of A.

The first cause: the columns are linearly dependent, so the rank of A is less than its number of columns, and A^T A is not invertible.

The second cause (here n denotes the number of columns of A):

When m < n, the dimension of each column vector (the number of samples m) is smaller than the number of column vectors (n). The columns must therefore be linearly dependent, and A^T A is not invertible.
When m = n, A is square: A^T A is not invertible when |A| = 0 and invertible when |A| ≠ 0.
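The rank argument can be checked numerically. The sketch below computes r(A) and r(A^T A) exactly with `fractions.Fraction` for three small made-up matrices; the helper names are hypothetical. The first matrix has full column rank. In the second, the third column is twice the second (a redundant feature). The third has fewer rows than columns (m < n). A^T A is invertible only when its rank equals the number of columns.

```python
from fractions import Fraction


def rank(M):
    """Rank via exact Gaussian elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r


def gram(A):
    """Compute A^T A."""
    cols = list(zip(*A))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


examples = {
    "full column rank": [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]],
    "dependent columns (x2 = 2*x1)": [[1, 1, 2], [1, 2, 4], [1, 3, 6], [1, 4, 8]],
    "m < n (2 samples, 3 columns)": [[1, 1, 2], [1, 2, 1]],
}
results = {}
for name, A in examples.items():
    n_cols = len(A[0])
    r_A, r_AtA = rank(A), rank(gram(A))
    results[name] = (r_A, r_AtA, r_AtA == n_cols)
    print(f"{name}: r(A)={r_A}, r(A^T A)={r_AtA}, invertible={r_AtA == n_cols}")
```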

4. Normal equation vs. gradient descent

Gradient descent:

Disadvantages:

Requires choosing a learning rate α
Requires many iterations

Advantages:

Still works well when the number of features is large
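The following is a minimal batch gradient descent for the same squared-error cost. It uses hypothetical noise-free data from y = 1 + 2x1 + 3x2. The learning rate α = 0.1 and the 5000 iterations are values we picked for this data, not values from the original. Unlike the normal equation, gradient descent must be given α and run for many iterations to reach θ ≈ (1, 2, 3).

```python
def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent on J(theta) = (1/2m) * sum((X theta - y)^2)."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # residuals r_i = h_theta(x^(i)) - y^(i)
        residuals = [sum(t * x for t, x in zip(theta, row)) - yi
                     for row, yi in zip(X, y)]
        # gradient_j = (1/m) * sum_i r_i * x_j^(i)
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / m
                for j in range(n)]
        # simultaneous update of all parameters
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical data; first column is x0 = 1.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
theta_gd = gradient_descent(X, y)
print(theta_gd)
```

If α is too large, the iteration can diverge. If α is too small, many more iterations are needed. This is the "must choose α" disadvantage listed above.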

Normal equation:

Disadvantages:

Requires computing (X^T X)^{-1}; the cost is roughly cubic in the matrix dimension, O(n^3), which is expensive.
Slow when the number of features is large
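A rough operation count shows where the cost comes from. It uses standard dense-matrix estimates, with m as the number of samples and n as the number of columns of X:

```latex
\text{forming } X^{T}X:\ O(mn^{2}),\qquad
\text{inverting (or factorizing) } X^{T}X:\ O(n^{3}),\qquad
\text{one gradient-descent step } \tfrac{1}{m}X^{T}(X\theta - y):\ O(mn)
```

At n = 10,000, the n^3 term alone is on the order of 10^12 operations. This is consistent with the rule of thumb in the summary.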

Advantages:

No learning rate α to choose
No iteration required

Summary: the choice depends on the number of features. With fewer than about 10,000 features, choose the normal equation. With more than 10,000, consider gradient descent or other algorithms.


Copyright notice
This article was written by [Wanderer001]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/175/202206240914572846.html