
Normal equation

2022-06-24 10:18:00 Wanderer001

Reference: Normal equation, Tencent Cloud+ Community

Contents

1. What is the normal equation

2. Using the normal equation

3. The non-invertible case

4. Normal equation vs. gradient descent


1. What is the normal equation

Gradient descent finds the optimal parameters iteratively: it computes the partial derivative of the cost function with respect to each parameter and updates the parameters step by step until they converge to the global minimum, which yields the optimal parameters.

The normal equation, by contrast, solves for the optimal parameters in a single step.

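The idea: for a simple function of one parameter, take the derivative with respect to that parameter, set it to 0, and solve for the parameter. For example, for a quadratic cost with a > 0:

$$J(\theta) = a\theta^{2} + b\theta + c, \qquad \frac{dJ}{d\theta} = 2a\theta + b = 0 \;\Longrightarrow\; \theta = -\frac{b}{2a}$$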

Real-world problems have many parameters. We take the partial derivative of the cost function with respect to each of them, set every one to 0, and solve the resulting equations to get the optimal value of each parameter, i.e. the global optimum. The difficulty is that working through these derivatives one parameter at a time is laborious.
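With many parameters, "set every partial derivative to zero" can be written as one matrix equation. A sketch, assuming the usual squared-error cost for linear regression with hypothesis h_θ(x) = θᵀx, where X is the design matrix (one sample per row, including x0 = 1) and y is the vector of observations:

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\left(\theta^{T}x^{(i)} - y^{(i)}\right)^{2}
           = \frac{1}{2m}(X\theta - y)^{T}(X\theta - y) \\
\nabla_{\theta}J(\theta) &= \frac{1}{m}X^{T}(X\theta - y) = 0 \\
\Longrightarrow\quad X^{T}X\,\theta &= X^{T}y \\
\Longrightarrow\quad \theta &= (X^{T}X)^{-1}X^{T}y
\end{aligned}
```

The last step assumes XᵀX is invertible.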

2. Using the normal equation

Consider an example training set.

It has 4 samples and 4 features x1, x2, x3, x4, and the observed value is y. When setting up the hypothesis (and hence the cost function), we add an extra feature x0 = 1 to every sample so that the intercept θ0 is handled like the other parameters.

Then store the features of all samples, including x0, in the matrix X (one sample per row), and store the observed values in the vector y in the same order.
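A minimal NumPy sketch of this step, assuming the raw features arrive as one row per sample. The helper name `design_matrix` and all numbers below are hypothetical, not the original example's data:

```python
import numpy as np


def design_matrix(features):
    """Build X by prepending the intercept feature x0 = 1 to every sample (row)."""
    features = np.asarray(features, dtype=float)
    ones = np.ones((features.shape[0], 1))  # one x0 = 1 per sample
    return np.hstack([ones, features])


# Hypothetical data: 4 samples (rows), each with 4 features x1..x4.
raw = [
    [1.0, 3.0, 0.5, 2.0],
    [2.0, 1.0, 1.5, 0.0],
    [0.5, 2.5, 1.0, 1.0],
    [3.0, 0.0, 2.0, 4.0],
]
y = np.array([10.0, 7.0, 8.5, 12.0])  # hypothetical observations, same row order

X = design_matrix(raw)
print(X.shape, y.shape)  # prints (4, 5) (4,): m = 4 samples, n + 1 = 5 columns
```

Note that with m = 4 samples and n = 4 features, this X has 5 columns but rank at most 4, so XᵀX is singular here; this is the m ≤ n situation covered in the non-invertible case.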


The optimal parameters θ are then given directly by the following formula:

$$\theta = (X^{T}X)^{-1}X^{T}y$$

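About this formula:

$$x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix} \in \mathbb{R}^{n+1}, \qquad X = \begin{bmatrix} (x^{(1)})^{T} \\ (x^{(2)})^{T} \\ \vdots \\ (x^{(m)})^{T} \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$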

All features of the i-th training sample are collected in the vector x(i) (remember to include x0(i) = 1). The design matrix X has the transposes of these sample vectors as its rows, and y is the vector of observed values. With this notation, the formula above computes the optimal θ directly.
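A minimal NumPy sketch of the formula, assuming X already contains the x0 = 1 column. It calls `np.linalg.solve` on XᵀXθ = Xᵀy instead of forming (XᵀX)⁻¹ explicitly, a common substitution that gives the same θ when XᵀX is invertible and is numerically more stable. The function name and synthetic data are hypothetical:

```python
import numpy as np


def normal_equation(X, y):
    """Return theta solving X^T X theta = X^T y (equivalent to (X^T X)^{-1} X^T y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical check: generate y from a known theta, then recover it.
rng = np.random.default_rng(0)
m, n = 50, 3                                  # 50 samples, 3 features
features = rng.normal(size=(m, n))
X = np.hstack([np.ones((m, 1)), features])    # prepend x0 = 1
true_theta = np.array([4.0, -2.0, 0.5, 3.0])
y = X @ true_theta                            # noise-free observations

theta = normal_equation(X, y)
print(theta)
```

Because y is generated without noise from `true_theta`, the recovered `theta` should match it up to floating-point error.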

3. The non-invertible case

Note that the normal equation requires computing the inverse of XᵀX. When XᵀX is not invertible (singular), there are generally two causes:

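  • Redundant features (linearly dependent features, e.g. the same quantity measured in two different units)
  • Too many features (e.g. m ≤ n, where m is the number of samples and n the number of features). Remedy: delete some features, or use regularization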
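The regularization remedy can be sketched as below. This assumes the L2-regularized (ridge) form θ = (XᵀX + λL)⁻¹Xᵀy, where L is the identity matrix with its top-left entry set to 0 so that θ0 is not penalized. The function name, λ = 1e-3, and the data (with a redundant feature x2 = 2·x1) are hypothetical. `np.linalg.pinv` is shown as a second, commonly used fallback:

```python
import numpy as np


def regularized_normal_equation(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y, with L = identity except L[0, 0] = 0.

    The intercept theta_0 (the x0 = 1 column) is not penalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)


# Hypothetical data with a redundant feature: the third column is exactly 2 * x1.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones(5), x1, 2.0 * x1])
y = 1.0 + 3.0 * x1

# Mathematically rank(X) = 2 < 3 columns, so X^T X is singular.
rank = np.linalg.matrix_rank(X)

theta_ridge = regularized_normal_equation(X, y, lam=1e-3)
theta_pinv = np.linalg.pinv(X) @ y  # minimum-norm least-squares solution
print(rank, theta_ridge, theta_pinv)
```

Both results give predictions Xθ that match y closely even though XᵀX itself is singular. For λ > 0 and at least one sample, XᵀX + λL is positive definite, so the regularized system always has a unique solution.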

In fact, the underlying reason comes from linear algebra.

Both situations are sufficient conditions for XᵀX to be singular, and both come down to the columns of X being linearly dependent.

By the property rank(AᵀA) = rank(A), the invertibility of the (n+1)×(n+1) matrix XᵀX reduces to a condition on X: XᵀX is invertible exactly when X has full column rank, i.e. rank(X) = n+1.

First case: redundant features make some columns of X linearly dependent, so rank(X) < n+1 and XᵀX is not invertible.

Second case:

  • m ≤ n: X has n+1 columns, each a vector in m-dimensional space. The number of vectors (n+1) exceeds the dimension (m), i.e. there are fewer samples than columns, so the columns must be linearly dependent and XᵀX is not invertible
  • m = n+1: X is square, and since |XᵀX| = |X|², XᵀX is invertible when |X| ≠ 0 and not invertible when |X| = 0
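The rank argument above can be checked numerically. This sketch assumes X includes the x0 = 1 column (so it has n + 1 columns) and uses `np.linalg.matrix_rank`, which estimates rank from singular values with a floating-point tolerance. The helper name `xtx_is_invertible` and the random matrices are hypothetical:

```python
import numpy as np


def xtx_is_invertible(X):
    """X^T X is invertible exactly when X has full column rank."""
    X = np.asarray(X, dtype=float)
    return np.linalg.matrix_rank(X) == X.shape[1]


rng = np.random.default_rng(1)

# m <= n: 4 samples, 4 features, so X is 4 x 5 and rank(X) <= 4 < 5.
X_few = np.hstack([np.ones((4, 1)), rng.normal(size=(4, 4))])

# m = n + 1: X is square (5 x 5); X^T X is invertible iff det(X) != 0.
X_square = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 4))])

# m > n + 1 with independent columns: full column rank.
X_many = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 4))])

for name, X in [("m <= n", X_few), ("m = n + 1", X_square), ("m > n + 1", X_many)]:
    print(name, "rank:", np.linalg.matrix_rank(X), "columns:", X.shape[1],
          "invertible X^T X:", xtx_is_invertible(X))
```

For `X_few`, `matrix_rank` can never exceed 4, because a 4 × 5 matrix has only 4 singular values, so the check fails as the m ≤ n argument predicts. The random square and tall matrices have full column rank with overwhelming probability.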

4. Normal equation vs. gradient descent

Gradient descent:

Disadvantages:

  • The learning rate α must be chosen
  • It needs many iterations

Advantages:

  • It still works well when the number of features n is large
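For comparison, a minimal batch gradient descent sketch, assuming the squared-error cost J(θ) = (1/2m)‖Xθ − y‖². The learning rate α = 0.1 and the 5,000 iterations are hypothetical choices that work here only because the synthetic features are already on a similar scale (in practice feature scaling is usually needed). The normal equation is solved alongside to check the result:

```python
import numpy as np


def gradient_descent(X, y, alpha=0.1, num_iters=5000):
    """Batch gradient descent on J(theta) = (1 / (2m)) * ||X theta - y||^2."""
    m, n_cols = X.shape
    theta = np.zeros(n_cols)
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m  # partial derivatives of J w.r.t. each theta_j
        theta -= alpha * gradient             # step of size alpha against the gradient
    return theta


# Hypothetical data: intercept column plus 2 roughly unit-scale features, small noise.
rng = np.random.default_rng(2)
m = 100
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.normal(size=m)

theta_gd = gradient_descent(X, y)
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)  # normal equation, for comparison
print(theta_gd, theta_ne)
```

Both approaches minimize the same convex cost, so with a suitable α and enough iterations gradient descent converges to the same θ that the normal equation computes in one step.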

Normal equation:

Disadvantages:

  • It requires computing (XᵀX)⁻¹; inverting this (n+1)×(n+1) matrix costs roughly O(n³), which is expensive
  • It becomes slow when the number of features n is large

Advantages:

  • No learning rate α to choose
  • No iterations required

Summary: the choice depends on the number of features n. When n is below about 10,000, the normal equation is a good choice; above that, consider gradient descent or other algorithms.

Copyright notice
This article was written by [Wanderer001]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/175/202206240914572846.html