Focus! Master Newton's Method for Convex Optimization in Ten Minutes
2022-06-23 08:13:00 【Red stone】
We know that gradient descent is a first-order optimization method that uses only the gradient. Newton's method, introduced in this article, uses second-order information. This article focuses on the basic concept and derivation of Newton's method and then compares it with gradient descent.
1. Newton's Method for Finding the Root of an Equation
Sometimes, when an equation is complicated, its root is not easy to find by ordinary methods. Newton's method uses iteration and successive approximation to obtain an increasingly accurate root of the equation.
The core idea of Newton's method is the first-order Taylor expansion. For the equation f(x) = 0, expand f around an arbitrary point x0:

f(x) ≈ f(x0) + f'(x0)(x - x0)

Setting f(x) = 0 and solving for x gives:

x = x0 - f(x0) / f'(x0)

Note that because an approximation is used, this x is not the exact root of the equation, only an approximate solution; however, x is closer to the root than x0. The effect is illustrated below:

We then solve iteratively: take this x as the new x0 and compute the next position, which is closer still to the root. The iteration formula can be written as:

x_{n+1} = x_n - f(x_n) / f'(x_n)
After a sufficient number of iterations, the procedure generally converges to a root of the equation. A dynamic demonstration of the whole iterative convergence process is shown below.
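To make the iteration concrete, here is a minimal Python sketch of the root-finding procedure described above. The stopping tolerance, iteration cap, and the example function are assumptions chosen for illustration, not taken from the original article.

```python
def newton_root(f, f_prime, x0, tol=1e-10, max_iter=100):
    """Find a root of f(x) = 0 with Newton's iteration x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tol:  # stop once the update becomes negligible
            break
    return x

# Example: root of f(x) = x^2 - 2 starting from x0 = 1.5 (converges to sqrt(2))
root = newton_root(lambda x: x**2 - 2, lambda x: 2 * x, 1.5)
print(root)  # ~1.4142135623...
```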
2. Newton's Method for Convex Optimization
The previous section showed how Newton's method finds the root of an equation. The same idea can be applied to the optimization of a convex function.
In machine learning and deep learning, the loss function is usually minimized by gradient descent, which uses the first derivative. Now look at the problem from another angle: minimizing the loss function is an extremum problem, and at the minimum the first derivative satisfies f'(x) = 0. In other words, if we can find the point x at which f'(x) = 0, the loss function is minimized and the goal of model optimization is achieved.
The goal now is to compute the x for which f'(x) = 0, i.e. the root of f'(x) = 0. The optimization thus becomes a root-finding problem, and we can apply Newton's method from the previous section.
Unlike the previous section, we first take the second-order Taylor expansion of f(x) at x0:

f(x) ≈ f(x0) + f'(x0)(x - x0) + (1/2) f''(x0)(x - x0)^2

This expansion is accurate when x is close to x0. Differentiating the right-hand side with respect to (x - x0) and setting the derivative to zero gives:

f'(x0) + f''(x0)(x - x0) = 0, i.e. x = x0 - f'(x0) / f''(x0)

Again, this x is not yet the optimal point, but it is closer to the root of f'(x) = 0 than x0. We therefore obtain the optimization iteration formula:

x_{n+1} = x_n - f'(x_n) / f''(x_n)
By applying this iteration repeatedly, we keep approaching the root of f'(x) = 0 and thus achieve the goal of minimizing the loss function.
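For illustration, here is a minimal sketch of the minimization iteration above. The example function, tolerance, and iteration limit are assumptions for demonstration, not from the original article.

```python
def newton_minimize(f_prime, f_double_prime, x0, tol=1e-10, max_iter=100):
    """Minimize f by applying Newton's iteration to f'(x) = 0:
    x_{n+1} = x_n - f'(x_n) / f''(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f_prime(x) / f_double_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: minimize f(x) = (x - 3)^2 + 1, whose minimum is at x = 3
x_star = newton_minimize(lambda x: 2 * (x - 3), lambda x: 2.0, x0=0.0)
print(x_star)  # 3.0 (a single step suffices because f is exactly quadratic)
```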
3. Gradient Descent vs. Newton's Method
Now write the update formulas of gradient descent and Newton's method side by side (η is the learning rate):

Gradient descent: x_{n+1} = x_n - η f'(x_n)

Newton's method: x_{n+1} = x_n - f'(x_n) / f''(x_n)
Gradient descent approximates the function at x_n with a first-order function, i.e. a straight line; it computes the gradient and takes the next step in the direction opposite to the gradient. Newton's method approximates the function at x_n with a second-order function, i.e. a quadratic curve; it uses both the gradient and the second derivative to determine the next step. Schematic diagrams of first-order and second-order optimization are shown below:

Gradient descent: first-order optimization

Newton's method: second-order optimization
The above describes how gradient descent and Newton's method differ. So which one optimizes better?
First, consider the advantages of Newton's method. For one, its update formula contains no learning-rate parameter, so there is no need to select a suitable learning rate by cross-validation. For another, Newton's method exploits the curvature information of the function itself, so it generally converges in fewer iterations than gradient descent. The figure below shows an example of minimizing an objective function: the red curve is the iterative solution of Newton's method and the green curve that of gradient descent. Clearly, Newton's method converges faster in terms of iterations.
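To make the iteration-count comparison concrete, here is a small sketch that applies both update rules to a toy function and counts the iterations each needs. The function, learning rate, tolerance, and starting point are assumptions chosen for illustration and do not reproduce the article's figure.

```python
# Toy objective: f(x) = x^4, minimized at x = 0
f_prime = lambda x: 4 * x**3
f_double_prime = lambda x: 12 * x**2

def count_iters(update, x0=2.0, tol=1e-3, max_iter=100_000):
    """Apply an update rule until |f'(x)| < tol; return the number of iterations."""
    x = x0
    for i in range(max_iter):
        if abs(f_prime(x)) < tol:
            return i
        x = update(x)
    return max_iter

eta = 0.01  # learning rate for gradient descent (assumed)
gd_iters = count_iters(lambda x: x - eta * f_prime(x))
newton_iters = count_iters(lambda x: x - f_prime(x) / f_double_prime(x))
print(gd_iters, newton_iters)  # Newton's method needs far fewer iterations here
```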
Next, consider the drawbacks of Newton's method. Note that its iteration formula requires not only the first derivative but also the second derivative. In matrix form, the first and second derivatives correspond to the Jacobian matrix and the Hessian matrix, respectively.
Jacobian matrix: J_{ij} = ∂f_i / ∂x_j

Hessian matrix: H_{ij} = ∂²f / (∂x_i ∂x_j)
Newton's method requires computing not only the Hessian matrix but also its inverse. When the amount of data is small, this does not affect speed much. However, when the amount of data is large, and especially in deep neural networks, computing the Hessian and its inverse is very time-consuming. In terms of overall cost, Newton's method is therefore often slower than gradient descent, which is why most optimization strategies for neural network loss functions are based on gradient descent.
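As a sketch of what a single multivariate Newton step looks like, and where the cost comes from, the snippet below assumes the gradient and Hessian are available as functions; the quadratic test function is my own example, not from the article.

```python
import numpy as np

def newton_step(grad, hess, x):
    """One multivariate Newton step: x_new = x - H(x)^{-1} grad(x).
    Solving the linear system avoids forming the inverse explicitly,
    but it still costs O(n^3) per step for an n x n Hessian."""
    return x - np.linalg.solve(hess(x), grad(x))

# Example: f(x) = 0.5 * x^T A x - b^T x with a fixed positive-definite A
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
hess = lambda x: A

x = newton_step(grad, hess, np.zeros(2))  # for a quadratic, one step reaches the minimum
print(x, np.allclose(A @ x, b))           # True: the gradient vanishes at the solution
```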
It is worth mentioning that several improved algorithms address these shortcomings of Newton's method. They are known as quasi-Newton algorithms; the most representative are BFGS and L-BFGS.
The BFGS algorithm computes an approximation to the inverse of the Hessian matrix, which effectively improves running speed. However, the entire approximate inverse Hessian still has to be stored, so the space cost remains high.
The L-BFGS algorithm improves on BFGS: it does not store the approximate inverse Hessian at all, but instead obtains the search direction of the current round directly through an iterative procedure, which greatly reduces the space cost.
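In practice these quasi-Newton methods are rarely implemented by hand; for instance, SciPy's optimize module ships an L-BFGS variant. The snippet below is a minimal usage sketch; the Rosenbrock test function and the starting point are arbitrary choices for illustration.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])  # arbitrary starting point

# L-BFGS-B: SciPy's limited-memory quasi-Newton implementation
result = minimize(rosen, x0, method="L-BFGS-B", jac=rosen_der)

print(result.x)    # approximate minimizer (all ones for the Rosenbrock function)
print(result.nit)  # number of iterations used
```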
In general, optimization algorithms based on gradient descent, such as RMSprop and Adam, are more widely used in practice. However, improved variants of Newton's method such as BFGS and L-BFGS have their own strengths and remain highly practical.