Understanding the dynamic computation graph, requires_grad, and zero_grad
2022-07-24 18:33:00 【blanklog】
The dynamic computation graph is built during the forward pass of the program and is used mainly for back propagation. Whereas a network structure is described layer by layer in terms of operations, the computation graph is viewed primarily from the perspective of its data nodes (Tensors).
Several concepts around graph construction and back propagation are easy to confuse, for example is_leaf, requires_grad, detach(), zero_grad(), retain_grad(), and torch.no_grad(). Viewed from the perspective of back propagation through the computation graph, they all become clear.
Back propagation in dynamic graphs

Figure 1: Dynamic computation graph
The figure above is a schematic of a computation graph: X1 and X2 are two input-data Tensors, P1 and P2 are the network's weight Tensors, Y and Z are intermediate-result Tensors, and Fn denotes the operation that produces each intermediate result.
The ultimate goal of training the network is to update the values of P1 and P2, so we need the gradient of the loss with respect to P1 and P2. To obtain those gradients, the gradients of the loss with respect to the intermediate results Y and Z must be computed along the way.
Computing the gradients of the weight Tensors does not require gradients with respect to the input data, and the values of the inputs X never need to be updated, so no gradients are computed for X1 and X2.
Therefore, by default, user-created input data has requires_grad=False, while the network's weight parameters have requires_grad=True.
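To make these defaults concrete, here is a minimal check (assuming a standard PyTorch install; the nn.Linear layer is only a stand-in for "network weights"):

import torch
import torch.nn as nn

x = torch.tensor([4.0, 5.0, 6.0])      # user-created input data
layer = nn.Linear(3, 1)                 # network weights are nn.Parameter objects

print(x.requires_grad)                  # False: inputs do not track gradients by default
print(layer.weight.requires_grad)       # True: parameters are created with requires_grad=True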
1. Leaf node
A Tensor created directly by the user is a leaf node; such nodes record no grad_fn (for example, the input data and the network weights).
A Tensor derived through operations from leaf nodes that require gradients is a non-leaf node; such nodes do have a grad_fn.
Leaf nodes sit at the edge of the computation graph; they are where back propagation ends.
For example, in the code below, X, XX, P1, and P2 (the green nodes in Figure 2) are leaf nodes, while Y and loss (the white nodes) are non-leaf nodes.
import torch

p1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
p2 = torch.tensor([1.1, 2.2, 3.3], requires_grad=True)
x = torch.tensor([4.0, 5.0, 6.0], requires_grad=False)
xx = x ** 2
y1 = p1 + xx
y2 = torch.sigmoid(y1)
y3 = y2 + p2
y4 = torch.sigmoid(y3)
loss = y4.mean()

Figure 2: Leaf nodes
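Continuing from the snippet above, the leaf/non-leaf split can be checked directly; the values in the comments reflect PyTorch's documented behaviour:

# x, xx, p1, p2 are leaves (grad_fn is None); y2 and loss are not.
for name, t in [("x", x), ("xx", xx), ("p1", p1), ("p2", p2), ("y2", y2), ("loss", loss)]:
    print(name, t.is_leaf, t.grad_fn)
# x / xx / p1 / p2 print is_leaf=True, grad_fn=None
# y2 / loss print is_leaf=False with a grad_fn such as <SigmoidBackward0> / <MeanBackward0>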
2. requires_grad
This attribute indicates that the gradient of the loss with respect to the current Tensor needs to be computed. Because the leaf nodes P1 and P2 need gradients to update their parameters, every intermediate node on the path from them to the loss also needs its gradient computed (requires_grad == True).
For example, the orange nodes in the figure below.

Figure 3: Nodes whose gradients need to be computed
PyTorch does not allow setting requires_grad to False on a non-leaf node; after all, having some Y in the middle cut off the gradient computation would be strange and inelegant. If you want to stop computing gradients for P1, set requires_grad of P1 itself to False; the requires_grad of Y1, Y2, and so on then becomes False as well.
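A small sketch of this rule (the exact wording of the RuntimeError may differ between PyTorch versions):

import torch

p1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x = torch.tensor([4.0, 5.0, 6.0])
y1 = p1 + x ** 2                 # non-leaf, inherits requires_grad=True from p1

# y1.requires_grad = False      # rejected: requires_grad can only be changed on leaf tensors

p1.requires_grad = False         # allowed on the leaf itself
y1 = p1 + x ** 2
print(y1.requires_grad)          # False: nothing upstream requires a gradient any more
print(y1.grad_fn)                # None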
3. detach()
What if P1 and P2 should each be updated with a different loss, as in a generative adversarial network? In that case, apply detach() to Y2, as shown below:

Figure 4: detach() diagram
detach() returns a new tensor that shares the same data but is cut off from the original computation graph; it becomes a leaf node, and a new graph can be built on top of it. In this way the two losses compute the gradients of P1 and P2 separately, without interfering with each other.
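A minimal sketch of the idea, reusing the tensor names from earlier rather than a real GAN (the shapes and operations are illustrative assumptions):

import torch

p1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
p2 = torch.tensor([1.1, 2.2, 3.3], requires_grad=True)
x = torch.tensor([4.0, 5.0, 6.0])

y1 = torch.sigmoid(p1 + x)       # first graph, depends on p1
y2 = y1.detach()                 # new leaf, cut off from the first graph
y3 = torch.sigmoid(y2 + p2)      # second graph, depends only on p2

loss1 = y1.mean()
loss2 = y3.mean()

loss1.backward()                 # fills p1.grad only
loss2.backward()                 # fills p2.grad only; nothing flows back past y2
print(p1.grad, p2.grad)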
4. zero_grad()
If we feed data through the network twice, the computation graph gains a second branch, as shown in Figure 5.

Figure 5: Two forward passes
If we want to update the network with the gradients of both losses, we call backward() on loss1 and loss2 separately, and the two gradients of P1 and P2 are accumulated. This is equivalent to increasing the batch size (loss1 + loss2), or to updating a Siamese network structure.
loss1.backward()
loss2.backward()
optimizer.step()
# Equivalent to
(loss1 + loss2).backward()
optimizer.step()

If instead we want each loss to update the weight parameters on its own, the weight gradients must be zeroed after each update; otherwise they accumulate into the gradient of the next loss and distort the weight update.
loss1.backward()        # compute the gradient of loss1
optimizer.step()        # update the weights using the gradient of loss1
optimizer.zero_grad()   # zero the gradients of the network's weight parameters
loss2.backward()        # compute the gradient of loss2
optimizer.step()        # update the weights using the gradient of loss2
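A tiny self-contained illustration of why the zeroing matters (the gradient of (p ** 2).sum() is 2 * p; whether zero_grad() leaves zeros or None depends on the PyTorch version):

import torch

p = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([p], lr=0.1)

(p ** 2).sum().backward()
print(p.grad)                    # tensor([2., 4.])

# Without zero_grad(), the next backward() adds into the same .grad buffer:
(p ** 2).sum().backward()
print(p.grad)                    # tensor([4., 8.])  -- accumulated

optimizer.zero_grad()            # reset before the next, independent update
print(p.grad)                    # zeros, or None on versions where set_to_none defaults to True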
5. retain_grad()
The ultimate purpose of computing gradients is to let the optimizer update the weight parameters, so only the gradients of P1 and P2 are actually used. The gradients of the other, non-leaf nodes only serve to propagate backwards; once used they are discarded and do not need to be kept.

Figure 6: Nodes whose gradients are computed
To keep a non-leaf node's gradient after back propagation, call retain_grad() on it so the gradient is not released.
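A short standalone sketch (not the exact graph from the figure) showing that a non-leaf gradient survives only if retain_grad() was called:

import torch

p1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.sigmoid(p1)
y.retain_grad()                  # ask autograd to keep this non-leaf's gradient
loss = y.mean()
loss.backward()

print(p1.grad)                   # leaf gradient is always kept
print(y.grad)                    # would be None (with a warning) without retain_grad()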
6. torch.no_grad()
A context manager: Tensors produced inside the context are excluded from autograd, which is equivalent to stripping them out of the computation graph. It is used during evaluation to avoid the extra computation and memory that back propagation would otherwise require.
Setting requires_grad to False on every weight parameter of the network achieves the same effect, but torch.no_grad() is more concise and elegant.
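A minimal example of the context manager; the model and x here are placeholders, not part of the figures above:

import torch
import torch.nn as nn

model = nn.Linear(3, 1)
x = torch.tensor([[4.0, 5.0, 6.0]])

with torch.no_grad():            # no graph is recorded inside this block
    out = model(x)

print(out.requires_grad)         # False: the output is cut off from autograd
print(out.grad_fn)               # None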
7. Optimizer
The optimizer updates the weights according to the gradients of the weight parameters. When constructing the optimizer, the weight parameters that need updating are registered with it; the optimizer can then zero the gradients of those parameters and apply the weight updates.
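A sketch of the usual pattern, with a hypothetical nn.Linear model and random data standing in for a real training step:

import torch
import torch.nn as nn

model = nn.Linear(3, 1)
# Register the parameters to be updated when the optimizer is built:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 3)
target = torch.randn(8, 1)

loss = nn.functional.mse_loss(model(x), target)
optimizer.zero_grad()            # clear old gradients of the registered parameters
loss.backward()                  # compute new gradients
optimizer.step()                 # update the registered parameters with those gradients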
The above are some observations and conclusions from using PyTorch; corrections are welcome if anything is wrong.