RuntimeError: Trying to backward through the graph a second time (or directly access saved variable
2022-06-25 17:39:00 【Reject ellipsis】
I hit this error while using PyTorch, and I'm writing it down so I don't fall into the same pit again. Thanks to this pit, I now have a much clearer understanding of how to use a pretrained model.
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
Simply put, the problem is this: a variable carrying gradient information was backpropagated once, the computational graph then freed its saved intermediate values (the gradient information), and our code tried to access those freed values during a second backward pass.
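The failure mode can be reproduced in a few lines, independent of embeddings (a minimal sketch of my understanding, not from the original post): a second `.backward()` on the same graph raises the error, while `retain_graph=True` keeps the saved tensors alive so a second pass works.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()       # the mul node saves x for the backward pass

y.backward()            # first backward: saved tensors are freed afterwards
try:
    y.backward()        # second backward on the freed graph
    second_ok = True
except RuntimeError:
    second_ok = False   # the "backward through the graph a second time" error

# retain_graph=True keeps the saved tensors, so a second backward works;
# gradients simply accumulate in x2.grad (2*x per pass, so 4 in total here).
x2 = torch.ones(3, requires_grad=True)
y2 = (x2 * x2).sum()
y2.backward(retain_graph=True)
y2.backward()
print(second_ok, x2.grad)
```

Note that `retain_graph=True` only suppresses the error; in this post the real fix is to rebuild the graph each iteration rather than retain it.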
Reason

Everyone's reason may be different. In my case, the Embedding call was written outside the training loop. Here is my faulty code. You can see that the sample is first run through the Embedding layer once, and only then does the training loop run; this produces the error above.
import torch
import torch.nn as nn

net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  # the class of each word
print(sample)
embedding = nn.Embedding(10, 32)
embed_sample = embedding(sample)  # torch.Size([3, 4, 32])
net.train()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(100):
    pred = net(embed_sample)    # torch.Size([3, 4, 2])
    pred = pred.reshape(-1, 2)  # torch.Size([12, 2])
    loss = Loss(pred, target)   # compute the loss
    optimizer.zero_grad()       # zero the gradients
    loss.backward()             # backpropagate
    optimizer.step()            # update the parameters
    print(i+1, loss)
# Output
---------------------------------------------------------------------------
1 tensor(0.7125, grad_fn=<NllLossBackward>)
RuntimeError Traceback (most recent call last)
D:\Temp\ipykernel_8312\2990520637.py in <cell line: 3>()
7 #sum_loss # One epoch All losses and
8 optimizer.zero_grad() # Zero gradient
----> 9 loss.backward() # Back propagation
10 optimizer.step() # Update gradient
11 print(i+1,loss)
E:\anaconda\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
253 create_graph=create_graph,
254 inputs=inputs)
--> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
256
257 def register_hook(self, hook):
E:\anaconda\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
145 retain_graph = create_graph
146
--> 147 Variable._execution_engine.run_backward(
148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
This happens because the word embedding here is static: its part of the graph was already freed during the first backward pass, so when the loop reaches the second iteration and backpropagates again, it errors out.
Solution

The fix is easy: just move the Embedding call inside the loop, as in the following code.
net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  # the class of each word
print(sample)
embedding = nn.Embedding(10, 32)
net.train()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(100):
    # just move it in here
    embed_sample = embedding(sample)  # torch.Size([3, 4, 32])
    pred = net(embed_sample)          # torch.Size([3, 4, 2])
    pred = pred.reshape(-1, 2)        # torch.Size([12, 2])
    loss = Loss(pred, target)         # compute the loss
    optimizer.zero_grad()             # zero the gradients
    loss.backward()                   # backpropagate
    optimizer.step()                  # update the parameters
    print(i+1, loss)
Is the Embedding not being trained?

The problem with this version is that the Embedding's parameters do not participate in training (you can verify this yourself). The reason is that optimizer = torch.optim.Adam(net.parameters(), lr=1e-3) never registers the embedding's parameters with the optimizer. The fix is to add them: optimizer = torch.optim.Adam(list(embedding.parameters()) + list(net.parameters()), lr=1e-3). (Code below.)
net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  # the class of each word
print(sample)
embedding = nn.Embedding(10, 32)
print(list(embedding.parameters()))
net.train()
optimizer = torch.optim.Adam(list(embedding.parameters()) + list(net.parameters()), lr=1e-3)
for i in range(100):
    # just move it in here
    embed_sample = embedding(sample)  # torch.Size([3, 4, 32])
    pred = net(embed_sample)          # torch.Size([3, 4, 2])
    pred = pred.reshape(-1, 2)        # torch.Size([12, 2])
    loss = Loss(pred, target)         # compute the loss
    optimizer.zero_grad()             # zero the gradients
    loss.backward()                   # backpropagate
    optimizer.step()                  # update the parameters
    print(i+1, loss)
print(list(embedding.parameters()))  # compare with before
Putting the Embedding directly into the net

Having seen all of the above, you can probably guess the cleanest option: add the Embedding layer directly into the net.
net = nn.Sequential(nn.Embedding(10, 32),
                    nn.Linear(32, 2))
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  # the class of each word
print(sample)
print(list(net[0].parameters()))  # these are the Embedding layer's parameters
net.train()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(100):
    pred = net(sample)          # torch.Size([3, 4, 2])
    pred = pred.reshape(-1, 2)  # torch.Size([12, 2])
    loss = Loss(pred, target)   # compute the loss
    optimizer.zero_grad()       # zero the gradients
    loss.backward()             # backpropagate
    optimizer.step()            # update the parameters
    print(i+1, loss)
print(list(net[0].parameters()))  # compare with before
Takeaway: don't keep the Embedding's gradient information

Suppose I now load a pretrained model and do not want it to participate in gradient updates. It is not enough to simply leave its parameters out of optimizer = torch.optim.Adam(net.parameters(), lr=1e-3). In that case the pretrained model's parameters still participate in backpropagation (otherwise no error would have been raised at the beginning), which wastes computation. We want the pretrained model to stay out of the backward pass entirely, i.e. to carry no gradient information.
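One way to sanity-check that the pretrained part really stays out of backpropagation (a small verification sketch, not in the original post): after backpropagating through a detached embedding output, the embedding's weight has no gradient while the downstream linear layer's does.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 32)
net = nn.Linear(32, 2)
sample = torch.tensor([[1, 2, 3, 3]])

embed = embedding(sample)
out = net(embed.detach()).sum()  # .detach() cuts the graph at the embedding output
out.backward()

# No gradient reached the embedding; the linear layer still trains normally.
print(embedding.weight.grad)        # None
print(net.weight.grad is not None)  # True
```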
Method 1: use .detach()

Detach the word-embedding vectors before feeding them into net, so the input carries no gradient information from the pretrained model (in my case, net(embed_sample.detach())). (Code below.)
net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  # the class of each word
print(sample)
embedding = nn.Embedding(10, 32)
embed_sample = embedding(sample)  # torch.Size([3, 4, 32])
net.train()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(100):
    # just add .detach() here
    pred = net(embed_sample.detach())  # torch.Size([3, 4, 2])
    pred = pred.reshape(-1, 2)         # torch.Size([12, 2])
    loss = Loss(pred, target)          # compute the loss
    optimizer.zero_grad()              # zero the gradients
    loss.backward()                    # backpropagate
    optimizer.step()                   # update the parameters
    print(i+1, loss)
Method 2: use with torch.no_grad()

Generate the word-embedding vectors from the pretrained model inside a torch.no_grad() block, so no gradient information is recorded in the first place.
net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  # the class of each word
print(sample)
embedding = nn.Embedding(10, 32)
with torch.no_grad():
    embed_sample = embedding(sample)  # torch.Size([3, 4, 32]), no graph recorded
net.train()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(100):
    pred = net(embed_sample)    # torch.Size([3, 4, 2])
    pred = pred.reshape(-1, 2)  # torch.Size([12, 2])
    loss = Loss(pred, target)   # compute the loss
    optimizer.zero_grad()       # zero the gradients
    loss.backward()             # backpropagate
    optimizer.step()            # update the parameters
    print(i+1, loss)
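Besides .detach() and torch.no_grad(), PyTorch also lets you freeze the pretrained layer itself by turning off requires_grad on its weights (a standard technique, though not one used in the post above). The embedding then builds no graph at all, so its output can even stay outside the loop without triggering the error:

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 32)
embedding.weight.requires_grad_(False)  # freeze: autograd ignores this weight

net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3]])
target = torch.ones((4,), dtype=torch.long)

embed_sample = embedding(sample)  # no graph is built, so nothing gets freed
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
net.train()
for i in range(3):
    pred = net(embed_sample).reshape(-1, 2)
    loss = Loss(pred, target)
    optimizer.zero_grad()
    loss.backward()   # no "second time" error: each graph starts at net's input
    optimizer.step()

print(embedding.weight.grad)  # None
```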