RuntimeError: Trying to backward through the graph a second time (or directly access saved variable

This error occurred while I was using PyTorch; I am writing it down so I do not fall into the same pit again. Thanks to this pit, I now have a much clearer understanding of how to use a pretrained model.

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

Simply put: during the first backward pass, the computational graph frees the intermediate values (the gradient information) it saved for a variable, and the code then tries to access those freed values during a second backward pass.
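
The mechanism is easy to reproduce in isolation. Here is a minimal sketch (not from the original post): calling .backward() twice on the same graph raises exactly this error, while retain_graph=True on the first call avoids it.

import torch

x = torch.ones(3, requires_grad=True)
y = (x * 2).sum()
y.backward(retain_graph=True)  # keep the saved intermediate values alive
y.backward()                   # OK: the graph was retained
print(x.grad)                  # tensor([4., 4., 4.]); two backward passes accumulate

z = (x * 2).sum()
z.backward()    # the first backward frees the graph
# z.backward()  # uncommenting this raises the RuntimeError above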

Cause

The cause may be different for everyone.
In my case, the embedding lookup was written outside the training loop.
Below is my faulty code, the first example.
You can see that the entire sample is embedded first and only then trained, which triggers the error above.

import torch
import torch.nn as nn

net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  # the category of each word
print(sample)
embedding = nn.Embedding(10, 32)
embed_sample = embedding(sample)  # torch.Size([3, 4, 32]); the graph is built here, outside the loop

net.train()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(100):
    pred = net(embed_sample)    # torch.Size([3, 4, 2])
    pred = pred.reshape(-1, 2)  # torch.Size([12, 2])
    loss = Loss(pred, target)   # compute the loss
    optimizer.zero_grad()       # zero the gradients
    loss.backward()             # backward pass; frees the embedding's part of the graph
    optimizer.step()            # update the parameters
    print(i+1, loss)

# Output 
1 tensor(0.7125, grad_fn=<NllLossBackward>)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
D:\Temp\ipykernel_8312\2990520637.py in <cell line: 3>()
      7     #sum_loss  # sum of the losses over one epoch
      8     optimizer.zero_grad()       # Zero gradient 
----> 9     loss.backward()             # Back propagation 
     10     optimizer.step()            # Update gradient 
     11     print(i+1,loss)

E:\anaconda\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256 
    257     def register_hook(self, hook):

E:\anaconda\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    145         retain_graph = create_graph
    146 
--> 147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

This is because the word embedding is static here: its graph is freed during the first backward pass, so when the loop reaches the second backward pass, the error is raised.
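
For completeness: the error message's own suggestion, retain_graph=True, would also make the first example run. A sketch, assuming the setup from the first example; note that it only papers over the problem, since every iteration still backpropagates through the same embedding lookup and accumulates a .grad on embedding.weight that no optimizer ever consumes.

for i in range(100):
    pred = net(embed_sample).reshape(-1, 2)
    loss = Loss(pred, target)
    optimizer.zero_grad()
    loss.backward(retain_graph=True)  # keep the graph alive for the next iteration
    optimizer.step()
    print(i+1, loss)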


Solution

The solution is easy: just move the embedding lookup inside the loop, as in the following code.

net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  #  The category of each word 
print(sample)
embedding = nn.Embedding(10, 32)
net.train()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(100):
    embed_sample = embedding(sample)  # torch.Size([3, 4, 32]); the lookup now happens inside the loop
    pred = net(embed_sample)    # torch.Size([3, 4, 2])
    pred = pred.reshape(-1, 2)  # torch.Size([12, 2])
    loss = Loss(pred, target)   # compute the loss
    optimizer.zero_grad()       # zero the gradients
    loss.backward()             # backward pass
    optimizer.step()            # update the parameters
    print(i+1, loss)

Is the Embedding actually trained?

The problem with the code above is that the parameters of the Embedding() do not take part in training (you can verify this yourself). Why?
Because optimizer = torch.optim.Adam(net.parameters(), lr=1e-3) never adds the embedding's parameters.
The solution is to add the embedding's parameters:
optimizer = torch.optim.Adam(list(embedding.parameters()) + list(net.parameters()), lr=1e-3)

(Full code below.)

net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  #  The category of each word 
print(sample)
embedding = nn.Embedding(10, 32)
print(list(embedding.parameters()))  # the embedding's parameters before training
net.train()
optimizer = torch.optim.Adam(list(embedding.parameters())+list(net.parameters()), lr=1e-3)
for i in range(100):
    embed_sample = embedding(sample)  # torch.Size([3, 4, 32]); lookup inside the loop
    pred = net(embed_sample)    # torch.Size([3, 4, 2])
    pred = pred.reshape(-1, 2)  # torch.Size([12, 2])
    loss = Loss(pred, target)   # compute the loss
    optimizer.zero_grad()       # zero the gradients
    loss.backward()             # backward pass
    optimizer.step()            # update the parameters
    print(i+1, loss)
print(list(embedding.parameters()))  # compare with the printout above

Adding the Embedding directly to the net

Having seen all of the above, you can probably guess the cleanest version: add the Embedding layer directly to the net.

net = nn.Sequential(nn.Embedding(10, 32),
                    nn.Linear(32, 2))
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  #  The category of each word 
print(sample)
print(list(net[0].parameters()))  # i.e., the Embedding layer's parameters
net.train()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(100):
    pred = net(sample)          # torch.Size([3, 4, 2]); raw indices go straight into the net
    pred = pred.reshape(-1, 2)  # torch.Size([12, 2])
    loss = Loss(pred, target)   # compute the loss
    optimizer.zero_grad()       # zero the gradients
    loss.backward()             # backward pass
    optimizer.step()            # update the parameters
    print(i+1, loss)
print(list(net[0].parameters()))  # compare with the printout above

Takeaway: drop the Embedding's gradient information

Suppose I now load a pretrained model and do not want it to take part in the gradient updates.
Then it seems enough to simply leave its parameters out of optimizer = torch.optim.Adam(net.parameters(), lr=1e-3).

But in that case, the pretrained model's parameters still take part in the backward pass (otherwise, why would the error have been raised at the beginning?), which adds computational cost. What we want is for the pretrained model not to take part in the backward pass at all, that is, to carry no gradient information.

Method 1:

1. Use .detach() so that the word-embedding vectors fed into the net carry no gradient information from the pretrained model (in my case: net(embed_sample.detach())).

(Full code once more.)

net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  #  The category of each word 
print(sample)
embedding = nn.Embedding(10, 32)
embed_sample = embedding(sample)  #torch.Size([3, 4, 32])
net.train()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(100):
    pred = net(embed_sample.detach())  # torch.Size([3, 4, 2]); .detach() cuts the graph to the embedding
    pred = pred.reshape(-1, 2)  # torch.Size([12, 2])
    loss = Loss(pred, target)   # compute the loss
    optimizer.zero_grad()       # zero the gradients
    loss.backward()             # backward pass
    optimizer.step()            # update the parameters
    print(i+1, loss)

Method 2:

2. Use with torch.no_grad(): so that no gradient information is saved when the pretrained model generates the word-embedding vectors.

net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  #  The category of each word 
print(sample)
embedding = nn.Embedding(10, 32)
with torch.no_grad():
    embed_sample = embedding(sample)  #torch.Size([3, 4, 32])
net.train()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(100):
    pred = net(embed_sample)    # torch.Size([3, 4, 2]); embed_sample carries no graph
    pred = pred.reshape(-1, 2)  # torch.Size([12, 2])
    loss = Loss(pred, target)   # compute the loss
    optimizer.zero_grad()       # zero the gradients
    loss.backward()             # backward pass
    optimizer.step()            # update the parameters
    print(i+1, loss)
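
Method 3 (an addition, not in the original post): a third common option is to freeze the pretrained parameters with requires_grad_(False). The embedding output then carries no graph at all, so the lookup can safely stay outside the loop. A minimal sketch, assuming the same toy setup as above:

net = nn.Linear(32, 2)
Loss = nn.CrossEntropyLoss()
sample = torch.tensor([[1, 2, 3, 3],
                       [3, 2, 1, 5],
                       [4, 5, 9, 3]])
target = torch.ones((12,)).to(torch.long)  # the category of each word
embedding = nn.Embedding(10, 32)
embedding.weight.requires_grad_(False)  # freeze: the lookup output has no grad_fn
embed_sample = embedding(sample)        # safe outside the loop; there is no graph to free
net.train()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for i in range(100):
    pred = net(embed_sample)    # torch.Size([3, 4, 2])
    pred = pred.reshape(-1, 2)  # torch.Size([12, 2])
    loss = Loss(pred, target)   # compute the loss
    optimizer.zero_grad()       # zero the gradients
    loss.backward()             # backward pass
    optimizer.step()            # update the parameters
    print(i+1, loss)

Unlike .detach(), freezing records the intent on the module itself, and the lookup does not have to be repeated every iteration.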