
Onnxoptimizer and onnxsim usage notes

2022-06-23 05:19:00 【Ten thousand miles' journey to】

onnxoptimizer and onnxsim are both described as ONNX optimization tools: onnxsim can fold constants, while onnxoptimizer can compress (fuse) nodes. Using resnet18 as an example, this post tests how well onnxoptimizer and onnxsim each optimize a model. Both are installed with pip:

pip install onnxoptimizer

pip install onnxsim
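Constant folding, the optimization attributed to onnxsim above, means evaluating ahead of time every node whose inputs are all known constants and replacing it with its result, so that work is not repeated at inference time. The toy sketch below shows the idea on a hypothetical graph format (a list of `(output, op, inputs)` tuples, not the real ONNX protobuf) and assumes the nodes are already topologically sorted; it is an illustration of the technique, not how onnxsim is implemented.

```python
import operator

# Hypothetical op table for the toy graph (binary ops only).
OPS = {"Add": operator.add, "Mul": operator.mul}


def fold_constants(nodes, constants):
    """Evaluate every node whose inputs are all known constants and drop it.

    nodes: list of (output_name, op, [input_names]), topologically sorted.
    constants: dict name -> value for inputs known before inference.
    Returns (remaining_nodes, extended_constants).
    """
    constants = dict(constants)
    remaining = []
    for out, op, inputs in nodes:
        if all(name in constants for name in inputs):
            # Every input is known: compute the result now, remove the node.
            constants[out] = OPS[op](*(constants[n] for n in inputs))
        else:
            # Depends on a runtime input (e.g. "x"): keep the node.
            remaining.append((out, op, inputs))
    return remaining, constants


# c = a * b and d = c + a depend only on constants; y = x * d needs runtime x.
nodes = [("c", "Mul", ["a", "b"]), ("d", "Add", ["c", "a"]), ("y", "Mul", ["x", "d"])]
remaining, consts = fold_constants(nodes, {"a": 2, "b": 3})
print(remaining)  # [('y', 'Mul', ['x', 'd'])]
print(consts)     # {'a': 2, 'b': 3, 'c': 6, 'd': 8}
```

Only the node that touches the runtime input survives; the two constant subexpressions become precomputed values.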

1、resnet18 Structure

The structure of resnet18 is shown below. It can be viewed as a stack of CBR (Conv-BatchNorm-ReLU) components:

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
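As a worked check of the structure above (and of the 1×3×256×256 dummy input used for the export below), the sketch traces the spatial size through each stage with the usual output-size rule `floor((size + 2*padding - kernel) / stride) + 1` and counts the Conv2d layers, each of which is followed by a BatchNorm2d. It hard-codes resnet18's layer configuration as printed above; the function names are hypothetical and nothing here inspects a real model.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis (dilation=1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet18_shapes(h=256):
    """Trace (stage, channels, height) through resnet18 for an h x h input."""
    shapes = []
    h = conv_out(h, 7, 2, 3)                # conv1 (+ bn1 + relu)
    shapes.append(("conv1", 64, h))
    h = conv_out(h, 3, 2, 1)                # maxpool
    shapes.append(("maxpool", 64, h))
    n_conv = 1                              # conv1
    in_ch = 64
    for name, out_ch, stride in [("layer1", 64, 1), ("layer2", 128, 2),
                                 ("layer3", 256, 2), ("layer4", 512, 2)]:
        for block in range(2):              # two BasicBlocks per stage
            s = stride if block == 0 else 1
            if s != 1 or in_ch != out_ch:
                n_conv += 1                 # 1x1 downsample conv (+ BN)
            h = conv_out(h, 3, s, 1)        # block conv1
            h = conv_out(h, 3, 1, 1)        # block conv2
            n_conv += 2
            in_ch = out_ch
        shapes.append((name, out_ch, h))
    return shapes, n_conv


shapes, n_conv = resnet18_shapes(256)
for stage in shapes:
    print(stage)
print("Conv2d layers (each followed by BatchNorm2d):", n_conv)
```

For a 256×256 input the last stage yields 512×8×8; `avgpool` reduces that to 512×1×1 and `fc` maps it to 1000 classes, which matches the `(1, 1000)` output shape reported later.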

2、Export the ONNX model

import torch
from torchvision import models

model = models.resnet18()  # randomly initialized weights are enough for this comparison
dummy_input = torch.ones((1, 3, 256, 256))
input_names = ['input']
output_names = ['output']
ONNX_name = "resnet18.onnx"
# NCHW layout: axis 0 = batch, axis 2 = height, axis 3 = width
# (axis 1 is the channel axis and must stay 3 for conv1)
torch.onnx.export(model.eval(), dummy_input, ONNX_name, verbose=True,
                  input_names=input_names, output_names=output_names,
                  opset_version=16,
                  dynamic_axes={'input': {0: 'batch', 2: 'height', 3: 'width'},
                                'output': {0: 'batch'}})

The original post shows part of the resulting graph in a figure; to see the full structure, run the code yourself and open the model on the Netron website. In that figure, the Identity nodes are the BN operations.
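Besides viewing the graph in Netron, you can count node types programmatically, for example to see whether BatchNormalization or Identity nodes are present and to compare the three files produced in this post. A minimal sketch, assuming an `onnx.ModelProto` whose top-level `graph.node` list holds the nodes (subgraphs inside control-flow ops are not visited); `count_op_types` is a hypothetical helper name.

```python
from collections import Counter


def count_op_types(model):
    """Count nodes per op_type in the main graph of an onnx.ModelProto."""
    return Counter(node.op_type for node in model.graph.node)

# Usage (requires the onnx package and the exported file):
#   import onnx
#   counts = count_op_types(onnx.load("resnet18.onnx"))
#   print(counts.most_common())
#   print("BatchNormalization:", counts["BatchNormalization"],
#         "Identity:", counts["Identity"])
```

Running it on `resnet18.onnx`, `resnet18_optimize.onnx` and `resnet18_simplify.onnx` gives a textual version of the visual comparison made in the next two sections.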

3、Optimizing with onnxoptimizer

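import onnx
import onnxoptimizer

ONNX_name = "resnet18.onnx"  # model exported in the previous step
model = onnx.load(ONNX_name)
# onnxoptimizer.get_available_passes() lists every pass name; a list of
# pass names can also be given as the second argument of optimize().
new_model = onnxoptimizer.optimize(model)
onnx.save(new_model, "resnet18_optimize.onnx")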

Viewed in Netron, resnet18_optimize.onnx looks no different from resnet18.onnx, and the two files are the same size. In other words, onnxoptimizer did not optimize away the BN operations.

4、Simplifying with onnxsim

import onnx
from onnxsim import simplify

ONNX_name = "resnet18.onnx"  # model exported in the previous step
onnx_model = onnx.load(ONNX_name)  # load the ONNX model
model_simp, check = simplify(onnx_model)  # simplified model + validation flag
assert check, "Simplified ONNX model could not be validated"
onnx.save(model_simp, "resnet18_simplify.onnx")

Viewed in Netron, resnet18_simplify.onnx does differ from resnet18.onnx: the BN operations have been merged into the model (folded into the preceding convolutions).
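Folding an inference-mode BatchNorm into the preceding convolution relies on a simple identity: with `scale = gamma / sqrt(var + eps)` per output channel, `BN(conv(x, W, b)) = conv(x, W * scale, (b - mean) * scale + beta)`. The numpy sketch below checks this numerically with a naive convolution; it illustrates the standard algebra and is not onnxsim's or onnxoptimizer's implementation. All function names are hypothetical.

```python
import numpy as np


def conv2d(x, w, b, stride=1, pad=0):
    """Naive NCHW 2-D convolution (cross-correlation, as in PyTorch/ONNX Conv)."""
    n, _, h, wd = x.shape
    oc, _, kh, kw = w.shape
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    oh = (h + 2 * pad - kh) // stride + 1
    ow = (wd + 2 * pad - kw) // stride + 1
    out = np.empty((n, oc, oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = xp[:, :, i * stride:i * stride + kh, j * stride:j * stride + kw]
            # (n, ic, kh, kw) x (oc, ic, kh, kw) -> (n, oc)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out + b.reshape(1, -1, 1, 1)


def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm over the channel axis of an NCHW tensor."""
    def per_channel(v):
        return v.reshape(1, -1, 1, 1)
    return (per_channel(gamma) * (x - per_channel(mean))
            / np.sqrt(per_channel(var) + eps) + per_channel(beta))


def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return (w', b') such that conv(x, w', b') == batchnorm(conv(x, w, b))."""
    scale = gamma / np.sqrt(var + eps)  # one factor per output channel
    return w * scale.reshape(-1, 1, 1, 1), (b - mean) * scale + beta


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))
w = rng.standard_normal((4, 3, 3, 3))
b = np.zeros(4)  # resnet18 convolutions use bias=False
gamma, beta = rng.standard_normal(4), rng.standard_normal(4)
mean, var = rng.standard_normal(4), rng.uniform(0.5, 2.0, 4)

ref = batchnorm(conv2d(x, w, b, stride=2, pad=1), gamma, beta, mean, var)
w2, b2 = fold_bn_into_conv(w, b, gamma, beta, mean, var)
fused = conv2d(x, w2, b2, stride=2, pad=1)
print("max abs difference:", np.max(np.abs(ref - fused)))
```

After folding, the BN node disappears and the convolution gains a bias, so the graph has fewer nodes while computing the same function up to floating-point rounding.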

5、Runtime and GPU memory usage

The following code involves GPU memory management, so first download the DLL library (dll_export.dll) it loads. The DLL is described in the author's CSDN post "vs2019: exporting a dynamic link library (dll) for other vs projects and Python code", which shows how to export a Visual Studio project as a DLL (including global variables, functions and custom classes) for use from C++, C# and Python projects. After creating the project, write the core code in dllmain.cpp (its original contents can be ignored) and the declarations in pch.h. Because the author's code uses CUDA, the project must also be configured for CUDA; for that setup, see the author's post "libtorch example of video memory management". https://hpg123.blog.csdn.net/article/details/125396626

import ctypes
import os
import time

import numpy as np
import onnxruntime
import pynvml

# os.environ['Path'] += r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin'
# On Python 3.8+ (Windows), DLL dependency directories must be added like this:
os.add_dll_directory(r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin')
lib = ctypes.cdll.LoadLibrary(os.path.join(os.getcwd(), "dll_export.dll"))
# If the program does not exit cleanly at the end, release the DLL explicitly:
# win32api.FreeLibrary(lib._handle)
lib.reset_cuda()  # exported by the author's dll_export.dll; clears CUDA memory

pynvml.nvmlInit()

def get_cuda_msg(tag=""):
    # Print used/free memory of GPU 0 as reported by NVML
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(tag, ", used:", meminfo.used / 1024**2, "Mib", "free:", meminfo.free / 1024**2, "Mib")

onnx_path = ["resnet18.onnx", "resnet18_optimize.onnx", "resnet18_simplify.onnx"]
for ONNX_name in onnx_path:
    get_cuda_msg("\nStart run")
    # Other providers: 'TensorrtExecutionProvider', 'CPUExecutionProvider'
    session = onnxruntime.InferenceSession(ONNX_name, providers=['CUDAExecutionProvider'])
    # print("Input node names:", [x.name for x in session.get_inputs()])
    # print("Output node names:", [x.name for x in session.get_outputs()])
    # dtype and shape must match the dummy input used in torch.onnx.export
    data = {"input": np.ones((1, 3, 256, 256), dtype=np.float32)}
    st = time.time()
    for i in range(100):
        outputs = session.run(None, data)
    print(ONNX_name, outputs[0].shape, (time.time() - st) / 100)  # average seconds per run
    get_cuda_msg("End run")

    # Delete every object holding GPU memory before clearing the CUDA cache
    # (for a PyTorch model this means `del model`)
    del session
    lib.reset_cuda()

The results are shown below. There is essentially no difference: the optimized models run slightly faster, but they show no significant advantage in CUDA memory usage.

Start run , used: 11289.984375 Mib free: 998.015625 Mib
resnet18.onnx (1, 1000) 0.03594543933868408
End run , used: 12257.984375 Mib free: 30.015625 Mib

Start run , used: 11289.984375 Mib free: 998.015625 Mib
resnet18_optimize.onnx (1, 1000) 0.024863476753234862
End run , used: 12183.984375 Mib free: 104.015625 Mib

Start run , used: 11289.984375 Mib free: 998.015625 Mib
resnet18_simplify.onnx (1, 1000) 0.026698846817016602
End run , used: 12239.984375 Mib free: 48.015625 Mib
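The loop above times 100 runs immediately after each session is created, with no warm-up, so one-off start-up costs of the first calls can land in the average. A sketch of a standard-library timing helper (hypothetical name `benchmark`) that discards warm-up calls and reports both mean and median seconds per call:

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=100):
    """Return (mean, median) seconds per call of fn(), ignoring warm-up calls."""
    for _ in range(warmup):
        fn()  # early calls may include one-off initialization costs
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        times.append(time.perf_counter() - t0)
    return statistics.mean(times), statistics.median(times)


mean_s, median_s = benchmark(lambda: sum(range(1000)), warmup=5, iters=50)
print(f"mean {mean_s * 1e6:.1f} us, median {median_s * 1e6:.1f} us")
```

Inside the loop above, it could be called as `benchmark(lambda: session.run(None, data))`; the numbers in the log were not produced this way. The median is less sensitive than the mean to occasional slow iterations.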


Copyright notice
This article was written by [Ten thousand miles' journey to]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/174/202206230223573616.html