[TensorRT] Video Swin Transformer deployment
2022-06-22 01:31:00 【MaxeeoveCR】
1. TensorRT (.engine) inference through the Python API
Code
```python
import copy

import numpy as np
import pycuda.driver as cuda
import tensorrt as trt


class HostDeviceMem:
    """Pairs a page-locked host buffer with its device buffer (same helper as in the TensorRT samples)."""

    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem


def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        # Assumes static shapes: a dynamic dimension (-1) would make this volume invalid.
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream


def do_inference(context, bindings, inputs, outputs, stream, ctx, batch_size=1):
    '''
    Push the CUDA context before inference and pop it afterwards to avoid:
    [TensorRT] ERROR: ../rtSafe/cuda/reformat.cu (925) - Cuda Error in NCHWToNCHHW2: 400
    (invalid resource handle)
    Solution ref.: https://blog.csdn.net/yiyayi1/article/details/111314520

        ctx.push()
        {your inference code}
        ctx.pop()
    '''
    ctx.push()
    # Transfer input data to the GPU.
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    # Run inference. For explicit-batch engines (e.g. built from ONNX),
    # context.execute_async_v2(bindings=bindings, stream_handle=stream.handle) is the matching call.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    # Synchronize the stream
    stream.synchronize()
    ctx.pop()
    # Return only the host outputs.
    return [out.host for out in outputs]


def inf_trt(engine_path, data_loader):
    # Initialize CUDA; make_context() creates the context and pushes it onto the context stack.
    cuda.init()
    ctx = cuda.Device(0).make_context()
    # Load engine
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(TRT_LOGGER, '')
    runtime = trt.Runtime(TRT_LOGGER)
    with open(engine_path, mode='rb') as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    # Allocate inputs/outputs/stream buffers
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    # Create the execution context
    context = engine.create_execution_context()
    # TensorRT inference
    results = []
    for data in data_loader:
        # Fill the input host buffer. ASSUMPTION: `data` is a single array-like clip
        # matching the input binding; adapt this to your data loader's batch format.
        np.copyto(inputs[0].host, np.asarray(data, dtype=inputs[0].host.dtype).ravel())
        result_buffer = do_inference(context, bindings, inputs, outputs, stream, ctx)
        # Deep-copy because the host buffers are overwritten on the next iteration.
        result = copy.deepcopy(result_buffer)
        results.append(result)
    del engine
    ctx.pop()  # must come after `del engine` (see section 2)
    del context
    del stream
    del inputs
    del outputs
    return results
```
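allocate_buffers sizes every host buffer as the number of elements in the binding (trt.volume of its shape) times the element size, and each input has to be written into that flat buffer before inference. A pure-NumPy sketch of this bookkeeping, assuming a hypothetical Video Swin input binding of shape (1, 3, 32, 224, 224) (batch, channels, frames, height, width) in float32 and a hypothetical 400-class output; `np.empty` stands in for `cuda.pagelocked_empty` so it runs without a GPU, and the real shapes should be read from the engine.

```python
import math

import numpy as np

# Hypothetical binding shapes (assumptions); read the real ones from engine.get_binding_shape().
INPUT_SHAPE = (1, 3, 32, 224, 224)   # (N, C, T, H, W)
OUTPUT_SHAPE = (1, 400)              # class scores
DTYPE = np.float32


def volume(shape):
    """Number of elements in a tensor of this shape, like trt.volume()."""
    return math.prod(shape)


# Host buffers are flat 1-D arrays, like the ones allocate_buffers() creates.
input_host = np.empty(volume(INPUT_SHAPE), dtype=DTYPE)
output_host = np.empty(volume(OUTPUT_SHAPE), dtype=DTYPE)

# Copy one clip into the flat input buffer (this must happen before do_inference()).
clip = np.random.rand(*INPUT_SHAPE).astype(DTYPE)
np.copyto(input_host, clip.ravel())

# After inference the output buffer is flat too; reshape it back to its binding shape.
scores = output_host.reshape(OUTPUT_SHAPE)
```

With these shapes the input buffer holds 4,816,896 float32 values, about 19 MB per clip, which is why the buffers are allocated once and reused across iterations rather than per batch.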
2. Common problems (there are many pitfalls)
[TensorRT] ERROR: …/rtSafe/cuda/reformat.cu (925) - Cuda Error in NCHWToNCHHW2: 400 (invalid resource handle)
Solution
Reference: https://github.com/NVIDIA/TensorRT/issues/1107
```python
# Initialize
cuda.init()
ctx = cuda.Device(0).make_context()
...
# Inference
ctx.push()
# {your inference code}
ctx.pop()
```
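A bare push/pop pair leaves the context pushed if the inference code raises an exception. One way to keep the stack balanced is to wrap the pair in a context manager with try/finally. This is a minimal sketch, not part of PyCUDA: `pushed` and `FakeContext` are hypothetical names, and `FakeContext` only stands in for a real `pycuda.driver.Context` (which provides `push()` and `pop()`) so the example runs without a GPU.

```python
from contextlib import contextmanager


@contextmanager
def pushed(ctx):
    """Make `ctx` current for the duration of the block and always pop it afterwards."""
    ctx.push()
    try:
        yield ctx
    finally:
        # Runs even if inference raises, so the context stack stays balanced.
        ctx.pop()


class FakeContext:
    """Hypothetical stand-in for pycuda.driver.Context that only tracks stack depth."""

    def __init__(self):
        self.depth = 0

    def push(self):
        self.depth += 1

    def pop(self):
        self.depth -= 1


ctx = FakeContext()
try:
    with pushed(ctx):
        raise RuntimeError("simulated inference failure")
except RuntimeError:
    pass
```

With a real context, the body of do_inference would sit inside `with pushed(ctx):` instead of between explicit push() and pop() calls.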
PyCUDA ERROR: The context stack was not empty upon module cleanup.
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
Solution
Reference: https://github.com/NVIDIA/TensorRT/issues/1107
Add ctx.pop() right after del engine; otherwise the error above is reported:
```python
del engine
ctx.pop()
del context
del stream
del inputs
del outputs
```
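In CPython an object is freed as soon as its last reference disappears, so `del engine` destroys the engine immediately and the order of these statements is the order in which resources are released. The sketch below makes that order explicit in a hypothetical `TrtSession` holder whose `close()` drops the engine, pops the CUDA context, then drops the rest, mirroring the order above. `TrtSession`, `Tracked` and `LoggingContext` are illustrative stand-ins (not TensorRT or PyCUDA classes) that log their own destruction so the order can be checked without a GPU.

```python
class Tracked:
    """Hypothetical stand-in for a TensorRT/PyCUDA object; logs when it is destroyed."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __del__(self):
        self.log.append(f"del {self.name}")


class LoggingContext:
    """Hypothetical stand-in for the CUDA context created by make_context()."""

    def __init__(self, log):
        self.log = log

    def pop(self):
        self.log.append("ctx.pop")


class TrtSession:
    """Hypothetical holder that owns the only references to the inference objects."""

    def __init__(self, ctx, engine, context, stream):
        self.ctx = ctx
        self.engine = engine
        self.context = context
        self.stream = stream

    def close(self):
        # Same order as above: release the engine, pop the CUDA context, then the rest.
        self.engine = None
        self.ctx.pop()
        self.context = None
        self.stream = None


log = []
session = TrtSession(LoggingContext(log), Tracked("engine", log),
                     Tracked("context", log), Tracked("stream", log))
session.close()
print(log)
```

On CPython this prints `['del engine', 'ctx.pop', 'del context', 'del stream']`; keeping all objects in one holder means no stray reference elsewhere can silently postpone a release past the `ctx.pop()`.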
3. To be continued
Kudos to TRT 8.x: the speedup on transformer modules is like taking off.
More to be added …