[TensorRT] Video Swin Transformer deployment

2022-06-22 01:31:00 【MaxeeoveCR】

1. TensorRT (.engine) inference with the Python interface

Code

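import copy

import pycuda.driver as cuda
import tensorrt as trt


class HostDeviceMem(object):
    # Pairs a page-locked host buffer with its device allocation
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem


def allocate_buffers(engine):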
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream
    
def do_inference(context, bindings, inputs, outputs, stream, ctx, batch_size=1):
    '''
    Run one inference pass and return the host output buffers.

    The CUDA context is pushed before inference and popped afterwards to avoid:
        [TensorRT] ERROR: ../rtSafe/cuda/reformat.cu (925) - Cuda Error in NCHWToNCHHW2: 400 
        (invalid resource handle)
    Solution ref.: https://blog.csdn.net/yiyayi1/article/details/111314520
    '''
    ctx.push()
    try:
        # Transfer input data to the GPU.
        for inp in inputs:
            cuda.memcpy_htod_async(inp.device, inp.host, stream)
        # Run inference (implicit-batch API). Engines built with an explicit
        # batch dimension use execute_async_v2, which takes no batch_size.
        context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
        # Transfer predictions back from the GPU.
        for out in outputs:
            cuda.memcpy_dtoh_async(out.host, out.device, stream)
        # Wait for all work queued on the stream to finish.
        stream.synchronize()
    finally:
        # Always pop, even if inference raises, so the context stack stays balanced.
        ctx.pop()
    # Return only the host outputs.
    return [out.host for out in outputs]

def inf_trt(engine_path, data_loader):
    # Initialize CUDA and create a context on device 0 (this also makes it current)
    cuda.init()
    ctx = cuda.Device(0).make_context()

    # Load engine
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(TRT_LOGGER, '')
    runtime = trt.Runtime(TRT_LOGGER)
    with open(engine_path, mode='rb') as f:
        engine_bytes = f.read()
    engine = runtime.deserialize_cuda_engine(engine_bytes)

    # Allocate inputs/outputs/stream buffers
    inputs, outputs, bindings, stream = allocate_buffers(engine)

    # Create the execution context
    context = engine.create_execution_context()

    # TensorRT inference
    results = []
    for data_idx, data in enumerate(data_loader):
        # ASSUMPTION: the original post never fills the input buffer. Here each
        # batch is assumed to be a single array matching the first input binding;
        # adapt this line to whatever your data loader actually yields.
        inputs[0].host[:] = data.ravel()
        # Inference
        result_buffer = do_inference(context, bindings, inputs, outputs, stream, ctx)
        # Deep-copy, because the host buffers are overwritten on the next iteration
        result = copy.deepcopy(result_buffer)
        results.append(result)

    del engine
    # Pop the context created by make_context() (see section 2)
    ctx.pop()
    del context
    del stream
    del inputs
    del outputs
    return results
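
The buffers created by allocate_buffers are flat 1-D arrays, so a batch has to be flattened before it is copied in, and each output has to be reshaped after it comes back. The following is a minimal sketch of that bookkeeping using NumPy only, so it runs without a GPU. The helper names fill_input and unflatten_output, and the two shapes, are assumptions made for illustration; they do not come from the original post or from TensorRT.

```python
import numpy as np


def fill_input(host_buffer, batch):
    """Copy a batch into a flat, preallocated host buffer after a size check."""
    flat = np.ascontiguousarray(batch, dtype=host_buffer.dtype).ravel()
    if flat.size != host_buffer.size:
        raise ValueError(
            f"expected {host_buffer.size} elements, got {flat.size}")
    np.copyto(host_buffer, flat)


def unflatten_output(host_buffer, shape):
    """Return an independent copy of a flat output buffer in its logical shape."""
    return np.array(host_buffer, copy=True).reshape(shape)


# Plain NumPy arrays stand in for the page-locked buffers.
input_shape = (1, 3, 4, 8, 8)   # hypothetical (batch, channels, frames, H, W)
output_shape = (1, 5)           # hypothetical (batch, num_classes)
host_in = np.empty(int(np.prod(input_shape)), dtype=np.float32)
host_out = np.arange(5, dtype=np.float32)

clip = np.ones(input_shape, dtype=np.float64)   # converted to float32 on copy
fill_input(host_in, clip)
scores = unflatten_output(host_out, output_shape)
```

Copying the output matters for the same reason the loop above uses copy.deepcopy: the host buffer is reused on the next iteration.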

2. Common problems (there are many pitfalls)

[TensorRT] ERROR: …/rtSafe/cuda/reformat.cu (925) - Cuda Error in NCHWToNCHHW2: 400 (invalid resource handle)

Solution

Reference: https://github.com/NVIDIA/TensorRT/issues/1107

# Initialize
cuda.init()
ctx = cuda.Device(0).make_context()
...

# Inference
ctx.push()
{your inference code}
ctx.pop()
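
The push/pop pair above can be wrapped in a context manager so that the pop still runs when the inference code raises. This is a sketch under stated assumptions, not part of the original post: active_context is a hypothetical helper name, and RecordingContext is a stand-in that only records calls so the example runs without PyCUDA. With a real context, the object returned by make_context() would be passed in instead.

```python
from contextlib import contextmanager


@contextmanager
def active_context(ctx):
    """Push ctx for the duration of the with-block, then pop it."""
    ctx.push()
    try:
        yield ctx
    finally:
        ctx.pop()  # runs even if the body raises


class RecordingContext:
    """Hypothetical stand-in for a CUDA context; it only records calls."""

    def __init__(self):
        self.calls = []

    def push(self):
        self.calls.append("push")

    def pop(self):
        self.calls.append("pop")


ctx = RecordingContext()

# Normal case: the body finishes and the context is popped.
with active_context(ctx):
    pass  # {your inference code}

# Failure case: the body raises, and the context is still popped.
try:
    with active_context(ctx):
        raise RuntimeError("simulated inference failure")
except RuntimeError:
    pass
```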

PyCUDA ERROR: The context stack was not empty upon module cleanup.

PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

Solution

Reference: https://github.com/NVIDIA/TensorRT/issues/1107

After del engine, call ctx.pop(); otherwise the error above is reported.

del engine
ctx.pop()
del context
del stream
del inputs
del outputs
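
Why the final ctx.pop() is needed can be seen by counting stack entries: make_context() creates a context and makes it current, and each push/pop pair around inference leaves the stack as it found it, so one entry remains until it is popped explicitly. The toy model below only does that counting. ToyContextStack and run are hypothetical names for illustration and are not the PyCUDA API.

```python
class ToyContextStack:
    """Hypothetical model of a context stack; it only tracks the depth."""

    def __init__(self):
        self.depth = 0

    def make_context(self):
        self.depth += 1  # creating a context also makes it current
        return self

    def push(self):
        self.depth += 1

    def pop(self):
        if self.depth == 0:
            raise RuntimeError("pop on an empty context stack")
        self.depth -= 1


def run(num_batches, final_pop):
    """Mimic the call pattern of inf_trt and return the leftover stack depth."""
    stack = ToyContextStack()
    ctx = stack.make_context()
    for _ in range(num_batches):
        ctx.push()
        # {your inference code}
        ctx.pop()
    if final_pop:
        ctx.pop()  # balances make_context()
    return stack.depth


leftover_without_pop = run(3, final_pop=False)
leftover_with_pop = run(3, final_pop=True)
```

In real code, placing the final pop in a finally block keeps the stack balanced even when inference fails partway through.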

3. To be continued

Kudos to TensorRT 8+: the speedup it gives transformer modules is dramatic.

More to be added …

Copyright notice
This article was written by [MaxeeoveCR]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/173/202206220028593481.html
