Understanding CUDA, cuDNN and TensorRT
2022-06-28 08:15:00 【The mountain of ignorance, the valley of despair, the slope of 】
Reference for the CUDA section: https://www.zhihu.com/question/409350643/answer/1361111350
CUDA
CUDA is short for Compute Unified Device Architecture; in Chinese it is usually rendered as "unified computing architecture". It is the technology that lets NVIDIA GPUs perform general-purpose computing tasks. CUDA can be used from C, C++, Fortran, Python and Java, and it gives good acceleration for workloads with high data throughput. In a nutshell, it lets the GPU go beyond its traditional scenarios and apply its strengths to general-purpose computing. Besides everyday video encoding and decoding and games, the GPU can then be used to accelerate computation. Take the planetary model simulations I have worked with: GPU acceleration greatly speeds up the physics computations in the simulation, and with them the research output.
CUDA and cuDNN
First, CUDA is an extension of the C language for programming GPUs, while cuDNN is a library that packages convolution and other operators. The two do not sit at the same level.
Second, the relationship between them. CUDA can be used to implement the interfaces that cuDNN defines, and early versions of cuDNN were probably implemented in CUDA internally. As NVIDIA's software ecosystem developed, however, the cuDNN team will surely have moved to lower-level tools that are closer to the hardware and harder to use for building kernels, such as PTX, or even writing assembly (SASS) directly. If you don't believe it, try implementing the cuDNN interfaces in CUDA yourself and see how poor the performance can be. Of course, anyone who writes CUDA well already knows its limitations.
Last, the position of the two in the ecosystem. In the beginning, CUDA was what NVIDIA used to conquer the market. To a large extent it established NVIDIA's position in high-performance computing, and especially in high-performance computing for neural networks, because CUDA found a delicate balance between exposing hardware features and keeping software general, one that most people can accept. With the development of technology in recent years things have changed again. CUDA still carries the job of keeping the software ecosystem general, while high-performance work is increasingly handled by high-performance libraries such as cuDNN and cuBLAS. In NVIDIA's vision, mature operators such as convolution and fully connected layers come from a library, so users get the best performance directly. For new operators, or operators unique to one user, users can still write a version with acceptable performance in CUDA with relatively little effort. Frameworks such as TensorRT and TensorFlow then tie the two together.
The relationship between CUDA, cuDNN and TensorRT
CUDA is the parallel computing framework that NVIDIA launched for its own GPUs. In other words, CUDA runs only on NVIDIA GPUs, and it pays off only when the problem to be solved involves a large amount of parallel computation. CUDA's main role is to connect the GPU and applications, so that users can dispatch computation to the GPU through the CUDA API.
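To make "a large amount of parallel computation" concrete, here is a minimal CPU-only sketch of how a CUDA-style launch maps threads onto data. It is a simulation in plain Python, not CUDA code: `vector_add_kernel` and `launch` are hypothetical names chosen for illustration, and the loops run sequentially where a real GPU would run the threads in parallel. The only CUDA fact it relies on is the usual global index formula `blockIdx.x * blockDim.x + threadIdx.x`.

```python
# CPU-only simulation of CUDA thread indexing; no GPU or CUDA toolkit is needed.

def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Work done by one simulated GPU thread: add a single pair of elements."""
    # Global index, mirroring blockIdx.x * blockDim.x + threadIdx.x in CUDA.
    i = block_idx * block_dim + thread_idx
    # Bounds guard: the last block may contain more threads than remaining data.
    if i < len(out):
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Run every thread of every block one after another (a GPU runs them in parallel)."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 10
a = [float(i) for i in range(n)]
b = [2.0 * i for i in range(n)]
out = [0.0] * n

block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # ceiling division so all n elements are covered
launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
print(out)
```

Each simulated thread touches exactly one element and depends on no other thread, which is the kind of problem where CUDA helps. A computation in which every step needs the result of the previous one would gain little from this model.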
cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU acceleration library for deep neural networks. It optimizes the computations used in model training and then runs them on the GPU through CUDA.
Of course, you can also use CUDA directly without going through cuDNN, but the computation will be much less efficient, because your model's training computations are not optimized.
TensorRT is an acceleration package that NVIDIA built for its own platform. It handles only the inference stage of a model. TensorRT is generally not used to train models; it is used to make a model run faster at deployment time.
TensorRT does two things to speed up a model.
1. TensorRT supports INT8 and FP16 computation. Deep learning networks are usually trained with 32-bit or 16-bit data. Inference does not need such high precision, so TensorRT can run the network at lower precision and thereby speed up inference.
2. TensorRT restructures the network, merging operations that can be merged and optimizing for the characteristics of the GPU. Most deep learning frameworks are not specifically tuned for GPU performance, so NVIDIA, as the maker of the GPUs, naturally offers TensorRT as an acceleration tool for its own hardware. Take an unoptimized deep learning model with a convolution layer, a bias layer and a ReLU layer: these three layers require three separate calls to the corresponding cuDNN APIs, yet their implementations can in fact be combined. TensorRT merges the parts of the network that can be merged.
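The reduced-precision idea in point 1 can be illustrated with a small sketch of symmetric INT8 quantization. This is a generic textbook scheme written in plain Python and is not TensorRT's actual calibration procedure; `quantize_int8` and `dequantize` are hypothetical helper names, and the sample weights are made up for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


weights = [0.02, -1.27, 0.5, 0.999, -0.3]  # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each value now fits in 8 bits instead of 32, at the cost of a rounding error of at most about half the scale per value. Inference can often tolerate that error, which is why lower precision is an option at deployment time.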
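The layer merging in point 2 can be sketched the same way. The example below is plain Python on a 1-D signal, not TensorRT or cuDNN code, and every function name in it is hypothetical. It compares three separate passes (convolution, bias, ReLU) with a single fused pass that produces the same result without storing the two intermediate arrays.

```python
def conv1d(x, kernel):
    """Valid 1-D cross-correlation (what deep learning frameworks call convolution)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def add_bias(x, bias):
    return [v + bias for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def fused_conv_bias_relu(x, kernel, bias):
    """Single pass: each output is convolved, biased and clamped before being stored."""
    k = len(kernel)
    out = []
    for i in range(len(x) - k + 1):
        acc = bias
        for j in range(k):
            acc += x[i + j] * kernel[j]
        out.append(max(0.0, acc))
    return out


x = [1.0, -2.0, 3.0, -4.0, 5.0]
kernel = [0.5, -1.0, 0.25]
bias = 0.5

unfused = relu(add_bias(conv1d(x, kernel), bias))  # three passes, two intermediate lists
fused = fused_conv_bias_relu(x, kernel, bias)      # one pass, no intermediates
print(unfused, fused)
```

On a GPU, the gain from fusion comes mainly from launching one kernel instead of three and from not writing intermediate results to memory and reading them back. The sketch shows only that the fused form computes the same values.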