Understanding CUDA, cuDNN and TensorRT
2022-06-28 08:15:00 【The mountain of ignorance, the valley of despair, the slope of 】
CUDA reference: https://www.zhihu.com/question/409350643/answer/1361111350
CUDA
CUDA is short for Compute Unified Device Architecture. It is the technology that lets NVIDIA GPUs carry out general-purpose computing tasks. CUDA can be used from C, C++, Fortran, Python and Java, and it accelerates workloads with high data throughput well. In short, it lets the GPU go beyond the scenarios it was built for and apply its strengths to general-purpose computing. Besides everyday video encoding and decoding and games, the GPU can be used to accelerate computation. In the planetary model simulations I have worked with, for example, GPU acceleration greatly speeds up the physics computations being simulated, and with them the research output.
CUDA and cuDNN
First, CUDA is an extension of the C language for programming GPUs, while cuDNN is a library that packages convolution and other operators. The two are not things at the same level.
Second, the relationship between the two. CUDA can be used to implement the interfaces that cuDNN defines, and early versions of cuDNN were probably implemented in CUDA internally. As NVIDIA's software ecosystem developed, though, the cuDNN team would surely have moved to lower-level tools that sit closer to the hardware and are harder to use for building kernels, such as PTX or hand-written assembly (SASS). If you doubt it, try implementing a cuDNN interface in CUDA yourself and see how far the performance falls short. Anyone who writes CUDA well already knows its limitations.
Finally, the position of the two in the ecosystem. In the beginning, CUDA was what NVIDIA used to conquer the field, and to a large extent it established NVIDIA's position in high-performance computing, especially for neural networks, because CUDA found a delicate balance, acceptable to most people, between exposing hardware features and keeping software general. In recent years things have changed: CUDA still carries the job of keeping the software ecosystem general, while high-performance work falls more and more to libraries such as cuDNN and cuBLAS. In NVIDIA's vision, users get the best performance for mature operators such as convolution and fully connected layers directly from the libraries, and for new operators or operators specific to one user they can still use CUDA to write a version with acceptable performance fairly easily. Frameworks such as TensorRT and TensorFlow then tie the two together.
The relationship between CUDA, cuDNN and TensorRT
CUDA is the parallel computing framework that NVIDIA released for its own GPUs. In other words, CUDA runs only on NVIDIA GPUs, and it pays off only when the problem to be solved involves a large amount of parallel computation. Its main role is to connect the GPU and applications, so that users can schedule GPU computation through the CUDA API.
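To make "a large amount of parallel computation" concrete, here is a minimal sketch in plain Python. It is not CUDA and uses no CUDA API; the names `saxpy_element`, `launch` and `running_sum` are hypothetical. It only illustrates the property that matters: in the element-wise case each output depends on its own index alone, so the work could be handed to one GPU thread per element and run in any order, whereas the naive running-sum loop carries a dependency from one step to the next.

```python
# Plain-Python illustration only; nothing here is a CUDA API.

def saxpy_element(i, a, x, y, out):
    """Work for one 'thread': compute a single output element from index i."""
    out[i] = a * x[i] + y[i]


def launch(indices, a, x, y):
    """Run the per-element work for every index, in whatever order is given."""
    out = [0.0] * len(x)
    for i in indices:
        saxpy_element(i, a, x, y, out)
    return out


def running_sum(x):
    """Contrast case: as written, each step needs the previous step's result."""
    out, total = [], 0.0
    for value in x:
        total += value
        out.append(total)
    return out


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

forward = launch(range(len(x)), 2.0, x, y)
backward = launch(reversed(range(len(x))), 2.0, x, y)

# The element-wise result does not depend on the order of execution.
print(forward == backward)
print(running_sum(x))
```

The first kind of loop is the shape of problem the paragraph above has in mind; the second needs to be restructured before it can use many threads at once.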
cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU acceleration library for deep neural networks. It optimizes the computations used in model training and then calls the GPU through CUDA to carry them out.
You can also use CUDA directly without going through cuDNN, but computational efficiency will be much lower, because the training computations are not optimized.
TensorRT is an acceleration package that NVIDIA built for its own platform. It handles only the inference stage of a model: TensorRT is generally not used for training, but for speeding up the model at deployment time.
TensorRT does two things to speed up a model.
1. TensorRT supports INT8 and FP16 computation. Deep learning networks usually use 32-bit or 16-bit data during training. At inference time TensorRT can use lower precision than that, which speeds up inference.
2. TensorRT restructures the network, merging operations that can be combined and optimizing for the characteristics of the GPU. Most deep learning frameworks are not specifically optimized for GPU performance, so NVIDIA, as the maker of the GPUs, naturally released TensorRT as an acceleration tool for its own hardware. Take an unoptimized deep learning model with a convolution layer, a bias layer and a ReLU layer: these three layers require three separate calls to the corresponding cuDNN APIs, yet their implementations can be merged into one. TensorRT merges the layers that can be merged.
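The two ideas in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not how TensorRT is implemented: every helper name here is hypothetical, none of it is a TensorRT or cuDNN API, the quantization is a simple symmetric scheme chosen for the example, and the "layers" are a one-dimensional convolution, a scalar bias and ReLU. The first part shows floats mapped to 8-bit integers with a bounded error. The second shows that three separate passes and one merged pass produce the same output.

```python
# Plain-Python sketch; these helpers are hypothetical, not TensorRT or cuDNN APIs.

def quantize_int8(values):
    """Symmetric INT8 quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [v * scale for v in q]


def conv1d(signal, kernel):
    """Valid 1-D convolution in the cross-correlation form used in deep learning."""
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]


def unfused(signal, kernel, bias):
    """Three separate passes, each producing a full intermediate result."""
    conv_out = conv1d(signal, kernel)
    biased = [v + bias for v in conv_out]
    return [max(0.0, v) for v in biased]


def fused(signal, kernel, bias):
    """One pass: convolution, bias and ReLU computed per output element."""
    n = len(signal) - len(kernel) + 1
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            acc += signal[i + j] * w
        out.append(max(0.0, acc + bias))
    return out


weights = [0.8, -1.0, 0.3]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error <= scale / 2 + 1e-9)

signal = [1.0, -2.0, 3.0, 0.5, -1.0]
kernel = [0.5, -1.0, 0.25]
print(unfused(signal, kernel, 0.5) == fused(signal, kernel, 0.5))
```

The merged version never materializes the intermediate lists, which is the kind of saving layer merging is after. Real INT8 inference also has to choose value ranges for activations, which this sketch ignores.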