Best Practices for the AUTO Plugin and Automatic Batching in OpenVINO™ 2022.1
2022-06-23 | Intel Edge Computing Community
OpenVINO™ 2022.1 is one of the biggest updates since OpenVINO was first released in 2018; see "OpenVINO Ushers in Its Most Significant Update Ever: A First Look at the New Features in 2022.1". Among the many new features, the AUTO plugin and Automatic Batching are two of the most important: they help developers improve inference performance and efficiency without complex programming.
What is the AUTO plugin?
The AUTO plugin [1], short for Automatic Device Selection, is a virtual plugin built on top of the CPU/GPU plugins, as shown in Figure 1-1. In the OpenVINO™ documentation, a "device" is an Intel processor used for inference; it can be a CPU, a GPU, a VPU (Vision Processing Unit), a GNA (Gaussian Neural Accelerator coprocessor), or a combination of these devices [3].

Figure 1-1 Device plugins supported by the OpenVINO™ Runtime [3]
The benefits of the AUTO plugin are:
- It first checks all computing devices available on the runtime platform, then selects the best device for inference and uses it with the configuration best suited to the deep learning model and the characteristics of the selected device.
- It gives the GPU a faster first-inference latency: the GPU plugin must compile the model online at run time before inference can start, which can take around 10 seconds depending on platform performance and model complexity. When a discrete or integrated GPU is selected, the AUTO plugin first runs inference on the CPU to hide this GPU model-compilation time.
- It is easy to use: developers only need to set the device_name parameter of the compile_model() method to "AUTO", as shown in Figure 1-2 and in the sketch after it.

Figure 1-2 Specifying the AUTO plugin
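For reference, a minimal sketch of what Figure 1-2 shows, selecting AUTO at compile time; the model path is a hypothetical placeholder:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# Let the runtime pick the best available device (dGPU > iGPU > VPU > CPU)
compiled_model = core.compile_model(model, device_name="AUTO")
```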
Automatic Batching [2], also called Automatic Batching Execution, is one of the devices supported by the OpenVINO™ Runtime, as shown in Figure 1-1.
Generally speaking, the larger the batch size, the better the inference efficiency and throughput. Automatic Batching combines multiple asynchronous inference requests from the user program into one multi-batch inference request, then splits the batched results and returns them to the individual requests.
Developers do not need to enable Automatic Batching manually. When the config parameter of the compile_model() method is set to {"PERFORMANCE_HINT": "THROUGHPUT"}, the OpenVINO™ Runtime starts Automatic Batching automatically, as shown in Figure 1-3 and the sketch below it, letting developers enjoy improved device utilization and throughput with minimal coding effort.

Figure 1-3 Starting Automatic Batching automatically
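A minimal sketch of triggering Automatic Batching through the THROUGHPUT performance hint; the model path is a hypothetical placeholder:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# The THROUGHPUT hint lets the runtime turn on Automatic Batching
# by itself when the selected device (e.g. a GPU) supports it.
compiled_model = core.compile_model(
    model, device_name="AUTO",
    config={"PERFORMANCE_HINT": "THROUGHPUT"})
```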
Hands-on with the AUTO plugin features
Reading is learning; practice is also learning, and often more effective. This article provides complete experiment code for readers to practice with while learning and summarizing.
GitHub repository: https://github.com/yas-sim/openvino-auto-feature-visualization
Step 1: clone the repository locally.
git clone https://github.com/yas-sim/openvino-auto-feature-visualization.git
Step 2: in the openvino-auto-feature-visualization directory, run:
python -m pip install --upgrade pip
pip install -r requirements.txt
Step 3: download and convert the models:
omz_downloader --list models.txt
omz_converter --list models.txt
At this point, the experiment environment is ready. All configuration and settings of the experiment programs are hard-coded in the source code, so you must modify the source code manually to change a test configuration, as shown in Figure 1-4.

Figure 1-4 Modifying the configuration in the source code
The AUTO plugin switches computing devices automatically
The GPU plugin must compile the IR model into an OpenCL model before inference can start on the GPU. This compilation can take a long time, for example around 10 seconds, which delays the application's first inference and hurts the start-up experience.
To hide this GPU model-compilation latency, the AUTO plugin runs the inference task on the CPU while the GPU model is being compiled; once compilation finishes, the AUTO plugin automatically switches the inference device from the CPU to the GPU, as shown in Figure 1-5.
Figure 1-5 The AUTO plugin switches computing devices automatically
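To make the hidden compilation time visible, here is a minimal sketch that times compile_model() on "GPU" versus "AUTO"; the model path is a hypothetical placeholder:

```python
import time
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

for device in ("GPU", "AUTO"):
    start = time.perf_counter()
    compiled_model = core.compile_model(model, device_name=device)
    # On "GPU" this call waits for the OpenCL compilation; on "AUTO"
    # it returns sooner because the CPU serves the first inferences
    # while the GPU model is compiled in the background.
    print(f"{device}: compile_model() took {time.perf_counter() - start:.2f} s")
```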
Observing the automatic device-switching behavior
The AUTO plugin selects the best computing device according to the device priority [1]: dGPU > iGPU > VPU > CPU. When the AUTO plugin selects the GPU as the best device, an inference-device switch takes place to hide the first-inference latency.
Note that the inference latency differs before and after the device switch; in addition, a latency spike may occur at the moment of switching, as shown in Figure 1-6.
To reproduce Figure 1-6, set the configuration parameter in auto-test-latency-graph.py to:
cfg['PERFORMANCE_HINT'] = ['THROUGHPUT', 'LATENCY'][0]
Then run:
python auto-test-latency-graph.py
At the same time, open the Windows Task Manager and observe the CPU and iGPU utilization.

Figure 1-6 Execution with config={"PERFORMANCE_HINT": "THROUGHPUT"}
PERFORMANCE_HINT settings
As described in Section 1.1.2, the execution behavior of the AUTO plugin depends on the PERFORMANCE_HINT value in the config parameter of the compile_model() method, as shown in Table 1-1:
Table 1-1 PERFORMANCE_HINT settings
| PERFORMANCE_HINT | Application scenario | Starts Auto Batching? |
| --- | --- | --- |
| 'THROUGHPUT' | Non-real-time, large-scale inference tasks | Yes |
| 'LATENCY' | Real-time or near-real-time applications | No |
Now set the configuration parameter in auto-test-latency-graph.py to:
cfg['PERFORMANCE_HINT'] = ['THROUGHPUT', 'LATENCY'][1]
Then run:
python auto-test-latency-graph.py
At the same time, open the Windows Task Manager and observe the CPU and iGPU utilization. The result is shown in Figure 1-7.

Figure 1-7 Execution with config={"PERFORMANCE_HINT": "LATENCY"}
The experiments show that, depending on the config setting, the AUTO plugin works in different modes:
- In LATENCY mode, Auto Batching is not started automatically; after the device switch, the inference latency on the GPU is very low and does not jitter.
- In THROUGHPUT mode, Auto Batching starts automatically; after the device switch, the inference latency on the GPU is higher and jitters.
Next, this article looks at how Auto Batching affects inference behavior.
Hands-on with the Auto Batching features
As described in Section 1.1.2, Automatic Batching combines multiple asynchronous inference requests from the user program into one multi-batch inference request, then splits the batched results and returns them to the individual requests, as shown in Figure 1-8.

Figure 1-8 How Auto Batching executes
Auto Batching starts a batch-inference computation once it has collected the specified number of asynchronous inference requests or its timer has expired (default timeout = 1,000 ms), as shown in Figure 1-9.

Figure 1-9 Starting a batch-inference computation
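A minimal sketch of submitting asynchronous inference requests so the runtime can collect them into batches; the model path, input shape, and request count are hypothetical placeholders:

```python
import numpy as np
from openvino.runtime import Core, AsyncInferQueue

core = Core()
model = core.read_model("model.xml")  # hypothetical model path
compiled_model = core.compile_model(
    model, "GPU", config={"PERFORMANCE_HINT": "THROUGHPUT"})

# Size the request queue from the device's own recommendation
nireq = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
queue = AsyncInferQueue(compiled_model, nireq)

# Completion order is not guaranteed once requests are batched,
# so tag each request with an id through the userdata argument.
queue.set_callback(lambda request, userdata: print("done:", userdata))

dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)  # hypothetical input
for i in range(32):
    queue.start_async({0: dummy}, userdata=i)
queue.wait_all()
```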
When Auto Batching is disabled
When Auto Batching is disabled, every inference request is processed individually.
Configure and run auto-test.py with:
Device: AUTO
Config: { 'PERFORMANCE_HINT': 'LATENCY'}
niter: 20 , interval: 30 ms
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
Number of infer requests: 1
The result is shown in Figure 1-10; each inference request is processed individually.

Figure 1-10 Result when Auto Batching is disabled
When Auto Batching is enabled
When Auto Batching is enabled, asynchronous inference requests are bound together and processed as a multi-batch inference request. When the batch inference finishes, the results are distributed to the individual asynchronous requests and returned. Note that batch inference does not guarantee the completion order of the asynchronous inference requests.
Configure and run auto-test.py with:
Device: GPU
Config: { 'CACHE_DIR': './cache', 'PERFORMANCE_HINT': 'THROUGHPUT', 'ALLOW_AUTO_BATCHING': 'YES'}
niter: 200 , interval: 30 ms
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 64
Number of infer requests: 16
The result is shown in Figure 1-11; every 16 inference requests are combined into one batch for batched inference, and the inference order is not guaranteed.

Figure 1-11 Result when Auto Batching is enabled
Auto Batching can introduce longer inference latency
Because the default timeout is long (default timeout = 1,000 ms), a long inference latency can be introduced when inference requests arrive at a low frequency.
Auto Batching waits until either the specified number of inference requests has arrived or the timeout timer has expired. When the inference frequency is low, it cannot collect enough requests to start a batch within the timeout, so submitted requests are held back until the timer expires, adding inference latency up to the configured timeout.
To mitigate this, the user can set the timeout through the AUTO_BATCH_TIMEOUT configuration parameter, as sketched below, to minimize the impact.
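A minimal sketch of shortening the batching timeout for a low-frequency workload; the model path is a hypothetical placeholder:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# AUTO_BATCH_TIMEOUT is passed in milliseconds, as a string property.
compiled_model = core.compile_model(
    model, "GPU",
    config={"PERFORMANCE_HINT": "THROUGHPUT",
            "AUTO_BATCH_TIMEOUT": "100"})
```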
First, run auto-test.py with the default Auto Batching timeout:
Device: GPU
Config: { 'CACHE_DIR': './cache', 'PERFORMANCE_HINT': 'THROUGHPUT'}
niter: 20, interval: 300 ms
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 64
Number of infer requests: 64
The result is shown in Figure 1-12: because the specified number of inference requests is never collected within the timeout, the requests see high latency.

Figure 1-12 Result with timeout=1000ms
Now configure Auto Batching with timeout=100ms and run auto-test.py again:
Device: GPU
Config: { 'CACHE_DIR': './cache', 'PERFORMANCE_HINT': 'THROUGHPUT', 'AUTO_BATCH_TIMEOUT': '100'}
niter: 20 , interval: 300 ms
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 64
Number of infer requests: 16

Figure 1-13 Result with timeout=100ms
The result is shown in Figure 1-13: within the 100 ms timeout, only one inference request can be collected at a time.
Auto Batching best practices
In summary, the best programming practices for Auto Batching are as follows (a sketch applying them comes after the list):
- Remember that Auto Batching is not engaged by default.
- Auto Batching engages only when {'PERFORMANCE_HINT': 'THROUGHPUT', 'ALLOW_AUTO_BATCHING': 'YES'} is set.
- If your application can submit inference requests continuously at a high frequency, use Automatic Batching.
- Warning: if your application submits inference requests intermittently, the last inference requests may suffer an unexpectedly long latency.
- If the inference cadence is low, i.e., the request frequency is far below AUTO_BATCH_TIMEOUT (default 1,000 ms), do not turn on Automatic Batching.
- You can change the Automatic Batching timeout through the AUTO_BATCH_TIMEOUT parameter to minimize unwanted long latency; the parameter value is in ms.
- If you know the optimal batch size for your workload, specify it with PERFORMANCE_HINT_NUM_REQUESTS, e.g. {'PERFORMANCE_HINT_NUM_REQUESTS': '4'}. Otherwise, taking the GPU as an example, the AUTO plugin computes the optimal batch size in the background from the available memory, model precision, and other factors.
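A minimal sketch that applies the practices above, choosing the configuration by workload; the model path and the batch-size value are hypothetical placeholders:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# High-frequency streaming workload: let batching engage, cap the batch size.
throughput_cfg = {"PERFORMANCE_HINT": "THROUGHPUT",
                  "PERFORMANCE_HINT_NUM_REQUESTS": "4"}

# Sparse, latency-sensitive workload: keep batching out of the way.
latency_cfg = {"PERFORMANCE_HINT": "LATENCY",
               "ALLOW_AUTO_BATCHING": "NO"}

compiled_model = core.compile_model(model, "AUTO", config=throughput_cfg)
```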
Summary
This section closes with a quick summary of the AUTO plugin and Auto Batching, as shown in Table 1-2.
Table 1-2 Quick summary of the AUTO plugin and Automatic Batching
| | Automatic Device Selection | Automatic Batching |
| --- | --- | --- |
| Description | Enumerates the devices available on the system, selects the best one, and uses it for inference; hides the GPU model-compilation time by starting inference on the CPU and switching to the GPU after compilation finishes | Combines multiple asynchronous inference requests from the user program into one multi-batch inference request, then splits the batched results and returns them to the individual requests |
| Advantages | No detailed hardware configuration needed from the developer; the application exploits the best performance of the system; shorter first-inference latency because the AUTO plugin hides the GPU model-compilation time | Device utilization and efficiency improve; developers enjoy multi-batch throughput with minimal programming effort |
| Disadvantages | Not for workloads that need consistent, predictable performance; inference performance differs before and after the device switch (e.g. "CPU" -> "GPU"); performance may dip at the moment of switching (on the order of a few seconds) | Available only on the GPU; the default timeout of 1,000 ms can cause unexpectedly long latency |
| Enabled by default? | No; you must specify "AUTO" as the device name | Yes, limited to the GPU |
| How to enable | Specify "AUTO" as the device name | ALLOW_AUTO_BATCHING=YES is the default. 1. Set ALLOW_AUTO_BATCHING=YES, device=GPU, PERFORMANCE_HINT=THROUGHPUT; or 2. specify "BATCH:GPU" as the device name |
| Additional notes | Default device-selection priority: dGPU > iGPU > VPU > CPU. Important: if the AUTO plugin can select "GPU" as the final device and PERFORMANCE_HINT=THROUGHPUT is set, Automatic Batching is enabled | To disable Auto Batching, use compile_model() or set_property() to 1. set ALLOW_AUTO_BATCHING=NO, or 2. specify PERFORMANCE_HINT=LATENCY |
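For reference, a minimal sketch of the explicit "BATCH:GPU" device name mentioned in the table; the model path is a hypothetical placeholder:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# Route inference through the batching device stacked on the GPU;
# "BATCH:GPU(16)" would additionally pin the batch size to 16.
compiled_model = core.compile_model(model, "BATCH:GPU")
```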