How Deci and Intel achieved up to 16.8x throughput improvement and +1.74% accuracy improvement on MLPerf
2022-06-23 13:36:00 【Intel edge computing community】
Authors: Amos Gropp (Deci AI) and Guy Boudoukh (Intel Labs)
Translation: Li Yi Wei
MLPerf submission overview
MLPerf is a non-profit organization established by AI leaders from academia, industry, and research labs. Its goal is to provide standardized and unbiased benchmarks for the training and inference performance of machine learning hardware, software, and services. MLPerf runs tests to continuously improve and evolve these benchmarks, each of which is defined by a model, a dataset, a quality target, and a latency constraint.
This year, Deci collaborated with Intel on joint submissions in the computer vision and NLP categories. For computer vision, we submitted three models in the ResNet50 category. Our submissions were made under the offline scenario of the open division.
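To make the offline scenario concrete: all samples are available up front, and the metric is how many samples per second the system processes. The following is a minimal sketch of that measurement, not the MLPerf LoadGen harness. The names `offline_throughput` and `dummy_infer` are hypothetical, and `dummy_infer` is only a stand-in for a real model.

```python
import time
from typing import Callable, Sequence


def offline_throughput(
    infer_batch: Callable[[Sequence[int]], None],
    samples: Sequence[int],
    batch_size: int = 32,
) -> float:
    """Run every sample through `infer_batch` and return samples per second."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = time.perf_counter()
    # Offline scenario: the whole dataset is available, so just batch through it.
    for i in range(0, len(samples), batch_size):
        infer_batch(samples[i:i + batch_size])
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return len(samples) / max(elapsed, 1e-12)


def dummy_infer(batch: Sequence[int]) -> None:
    """Hypothetical stand-in for a model: burns a little CPU per sample."""
    for _ in batch:
        sum(range(1000))


if __name__ == "__main__":
    rate = offline_throughput(dummy_infer, list(range(1024)), batch_size=32)
    print(f"{rate:.0f} samples/s")
```

The absolute number printed depends entirely on the machine and on the stand-in workload. Only the structure of the measurement carries over to a real benchmark.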
Deci and Intel's submission results
We made submissions on two different hardware platforms: a 12-core Intel Cascade Lake CPU, and Intel Ice Lake CPUs with 4 and 32 cores. The models were optimized at a batch size of 32 and quantized to INT8 using the Intel OpenVINO toolkit. Compared with the baseline 32-bit ResNet50 model, Deci's models improved accuracy by 1.74% and throughput by 10x to 16.8x, depending on the hardware type, as shown in Table 1.
To further isolate the improvement that comes from the AutoNAC-generated DeciNets, we also compared the 8-bit compiled ResNet-50 with the models we submitted. This comparison shows that Deci's AutoNAC technology accounts for a 2.8x to 4x improvement.

Table 1: Offline scenario, throughput in images/s (samples per second) for the hardware types tested. The second column shows ResNet-50 compiled with OpenVINO to 32-bit. The third column shows ResNet-50 compiled to 8-bit. The columns marked DeciNet show the AutoNAC-generated DeciNet models compiled to 8-bit.
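The relationship between these columns can be illustrated with a little arithmetic. In the sketch below the throughput figures are hypothetical placeholders, not the submitted results, and the function name `speedup` is an assumption. It shows that the end-to-end gain over the 32-bit baseline factors into a quantization gain and an architecture gain.

```python
def speedup(new_throughput: float, old_throughput: float) -> float:
    """Return how many times faster `new` is than `old` (both in images/s)."""
    if old_throughput <= 0:
        raise ValueError("old_throughput must be positive")
    return new_throughput / old_throughput


# Hypothetical throughputs in images/s for one hardware type (NOT measured values).
resnet50_fp32 = 100.0
resnet50_int8 = 400.0
decinet_int8 = 1200.0

quantization_gain = speedup(resnet50_int8, resnet50_fp32)  # 8-bit vs 32-bit ResNet-50
architecture_gain = speedup(decinet_int8, resnet50_int8)   # DeciNet vs 8-bit ResNet-50
total_gain = speedup(decinet_int8, resnet50_fp32)          # DeciNet vs 32-bit ResNet-50

print(f"quantization: {quantization_gain:.1f}x")
print(f"architecture: {architecture_gain:.1f}x")
print(f"total:        {total_gain:.1f}x")
```

Comparing DeciNet against the 8-bit ResNet-50 rather than the 32-bit one removes the quantization factor, which is why that comparison isolates the architecture's contribution.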

How we achieved these results:
The starting point for the submission was ResNet-50, which has an accuracy of 76.1% on ImageNet. The original goal was to maximize throughput while maintaining the same accuracy. For better performance, we applied Deci's proprietary Automated Neural Architecture Construction (AutoNAC) technology. AutoNAC is an architecture optimization algorithm that is both data- and hardware-dependent. It can automatically generate a best-in-class deep learning model for any given combination of deep learning task, dataset, and inference hardware. Applying AutoNAC is a seamless process in which the user provides a trained model, the training and test datasets, and access to the hardware platform on which the model should be deployed. AutoNAC then automatically computes a new low-latency, high-throughput, or low-power model that preserves the accuracy of the original model. The AutoNAC optimization process is shown in Figure 1.

Figure 1: Deci's AutoNAC process
Unlike standard NAS techniques, AutoNAC starts the search from a relatively good starting point by making use of the baseline model, including several layers that have already been trained.
AutoNAC operates on a discrete architecture space that is set according to the neural operations supported on the target hardware. The (proprietary) search algorithm itself relies on a prediction model to determine effective optimization steps. This leads to very fast convergence, typically several orders of magnitude faster than known NAS techniques. In addition, one of AutoNAC's main advantages is that it can consider all levels of the inference stack and optimize the baseline architecture while maintaining accuracy and taking into account the target hardware, the (hardware-dependent) compilation, and quantization.
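The general idea of a predictor-guided search over a discrete space, seeded with a baseline, can be sketched in a few lines. This is not Deci's AutoNAC algorithm, which is proprietary and not described here. It is a generic greedy local search, and the search space, both predictor functions, and all names below are hypothetical assumptions chosen only for illustration.

```python
# Hypothetical discrete search space: per-stage channel widths and block depths
# assumed to be supported by an imaginary target device.
WIDTHS = (32, 64, 128, 256)
DEPTHS = (1, 2, 3, 4)


def predicted_latency(arch):
    """Toy latency predictor: cost grows with width * depth in every stage."""
    return sum(w * d for w, d in arch) / 100.0


def predicted_accuracy(arch):
    """Toy accuracy predictor with diminishing returns in width and depth."""
    return sum(min(w, 128) ** 0.5 * min(d, 2) for w, d in arch)


def neighbors(arch):
    """Yield architectures that differ from `arch` by one step in one stage."""
    for i, (w, d) in enumerate(arch):
        wi, di = WIDTHS.index(w), DEPTHS.index(d)
        for nwi, ndi in ((wi - 1, di), (wi + 1, di), (wi, di - 1), (wi, di + 1)):
            if 0 <= nwi < len(WIDTHS) and 0 <= ndi < len(DEPTHS):
                yield arch[:i] + ((WIDTHS[nwi], DEPTHS[ndi]),) + arch[i + 1:]


def search(baseline):
    """Greedy local search that starts from the baseline architecture."""
    target = predicted_accuracy(baseline)
    current = baseline
    while True:
        # Keep only candidates predicted to be at least as accurate as the baseline.
        candidates = [a for a in neighbors(current) if predicted_accuracy(a) >= target]
        best = min(candidates, key=predicted_latency, default=current)
        if predicted_latency(best) >= predicted_latency(current):
            return current  # no neighbor is predicted to be faster
        current = best


baseline = ((256, 4), (256, 4), (128, 3))
found = search(baseline)
print(found, predicted_latency(baseline), predicted_latency(found))
```

Because every step is scored by the predictors rather than by training and benchmarking a candidate, each move is cheap. A real system would need predictors fitted to measurements on the actual hardware and would validate the final model's accuracy by training it.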
Last year, AutoNAC discovered a new family of image classification models called DeciNets, which outperform well-known state-of-the-art models in both accuracy and runtime performance.
The DeciNets we submitted to MLPerf were generated by AutoNAC and designed specifically for optimal performance when running on Intel's Cascade Lake and Ice Lake CPUs.
Year-over-year improvement
Two years ago, in 2020, Deci and Intel submitted models to the same MLPerf category. Evaluated per core, throughput improved by about 37% compared with the previous submission.
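Normalizing by core count is what makes submissions on differently sized CPUs comparable. The sketch below shows that calculation. The function names are assumptions and the figures are hypothetical placeholders, not the values from either submission.

```python
def per_core_throughput(throughput: float, cores: int) -> float:
    """Throughput (images/s) divided by the number of physical cores used."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return throughput / cores


def relative_improvement_pct(current: float, previous: float) -> float:
    """Percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100.0


# Hypothetical example: two runs on machines with different core counts.
previous_run = per_core_throughput(800.0, cores=8)    # 100 images/s per core
current_run = per_core_throughput(1500.0, cores=12)   # 125 images/s per core

print(f"{relative_improvement_pct(current_run, previous_run):.0f}% per-core improvement")
```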

Table 2: Comparison of current and previous submission results, throughput per core
What's next for deep learning inference on CPUs
The models we submitted improved on ResNet-50's throughput by up to 16.8x and on its accuracy by 1.74%. This was achieved through the synergy between the Intel OpenVINO compiler and Deci's AutoNAC-generated DeciNets models.
This marks another important milestone in the ongoing collaboration between Deci and Intel to deliver excellent deep learning inference on CPUs. This significant improvement in accuracy and throughput has many direct implications.
With DeciNets, tasks that previously could not run on CPUs because they were too resource-intensive are now feasible. Moreover, these tasks will see significant performance gains: with DeciNets, the gap between a model's inference performance on GPU and on CPU is halved, without sacrificing accuracy.
Deci's AutoNAC technology and the automatically generated DeciNets are ready for deployment and commercial use, and can be easily integrated to support computer vision tasks on any type of hardware.
Configuration details:
c6i.2xlarge
8 vCPUs (Intel Xeon Platinum 8375C processor), 16 GB total memory, BIOS: SMBIOS 2.7, ucode: 0xd000331, Ubuntu 18.04.5 LTS, 5.4.0-1069-aws, gcc 9.3.0 compiler. Baseline: ResNet50, accuracy = 76.4; DeciNet, accuracy = 78.14
c6i.16xlarge
64 vCPUs (Intel Xeon Platinum 8375C processor), 128 GB total memory, BIOS: SMBIOS 2.7, ucode: 0xd000331, Ubuntu 18.04.5 LTS, 5.4.0-1069-aws, gcc 9.3.0 compiler. Baseline: ResNet50, accuracy = 76.4; DeciNet, accuracy = 78.14
m5zn.6xlarge
24 vCPUs (Intel Xeon Platinum 8252C processor), 96 GB total memory, BIOS: SMBIOS 2.7, ucode: 0x500320a, Ubuntu 18.04.5 LTS, 5.4.0-1069-aws, gcc 9.3.0 compiler. Baseline: ResNet50, accuracy = 76.4; DeciNet, accuracy = 78.14
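These configurations can be cross-checked against the figures quoted earlier in the article. The sketch below assumes two vCPUs (hardware threads) per physical core, which is an assumption rather than something stated in the configuration lines. The `Config` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    instance: str
    vcpus: int
    baseline_accuracy: float
    decinet_accuracy: float

    @property
    def physical_cores(self) -> int:
        # Assumption: each physical core exposes 2 vCPUs (hyper-threading).
        return self.vcpus // 2

    @property
    def accuracy_gain(self) -> float:
        # Rounded to two decimals to hide floating-point noise.
        return round(self.decinet_accuracy - self.baseline_accuracy, 2)


# Values copied from the configuration details above.
CONFIGS = [
    Config("c6i.2xlarge", 8, 76.4, 78.14),
    Config("c6i.16xlarge", 64, 76.4, 78.14),
    Config("m5zn.6xlarge", 24, 76.4, 78.14),
]

for cfg in CONFIGS:
    print(f"{cfg.instance}: {cfg.physical_cores} cores, +{cfg.accuracy_gain:.2f} accuracy")
```

Under that assumption the three instances map to 4, 32, and 12 physical cores, and the accuracy difference between DeciNet and the baseline is 1.74 points on each.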
Notices and disclaimers
Performance varies by use, configuration, and other factors. Learn more at www.Intel.com/PerformanceIndex.
Performance results are based on testing as of the dates shown in the configurations and may not reflect all publicly available updates. For configuration details, see backup. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands mentioned herein may be the property of other parties.