
What is the learning path for model deployment optimization?

2022-06-24 05:30:00 Aceyclee

The field of model deployment optimization is actually quite broad. The whole pipeline, from finishing model training to finally deploying the model onto real hardware, involves work at many different levels, and each stage has its own technical requirements.

The deployment process can be roughly divided into the following steps:

Model deployment process

1. Model conversion

After obtaining a model from the training framework, convert it into the required model format. The choice of format is usually driven by the needs of the company's business-side SDK; it is typically a Caffe or ONNX model, which makes it easier to move the model between different frameworks.

The work at this stage requires familiarity with the corresponding training framework and with model formats such as Caffe and ONNX.

Commonly used frameworks such as PyTorch and TensorFlow have very mature communities and plenty of blogs and tutorials; the Caffe and ONNX model formats also have extensive public documentation to learn from.

Even where no article covers your case, both are fortunately open source, so you can still find answers by reading the source code and the sample code.

2. Model optimization

Model optimization here refers to general, backend-independent optimizations such as constant folding, arithmetic optimization, dependency optimization, function optimization, operator fusion, and model-information simplification.

Some training frameworks already apply part of these optimizations when exporting a model. In addition, when the model format is converted, differences between the IR representations may introduce redundant or optimizable computations, so some model optimization is usually performed after model conversion.

The work at this stage requires some understanding of how the computation graph is executed, how each op is defined, and how to model the runtime performance of a program; only then can you know, when optimizing a model, how to ensure the optimized model actually performs better.

The deeper this understanding goes, the more of the model's potential performance can be mined.
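To make one of these optimizations concrete, here is a toy, framework-free sketch of constant folding on a tiny expression tree. The tuple-based graph encoding is invented purely for illustration; real frameworks fold constants over their own IR:

```python
# Toy constant folding: a node is either a leaf (number or variable name) or a
# tuple (op, lhs, rhs). Subtrees whose inputs are all constants are evaluated
# once at optimization time instead of on every inference.
def fold(node):
    if not isinstance(node, tuple):        # leaf: a constant or a variable name
        return node
    op, lhs, rhs = node[0], fold(node[1]), fold(node[2])
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return {"add": lhs + rhs, "mul": lhs * rhs}[op]  # evaluate now
    return (op, lhs, rhs)                  # keep the node; an input is dynamic

# ("mul", 2, 3) is fully constant, so the whole subtree collapses to 6:
# fold(("add", "x", ("mul", 2, 3))) -> ("add", "x", 6)
```

Production graph optimizers apply the same idea over framework IR nodes, alongside the fusion and simplification passes mentioned above.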

3. Model compression

Broadly speaking, model compression is also part of model optimization. It covers many methods, such as pruning, distillation, and quantization. The basic goal of model compression is a smaller model that reduces both storage requirements and the amount of computation, and thereby achieves acceleration.

The work at this stage requires some understanding of three things: the compression algorithms themselves, the algorithmic task and model-structure design behind the model, and the computation details of the hardware platform.

When compression causes the model's accuracy to drop, you need knowledge of the model's algorithm and a good understanding of how the model is computed on the hardware in order to analyze the cause of the accuracy loss and propose targeted solutions.

What often matters even more for model compression is engineering experience: when the same model is deployed on different hardware backends, differences in how the hardware computes affect accuracy differently, and this can only be learned by accumulating engineering experience.
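Of the methods listed above, post-training quantization is the easiest to sketch. The following is a minimal, dependency-free illustration of symmetric int8 quantization of one weight tensor; real tool chains (calibration, per-channel scales, clamping policies, etc.) are considerably more involved:

```python
# Toy symmetric int8 quantization: map floats into [-127, 127] with a single
# per-tensor scale, then dequantize to observe the rounding error that
# compression introduces.
def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127.0
    scale = scale if scale > 0 else 1.0    # all-zero tensor edge case
    quantized = [int(round(v / scale)) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.0]
q, s = quantize_int8(weights)    # q needs 1 byte per weight instead of 4
restored = dequantize(q, s)      # close to, but not exactly, the original
```

The gap between `weights` and `restored` is exactly the kind of per-backend numerical effect that makes accuracy analysis after compression an engineering skill.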

OpenPPL is also gradually open-sourcing its own model-compression tool chain, covering the model algorithms, compression algorithms, and hardware-platform adaptation mentioned above.

4. Model deployment

Model deployment is the most complex part of the whole process. On the engineering side, the core tasks are model packaging, model encryption, and SDK encapsulation.

In a real product, multiple models are often used together.

Model packaging means bundling the model together with its pre- and post-processing, integrating multiple models, and adding other descriptive files. The packaging format and the encryption method are specific to each SDK, and the skills involved at this stage are closely tied to SDK development.
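As a hypothetical illustration of model packaging (the zip-plus-JSON layout, file names, and manifest fields below are invented for this sketch; every SDK defines its own, often encrypted, format):

```python
# Hypothetical "model package": several model blobs plus a JSON manifest that
# records pre-processing settings, bundled into a single zip archive.
import json
import zipfile

def pack_models(fileobj, models, preprocess_cfg):
    """models: {file name: raw bytes}; preprocess_cfg: JSON-serializable dict."""
    manifest = {"models": sorted(models), "preprocess": preprocess_cfg}
    with zipfile.ZipFile(fileobj, "w") as zf:
        zf.writestr("manifest.json", json.dumps(manifest))
        for name, blob in models.items():
            zf.writestr("models/" + name, blob)
```

An SDK loader would read `manifest.json` first, then map each listed model file to the right pre-processing pipeline; encryption would typically be applied to the blobs before (or instead of) this plain zip step.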

On the functional side, what affects the final deployment performance most is certainly the backend library included in the SDK, i.e. the inference library that actually runs the model. Developing a high-performance inference library requires a broader and more specialized set of skills.

The ideas of parallel programming are universal across platforms, but each hardware architecture has its own characteristics, and the development approach for an inference library differs accordingly; this requires some understanding of the architecture of the target backend.

For programming on specific architectures, it is recommended to study the open-source inference libraries currently published by the major vendors.
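The per-backend kernel idea can be illustrated with a tiny dispatch table, a pattern many inference libraries use in some form (the registry, the operator name, and the "backends" below are purely illustrative):

```python
# Toy kernel registry: the same operator ("relu") gets a different
# implementation per backend, and the runtime picks one at dispatch time.
KERNELS = {}

def register(op, backend):
    def decorator(fn):
        KERNELS[(op, backend)] = fn
        return fn
    return decorator

@register("relu", "cpu_reference")
def relu_reference(xs):
    return [x if x > 0 else 0.0 for x in xs]

@register("relu", "cpu_vectorized")
def relu_vectorized(xs):
    # stand-in for a SIMD kernel: same semantics, different implementation
    return [max(x, 0.0) for x in xs]

def run(op, backend, *args):
    return KERNELS[(op, backend)](*args)
```

Real libraries dispatch on hardware capability and tensor layout rather than a string key, but the separation of operator semantics from per-architecture kernels is the same.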


Try it: "From 0 to 1: Build an AI Inference Application with OpenPPL"

Join us: "OpenPPL Is Hiring!"

Welcome to star: openppl-public/ppl.nn

QQ group for discussion: 627853444, join code: OpenPPL


Copyright notice
This article was written by [Aceyclee]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2021/08/20210810141446565r.html
