Record in detail the implementation of yolact instance segmentation ncnn
2022-06-27 09:42:00 【Xiaobai learns vision】
Source: https://zhuanlan.zhihu.com/p/128974102
This article is reprinted from Zhihu with the author's permission. Do not reprint without permission.
0x0 YOLACT instance segmentation
https://urlify.cn/rURFry
Single-stage, end-to-end instance segmentation
Fast: a 550x550 image is claimed to run at 33 FPS on a Titan Xp
Open-source code, and PyTorch is great!
0x1 Motivation
Across GitHub, ncnn and its derivative projects cover classification, detection, localization, feature extraction, OCR, style transfer, and more.
However, there was no instance segmentation, and then someone opened an issue asking specifically for YOLACT instance segmentation: https://github.com/Tencent/ncnn/issues/1679
So let's write a YOLACT example, and along the way show how to use ncnn to implement algorithms like this one that need post-processing.
0x2 PyTorch test
The YOLACT repository also provides the YOLACT++ model, which is faster and more accurate, but YOLACT++ uses deformable convolution, a classic operation that is unfriendly to deployment.
Let's pretend we didn't see that and download the YOLACT model instead.

Create a weights folder and download yolact_resnet50_54_800000.pth into it.
Following the README instructions, run one image to see the result:
$ python eval.py --trained_model=weights/yolact_resnet50_54_800000.pth --score_threshold=0.15 --top_k=15 --image=test.jpg

0x3 Remove post-processing and export ONNX
Modify evalimage in eval.py directly, replacing the result display with an ONNX export:
def evalimage(net:Yolact, path:str, save_path:str=None):
    frame = torch.from_numpy(cv2.imread(path)).cuda().float()
    batch = FastBaseTransform()(frame.unsqueeze(0))
    preds = net(batch)
    torch.onnx._export(net, batch, "yolact.onnx", export_params=True, keep_initializers_as_inputs=True, opset_version=11)
According to information in a YOLACT issue, JIT must be turned off at the top of yolact.py before ONNX export works:
# As of March 10, 2019, Pytorch DataParallel still doesn't support JIT Script Modules
use_jit = False
YOLACT's post-processing is very Pythonic and cannot be exported directly, so remove it from the model to make export and conversion easy.
Even if ONNX could export the post-processing, doing so is not recommended:
The post-processing is not standardized. Every project author implements the details differently, such as the many NMS variants and bbox calculation methods, so ncnn can hardly cover them with one unified op (caffe-ssd has an implementation only because there is a single version of it).
In ONNX, post-processing turns into a big lump of fragmented glue ops, which is inefficient to implement in an inference framework.
Most of these ONNX glue ops, such as Gather, are unsupported in ncnn or have compatibility problems, so they cannot be used directly.
Therefore, removing post-processing before exporting ONNX is the correct way to convert PyTorch SSD-style models.
Open yolact.py, find the forward method of class Yolact, remove the detect step, and return the model's pred_outs directly:
# return self.detect(pred_outs, self)
return pred_outs
Run the image test again, and yolact.onnx without post-processing is produced:
$ python eval.py --trained_model=weights/yolact_resnet50_54_800000.pth --score_threshold=0.15 --top_k=15 --image=test.jpg
0x4 Simplify ONNX
The directly exported ONNX model contains many glue ops that ncnn does not support, so running onnx-simplifier is the routine next step:
$ pip install -U onnx --user
$ pip install -U onnxruntime --user
$ pip install -U onnx-simplifier --user
$ python -m onnxsim yolact.onnx yolact-sim.onnx
At this point an error appears:
Graph must be in single static assignment (SSA) form, however '523' has been used as output names multiple times
After looking through the GitHub issues, I confirmed this is an ONNX bug:
https://link.zhihu.com/?target=https%3A//github.com/onnx/onnx/issues/2613
Fortunately, onnx-simplifier already provides an option to bypass it:
$ python -m onnxsim --skip-fuse-bn yolact.onnx yolact-sim.onnx
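For context, the SSA rule in that error means every tensor name in the graph must be produced by exactly one node output. The sketch below is not part of onnx or onnx-simplifier; it uses a hypothetical `Node` struct that keeps only output names and reports any name produced more than once, which is the condition the checker complained about for '523'.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal graph node: only the output tensor names matter here.
struct Node
{
    std::string name;
    std::vector<std::string> outputs;
};

// Returns every tensor name produced by more than one node output.
// A graph in SSA form yields an empty list.
std::vector<std::string> find_non_ssa_outputs(const std::vector<Node>& nodes)
{
    std::map<std::string, int> produced;
    for (const Node& n : nodes)
        for (const std::string& out : n.outputs)
            produced[out]++;

    std::vector<std::string> duplicated;
    for (const auto& kv : produced)
        if (kv.second > 1)
            duplicated.push_back(kv.first);
    return duplicated;
}
```

A graph where two different nodes both write to "523" would be reported as `{"523"}`, while a well-formed graph returns nothing.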
0x5 ncnn model conversion and optimization
When simplifying the ONNX model above, --skip-fuse-bn skipped the batchnorm merge, but that's fine: ncnn can do this too.
The ncnnoptimize tool implements many operator fusions, such as the common convolution-batchnorm-relu.
Its last parameter 0 means an fp32 model and 65536 means converting to an fp16 model, which reduces the model's binary size.
$ ./onnx2ncnn yolact-sim.onnx yolact.param yolact.bin
$ ./ncnnoptimize yolact.param yolact.bin yolact-opt.param yolact-opt.bin 0
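The conv-batchnorm fusion is, mathematically, the standard folding of the batchnorm affine transform into the convolution weights and bias. A minimal sketch of that math, assuming a flat per-output-channel weight layout and hypothetical names (`fuse_conv_bn`, `weights_per_out`); it does not reproduce ncnnoptimize's own code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the conv itself.
// weights: out_c blocks of weights_per_out values; bias: out_c values
// (pass zeros if the conv had no bias).
void fuse_conv_bn(std::vector<float>& weights, std::vector<float>& bias,
                  const std::vector<float>& gamma, const std::vector<float>& beta,
                  const std::vector<float>& mean, const std::vector<float>& var,
                  float eps)
{
    const std::size_t out_c = bias.size();
    const std::size_t weights_per_out = weights.size() / out_c;
    for (std::size_t o = 0; o < out_c; o++)
    {
        // per-output-channel scale applied to both weights and bias
        const float scale = gamma[o] / std::sqrt(var[o] + eps);
        for (std::size_t k = 0; k < weights_per_out; k++)
            weights[o * weights_per_out + k] *= scale;
        bias[o] = (bias[o] - mean[o]) * scale + beta[o];
    }
}
```

For example, with gamma = 4, beta = 1, mean = 0.5, var = 3 and eps = 1 the scale is 2, so a conv with weights {1, 2} and bias 0 becomes weights {2, 4} and bias 0. For input {1, 1}, conv followed by batchnorm gives 4 * (3 - 0.5) / 2 + 1 = 6, and the fused conv gives 2 + 4 + 0 = 6.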
0x6 Manually fine-tune the model
As always, the absence of an error does not mean the model works. First open the param file with the netron tool to look at the model structure.

The model has four outputs, framed in red in the figure:
Convolution Conv_263 1 1 617 619 0=32 1=1 5=1 6=8192 9=1
Permute Transpose_265 1 1 619 620 0=3
UnaryOp Tanh_400 1 1 814 815 0=16
Concat Concat_401 5 1 634 673 712 751 790 816 0=-3
Concat Concat_402 5 1 646 685 724 763 802 817 0=-3
Concat Concat_403 5 1 659 698 737 776 815 818 0=-3
Softmax Softmax_405 1 1 817 820 0=1 1=1
YOLACT's post-processing needs loc, conf, prior, mask, and maskdim.
At first it is not obvious which of these the outputs correspond to, so let's look at their shapes first:
ncnn::Extractor ex = yolact.create_extractor();
ncnn::Mat in(550, 550, 3);
ex.input("input.1", in);
ncnn::Mat b620;
ncnn::Mat b816;
ncnn::Mat b818;
ncnn::Mat b820;
ex.extract("620", b620);// 32 x 138x138
ex.extract("816", b816);// 4 x 19248
ex.extract("818", b818);// 32 x 19248
ex.extract("820", b820);// 81 x 19248
Compiling and running it directly crashes in the Concat layers (the blue box in the figure): their axis parameter is negative, 0=-3, which ncnn does not support yet.
From the shapes of the Concat inputs, the 2-D data is concatenated along the h axis, so changing the parameter to 0=0 works:
Concat Concat_401 5 1 634 673 712 751 790 816 0=0
Concat Concat_402 5 1 646 685 724 763 802 817 0=0
Concat Concat_403 5 1 659 698 737 776 815 818 0=0
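Why 0=0 is safe here: in a row-major w x h blob, concatenating along h just appends the rows of each input, provided every input has the same width. The sketch uses a hypothetical `Blob2D` type instead of `ncnn::Mat` and only illustrates the memory layout, not ncnn's Concat layer.

```cpp
#include <vector>

// Hypothetical 2-D blob stored row-major: h rows of w floats (ncnn-style w x h).
struct Blob2D
{
    int w;
    int h;
    std::vector<float> data;
};

// Concatenate along the h axis: widths must match, rows are simply appended.
Blob2D concat_h(const std::vector<Blob2D>& parts)
{
    Blob2D out{parts.empty() ? 0 : parts[0].w, 0, {}};
    for (const Blob2D& p : parts)
    {
        if (p.w != out.w)
            return Blob2D{0, 0, {}}; // shape mismatch: refuse to concatenate
        out.data.insert(out.data.end(), p.data.begin(), p.data.end());
        out.h += p.h;
    }
    return out;
}
```

Concatenating a 4x1 blob and a 4x2 blob gives a 4x3 blob whose first row is the first input and whose last two rows are the second input.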
b820 comes after the softmax, so it must be conf; shape 81x19248 means 81 classes x 19248 priors.
b816, shape 4x19248, holds the bbox offsets for each prior box.
b818, shape 32x19248, is maskdim according to YOLACT's post-processing, i.e. each prior's 32 coefficients for the prototype heat maps.
b620, shape 32x138x138, is the 32 prototype heat maps; the permute layer in front of it is an NCHW->NHWC transform. The priors themselves are not output by the model.
Handling the NHWC layout of b620 in ncnn is inconvenient, so extract the NCHW data b619 before the permute instead, i.e. the output in the green box in the figure:
ncnn::Extractor ex = yolact.create_extractor();
ncnn::Mat in(550, 550, 3);
ex.input("input.1", in);
ncnn::Mat maskmaps;
ncnn::Mat location;
ncnn::Mat mask;
ncnn::Mat confidence;
ex.extract("619", maskmaps);// 138x138 x 32
ex.extract("816", location);// 4 x 19248
ex.extract("818", mask);// maskdim 32 x 19248
ex.extract("820", confidence);// 81 x 19248
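The reason the NCHW blob is easier: in NCHW order each prototype map is its own h*w plane, which for a 3-D `ncnn::Mat` is what `channel(p)` returns, whereas in NHWC the 32 values of one pixel are interleaved. A standalone sketch of the NHWC->NCHW index mapping for batch 1, using a plain `std::vector` (the function name is hypothetical):

```cpp
#include <vector>

// Convert an H x W x C (NHWC, batch 1) buffer into C x H x W (NCHW) order.
// After conversion each channel is one contiguous H*W plane.
std::vector<float> nhwc_to_nchw(const std::vector<float>& src, int h, int w, int c)
{
    std::vector<float> dst(src.size());
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            for (int k = 0; k < c; k++)
                dst[(k * h + y) * w + x] = src[(y * w + x) * c + k];
    return dst;
}
```

For a 1x2 image with 3 channels stored as {0, 1, 2, 10, 11, 12}, the result is {0, 10, 1, 11, 2, 12}: one plane per channel.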
0x7 Generate priors
The original code is make_priors in class PredictionModule in yolact.py. Adding a few print statements reveals all the hyperparameters of the prior box generation rule:
const int conv_ws[5] = {69, 35, 18, 9, 5};
const int conv_hs[5] = {69, 35, 18, 9, 5};
const float aspect_ratios[3] = {1.f, 0.5f, 2.f};
const float scales[5] = {24.f, 48.f, 96.f, 192.f, 384.f};
Each YOLACT prior has four values, center_x center_y box_w box_h, in the range 0~1.
The author wrote a bug that sets box_h = box_w, making every prior square, and we have to reproduce this bug too:
// make priorbox
ncnn::Mat priorbox(4, 19248);
{
    float* pb = priorbox;
    for (int p = 0; p < 5; p++)
    {
        int conv_w = conv_ws[p];
        int conv_h = conv_hs[p];
        float scale = scales[p];
        for (int i = 0; i < conv_h; i++)
        {
            for (int j = 0; j < conv_w; j++)
            {
                // +0.5 because priors are in center-size notation
                float cx = (j + 0.5f) / conv_w;
                float cy = (i + 0.5f) / conv_h;
                for (int k = 0; k < 3; k++)
                {
                    float ar = aspect_ratios[k];
                    ar = sqrt(ar);
                    float w = scale * ar / 550;
                    float h = scale / ar / 550;
                    // This is for backward compatability with a bug where I made everything square by accident
                    // cfg.backbone.use_square_anchors:
                    h = w;
                    pb[0] = cx;
                    pb[1] = cy;
                    pb[2] = w;
                    pb[3] = h;
                    pb += 4;
                }
            }
        }
    }
}
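As a sanity check on the 19248 that appears in every output shape, the number of priors is the sum over the five levels of conv_h * conv_w * 3 aspect ratios. A tiny sketch with a hypothetical `count_priors` helper:

```cpp
// Priors per level = conv_h * conv_w * number of aspect ratios; sum over levels.
int count_priors(const int* conv_ws, const int* conv_hs, int num_levels, int num_aspect_ratios)
{
    int total = 0;
    for (int p = 0; p < num_levels; p++)
        total += conv_ws[p] * conv_hs[p] * num_aspect_ratios;
    return total;
}
```

Per level this gives 14283 + 3675 + 972 + 243 + 75 = 19248. Five levels presumably also explains why each Concat layer above has five inputs.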
0x8 Implementing the full YOLACT pipeline
Preprocessing
data/config.py holds the ImageNet MEAN and STD, in BGR order:
# These are in BGR and are for ImageNet
MEANS = (103.94, 116.78, 123.68)
STD = (57.38, 57.12, 58.40)
YOLACT actually takes RGB input, so the order must be swapped:
const int target_size = 550;
int img_w = bgr.cols;
int img_h = bgr.rows;
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, target_size, target_size);
const float mean_vals[3] = {123.68f, 116.78f, 103.94f};
const float norm_vals[3] = {1.0/58.40f, 1.0/57.12f, 1.0/57.38f};
in.substract_mean_normalize(mean_vals, norm_vals);
Post-processing
This part is very similar to SSD post-processing; the sort and NMS code is routine and can follow ncnn/src/layer/detectionoutput.cpp.
The only thing to watch is that bbox decoding differs from SSD: it works in center_x center_y box_w box_h form. The original YOLACT code is the decode function in layers/box_utils.py.
YOLACT has a Fast NMS method in layers/functions/detection.py that is faster, but regular NMS is off-the-shelf code after all, and it works well enough.
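// generate all candidates for each class
// confidence: 81 x 19248, location: 4 x 19248, priorbox: 4 x 19248 (one row per prior)
const int num_class = confidence.w;
const int num_priors = confidence.h;
for (int i = 0; i < num_priors; i++)
{
    const float* conf = confidence.row(i);
    const float* loc = location.row(i);
    const float* pb = priorbox.row(i);

    // find class id with highest score
    // start from 1 to skip background
    int label = 0;
    float score = 0.f;
    for (int j = 1; j < num_class; j++)
    {
        if (conf[j] > score)
        {
            label = j;
            score = conf[j];
        }
    }

    // ignore background or low score
    if (label == 0 || score <= confidence_thresh)
        continue;

    // apply center_size to priorbox with loc
    float var[4] = {0.1f, 0.1f, 0.2f, 0.2f};
    float pb_cx = pb[0];
    float pb_cy = pb[1];
    float pb_w = pb[2];
    float pb_h = pb[3];
    float bbox_cx = var[0] * loc[0] * pb_w + pb_cx;
    float bbox_cy = var[1] * loc[1] * pb_h + pb_cy;
    float bbox_w = (float)(exp(var[2] * loc[2]) * pb_w);
    float bbox_h = (float)(exp(var[3] * loc[3]) * pb_h);
    float obj_x1 = bbox_cx - bbox_w * 0.5f;
    float obj_y1 = bbox_cy - bbox_h * 0.5f;
    float obj_x2 = bbox_cx + bbox_w * 0.5f;
    float obj_y2 = bbox_cy + bbox_h * 0.5f;

    // clip inside image (coordinates are still normalized to 0~1 here)
    obj_x1 = std::max(std::min(obj_x1, 1.f), 0.f);
    obj_y1 = std::max(std::min(obj_y1, 1.f), 0.f);
    obj_x2 = std::max(std::min(obj_x2, 1.f), 0.f);
    obj_y2 = std::max(std::min(obj_y2, 1.f), 0.f);

    // append object candidate: box, label, score and this prior's row of mask coefficients
}
// merge candidate box for each class
for (int i = 0; i < (int)class_candidates.size(); i++)
{
    // sort + nms
}
// sort all result by score
// keep_top_k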
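For the sort + nms step, here is a standalone sketch of ordinary greedy NMS for one class. It is the standard algorithm, not a copy of detectionoutput.cpp or of YOLACT's Fast NMS, and the `Box` struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Axis-aligned box in x1, y1, x2, y2 form plus its score.
struct Box
{
    float x1, y1, x2, y2;
    float score;
};

static float box_area(const Box& b)
{
    return std::max(0.f, b.x2 - b.x1) * std::max(0.f, b.y2 - b.y1);
}

static float iou(const Box& a, const Box& b)
{
    const float iw = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
    const float ih = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);
    if (iw <= 0.f || ih <= 0.f)
        return 0.f;
    const float inter = iw * ih;
    return inter / (box_area(a) + box_area(b) - inter);
}

// Greedy NMS: sort by score, keep a box only if it does not overlap
// any already-kept box by more than nms_thresh.
std::vector<Box> greedy_nms(std::vector<Box> boxes, float nms_thresh)
{
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& b : boxes)
    {
        bool keep = true;
        for (const Box& k : kept)
        {
            if (iou(b, k) > nms_thresh)
            {
                keep = false;
                break;
            }
        }
        if (keep)
            kept.push_back(b);
    }
    return kept;
}
```

Two boxes overlapping with IoU 81/119 ≈ 0.68 collapse into the higher-scoring one at a 0.5 threshold, while a disjoint box survives.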
Mask generation
maskmaps is really 32 heat maps of 138x138, and each output object has its own 32 float coefficients.
An object's mask is the sum of each heat map multiplied by its coefficient, enlarged to the original image size, binarized, and finally cropped to the inside of the output box.
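A sketch of the per-object mask assembly at prototype resolution, assuming NCHW prototypes (as extracted from blob 619) and a box already scaled to prototype-map pixels. The sigmoid is assumed here to match the original YOLACT mask activation; since sigmoid(x) > 0.5 exactly when x > 0, it does not change the binary result. Unlike the text, this sketch omits the resize to the original image size and crops before binarizing. The function name is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Build a binary instance mask at prototype resolution (e.g. 138x138):
// linear combination of the prototype maps with the object's coefficients,
// sigmoid, zero everything outside the box, then threshold at 0.5.
// protos: num_protos planes of h*w floats in NCHW order.
// x1, y1, x2, y2: box in prototype-map pixel coordinates.
std::vector<unsigned char> assemble_mask(const std::vector<float>& protos, int num_protos,
                                         int w, int h, const std::vector<float>& coeffs,
                                         float x1, float y1, float x2, float y2)
{
    std::vector<unsigned char> mask(static_cast<std::size_t>(w) * h, 0);
    for (int y = 0; y < h; y++)
    {
        for (int x = 0; x < w; x++)
        {
            if (x < x1 || x >= x2 || y < y1 || y >= y2)
                continue; // crop: pixels outside the box stay 0
            float sum = 0.f;
            for (int p = 0; p < num_protos; p++)
                sum += protos[(p * h + y) * w + x] * coeffs[p];
            const float prob = 1.f / (1.f + std::exp(-sum));
            mask[y * w + x] = prob > 0.5f ? 1 : 0;
        }
    }
    return mask;
}
```

In the real pipeline the 32 coefficients come from the object's row of the `mask` blob (818) and the box is scaled from the 0~1 range to the 138x138 grid before calling something like this.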


Unnatural, and beautiful!
0x9 Supplementary learning materials
Huh? There are supplementary learning materials too?
The ncnn implementation code and the modified model have been uploaded to GitHub:
https://link.zhihu.com/?target=https%3A//github.com/Tencent/ncnn