
Image segmentation based on deep learning: network structure design

2022-06-25 15:14:00

Preface

This article is reproduced from "Image segmentation based on deep learning: Network structure design".

This article summarizes innovations in network structure for CNN-based image semantic segmentation. These innovations fall into two categories: the design of new neural architectures (different depths, widths, connectivity patterns, and topologies) and the design of new components or layers. The former assembles complex large-scale networks from existing components; the latter focuses on designing the underlying components themselves.

1. Image semantic segmentation network structure innovation

1.1 The FCN network

FCN Overall architecture diagram

FCN is listed on its own because it was the first network to approach semantic segmentation from a new perspective. Earlier neural-network-based semantic segmentation methods predicted the label of each pixel from an image patch centered on it, usually with a CNN+FC (fully connected) architecture. This approach obviously cannot exploit the global context of the image, and per-pixel inference is very slow. FCN instead discards the fully connected layers and builds the network entirely from convolutional layers. Through transposed convolution and a strategy of fusing features from different layers, the network's output is directly a predicted mask for the input image, greatly improving both efficiency and accuracy.

Schematic of FCN feature fusion across layers

Innovations: fully convolutional network (no FC layers); transposed convolution (deconvolution); skip connections between feature maps of different layers (element-wise addition).
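The transposed convolution (deconv) that FCN uses for upsampling can be illustrated with a minimal pure-Python sketch in 1-D. The function name and the toy kernel below are illustrative, not FCN's actual parameters:

```python
def transposed_conv1d(x, kernel, stride=2):
    """Upsample a 1-D signal by transposed convolution (stride = upsampling factor).

    Each input value is multiplied by the kernel, and the scaled copies are
    written into the output `stride` positions apart; overlaps accumulate.
    """
    k = len(kernel)
    out = [0.0] * ((len(x) - 1) * stride + k)
    for i, v in enumerate(x):
        for j, w in enumerate(kernel):
            out[i * stride + j] += v * w
    return out

# Doubling the resolution of a 4-element feature map with a size-3 kernel:
up = transposed_conv1d([1.0, 2.0, 3.0, 4.0], kernel=[0.5, 1.0, 0.5], stride=2)
print(len(up))  # (4-1)*2 + 3 = 9 output positions
```

In the real network the kernel weights are learned, and 2-D transposed convolutions upsample whole feature maps rather than 1-D signals.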

 

1.2 Encoder-decoder structures

  • SegNet follows essentially the same idea as FCN. Its encoder uses the first 13 convolutional layers of VGG16; the difference lies in how the decoder upsamples. FCN applies transposed convolution (deconv) to a feature map and adds the result to the encoder feature map of corresponding size, whereas SegNet uses the max-pooling indices recorded in the encoder to upsample in the decoder (from the original paper: "the decoder upsamples the lower resolution input feature maps. Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling.").

Innovations: encoder-decoder structure; pooling indices.

The SegNet network

Comparison of the SegNet and FCN upsampling methods
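SegNet's pooling-index upsampling can be sketched in 1-D pure Python. The function names and toy input are illustrative:

```python
def max_pool_with_indices(x, size=2):
    """1-D max pooling that also records argmax positions, as in SegNet's encoder."""
    pooled, indices = [], []
    for i in range(0, len(x) - size + 1, size):
        window = x[i:i + size]
        j = max(range(size), key=lambda t: window[t])
        pooled.append(window[j])
        indices.append(i + j)
    return pooled, indices

def max_unpool(pooled, indices, out_len):
    """SegNet-style non-linear upsampling: scatter pooled values back to their
    recorded argmax positions; every other position stays zero."""
    out = [0.0] * out_len
    for v, i in zip(pooled, indices):
        out[i] = v
    return out

p, idx = max_pool_with_indices([1.0, 3.0, 2.0, 0.5])
print(max_unpool(p, idx, 4))  # [0.0, 3.0, 2.0, 0.0]
```

Note how the upsampled map is sparse: only the remembered maxima are restored, and the following convolutional layers densify it, which is exactly why SegNet needs no learned deconvolution weights for this step.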

  • U-Net was originally designed for biomedical images, but because of its excellent performance, U-Net and its variants are now widely used across the subfields of computer vision. The network consists of a U-shaped channel plus skip connections. The U-shaped channel is similar to SegNet's encoder-decoder structure: the encoding part (contracting path) extracts features and captures context, while the decoding part (expanding path) uses the decoded feature maps to predict pixel labels. The skip connections improve the model's accuracy and mitigate the vanishing-gradient problem. Note that the skip-connection feature maps are concatenated with the upsampled feature maps rather than added (unlike FCN).

Innovations: U-shaped structure; skip connections.

The U-Net network
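The concat-versus-add distinction between U-Net and FCN can be made concrete with a toy sketch in which a feature map is modeled as a list of channels (each channel a list of values); the function names are illustrative:

```python
def fuse_add(a, b):
    """FCN-style fusion: element-wise sum; channel counts must match."""
    assert len(a) == len(b), "element-wise add requires equal channel counts"
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]

def fuse_concat(a, b):
    """U-Net-style fusion: channel-wise concatenation; the output has
    len(a) + len(b) channels, leaving the network to learn how to mix them."""
    return a + b

enc = [[1.0, 2.0]]   # 1 channel, 2 spatial positions (skip connection)
dec = [[0.5, 0.5]]   # 1 channel from the upsampling path
print(fuse_add(enc, dec))          # [[1.5, 2.5]]
print(len(fuse_concat(enc, dec)))  # 2 channels
```

Concatenation preserves both sources intact at the cost of more channels (and thus more parameters in the following convolution), while addition is cheaper but forces the two maps into the same channel layout.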

  • V-Net's structure is similar to U-Net's. The differences are that the architecture adds residual connections, and replaces the 2D operators with 3D ones to handle volumetric images. It also optimizes directly for a widely used segmentation metric (Dice).

The V-Net network

Innovations: essentially a 3D version of U-Net.

  • FC-DenseNet (the hundred-layer Tiramisu network; paper title: "The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation") is built from dense blocks (Dense Blocks) arranged in a U-Net-style architecture. The simplest version of the network consists of two downsampling paths and two upsampling paths, plus two horizontal skip connections that concatenate feature maps from the downsampling path with the corresponding feature maps in the upsampling path. The connection patterns in the two paths are not identical: in the downsampling path, each dense block has a skip-concatenation path around it, which causes the number of feature maps to grow linearly; the upsampling path has no such operation. (Many say this network could be abbreviated "Dense U-Net", but note there is a separate paper called "Fully Dense UNet for 2D Sparse Photoacoustic Tomography Artifact Removal", which is about removing artifacts in photoacoustic imaging. I have seen many blogs cite figures from that paper when discussing semantic segmentation; they are not the same thing at all, so judge for yourself.)

FC-DenseNet (the hundred-layer Tiramisu network)

Innovations: fuses DenseNet with a U-Net architecture (from the standpoint of information flow, dense connections are indeed more powerful than residual structures).
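The linear growth in feature maps noted above for dense blocks can be sketched as simple channel arithmetic. The channel counts (48 input channels, growth rate 16) are illustrative, not the paper's exact configuration:

```python
def dense_block_channels(c_in, growth_rate, n_layers):
    """Channel count through a dense block: each layer emits `growth_rate` new
    feature maps, and its input is the concatenation of everything produced so
    far, so the channel count grows linearly with depth."""
    per_layer_inputs = []
    c = c_in
    for _ in range(n_layers):
        per_layer_inputs.append(c)   # this layer sees all maps produced so far
        c += growth_rate             # concatenation adds growth_rate maps
    return c, per_layer_inputs

out_c, per_layer_in = dense_block_channels(c_in=48, growth_rate=16, n_layers=4)
print(out_c)          # 48 + 4*16 = 112
print(per_layer_in)   # [48, 64, 80, 96]
```

This is why FC-DenseNet drops the skip-concatenation around dense blocks in the upsampling path: without that restraint, the channel count (and memory use) would keep compounding as resolution is restored.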

  • The DeepLab series of networks are improved versions of the encoder-decoder structure. In 2018, DeepLabV3+ achieved excellent, state-of-the-art performance on the VOC2012 and Cityscapes datasets. The series comprises four papers: V1, V2, V3, and V3+. The core contributions of each are briefly summarized below:

1) DeepLabV1: combines a convolutional neural network with a probabilistic graphical model (CNN+CRF), improving segmentation and localization accuracy;

2) DeepLabV2: ASPP (atrous spatial pyramid pooling); CNN+CRF;

3) DeepLabV3: improves ASPP by adding 1×1 convolution and global average pooling (global avg pool); compares the effects of cascaded versus parallel atrous convolution.

Cascaded atrous convolution

Parallel atrous convolution (ASPP)

4) DeepLabV3+: adopts the encoder-decoder idea, adding a decoder module to extend DeepLabV3; applies depthwise separable convolution in both the ASPP and decoder modules; uses an improved Xception as the backbone.

DeepLabV3+

Overall, the core contributions of the DeepLab series are: atrous (dilated) convolution; ASPP; and CNN+CRF (only V1 and V2 use a CRF; V3 and V3+ address the blurred-segmentation-boundary problem with the deep network itself, to better effect than a CRF).
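The receptive-field benefit of atrous convolution can be computed with the standard formula for the effective kernel size, k_eff = k + (k-1)(d-1). A small sketch, with dilation rates chosen for illustration:

```python
def effective_kernel(k, dilation):
    """Effective kernel size of a dilated (atrous) convolution: k + (k-1)(d-1)."""
    return k + (k - 1) * (dilation - 1)

def cascaded_receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions: starting from 1,
    each layer adds (effective_kernel - 1)."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

# Three cascaded 3x3 convolutions with dilation rates 1, 2, 4:
print(cascaded_receptive_field([(3, 1), (3, 2), (3, 4)]))  # 1 + 2 + 4 + 8 = 15
# The same stack without dilation reaches only 1 + 2 + 2 + 2 = 7.
```

Dilation thus enlarges the receptive field exponentially with depth while keeping the parameter count and resolution unchanged, which is the property both the cascaded and parallel (ASPP) designs exploit.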

  • PSPNet (pyramid scene parsing network) improves the network's ability to exploit global context by aggregating context information from different regions. In SPPNet, the feature maps of different levels produced by pyramid pooling are finally flattened and concatenated, then fed into a fully connected layer for classification, which removes the fixed-input-size requirement of CNN image classification. PSPNet instead uses a pooling-conv-upsample strategy: the resulting feature maps are concatenated, and pixel labels are then predicted.

The PSPNet network

Innovations: multi-scale pooling, using a global prior to better understand complex scenes.
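PSPNet's pooling-conv-upsample strategy can be sketched in 1-D pure Python, omitting the 1×1 convolution after each pooled branch for brevity. The bin sizes and toy feature map are illustrative:

```python
def adaptive_avg_pool(feat, out_size):
    """Average-pool a 1-D feature map down to `out_size` bins."""
    n = len(feat)
    bins = []
    for b in range(out_size):
        lo, hi = b * n // out_size, (b + 1) * n // out_size
        bins.append(sum(feat[lo:hi]) / (hi - lo))
    return bins

def nearest_upsample(feat, out_len):
    """Nearest-neighbour upsampling back to the original resolution."""
    n = len(feat)
    return [feat[i * n // out_len] for i in range(out_len)]

def pyramid_pooling(feat, bin_sizes=(1, 2, 4)):
    """Pool at several scales, upsample each back, and collect the branches
    (concatenated along the channel axis in the real network)."""
    branches = [feat]  # the original map is kept
    for s in bin_sizes:
        branches.append(nearest_upsample(adaptive_avg_pool(feat, s), len(feat)))
    return branches

feat = [1.0, 2.0, 3.0, 4.0]
out = pyramid_pooling(feat)
print(len(out))  # 4 branches: original + 3 pooled scales
print(out[1])    # global-average branch: [2.5, 2.5, 2.5, 2.5]
```

The bin-size-1 branch is the "global prior" mentioned above: every position sees the image-wide average, injecting scene-level context into each pixel's prediction.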

  • RefineNet refines the intermediate activation maps and hierarchically combines them into multi-scale activations while preventing loss of sharpness. The network is composed of independent RefineNet modules, each consisting of three main components: the residual convolution unit (RCU), multi-resolution fusion (MRF), and chained residual pooling (CRP). The overall structure somewhat resembles U-Net, but a new combination is designed at the skip connections (not a simple concat). Personally, I think this structure is well suited as a template for designing your own networks: you can plug in many CNN modules used in other computer-vision problems, and with U-Net as the overall framework, the result will not be too bad.

The RefineNet network

Innovations: the RefineNet module.

 

1.3 Reducing the computational complexity of the network

There is also a great deal of work on reducing the computational complexity of semantic segmentation networks. Ways to simplify deep network structures include tensor decomposition, channel/network pruning, and sparse connections. Some work uses NAS (neural architecture search) in place of manual design to search for module structures or whole networks, although the GPU resources AutoDL requires will deter many people. Consequently, some researchers instead use random search to find much smaller ASPP modules, then build the whole network model from these small modules.

Lightweight network design is an industry consensus: mobile deployment cannot assume a 2080 Ti in every device, and power consumption, storage, and similar constraints also limit how widely a model can be deployed. However, if 5G becomes ubiquitous and all data can be processed in the cloud, things will get very interesting. Of course, whether comprehensive 5G deployment is feasible in the short term (a decade) remains to be seen.

 

1.4 Network structures based on attention mechanisms

An attention mechanism can be defined as using information from subsequent layers/feature maps to select and localize the most discriminative (or salient) parts of the input feature map. It can be thought of simply as a way of weighting the feature map, with the weights computed by the network itself. According to how the weights act, attention mechanisms can be divided into channel attention (CA) and spatial attention (PA). FPA (Feature Pyramid Attention) is a semantic segmentation network based on attention: it combines the attention mechanism with a spatial pyramid to extract precise features for pixel-level labeling, instead of using dilated convolution and a hand-designed decoder network.
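The "weighted feature map" view of attention can be sketched as a toy 1-D spatial attention. In a real network the scores are produced by learned layers; here they are passed in by hand for illustration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def spatial_attention(feat, scores):
    """Re-weight a feature map position-wise: the network emits a score per
    position, softmax turns scores into weights, and the features are scaled."""
    weights = softmax(scores)
    return [f * w for f, w in zip(feat, weights)]

feat = [1.0, 1.0, 1.0, 1.0]
attended = spatial_attention(feat, scores=[0.0, 0.0, 0.0, 2.0])
print(attended[3] > attended[0])  # True: the high-score position is emphasized
```

Channel attention (CA) works the same way but weights whole channels instead of spatial positions.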

 

1.5 Network structures based on adversarial learning

Goodfellow et al. proposed an adversarial approach to learning deep generative models in 2014. Generative adversarial networks (GANs) train two models simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data.

● G is the generative network: it receives random noise z (a random number) and generates an image from this noise.

● D is the discriminative network: it judges whether an image is "real". Its input is x (an image), and its output D(x) represents the probability that x is a real image: an output of 1 means it is certainly real, and an output of 0 means it cannot be real.

The training procedure for G is to maximize the probability that D makes a mistake. It can be shown that, over the space of arbitrary functions G and D, a unique solution exists in which G reproduces the training data distribution and D = 1/2 everywhere. During training, the generator G aims to produce images realistic enough to fool the discriminator D, while D aims to distinguish G's fake images from real ones. G and D thus form a dynamic "game", whose final equilibrium is a Nash equilibrium. When G and D are defined by neural networks, the whole system can be trained with backpropagation.

Schematic of the GAN network structure

Inspired by GANs, Luc et al. trained a semantic segmentation network (G) together with an adversarial network (D), where D distinguishes segmentation maps coming from the ground truth from those produced by the segmentation network G. G and D learn through this game; their loss function is defined as:

The adversarial segmentation loss function

  Reviewing the original GAN loss function:

The GAN loss function embodies the idea of a zero-sum game. The original GAN loss function is as follows:

The loss is computed at the output of D (the discriminator), and D's output is usually a fake/true judgment, so the whole thing can be viewed as a binary cross-entropy function. From the form of the GAN loss function, training splits into two parts:

First, the max-over-D part: training usually holds G (the generator) fixed while training D. D's training objective is to distinguish fake from true. If we let 1/0 represent true/fake, then for the first expectation term, the input is sampled from real data, so we want D(x) to tend toward 1, i.e. the first term to be larger. Similarly, the input to the second expectation term is sampled from data generated by G, so we want D(G(z)) to tend toward 0, i.e. the second term to be larger as well. Hence this stage seeks to make the whole expression larger, which is the meaning of max over D. Only D's parameters are updated in this stage.

The second part holds D fixed (no parameter updates) and trains G. Now only the second expectation term matters, and here is the key point: because we want to confuse D, the label is set to 1 (we know the sample is fake, which is why this is called "confusing" D), and we want D(G(z)) to output something as close to 1 as possible, i.e. this term to be as small as possible: this is min over G. Of course the discriminator cannot be fooled so easily, so it produces a relatively large error, which is backpropagated to update G; G then improves ("you didn't fool me this time, try harder next time" — to quote https://www.cnblogs.com/walter-xh/p/10051634.html). Only G's parameters are updated in this stage.
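The two training stages above can be sketched with the binary cross-entropy view of the GAN loss. The probe values (D(x) = 0.9 on real, D(G(z)) = 0.1 on fake) are illustrative:

```python
import math

def bce(prediction, label):
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12  # guard against log(0)
    return -(label * math.log(prediction + eps)
             + (1 - label) * math.log(1 - prediction + eps))

def discriminator_loss(d_real, d_fake):
    """Stage 1 (max over D): real samples get label 1, generated samples label 0.
    Only D's parameters are updated against this loss."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """Stage 2 (min over G): label the fake as 1 to 'confuse' D.
    Only G's parameters are updated against this loss."""
    return bce(d_fake, 1.0)

# A confident discriminator has a low loss, while the generator's loss is then
# large, producing the big error that drives G to improve:
print(discriminator_loss(0.9, 0.1))
print(generator_loss(0.1))
```

Note that labeling fakes as 1 in the generator step is exactly the "confusion" described above: the same sample carries label 0 when D is trained and label 1 when G is trained.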

Viewed from another angle, the discriminator D in a GAN amounts to a special loss function: one composed of a neural network, unlike traditional L1, L2, or cross-entropy losses.

Owing to its unusual training procedure, a GAN suffers from problems such as vanishing gradients and mode collapse (for which solutions now seem to exist), but its design is truly one of the great inventions of the deep-learning era.

 

1.6 Summary

Most deep-learning-based image semantic segmentation models follow the encoder-decoder architecture, such as U-Net. Research in recent years shows that dilated convolution and feature pyramid pooling can improve the performance of U-Net-style networks. In Section 2, we summarize how these methods and their variants are applied to medical image segmentation.

 

2. Application of network structure innovation in medical image segmentation

This part introduces research results applying the network structure innovations above to 2D/3D medical image segmentation.

2.1 Segmentation methods based on model compression

To process high-resolution 2D/3D medical images (e.g. CT, MRI, and histopathology images) in real time, researchers have proposed a variety of model compression methods. Weng et al. applied NAS to a U-Net, obtaining a small network with better organ/tumor segmentation performance on CT, MRI, and ultrasound images. Brugger et al. redesigned the U-Net architecture using group normalization and leaky ReLU, making the network more memory-efficient for 3D medical image segmentation. Others have designed dilated-convolution modules with fewer parameters. Further model compression methods include weight quantization (16-bit, 8-bit, and binary quantization), knowledge distillation, pruning, and so on.
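Weight quantization, one of the compression methods listed, can be sketched as symmetric 8-bit quantization, a common scheme; the weight values shown are illustrative:

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats onto integers in [-127, 127]
    using a single scale factor derived from the largest magnitude."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]

w = [0.02, -0.5, 0.31, 1.27]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Each weight now needs 1 byte instead of 4, and the rounding error per weight
# is bounded by the scale:
print(max(abs(a - b) for a, b in zip(w, w_hat)) < s)  # True
```

16-bit and binary quantization follow the same store-integers-plus-scale idea with coarser or finer grids, trading accuracy against memory and compute.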

 

2.2 Encoder-decoder segmentation methods

Drozdzal et al. presented a simple method that applies a CNN to normalize the raw input image before feeding it into the segmentation network, improving segmentation accuracy for single-microscope images, liver CT, and prostate MRI. Gu et al. proposed using dilated convolution in the backbone to preserve context information. Vorontsov et al. proposed an image-to-image network framework that converts images with an ROI into images without one (for example, converting an image containing a tumor into a healthy, tumor-free image), then adds the tumor removed by the model back onto the new healthy image, thereby obtaining detailed structure of the object. Zhou et al. proposed rewiring U-Net's skip connections and tested the method on nodule segmentation in low-dose chest CT scans, nucleus segmentation in microscope images, liver segmentation in abdominal CT scans, and polyp segmentation in colonoscopy videos. Goyal applied DeepLabV3 to dermoscopy color image segmentation to extract skin lesion regions.

 

2.3 Segmentation methods based on attention mechanisms

Nie et al. proposed an attention model that segments the prostate more accurately than baseline models (V-Net and FCN). Sinha et al. proposed a network based on multi-level attention for abdominal organ segmentation in MRI images. Qin et al. proposed a dilated-convolution module to preserve more detail in 3D medical images. There are many other medical image segmentation papers based on attention mechanisms.

 

2.4 Segmentation networks based on adversarial learning

Khosravan et al. proposed an adversarially trained network for pancreas segmentation from CT scans. Son et al. used a generative adversarial network for retinal image segmentation. Xue et al. used a fully convolutional network as the segmenter within a generative adversarial framework to segment brain tumors from MRI images. There are other successful applications of GANs to medical image segmentation, not listed one by one here.

 

2.5 RNN-based segmentation models

Recurrent neural networks (RNNs) are mainly used for sequence data. The long short-term memory (LSTM) network is an improved RNN that introduces self-loops so that gradient flow can be sustained over long durations. In medical image analysis, RNNs are used to model the temporal dependencies in image sequences. Bin et al. proposed an image-sequence segmentation algorithm combining a fully convolutional neural network with an RNN, incorporating information along the time dimension into the segmentation task. Gao et al. used a CNN and an LSTM to model the temporal relationship in sequences of brain MRI slices, improving segmentation performance on 4D images. Li et al. first used a U-Net to obtain an initial segmentation probability map, then used an LSTM to segment the pancreas from 3D CT images, improving segmentation performance. There are many other papers using RNNs for medical image segmentation, not introduced one by one here.
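The LSTM self-loop mentioned above can be sketched as a single scalar LSTM cell. The gate weights here are toy hand-set values, not trained parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h, c, W):
    """One step of a scalar LSTM cell. The cell state c is carried through a
    'self-loop': c_new = f*c + i*g. Because this path is additive (gated by f)
    rather than repeatedly squashed, gradients can persist over long sequences.
    W maps each gate name to its (input, hidden) weight pair."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * h)    # input gate
    f = sigmoid(W["f"][0] * x + W["f"][1] * h)    # forget gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h)    # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h)  # candidate value
    c_new = f * c + i * g                         # the self-loop on the cell state
    h_new = o * math.tanh(c_new)
    return h_new, c_new

W = {k: (0.5, 0.5) for k in ("i", "f", "o", "g")}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:   # a short "slice sequence"
    h, c = lstm_cell_step(x, h, c, W)
print(-1.0 < h < 1.0)  # True: the hidden state stays bounded by the gating
```

In the segmentation papers above, each x would be a feature map from the CNN for one slice or frame, and the cell state carries context from slice to slice.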

 

2.6 Summary

This part mainly covers the application of segmentation algorithms to medical imaging, so there are not many architectural innovations as such. The main theme is that, given data of different formats (CT versus RGB, pixel ranges, image resolutions, and so on) and the characteristics of different anatomical regions (noise, object morphology, and so on), classical networks need to be adapted to the format and characteristics of the input data to complete the segmentation task well. Although deep learning is a black box, model design as a whole still follows discernible rules: knowing which strategies solve which problems, and which problems they introduce in turn, one can choose according to the specific segmentation task to achieve the best segmentation performance.

References:

1 Deep Semantic Segmentation of Natural and Medical Images: A Review
2 NAS-Unet: Neural architecture search for medical image segmentation. IEEE Access, 7:44247–44257, 2019.
3 Boosting segmentation with weak supervision from image-to-image translation. arXiv preprint arXiv:1904.01636, 2019
4 Multi-scale guided attention for medical image segmentation. arXiv preprint arXiv:1906.02849,2019.
5 SegAN: Adversarial network with multi-scale L1 loss for medical image segmentation.
6 Fully convolutional structured LSTM networks for joint 4D medical image segmentation. In 2018 IEEE
7 https://www.cnblogs.com/walter-xh/p/10051634.html
