RepPoints: Microsoft cleverly uses deformable convolution to generate point sets for object detection | ICCV 2019
2022-06-24 07:39:00 【VincentLee】
RepPoints represents objects with point sets that carry rich semantic information, and implements the idea neatly with deformable convolution. The overall network design is complete and well worth studying.
Source: Xiaofei's Algorithm Engineering Notes (WeChat official account)
Paper: RepPoints: Point Set Representation for Object Detection
- Paper: https://arxiv.org/abs/1904.11490
- Code: https://github.com/microsoft/RepPoints
Introduction
The classical bounding box is convenient for computation, but it ignores the shape and pose of the object, and features extracted from the rectangular region can be heavily contaminated by background content or other objects. Such low-quality features in turn degrade detection performance. To address these problems, the paper proposes RepPoints, a new object representation that supports finer-grained localization and better classification.
As shown in Figure 1, RepPoints is a set of points that adaptively surrounds the object and covers its semantically significant local regions. Training is driven jointly by object localization and object classification, which constrains the points to enclose the object tightly and guides the detector toward correct classification. The adaptive representation is differentiable, can be used consecutively across multiple stages of a detector, and requires no anchors to generate a large number of initial boxes.
The RepPoints Representation
As noted above, a bounding box is only a coarse-grained representation of object location. It considers only the rectangular extent of the object and ignores shape, pose, and semantically rich local regions, which could help the network localize better and extract better features. To overcome these shortcomings, RepPoints represents an object with a set of adaptive sample points:
$$\mathcal{R} = \{(x_k, y_k)\}_{k=1}^{n}$$
where $n$ is the total number of sample points used to represent the object, set to 9 by default.
RepPoints refinement
Progressively refining bounding box location and feature extraction is a key ingredient of successful multi-stage detectors. For RepPoints, refinement can be written simply as:
$$\mathcal{R}_r = \{(x_k + \Delta x_k, y_k + \Delta y_k)\}_{k=1}^{n}$$
where $\{(\Delta x_k, \Delta y_k)\}_{k=1}^{n}$ are the predicted offsets of the new sample points relative to the old ones. All sample-point offsets share the same scale, so refinement does not face the scale mismatch between center-point coordinates and box size that arises with bounding boxes.
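The refinement step is just an element-wise addition of offsets to points. A minimal sketch in plain Python, assuming the point set starts collapsed at the object center and using made-up offset values (the function name `refine_points` and all numbers are illustrative, not taken from the paper's code):

```python
from typing import List, Tuple

Point = Tuple[float, float]


def refine_points(points: List[Point], offsets: List[Point]) -> List[Point]:
    """Add one predicted offset (dx_k, dy_k) to each point (x_k, y_k)."""
    if len(points) != len(offsets):
        raise ValueError("points and offsets must have the same length")
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, offsets)]


# n = 9 points, all initialised at a hypothetical object center.
center = (50.0, 40.0)
initial = [center] * 9

# Hypothetical predicted offsets that spread the points over a 3x3 pattern.
offsets = [(dx, dy) for dy in (-10.0, 0.0, 10.0) for dx in (-20.0, 0.0, 20.0)]

refined = refine_points(initial, offsets)
```

Because every offset is expressed in the same coordinate units as the points, no separate normalisation of "center" versus "size" terms is needed in this step.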
Converting RepPoints to bounding box
To train with bounding box annotations and to evaluate RepPoints-based detectors, a predefined conversion function $\mathcal{T}: \mathcal{R}_P \to \mathcal{B}_P$ turns RepPoints into a pseudo box. Three conversion functions are considered:
- $\mathcal{T}=\mathcal{T}_1$: Min-max function. A min-max operation over all the RepPoints yields the pseudo box $\mathcal{B}_P$.
- $\mathcal{T}=\mathcal{T}_2$: Partial min-max function. A min-max operation over a subset of the RepPoints yields the pseudo box $\mathcal{B}_P$.
- $\mathcal{T}=\mathcal{T}_3$: Moment-based function. The mean and standard deviation of the RepPoints give the center and the scale of the pseudo box $\mathcal{B}_P$; the scale is obtained by multiplying the standard deviation by the globally shared learnable parameters $\lambda_x$ and $\lambda_y$.
These functions are all differentiable, so they can be inserted into the detector for end-to-end training. Experiments show that all three conversion functions work comparably well.
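The three conversion functions can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the subset used by the partial min-max function is taken to be the first `k` points, and `lambda_x` / `lambda_y` are treated as plain multipliers that turn the standard deviation into a half-width and half-height.

```python
from math import sqrt
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def minmax_box(points: List[Point]) -> Box:
    """T1: tightest axis-aligned box around all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def partial_minmax_box(points: List[Point], k: int = 4) -> Box:
    """T2: min-max over a subset; using the first k points is an assumption."""
    return minmax_box(points[:k])


def moment_box(points: List[Point], lambda_x: float = 1.0, lambda_y: float = 1.0) -> Box:
    """T3: center from the mean, extent from the (population) standard deviation."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    std_x = sqrt(sum((p[0] - mean_x) ** 2 for p in points) / n)
    std_y = sqrt(sum((p[1] - mean_y) ** 2 for p in points) / n)
    # Assumption: lambda * std is used as the half-extent of the box.
    half_w, half_h = lambda_x * std_x, lambda_y * std_y
    return (mean_x - half_w, mean_y - half_h, mean_x + half_w, mean_y + half_h)


corners = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0), (4.0, 2.0)]
box_t1 = minmax_box(corners)
box_t2 = partial_minmax_box(corners, k=2)
box_t3 = moment_box(corners)
```

Each function uses only min, max, mean, and square-root operations, so in an autograd framework gradients could flow from the pseudo box back to the point coordinates.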
RPDet: an Anchor Free Detector
Based on RepPoints, the paper designs an anchor-free object detector, RPDet, with two recognition stages. Deformable convolution samples features at multiple irregularly distributed points and combines them into the convolution output, which makes it a natural fit for RepPoints: the sample points can be steered by the feedback from the recognition results.
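To make the sampling idea concrete, the sketch below reads a single-channel feature map at fractional point locations with bilinear interpolation and combines the samples with one weight per point, which is the core of what a deformable convolution does at one output position. It is a simplified illustration: the helper names, the zero padding outside the map, and the toy values are assumptions, and a real layer would handle many channels and learn the weights.

```python
from math import floor
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def bilinear_sample(feature: Sequence[Sequence[float]], x: float, y: float) -> float:
    """Interpolate a 2D map at fractional (x, y); locations outside the map count as 0."""
    height, width = len(feature), len(feature[0])
    x0, y0 = floor(x), floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < width and 0 <= yi < height:
                # Weight falls off linearly with distance to the grid point.
                weight = (1.0 - abs(x - xi)) * (1.0 - abs(y - yi))
                value += weight * feature[yi][xi]
    return value


def point_set_response(
    feature: Sequence[Sequence[float]], points: List[Point], weights: List[float]
) -> float:
    """Weighted sum of the features sampled at each point of the set."""
    return sum(w * bilinear_sample(feature, x, y) for w, (x, y) in zip(weights, points))


toy_map = [[0.0, 1.0], [2.0, 3.0]]
mid_value = bilinear_sample(toy_map, 0.5, 0.5)
response = point_set_response(toy_map, [(0.0, 0.0), (1.0, 1.0)], [1.0, 2.0])
```

Since the interpolation weights depend smoothly on `x` and `y`, the classification signal can in principle push the sample points toward more informative locations.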
Center point based initial object representation
RPDet takes the center point as the initial object representation and then refines it step by step into the final RepPoints; the center point can itself be regarded as a special case of RepPoints. When two objects fall on the same position of the feature map, center-point-based methods usually suffer from an ambiguity about which object to recognize. Previous methods resolved this by placing multiple predefined anchors at each location, whereas RPDet relies on the FPN structure:
- Objects of different scales are assigned to different feature levels.
- Small objects are assigned to levels with larger feature maps, which lowers the chance that two objects of the same scale share one feature position.
The paper reports that, once the FPN constraint above is applied, only 1.1% of the objects in COCO still have this problem.
Utilization of RepPoints
As shown in Figure 2, RepPoints is the basic object representation throughout RPDet. Starting from the center point, the first set of RepPoints is obtained by regressing offsets from the center point. The second set of RepPoints represents the final object location and is obtained by refining the first set. Learning of RepPoints is driven by two objectives:
- The distance loss between the top-left and bottom-right corners of the pseudo box and those of the ground-truth (GT) box
- The classification loss of the subsequent stage
The first set of RepPoints is guided by both the distance loss and the classification loss. The second set is guided by the distance loss only, since its main purpose is to learn a more accurate object location.
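A corner-based distance loss can be sketched as follows. The text above only says "distance loss"; using smooth L1 over the four corner coordinates, and the `beta` threshold of 1.0, are assumptions made for this illustration, and any normalisation by object scale is left out.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left and bottom-right corners


def smooth_l1(diff: float, beta: float = 1.0) -> float:
    """Quadratic near zero, linear for large errors (assumed form of the distance)."""
    a = abs(diff)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


def corner_distance_loss(pseudo_box: Box, gt_box: Box, beta: float = 1.0) -> float:
    """Sum of smooth L1 distances over the top-left and bottom-right corner coordinates."""
    return sum(smooth_l1(p - g, beta) for p, g in zip(pseudo_box, gt_box))


# Hypothetical pseudo box and ground-truth box.
loss = corner_distance_loss((0.0, 0.0, 4.0, 2.0), (0.5, 0.0, 6.0, 2.0))
```

In this example only `x1` (off by 0.5) and `x2` (off by 2.0) contribute, giving 0.125 + 1.5.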
Backbone and head architectures
The FPN backbone produces 5 feature pyramid levels, from stage 3 (downsampling ratio 8) to stage 7 (downsampling ratio 128). The head structure is shown in Figure 3. The head is shared across levels and contains two independent subnets, responsible for localization (RepPoints generation) and classification:
- The localization subnet first extracts features with three 256-d $3\times 3$ convolutions, each followed by a group normalization layer, and then attaches two small networks that compute the offsets of the two sets of RepPoints.
- The classification subnet first extracts features with three 256-d $3\times 3$ convolutions, each followed by a group normalization layer, then feeds the offsets of the first set of RepPoints from the localization subnet into a 256-d $3\times 3$ deformable convolution for further feature extraction, and finally outputs the classification result.
Although RPDet uses two-stage localization, it is computationally even more efficient than the one-stage RetinaNet. The main reason is that the anchor-free design reduces the computation in the classification layer, which more than offsets the small overhead of the extra localization stage.
Localization/class target assignment
Localization consists of two stages: the first stage produces the first set of RepPoints from the center point, and the second stage refines the first set to obtain the second set. Positive samples are defined differently in each stage:
- In the first stage, a feature point is positive if 1) the pyramid level of the feature point equals $s(B)=\lfloor \log_2 (\sqrt{w_B h_B}/4)\rfloor$, where $w_B$ and $h_B$ are the width and height of the ground-truth box $B$, and 2) the center of the ground-truth box projects onto that feature point.
- In the second stage, a feature point is positive only if the pseudo box generated from its first-stage RepPoints has an IoU greater than 0.5 with the ground-truth box. This resembles current anchor-based methods, with the first-stage output playing the role of the anchor.
Classification considers only the first set of RepPoints. A feature point is therefore a positive sample if the pseudo box from its first set of RepPoints has an IoU greater than 0.5 with a ground-truth box, background if the IoU is less than 0.4, and ignored otherwise.
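The assignment rules above can be sketched in plain Python. Several details are assumptions made for illustration: the level is clamped to the range 3 to 7 used by the backbone, the stride of level `l` is taken as `2 ** l`, the center is mapped to a cell by integer division, and the function names are hypothetical.

```python
from math import floor, log2, sqrt
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def pyramid_level(w: float, h: float, min_level: int = 3, max_level: int = 7) -> int:
    """s(B) = floor(log2(sqrt(w * h) / 4)), clamped to the available levels (assumption)."""
    s = floor(log2(sqrt(w * h) / 4.0))
    return max(min_level, min(max_level, s))


def center_cell(box: Box, level: int) -> Tuple[int, int]:
    """Feature-map cell that the box center projects to, assuming stride 2 ** level."""
    stride = 2 ** level
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    return (int(cx // stride), int(cy // stride))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def classification_label(pseudo_box: Box, gt_box: Box) -> str:
    """IoU > 0.5 is positive, IoU < 0.4 is background, anything in between is ignored."""
    overlap = iou(pseudo_box, gt_box)
    if overlap > 0.5:
        return "positive"
    if overlap < 0.4:
        return "background"
    return "ignored"


gt = (64.0, 64.0, 128.0, 128.0)           # a hypothetical 64x64 ground-truth box
level = pyramid_level(64.0, 64.0)          # sqrt(64 * 64) / 4 = 16, log2(16) = 4
cell = center_cell(gt, level)              # center (96, 96) with stride 16
label = classification_label((64.0, 64.0, 128.0, 120.0), gt)
```

The same `iou` check with the 0.5 threshold would also serve for the second localization stage, where the first-stage pseudo box plays the role of an anchor.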
Experiments
Performance comparison of the different pseudo box conversion functions.
Performance comparison with other state-of-the-art (SOTA) detection methods.
Conclusion
RepPoints represents objects with point sets that carry rich semantic information, and implements the idea neatly with deformable convolution. The overall network design is complete and well worth studying.